Key Takeaways: Nonparametric Methods
One-Sentence Summary
Nonparametric methods — including the sign test, Wilcoxon rank-sum (Mann-Whitney U), Wilcoxon signed-rank, and Kruskal-Wallis tests — replace raw data values with ranks to compare groups without requiring normality, making them ideal for ordinal data, small samples, and datasets with heavy outliers, at a cost of only about 5% efficiency when normality actually holds.
Core Concepts at a Glance
| Concept | Definition | Why It Matters |
| --- | --- | --- |
| Nonparametric test | A hypothesis test that does not assume a specific probability distribution (e.g., normality) for the population | Broadens your toolkit to handle ordinal data, small non-normal samples, and outlier-heavy data |
| Rank-based methods | Tests that convert raw data values to ranks (1st, 2nd, 3rd, ...) before analysis | Ranks are unaffected by outliers and don't require equal intervals — the key insight behind most nonparametric tests |
| Distribution-free | A synonym for nonparametric: the test works regardless of the population distribution's shape | Removes the most common roadblock to valid inference — the normality assumption |
| Power tradeoff | Nonparametric tests lose ~5% efficiency vs. parametric tests when normality holds, but can be more powerful when it doesn't | The "insurance premium" for robustness is remarkably small — and sometimes you get more power, not less |
When to Use Each Test
| Scenario | Parametric Test | Nonparametric Alternative | Use Nonparametric When... |
| --- | --- | --- | --- |
| Two independent groups | Two-sample t-test (Ch.16) | Wilcoxon rank-sum / Mann-Whitney U | Small $n$ + non-normal; ordinal data; heavy outliers |
| Paired / matched data | Paired t-test (Ch.16) | Wilcoxon signed-rank test | Small $n$ + non-normal differences; ordinal data |
| Paired data (simplest) | Paired t-test (Ch.16) | Sign test | Very small $n$; only the direction of each difference is known |
| Three or more groups | One-way ANOVA (Ch.20) | Kruskal-Wallis test | Small $n$ per group + non-normal; ordinal data |
| Test | What It Uses | What It Ignores |
| --- | --- | --- |
| Sign test | Direction of differences (+/-) | Magnitude of differences |
| Wilcoxon signed-rank | Direction AND rank of differences | Exact size of differences |
| Mann-Whitney U | Relative ordering (ranks) of all observations | Exact values |
| Kruskal-Wallis | Mean ranks across $k$ groups | Exact values |
| t-test / ANOVA | Exact values (means, variances) | Nothing — uses all information |
More information used = more power (when assumptions hold). Less information used = more robustness (when assumptions fail).
The Ranking Procedure
- Combine all observations from all groups into a single list
- Sort from smallest to largest
- Assign ranks 1, 2, 3, ... from smallest to largest
- Handle ties with average ranks (midranks)
- Return each rank to its original group
Quick check: Sum of all ranks = $N(N+1)/2$
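The procedure above, including the midrank convention for ties and the rank-sum check, is a minimal sketch in Python using scipy's `rankdata` (the data values are made up for illustration):

```python
import numpy as np
from scipy import stats

# Pooled observations from all groups combined (illustrative values)
data = np.array([3.1, 4.5, 4.5, 2.0, 5.2, 4.5, 3.1])

# rankdata assigns average ranks (midranks) to ties by default
ranks = stats.rankdata(data)
# The two 3.1s share rank 2.5; the three 4.5s share rank 5

# Quick check: ranks always sum to N(N+1)/2, even with midranks
N = len(data)
assert ranks.sum() == N * (N + 1) / 2
```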
Test Procedures
Sign Test (Paired Data)
- Compute differences $d_i$
- Drop zeros
- Count positives ($n^+$) and negatives ($n^-$)
- Under $H_0$: $n^+ \sim \text{Binomial}(n, 0.5)$
- p-value from binomial distribution
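These steps can be sketched directly with scipy's `binomtest` (the paired differences below are hypothetical):

```python
from scipy import stats

# Hypothetical paired differences (after - before)
diffs = [1.2, -0.5, 0.8, 0.0, 2.1, 0.4, -0.3, 1.0]

# Drop zeros, then count signs
nonzero = [d for d in diffs if d != 0]
n_pos = sum(d > 0 for d in nonzero)
n = len(nonzero)

# Under H0, n_pos ~ Binomial(n, 0.5); p-value from the binomial distribution
result = stats.binomtest(n_pos, n, 0.5, alternative='two-sided')
print(n_pos, n, result.pvalue)
```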
Mann-Whitney U (Two Independent Groups)
- Combine and rank all $N = n_1 + n_2$ observations
- Compute rank sums: $W_1$, $W_2$
- Compute: $U_1 = W_1 - n_1(n_1+1)/2$, $U_2 = W_2 - n_2(n_2+1)/2$
- Check: $U_1 + U_2 = n_1 \times n_2$
- p-value from Mann-Whitney distribution or normal approximation
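The rank-sum arithmetic and the $U_1 + U_2 = n_1 n_2$ check can be verified by hand against scipy's built-in test (sample values are made up):

```python
import numpy as np
from scipy import stats

# Two small hypothetical samples
group1 = np.array([12.0, 15.5, 9.8, 14.1])
group2 = np.array([18.2, 16.7, 13.3, 19.0, 17.5])
n1, n2 = len(group1), len(group2)

# Rank the pooled N = n1 + n2 observations
pooled = np.concatenate([group1, group2])
ranks = stats.rankdata(pooled)
W1, W2 = ranks[:n1].sum(), ranks[n1:].sum()

# U statistics from the rank sums
U1 = W1 - n1 * (n1 + 1) / 2
U2 = W2 - n2 * (n2 + 1) / 2
assert U1 + U2 == n1 * n2  # the sanity check from the steps above

# scipy reports the U statistic for the first sample
U_scipy, p = stats.mannwhitneyu(group1, group2, alternative='two-sided')
assert U1 == U_scipy
```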
Wilcoxon Signed-Rank (Paired Data)
- Compute differences $d_i$
- Drop zeros
- Rank absolute differences $|d_i|$
- Apply signs to ranks
- Compute $W^+ = \sum$ positive ranks, $W^- = \sum$ negative ranks
- Check: $W^+ + W^- = n(n+1)/2$
- Test statistic: $W = \min(W^+, W^-)$
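A short sketch of this procedure, checked against `stats.wilcoxon` (hypothetical differences, already zero-free):

```python
import numpy as np
from scipy import stats

# Hypothetical paired differences (no zeros, no ties in |d|)
d = np.array([1.2, -0.5, 0.8, 2.1, 0.4, -0.3, 1.0])

# Rank the absolute differences, then re-attach the signs
ranks = stats.rankdata(np.abs(d))
W_pos = ranks[d > 0].sum()
W_neg = ranks[d < 0].sum()

n = len(d)
assert W_pos + W_neg == n * (n + 1) / 2  # the check from the steps above

W = min(W_pos, W_neg)  # test statistic

# scipy's two-sided statistic is the same smaller rank sum
stat, p = stats.wilcoxon(d)
assert stat == W
```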
Kruskal-Wallis (Three or More Groups)
- Combine and rank all $N$ observations
- Compute mean rank $\bar{R}_i$ for each group
- Test statistic: $H = \frac{12}{N(N+1)} \sum n_i (\bar{R}_i - \bar{R})^2$
- Under $H_0$: $H \sim \chi^2(k-1)$
- If significant: pairwise Mann-Whitney U with Bonferroni correction
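$H$ can be computed directly from the mean ranks and checked against `stats.kruskal` (made-up, tie-free data; with ties scipy additionally applies a tie correction):

```python
import numpy as np
from scipy import stats

# Three small hypothetical groups (no ties)
groups = [np.array([6.1, 7.3, 5.8]),
          np.array([8.4, 9.1, 7.9]),
          np.array([5.2, 6.7, 8.0])]

pooled = np.concatenate(groups)
N = len(pooled)
ranks = stats.rankdata(pooled)

# Slice the pooled ranks back into groups and take mean ranks
sizes = [len(g) for g in groups]
splits = np.split(ranks, np.cumsum(sizes)[:-1])
R_bar = [r.mean() for r in splits]

# H = 12/(N(N+1)) * sum n_i (R_bar_i - R_bar)^2, overall mean rank = (N+1)/2
grand = (N + 1) / 2
H = 12 / (N * (N + 1)) * sum(n_i * (rb - grand) ** 2
                             for n_i, rb in zip(sizes, R_bar))

# With no ties this matches scipy exactly
H_scipy, p = stats.kruskal(*groups)
assert np.isclose(H, H_scipy)
```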
Key Python Code
```python
from scipy import stats

# Sign test (via binomial test)
p_value = stats.binomtest(n_positive, n_total, 0.5,
                          alternative='two-sided').pvalue

# Mann-Whitney U test
stat, p_value = stats.mannwhitneyu(group1, group2,
                                   alternative='two-sided')

# Wilcoxon signed-rank test
stat, p_value = stats.wilcoxon(x, y, alternative='two-sided')

# Kruskal-Wallis test
H_stat, p_value = stats.kruskal(group1, group2, group3)
```
Excel: Limited Support
Excel's Data Analysis ToolPak does not include nonparametric tests. Options:
- Manual calculation using RANK.AVG()
- Free add-in: Real Statistics Resource Pack
- Recommended: Use Python for nonparametric analysis
The Power Tradeoff
| Condition | Which Test Wins? |
| --- | --- |
| Data are normal | Parametric — but only by ~5% |
| Data are heavy-tailed or skewed | Nonparametric — sometimes dramatically |
| Data are ordinal | Nonparametric — parametric is technically inappropriate |
| Large samples ($n > 30$) | Parametric — CLT makes normality less critical |
| Small samples ($n < 15$) | Nonparametric — can't assess or rely on normality |
Asymptotic Relative Efficiency (normal data): ~0.955 for all three rank-based tests vs. their parametric counterparts. That's a 4.5% efficiency loss — roughly equivalent to needing one extra observation per group of 20.
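A small Monte Carlo sketch makes the tradeoff concrete. The seed, sample size, shift, and repetition count below are arbitrary choices for illustration, not values from the text:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def power(sampler, test_pvalue, shift, n=20, reps=1000, alpha=0.05):
    """Fraction of simulated two-sample datasets where the test rejects H0."""
    hits = 0
    for _ in range(reps):
        x = sampler(n)
        y = sampler(n) + shift
        hits += test_pvalue(x, y) < alpha
    return hits / reps

t_p  = lambda x, y: stats.ttest_ind(x, y).pvalue
mw_p = lambda x, y: stats.mannwhitneyu(x, y, alternative='two-sided').pvalue

# Normal data: the t-test should win, but only slightly (~95.5% ARE)
normal = lambda n: rng.normal(size=n)
pn_t, pn_mw = power(normal, t_p, 1.0), power(normal, mw_p, 1.0)

# Heavy-tailed data (t with 2 df): Mann-Whitney often wins clearly
heavy = lambda n: rng.standard_t(df=2, size=n)
ph_t, ph_mw = power(heavy, t_p, 1.0), power(heavy, mw_p, 1.0)

print(pn_t, pn_mw, ph_t, ph_mw)
```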
Common Mistakes
| Mistake | Correction |
| --- | --- |
| "Nonparametric = assumption-free" | Independence is still required. Similar distribution shapes are needed for median interpretation. |
| "Always use nonparametric to be safe" | When assumptions hold, parametric tests are more powerful. Match the method to your data. |
| "Mann-Whitney tests medians" | It tests stochastic dominance — whether one group's values tend to be larger. It compares medians only when distributions have similar shapes. |
| Running pairwise tests after a non-significant Kruskal-Wallis | Same rule as ANOVA: no post-hoc tests after a non-significant omnibus test. |
| Choosing the test that gives the smaller p-value | Choose based on data type and assumptions, not results. |
Decision Flowchart Summary
Is your data ordinal (1-5 scale, rankings)?
→ YES: Use nonparametric
→ NO: Continue
Is n < 15 per group AND data clearly non-normal?
→ YES: Use nonparametric
→ NO: Continue
Are there heavy outliers that distort means?
→ YES: Use nonparametric
→ NO: Parametric tests are likely fine
Still unsure?
→ Run BOTH. If they agree, report the one matching
your data type. If they disagree, investigate why.
Connections
| Connection | Details |
| --- | --- |
| Ch.10 (Normality assessment) | QQ-plots and the Shapiro-Wilk test help you decide whether normality holds — and therefore whether to go nonparametric |
| Ch.15 (One-sample t-test) | The Wilcoxon signed-rank test on deviations from a hypothesized median is the nonparametric analogue |
| Ch.16 (Two-sample t-test) | Mann-Whitney U is the direct nonparametric alternative; paired t-test → Wilcoxon signed-rank |
| Ch.18 (Bootstrap) | Bootstrap is another assumption-light approach; nonparametric tests use ranks, bootstrap uses resampling |
| Ch.20 (ANOVA) | Kruskal-Wallis is the nonparametric ANOVA; both compare groups, but K-W uses ranks |
| Ch.22 (Regression) | Spearman's rank correlation is the nonparametric alternative to Pearson's $r$ — same ranking logic |