Further Reading: Analysis of Variance (ANOVA)
Books
For Deeper Understanding
Douglas C. Montgomery, Design and Analysis of Experiments, 10th edition (2019)
The gold standard textbook on experimental design and ANOVA. Montgomery covers one-way ANOVA, factorial designs, blocking, nested designs, and repeated measures with exceptional clarity. Chapters 3-4 provide a thorough treatment of one-way ANOVA that extends everything in this chapter, including multiple comparison procedures (Tukey, Dunnett, Scheffé, Fisher's LSD) and their relative merits. If you take a second statistics course, this is likely your textbook.
George W. Cobb, Introduction to Design and Analysis of Experiments (1998)
Cobb's textbook takes a unique approach: he builds ANOVA entirely from the decomposition of variability, making the $SS_T = SS_B + SS_W$ identity the central organizing principle. His writing is unusually clear and conceptual — he spends less time on formulas and more time on why ANOVA works. The first three chapters are worth reading even if you never take another course in experimental design.
Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences, 2nd edition (1988)
The definitive reference on effect sizes and power analysis. Chapter 8 covers ANOVA specifically, including the conventions for $\eta^2$ and Cohen's $f$ that we used in this chapter. Cohen's discussion of "small, medium, and large" effects — and his warnings about treating these benchmarks too rigidly — is essential reading for any researcher. Previously recommended in Chapter 17 for Cohen's $d$.
Andy Field, Discovering Statistics Using IBM SPSS Statistics, 6th edition (2024)
Field's treatment of ANOVA (Chapters 12-14) is simultaneously rigorous and entertaining — he uses examples involving "sexy cats" and the "beer-goggles effect" to teach factorial ANOVA in a way that students actually remember. His coverage of assumption checking and robust alternatives is more thorough than most introductory texts. Available in R and Python editions as well.
For the Conceptually Curious
Charles Wheelan, Naked Statistics: Stripping the Dread from the Data (2013)
Wheelan's treatment of ANOVA is brief but insightful. He focuses on the intuition — why comparing variability is the right approach to comparing means — and avoids getting bogged down in calculations. A good complement to this chapter if the formulas felt overwhelming. Previously recommended for Chapters 12, 13, and 18.
David Salsburg, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (2001)
A beautifully written history of statistics that devotes several chapters to Ronald Fisher and the development of ANOVA. Fisher invented ANOVA while working at the Rothamsted Experimental Station, analyzing agricultural experiments on crop yields. Understanding that ANOVA was born from the practical needs of farming — comparing multiple crop varieties under varying conditions — gives the method a human dimension that formulas alone can't capture.
Articles and Papers
Fisher, R. A. (1921). "On the 'Probable Error' of a Coefficient of Correlation Deduced from a Small Sample." Metron, 1, 3-32. While nominally about correlation, this paper contains one of Fisher's earliest presentations of the analysis of variance idea. It's historically significant as the genesis of the F-distribution and the decomposition of sums of squares.
Fisher, R. A. (1925). Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd. The book that introduced ANOVA to the world. Fisher's original presentation is terse and assumes considerable mathematical sophistication, but reading even the first few chapters gives you a sense of how revolutionary the ideas were. Available free online through Project Gutenberg and various archives.
Levene, H. (1960). "Robust Tests for Equality of Variances." In I. Olkin (Ed.), Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling (pp. 278-292). The original paper introducing Levene's test for homogeneity of variance. The test works by computing an ANOVA on the absolute deviations of each observation from its group mean (or median) — so it's actually an ANOVA on deviations, used to check an ANOVA assumption. Elegant in its recursiveness.
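That recursion is easy to see in code. Below is a minimal sketch, assuming NumPy and SciPy and using invented data: we compute each observation's absolute deviation from its group median (the Brown-Forsythe variant of Levene's test) and run an ordinary one-way ANOVA on those deviations, then compare against SciPy's built-in routine. The two results should match exactly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Three invented groups with deliberately unequal spreads.
groups = [rng.normal(0, s, size=30) for s in (1.0, 1.5, 3.0)]

# Levene's test "by hand": one-way ANOVA on each observation's absolute
# deviation from its group center (the median here).
abs_dev = [np.abs(g - np.median(g)) for g in groups]
F, p = stats.f_oneway(*abs_dev)

# SciPy's built-in version, using the same (median) centering.
F_scipy, p_scipy = stats.levene(*groups, center="median")

print(f"by hand: F = {F:.4f}, p = {p:.4f}")
print(f"scipy:   F = {F_scipy:.4f}, p = {p_scipy:.4f}")
```

Using center="mean" instead recovers Levene's original 1960 formulation; the median-centered version is generally more robust to non-normality.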
Tukey, J. W. (1949). "Comparing Individual Means in the Analysis of Variance." Biometrics, 5(2), 99-114. The paper introducing the Honestly Significant Difference (HSD) method. Tukey (who also invented the box plot, as you learned in Chapter 6) developed this method specifically to solve the post-hoc comparison problem in ANOVA. His key insight: the distribution of the range of sample means (the studentized range distribution) provides exact critical values for all pairwise comparisons simultaneously.
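For readers who want to see the method in action, SciPy (1.8 or later) ships an implementation based on exactly this studentized range insight. A short sketch with three invented samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, size=25)
b = rng.normal(10.5, 2.0, size=25)
c = rng.normal(13.0, 2.0, size=25)

# All pairwise comparisons at a single familywise error rate, with
# critical values drawn from the studentized range distribution.
res = stats.tukey_hsd(a, b, c)
print(res)                         # table of pairwise differences and p-values
print(res.confidence_interval())   # simultaneous 95% intervals for each pair
```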
Welch, B. L. (1951). "On the Comparison of Several Mean Values: An Alternative Approach." Biometrika, 38(3/4), 330-336. Welch's extension of his famous two-sample correction (which you encountered in Chapter 16) to the $k$-sample case. Welch's ANOVA adjusts the degrees of freedom when group variances are unequal, analogous to how Welch's t-test handles unequal variances in two-group comparisons.
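Welch's $k$-sample statistic is simple enough to compute directly from its standard published form. The sketch below (the function name welch_anova and the data are ours, for illustration) uses precision weights $w_i = n_i/s_i^2$ and the adjusted denominator degrees of freedom:

```python
import numpy as np
from scipy import stats

def welch_anova(*groups):
    """Welch's k-sample test, following the standard formulation."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    var = np.array([np.var(g, ddof=1) for g in groups])
    w = n / var                                  # weights w_i = n_i / s_i^2
    grand = np.sum(w * means) / np.sum(w)        # weighted grand mean
    between = np.sum(w * (means - grand) ** 2) / (k - 1)
    lam = 3.0 * np.sum((1 - w / np.sum(w)) ** 2 / (n - 1)) / (k**2 - 1)
    F = between / (1.0 + 2.0 * lam * (k - 2) / 3.0)
    df1, df2 = k - 1, 1.0 / lam                  # adjusted denominator df
    return F, df1, df2, stats.f.sf(F, df1, df2)

rng = np.random.default_rng(1)
# Invented groups with unequal variances and unequal sizes.
groups = [rng.normal(m, s, size=n)
          for m, s, n in [(5.0, 1.0, 20), (5.5, 3.0, 15), (6.5, 5.0, 30)]]
F, df1, df2, p = welch_anova(*groups)
print(f"Welch's F = {F:.3f}, df = ({df1}, {df2:.1f}), p = {p:.4f}")
```

If you prefer a library routine, statsmodels also implements Welch's ANOVA (anova_oneway with use_var="unequal").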
Scheffé, H. (1953). "A Method for Judging All Contrasts in the Analysis of Variance." Biometrika, 40(1/2), 87-104. Scheffé's method is more conservative than Tukey's HSD but more flexible — it allows testing not just pairwise comparisons but any contrast (linear combination of means). If you need to compare, say, the average of Groups A and B against Group C, Scheffé's method is the right tool. Beyond the scope of this course, but worth knowing about.
Online Resources
Penn State STAT 500: "Analysis of Variance" https://online.stat.psu.edu/stat500/lesson/10
A free, well-structured tutorial that covers one-way ANOVA with worked examples, assumption checking, and post-hoc tests. Includes interactive applets that let you manipulate between-group and within-group variability to see how the F-statistic changes — an excellent way to build intuition for how the F-ratio separates signal from noise.
StatQuest: "ANOVA, Clearly Explained" https://www.youtube.com/watch?v=oOuu8IBd-yo
Josh Starmer's YouTube channel provides a visual, intuition-first explanation of ANOVA. His treatment of $SS_B$, $SS_W$, and their decomposition uses animations that many students find more intuitive than static formulas. Previously recommended for hypothesis testing (Chapter 13) and regression (preview).
Seeing Theory: ANOVA Visualization https://seeing-theory.brown.edu/
An interactive visualization from Brown University that lets you draw samples from multiple populations and watch the ANOVA table update in real time. Particularly useful for understanding how the F-statistic changes as you increase the between-group differences or the within-group variability.
Connections to Future Chapters
Chapter 21 (Nonparametric Methods): When ANOVA assumptions fail — especially normality with small samples — the Kruskal-Wallis test provides a nonparametric alternative. It replaces raw data values with ranks and tests whether the rank distributions differ across groups. Think of it as the multi-group extension of the Wilcoxon rank-sum test.
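A quick sketch of the swap, assuming SciPy and using invented lognormal samples where normality is doubtful and ranks are the safer choice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Heavily skewed (lognormal) samples with small n.
a = rng.lognormal(mean=0.0, sigma=1.0, size=15)
b = rng.lognormal(mean=0.4, sigma=1.0, size=15)
c = rng.lognormal(mean=0.8, sigma=1.0, size=15)

# Rank-based k-sample test: no normality assumption needed.
H, p = stats.kruskal(a, b, c)
print(f"Kruskal-Wallis H = {H:.3f}, p = {p:.4f}")
```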
Chapters 22-23 (Regression): The decomposition $SS_T = SS_B + SS_W$ reappears as $SS_T = SS_{\text{Regression}} + SS_{\text{Residual}}$ in regression. The $R^2$ in regression is conceptually identical to $\eta^2$ in ANOVA — both measure the proportion of total variability explained by the predictor(s). In fact, running a regression with a single categorical predictor (using indicator/dummy variables) produces exactly the same F-statistic and p-value as the ANOVA.
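That equivalence is easy to check numerically. The sketch below (assuming pandas and statsmodels, with invented data) fits an OLS regression with a dummy-coded categorical predictor and compares its overall F-test to SciPy's one-way ANOVA; the two agree to floating-point precision.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 20),
    "y": np.concatenate([rng.normal(m, 2.0, size=20) for m in (10, 11, 14)]),
})

# Regression with a categorical (indicator/dummy-coded) predictor...
fit = smf.ols("y ~ C(group)", data=df).fit()
# ...versus the classic one-way ANOVA on the same three groups.
F, p = stats.f_oneway(*(df.loc[df.group == g, "y"] for g in ["A", "B", "C"]))

print(f"regression: F = {fit.fvalue:.6f}, p = {fit.f_pvalue:.6g}")
print(f"ANOVA:      F = {F:.6f}, p = {p:.6g}")
```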
Chapter 26 (Statistics and AI): Machine learning models are often compared using ANOVA-like methods. When evaluating whether different algorithms produce significantly different prediction accuracy scores, data scientists use ANOVA (or its nonparametric counterpart) to determine whether observed performance differences are real or attributable to random variation in the data.
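As a toy illustration of that workflow (a hypothetical setup, assuming scikit-learn is available: three classifiers each scored on ten cross-validation folds), the per-fold accuracies can be fed straight into a one-way ANOVA:

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
models = {
    "logreg": LogisticRegression(max_iter=5000),
    "knn": KNeighborsClassifier(),
    "tree": DecisionTreeClassifier(random_state=0),
}
# Ten accuracy scores per model, one per cross-validation fold.
scores = {name: cross_val_score(m, X, y, cv=10) for name, m in models.items()}

F, p = stats.f_oneway(*scores.values())
print({name: round(s.mean(), 3) for name, s in scores.items()})
print(f"F = {F:.3f}, p = {p:.4f}")
```

One design caveat: because every model is scored on the same folds, the groups are not independent, so in practice a repeated-measures design or paired comparisons are often more appropriate; this sketch only shows the mechanics.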
Historical Note: Fisher, Agriculture, and the Birth of ANOVA
Ronald A. Fisher (1890-1962) developed ANOVA in the 1920s while working at the Rothamsted Experimental Station in Hertfordshire, England — a research institution dedicated to improving agricultural productivity. Fisher's problem was exactly the one motivating this chapter: he needed to compare crop yields across multiple varieties of wheat, planted in fields with varying soil quality.
His insight was that the total variation in crop yields could be decomposed into variation between wheat varieties (the signal he cared about) and variation within varieties due to soil differences, weather, and other uncontrolled factors (the noise). This decomposition allowed him to determine whether observed yield differences were real or could be explained by natural variability.
Fisher's agricultural origins explain some ANOVA terminology that persists to this day. The "treatments" in an experiment, the "blocks" in a randomized block design, and the "plots" in a split-plot design all come from farming. The F-distribution was named in Fisher's honor by George Snedecor — Fisher himself called the statistic the "variance ratio."
Fisher went on to develop the broader theory of experimental design, including randomization, blocking, and factorial experiments. His 1925 book Statistical Methods for Research Workers and his 1935 book The Design of Experiments laid the foundation for how experiments are designed and analyzed across virtually every scientific field.
The irony is that Fisher's most powerful statistical tools — designed to detect real effects amid noise — are also the tools most susceptible to abuse when researchers engage in the kind of multiple testing and p-hacking discussed in Chapter 17. Fisher himself never treated the $p = 0.05$ threshold as the rigid orthodoxy it later became; he saw p-values as one piece of evidence among many, not as a yes/no verdict.
Understanding this history enriches your appreciation of ANOVA as more than just a formula — it's a framework for thinking about evidence, variation, and the careful design of studies that can distinguish signal from noise.