Quiz: Analysis of Variance (ANOVA)
Test your understanding of the multiple comparisons problem, ANOVA mechanics, the F-statistic, assumptions, post-hoc tests, and effect sizes. Try to answer each question before revealing the answer.
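Before you start, it can help to see the multiple comparisons problem (the theme of the first few questions) in action. The following is a minimal simulation sketch, assuming Python with NumPy and SciPy; the group count, sample sizes, seed, and number of simulations are all illustrative choices, not prescribed values.

```python
# Simulate the family-wise error rate of all-pairs t-tests vs. a single
# one-way ANOVA when the null is true (all 5 group means are equal).
# Group sizes, seed, and simulation count are illustrative.
import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, n, alpha, n_sims = 5, 20, 0.05, 2000

ttest_fp = anova_fp = 0
for _ in range(n_sims):
    groups = [rng.normal(0, 1, n) for _ in range(k)]
    # Is any of the C(5,2) = 10 pairwise t-tests "significant"?
    if any(stats.ttest_ind(a, b).pvalue < alpha
           for a, b in itertools.combinations(groups, 2)):
        ttest_fp += 1
    # Is the single omnibus ANOVA "significant"?
    if stats.f_oneway(*groups).pvalue < alpha:
        anova_fp += 1

print(f"All-pairs t-tests false-positive rate: {ttest_fp / n_sims:.3f}")
print(f"One-way ANOVA false-positive rate:     {anova_fp / n_sims:.3f}")
```

In a run like this the pairwise-t-test rate lands well above 0.05 (somewhat below the independence-based $1 - 0.95^{10} \approx 0.40$, since tests sharing a group are correlated), while the single ANOVA stays near 0.05.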
1. The main reason we use ANOVA instead of multiple t-tests when comparing three or more groups is:
(a) ANOVA is faster to compute (b) Multiple t-tests inflate the family-wise Type I error rate (c) t-tests can only compare two means at a time (d) ANOVA doesn't require any assumptions
Answer
**(b) Multiple t-tests inflate the family-wise Type I error rate.** While (c) is technically true — each t-test compares two means — the *reason* this matters is (b). With $k$ groups and $\binom{k}{2}$ pairwise tests, the probability of at least one false positive grows far beyond $\alpha = 0.05$. ANOVA tests all groups simultaneously with a single test, keeping the Type I error rate at $\alpha$.

2. With 5 groups, how many pairwise t-tests would be needed to compare every pair?
(a) 5 (b) 10 (c) 15 (d) 20
Answer
**(b) 10.** The number of pairwise comparisons is $\binom{5}{2} = \frac{5 \times 4}{2} = 10$. With 10 tests at $\alpha = 0.05$, the probability of at least one false positive is $1 - 0.95^{10} = 0.401$ — a 40% chance of declaring a "significant" difference when none exists.

3. In ANOVA, the F-statistic equals:
(a) $SS_{\text{Between}} / SS_{\text{Within}}$ (b) $MS_{\text{Between}} / MS_{\text{Within}}$ (c) $MS_{\text{Within}} / MS_{\text{Between}}$ (d) $SS_{\text{Total}} / (N - 1)$
Answer
**(b) $MS_{\text{Between}} / MS_{\text{Within}}$.** The F-statistic is the ratio of the mean square between groups to the mean square within groups. We use mean squares (not raw sums of squares) because we need to adjust for degrees of freedom — the between and within components have different numbers of "free" pieces of information.

4. If the null hypothesis is true (all group means are equal), the expected value of the F-statistic is approximately:
(a) 0 (b) 1 (c) $k - 1$ (d) It depends on the sample size
Answer
**(b) 1.** When $H_0$ is true, both $MS_B$ and $MS_W$ estimate the same population variance $\sigma^2$. Their ratio ($F = MS_B / MS_W$) should be approximately 1, give or take random sampling variability. Values much larger than 1 suggest the group means are not all equal.

5. Which of the following is the correct decomposition of variability in ANOVA?
(a) $SS_{\text{Between}} = SS_{\text{Total}} + SS_{\text{Within}}$ (b) $SS_{\text{Total}} = SS_{\text{Between}} \times SS_{\text{Within}}$ (c) $SS_{\text{Total}} = SS_{\text{Between}} + SS_{\text{Within}}$ (d) $SS_{\text{Within}} = SS_{\text{Total}} - SS_{\text{Between}} + SS_{\text{Error}}$
Answer
**(c) $SS_{\text{Total}} = SS_{\text{Between}} + SS_{\text{Within}}$.** This is the fundamental decomposition of ANOVA: total variability = between-group variability + within-group variability. It's an exact mathematical identity, not an approximation. Every unit of variability is accounted for — some explained by group membership, the rest by individual noise within groups.

6. The degrees of freedom for $MS_{\text{Between}}$ in a one-way ANOVA with $k$ groups are:
(a) $N - 1$ (b) $N - k$ (c) $k - 1$ (d) $k$
Answer
**(c) $k - 1$.** There are $k$ group means, but they're constrained by the grand mean, so only $k - 1$ of them are free to vary. The denominator degrees of freedom ($MS_W$) use $N - k$: there are $N$ observations minus $k$ estimated group means. Together: $(k - 1) + (N - k) = N - 1 = df_{\text{Total}}$.

7. A one-way ANOVA with 4 groups and 10 observations per group (N = 40) produces $SS_B = 300$ and $SS_W = 900$. What is the F-statistic?
(a) 0.33 (b) 3.00 (c) 3.60 (d) 4.00
Answer
**(d) 4.00.**

$MS_B = SS_B / (k - 1) = 300 / 3 = 100$

$MS_W = SS_W / (N - k) = 900 / 36 = 25$

$F = MS_B / MS_W = 100 / 25 = 4.00$

The between-group mean square is 4 times the within-group mean square, indicating that the group means vary more than you'd expect from within-group noise alone.

8. Which of the following is NOT an assumption of one-way ANOVA?
(a) Observations are independent within and across groups (b) Data within each group are approximately normally distributed (c) All groups have the same sample size (d) Population variances are approximately equal across groups
Answer
**(c) All groups have the same sample size.** ANOVA does not require equal group sizes (balanced design). It works with unequal sizes, though balanced designs make the test more robust to violations of the equal-variance assumption and are more powerful. The three actual assumptions are independence (a), normality (b), and equal variances (d).

9. Levene's test is used to check which ANOVA assumption?
(a) Independence (b) Normality (c) Equal variances (homogeneity of variance) (d) Random sampling
Answer
**(c) Equal variances (homogeneity of variance).** Levene's test has $H_0$: all group variances are equal. A significant result ($p < 0.05$) suggests the equal-variance assumption is violated. If Levene's test is significant, consider using Welch's ANOVA (which doesn't assume equal variances) or a nonparametric alternative.

10. A researcher runs a one-way ANOVA and gets $F(3, 56) = 5.41$, $p = 0.002$. What can she conclude?
(a) All four group means are significantly different from each other (b) At least one group mean differs significantly from the others (c) The largest group mean is significantly different from the smallest (d) The first group is significantly different from the last group
Answer
**(b) At least one group mean differs significantly from the others.** The ANOVA is an omnibus test — it tells you that the group means are not all equal, but it doesn't tell you *which* groups differ. To determine specific pairwise differences, you need a post-hoc test like Tukey's HSD.

11. After a significant ANOVA, the appropriate next step is usually:
(a) Run more ANOVAs with different subsets of groups (b) Conduct post-hoc pairwise comparisons (e.g., Tukey's HSD) (c) Conclude that the group with the highest mean is the "best" (d) Increase the sample size and retest
Answer
**(b) Conduct post-hoc pairwise comparisons (e.g., Tukey's HSD).** Post-hoc tests are designed to identify which specific pairs of groups differ while controlling the family-wise error rate. Running multiple individual t-tests without correction would inflate the false positive rate — the very problem ANOVA was designed to avoid.

12. Why should you NOT run post-hoc tests if the ANOVA result is not significant?
(a) Post-hoc tests are too computationally expensive for non-significant results (b) If ANOVA doesn't find overall differences, hunting for specific pairwise differences inflates false positives (c) Post-hoc tests require a significant F-statistic as input (d) Non-significant ANOVA means all group means are exactly equal
Answer
**(b) If ANOVA doesn't find overall differences, hunting for specific pairwise differences inflates false positives.** The two-stage approach — ANOVA first, then post-hoc — is designed to protect against false positives. The ANOVA serves as a "gatekeeper." If the omnibus test doesn't detect any differences, conducting pairwise tests amounts to fishing for significance, which undermines the family-wise error rate protection.

13. Tukey's HSD is preferred over the Bonferroni correction for ANOVA post-hoc tests because:
(a) Tukey's HSD never produces false positives (b) Tukey's HSD is designed specifically for pairwise comparisons and is generally less conservative (c) Bonferroni can only handle two groups (d) Tukey's HSD doesn't require a significant ANOVA result
Answer
**(b) Tukey's HSD is designed specifically for pairwise comparisons and is generally less conservative.** Bonferroni divides $\alpha$ by the number of tests, which becomes very strict as the number of comparisons grows. Tukey's HSD uses the studentized range distribution, which accounts for the joint distribution of all pairwise differences and is therefore less conservative (more powerful) while still controlling the family-wise error rate at $\alpha$.

14. Eta-squared ($\eta^2$) in ANOVA measures:
(a) The probability that the null hypothesis is true (b) The proportion of total variability explained by group membership (c) The average difference between group means (d) The number of groups that are significantly different
Answer
**(b) The proportion of total variability explained by group membership.** $\eta^2 = SS_B / SS_T$ ranges from 0 to 1. An $\eta^2$ of 0.14 means that 14% of the total variability in the outcome variable is explained by which group an observation belongs to. Cohen's benchmarks: 0.01 = small, 0.06 = medium, 0.14 = large.

15. An ANOVA yields $\eta^2 = 0.03$, $F(2, 297) = 4.59$, $p = 0.011$. The best interpretation is:
(a) There is a large, important difference among the groups (b) There is a statistically significant difference, but the effect size is small — group membership explains only 3% of the variability (c) The ANOVA is invalid because the effect size is too small to be significant (d) The researcher should increase the sample size and retest
Answer
**(b) There is a statistically significant difference, but the effect size is small — group membership explains only 3% of the variability.** This is the statistical vs. practical significance distinction from Chapter 17. With a large sample ($N = 300$), even small effects can reach statistical significance. The p-value says the differences are unlikely due to chance alone; the effect size says they account for very little of the overall variability. Whether 3% is "important" depends on context.

16. The F-distribution is:
(a) Symmetric and bell-shaped, like the normal distribution (b) Right-skewed, starting at 0, with a long right tail (c) Left-skewed, ending at 1 (d) Bimodal, with peaks at 0 and 1
Answer
**(b) Right-skewed, starting at 0, with a long right tail.** The F-statistic is a ratio of two positive quantities (mean squares), so it can never be negative. When $H_0$ is true, most F-values cluster near 1, but the distribution has a long right tail — occasionally producing large values by chance. We always test in the right tail: large F-values provide evidence against $H_0$.

17. A one-way ANOVA with $k = 3$ groups and $n = 10$ per group finds $F(2, 27) = 0.85$, $p = 0.439$. The researcher concludes: "The three groups are identical." What's wrong?
(a) Nothing — this is a correct interpretation (b) "Fail to reject $H_0$" doesn't mean the groups are identical; it means the data don't provide enough evidence of a difference (c) The F-statistic should be larger than 1 for a valid test (d) The degrees of freedom are wrong
Answer
**(b) "Fail to reject $H_0$" doesn't mean the groups are identical; it means the data don't provide enough evidence of a difference.** This is the same issue we discussed in Chapters 13 and 16: "fail to reject" is not the same as "accept $H_0$." The groups *might* be different, but with only $n = 10$ per group, the study may lack the power to detect the difference. An effect size calculation and power analysis would help clarify.

18. When ANOVA group sizes are equal ("balanced design"), the test is more robust to:
(a) Violations of normality only (b) Violations of equal variance only (c) Violations of both normality and equal variance (d) Violations of independence
Answer
**(c) Violations of both normality and equal variance.** Balanced designs provide the strongest protection against assumption violations. With equal group sizes, ANOVA maintains its Type I error rate even with moderate non-normality and unequal variances. When group sizes are unequal *and* variances are unequal, the test can become either too liberal or too conservative, depending on whether larger groups have larger or smaller variances. Independence violations are never fixable by balanced design.

19. Which of the following is the correct way to report a one-way ANOVA result?
(a) $F = 4.56$, significant (b) $F(3, 36) = 4.56$, $p = .008$, $\eta^2 = .28$ (c) ANOVA was significant at $p < .05$ (d) $F(36, 3) = 4.56$, $p = .008$
Answer
**(b) $F(3, 36) = 4.56$, $p = .008$, $\eta^2 = .28$.** Proper reporting includes: (1) the F-statistic with *both* degrees of freedom in parentheses (numerator, denominator), (2) the exact p-value (or "$p < .001$" for very small values), and (3) an effect size measure. Options (a) and (c) lack critical information. Option (d) has the degrees of freedom reversed — the between-group df always comes first.

20. The decomposition $SS_T = SS_B + SS_W$ in ANOVA is fundamentally the same idea as _____ in regression (Chapter 22 preview):
(a) The slope of the regression line (b) The decomposition of total variability into explained ($SS_{\text{Regression}}$) and unexplained ($SS_{\text{Residual}}$) components (c) The correlation coefficient (d) The prediction equation
Answer
**(b) The decomposition of total variability into explained ($SS_{\text{Regression}}$) and unexplained ($SS_{\text{Residual}}$) components.** Regression makes the same split, $SS_{\text{Total}} = SS_{\text{Regression}} + SS_{\text{Residual}}$, that ANOVA makes with $SS_T = SS_B + SS_W$: total variability divides into a part explained by the model and a leftover error part. The analogy carries over to effect size, too: $\eta^2 = SS_B / SS_T$ in ANOVA plays the same role as $R^2 = SS_{\text{Regression}} / SS_{\text{Total}}$ in regression.
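To tie the mechanics from the questions above together, here is a minimal sketch that computes the sums of squares by hand, verifies the exact identity $SS_T = SS_B + SS_W$, and cross-checks the F-statistic against SciPy. It assumes Python with NumPy and SciPy; the group means, standard deviation, and seed are invented for illustration.

```python
# Hand-compute the one-way ANOVA decomposition, F-statistic, and eta-squared,
# then cross-check against scipy.stats.f_oneway. Data are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
groups = [rng.normal(mu, 2.0, 10) for mu in (10, 12, 15)]  # k = 3, n = 10

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()
k, N = len(groups), all_obs.size

# Decomposition: SS_Total = SS_Between + SS_Within (exact identity)
ss_total = ((all_obs - grand_mean) ** 2).sum()
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
assert np.isclose(ss_total, ss_between + ss_within)

# Mean squares (sums of squares adjusted for degrees of freedom) and F
ms_between = ss_between / (k - 1)   # numerator df = k - 1
ms_within = ss_within / (N - k)     # denominator df = N - k
F = ms_between / ms_within
eta_sq = ss_between / ss_total      # effect size: proportion explained

# Cross-check against scipy's omnibus test
res = stats.f_oneway(*groups)
assert np.isclose(F, res.statistic)

print(f"F({k - 1}, {N - k}) = {F:.2f}, p = {res.pvalue:.4f}, "
      f"eta^2 = {eta_sq:.2f}")
```

Note that the print statement follows the reporting convention from question 19: both degrees of freedom (numerator first), the p-value, and an effect size.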