Quiz: Chi-Square Tests: Categorical Data Analysis
Test your understanding of chi-square goodness-of-fit tests, tests of independence, expected frequencies, conditions, Cramer's V, and standardized residuals. Try to answer each question before revealing the answer.
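The questions that follow lean on a small set of formulas (expected counts from marginals, degrees of freedom, Cramer's V) and two scipy functions. As a warm-up, here is a minimal sketch with illustrative data; none of these numbers come from the quiz itself:

```python
import numpy as np
from scipy import stats

# Goodness of fit: 60 illustrative die rolls against a fair-die hypothesis.
observed = np.array([8, 12, 9, 11, 10, 10])
chi2_gof, p_gof = stats.chisquare(observed)   # expected defaults to equal counts (E = 10)
# df = k - 1 = 5; here chi2_gof = 1.0, a very good fit

# Test of independence: an illustrative 2 x 3 contingency table.
table = np.array([[30, 20, 10],
                  [20, 30, 40]])
chi2, p, df, expected = stats.chi2_contingency(table)
# df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2
# expected[i, j] = row_total[i] * col_total[j] / grand_total, e.g. 60 * 50 / 150 = 20

# Cramer's V as an effect size, with k = min(rows, columns):
n = table.sum()
k = min(table.shape)
V = np.sqrt(chi2 / (n * (k - 1)))   # here V = 1/3, a medium-sized association
```

One caveat worth knowing: for a $2 \times 2$ table, `chi2_contingency` applies Yates's continuity correction by default; pass `correction=False` to get the uncorrected statistic.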
1. The chi-square test is designed for:
(a) Comparing two population means (b) Analyzing the relationship between two numerical variables (c) Analyzing categorical data — counts or frequencies across categories (d) Testing whether data follow a normal distribution
Answer
**(c) Analyzing categorical data — counts or frequencies across categories.** The chi-square test works with *counts* of observations falling into categories, not with numerical measurements like means or correlations. This is the fundamental conceptual shift from the $t$-tests and $z$-tests of Chapters 13-18. Options (a) and (b) describe $t$-tests/regression; option (d) describes a normality test like Shapiro-Wilk.
2. In the chi-square formula $\chi^2 = \sum (O - E)^2 / E$, we divide by $E$ because:
(a) It makes the statistic easier to compute (b) It ensures the chi-square statistic is always between 0 and 1 (c) It standardizes each term relative to the expected count, so deviations in large and small categories are comparable (d) It corrects for degrees of freedom
Answer
**(c) It standardizes each term relative to the expected count, so deviations in large and small categories are comparable.** A difference of 10 between observed and expected means very different things if you expected 20 (50% off) versus 200 (5% off). Dividing by $E$ puts each category's contribution on a comparable scale. The chi-square statistic is *not* bounded between 0 and 1 (option b), and the division has nothing to do with degrees of freedom (option d).
3. A chi-square goodness-of-fit test with 4 categories has how many degrees of freedom?
(a) 4 (b) 3 (c) 2 (d) It depends on the sample size
Answer
**(b) 3.** For a goodness-of-fit test, $df = k - 1$, where $k$ is the number of categories. With 4 categories, $df = 4 - 1 = 3$. The intuition: if you know 3 of the 4 counts and the total, the fourth count is determined. Degrees of freedom do not depend on sample size for chi-square tests.
4. A test of independence on a $3 \times 5$ contingency table has how many degrees of freedom?
(a) 14 (b) 8 (c) 15 (d) 7
Answer
**(b) 8.** For a test of independence, $df = (r-1)(c-1) = (3-1)(5-1) = 2 \times 4 = 8$. This reflects the number of cells that are "free" once the row and column totals are fixed.
5. Which of the following is the correct formula for expected frequencies in a test of independence?
(a) $E = n / k$ (total divided by number of categories) (b) $E = (\text{row total} \times \text{column total}) / \text{grand total}$ (c) $E = O \times p_0$ (d) $E = n \times p(1-p)$
Answer
**(b) $E = (\text{row total} \times \text{column total}) / \text{grand total}$.** This formula distributes counts across cells in proportion to both row and column marginals. Under independence, knowing the row a case is in should not change its probability of being in any particular column — so the expected frequency is determined by the marginal proportions. Option (a) applies only to a goodness-of-fit test with equal expected proportions. Options (c) and (d) borrow pieces of one-proportion tests ($p_0$ and the binomial variance $np(1-p)$) and are not expected-frequency formulas.
6. The chi-square distribution is:
(a) Symmetric and bell-shaped, like the normal distribution (b) Right-skewed, with the shape depending on degrees of freedom (c) Left-skewed, with values always negative (d) Uniform between 0 and 1
Answer
**(b) Right-skewed, with the shape depending on degrees of freedom.** Since $\chi^2$ is a sum of squared terms, it can never be negative, and it clusters near zero when observed and expected counts are close. The distribution is right-skewed, especially with small degrees of freedom. As $df$ increases, it becomes more symmetric but remains non-negative. This is why chi-square tests are always right-tailed.
7. The key condition for chi-square tests to be valid is:
(a) The data must come from a normal distribution (b) The sample size must be at least 30 (c) All expected frequencies must be at least 5 (d) All observed frequencies must be at least 5
Answer
**(c) All expected frequencies must be at least 5.** This is the most commonly tested condition — and the most commonly confused one. The condition applies to *expected* counts, not *observed* counts. An observed count of 0 is perfectly fine (and may be the most interesting finding!) as long as the expected count for that cell is at least 5. Chi-square tests do *not* require normality of the raw data (option a) or a minimum sample size of 30 (option b).
8. What should you do if some expected frequencies are less than 5?
(a) Proceed anyway — the test is robust to this violation (b) Combine categories to increase expected frequencies, use Fisher's exact test, or use a simulation approach (c) Switch to a $t$-test instead (d) Add a continuity correction and proceed
Answer
**(b) Combine categories to increase expected frequencies, use Fisher's exact test, or use a simulation approach.** Small expected counts make the chi-square approximation unreliable. Combining categories (e.g., merging "AB-" and "B-" blood types) can fix this, as can Fisher's exact test (for $2 \times 2$ tables) or a permutation/simulation test (Chapter 18's ideas applied to chi-square). The $t$-test (option c) is for numerical data, not categorical. A continuity correction exists (Yates's correction) but is generally not the recommended approach.
9. A researcher tests whether a die is fair and gets $\chi^2 = 2.4$ with $df = 5$. The p-value is approximately 0.79. What does this mean?
(a) The die is definitely fair (b) There is strong evidence that the die is unfair (c) The observed outcomes are consistent with a fair die — the small deviations from equal counts could easily occur by chance (d) The test is invalid because the chi-square value is too small
Answer
**(c) The observed outcomes are consistent with a fair die — the small deviations from equal counts could easily occur by chance.** A large p-value (0.79) means the observed data are very typical of what we'd see if the die were fair. We fail to reject $H_0$. But this does *not* prove the die is fair (option a) — it's possible the die has a slight bias that our sample was too small to detect. A small chi-square value is *not* a problem for validity (option d); it simply indicates good fit between observed and expected counts.
10. In James's bail decision study, $\chi^2 = 25.48$, $df = 3$, $p < 0.001$. This result tells us:
(a) Race causes differences in bail decisions (b) There is a statistically significant association between race and bail decisions, but the test doesn't identify which groups differ most or establish causation (c) Black defendants are always denied bail (d) The effect size is large
Answer
**(b) There is a statistically significant association between race and bail decisions, but the test doesn't identify which groups differ most or establish causation.** The chi-square test answers one question: is there an association? It says *yes* ($p < 0.001$). But it doesn't tell us *where* the association is (we need standardized residuals for that), doesn't establish causation (the data are observational), and doesn't measure effect size (we need Cramer's V). This is the key "common mistake" from Section 19.8 — the chi-square test tells you *that*, not *where*.
11. Cramer's V for a chi-square test is analogous to:
(a) The p-value (b) The degrees of freedom (c) Cohen's $d$ for a $t$-test — it measures effect size independent of sample size (d) The chi-square critical value
Answer
**(c) Cohen's $d$ for a $t$-test — it measures effect size independent of sample size.** Just as Cohen's $d$ measures the standardized difference between means regardless of $n$, Cramer's $V$ measures the strength of association between categorical variables regardless of $n$. Both answer "how big?" rather than "is there an effect?" (which is the p-value's job). Cramer's V ranges from 0 (no association) to 1 (perfect association), with benchmarks of 0.10 (small), 0.30 (medium), and 0.50 (large).
12. If you double the sample size (keeping all proportions the same), what happens to $\chi^2$ and Cramer's V?
(a) Both double (b) $\chi^2$ doubles; Cramer's V stays the same (c) $\chi^2$ stays the same; Cramer's V is halved (d) Both stay the same
Answer
**(b) $\chi^2$ doubles; Cramer's V stays the same.** The chi-square statistic is proportional to sample size: if all counts double, each $(O-E)^2/E$ term doubles (since both $O$ and $E$ double, making the numerator $4\times$ larger and the denominator $2\times$ larger). But Cramer's V divides by $n$: $V = \sqrt{\chi^2 / (n \cdot (k-1))}$, where $k$ is the smaller of the number of rows and columns. Doubling both $\chi^2$ and $n$ leaves $V$ unchanged. This is precisely why we need $V$ — the chi-square statistic alone is not a valid measure of effect size.
13. A standardized residual of $+3.2$ for the cell "Rural / High Absenteeism" means:
(a) 3.2% of rural students had high absenteeism (b) Far more rural students had high absenteeism than expected under independence — 3.2 standard deviations above the expected count (c) 3.2 more students than expected had high absenteeism (d) The chi-square statistic for this cell alone is 3.2
Answer
**(b) Far more rural students had high absenteeism than expected under independence — 3.2 standard deviations above the expected count.** A standardized residual of $+3.2$ means the observed count in that cell is 3.2 standard deviations *above* the expected count. Values beyond $\pm 2$ are considered notably different from independence. This cell is a major contributor to the overall chi-square statistic and identifies a specific pattern: rural students are overrepresented in the high-absenteeism category.
14. Which Python function performs a chi-square test of independence?
(a) scipy.stats.chi2_contingency()
(b) scipy.stats.chisquare()
(c) scipy.stats.ttest_ind()
(d) scipy.stats.f_oneway()
Answer
**(a) `scipy.stats.chi2_contingency()`.** This function takes a 2D array (contingency table) and returns the chi-square statistic, p-value, degrees of freedom, and expected frequencies. Option (b), `chisquare()`, performs the *goodness-of-fit* test (one categorical variable against expected proportions). Option (c) is for independent samples $t$-tests. Option (d) is for one-way ANOVA (Chapter 20).
15. In Excel, the function `CHISQ.TEST(observed_range, expected_range)` returns:
(a) The chi-square statistic (b) The p-value (c) Cramer's V (d) The degrees of freedom
Answer
**(b) The p-value.** Excel's `CHISQ.TEST` function returns the p-value directly, not the chi-square statistic itself. To get the chi-square statistic in Excel, you need to compute $(O-E)^2/E$ for each cell and sum them manually.
16. For a $2 \times 2$ contingency table, the chi-square test and the two-proportion $z$-test:
(a) Always give different results (b) Are completely unrelated tests (c) Give the same p-value, with $\chi^2 = z^2$ (the chi-square statistic equals the square of the $z$-statistic) (d) Give the same p-value only if the sample sizes are equal
Answer
**(c) Give the same p-value, with $\chi^2 = z^2$ (the chi-square statistic equals the square of the $z$-statistic).** For $2 \times 2$ tables, the chi-square test of independence is mathematically equivalent to the two-tailed two-proportion $z$-test. The chi-square statistic is the square of the $z$-statistic, and squaring converts the two-tailed normal p-value to the right-tailed chi-square p-value. This means the $z$-test was a special case of the chi-square test all along.
17. A company tests whether customer complaints are uniformly distributed across four product lines. With $\chi^2 = 12.8$ and $df = 3$:
(a) Use a chi-square table or Python to determine the p-value (b) The complaints are perfectly uniformly distributed (c) The p-value is approximately 0.005, providing strong evidence against uniform distribution (d) Both (a) and (c)
Answer
**(d) Both (a) and (c).** Using Python: `1 - stats.chi2.cdf(12.8, 3)` gives $p \approx 0.005$. This is well below 0.05, providing strong evidence that complaints are *not* uniformly distributed across the four product lines. The company should investigate which product line(s) have disproportionately many (or few) complaints.
18. A goodness-of-fit test asks whether the observed distribution \_\_\_\_, while a test of independence asks whether two variables \_\_\_\_.
(a) matches an expected distribution; are correlated (b) is normal; are independent (c) matches a specified distribution; are associated (or independent) (d) has equal proportions; have equal means
Answer
**(c) matches a specified distribution; are associated (or independent).** The goodness-of-fit test compares a single categorical variable's observed distribution to a hypothesized distribution. The test of independence asks whether two categorical variables are related. Note that "correlated" (option a) technically applies to numerical variables; for categorical variables we say "associated." Option (d) mixes in mean comparisons, which require $t$-tests or ANOVA.
19. Which of the following is NOT a valid use of the chi-square test?
(a) Testing whether the distribution of M&M colors matches the company's stated proportions (b) Testing whether a patient's treatment group (Drug vs. Placebo) is independent of their outcome (Improved vs. Not Improved) (c) Testing whether the mean income differs across three education levels (d) Testing whether the distribution of majors at a university differs from the national distribution
Answer
**(c) Testing whether the mean income differs across three education levels.** This question involves comparing *means* of a numerical variable (income) across groups — that's an ANOVA problem (Chapter 20), not a chi-square problem. The chi-square test handles *counts* of categorical outcomes. Options (a) and (d) are goodness-of-fit problems; option (b) is a test of independence. All work with categorical data.
20. Alex's analysis showed that subscription tier and genre preference are associated ($V = 0.188$). Free users disproportionately watched comedy (residual = $+3.55$), while Premium users disproportionately watched documentaries (residual = $+2.50$). Which conclusion is most appropriate?
(a) StreamVibe should remove comedy from the Free tier to encourage upgrades (b) There is a modest but real association between subscription tier and genre preference, suggesting different tiers attract users with different content interests; targeted content strategies could leverage this pattern (c) Premium subscribers watch documentaries because they pay more (d) The association is too small to be useful ($V < 0.2$)
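As a closing check, the statistics quoted in Questions 9, 10, 16, and 17 can be verified in a few lines of Python; the $2 \times 2$ table below is illustrative and not taken from any question:

```python
import numpy as np
from scipy import stats

# P-values quoted in Questions 9, 10, and 17:
p_q9 = stats.chi2.sf(2.4, df=5)      # ~0.79: consistent with a fair die
p_q10 = stats.chi2.sf(25.48, df=3)   # < 0.001: strong evidence of association
p_q17 = stats.chi2.sf(12.8, df=3)    # ~0.005: strong evidence against uniformity

# Question 16's claim: for a 2 x 2 table, chi-square equals z squared.
table = np.array([[40, 60],
                  [55, 45]])
chi2, p_chi2, _, _ = stats.chi2_contingency(table, correction=False)
p1, p2 = 40 / 100, 55 / 100          # the two sample proportions
p_pool = (40 + 55) / 200             # pooled proportion under H0
z = (p1 - p2) / np.sqrt(p_pool * (1 - p_pool) * (1 / 100 + 1 / 100))
# z**2 matches chi2, and 2 * stats.norm.sf(abs(z)) matches p_chi2
```

Note the `correction=False`: scipy applies Yates's continuity correction to $2 \times 2$ tables by default, and the exact $\chi^2 = z^2$ identity holds for the uncorrected statistic.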