Key Takeaways: Chi-Square Tests: Categorical Data Analysis

One-Sentence Summary

Chi-square tests compare observed categorical frequencies to expected frequencies — either testing whether a single variable follows a specified distribution (goodness-of-fit) or whether two categorical variables are related (test of independence) — using the chi-square statistic $\chi^2 = \sum (O - E)^2 / E$ with effect sizes measured by Cramer's V and specific departures identified by standardized residuals.

Core Concepts at a Glance

| Concept | Definition | Why It Matters |
| --- | --- | --- |
| Chi-square statistic | $\chi^2 = \sum (O_i - E_i)^2 / E_i$ — total standardized discrepancy between observed and expected counts | Measures how far the data depart from the null hypothesis across all categories simultaneously |
| Goodness-of-fit test | Tests whether a single categorical variable matches a specified distribution | Determines if a distribution of outcomes deviates from expectations (e.g., disease cases vs. population shares) |
| Test of independence | Tests whether two categorical variables are associated | Determines if knowing one categorical variable provides information about another (e.g., race and bail decisions) |
| Cramer's V | $V = \sqrt{\chi^2 / (n \cdot (k-1))}$ where $k = \min(r, c)$ — effect size scaled from 0 to 1 | Measures strength of association independently of sample size; answers "how strong?" rather than "is there?" |

The Chi-Square Goodness-of-Fit Test

Step by Step

  1. State hypotheses:
     - $H_0$: The variable follows the specified distribution
     - $H_a$: The variable does not follow the specified distribution
  2. Calculate expected frequencies: $E_i = n \times p_i$ where $p_i$ is the hypothesized proportion for category $i$
  3. Check conditions: All expected counts $\geq 5$
  4. Compute the test statistic: $\chi^2 = \sum (O_i - E_i)^2 / E_i$
  5. Find the p-value: Use the chi-square distribution with $df = k - 1$
  6. Decide and interpret: Reject or fail to reject $H_0$; state conclusion in context

Key Python Code

from scipy import stats

# Observed counts and the hypothesized proportions (must sum to 1)
observed = [47, 52, 68, 33]
expected_proportions = [0.22, 0.30, 0.28, 0.20]
n = sum(observed)

# Expected counts are E_i = n * p_i; f_exp must sum to sum(observed)
chi2, p = stats.chisquare(observed,
                          f_exp=[prop * n for prop in expected_proportions])
print(f"Chi-square: {chi2:.3f}, p-value: {p:.4f}")

Excel

=CHISQ.TEST(observed_range, expected_range) returns the p-value directly.

The Chi-Square Test of Independence

Step by Step

  1. State hypotheses:
     - $H_0$: The two variables are independent
     - $H_a$: The two variables are not independent
  2. Calculate expected frequencies: $E_{ij} = (\text{Row total} \times \text{Column total}) / \text{Grand total}$
  3. Check conditions: All expected counts $\geq 5$
  4. Compute the test statistic: $\chi^2 = \sum (O_{ij} - E_{ij})^2 / E_{ij}$ (sum over all cells)
  5. Find the p-value: Use the chi-square distribution with $df = (r-1)(c-1)$
  6. Compute Cramer's V: $V = \sqrt{\chi^2 / (n \cdot (k-1))}$ where $k = \min(r, c)$
  7. Compute standardized residuals: $(O - E) / \sqrt{E}$ to identify WHERE the association is
  8. Decide and interpret: Report significance, effect size, and specific patterns

Key Python Code

from scipy import stats
import numpy as np

# 4 x 2 contingency table: one row per group, one column per outcome
observed = np.array([
    [142, 58],
    [108, 92],
    [87, 63],
    [43, 7]
])

chi2, p, dof, expected = stats.chi2_contingency(observed)

# Cramer's V
n = observed.sum()
k = min(observed.shape) - 1
V = np.sqrt(chi2 / (n * k))

# Standardized residuals
residuals = (observed - expected) / np.sqrt(expected)

Expected Frequency Formulas

| Test | Formula | What It Means |
| --- | --- | --- |
| Goodness-of-fit | $E_i = n \times p_i$ | Total count times hypothesized proportion for category $i$ |
| Test of independence | $E_{ij} = \frac{R_i \times C_j}{n}$ | Row total times column total divided by grand total |
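The independence formula lends itself to a direct check: the expected table is the outer product of the row and column totals, divided by the grand total. A minimal sketch with a hypothetical $2 \times 2$ table, verified against the expected table that `scipy` computes internally:

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 2 table of counts
observed = np.array([[30, 20],
                     [10, 40]])

# E_ij = (row total * column total) / grand total, built via an outer product
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
n = observed.sum()
expected = np.outer(row_totals, col_totals) / n

# scipy's chi2_contingency returns the same expected table
_, _, _, expected_scipy = stats.chi2_contingency(observed)
print(np.allclose(expected, expected_scipy))  # True
```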

Degrees of Freedom

| Test | Formula | Example |
| --- | --- | --- |
| Goodness-of-fit | $df = k - 1$ | 5 categories $\rightarrow$ $df = 4$ |
| Test of independence | $df = (r-1)(c-1)$ | $4 \times 2$ table $\rightarrow$ $df = 3$ |

Effect Size: Cramer's V

$$V = \sqrt{\frac{\chi^2}{n \cdot (k-1)}} \quad \text{where } k = \min(r, c)$$

| $V$ Value | Interpretation |
| --- | --- |
| 0.10 | Small association |
| 0.30 | Medium association |
| 0.50 | Large association |

Why not just use $\chi^2$? The chi-square statistic is proportional to sample size. Double $n$ (keeping proportions fixed) and $\chi^2$ doubles, but $V$ stays the same. Cramer's V separates "how strong is the association?" from "how much data do we have?"
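This invariance is easy to demonstrate with a hypothetical table: doubling every count (same proportions, twice the data) doubles $\chi^2$ but leaves $V$ unchanged. Yates' continuity correction is disabled so the doubling is exact:

```python
import numpy as np
from scipy import stats

def cramers_v(table):
    """Return (chi2, Cramer's V) for a contingency table."""
    # correction=False so chi2 scales exactly with sample size
    chi2 = stats.chi2_contingency(table, correction=False)[0]
    n = table.sum()
    k = min(table.shape) - 1
    return chi2, np.sqrt(chi2 / (n * k))

table = np.array([[30, 20],
                  [10, 40]])

chi2_1, v1 = cramers_v(table)
chi2_2, v2 = cramers_v(2 * table)   # same proportions, double the data

print(f"chi2: {chi2_1:.2f} -> {chi2_2:.2f}")  # statistic doubles
print(f"V:    {v1:.3f} -> {v2:.3f}")          # effect size unchanged
```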

Standardized Residuals

$$\text{Standardized residual} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}}}$$

  • Values beyond $\pm 2$ indicate notable departures from independence
  • Positive residuals: more observations than expected in that cell
  • Negative residuals: fewer observations than expected in that cell
  • These identify WHERE the association lives — the chi-square statistic only tells you THAT it exists
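The $\pm 2$ rule of thumb above can be automated. Reusing the $4 \times 2$ table from the Key Python Code, a short sketch that flags the cells where the association lives:

```python
import numpy as np
from scipy import stats

observed = np.array([[142, 58],
                     [108, 92],
                     [87, 63],
                     [43, 7]])

chi2, p, dof, expected = stats.chi2_contingency(observed)
residuals = (observed - expected) / np.sqrt(expected)

# Flag cells that depart notably from independence
for (i, j), r in np.ndenumerate(residuals):
    if abs(r) > 2:
        direction = "more" if r > 0 else "fewer"
        print(f"Cell ({i}, {j}): residual {r:+.2f} ({direction} than expected)")
```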

Conditions for Chi-Square Tests

| Condition | What It Means | What to Do If Violated |
| --- | --- | --- |
| Random sampling or assignment | Data are representative | Results may not generalize |
| Independent observations | Each case contributes to one cell only | Redesign data collection |
| Expected counts $\geq 5$ | Chi-square approximation is reliable | Combine categories, Fisher's exact test, or simulation |
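The expected-counts condition can be checked programmatically before trusting a chi-square result. A sketch of one possible guard (the `safe_chi2` helper and its fallback policy are illustrative, not a standard API):

```python
import numpy as np
from scipy import stats

def safe_chi2(table):
    """Run a chi-square test if expected counts permit; fall back to
    Fisher's exact test (2 x 2 only) when an expected count is below 5."""
    chi2, p, dof, expected = stats.chi2_contingency(table)
    if (expected >= 5).all():
        return "chi-square", p
    if table.shape == (2, 2):
        _, p = stats.fisher_exact(table)
        return "fisher", p
    return "unreliable", p  # combine categories or simulate instead

# Hypothetical sparse 2 x 2 table: one expected count falls below 5
print(safe_chi2(np.array([[2, 8], [5, 25]])))
```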

Common Misconceptions

| Misconception | Reality |
| --- | --- |
| "Chi-square tells me where the association is" | It only tells you that one exists; use standardized residuals for location |
| "The condition is about observed counts $\geq 5$" | No — the condition is about expected counts $\geq 5$; observed counts of 0 are fine |
| "A significant chi-square proves causation" | Chi-square tests associations; causation requires experimental design |
| "Chi-square works for numerical data" | Chi-square is for categorical data (counts/frequencies); use $t$-tests or ANOVA for numerical data |
| "A larger $\chi^2$ always means a stronger association" | Not if sample sizes differ; use Cramer's V for comparable effect sizes |
| "Chi-square tests can be two-tailed" | Chi-square tests are always right-tailed (large values = evidence against $H_0$) |

Connection to Other Tests

| Situation | What You Learned Before | What You Learned Now |
| --- | --- | --- |
| One categorical variable, 2 categories | One-proportion $z$-test (Ch.14) | Goodness-of-fit gives the same result ($\chi^2 = z^2$) |
| Two groups, binary outcome | Two-proportion $z$-test (Ch.16) | Test of independence on $2 \times 2$ table gives the same result |
| Multiple categories or groups | No single test available | Chi-square handles any number of categories and groups |
| Continuous data, 2 groups | Two-sample $t$-test (Ch.16) | Still use $t$-test — chi-square is for categorical data |
| Continuous data, 3+ groups | Coming in Ch.20 | ANOVA (same logic: observed vs. expected variation) |
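The $\chi^2 = z^2$ equivalence in the first row can be verified numerically with hypothetical data (60 successes in 100 trials, testing $H_0\!: p = 0.5$):

```python
import numpy as np
from scipy import stats

# Hypothetical data: 60 successes out of 100, testing H0: p = 0.5
successes, n, p0 = 60, 100, 0.5

# One-proportion z-test (two-sided), computed directly
z = (successes / n - p0) / np.sqrt(p0 * (1 - p0) / n)
p_z = 2 * stats.norm.sf(abs(z))

# Goodness-of-fit test on the same data with 2 categories
chi2, p_chi2 = stats.chisquare([successes, n - successes],
                               f_exp=[n * p0, n * (1 - p0)])

print(f"z^2 = {z**2:.4f}, chi2 = {chi2:.4f}")   # identical
print(f"p-values: {p_z:.4f} vs {p_chi2:.4f}")   # identical
```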

How This Chapter Connects

| This Chapter | Builds On | Leads To |
| --- | --- | --- |
| Chi-square statistic | Hypothesis testing framework (Ch.13), contingency tables (Ch.8) | ANOVA's $F$-statistic uses similar "observed vs. expected" logic (Ch.20) |
| Expected frequencies | Probability and independence (Ch.8, Ch.9) | ANOVA expected values under $H_0$ (Ch.20) |
| Cramer's V | Cohen's $d$ and effect sizes (Ch.17) | Correlation coefficient as effect size (Ch.22) |
| Residual analysis | Identifying patterns in data (Ch.5, Ch.6) | Residual analysis in regression (Ch.22-23) |

The Key Themes

Theme 2: Categorical data often describes people. Every contingency table in this chapter classified people — by region, race, subscription tier, genre preference. The categories we choose shape what we can discover, and the categories institutions choose can reinforce or challenge existing power structures. The chi-square test reveals disparities; human judgment determines what to do about them.

Theme 1: Statistics as a superpower (via Theme 2). The chi-square test gave Maya the evidence she needed to demonstrate that disease burden falls unevenly across regions ($p < 0.001$). It gave James the data to show that bail decisions are associated with race ($p < 0.001$, $V = 0.206$). Without formal testing, these patterns might be dismissed as anecdotal. With chi-square tests, they become quantified, documented, and actionable.

The One Thing to Remember

If you forget everything else from this chapter, remember this:

The chi-square test compares observed categorical frequencies to expected frequencies. The goodness-of-fit test asks whether one categorical variable matches a specified distribution. The test of independence asks whether two categorical variables are related. Both use the formula $\chi^2 = \sum (O - E)^2 / E$ — a sum of standardized squared deviations that measures the total discrepancy between reality and expectation. Large values of $\chi^2$ indicate that the data don't fit the null hypothesis. But the chi-square statistic alone doesn't tell you WHERE the association is (use standardized residuals) or HOW STRONG it is (use Cramer's V). Always check that all expected frequencies are at least 5, and remember that chi-square tests establish association, not causation.

Key Terms

| Term | Definition |
| --- | --- |
| Chi-square test | A hypothesis test that compares observed categorical frequencies to expected frequencies using $\chi^2 = \sum (O - E)^2 / E$; used for both goodness-of-fit and independence testing |
| Goodness-of-fit test | A chi-square test that determines whether a single categorical variable follows a specified distribution; $df = k - 1$ |
| Test of independence | A chi-square test that determines whether two categorical variables are associated; applied to a contingency table; $df = (r-1)(c-1)$ |
| Observed frequency | The actual count of observations in a category or cell; what the data show |
| Expected frequency | The count predicted by the null hypothesis; for independence: $E = (R \times C) / n$; for goodness-of-fit: $E = n \times p_i$ |
| Chi-square distribution | A right-skewed probability distribution used as the reference distribution for chi-square tests; shape determined by degrees of freedom; always non-negative |
| Cramer's V | An effect size measure for chi-square tests, scaled from 0 (no association) to 1 (perfect association): $V = \sqrt{\chi^2 / (n \cdot (k-1))}$; analogous to Cohen's $d$ for means |
| Contingency table (revisited) | A two-way table showing frequencies for combinations of two categorical variables; first introduced in Ch.8 for descriptive analysis, now used as the data structure for chi-square tests of independence |