Further Reading: Chi-Square Tests: Categorical Data Analysis

Books

For Deeper Understanding

Alan Agresti, An Introduction to Categorical Data Analysis, 3rd edition (2019) The definitive textbook on categorical data analysis, written by one of the field's most respected scholars. Agresti covers chi-square tests, Fisher's exact test, odds ratios, logistic regression, and log-linear models — essentially the complete toolkit for analyzing categorical data. Chapters 1-3 are accessible at our level and provide excellent additional examples with clear exposition. The later chapters connect to logistic regression (our Chapter 24).

Frederick Mosteller and Robert Rourke, Sturdy Statistics: Nonparametrics and Order Statistics (1973) A classic introduction to distribution-free methods, including chi-square tests. Mosteller and Rourke are gifted explainers who make abstract ideas concrete through physical analogies and careful worked examples. While some notation is dated, the intuition they build is timeless. Particularly strong on the goodness-of-fit test.

David Moore, George McCabe, and Bruce Craig, Introduction to the Practice of Statistics, 10th edition (2021) Chapter 9 provides one of the clearest introductory treatments of chi-square tests available. Moore's emphasis on understanding concepts before formulas aligns well with our approach. The worked examples span genetics, market research, and social science — a good complement to our public health and criminal justice focus.

Larry Wasserman, All of Statistics: A Concise Course in Statistical Inference (2004) Chapter 10 covers chi-square tests with mathematical precision, including the derivation of the chi-square distribution as the sum of squared standard normal random variables. For students curious about why the chi-square formula works (not just that it works), Wasserman's treatment connects the dots between the formula and the underlying probability theory. Requires probability background from Chapters 8-10 of our textbook.

Charles Wheelan, Naked Statistics: Stripping the Dread from the Data (2013) Wheelan's chapter on inference includes an accessible discussion of chi-square tests that strips away the mathematical formalism and focuses on the core idea: comparing what you see to what you'd expect. His examples are drawn from everyday life and are particularly effective for students who find the mathematical notation intimidating. Previously recommended for hypothesis testing (Chapter 13) and confidence intervals (Chapter 12).

For the Applied Researcher

Andy Field, Discovering Statistics Using IBM SPSS Statistics, 5th edition (2018) Chapter 19 (coincidentally) covers chi-square tests with Field's trademark humor and thoroughness. While the software focus is SPSS rather than Python, the conceptual explanations, common-mistake warnings, and real-data examples are universally valuable. Field's discussion of standardized residuals and adjusted standardized residuals is particularly detailed.

Agresti, A., and Franklin, C., Statistics: The Art and Science of Learning from Data, 5th edition (2018) An excellent alternative introductory treatment that places chi-square tests in the broader context of categorical data analysis. Their discussion of the connection between the $z$-test for proportions and the $\chi^2$ test (the $\chi^2 = z^2$ relationship) is particularly clear.
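The $\chi^2 = z^2$ identity is easy to verify numerically. A minimal sketch with a made-up $2 \times 2$ table (Yates's correction must be turned off so chi2_contingency returns the plain Pearson statistic):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: rows = groups, columns = outcome (yes/no)
table = np.array([[30, 70],
                  [45, 55]])

# Pearson chi-square statistic, without Yates's continuity correction
chi2, p_chi2, dof, expected = stats.chi2_contingency(table, correction=False)

# Two-proportion z-test with the pooled standard error
n1, n2 = table.sum(axis=1)
p1, p2 = table[0, 0] / n1, table[1, 0] / n2
p_pool = table[:, 0].sum() / (n1 + n2)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(chi2, z**2)  # the two statistics agree exactly
```

The two-sided p-values agree as well, since squaring a standard normal gives a chi-square variable with 1 degree of freedom.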

Articles and Papers

Pearson, K. (1900). "On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling." Philosophical Magazine, 50(302), 157-175. The paper that introduced the chi-square test. Karl Pearson's original work is dense and uses unfamiliar notation, but the introduction is worth reading for historical perspective. Pearson was one of the founders of modern statistics, and the chi-square test remains one of his most enduring contributions — over 120 years later, the test is taught in virtually every introductory statistics course in the world.

Fisher, R. A. (1922). "On the interpretation of $\chi^2$ from contingency tables, and the calculation of P." Journal of the Royal Statistical Society, 85(1), 87-94. Fisher corrected Pearson's original degrees of freedom calculation (Pearson used $df = rc - 1$ instead of $df = (r-1)(c-1)$) and established the correct test. The Pearson-Fisher debate about degrees of freedom was one of the most consequential disagreements in the history of statistics. Fisher's paper resolved it — though Pearson never accepted the correction.

Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press. The book that introduced what we now call Cramer's V (Cramér himself called it $\phi_c$). Cramér's treatment of categorical data analysis is mathematically rigorous but remarkably clear. For students interested in the theoretical foundations of Cramer's V and its relationship to the chi-square statistic, Chapter 21 of Cramér's book is the original source.
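Cramér's measure can be computed directly from the chi-square statistic as $V = \sqrt{\chi^2 / (n \cdot \min(r-1, c-1))}$. A quick check of the hand formula against scipy's built-in, using an invented $3 \times 2$ table:

```python
import numpy as np
from scipy import stats
from scipy.stats.contingency import association

# Hypothetical 3x2 contingency table
table = np.array([[20, 30],
                  [25, 25],
                  [40, 10]])

chi2, _, _, _ = stats.chi2_contingency(table, correction=False)
n = table.sum()
r, c = table.shape

# Cramer's V from the definition
V = np.sqrt(chi2 / (n * min(r - 1, c - 1)))

# Matches scipy's built-in (scipy >= 1.7)
print(V, association(table, method='cramer'))
```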

Agresti, A. (2002). "Categorical Data Analysis." In International Encyclopedia of the Social & Behavioral Sciences, pp. 1466-1470. A concise overview of the entire field of categorical data analysis, from chi-square tests through logistic regression and log-linear models. Useful for understanding where chi-square tests fit in the broader landscape of methods for categorical data. Available through many university libraries.

Cochran, W. G. (1954). "Some methods for strengthening the common $\chi^2$ tests." Biometrics, 10(4), 417-451. The classic paper on the expected-frequency-of-5 rule. Cochran examined when the chi-square approximation breaks down and concluded that expected frequencies below 5 can lead to unreliable p-values. His recommendation — that no more than 20% of expected frequencies should be below 5, and none should be below 1 — is more nuanced than the simplified "all $\geq 5$" rule we teach, but the simplified version is a safe conservative guideline.
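Cochran's guideline is easy to check in code once chi2_contingency has returned the expected frequencies. A sketch with a hypothetical table (the 20% and below-1 thresholds are Cochran's):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 contingency table
table = np.array([[8, 12, 5],
                  [4, 16, 10]])

_, _, _, expected = stats.chi2_contingency(table)

# Cochran's rule: no more than 20% of expected counts below 5, none below 1
frac_below_5 = np.mean(expected < 5)
ok = (frac_below_5 <= 0.20) and bool((expected >= 1).all())
print(np.round(expected, 2), frac_below_5, ok)
```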

Online Resources

Interactive Tools

StatKey: Chi-Square Test Module http://www.lock5stat.com/StatKey/ StatKey includes a chi-square test module where you can enter your own contingency table, see the expected frequencies computed automatically, and watch the chi-square statistic calculated cell by cell. You can also compare the chi-square result to a simulation-based result (connecting Chapter 18's ideas to this chapter's formula-based approach). The visual display of the chi-square distribution with the observed test statistic marked is particularly helpful for building intuition about p-values.

Seeing Theory — Frequentist Inference Module https://seeing-theory.brown.edu/frequentist-inference/ Brown University's interactive visualization includes modules for chi-square tests where you can manipulate observed frequencies and watch the chi-square statistic change in real time. The visual representation of the chi-square distribution's shape across different degrees of freedom is excellent for understanding why df matters.


Art of Stat: Chi-Square Tests https://artofstat.com/web-apps This suite of web apps includes a clean, intuitive interface for both goodness-of-fit and independence tests. Enter your data, press a button, and see the full analysis including expected frequencies, chi-square statistic, p-value, and a visualization. Ideal for checking your hand calculations or exploring "what if" scenarios.

GeoGebra: Chi-Square Distribution Explorer https://www.geogebra.org/m/RBDqEFBa An interactive tool that lets you visualize how the chi-square distribution changes with degrees of freedom and explore critical values and p-values graphically. Useful for building intuition about the right-skewed shape and its dependence on df.

Video Resources

StatQuest with Josh Starmer: "Chi-Square Tests" (YouTube) Josh Starmer provides clear, energetic walkthroughs of both the goodness-of-fit test and the test of independence. His step-by-step calculation of expected frequencies and the chi-square statistic is particularly well-paced. He also has a separate video on the chi-square distribution itself that's worth watching first.

Khan Academy: "Chi-Square Distribution Introduction" and "Chi-Square Test for Association" (khanacademy.org) Sal Khan's deliberate, step-by-step approach works well for chi-square tests, which involve multiple computational steps. The chi-square distribution video provides useful visual intuition, and the test-of-association walkthrough includes a complete worked example with expected frequency calculations. Watch both in sequence.

3Blue1Brown: "But What Is a Chi-Square Distribution?" (YouTube) Grant Sanderson's visual approach illuminates the mathematical origins of the chi-square distribution — it arises as the sum of squared standard normal random variables. This connection explains why the chi-square formula works (each $(O-E)^2/E$ term approximates a squared standard normal under $H_0$) and provides deep geometric intuition for more advanced students.
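That origin story is easy to check by simulation: summing $df$ squared standard normals should produce a variable with mean $df$ and variance $2\,df$, the chi-square moments. A quick sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
df = 3

# Sum of df squared standard normals, simulated 100,000 times
samples = (rng.standard_normal((100_000, df)) ** 2).sum(axis=1)

# A chi-square(df) variable has mean df and variance 2*df
print(samples.mean(), samples.var())  # close to 3 and 6
```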

jbstatistics: "Chi-Square Tests" (YouTube) Jeremy Balka provides a mathematically precise yet accessible treatment. His comparison of the goodness-of-fit test and the test of independence in a single video helps clarify how the two tests relate. He also covers the connection between the $2 \times 2$ chi-square test and the two-proportion $z$-test.

Crash Course Statistics: "Chi-Square Tests" (YouTube) A fast-paced overview that situates chi-square tests in the broader context of hypothesis testing. The real-world examples (genetics, taste tests, social science) complement the examples in our chapter. Good for a quick review or an alternative explanation.

Technical Resources

SciPy Documentation: scipy.stats.chisquare and scipy.stats.chi2_contingency - chisquare: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html - chi2_contingency: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html

The official documentation for both functions used in this chapter. The chi2_contingency page is particularly useful — it explains the correction parameter (Yates's continuity correction for $2 \times 2$ tables), the lambda_ parameter (for alternative test statistics like the G-test), and the return values.

# Quick reference
import numpy as np
from scipy import stats

# Goodness-of-fit: observed vs. expected counts (illustrative data;
# the two arrays must sum to the same total)
observed_counts = np.array([18, 22, 30, 30])
expected_counts = np.array([25, 25, 25, 25])
chi2, p = stats.chisquare(observed_counts, f_exp=expected_counts)

# Independence: pass the contingency table as a 2-D array
contingency_table = np.array([[30, 70], [45, 55]])
chi2, p, dof, expected = stats.chi2_contingency(contingency_table)

statsmodels: Association Measures https://www.statsmodels.org/stable/contingency_tables.html The statsmodels library provides additional tools for analyzing contingency tables, including the odds ratio and the risk ratio (via its Table2x2 class). For students who want more sophisticated analysis of $2 \times 2$ tables, including confidence intervals for the odds ratio, statsmodels extends what scipy offers. Cramer's V itself is available directly in scipy:

from scipy.stats.contingency import association
# Cramer's V directly (scipy >= 1.7)
V = association(contingency_table, method='cramer')

pandas.crosstab() for Creating Contingency Tables https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html In practice, you'll often start with a raw DataFrame (one row per observation) rather than a pre-computed contingency table. The pd.crosstab() function creates a contingency table from two columns of a DataFrame, which you can then pass to chi2_contingency().

import pandas as pd
from scipy import stats

# From raw data (one row per observation) to contingency table
# to chi-square test; assumes df has 'race' and 'bail_decision' columns
ct = pd.crosstab(df['race'], df['bail_decision'])
chi2, p, dof, expected = stats.chi2_contingency(ct.values)

Historical Context

Karl Pearson and the Birth of the Chi-Square Test (1900) Karl Pearson developed the chi-square test while studying the fit of data to theoretical distributions at University College London. His 1900 paper is considered one of the foundational contributions to modern statistics. Pearson was working on the problem of whether observed data were consistent with theoretical predictions — exactly the goodness-of-fit question. He showed that the sum of squared standardized deviations follows a specific distribution (which he called the "chi-square distribution"), enabling formal hypothesis testing for categorical data for the first time.

The Pearson-Fisher Feud Pearson initially calculated degrees of freedom incorrectly for the test of independence. R. A. Fisher published a correction in 1922, showing that the correct df for an $r \times c$ table is $(r-1)(c-1)$, not $rc - 1$ as Pearson had claimed. Pearson refused to accept the correction, leading to one of the most bitter feuds in the history of statistics. The dispute was deeply personal — Fisher had worked in Pearson's laboratory and the two men detested each other. Fisher was right about the degrees of freedom, and his correction is universally accepted today.

Gregor Mendel and Goodness-of-Fit (1866) Before the chi-square test existed, Mendel published his famous pea plant genetics experiments and reported ratios remarkably close to his predicted 3:1 and 9:3:3:1 ratios. After the chi-square test was developed, statisticians retroactively tested Mendel's data and found them too good — the chi-square values were suspiciously small, suggesting the data may have been adjusted (consciously or unconsciously) to match expectations. This remains one of the most famous cases in the history of science of data that are "too good to be true."

What's Coming Next

Chapter 20 introduces Analysis of Variance (ANOVA) — a method for comparing means across three or more groups. Key resources to preview:

  • StatQuest: "One-Way ANOVA" (YouTube) — clear visual walkthrough of the $F$-test
  • Khan Academy: "ANOVA" (khanacademy.org) — step-by-step introduction to the decomposition of variability
  • SciPy documentation: scipy.stats.f_oneway — the Python function for one-way ANOVA
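As a preview, scipy.stats.f_oneway takes one array of measurements per group and returns the $F$ statistic and p-value. A minimal sketch with simulated data (the group means and sample sizes are invented for illustration):

```python
import numpy as np
from scipy import stats

# Simulated measurements for three groups
rng = np.random.default_rng(1)
g1 = rng.normal(10, 2, 30)
g2 = rng.normal(11, 2, 30)
g3 = rng.normal(10.5, 2, 30)

# One-way ANOVA: are the group means plausibly equal?
F, p = stats.f_oneway(g1, g2, g3)
print(F, p)
```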

The conceptual parallel to this chapter is striking: chi-square asks "is there more variation in counts than expected?" ANOVA asks "is there more variation in means than expected?" Both compare observed variation to null-hypothesis variation. If you understood the chi-square test, ANOVA will feel like a natural extension.

Chapter 21 introduces nonparametric methods — distribution-free alternatives to the $t$-test and ANOVA that make fewer assumptions about the data. Some nonparametric tests are closely related to chi-square tests; for example, the Kruskal-Wallis test (a nonparametric alternative to one-way ANOVA) converts the data to ranks and refers its test statistic to a chi-square distribution.