Further Reading: Lies, Damn Lies, and Statistics: Ethical Data Practice

Books

For Deeper Understanding

Darrell Huff, How to Lie with Statistics (1954) The original — and still one of the best — guides to statistical deception. Astonishingly relevant seven decades on, Huff catalogs the tricks of misleading statistics: the well-chosen average, the gee-whiz graph, the truncated axis, the sample with built-in bias. At under 150 pages, it's the most accessible introduction to statistical skepticism ever written. Some of its examples are dated, and the book doesn't address modern issues like algorithmic bias or data privacy, but the core lessons are timeless. If you read one book about statistical deception, make it this one.

Joel Best, Damned Lies and Statistics: Untangling Numbers in the News (2001) Where Huff focuses on deliberate deception, Best focuses on the more common problem: statistics that become distorted as they travel through the media and policy ecosystem. He traces how a single statistic can be created, misquoted, amplified, and ultimately believed by millions — even when it's wildly wrong. His dissection of the claim that "the number of children killed by guns doubles every year" (a claim that, if true, would imply more children killed than people who have ever lived) is a masterclass in statistical critical thinking.

Joel Best, More Damned Lies and Statistics: How Numbers Confuse Public Issues (2004) The sequel to Damned Lies, focusing on how statistics are used and misused in public policy debates. Best analyzes claims about crime, education, health, and the economy, showing how the same data can be presented to support wildly different conclusions. Particularly relevant to this chapter's treatment of cherry-picking and misleading denominators.

Carl T. Bergstrom and Jevin D. West, Calling Bullshit: The Art of Skepticism in a Data-Driven World (2020) A modern, comprehensive guide to detecting statistical and quantitative nonsense. Bergstrom (a biologist) and West (an information scientist) draw on examples from social media, news reporting, scientific publications, and corporate communications to teach readers how to identify and challenge dubious data claims. The chapter on "Unfair Comparisons" connects directly to Simpson's paradox, and the chapter on "Big Data" addresses algorithmic fairness. Based on a wildly popular University of Washington course of the same name.

Cathy O'Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy (2016) O'Neil — a former quantitative analyst — makes the case that algorithms used in hiring, policing, lending, and education systematically disadvantage the poor and minorities. She introduces the concept of "weapons of math destruction" (WMDs): models that are opaque, scalable, and harmful. Each chapter examines a different domain where algorithms cause harm, from for-profit college advertising to predictive policing. Essential reading for understanding the ethical dimensions of algorithmic decision-making explored in Case Study 2.

Virginia Eubanks, Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (2018) Eubanks examines three case studies in which data-driven systems — an automated eligibility system in Indiana, a predictive model for child welfare in Allegheny County, and coordinated entry for homeless services in Los Angeles — disproportionately harm poor and marginalized communities. The book argues that digital systems don't just reflect existing inequality; they amplify it. Eubanks brings the human stories behind the data to life in a way that directly embodies Theme 2 of this textbook.

Safiya Umoja Noble, Algorithms of Oppression: How Search Engines Reinforce Racism (2018) Noble examines how Google's search algorithms reinforce racial and gender stereotypes, with a focus on how searches related to Black women return results steeped in sexism, pornography, and dehumanization. The book challenges the myth of algorithmic neutrality and makes a compelling case that search results reflect — and reinforce — societal power structures.

For the Conceptually Curious

Judea Pearl and Dana Mackenzie, The Book of Why: The New Science of Cause and Effect (2018) Pearl — a Turing Award-winning computer scientist — argues that statistics has been crippled by its inability to distinguish correlation from causation, and presents his framework of causal diagrams and do-calculus as the solution. The book is technically demanding in places but provides the deepest available treatment of why correlation vs. causation matters — both statistically and ethically. Its chapter on Simpson's paradox is the most thorough you will find.

Ruha Benjamin, Race After Technology: Abolitionist Tools for the New Jim Code (2019) Benjamin coins the term "the New Jim Code" to describe how seemingly neutral technologies — from health apps to risk assessment tools to social media algorithms — reinforce racial hierarchies. The book goes beyond documenting bias to propose frameworks for "abolitionist" technology design that actively works against structural racism. Directly relevant to the discussion of proxy variables and training data bias in James's case study.

Latanya Sweeney, Simple Demographics Often Identify People Uniquely (2000) The foundational research showing that 87% of the U.S. population can be uniquely identified by date of birth, zip code, and gender. Sweeney demonstrated the vulnerability of "anonymized" data by re-identifying the medical records of Massachusetts Governor William Weld. This 24-page paper changed the field of data privacy and directly informs this chapter's discussion of re-identification risks.
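Sweeney's 87% figure is plausible from first principles. The back-of-envelope calculation below uses assumptions of our own, not hers: a 10,000-person ZIP code, birthdates spread uniformly over a 90-year range, and independence between residents. Real birthdates are not uniform, so this is a sanity check, not a derivation of her result.

```python
# Back-of-envelope check of the "simple demographics identify people" claim.
# Assumptions (ours, for illustration): a 10,000-person ZIP code, and
# birthdate-plus-sex combinations spread uniformly and independently.
zip_population = 10_000
cells = 365 * 90 * 2   # days of the year x 90 possible birth years x sex

# Chance that a given resident shares no (birthdate, sex) cell with
# any of the other residents of the same ZIP code:
p_unique = (1 - 1 / cells) ** (zip_population - 1)
print(f"{p_unique:.0%} of residents expected to be unique")
```

Even this crude model lands near 86%, close to Sweeney's empirical 87%, which she obtained from actual census data rather than uniformity assumptions.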

Articles and Papers

Bickel, P. J., Hammel, E. A., and O'Connell, J. W. (1975). "Sex Bias in Graduate Admissions: Data from Berkeley." Science, 187(4175), 398-404. The original analysis of UC Berkeley's graduate admissions data that revealed Simpson's paradox in one of the most famous examples in statistical history. The authors found that while women had a lower overall admission rate, they were admitted at equal or higher rates in most individual departments. The discrepancy was driven by women's tendency to apply to more competitive departments. This paper is a model of careful, ethical statistical analysis.
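The reversal Bickel and colleagues documented can be reproduced in a few lines. The numbers below are invented for illustration (they are not the Berkeley data), but they have the same structure: women fare as well or better within each department, yet worse in the aggregate, because more women applied to the harder department.

```python
# Made-up admissions counts (not the actual Berkeley data) with the
# structure of Simpson's paradox: (applicants, admitted) per department.
admissions = {
    "men":   {"Dept A": (800, 480), "Dept B": (200, 40)},
    "women": {"Dept A": (200, 130), "Dept B": (800, 180)},
}

# Within each department, women's admission rate beats men's.
for dept in ("Dept A", "Dept B"):
    for group in ("men", "women"):
        apps, adm = admissions[group][dept]
        print(f"{dept} {group}: {adm / apps:.0%} admitted")

# Aggregating across departments reverses the comparison.
for group, depts in admissions.items():
    apps = sum(a for a, _ in depts.values())
    adm = sum(d for _, d in depts.values())
    print(f"Overall {group}: {adm / apps:.0%} admitted")
```

With these counts, women lead in each department (65% vs. 60% in Dept A, 22.5% vs. 20% in Dept B) yet trail overall (31% vs. 52%), because 800 of the 1,000 women applied to the department that admits roughly one applicant in five.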

Chouldechova, A. (2017). "Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments." Big Data, 5(2), 153-163. The mathematical proof that three natural fairness criteria — calibration, equal false positive rates, and equal false negative rates — cannot all be simultaneously satisfied when base rates differ between groups. This result is foundational for understanding the fairness impossibility discussed in Case Study 2 and has profound implications for anyone building or evaluating prediction algorithms.

Kleinberg, J., Mullainathan, S., and Raghavan, M. (2016). "Inherent Trade-Offs in the Fair Determination of Risk Scores." arXiv:1609.05807. An independent and simultaneous derivation of the fairness impossibility result, approaching it from a computer science perspective. The paper rigorously demonstrates that calibration and balance (equal error rates across groups) are fundamentally incompatible when base rates differ. Accessible to readers comfortable with basic probability.
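The impossibility result in these two papers can be seen in a toy example. The counts below are hypothetical, but they exhibit the trade-off exactly: a risk score that is perfectly calibrated in both groups still produces unequal false positive and false negative rates at a fixed threshold, because the groups' base rates differ.

```python
# Hypothetical counts (not real COMPAS data). Each group is scored into
# two bins; within each bin, (actually_reoffend, actually_dont).
groups = {
    "Group 1 (base rate 50.0%)": {0.8: (80, 20), 0.2: (20, 80)},
    "Group 2 (base rate 27.5%)": {0.8: (20, 5),  0.2: (35, 140)},
}

for name, bins in groups.items():
    # Calibration holds in BOTH groups: in each bin, the observed
    # reoffense rate equals the score.
    for score, (pos, neg) in bins.items():
        assert abs(pos / (pos + neg) - score) < 1e-9

    # Threshold at 0.5: everyone in the 0.8 bin is flagged high risk.
    tp, fp = bins[0.8]
    fn, tn = bins[0.2]
    fpr = fp / (fp + tn)   # non-reoffenders wrongly flagged
    fnr = fn / (fn + tp)   # reoffenders missed
    print(f"{name}: FPR {fpr:.1%}, FNR {fnr:.1%}")
```

With these counts, Group 1 sees a 20% false positive rate while Group 2 sees about 3.4%, even though the score is equally calibrated in both groups — the arithmetic behind the ProPublica/Northpointe dispute.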

Angwin, J., Larson, J., Mattu, S., and Kirchner, L. (2016). "Machine Bias." ProPublica, May 23. The investigative report that brought algorithmic bias in criminal justice to widespread public attention. ProPublica's analysis of the COMPAS recidivism prediction tool found that Black defendants were almost twice as likely to be incorrectly flagged as high risk compared to white defendants. The article sparked an ongoing debate about algorithmic fairness that directly informs James's analysis in this chapter. (Note: Northpointe, the company that created COMPAS, disputed ProPublica's methodology, arguing that their model was equally calibrated across races — illustrating the fairness impossibility in practice.)

Gelman, A., and Loken, E. (2013). "The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem, Even When There Is No 'Fishing Expedition' or 'p-Hacking' and the Research Hypothesis Was Posited Ahead of Time." Working paper, Department of Statistics, Columbia University. A subtle and important paper arguing that the problem of multiple comparisons extends beyond deliberate p-hacking. Even well-intentioned researchers make analysis decisions that depend on the data — choosing which outliers to remove, which covariates to include, how to define variables — and these decisions can inflate false positive rates even when no one is deliberately fishing for significance. The paper brought the phrase "garden of forking paths" (borrowed from the title of a Borges short story) into statistics; it is used throughout Chapters 13, 17, and 27.
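The mechanism Gelman and Loken describe can be demonstrated with a small simulation (a sketch of ours, not their analysis). We generate null data, test for a difference in means using a normal approximation, and allow a single data-dependent choice: if the first test misses significance, drop the most extreme "outlier" from each group and test again. Even this one fork pushes the false positive rate above the nominal 5%.

```python
import math
import random

random.seed(1)

def two_sample_p(x, y):
    """Two-sided p-value for a difference in means (normal approximation)."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    z = (mx - my) / math.sqrt(vx / nx + vy / ny)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def drop_most_extreme(vals):
    """Remove the single value farthest from the mean (a data-dependent choice)."""
    m = sum(vals) / len(vals)
    return sorted(vals, key=lambda v: abs(v - m))[:-1]

trials, n = 5000, 30
naive = forked = 0
for _ in range(trials):
    # Both groups come from the SAME distribution: any "effect" is a false positive.
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [random.gauss(0, 1) for _ in range(n)]
    p = two_sample_p(x, y)
    naive += p < 0.05
    # The fork: if the first test misses, drop an "outlier" from each group and retest.
    forked += (p < 0.05) or (two_sample_p(drop_most_extreme(x),
                                          drop_most_extreme(y)) < 0.05)

print(f"Single pre-specified test: {naive / trials:.1%} false positives")
print(f"With one data-dependent fork: {forked / trials:.1%} false positives")
```

Real analyses involve many such forks (covariate choices, variable definitions, subgroup cuts), each taken only when the data suggest it, which is exactly why the inflation can be large without anyone consciously fishing.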

John, L. K., Loewenstein, G., and Prelec, D. (2012). "Measuring the Prevalence of Questionable Research Practices With Incentives for Truth Telling." Psychological Science, 23(5), 524-532. A survey of over 2,000 psychologists that revealed the shocking prevalence of questionable research practices: 58% admitted to deciding whether to exclude data after looking at its effect on results, 46% admitted to selectively reporting studies, and 35% admitted to reporting unexpected findings as if they had been predicted. This paper helped launch the reform movement in psychology and provides the empirical basis for this chapter's discussion of QRPs.

Sweeney, L. (2013). "Discrimination in Online Ad Delivery." Communications of the ACM, 56(5), 44-54. Sweeney found that Google ads for arrest records were significantly more likely to appear for searches of Black-identifying names than white-identifying names. The study demonstrates how algorithmic systems can discriminate even without explicit racial inputs, through the mechanism of proxy variables and historical patterns in click-through data. Directly relevant to the proxy variable discussion in James's case study.

Kramer, A. D. I., Guillory, J. E., and Hancock, J. T. (2014). "Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks." Proceedings of the National Academy of Sciences, 111(24), 8788-8790. The Facebook emotional contagion study — one of the most controversial experiments in the history of social science. The study showed that manipulating News Feed content affected users' own emotional expressions, but the effects were tiny, and the ethical backlash over the study's lack of informed consent proved far more consequential than its findings. Essential primary source for the discussion in Section 27.6.

Online Resources

Open Science Framework (OSF) https://osf.io/

The primary platform for pre-registration in the social and behavioral sciences. Researchers can publicly register their hypotheses, methods, and analysis plans before collecting data. The site also hosts open data, open materials, and pre-prints. If you want to practice pre-registration for your own projects, OSF provides free accounts and step-by-step guides.

AsPredicted.org https://aspredicted.org/

A simpler alternative to OSF for pre-registration. Researchers answer eight questions about their study (hypothesis, dependent variable, conditions, sample size, exclusion criteria, etc.) and receive a time-stamped, private registration that can be made public when the paper is submitted. Used by many researchers who find OSF's full registration form too detailed.

American Statistical Association Ethical Guidelines for Statistical Practice (2022 revision) https://www.amstat.org/your-career/ethical-guidelines-for-statistical-practice

The ASA's official ethical guidelines, covering professional integrity, responsibilities to subjects, employers, research colleagues, and the public. The 2022 revision includes expanded guidance on algorithmic fairness, data privacy, and reproducibility. These guidelines form the foundation for the personal code of ethics exercise in Section 27.11.

ProPublica's Machine Bias Series https://www.propublica.org/series/machine-bias

ProPublica's ongoing investigative series on algorithmic bias across criminal justice, healthcare, insurance, and other domains. The original 2016 COMPAS investigation is here, along with follow-up analyses and responses from algorithm developers. Essential reading for anyone interested in the real-world consequences of the statistical concepts discussed in this chapter.

The Markup https://themarkup.org/

A nonprofit news organization dedicated to investigating technology and its impact on society. Their reporting on algorithmic accountability, data privacy, and digital discrimination provides real-world examples of the ethical issues discussed in this chapter. The "Citizen Browser" project, which monitors how Facebook's algorithm serves content differently to different users, is particularly relevant.

Fairness and Machine Learning (Barocas, Hardt, and Narayanan) https://fairmlbook.org/

A free online textbook covering the technical and conceptual foundations of algorithmic fairness. The chapters on "Classification" and "Causality" provide rigorous treatments of the fairness impossibility theorem and the role of causal reasoning in fair prediction. More technical than this chapter but essential for anyone who wants to go deeper.

Retraction Watch https://retractionwatch.com/

A blog that tracks retractions of scientific papers and investigates the reasons behind them — from honest errors to data fabrication to p-hacking. The site maintains a database of retracted papers and provides case studies that illustrate the real-world consequences of the questionable research practices discussed in this chapter.

Data & Society Research Institute https://datasociety.net/

A research institute focused on the social implications of data-centric technologies. Their reports on algorithmic accountability, media manipulation, and data-driven discrimination provide policy-relevant analyses that bridge the gap between technical research and public understanding. Reports are free and accessible to a general audience.

Historical and Ethical Resources

Centers for Disease Control and Prevention (CDC): The Tuskegee Timeline https://www.cdc.gov/tuskegee/timeline.htm

The CDC's own timeline of the Tuskegee Syphilis Study, from its inception in 1932 through the 1997 presidential apology. Includes primary documents, photographs, and links to the Belmont Report and subsequent regulations. Essential primary source for understanding the historical context of modern research ethics.

Office for Human Research Protections (OHRP): The Belmont Report https://www.hhs.gov/ohrp/regulations-and-policy/belmont-report/

The full text of the Belmont Report (1979), which established the three ethical principles — Respect for Persons, Beneficence, and Justice — that govern all human subjects research in the United States. At only 10 pages, it's remarkably readable and provides the philosophical foundation for IRB review, informed consent requirements, and the ethical frameworks discussed in Section 27.8.

International Association of Privacy Professionals (IAPP) https://iapp.org/

The largest organization for privacy professionals, offering resources on GDPR, CCPA, and other data protection regulations. Their "Daily Dashboard" newsletter tracks privacy-related developments worldwide, and their certification programs (CIPP, CIPM) are the industry standard for data privacy expertise.