Further Reading: Conditional Probability and Bayes' Theorem
Books
For Deeper Understanding
Sharon Bertsch McGrayne, The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy (2011) The most accessible history of Bayes' theorem ever written. McGrayne traces Bayesian thinking from Reverend Thomas Bayes's posthumous 1763 paper through its use in World War II codebreaking, Cold War submarine tracking, and modern AI. No equations required — this is pure narrative. If you want to understand why Bayes' theorem was controversial for 200 years and how it became the foundation of modern AI, start here.
Gerd Gigerenzer, Calculated Risks: How to Know When Numbers Deceive You (2002) Gigerenzer is the cognitive psychologist who championed the natural frequency approach you learned in Section 9.7. This book is a masterclass in medical risk communication. Gigerenzer shows that doctors, patients, and lawyers systematically misunderstand probabilities — and that natural frequencies almost always fix the confusion. The chapters on mammography screening and DNA evidence are directly relevant to this chapter's case studies. Essential reading for anyone going into health, law, or public policy.
Nate Silver, The Signal and the Noise: Why So Many Predictions Fail — but Some Don't (2012) Silver, the statistician behind FiveThirtyEight, devotes a full chapter to Bayesian reasoning and shows how it applies to weather forecasting, election prediction, poker, and earthquake prediction. His writing is conversational and example-rich. Particularly useful for understanding how Bayesian updating works in practice — not just with a single evidence-posterior update, but with continuous streams of information.
For the Mathematically Curious
Richard McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan, 2nd edition (2020) Don't read this now — it's a graduate-level textbook. But bookmark it for later. McElreath teaches all of statistics from a Bayesian perspective, starting from the foundations you've learned in this chapter and building to sophisticated models. His writing is remarkably clear for technical material, and the free lecture videos on YouTube are among the best statistics lectures available. When you're ready to go deeper, this is the gold standard.
E.T. Jaynes, Probability Theory: The Logic of Science (2003) The philosophical foundation of Bayesian probability, written by a physicist who argued that probability should be understood as an extension of logic, not just a measure of frequency. Dense, opinionated, and brilliant. For advanced readers who want to understand why some statisticians treat probability as "degree of belief" rather than "long-run frequency" — and why that distinction matters.
Articles and Papers
Gigerenzer, G., & Hoffrage, U. (1995). "How to improve Bayesian reasoning without instruction: Frequency formats." Psychological Review, 102(4), 684-704. The landmark paper that demonstrated the power of natural frequencies. Gigerenzer and Hoffrage showed that when diagnostic information was presented as natural frequencies rather than conditional probabilities, the proportion of participants giving correct Bayesian answers jumped from about 16% to about 46%. This paper changed how risk is communicated in medicine. If you only read one academic paper related to this chapter, make it this one.
Casscells, W., Schoenberger, A., & Graboys, T. B. (1978). "Interpretation by physicians of clinical laboratory results." New England Journal of Medicine, 299(18), 999-1001. A famous study that asked physicians: "If a disease has a 1/1000 prevalence and a test has a 5% false positive rate with 100% sensitivity, what is the probability that a person with a positive test has the disease?" Only 18% of physicians gave the correct answer (about 2%). Most answered 95%. This 1978 paper first documented the base rate fallacy among medical professionals and remains relevant today.
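The arithmetic behind the Casscells problem is worth checking for yourself. Here is a minimal sketch using the numbers quoted above (variable names are illustrative), computing the answer both by Bayes' theorem and by the natural-frequency reasoning of Section 9.7:

```python
# Casscells, Schoenberger & Graboys (1978) problem:
# prevalence 1/1000, false positive rate 5%, sensitivity 100%.
prevalence = 1 / 1000
sensitivity = 1.0           # P(positive | disease)
false_positive_rate = 0.05  # P(positive | no disease)

# Bayes' theorem: P(disease | positive)
p_positive = (sensitivity * prevalence
              + false_positive_rate * (1 - prevalence))
posterior = sensitivity * prevalence / p_positive
print(f"P(disease | positive) = {posterior:.3f}")  # 0.020, i.e. about 2%, not 95%

# Natural-frequency check: of 1000 people, 1 has the disease and
# tests positive; about 0.05 * 999, or roughly 50, of the healthy
# also test positive, so about 1 true positive among 51 positives.
print(1 / (1 + 0.05 * 999))  # same answer, about 0.0196
```

Running the numbers makes the physicians' error vivid: the 95% answer ignores the fact that false positives from the 999 healthy people swamp the single true positive.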
Thompson, W. C., & Schumann, E. L. (1987). "Interpretation of statistical evidence in criminal trials: The prosecutor's fallacy and the defense attorney's fallacy." Law and Human Behavior, 11(3), 167-187. The paper that formally named and analyzed the prosecutor's fallacy. Thompson and Schumann also identified the defense attorney's fallacy — the opposite error of dismissing statistical evidence by arguing that one matching suspect among many proves nothing. Both fallacies are failures to apply Bayes' theorem correctly.
Royal Statistical Society (2001). "Letter from the President of the Royal Statistical Society regarding the use of statistical evidence in court cases." The RSS's unprecedented public statement in response to the Sally Clark case (Case Study 2). The letter explains, in accessible terms, why multiplying SIDS probabilities and treating the result as the probability of innocence constituted a fundamental misuse of statistics. Available online and worth reading as an example of statisticians speaking to the public.
Online Resources
Interactive Tools
Seeing Theory — Bayesian Inference Chapter https://seeing-theory.brown.edu/bayesian-inference/ A beautiful interactive visualization of Bayes' theorem from Brown University. You can adjust the prior, likelihood, and false alarm rate and watch the posterior change in real time. The visual representation of the "update" process makes the abstract formula concrete.
3Blue1Brown — "Bayes' theorem, the geometry of changing beliefs" https://www.youtube.com/watch?v=HZGCoVF3YvM Grant Sanderson's 15-minute visual explanation of Bayes' theorem is one of the most watched math videos on YouTube. The geometric intuition — representing probabilities as areas — makes the formula genuinely intuitive. If the algebra of Bayes' theorem isn't clicking for you, watch this video.
3Blue1Brown — "The medical test paradox, and redesigning Bayes' rule" https://www.youtube.com/watch?v=lG4VkPoG3ko A follow-up video specifically about medical testing and the base rate problem. Sanderson introduces the "Bayes factor" approach and shows why natural frequencies work so well. Pairs perfectly with Section 9.7 and Case Study 1.
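The "Bayes factor" idea the video introduces is Bayes' theorem rewritten in odds form: posterior odds = prior odds × Bayes factor. A minimal sketch, using hypothetical disease-test numbers in the spirit of this chapter's examples (prevalence 1/1000, sensitivity 100%, false positive rate 5%):

```python
# Odds form of Bayes' theorem: posterior odds = prior odds * Bayes factor.
prior = 1 / 1000                  # P(disease), assumed for illustration
prior_odds = prior / (1 - prior)  # odds of disease before testing
bayes_factor = 1.0 / 0.05         # sensitivity / false positive rate = 20

posterior_odds = prior_odds * bayes_factor
posterior = posterior_odds / (1 + posterior_odds)  # convert odds back to probability
print(f"posterior = {posterior:.4f}")  # 0.0196, matching the direct formula
```

The appeal of this form is that the test's strength is a single number (the Bayes factor of 20), which can be multiplied against any prior.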
StatQuest — "Bayes' Theorem, Clearly Explained!!!" https://www.youtube.com/watch?v=9wCnvr7Xw4E Josh Starmer's characteristically enthusiastic walkthrough of Bayes' theorem with a disease testing example. StatQuest videos are great for reviewing concepts — short, clear, and focused on one idea at a time.
For Practice
Khan Academy — Conditional Probability and Bayes' Theorem https://www.khanacademy.org/math/statistics-probability/probability-library#conditional-probability-independence Free, self-paced practice problems with immediate feedback. Start with the conditional probability exercises and work up to the Bayes' theorem section. The hint system is helpful when you're stuck.
Brilliant.org — Bayesian Statistics Course https://brilliant.org/courses/bayesian-statistics/ A more interactive, puzzle-based approach to Bayesian reasoning. Requires a subscription but offers a free trial. Good for students who learn better through challenges than through reading.
Connection to Upcoming Chapters
The concepts from Chapter 9 provide the foundation for several critical ideas later in the course:
- Chapter 10 (Probability Distributions): You'll extend conditional probability from discrete events to continuous distributions. The idea that "knowing something changes the probability" becomes the basis for probability density functions and cumulative distribution functions.
- Chapter 11 (Sampling Distributions): The concept of conditional probability underlies the sampling distribution — the distribution of a statistic given repeated sampling from the same population. When you ask "what's the probability of getting a sample mean this extreme?" you're asking a conditional probability question.
- Chapter 13 (Hypothesis Testing): The p-value is a conditional probability: $P(\text{data this extreme} \mid \text{null hypothesis is true})$. The entire hypothesis testing framework is built on the same conditional logic you learned here — and the base rate fallacy helps explain why p-values are so widely misinterpreted.
- Chapter 14 (Inference for Proportions): When you test whether a sample proportion differs from a hypothesized value, you're implicitly using the same framework: prior belief (null hypothesis) → evidence (sample data) → updated conclusion (reject or fail to reject).
- Chapter 24 (Logistic Regression): Logistic regression models the conditional probability $P(Y = 1 \mid X)$ — the probability of a "yes" outcome given the predictors. The entire model is a conditional probability machine, and its connection to Bayes' theorem is direct and deep.
- Chapter 26 (Statistics and AI): The Naive Bayes classifier, Bayesian neural networks, and large language models all build on the Bayesian updating framework. Your understanding of priors, likelihoods, and posteriors from this chapter is the conceptual foundation for understanding how AI systems reason about uncertainty.
A Final Thought
Bayes' theorem first appeared in print in 1763 — more than 260 years ago — in a paper by Reverend Thomas Bayes, a Presbyterian minister who never published the result during his lifetime. His friend Richard Price found the paper among Bayes's effects and sent it to the Royal Society. For the next two centuries, Bayesian thinking was controversial, dismissed by many statisticians as subjective and unscientific.
Today, Bayes' theorem runs the world. It filters your email, powers your GPS, recommends your next show, helps diagnose diseases, and guides autonomous vehicles. Spam filters, recommendation engines, and fraud detection systems all rely on Bayesian reasoning at their core.
The irony is beautiful: the most "practical" idea in modern technology was conceived by an 18th-century clergyman and dismissed by professionals for 200 years. Never underestimate the long arc of a good idea.