Key Takeaways: Probability — The Foundation of Inference
One-Sentence Summary
Probability provides the mathematical language for quantifying uncertainty, using a small set of rules — complement, addition, and multiplication — that underpin every statistical inference, AI prediction, and data-driven decision that follows.
Core Concepts at a Glance
| Concept | Definition | Why It Matters |
|---|---|---|
| Probability | A number between 0 and 1 measuring how likely an event is to occur | The foundation for all statistical inference |
| Three approaches | Classical (equally likely outcomes), relative frequency (data-driven), subjective (expert judgment) | Different situations call for different approaches; all follow the same rules |
| Law of large numbers | As trials increase, observed proportions approach the true probability | Why more data gives better estimates; why casinos always win long-term |
| Complement rule | P(not A) = 1 − P(A) | Turns hard "at least one" problems into easy "none" problems |
| Contingency tables | Two-way tables showing frequencies for combinations of categorical variables | The bridge from data to probability; the format for joint and marginal probabilities |
The Three Approaches to Probability
| Approach | Formula / Method | Best For | Example |
|---|---|---|---|
| Classical | $P(A) = \frac{\text{favorable outcomes}}{\text{total equally likely outcomes}}$ | Games of chance, simple random processes | Rolling dice, drawing cards |
| Relative Frequency | $P(A) \approx \frac{\text{times A occurred}}{\text{total trials}}$ | Situations with historical data | Shooting percentages, defect rates |
| Subjective | Expert assessment based on evidence and judgment | One-time events, complex predictions | Election forecasts, outbreak risk |
Probability Rules Quick Reference
Rule 1: Boundaries
$$0 \leq P(A) \leq 1$$
- P(A) = 0 means impossible
- P(A) = 1 means certain
Rule 2: All Outcomes Sum to 1
$$\sum P(\text{all outcomes}) = 1$$
Rule 3: Complement
$$\boxed{P(\text{not } A) = 1 - P(A)}$$
When to use: When calculating "at least one" or "not A" is easier than calculating P(A) directly.
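A minimal sketch of the trick, assuming four independent rolls of a fair die: "at least one six" is awkward to compute directly, but "no sixes" is a single product.

```python
# Complement rule: P(at least one six in 4 rolls) is hard directly,
# but P(no sixes) is just (5/6)^4 for four independent rolls.
p_none = (5 / 6) ** 4
p_at_least_one = 1 - p_none
print(round(p_at_least_one, 4))  # 0.5177
```

Better than an even chance of seeing a six, even though each individual roll only hits one time in six.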
Rule 4: Addition Rule
Mutually exclusive events (no overlap): $$P(A \text{ or } B) = P(A) + P(B)$$
General (any events): $$\boxed{P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)}$$
When to use: Any time you need P(A or B). Always subtract the overlap unless you know the events are mutually exclusive.
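A worked example with a standard 52-card deck (a classical-probability setup, not from this chapter's data): A = draw a heart, B = draw a face card. Exact fractions make the overlap subtraction visible.

```python
from fractions import Fraction

# General addition rule: A = heart, B = face card (J, Q, K).
p_heart = Fraction(13, 52)
p_face = Fraction(12, 52)
p_both = Fraction(3, 52)   # the three face-card hearts, counted in both events
p_either = p_heart + p_face - p_both
print(p_either)  # 11/26
```

Without subtracting `p_both`, the three face-card hearts would be double-counted, giving 25/52 instead of 22/52 = 11/26.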
Rule 5: Multiplication Rule (Independent Events)
$$\boxed{P(A \text{ and } B) = P(A) \times P(B)}$$
When to use: When events are independent (one doesn't affect the other). Always verify independence before using this rule.
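A quick sketch with two events that are independent by design: rolling a 2 on a fair die and flipping heads on a fair coin. Neither outcome can affect the other, so the rule applies.

```python
# Multiplication rule for independent events:
# die shows 2 (prob 1/6) AND coin shows heads (prob 1/2).
p_two = 1 / 6
p_heads = 1 / 2
p_both = p_two * p_heads   # 1/12
print(round(p_both, 4))    # 0.0833
```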
Decision Guide: Which Rule Do I Need?
```
What probability are you calculating?
│
├── P(not A)?
│     └── COMPLEMENT RULE: P(not A) = 1 − P(A)
│
├── P(A OR B)?
│     ├── Are A and B mutually exclusive?
│     │     ├── YES → P(A or B) = P(A) + P(B)
│     │     └── NO  → P(A or B) = P(A) + P(B) − P(A and B)
│     └── TIP: If you have a contingency table, count directly
│           and divide by the grand total to verify
│
├── P(A AND B)?
│     ├── Are A and B independent?
│     │     ├── YES → P(A and B) = P(A) × P(B)
│     │     └── NO  → Need conditional probability (Ch. 9)
│     └── TIP: In a contingency table, this is cell ÷ grand total
│
└── P(at least one)?
      └── COMPLEMENT TRICK:
            P(at least one) = 1 − P(none)
            Often combine with multiplication rule for P(none)
```
Key Distinctions
Mutually Exclusive vs. Independent
| | Mutually Exclusive | Independent |
|---|---|---|
| Meaning | A and B CANNOT both happen | Knowing A doesn't change P(B) |
| P(A and B) | = 0 | = P(A) × P(B) |
| Asks | "Can these co-occur?" | "Do these influence each other?" |
| If both have P > 0 | They CANNOT be independent | They CANNOT be mutually exclusive |
| Example | Rolling 2 and 5 on one die | Rolling a 2 on one die, flipping heads on a coin |
Critical point: Mutually exclusive events (with non-zero probability) are ALWAYS dependent. Knowing A happened tells you B didn't — that's information, which means dependence.
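A simulation sketch of this point, using the die events from the table above (A = "roll a 2", B = "roll a 5"): the joint proportion is exactly zero, while the product of the marginals is not, so the independence check P(A and B) = P(A) × P(B) fails.

```python
import numpy as np

# A = "roll a 2", B = "roll a 5" on one fair die: mutually exclusive.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)
a = rolls == 2
b = rolls == 5
p_a, p_b = a.mean(), b.mean()
p_and = (a & b).mean()          # exactly 0: the events never co-occur
print(p_and, p_a * p_b)         # 0.0 vs roughly 1/36 ≈ 0.028
# P(A and B) = 0 ≠ P(A) × P(B), so the events are dependent.
```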
Contingency Table Probability Cheat Sheet
Given a contingency table with two categorical variables:
| | B | not B | Total |
|---|---|---|---|
| A | a | b | a+b |
| not A | c | d | c+d |
| Total | a+c | b+d | n |
| Probability | Formula | Name |
|---|---|---|
| P(A) | (a+b) / n | Marginal probability |
| P(B) | (a+c) / n | Marginal probability |
| P(A and B) | a / n | Joint probability |
| P(A or B) | (a+b+c) / n = P(A)+P(B)−P(A and B) | Addition rule |
| P(not A) | (c+d) / n = 1 − P(A) | Complement |
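The cheat sheet can be checked with hypothetical counts plugged into the generic table (a = 30, b = 20, c = 10, d = 40 are made-up numbers for illustration). Exact fractions keep the arithmetic clean.

```python
from fractions import Fraction

# Hypothetical counts for the generic table above.
a, b, c, d = 30, 20, 10, 40
n = a + b + c + d                      # grand total = 100

p_A = Fraction(a + b, n)               # marginal:   1/2
p_B = Fraction(a + c, n)               # marginal:   2/5
p_A_and_B = Fraction(a, n)             # joint:      3/10
p_A_or_B = p_A + p_B - p_A_and_B       # addition:   3/5
p_not_A = 1 - p_A                      # complement: 1/2

# The addition rule agrees with direct counting of the three cells in A or B:
assert p_A_or_B == Fraction(a + b + c, n)
```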
Common Misconceptions
| Misconception | Reality |
|---|---|
| "The coin is due for tails after 5 heads" | Each flip is independent; the coin has no memory (gambler's fallacy) |
| "Two remaining options means 50/50" | Only if both outcomes are equally likely (the Monty Hall trap) |
| "More data always means exact probabilities" | More data gives better estimates; the true probability may never be known exactly |
| "Mutually exclusive means independent" | The opposite — mutually exclusive events (with P > 0) are always dependent |
| "Probability predicts individual events" | Probability describes long-run patterns, not individual outcomes |
The Law of Large Numbers — What It Says and Doesn't Say
| It DOES Say | It Does NOT Say |
|---|---|
| Proportions converge to the true probability as n increases | You'll get exactly 50% heads in any specific set of flips |
| More data → more reliable estimates | The universe "corrects" for streaks |
| Long-run averages are predictable | Individual events are predictable |
| Casinos always win over millions of bets | Any particular gambler will lose |
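The left column can be seen directly in a simulation sketch: the running proportion of heads wanders early on but settles near 0.5 as the number of flips grows (seed and sizes here are arbitrary choices).

```python
import numpy as np

# Law of large numbers: running proportion of heads converges toward 0.5,
# even though any short stretch can drift far from it.
rng = np.random.default_rng(42)
flips = rng.integers(0, 2, size=100_000)            # 1 = heads
running_prop = np.cumsum(flips) / np.arange(1, flips.size + 1)
for n in (10, 1_000, 100_000):
    print(n, running_prop[n - 1])
```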
Python Quick Reference
```python
import numpy as np
import pandas as pd
from math import comb, factorial

# --- Simulation ---
np.random.seed(42)

# Coin flip simulation (1 = heads, 0 = tails)
flips = np.random.choice([0, 1], size=10000)
prop_heads = np.mean(flips)      # Proportion of heads

# Die roll simulation
rolls = np.random.randint(1, 7, size=10000)
prop_six = np.mean(rolls == 6)   # Proportion of sixes

# --- Contingency Tables ---
# Placeholder DataFrame so the snippets run; swap in your own data
df = pd.DataFrame({'var1': ['yes', 'yes', 'no', 'no'],
                   'var2': ['A', 'B', 'A', 'B']})

# Create a two-way table from a DataFrame
contingency = pd.crosstab(df['var1'], df['var2'], margins=True)

# Joint probabilities (all cells / grand total)
joint_probs = pd.crosstab(df['var1'], df['var2'],
                          margins=True, normalize='all')

# Row-wise proportions (conditional probabilities preview)
row_probs = pd.crosstab(df['var1'], df['var2'],
                        margins=True, normalize='index')

# --- Counting ---
comb(23, 2)    # "23 choose 2" = 253 (number of pairs)
factorial(5)   # 5! = 120
```
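The `comb(23, 2)` line hints at the classic birthday problem; the complement trick finishes it. A sketch, assuming 365 equally likely birthdays:

```python
# Birthday problem via the complement rule:
# P(at least one shared birthday among 23 people) = 1 − P(all 23 differ).
p_all_different = 1.0
for i in range(23):
    p_all_different *= (365 - i) / 365
p_shared = 1 - p_all_different
print(round(p_shared, 3))  # 0.507
```

Just 23 people give a better-than-even chance of a shared birthday, because the 253 pairs each get a small chance to collide.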
Key Terms
| Term | Definition |
|---|---|
| Probability | A number between 0 and 1 measuring how likely an event is to occur |
| Event | A collection of one or more outcomes of interest |
| Sample space | The set of all possible outcomes of a random process |
| Outcome | A single result of a random process |
| Classical probability | P(A) = favorable outcomes / total equally likely outcomes |
| Relative frequency | The proportion of times an event occurs over many trials |
| Law of large numbers | As trials increase, the relative frequency approaches the true probability |
| Addition rule | P(A or B) = P(A) + P(B) − P(A and B) |
| Multiplication rule | P(A and B) = P(A) × P(B) for independent events |
| Mutually exclusive | Events that cannot both occur simultaneously |
| Independent events | Events where knowing one occurred doesn't change the probability of the other |
| Complement | The event that A does NOT occur; P(not A) = 1 − P(A) |
| Contingency table | A two-way table showing frequencies for combinations of two categorical variables |
| Joint probability | The probability that two events occur simultaneously; cell count / grand total |
| Gambler's fallacy | The mistaken belief that past random events influence future independent events |
The One Thing to Remember
If you forget everything else from this chapter, remember this:
Probability is the language of uncertainty — and uncertainty is not a flaw. It's the raw material of every statistical inference you'll ever make. The complement, addition, and multiplication rules are your entire toolkit for basic probability. Master them, and you're ready for everything that follows: conditional probability, distributions, sampling, confidence intervals, hypothesis tests. It all starts here.