Key Takeaways: Probability — The Foundation of Inference
One-Sentence Summary
Probability provides the mathematical language for quantifying uncertainty, using a small set of rules — complement, addition, and multiplication — that underpin every statistical inference, AI prediction, and data-driven decision that follows.
Core Concepts at a Glance
| Concept | Definition | Why It Matters |
|---|---|---|
| Probability | A number between 0 and 1 measuring how likely an event is to occur | The foundation for all statistical inference |
| Three approaches | Classical (equally likely outcomes), relative frequency (data-driven), subjective (expert judgment) | Different situations call for different approaches; all follow the same rules |
| Law of large numbers | As trials increase, observed proportions approach the true probability | Why more data gives better estimates; why casinos always win long-term |
| Complement rule | P(not A) = 1 − P(A) | Turns hard "at least one" problems into easy "none" problems |
| Contingency tables | Two-way tables showing frequencies for combinations of categorical variables | The bridge from data to probability; the format for joint and marginal probabilities |
The Three Approaches to Probability
| Approach | Formula / Method | Best For | Example |
|---|---|---|---|
| Classical | $P(A) = \frac{\text{favorable outcomes}}{\text{total equally likely outcomes}}$ | Games of chance, simple random processes | Rolling dice, drawing cards |
| Relative Frequency | $P(A) \approx \frac{\text{times A occurred}}{\text{total trials}}$ | Situations with historical data | Shooting percentages, defect rates |
| Subjective | Expert assessment based on evidence and judgment | One-time events, complex predictions | Election forecasts, outbreak risk |
Probability Rules Quick Reference
Rule 1: Boundaries
$$0 \leq P(A) \leq 1$$
- P(A) = 0 means impossible
- P(A) = 1 means certain
Rule 2: All Outcomes Sum to 1
$$\sum P(\text{all outcomes}) = 1$$
Rule 3: Complement
$$\boxed{P(\text{not } A) = 1 - P(A)}$$
When to use: When calculating "at least one" or "not A" is easier than calculating P(A) directly.
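A minimal sketch of the trick, assuming four independent rolls of a fair die: "at least one six" is awkward to compute directly, but "no sixes" is a single product.

```python
# Complement rule: P(at least one six in 4 rolls) is hard directly,
# but P(no sixes) is just (5/6)^4 for four independent rolls.
p_none = (5 / 6) ** 4
p_at_least_one = 1 - p_none
print(round(p_at_least_one, 4))  # 0.5177
```

Better than an even chance of seeing a six, even though each individual roll only hits one time in six.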
Rule 4: Addition Rule
Mutually exclusive events (no overlap): $$P(A \text{ or } B) = P(A) + P(B)$$
General (any events): $$\boxed{P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)}$$
When to use: Any time you need P(A or B). Always subtract the overlap unless you know the events are mutually exclusive.
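A worked example with a standard 52-card deck (a classical-probability setup, not from this chapter's data): A = draw a heart, B = draw a face card. Exact fractions make the overlap subtraction visible.

```python
from fractions import Fraction

# General addition rule: A = heart, B = face card (J, Q, K).
p_heart = Fraction(13, 52)
p_face = Fraction(12, 52)
p_both = Fraction(3, 52)   # the three face-card hearts, counted in both events
p_either = p_heart + p_face - p_both
print(p_either)  # 11/26
```

Without subtracting `p_both`, the three face-card hearts would be double-counted, giving 25/52 instead of 22/52 = 11/26.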
Rule 5: Multiplication Rule (Independent Events)
$$\boxed{P(A \text{ and } B) = P(A) \times P(B)}$$
When to use: When events are independent (one doesn't affect the other). Always verify independence before using this rule.
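A quick sketch with two events that are independent by design: rolling a 2 on a fair die and flipping heads on a fair coin. Neither outcome can affect the other, so the rule applies.

```python
# Multiplication rule for independent events:
# die shows 2 (prob 1/6) AND coin shows heads (prob 1/2).
p_two = 1 / 6
p_heads = 1 / 2
p_both = p_two * p_heads   # 1/12
print(round(p_both, 4))    # 0.0833
```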
Decision Guide: Which Rule Do I Need?
```
What probability are you calculating?
│
├── P(not A)?
│     └── COMPLEMENT RULE: P(not A) = 1 − P(A)
│
├── P(A OR B)?
│     ├── Are A and B mutually exclusive?
│     │     ├── YES → P(A or B) = P(A) + P(B)
│     │     └── NO  → P(A or B) = P(A) + P(B) − P(A and B)
│     └── TIP: If you have a contingency table, count directly
│           and divide by the grand total to verify
│
├── P(A AND B)?
│     ├── Are A and B independent?
│     │     ├── YES → P(A and B) = P(A) × P(B)
│     │     └── NO  → Need conditional probability (Ch. 9)
│     └── TIP: In a contingency table, this is cell ÷ grand total
│
└── P(at least one)?
      └── COMPLEMENT TRICK:
            P(at least one) = 1 − P(none)
            Often combine with multiplication rule for P(none)
```
Key Distinctions
Mutually Exclusive vs. Independent
| | Mutually Exclusive | Independent |
|---|---|---|
| Meaning | A and B CANNOT both happen | Knowing A doesn't change P(B) |
| P(A and B) | = 0 | = P(A) × P(B) |
| Asks | "Can these co-occur?" | "Do these influence each other?" |
| If both have P > 0 | They CANNOT be independent | They CANNOT be mutually exclusive |
| Example | Rolling 2 and 5 on one die | Rolling a 2 on one die, flipping heads on a coin |
Critical point: Mutually exclusive events (with non-zero probability) are ALWAYS dependent. Knowing A happened tells you B didn't — that's information, which means dependence.
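A simulation sketch of this point, using the die events from the table above (A = "roll a 2", B = "roll a 5"): the joint proportion is exactly zero, while the product of the marginals is not, so the independence check P(A and B) = P(A) × P(B) fails.

```python
import numpy as np

# A = "roll a 2", B = "roll a 5" on one fair die: mutually exclusive.
rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)
a = rolls == 2
b = rolls == 5
p_a, p_b = a.mean(), b.mean()
p_and = (a & b).mean()          # exactly 0: the events never co-occur
print(p_and, p_a * p_b)         # 0.0 vs roughly 1/36 ≈ 0.028
# P(A and B) = 0 ≠ P(A) × P(B), so the events are dependent.
```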
Contingency Table Probability Cheat Sheet
Given a contingency table with two categorical variables:
| | B | not B | Total |
|---|---|---|---|
| A | a | b | a+b |
| not A | c | d | c+d |
| Total | a+c | b+d | n |
| Probability | Formula | Name |
|---|---|---|
| P(A) | (a+b) / n | Marginal probability |
| P(B) | (a+c) / n | Marginal probability |
| P(A and B) | a / n | Joint probability |
| P(A or B) | (a+b+c) / n = P(A)+P(B)−P(A and B) | Addition rule |
| P(not A) | (c+d) / n = 1 − P(A) | Complement |
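The cheat sheet can be checked with hypothetical counts plugged into the generic table (a = 30, b = 20, c = 10, d = 40 are made-up numbers for illustration). Exact fractions keep the arithmetic clean.

```python
from fractions import Fraction

# Hypothetical counts for the generic table above.
a, b, c, d = 30, 20, 10, 40
n = a + b + c + d                      # grand total = 100

p_A = Fraction(a + b, n)               # marginal:   1/2
p_B = Fraction(a + c, n)               # marginal:   2/5
p_A_and_B = Fraction(a, n)             # joint:      3/10
p_A_or_B = p_A + p_B - p_A_and_B       # addition:   3/5
p_not_A = 1 - p_A                      # complement: 1/2

# The addition rule agrees with direct counting of the three cells in A or B:
assert p_A_or_B == Fraction(a + b + c, n)
```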
Common Misconceptions
| Misconception | Reality |
|---|---|
| "The coin is due for tails after 5 heads" | Each flip is independent; the coin has no memory (gambler's fallacy) |
| "Two remaining options means 50/50" | Only if both outcomes are equally likely (the Monty Hall trap) |
| "More data always means exact probabilities" | More data gives better estimates; the true probability may never be known exactly |
| "Mutually exclusive means independent" | The opposite — mutually exclusive events (with P > 0) are always dependent |
| "Probability predicts individual events" | Probability describes long-run patterns, not individual outcomes |
The Law of Large Numbers — What It Says and Doesn't Say
| It DOES Say | It Does NOT Say |
|---|---|
| Proportions converge to the true probability as n increases | You'll get exactly 50% heads in any specific set of flips |
| More data → more reliable estimates | The universe "corrects" for streaks |
| Long-run averages are predictable | Individual events are predictable |
| Casinos always win over millions of bets | Any particular gambler will lose |
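The left column can be seen directly in a simulation sketch: the running proportion of heads wanders early on but settles near 0.5 as the number of flips grows (seed and sizes here are arbitrary choices).

```python
import numpy as np

# Law of large numbers: running proportion of heads converges toward 0.5,
# even though any short stretch can drift far from it.
rng = np.random.default_rng(42)
flips = rng.integers(0, 2, size=100_000)            # 1 = heads
running_prop = np.cumsum(flips) / np.arange(1, flips.size + 1)
for n in (10, 1_000, 100_000):
    print(n, running_prop[n - 1])
```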
Python Quick Reference
```python
import numpy as np
import pandas as pd
from math import comb, factorial

# --- Simulation ---
np.random.seed(42)

# Coin flip simulation (1 = heads, 0 = tails)
flips = np.random.choice([0, 1], size=10000)
prop_heads = np.mean(flips)      # Proportion of heads

# Die roll simulation
rolls = np.random.randint(1, 7, size=10000)
prop_six = np.mean(rolls == 6)   # Proportion of sixes

# --- Contingency Tables ---
# Placeholder DataFrame so the snippets run; swap in your own data
df = pd.DataFrame({'var1': ['yes', 'yes', 'no', 'no'],
                   'var2': ['A', 'B', 'A', 'B']})

# Create a two-way table from a DataFrame
contingency = pd.crosstab(df['var1'], df['var2'], margins=True)

# Joint probabilities (all cells / grand total)
joint_probs = pd.crosstab(df['var1'], df['var2'],
                          margins=True, normalize='all')

# Row-wise proportions (conditional probabilities preview)
row_probs = pd.crosstab(df['var1'], df['var2'],
                        margins=True, normalize='index')

# --- Counting ---
comb(23, 2)    # "23 choose 2" = 253 (number of pairs)
factorial(5)   # 5! = 120
```
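The `comb(23, 2)` line hints at the classic birthday problem; the complement trick finishes it. A sketch, assuming 365 equally likely birthdays:

```python
# Birthday problem via the complement rule:
# P(at least one shared birthday among 23 people) = 1 − P(all 23 differ).
p_all_different = 1.0
for i in range(23):
    p_all_different *= (365 - i) / 365
p_shared = 1 - p_all_different
print(round(p_shared, 3))  # 0.507
```

Just 23 people give a better-than-even chance of a shared birthday, because the 253 pairs each get a small chance to collide.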
Key Terms
| Term | Definition |
|---|---|
| Probability | A number between 0 and 1 measuring how likely an event is to occur |
| Event | A collection of one or more outcomes of interest |
| Sample space | The set of all possible outcomes of a random process |
| Outcome | A single result of a random process |
| Classical probability | P(A) = favorable outcomes / total equally likely outcomes |
| Relative frequency | The proportion of times an event occurs over many trials |
| Law of large numbers | As trials increase, the relative frequency approaches the true probability |
| Addition rule | P(A or B) = P(A) + P(B) − P(A and B) |
| Multiplication rule | P(A and B) = P(A) × P(B) for independent events |
| Mutually exclusive | Events that cannot both occur simultaneously |
| Independent events | Events where knowing one occurred doesn't change the probability of the other |
| Complement | The event that A does NOT occur; P(not A) = 1 − P(A) |
| Contingency table | A two-way table showing frequencies for combinations of two categorical variables |
| Joint probability | The probability that two events occur simultaneously; cell count / grand total |
| Gambler's fallacy | The mistaken belief that past random events influence future independent events |
The One Thing to Remember
If you forget everything else from this chapter, remember this:
Probability is the language of uncertainty — and uncertainty is not a flaw. It's the raw material of every statistical inference you'll ever make. The complement, addition, and multiplication rules are your entire toolkit for basic probability. Master them, and you're ready for everything that follows: conditional probability, distributions, sampling, confidence intervals, hypothesis tests. It all starts here.