Key Takeaways: Probability Distributions and the Normal Curve

One-Sentence Summary

Probability distributions are mathematical models that describe how randomness behaves — from the binomial distribution for counting successes to the normal distribution that appears everywhere in nature — and the z-score transformation lets you convert any normal variable into a universal scale for finding exact probabilities, as long as you remember that the model is useful, not true.

Core Concepts at a Glance

| Concept | Definition | Why It Matters |
|---|---|---|
| Random variable | A numerical outcome of a random process (discrete or continuous) | The foundation for all probability distributions |
| Probability distribution | A mathematical description of all possible values and their probabilities | Assigns probabilities to outcomes of random processes |
| Expected value | The long-run average $E(X) = \sum x \cdot P(X = x)$ | The "center" of a probability distribution |
| Binomial distribution | Counts successes in $n$ independent trials with probability $p$ | Models yes/no outcomes: coin flips, defective items, free throws |
| Normal distribution | Symmetric, bell-shaped continuous distribution defined by $\mu$ and $\sigma$ | The most important distribution in statistics; foundation for inference |
| Standard normal | Normal with $\mu = 0$, $\sigma = 1$; the "universal currency" | One table (or function) handles all normal distributions |
| QQ-plot | Compares data quantiles to theoretical normal quantiles | The best visual tool for assessing normality |

Distribution Comparison

| Feature | Binomial | Normal |
|---|---|---|
| Type | Discrete | Continuous |
| Values | 0, 1, 2, ..., $n$ | Any real number ($-\infty$ to $+\infty$) |
| Parameters | $n$ (trials), $p$ (success probability) | $\mu$ (mean), $\sigma$ (standard deviation) |
| Shape | Varies (symmetric when $p = 0.5$; skewed otherwise) | Always symmetric (bell-shaped) |
| Probability | $P(X = k)$ from formula or PMF | $P(a < X < b)$ from area under curve |
| Python | scipy.stats.binom | scipy.stats.norm |
| Real example | Daria makes 4 of 10 three-pointers | Maya's blood pressure is between 110 and 130 |

The Binomial Distribution

Formula

$$\boxed{P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}}$$

Conditions (BINS)

| Condition | Question to Ask | Violation Example |
|---|---|---|
| Binary | Only two outcomes per trial? | Customer buys A, B, C, or nothing |
| Independent | Trials don't affect each other? | Drawing cards without replacement |
| Number fixed | Number of trials predetermined? | "Keep going until 5 successes" |
| Same probability | $p$ constant across trials? | Player gets tired, accuracy drops |

Quick Formulas

$$E(X) = np \qquad \sigma = \sqrt{np(1-p)}$$

Python

from scipy import stats

# P(X = k)
stats.binom.pmf(k, n, p)

# P(X <= k)
stats.binom.cdf(k, n, p)

# Mean and standard deviation
stats.binom.mean(n, p)
stats.binom.std(n, p)
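To tie the formula and the scipy calls together, here is a short worked check using the chapter's three-pointer example; the 40% per-shot success rate is an assumption for illustration (the excerpt doesn't give Daria's shooting percentage):

```python
from math import comb

from scipy import stats

# Hypothetical: Daria attempts n = 10 three-pointers with p = 0.4 per shot.
n, k, p = 10, 4, 0.4

# Direct formula: C(n, k) * p^k * (1-p)^(n-k)
by_formula = comb(n, k) * p**k * (1 - p) ** (n - k)

# Same quantity from scipy's PMF
by_scipy = stats.binom.pmf(k, n, p)

print(round(by_formula, 4), round(by_scipy, 4))  # both 0.2508

# Mean and SD agree with the quick formulas: E(X) = np, sigma = sqrt(np(1-p))
assert abs(stats.binom.mean(n, p) - n * p) < 1e-9
assert abs(stats.binom.std(n, p) - (n * p * (1 - p)) ** 0.5) < 1e-9
```

Even with a 40% shooting rate, "exactly 4 of 10" happens only about a quarter of the time; the rest of the probability is spread across the other eleven possible counts.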

The Normal Distribution

Properties

  1. Bell-shaped and symmetric about $\mu$
  2. Mean = median = mode
  3. Defined entirely by $\mu$ and $\sigma$
  4. 68-95-99.7 rule holds exactly
  5. Tails extend to infinity (never touch x-axis)
  6. Total area under curve = 1

The Z-Score Transformation

$$\boxed{z = \frac{x - \mu}{\sigma}} \qquad \text{(transforms any } N(\mu, \sigma^2) \text{ to } N(0, 1)\text{)}$$

$$\boxed{x = \mu + z \cdot \sigma} \qquad \text{(transforms back to original scale)}$$

Z-Table Excerpt (Standard Normal Cumulative Probabilities)

| $z$ | 0.00 | 0.02 | 0.04 | 0.06 | 0.08 |
|---|---|---|---|---|---|
| -3.0 | .0013 | .0013 | .0012 | .0011 | .0010 |
| -2.5 | .0062 | .0059 | .0055 | .0052 | .0049 |
| -2.0 | .0228 | .0217 | .0207 | .0197 | .0188 |
| -1.5 | .0668 | .0643 | .0618 | .0594 | .0571 |
| -1.0 | .1587 | .1539 | .1492 | .1446 | .1401 |
| -0.5 | .3085 | .3015 | .2946 | .2877 | .2810 |
| 0.0 | .5000 | .5080 | .5160 | .5239 | .5319 |
| 0.5 | .6915 | .6985 | .7054 | .7123 | .7190 |
| 1.0 | .8413 | .8461 | .8508 | .8554 | .8599 |
| 1.5 | .9332 | .9357 | .9382 | .9406 | .9429 |
| 2.0 | .9772 | .9783 | .9793 | .9803 | .9812 |
| 2.5 | .9938 | .9941 | .9945 | .9948 | .9951 |
| 3.0 | .9987 | .9987 | .9988 | .9989 | .9990 |

The column gives the second decimal of $|z|$: for the $-1.0$ row, the 0.02 column is $P(Z \leq -1.02) = .1539$; for the $1.0$ row, it is $P(Z \leq 1.02) = .8461$.

Common Z-Scores and Probabilities

| z-score | $P(Z \leq z)$ | $P(Z > z)$ | Meaning |
|---|---|---|---|
| -3.00 | 0.0013 | 0.9987 | 0.13% below |
| -2.00 | 0.0228 | 0.9772 | 2.28% below |
| -1.96 | 0.0250 | 0.9750 | 2.5% below (used for 95% CIs) |
| -1.645 | 0.0500 | 0.9500 | 5% below (used for 90% CIs) |
| -1.00 | 0.1587 | 0.8413 | 15.87% below |
| 0.00 | 0.5000 | 0.5000 | Exactly at the mean |
| 1.00 | 0.8413 | 0.1587 | 84.13% below |
| 1.645 | 0.9500 | 0.0500 | 95% below |
| 1.96 | 0.9750 | 0.0250 | 97.5% below |
| 2.00 | 0.9772 | 0.0228 | 97.72% below |
| 2.33 | 0.9901 | 0.0099 | 99th percentile |
| 2.576 | 0.9950 | 0.0050 | 99.5% below (used for 99% CIs) |
| 3.00 | 0.9987 | 0.0013 | 99.87% below |
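None of these values need to be memorized: each is just `stats.norm.cdf` evaluated at the given z, and `stats.norm.ppf` runs the lookup in reverse. A quick sketch:

```python
from scipy import stats

# Every table entry is stats.norm.cdf at that z-score:
assert round(stats.norm.cdf(-1.96), 4) == 0.0250
assert round(stats.norm.cdf(1.645), 4) == 0.9500
assert round(stats.norm.cdf(2.576), 4) == 0.9950

# ppf (the quantile function) inverts cdf: probability in, z-score out.
# This is how the CI critical values are found in the first place.
z = stats.norm.ppf(0.975)
print(round(z, 2))  # 1.96
```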

Python Quick Reference

from scipy import stats

# --- Normal distribution ---
# P(X <= x)
stats.norm.cdf(x, loc=mu, scale=sigma)

# P(X > x) — or equivalently stats.norm.sf(x, loc=mu, scale=sigma)
1 - stats.norm.cdf(x, loc=mu, scale=sigma)

# P(a < X < b)
stats.norm.cdf(b, loc=mu, scale=sigma) - stats.norm.cdf(a, loc=mu, scale=sigma)

# Find x from probability (inverse/quantile)
stats.norm.ppf(probability, loc=mu, scale=sigma)

# --- Assessing normality --- (stats already imported above)
import matplotlib.pyplot as plt

# QQ-plot
stats.probplot(data, dist="norm", plot=plt)

# Shapiro-Wilk test
stat, p_value = stats.shapiro(data)

Assessing Normality: Decision Guide

Is the data approximately normal?
│
├── Step 1: HISTOGRAM
│   └── Does it look roughly bell-shaped and symmetric?
│       ├── NO → Probably not normal. Check why.
│       └── YES / MAYBE → Continue to Step 2.
│
├── Step 2: QQ-PLOT
│   └── Do points follow the diagonal line?
│       ├── Curves up at right → heavy right tail (right-skewed)
│       ├── Curves down at left → heavy left tail (left-skewed)
│       ├── S-shape at both ends → heavy tails (leptokurtic)
│       ├── Flat S-shape → light tails (platykurtic)
│       └── Close to line → approximately normal
│
└── Step 3: SHAPIRO-WILK TEST
    └── What is the p-value?
        ├── p < 0.05 → Evidence against normality
        │   └── BUT: For large n, trivial departures are detected
        ├── p > 0.05 → No evidence against normality
        │   └── BUT: Doesn't prove normality
        └── ALWAYS interpret alongside the QQ-plot
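The Step 3 caveats can be seen directly on simulated data; the samples below are made up for illustration (a genuinely normal sample and a right-skewed exponential one):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated data: one truly normal sample, one right-skewed sample
normal_data = rng.normal(loc=50, scale=5, size=200)
skewed_data = rng.exponential(scale=5, size=200)

# Step 3 of the guide: Shapiro-Wilk on each
_, p_normal = stats.shapiro(normal_data)
_, p_skewed = stats.shapiro(skewed_data)

# The skewed sample gives overwhelming evidence against normality...
print(p_skewed < 0.05)   # True
# ...while a large p-value for the normal sample only says "consistent
# with normality" — it does not prove the data came from a normal model.
```

In practice, run `stats.probplot(data, dist="norm", plot=plt)` on the same samples and read the test result alongside the QQ-plot, as the guide insists.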

Continuity Correction

| Discrete probability | Normal approximation |
|---|---|
| $P(X = k)$ | $P(k - 0.5 \leq X \leq k + 0.5)$ |
| $P(X \leq k)$ | $P(X \leq k + 0.5)$ |
| $P(X \geq k)$ | $P(X \geq k - 0.5)$ |
| $P(X < k)$ | $P(X < k - 0.5)$ |
| $P(X > k)$ | $P(X > k + 0.5)$ |

When to use: Only when approximating a discrete distribution with a continuous one. Not needed for genuinely continuous data.
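A quick sketch of why the half-step matters, approximating a binomial with a normal; the numbers ($n = 100$, $p = 0.5$, $k = 55$) are chosen for illustration:

```python
from scipy import stats

# Approximating Binomial(n=100, p=0.5) with N(np, sqrt(np(1-p)))
n, p, k = 100, 0.5, 55
mu = n * p                         # 50
sigma = (n * p * (1 - p)) ** 0.5   # 5.0

exact = stats.binom.cdf(k, n, p)                        # P(X <= 55), discrete
without_cc = stats.norm.cdf(k, loc=mu, scale=sigma)     # naive approximation
with_cc = stats.norm.cdf(k + 0.5, loc=mu, scale=sigma)  # continuity-corrected

# The corrected version lands much closer to the exact binomial answer
print(round(exact, 4), round(without_cc, 4), round(with_cc, 4))
```

The half-step exists because the discrete outcome "55" occupies the interval from 54.5 to 55.5 on the continuous scale, so cutting at exactly 55 throws away half of that bar's area.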

Common Misconceptions

| Misconception | Reality |
|---|---|
| "My data must be normal to use statistics" | Many methods are robust to non-normality, especially with large samples (thanks to the CLT, Chapter 11) |
| "The Shapiro-Wilk test proves my data is normal" | A high p-value means consistent with normality, not proof of it |
| "A bell curve of scores means a bell curve of ability" | The distribution of scores reflects the test design, not a natural law |
| "The normal distribution means extreme values can't happen" | The tails never reach zero — extreme values are unlikely but not impossible |
| "The height of the PDF at a point is the probability of that value" | The PDF height is density, not probability. Only area gives probability |
| "If data isn't perfectly normal, I can't use the normal model" | "All models are wrong, but some are useful." Close enough is good enough. |

The One Thing to Remember

If you forget everything else from this chapter, remember this:

The normal distribution is a model — the most useful one in statistics, but a model nonetheless. It works beautifully for heights, blood pressure, test scores, and measurement errors because those measurements result from many small, independent, additive effects. It fails for income, wealth, stock crashes, and social media virality because those phenomena involve multiplicative effects, heavy tails, and structural boundaries. Your job as a statistician isn't to assume normality — it's to check it (using QQ-plots and the Shapiro-Wilk test) and to know when it matters (small samples, individual predictions) and when it doesn't (large samples, sample means). George Box was right: all models are wrong, but some are useful. The normal distribution is the most useful wrong model you'll ever learn.

Key Terms

| Term | Definition |
|---|---|
| Probability distribution | A mathematical description of all possible values a random variable can take and their probabilities |
| Random variable | A numerical outcome of a random process; discrete (countable values) or continuous (any value in a range) |
| Expected value | The long-run average of a random variable: $E(X) = \sum x \cdot P(X = x)$ |
| Binomial distribution | The distribution of the count of successes in $n$ independent trials with probability $p$: $P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$ |
| Normal distribution | A symmetric, bell-shaped continuous distribution defined by mean $\mu$ and standard deviation $\sigma$ |
| Standard normal distribution | The normal distribution with $\mu = 0$ and $\sigma = 1$; denoted $Z \sim N(0, 1)$ |
| Z-score | The number of standard deviations a value is from the mean: $z = (x - \mu)/\sigma$ |
| Z-table | A table giving $P(Z \leq z)$ for the standard normal distribution |
| Probability density function (PDF) | A curve for continuous distributions where area under the curve = probability |
| Continuity correction | Adjusting by $\pm 0.5$ when approximating a discrete distribution with a continuous one |
| QQ-plot | A graph comparing data quantiles to theoretical normal quantiles; linearity suggests normality |
| Shapiro-Wilk test | A formal hypothesis test for normality; small p-value suggests non-normality |