Key Takeaways: Probability Distributions and the Normal Curve

One-Sentence Summary

Probability distributions are mathematical models that describe how randomness behaves — from the binomial distribution for counting successes to the normal distribution that appears everywhere in nature — and the z-score transformation lets you convert any normal variable into a universal scale for finding exact probabilities, as long as you remember that the model is useful, not true.

Core Concepts at a Glance

| Concept | Definition | Why It Matters |
|---|---|---|
| Random variable | A numerical outcome of a random process (discrete or continuous) | The foundation for all probability distributions |
| Probability distribution | A mathematical description of all possible values and their probabilities | Assigns probabilities to outcomes of random processes |
| Expected value | The long-run average $E(X) = \sum x \cdot P(X = x)$ | The "center" of a probability distribution |
| Binomial distribution | Counts successes in $n$ independent trials with probability $p$ | Models yes/no outcomes: coin flips, defective items, free throws |
| Normal distribution | Symmetric, bell-shaped continuous distribution defined by $\mu$ and $\sigma$ | The most important distribution in statistics; foundation for inference |
| Standard normal | Normal with $\mu = 0$, $\sigma = 1$; the "universal currency" | One table (or function) handles all normal distributions |
| QQ-plot | Compares data quantiles to theoretical normal quantiles | The best visual tool for assessing normality |

Distribution Comparison

| Feature | Binomial | Normal |
|---|---|---|
| Type | Discrete | Continuous |
| Values | 0, 1, 2, ..., $n$ | Any real number ($-\infty$ to $+\infty$) |
| Parameters | $n$ (trials), $p$ (success probability) | $\mu$ (mean), $\sigma$ (standard deviation) |
| Shape | Varies (symmetric when $p = 0.5$; skewed otherwise) | Always symmetric (bell-shaped) |
| Probability | $P(X = k)$ from formula or PMF | $P(a < X < b)$ from area under curve |
| Python | scipy.stats.binom | scipy.stats.norm |
| Real example | Daria makes 4 of 10 three-pointers | Maya's blood pressure is between 110 and 130 |

The Binomial Distribution

Formula

$$\boxed{P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}}$$

Conditions (BINS)

| Condition | Question to Ask | Violation Example |
|---|---|---|
| Binary | Only two outcomes per trial? | Customer buys A, B, C, or nothing |
| Independent | Trials don't affect each other? | Drawing cards without replacement |
| Number fixed | Number of trials predetermined? | "Keep going until 5 successes" |
| Same probability | $p$ constant across trials? | Player gets tired, accuracy drops |

Quick Formulas

$$E(X) = np \qquad \sigma = \sqrt{np(1-p)}$$

Python

from scipy import stats

# P(X = k)
stats.binom.pmf(k, n, p)

# P(X <= k)
stats.binom.cdf(k, n, p)

# Mean and standard deviation
stats.binom.mean(n, p)
stats.binom.std(n, p)
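To tie the formula and the scipy calls together, here is a short worked check using the chapter's three-pointer example; the 40% per-shot success rate is an assumption for illustration (the excerpt doesn't give Daria's shooting percentage):

```python
from math import comb

from scipy import stats

# Hypothetical: Daria attempts n = 10 three-pointers with p = 0.4 per shot.
n, k, p = 10, 4, 0.4

# Direct formula: C(n, k) * p^k * (1-p)^(n-k)
by_formula = comb(n, k) * p**k * (1 - p) ** (n - k)

# Same quantity from scipy's PMF
by_scipy = stats.binom.pmf(k, n, p)

print(round(by_formula, 4), round(by_scipy, 4))  # both 0.2508

# Mean and SD agree with the quick formulas: E(X) = np, sigma = sqrt(np(1-p))
assert abs(stats.binom.mean(n, p) - n * p) < 1e-9
assert abs(stats.binom.std(n, p) - (n * p * (1 - p)) ** 0.5) < 1e-9
```

Even with a 40% shooting rate, "exactly 4 of 10" happens only about a quarter of the time; the rest of the probability is spread across the other eleven possible counts.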

The Normal Distribution

Properties

  1. Bell-shaped and symmetric about $\mu$
  2. Mean = median = mode
  3. Defined entirely by $\mu$ and $\sigma$
  4. 68-95-99.7 rule holds exactly
  5. Tails extend to infinity (never touch x-axis)
  6. Total area under curve = 1

The Z-Score Transformation

$$\boxed{z = \frac{x - \mu}{\sigma}} \qquad \text{(transforms any } N(\mu, \sigma^2) \text{ to } N(0, 1)\text{)}$$

$$\boxed{x = \mu + z \cdot \sigma} \qquad \text{(transforms back to original scale)}$$

Z-Table Excerpt (Standard Normal Cumulative Probabilities)

| $z$ | 0.00 | 0.02 | 0.04 | 0.06 | 0.08 |
|---|---|---|---|---|---|
| -3.0 | .0013 | .0013 | .0012 | .0011 | .0010 |
| -2.5 | .0062 | .0059 | .0055 | .0052 | .0049 |
| -2.0 | .0228 | .0217 | .0207 | .0197 | .0188 |
| -1.5 | .0668 | .0643 | .0618 | .0594 | .0571 |
| -1.0 | .1587 | .1539 | .1492 | .1446 | .1401 |
| -0.5 | .3085 | .3015 | .2946 | .2877 | .2810 |
| 0.0 | .5000 | .5080 | .5160 | .5239 | .5319 |
| 0.5 | .6915 | .6985 | .7054 | .7123 | .7190 |
| 1.0 | .8413 | .8461 | .8508 | .8554 | .8599 |
| 1.5 | .9332 | .9357 | .9382 | .9406 | .9429 |
| 2.0 | .9772 | .9783 | .9793 | .9803 | .9812 |
| 2.5 | .9938 | .9941 | .9945 | .9948 | .9951 |
| 3.0 | .9987 | .9987 | .9988 | .9989 | .9990 |

The column gives the second decimal of $|z|$: for the $-1.0$ row, the 0.02 column is $P(Z \leq -1.02) = .1539$; for the $1.0$ row, it is $P(Z \leq 1.02) = .8461$.

Common Z-Scores and Probabilities

| z-score | $P(Z \leq z)$ | $P(Z > z)$ | Meaning |
|---|---|---|---|
| -3.00 | 0.0013 | 0.9987 | 0.13% below |
| -2.00 | 0.0228 | 0.9772 | 2.28% below |
| -1.96 | 0.0250 | 0.9750 | 2.5% below (used for 95% CIs) |
| -1.645 | 0.0500 | 0.9500 | 5% below (used for 90% CIs) |
| -1.00 | 0.1587 | 0.8413 | 15.87% below |
| 0.00 | 0.5000 | 0.5000 | Exactly at the mean |
| 1.00 | 0.8413 | 0.1587 | 84.13% below |
| 1.645 | 0.9500 | 0.0500 | 95% below |
| 1.96 | 0.9750 | 0.0250 | 97.5% below |
| 2.00 | 0.9772 | 0.0228 | 97.72% below |
| 2.33 | 0.9901 | 0.0099 | 99th percentile |
| 2.576 | 0.9950 | 0.0050 | 99.5% below (used for 99% CIs) |
| 3.00 | 0.9987 | 0.0013 | 99.87% below |
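None of these values need to be memorized: each is just `stats.norm.cdf` evaluated at the given z, and `stats.norm.ppf` runs the lookup in reverse. A quick sketch:

```python
from scipy import stats

# Every table entry is stats.norm.cdf at that z-score:
assert round(stats.norm.cdf(-1.96), 4) == 0.0250
assert round(stats.norm.cdf(1.645), 4) == 0.9500
assert round(stats.norm.cdf(2.576), 4) == 0.9950

# ppf (the quantile function) inverts cdf: probability in, z-score out.
# This is how the CI critical values are found in the first place.
z = stats.norm.ppf(0.975)
print(round(z, 2))  # 1.96
```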

Python Quick Reference

from scipy import stats

# --- Normal distribution ---
# P(X <= x)
stats.norm.cdf(x, loc=mu, scale=sigma)

# P(X > x) — or equivalently stats.norm.sf(x, loc=mu, scale=sigma)
1 - stats.norm.cdf(x, loc=mu, scale=sigma)

# P(a < X < b)
stats.norm.cdf(b, loc=mu, scale=sigma) - stats.norm.cdf(a, loc=mu, scale=sigma)

# Find x from probability (inverse/quantile)
stats.norm.ppf(probability, loc=mu, scale=sigma)

# --- Assessing normality --- (stats already imported above)
import matplotlib.pyplot as plt

# QQ-plot
stats.probplot(data, dist="norm", plot=plt)

# Shapiro-Wilk test
stat, p_value = stats.shapiro(data)

Assessing Normality: Decision Guide

Is the data approximately normal?
│
├── Step 1: HISTOGRAM
│   └── Does it look roughly bell-shaped and symmetric?
│       ├── NO → Probably not normal. Check why.
│       └── YES / MAYBE → Continue to Step 2.
│
├── Step 2: QQ-PLOT
│   └── Do points follow the diagonal line?
│       ├── Curves up at right → heavy right tail (right-skewed)
│       ├── Curves down at left → heavy left tail (left-skewed)
│       ├── S-shape at both ends → heavy tails (leptokurtic)
│       ├── Flat S-shape → light tails (platykurtic)
│       └── Close to line → approximately normal
│
└── Step 3: SHAPIRO-WILK TEST
    └── What is the p-value?
        ├── p < 0.05 → Evidence against normality
        │   └── BUT: For large n, trivial departures are detected
        ├── p > 0.05 → No evidence against normality
        │   └── BUT: Doesn't prove normality
        └── ALWAYS interpret alongside the QQ-plot
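The Step 3 caveats can be seen directly on simulated data; the samples below are made up for illustration (a genuinely normal sample and a right-skewed exponential one):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated data: one truly normal sample, one right-skewed sample
normal_data = rng.normal(loc=50, scale=5, size=200)
skewed_data = rng.exponential(scale=5, size=200)

# Step 3 of the guide: Shapiro-Wilk on each
_, p_normal = stats.shapiro(normal_data)
_, p_skewed = stats.shapiro(skewed_data)

# The skewed sample gives overwhelming evidence against normality...
print(p_skewed < 0.05)   # True
# ...while a large p-value for the normal sample only says "consistent
# with normality" — it does not prove the data came from a normal model.
```

In practice, run `stats.probplot(data, dist="norm", plot=plt)` on the same samples and read the test result alongside the QQ-plot, as the guide insists.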

Continuity Correction

| Discrete probability | Normal approximation |
|---|---|
| $P(X = k)$ | $P(k - 0.5 \leq X \leq k + 0.5)$ |
| $P(X \leq k)$ | $P(X \leq k + 0.5)$ |
| $P(X \geq k)$ | $P(X \geq k - 0.5)$ |
| $P(X < k)$ | $P(X < k - 0.5)$ |
| $P(X > k)$ | $P(X > k + 0.5)$ |

When to use: Only when approximating a discrete distribution with a continuous one. Not needed for genuinely continuous data.
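A quick sketch of why the half-step matters, approximating a binomial with a normal; the numbers ($n = 100$, $p = 0.5$, $k = 55$) are chosen for illustration:

```python
from scipy import stats

# Approximating Binomial(n=100, p=0.5) with N(np, sqrt(np(1-p)))
n, p, k = 100, 0.5, 55
mu = n * p                         # 50
sigma = (n * p * (1 - p)) ** 0.5   # 5.0

exact = stats.binom.cdf(k, n, p)                        # P(X <= 55), discrete
without_cc = stats.norm.cdf(k, loc=mu, scale=sigma)     # naive approximation
with_cc = stats.norm.cdf(k + 0.5, loc=mu, scale=sigma)  # continuity-corrected

# The corrected version lands much closer to the exact binomial answer
print(round(exact, 4), round(without_cc, 4), round(with_cc, 4))
```

The half-step exists because the discrete outcome "55" occupies the interval from 54.5 to 55.5 on the continuous scale, so cutting at exactly 55 throws away half of that bar's area.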

Common Misconceptions

| Misconception | Reality |
|---|---|
| "My data must be normal to use statistics" | Many methods are robust to non-normality, especially with large samples (thanks to the CLT, Chapter 11) |
| "The Shapiro-Wilk test proves my data is normal" | A high p-value means consistent with normality, not proof of it |
| "A bell curve of scores means a bell curve of ability" | The distribution of scores reflects the test design, not a natural law |
| "The normal distribution means extreme values can't happen" | The tails never reach zero — extreme values are unlikely but not impossible |
| "The height of the PDF at a point is the probability of that value" | The PDF height is density, not probability. Only area gives probability |
| "If data isn't perfectly normal, I can't use the normal model" | "All models are wrong, but some are useful." Close enough is good enough. |

The One Thing to Remember

If you forget everything else from this chapter, remember this:

The normal distribution is a model — the most useful one in statistics, but a model nonetheless. It works beautifully for heights, blood pressure, test scores, and measurement errors because those measurements result from many small, independent, additive effects. It fails for income, wealth, stock crashes, and social media virality because those phenomena involve multiplicative effects, heavy tails, and structural boundaries. Your job as a statistician isn't to assume normality — it's to check it (using QQ-plots and the Shapiro-Wilk test) and to know when it matters (small samples, individual predictions) and when it doesn't (large samples, sample means). George Box was right: all models are wrong, but some are useful. The normal distribution is the most useful wrong model you'll ever learn.

Key Terms

| Term | Definition |
|---|---|
| Probability distribution | A mathematical description of all possible values a random variable can take and their probabilities |
| Random variable | A numerical outcome of a random process; discrete (countable values) or continuous (any value in a range) |
| Expected value | The long-run average of a random variable: $E(X) = \sum x \cdot P(X = x)$ |
| Binomial distribution | The distribution of the count of successes in $n$ independent trials with probability $p$: $P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}$ |
| Normal distribution | A symmetric, bell-shaped continuous distribution defined by mean $\mu$ and standard deviation $\sigma$ |
| Standard normal distribution | The normal distribution with $\mu = 0$ and $\sigma = 1$; denoted $Z \sim N(0, 1)$ |
| Z-score | The number of standard deviations a value is from the mean: $z = (x - \mu)/\sigma$ |
| Z-table | A table giving $P(Z \leq z)$ for the standard normal distribution |
| Probability density function (PDF) | A curve for continuous distributions where area under the curve = probability |
| Continuity correction | Adjusting by $\pm 0.5$ when approximating a discrete distribution with a continuous one |
| QQ-plot | A graph comparing data quantiles to theoretical normal quantiles; linearity suggests normality |
| Shapiro-Wilk test | A formal hypothesis test for normality; small p-value suggests non-normality |