Case Study 1: How Polls Predict Elections — The CLT Behind the Curtain

The Scenario

It's election night, and Sam Okafor is watching the results come in with his roommates. A news anchor announces: "With just 2% of precincts reporting, our statistical model projects that Candidate Martinez will win with 54% of the vote, plus or minus 3 percentage points."

"How can they possibly predict the outcome with 2% of the votes counted?" Sam's roommate Anika asks. "That's barely any data."

Sam smiles. He's been thinking about this since Chapter 4, when they studied the 1936 Literary Digest poll that surveyed 2.4 million people and still got the election wrong. Meanwhile, George Gallup surveyed just 50,000 and nailed it. Now, armed with the Central Limit Theorem, Sam can explain why.

"It's not about how much data you have," Sam says. "It's about how you collected it — and the CLT."

The Mathematics of Polling

How Polls Work

A well-conducted election poll does the following:

  1. Defines the population: All likely voters in the relevant area (state, district, or nation).
  2. Draws a random sample: Typically $n = 800$ to $n = 1{,}500$ likely voters, contacted by phone, online, or in person.
  3. Measures the variable: Each person is asked who they'll vote for.
  4. Computes the statistic: The sample proportion $\hat{p}$ who support each candidate.
  5. Reports with a margin of error: The standard error determines the margin of error, which is almost always reported alongside the poll result.

The CLT is the engine behind step 5. Here's how.

Applying the CLT to Polling

Suppose a poll surveys $n = 1{,}000$ randomly selected likely voters and finds that $\hat{p} = 0.54$ (54%) support Candidate Martinez.

Step 1: Calculate the standard error.

Using $\hat{p}$ to estimate the SE:

$$\widehat{\text{SE}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.54 \times 0.46}{1000}} = \sqrt{0.000248} \approx 0.016$$

Step 2: Verify CLT conditions.

  • $n\hat{p} = 1000 \times 0.54 = 540 \geq 10$ ✓
  • $n(1-\hat{p}) = 1000 \times 0.46 = 460 \geq 10$ ✓
  • Random sample assumed ✓

Step 3: Construct the margin of error.

The margin of error for 95% confidence is approximately $\pm 1.96 \times \text{SE}$:

$$\text{ME} = 1.96 \times 0.016 \approx 0.031$$

So the poll reports: 54% ± 3.1 percentage points, or equivalently, the true support is likely between 50.9% and 57.1%.

Step 4: Interpret.

Even at the low end of the range (50.9%), Candidate Martinez has majority support. The pollster has reason to project a Martinez victory — though a close one.
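
To check the arithmetic, here is a minimal Python sketch of Steps 1 through 3, assuming only NumPy and SciPy (the 0.975 quantile of the normal distribution supplies the 1.96 multiplier rather than hard-coding it):

import numpy as np
from scipy import stats

n, p_hat = 1000, 0.54

# Step 1: standard error of the sample proportion
se = np.sqrt(p_hat * (1 - p_hat) / n)

# Step 2: CLT conditions (success/failure counts of at least 10)
assert n * p_hat >= 10 and n * (1 - p_hat) >= 10

# Step 3: 95% margin of error and interval
z = stats.norm.ppf(0.975)  # ~1.96
me = z * se
print(f"SE = {se:.3f}, ME = ±{me:.3f}")
print(f"95% interval: ({p_hat - me:.3f}, {p_hat + me:.3f})")
# SE = 0.016, ME = ±0.031, interval ≈ (0.509, 0.571)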

Why 1,000 People Can Represent 150 Million

This is the part that blows people's minds, and the CLT explains it perfectly.

The standard error $\sqrt{p(1-p)/n}$ depends on $n$, the sample size — but not on the population size $N$ (as long as the 10% condition is met). A poll of 1,000 randomly selected voters has roughly the same standard error whether the electorate is 150 million (national) or 500,000 (a congressional district).

This seems impossible. How can 1,000 people out of 150 million be enough?

The intuition: imagine a well-mixed pot of soup. To check whether it's properly seasoned, you take a taste — a single spoonful. You don't need to drink half the pot. A single, well-mixed spoonful tells you about the whole pot, whether the pot holds one gallon or one hundred gallons. The key is the mixing (randomization), not the ratio of spoon to pot.

Mathematically, each voter in a random sample provides independent information about the population. With 1,000 independent data points, the CLT guarantees that the sampling distribution of $\hat{p}$ is approximately normal, with a standard error of at most $\sqrt{0.5 \times 0.5 / 1000} \approx 0.016$ (the worst case, at $p = 0.5$), or about 1.6 percentage points. That's precise enough for most election calls.
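
You can test this claim by simulation. The sketch below is a toy model (the population sizes and the 54% support level are illustrative choices, not real data): it draws 10,000 polls of $n = 1{,}000$ without replacement from two electorates of very different sizes and compares the spread of the resulting $\hat{p}$ values:

import numpy as np

rng = np.random.default_rng(42)
n, p = 1_000, 0.54  # poll size and (assumed) true support

for N in [500_000, 150_000_000]:
    K = int(p * N)  # number of supporters in the population
    # 10,000 polls drawn WITHOUT replacement (hypergeometric sampling)
    p_hats = rng.hypergeometric(K, N - K, n, size=10_000) / n
    print(f"N = {N:>11,}: empirical SE of p-hat = {p_hats.std():.4f}")

# Both print roughly 0.0158 = sqrt(0.54 * 0.46 / 1000): the population
# size is essentially irrelevant once the 10% condition holds.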

The $\sqrt{n}$ Problem in Action

Here's a calculation that illuminates the diminishing returns:

| Sample Size | Standard Error | Margin of Error (95%) |
|------------:|---------------:|----------------------:|
| 100 | 0.050 | ± 9.8 pp |
| 400 | 0.025 | ± 4.9 pp |
| 1,000 | 0.016 | ± 3.1 pp |
| 1,600 | 0.013 | ± 2.5 pp |
| 4,000 | 0.008 | ± 1.5 pp |
| 10,000 | 0.005 | ± 1.0 pp |

(Calculated with $p = 0.50$ for the worst case and $\text{ME} = 1.96 \times \text{SE}$; pp = percentage points)

Notice:

  • Going from 100 to 1,000 respondents (10× more) cuts the margin of error from 9.8 to 3.1 percentage points, a dramatic improvement.
  • Going from 1,000 to 10,000 (another 10× increase) only cuts it from 3.1 to 1.0 percentage points.
  • Getting below a 1-point margin requires enormous sample sizes.

This explains why most national polls survey between 800 and 1,500 people. Below 800, the margin of error is too large to be useful. Above 1,500, the marginal improvement isn't worth the cost. The sweet spot is determined by the $\sqrt{n}$ relationship.
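
If you want to reproduce the table, a few lines of Python suffice (this sketch hard-codes the worst-case $p = 0.50$ and the 1.96 multiplier used above):

import numpy as np

p = 0.50  # worst case: p(1 - p) is maximized at 0.5
for n in [100, 400, 1000, 1600, 4000, 10000]:
    se = np.sqrt(p * (1 - p) / n)
    me = 1.96 * se  # 95% margin of error
    print(f"n = {n:>6,}: SE = {se:.3f}, 95% ME = ±{100 * me:.1f} pp")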

When Polls Go Wrong

The 1936 Literary Digest Disaster (Revisited)

In Chapter 4, you learned about the Literary Digest's infamous 1936 presidential poll. They surveyed 2.4 million people and predicted Alf Landon would crush Franklin Roosevelt. Roosevelt won in a landslide.

The problem wasn't sample size — 2.4 million is enormous. The problem was sampling bias. The magazine sampled from car registrations and telephone directories, systematically excluding lower-income Americans who overwhelmingly supported Roosevelt.

The CLT lesson: The CLT guarantees that $\bar{x}$ (or $\hat{p}$) is centered on the population mean — of the population you're actually sampling from. If your sampling frame doesn't match the target population, the CLT faithfully produces a narrow confidence interval around the wrong value. A precise wrong answer is still wrong.

The Literary Digest's standard error was tiny (because $n$ was huge), but it was centered on the preferences of wealthy Americans, not all Americans.

2016 and 2020: Did the Polls Fail?

Modern polling controversies offer more nuanced lessons.

In 2016, national polls showed Hillary Clinton leading by about 3 percentage points. She won the popular vote by 2.1 points — within the margin of error. The national polls were actually quite accurate.

The "miss" was in state-level polls, particularly in Wisconsin, Michigan, and Pennsylvania, where polls underestimated Trump's support by 4-7 percentage points. Several factors contributed:

  1. Late-deciding voters who disproportionately broke for Trump weren't captured in pre-election polls.
  2. Response bias: Certain demographic groups were less likely to respond to polls, creating a non-random sample.
  3. Correlated errors: If the polls in Wisconsin were wrong for a particular reason (e.g., undersampling non-college white voters), the same error likely affected Michigan and Pennsylvania. The errors weren't independent across states.

The CLT lesson: The CLT assumes random sampling and independence. When these assumptions are violated — through non-response bias or correlated errors — the standard error formula underestimates the true uncertainty. The polls weren't applying the CLT wrong; they were violating its conditions.

After 2020, the American Association for Public Opinion Research conducted a thorough autopsy and concluded that polls underestimated Republican support in key states again, likely due to differential non-response: Trump supporters were less likely to participate in polls. The sampling wasn't truly random.
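
A toy simulation makes the cost of correlated errors visible. The sketch below invents a shared "house bias" with a standard deviation of 2 percentage points (an assumed value, chosen purely for illustration) that shifts every poll in a given cycle, then compares the naive independence-based standard error of a 10-poll average with its actual spread:

import numpy as np

rng = np.random.default_rng(1)
true_p, n, n_polls, n_cycles = 0.52, 1000, 10, 5000

# One shared systematic error per cycle (SD = 2 pp), felt by all 10 polls
bias = rng.normal(0, 0.02, size=(n_cycles, 1))
p_cycle = np.clip(true_p + bias, 0, 1)
p_hats = rng.binomial(n, p_cycle, size=(n_cycles, n_polls)) / n
poll_avg = p_hats.mean(axis=1)

naive_se = np.sqrt(true_p * (1 - true_p) / n / n_polls)  # assumes independence
print(f"Naive SE of 10-poll average:  {naive_se:.4f}")        # ~0.005
print(f"Actual SD of 10-poll average: {poll_avg.std():.4f}")  # ~0.021
# The shared bias puts a floor under the error that averaging cannot remove.

This also previews Discussion Question 5: averaging many polls helps only against the independent part of the error; a bias shared by all the polls survives aggregation untouched.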

The Bigger Picture: Precision vs. Accuracy

This case study illustrates a distinction that's crucial for the rest of this course:

  • Precision (small standard error) means your estimates don't vary much from sample to sample. The CLT and sample size determine precision.
  • Accuracy (estimates centered on the true value) means your estimates hit the right target. Randomization and good study design determine accuracy.

A poll can be precise but inaccurate (small margin of error, wrong center — like the Literary Digest). Or it can be accurate but imprecise (right center, wide margin — like a small but well-designed poll). Ideally, you want both: a large, random sample that gives a small standard error centered on the true value.

The CLT gives you precision. Random sampling gives you accuracy. You need both.
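
A small simulation makes the trade-off concrete. All numbers below are invented for illustration: a 100,000-person poll drawn from a skewed frame in which support runs at only 45%, versus a 500-person random sample of an electorate where true support is 52%:

import numpy as np

rng = np.random.default_rng(0)
true_p, frame_p = 0.52, 0.45  # full electorate vs. a biased sampling frame

# Precise but inaccurate: enormous n, wrong population
biased = rng.binomial(100_000, frame_p, size=2_000) / 100_000
# Accurate but imprecise: small n, genuinely random sample
random_small = rng.binomial(500, true_p, size=2_000) / 500

print(f"Biased n=100,000: mean = {biased.mean():.3f}, SD = {biased.std():.4f}")
print(f"Random n=500:     mean = {random_small.mean():.3f}, SD = {random_small.std():.4f}")
# The huge biased poll clusters tightly around the wrong value (0.45);
# the small random poll scatters widely around the right one (0.52).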

Discussion Questions

  1. A polling organization decides to increase its sample from $n = 1{,}000$ to $n = 4{,}000$ to "get more accurate results." Will this necessarily improve accuracy? Improve precision? Both? Explain.

  2. During the 2020 election, some analysts argued that because polls had a history of underestimating certain candidates, margins of error should be "doubled" when interpreting results. Is this a valid response to known biases? What would be a better approach?

  3. An online poll on a news website gets 50,000 responses and finds that 72% oppose a proposed law. The website reports a margin of error of ±0.4 percentage points. Why should you be skeptical of this result despite the tiny margin of error?

  4. Social media surveys often achieve very large sample sizes but have unknown sampling mechanisms. How does this relate to the CLT's requirement for random sampling? Can you apply the standard error formula to such surveys?

  5. Some researchers have proposed "poll aggregation" — combining results from many polls to get a better estimate. Explain how the CLT supports this approach (Hint: think of each poll's result as a "sample" from the distribution of possible polls).

Python Challenge

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Simulate an election with true support of 52% for Candidate A
true_p = 0.52

# Run 1,000 simulated polls of different sizes
np.random.seed(42)
poll_sizes = [100, 400, 1000, 2500]

fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes = axes.ravel()

for idx, n in enumerate(poll_sizes):
    # Simulate 1,000 polls of size n
    poll_results = np.random.binomial(n, true_p, size=1000) / n

    ax = axes[idx]
    ax.hist(poll_results, bins=30, density=True, alpha=0.7,
            color='steelblue', edgecolor='white')

    # Theoretical normal curve
    se = np.sqrt(true_p * (1 - true_p) / n)
    x = np.linspace(poll_results.min(), poll_results.max(), 200)
    ax.plot(x, stats.norm.pdf(x, true_p, se), 'r-', linewidth=2)

    # Mark the "wrong call" zone (below 0.50)
    ax.axvline(0.50, color='darkred', linestyle=':', linewidth=2,
               label='50% threshold')
    ax.axvline(true_p, color='green', linestyle='--', linewidth=2,
               label=f'True p = {true_p}')

    # Calculate proportion of polls that would call it wrong
    wrong_calls = (poll_results < 0.50).mean()

    ax.set_title(f'n = {n:,}\nSE = {se:.3f}, '
                 f'Wrong calls: {wrong_calls:.1%}',
                 fontweight='bold')
    ax.set_xlabel('Poll Result ($\\hat{p}$)')
    ax.legend(fontsize=8)

plt.suptitle('Simulated Election Polls (True Support = 52%)',
             fontsize=14, fontweight='bold', y=1.02)
plt.tight_layout()
plt.show()

Questions for the simulation:

(a) As poll size increases, what happens to the proportion of polls that would "call the election wrong" (report $\hat{p} < 0.50$)?

(b) At what sample size does the wrong-call rate drop below 5%?

(c) If the true support were 50.5% instead of 52%, how would this change? Try modifying true_p and re-running.