Case Study 1: Polling, Margins of Error, and the Night That Changed Election Forecasting
The Setup
On the evening of November 8, 2016, virtually every major election forecast gave Hillary Clinton an overwhelming probability of winning the U.S. presidential election. The New York Times' Upshot model had her at 85%. FiveThirtyEight was more cautious at 71%. The Huffington Post's model said 98%.
By midnight, Donald Trump had won.
The reaction was immediate and intense. "The polls were wrong!" became the dominant narrative. Pollsters were accused of incompetence, bias, or both. Public trust in polling — already low — cratered further.
But here's the question that matters for you, as a student who now understands proportion inference: Were the polls actually wrong? Or did people simply not understand what the polls were saying?
The answer, it turns out, is a little of both. And understanding how requires exactly the tools you've learned in this chapter.
What the Polls Actually Said
Let's look at the final national polling averages in 2016:
| Source | Clinton | Trump | Clinton Lead | Margin of Error |
|---|---|---|---|---|
| RealClearPolitics Average | 46.8% | 43.6% | +3.2 | ±2-3% per poll |
| FiveThirtyEight Average | 45.7% | 41.6% | +4.1 | ±2-3% per poll |
| HuffPost Average | 45.4% | 41.8% | +3.6 | ±2-3% per poll |
The final popular vote result: Clinton 48.2%, Trump 46.1% — a Clinton lead of 2.1%.
Was the National Polling Off?
The national polls predicted a Clinton lead of roughly 3-4 points. The actual lead was about 2 points. The error was about 1-2 percentage points.
Let's put that in context with a hypothesis test. If the true proportion supporting Clinton was 0.482, and a poll of 1,000 people showed $\hat{p} = 0.468$, was that "surprising"?
$$SE = \sqrt{\frac{0.482 \times 0.518}{1000}} = \sqrt{0.0002497} = 0.0158$$
$$z = \frac{0.468 - 0.482}{0.0158} = \frac{-0.014}{0.0158} = -0.89$$
A $z$-score of $-0.89$ corresponds to $P(Z \leq -0.89) = 0.187$. That's an 18.7% chance — well within normal sampling variation. The national polls were off, but not by a surprising amount.
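These numbers are easy to check in code. Here is a minimal sketch in Python — the `one_prop_ztest` helper is our own name, not a library routine, and it assumes SciPy is installed:

```python
# A sketch of the one-proportion z-test used above (assumes SciPy).
from math import sqrt
from scipy.stats import norm

def one_prop_ztest(p_hat, p0, n):
    """Return (SE, z, lower-tail p-value) for H0: p = p0."""
    se = sqrt(p0 * (1 - p0) / n)  # standard error under the null
    z = (p_hat - p0) / se
    return se, z, norm.cdf(z)     # P(Z <= z)

se, z, p = one_prop_ztest(p_hat=0.468, p0=0.482, n=1000)
print(f"SE = {se:.4f}, z = {z:.2f}, P(Z <= z) = {p:.3f}")
# SE = 0.0158, z = -0.89, P(Z <= z) = 0.188
# (The text's 0.187 comes from rounding z to -0.89 before looking it up.)
```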
Where the Real Problem Was: State Polls
But the U.S. presidency isn't decided by the national popular vote — it's decided state by state through the Electoral College. And the state polls, particularly in the decisive states of Wisconsin, Michigan, and Pennsylvania, had larger errors:
| State | Final Poll Average (Clinton Lead) | Actual Result | Polling Error |
|---|---|---|---|
| Wisconsin | Clinton +6.5 | Trump +0.7 | 7.2 points |
| Michigan | Clinton +3.4 | Trump +0.3 | 3.7 points |
| Pennsylvania | Clinton +1.9 | Trump +0.7 | 2.6 points |
A 7.2-point error in Wisconsin is far beyond the typical margin of error of ±3-4 points. Let's test it. If the true proportion supporting Clinton in Wisconsin was 0.464 (her actual vote share), and a poll of 800 likely voters showed $\hat{p} = 0.536$:
$$SE = \sqrt{\frac{0.464 \times 0.536}{800}} = \sqrt{0.000311} = 0.0176$$
$$z = \frac{0.536 - 0.464}{0.0176} = \frac{0.072}{0.0176} = 4.09$$
A $z$-score of 4.09 is way out in the tail — $P(Z \geq 4.09) \approx 0.00002$. This kind of error does not happen from random sampling variation alone. Something systematic was going on.
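The same calculation run with the Wisconsin numbers, as a self-contained snippet under the same assumptions as the sketch above:

```python
# The hypothetical Wisconsin poll through the same z-test (assumes SciPy).
from math import sqrt
from scipy.stats import norm

p0, p_hat, n = 0.464, 0.536, 800
se = sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se
print(f"SE = {se:.4f}, z = {z:.2f}, P(Z >= z) = {norm.sf(z):.6f}")
# SE = 0.0176, z = 4.08, P(Z >= z) = 0.000022
# (The text's 4.09 comes from rounding SE to 0.0176 before dividing.)
```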
What Went Wrong: Bias, Not Variance
The key lesson from 2016 — and it connects directly to what you've learned in this chapter and in Chapter 4 — is the distinction between random error (which the margin of error measures) and systematic error (bias, which it does not).
Non-College-Educated White Voters Were Underrepresented
Post-election analyses by the American Association for Public Opinion Research (AAPOR) identified the primary culprit: non-college-educated white voters — a demographic that strongly favored Trump — were underrepresented in many state polls.
Why? Several factors:
- Nonresponse bias: People without college degrees were less likely to agree to participate in polls. The people who answered the phone weren't representative of the people who didn't.
- Inadequate weighting: Many polls weighted their samples by age, gender, and race, but not by education level. This meant that the college-educated respondents who did answer were effectively standing in for all white voters — including the non-college-educated voters who had very different political preferences.
- Likely voter models: Some polls used "likely voter" screens that systematically excluded irregular voters who were mobilized by Trump's candidacy.
The Margin of Error Couldn't Detect This
This is the critical point for proportion inference: the margin of error formula $z^* \sqrt{\hat{p}(1-\hat{p})/n}$ assumes a random sample. If the sample is biased, the margin of error understates the true uncertainty.
A poll might report "Clinton leads 53.6% to 40.4% in Wisconsin, margin of error ±3.5%." The 95% CI would be (50.1%, 57.1%) for Clinton — entirely above 50%. It looks like a slam dunk.
But the margin of error only covers random sampling error. The systematic bias — the underrepresentation of Trump voters — was an additional, unmeasured source of error that the formula couldn't capture.
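You can watch this failure mode happen in a simulation. The sketch below is illustrative only: the population value comes from the Wisconsin example, but the contact count and the differing response rates (75% for Clinton voters, 65% for Trump voters) are made-up assumptions chosen to produce a modest nonresponse bias:

```python
# Simulating how nonresponse bias wrecks 95% CI coverage.
# All response-rate numbers here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2016)
p_true = 0.464        # true Clinton share (from the Wisconsin example)
n_contacted = 1600    # voters the pollster reaches (assumed)
trials = 10_000
covered = 0

for _ in range(trials):
    support = rng.random(n_contacted) < p_true  # True = Clinton voter
    # Assumed response rates: Clinton voters 75%, Trump voters 65%.
    responds = rng.random(n_contacted) < np.where(support, 0.75, 0.65)
    sample = support[responds]
    p_hat = sample.mean()
    moe = 1.96 * np.sqrt(p_hat * (1 - p_hat) / sample.size)
    covered += (p_hat - moe) <= p_true <= (p_hat + moe)

print(f"95% CI coverage: {covered / trials:.1%}")
# Prints roughly 30-35% under these assumptions -- nowhere near 95%.
```

Even a 10-point gap in response rates cuts the interval's real coverage to around a third, while the reported margin of error stays exactly the same.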
Connection to Chapter 4: Remember the Literary Digest poll of 1936? The magazine surveyed 2.4 million people and predicted Alf Landon would defeat Franklin Roosevelt in a landslide. They were spectacularly wrong because their sample was biased toward wealthier Americans. The margin of error on a sample of 2.4 million is tiny (±0.06%), but the bias was enormous. History rhymed in 2016.
The 2020 Correction — And New Problems
After 2016, pollsters made adjustments. They added education weighting. They expanded their sampling frames. They acknowledged the limitations of phone-based polling in an era when many people don't answer calls from unknown numbers.
But 2020 brought a new challenge: the COVID-19 pandemic. People who followed public health guidelines (mask-wearing, social distancing) were more likely to be at home, more likely to answer polling calls, and more likely to support Biden. Trump supporters, who were more likely to minimize the pandemic, were harder to reach.
The result: polls overestimated Biden's lead by an average of 3.9 points in the battleground states — a larger error than in 2016.
| Year | Average Battleground State Error | Direction |
|---|---|---|
| 2012 | 2.3 points | Mixed |
| 2016 | 3.3 points | Underestimated Trump |
| 2020 | 3.9 points | Underestimated Trump |
The pattern suggests a systematic issue with reaching certain voter populations, not just random variation.
What This Means for Proportion Inference
Lesson 1: The Margin of Error Is a Lower Bound on Uncertainty
The reported margin of error measures the minimum uncertainty — the part due to random sampling. The actual uncertainty is always at least that large, and often larger, because of bias, measurement error, and other systematic factors.
When you construct a confidence interval using $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$, you're computing the uncertainty due to random sampling. If your sample isn't truly random, the actual uncertainty is larger.
Lesson 2: Probabilities Are Not Predictions
FiveThirtyEight gave Trump a 29% chance of winning. That doesn't mean they predicted Clinton would win — it means they said there was roughly a 1-in-3 chance Trump would win.
If a weather forecast says there's a 30% chance of rain, and it rains, the forecast wasn't "wrong." A 30% event happens 30% of the time. The problem was that many people (and many news outlets) interpreted "71% chance of Clinton winning" as "Clinton will definitely win."
This connects to the probability-as-long-run-frequency concept from Chapter 8. A 71% probability means that in many hypothetical re-runs of the election under similar conditions, Clinton would win about 71% of the time and Trump about 29%. The actual election was one realization from that distribution — and it fell in the 29% tail.
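That long-run-frequency reading is easy to simulate. In this quick illustration, the 0.29 is FiveThirtyEight's stated probability; everything else is just a weighted coin flip:

```python
# An event with probability 0.29 happens about 29% of the time.
import numpy as np

rng = np.random.default_rng(538)
replays = 100_000
trump_wins = (rng.random(replays) < 0.29).mean()
print(f"Trump wins in {trump_wins:.1%} of simulated replays")  # ~29.0%
```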
Lesson 3: Correlated Errors Across States
Every forecasting model must make an assumption about how polling errors in different states relate to one another. In 2016, the errors were strongly correlated — the polls underestimated Trump support in Wisconsin, Michigan, Pennsylvania, Ohio, Iowa, and several other states simultaneously.
When errors are correlated, the probability of a surprise outcome is much higher than any single poll's margin of error would suggest. FiveThirtyEight's model partially accounted for this correlation (which is one reason their forecast was more cautious at 71% rather than 98%). Models that assumed independence were the ones that gave Clinton 95%+ probabilities.
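Here is a rough simulation of why that correlation matters. The poll leads come from the table above; the 4-point error standard deviation and the split between shared and state-specific error are assumptions chosen for illustration, not estimates from any real forecasting model:

```python
# Independent vs. correlated polling errors across WI, MI, PA.
# Error magnitudes and the shared/state split are assumed, not fitted.
import numpy as np

rng = np.random.default_rng(42)
leads = np.array([6.5, 3.4, 1.9])  # Clinton's poll leads: WI, MI, PA
sims = 100_000

# Model A: each state's error is independent, sd = 4 points.
err_ind = rng.normal(0, 4, size=(sims, 3))

# Model B: same total sd (sqrt(3.5^2 + 3.75) = 4), but most of the
# error is a shared national-level miss that hits every state at once.
shared = rng.normal(0, 3.5, size=(sims, 1))
err_cor = shared + rng.normal(0, np.sqrt(16 - 3.5**2), size=(sims, 3))

for name, err in [("independent", err_ind), ("correlated", err_cor)]:
    # Trump carries a state when the error exceeds Clinton's poll lead.
    sweep = (err > leads).all(axis=1).mean()
    print(f"{name:>11}: Trump sweeps all three in {sweep:.1%} of sims")
# Roughly 0.3% when errors are independent vs. about 4% when correlated:
# an order-of-magnitude difference from the same poll numbers.
```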
For Your Analysis
When you're conducting proportion inference — whether on polling data, survey results, or any other sample — ask yourself these questions:
- Is the sample truly random? If not, the margin of error formula understates the true uncertainty.
- Who might be missing from the sample? Nonresponse bias is the most common form of bias in surveys.
- Am I reporting the margin of error, or am I reporting a forecast? These are different things. A margin of error says "the true proportion is probably within this range." A forecast says "given this range, here's the probability of various outcomes."
- Am I treating a probability as a certainty? A 71% probability is not a prediction. Treat it as what it is — an acknowledgment of uncertainty, not a statement of certainty.
Theme 4 Connection: Uncertainty Is Not Failure
The lesson of 2016 and 2020 is not that polls are useless — it's that uncertainty is real, and we need to take it seriously. A poll that says "52% ± 3%" is telling you something valuable: the true support level is probably between 49% and 55%. If you interpret that as "52% will definitely win," you're discarding the very information the poll is trying to give you.
The margin of error is not a failure of the polling method. It's the poll being honest about what it doesn't know. The failure in 2016 wasn't the margin of error — it was people ignoring it.
Discussion Questions
- After 2016, some commentators argued that polls should be abolished because they're "always wrong." Based on your understanding of proportion inference, evaluate this argument.
- FiveThirtyEight gave Trump a 29% chance of winning. HuffPost gave him a 2% chance. Both used polling data. Why were their forecasts so different? (Hint: the difference was in how they modeled correlated errors and uncertainty.)
- If you were designing a poll for the 2028 presidential election, what changes would you make based on the lessons of 2016 and 2020? Consider sampling methods, weighting, and how you would communicate the results.
- A politician says: "The polls had me down 5 points, and I won by 2 — the polls were completely wrong." Is a 7-point error "completely wrong" in the context of what you know about margins of error and bias? How would you respond?