Case Study: The 1936 Literary Digest Poll Disaster — Sampling Bias in Action
The Setup
It's the summer of 1936. The Great Depression is grinding into its seventh year. Unemployment hovers around 17%. Bread lines stretch around city blocks. And the biggest presidential election in a generation is approaching.
On one side: Franklin Delano Roosevelt, the incumbent president, architect of the New Deal — a sweeping set of government programs designed to pull America out of economic catastrophe. On the other side: Alf Landon, the Republican governor of Kansas, promising fiscal responsibility and a return to free-market principles.
The country is divided. The stakes are enormous. And The Literary Digest — one of America's most respected magazines — is about to make the prediction that will destroy its reputation forever.
The "Most Ambitious Poll in History"
The Literary Digest had a track record. The magazine had correctly predicted the winner of every presidential election since 1916 using its reader mail-in poll. Five elections in a row. Their method seemed bulletproof: send out millions of postcards, count the responses, predict the winner.
For the 1936 election, they went bigger than ever. They mailed survey postcards to approximately 10 million Americans — about one in every twelve adults in the country. The mailing list was drawn from three sources:
- Telephone directories
- Automobile registration records
- Club membership lists and the magazine's own subscriber list
The response was massive: 2.4 million postcards were returned. It was, by far, the largest political survey ever conducted. The editors were confident.
On October 31, 1936, five days before the election, The Literary Digest published its prediction:
Alf Landon: 57%
Franklin Roosevelt: 43%
Landon would win in a landslide, they declared. The magazine staked its decades-long reputation on the prediction.
The Result
On November 3, 1936, Americans went to the polls. When the votes were counted:
Franklin Roosevelt: 62%
Alf Landon: 38%
Roosevelt didn't just win. He won one of the largest landslides in American history, carrying 46 of 48 states. The Literary Digest's prediction was off by 19 percentage points — one of the worst polling failures in recorded history.
The magazine's credibility was destroyed. It folded in 1938.
What Went Wrong: The Anatomy of a Sampling Disaster
Problem 1: Selection Bias in the Sampling Frame
Remember where the mailing list came from: telephone directories, automobile registrations, and club memberships. In 1936, during the depths of the Depression, these were not representative of the American electorate.
- **Telephones:** Only about 35% of American households had telephones in 1936. Phone ownership was concentrated among the middle and upper classes. Lower-income Americans — who were hit hardest by the Depression and most likely to support Roosevelt's New Deal programs — were systematically excluded.
- **Automobiles:** Car ownership was even more skewed toward the wealthy. Families struggling to put food on the table were not buying cars.
- **Club memberships:** These further biased the sample toward socially connected, higher-income Americans.
The 10 million people who received postcards were systematically wealthier than the average American voter. And wealthier Americans were more likely to support Landon and his anti-New Deal platform. The sampling frame was biased before a single postcard was opened.
Problem 2: Nonresponse Bias
Of the 10 million postcards mailed, only 2.4 million were returned — a 24% response rate. The 7.6 million people who didn't respond were not randomly distributed. Research suggests that:
- People with stronger political opinions were more likely to respond
- Landon supporters, motivated by opposition to the incumbent, were more likely to take the time to fill out and mail the postcard
- Roosevelt supporters, perhaps feeling confident their candidate would win, may have been less motivated to respond
This pattern — where the people who respond to a survey are systematically different from those who don't — is nonresponse bias. It amplified the selection bias already present in the sampling frame.
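The two biases can be seen working together in a minimal simulation. All the numbers below are illustrative assumptions chosen to roughly echo 1936 (a wealthier stratum that leans Landon and dominates the mailing frame, plus higher response rates among Landon supporters) — they are not historical data:

```python
import random

random.seed(1936)

# Assumed population: ~38% Landon support overall, with the
# wealthier (frame-eligible) stratum leaning more toward Landon.
N = 100_000
population = []
for _ in range(N):
    wealthy = random.random() < 0.35           # roughly the phone-owning share
    if wealthy:
        votes_landon = random.random() < 0.52  # assumed lean inside the frame
    else:
        votes_landon = random.random() < 0.30
    population.append((wealthy, votes_landon))

# Selection bias: only the wealthier stratum is in the mailing frame.
frame = [v for wealthy, v in population if wealthy]

# Nonresponse bias: Landon supporters return the postcard more often.
respondents = [v for v in frame
               if random.random() < (0.28 if v else 0.20)]

true_landon = sum(v for _, v in population) / N
polled_landon = sum(respondents) / len(respondents)
print(f"true Landon share:   {true_landon:.1%}")
print(f"polled Landon share: {polled_landon:.1%}")
```

Even though every individual postcard is counted honestly, the poll overstates Landon by roughly twenty points, because both filters — who is in the frame, and who mails the card back — push in the same direction.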
The Combined Effect
Selection bias and nonresponse bias compounded each other:
| Factor | Direction of Bias |
|---|---|
| Phone owners overrepresented | Shifted sample toward wealthier, anti-Roosevelt voters |
| Car owners overrepresented | Same direction |
| Club members overrepresented | Same direction |
| Higher response rate among Landon supporters | Same direction |
| Lower-income voters underrepresented | Missed the group most supportive of Roosevelt |
Every source of bias pushed in the same direction — toward Landon. The result wasn't just wrong; it was wrong in a predictable, systematic way.
The Gallup Counterpoint
While The Literary Digest was mailing 10 million postcards, a young statistician named George Gallup was doing something different. Using scientific sampling methods — randomly selecting respondents from a representative cross-section of the population — he surveyed approximately 50,000 people.
His prediction: Roosevelt would win.
He was right. His sample was 48 times smaller than the Literary Digest's, but it was unbiased. Gallup had identified the Literary Digest's methodology as flawed before the election, and he publicly predicted that their poll would be wrong — and by roughly how much.
The 1936 election made George Gallup famous and launched the modern polling industry. It also established one of the most important principles in statistics:
A representative sample of thousands beats a biased sample of millions.
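That principle is easy to verify numerically. The sketch below (assumed numbers, not election data) draws one huge sample from a population through a biased inclusion filter, and one small simple random sample from the same population:

```python
import random

random.seed(42)

# Assumed population: 62% support candidate A.
N = 1_000_000
population = [random.random() < 0.62 for _ in range(N)]

# "Biased millions": supporters of B are three times as likely
# to end up in the sample as supporters of A.
biased = [v for v in population
          if random.random() < (0.10 if v else 0.30)]

# "Representative thousands": a simple random sample of 2,000.
unbiased = random.sample(population, 2000)

biased_share = sum(biased) / len(biased)
unbiased_share = sum(unbiased) / len(unbiased)
print(f"biased sample:   n={len(biased):,}, A at {biased_share:.1%}")
print(f"unbiased sample: n={len(unbiased):,}, A at {unbiased_share:.1%}")
```

The biased sample is nearly a hundred times larger, yet lands far from 62%; the small random sample lands within a point or two of the truth. More data from a biased source only makes the wrong answer more precise.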
Connecting to Modern Data: Where Do We Still See This?
The Literary Digest disaster happened almost 90 years ago, but its lessons are more relevant than ever.
Online Polls and Viral Surveys
Every time a website, social media account, or news outlet runs a non-scientific poll, the same biases are at work. A Twitter/X poll about political preferences only captures the views of people who use that platform and follow the account that posted the poll and feel motivated enough to click a button. That's not a sample — it's a self-selected audience.
Product Reviews and Rating Systems
When you look at product reviews on Amazon or Yelp, you're seeing a voluntary response sample. People who had very strong experiences — either very positive or very negative — are more likely to write reviews. The "silent majority" of customers who had average experiences rarely bother. That 4.2-star average might not represent the typical customer's experience at all.
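A small simulation makes the mechanism concrete. The distribution of experiences and the review probabilities below are assumptions for illustration — the point is only that when extreme experiences drive reviewing, the posted average drifts away from the typical experience:

```python
import random

random.seed(7)

# Assumed true mix of customer experiences (star ratings):
# most customers are in the middle, few are delighted or angry.
experiences = [5] * 2000 + [4] * 3000 + [3] * 4000 + [1] * 1000

# Assumed probability of writing a review, by experience:
# extreme experiences are far more likely to produce a review.
review_prob = {5: 0.30, 4: 0.05, 3: 0.03, 1: 0.50}

reviews = [r for r in experiences if random.random() < review_prob[r]]

true_avg = sum(experiences) / len(experiences)
review_avg = sum(reviews) / len(reviews)
print(f"average experience of all customers: {true_avg:.2f}")
print(f"average posted rating:               {review_avg:.2f}")
```

Only a minority of customers review, and the posted average comes out noticeably below the true average experience — a voluntary response sample, not a picture of the typical customer.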
AI Training Data
Machine learning models are trained on data that's available — which is almost always a convenience sample. Large language models are trained on internet text, which overrepresents certain demographics, languages, and viewpoints. Image recognition models trained primarily on photos from Western countries perform worse on images from other parts of the world. These are modern versions of the Literary Digest problem: the data seems comprehensive (it's huge!), but it systematically misses important parts of the population.
Analysis Questions
1. The Fundamental Error: The Literary Digest editors believed that a larger sample was always better. What principle from this chapter directly contradicts that assumption?
Suggested Answer
**Sample quality matters more than sample size.** Bias is a systematic error that doesn't decrease with more observations. A biased sample of 2.4 million produces a more precise *wrong* answer, not a better answer. The *Literary Digest* editors confused precision with accuracy — their large sample gave very precise estimates of what biased respondents thought, but those estimates were systematically off from the actual population.

2. Counterfactual Thinking: If the Literary Digest had used telephone directories but achieved a 100% response rate (all 10 million recipients responded), would their prediction have been accurate? Why or why not?
Suggested Answer
Probably not. Even with a 100% response rate, the sampling frame itself was biased — it excluded the approximately 65% of Americans without telephones. These non-phone-owning Americans were disproportionately lower-income and pro-Roosevelt. Eliminating nonresponse bias would have helped, but selection bias would have remained. The prediction would likely have been closer but still wrong in the same direction.

3. Modern Parallels: In 2016, many political polls underestimated support for Donald Trump. Without looking it up, hypothesize which of the biases discussed in this case study might have contributed to that polling error. Then research the actual explanations.
Suggested Answer
Several biases likely contributed. **Nonresponse bias:** Trump supporters may have been less likely to respond to polls (sometimes called the "shy voter" effect, though this is debated). **Selection bias:** Some polls relied on models of "likely voters" that didn't anticipate the surge of new or infrequent voters Trump brought to the polls. **Sampling frame issues:** Reaching a representative sample has become harder as people abandon landline phones, screen calls, and decline to respond to surveys. The underlying problem is the same as 1936: systematic differences between who responds to polls and who actually votes.

4. Constructive Design: If you were advising a polling firm in 1936, knowing what you know now, how would you design a poll to accurately predict the election outcome? Be specific about your sampling method and how you'd address the biases that doomed the Literary Digest.
Suggested Answer
Key design choices:

- **Use stratified sampling** instead of convenience sampling. Divide the population by income, geography, urban/rural, and race/ethnicity, then randomly sample within each stratum. This ensures representation of low-income voters, rural voters, and other groups the *Literary Digest* missed.
- **Sample from voter registration rolls** (not phone books), which include people regardless of wealth. Better yet, combine multiple sampling frames to improve coverage.
- **Keep the sample manageable** (5,000-50,000, not millions) to allow for more intensive follow-up with non-respondents.
- **Track and correct for nonresponse.** Monitor which demographic groups are responding and weight the results to match the known population distribution.
- **Use in-person interviews** rather than mail-in postcards, to reach people who might not return a postcard.

This is exactly what George Gallup did.

5. Ethics and Power: The Literary Digest poll wasn't just wrong — it had real consequences. Political campaigns adjust strategy based on polls. Voters may change behavior based on perceived momentum. Discuss: What responsibility do pollsters and media organizations have to ensure their methods are sound before publishing predictions?
Suggested Answer
This is an open-ended question with multiple valid perspectives. Key points to consider:

- Publishing inaccurate poll results can influence voter behavior (people may not vote if they think their candidate will win or lose easily), affecting democratic outcomes.
- Media organizations have a responsibility to report the methodology and limitations of polls, not just the headline numbers. Readers should know the sampling method, response rate, and margin of error.
- There's a tension between speed (being first to publish) and accuracy (getting the methodology right). The *Literary Digest* prioritized the spectacle of their massive sample over the quality of their methodology.
- Modern polling organizations (Gallup, Pew, FiveThirtyEight) typically publish detailed methodology alongside their results, allowing readers to evaluate the quality of the data. This transparency is itself an ethical practice.

Key Takeaways from This Case Study
- **Bias is directional.** The Literary Digest poll wasn't randomly wrong — it was systematically wrong in one direction. Every source of bias pointed the same way, creating a massive, predictable error.
- **Sample size can create false confidence.** The enormous response count (2.4 million) made the prediction seem authoritative. Size became a substitute for quality — and a dangerous one.
- **The sampling frame matters as much as the sampling method.** Even if the Literary Digest had randomly selected from its mailing list, the results would have been biased because the list itself excluded lower-income Americans.
- **Nonresponse compounds selection bias.** When non-respondents are systematically different from respondents, low response rates amplify whatever bias exists in the sampling frame.
- **The principles are timeless.** The Literary Digest story is from 1936, but the same types of errors occur today in online polls, product reviews, and AI training data. Understanding these biases is not just a historical exercise — it's a survival skill for the information age.