Exercises: Why Statistics Matters (and Why You Might Actually Enjoy This)

These exercises progress from concept checks to challenging applications. Estimated completion time: 1.5 hours.

Difficulty Guide:
- ⭐ Foundational (5-10 min each)
- ⭐⭐ Intermediate (10-20 min each)
- ⭐⭐⭐ Challenging (20-40 min each)
- ⭐⭐⭐⭐ Advanced/Research (40+ min each)


Part A: Conceptual Understanding ⭐

A.1. Explain the difference between descriptive and inferential statistics in your own words. Give one example of each that is NOT from the textbook.

A.2. A university reports that the average SAT score of its incoming freshman class is 1180. Is this descriptive or inferential statistics? What if the university uses this number to claim that "students at our university are above the national average"?

A.3. A classmate says, "Statistics is just math with calculators." Explain why this is an incomplete (and somewhat misleading) description of what statistics actually is.

A.4. List the four pillars of a statistical investigation. For each pillar, give an example of what could go wrong if that pillar is weak or missing.

A.5. Why is the phrase "decisions under uncertainty" central to the definition of statistics? Give an example of a decision you made today that involved uncertainty.

A.6. A sports commentator says, "This team wins 80% of their games when they score first." Identify: What is the population? What is the sample (if any)? Is this descriptive or inferential?

A.7. The textbook defines statistical thinking as "seeing variation, uncertainty, and randomness not as obstacles to understanding but as the raw material of understanding." In your own words, what does this mean? Why is this a different way of thinking than most people are used to?


Part B: Applied Analysis ⭐⭐

B.1. For each of the following, classify as descriptive or inferential statistics. Then explain your reasoning.

a) A poll of 1,500 registered voters finds that 54% support a ballot measure. The pollster reports, "A majority of voters in this state support the measure."

b) A teacher calculates the class average on the midterm exam: 78%.

c) A pharmaceutical company tests a new drug on 5,000 patients and concludes that it reduces symptoms in the general population.

d) The U.S. Census Bureau reports that the median household income in 2023 was $80,610.

e) An insurance company uses data from 10,000 past claims to set premium rates for next year's customers.

B.2. Read the following headline: "Study Finds That People Who Eat Dark Chocolate Have Lower Blood Pressure." A friend sees this and says, "Great, I'm going to eat dark chocolate every day to lower my blood pressure."

a) What statistical distinction is your friend failing to make?

b) What additional information would you want before accepting this headline's implied claim?

c) What alternative explanations might exist for the observed association?

B.3. Consider Alex Rivera's challenge at StreamVibe (from Section 1.5). Why can't Alex simply compare watch time this month to watch time last month to evaluate the new recommendation algorithm? What else might have changed? What would a better approach look like?

B.4. Sam Okafor's basketball player, Daria, went from shooting 31% on 180 attempts to 38% on 65 attempts. Calculate the actual number of three-pointers made in each season. Without doing formal statistics yet, do you think the improvement is "real" or could it be random fluctuation? Justify your reasoning.
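
If you would like to check your arithmetic for B.4 by machine, the following is a minimal Python sketch. The function name `made_shots` and the variable names are illustrative and do not come from the textbook. It assumes the reported percentages are rounded, so the product of percentage and attempts is rounded to the nearest whole shot.

```python
# Hypothetical helper (not from the textbook): recover an approximate
# count of made shots from a rounded percentage and the attempts.
def made_shots(pct: float, attempts: int) -> int:
    # pct is given as a percentage, e.g. 31 for 31%.
    # Rounding is an assumption: the reported percentage is itself
    # rounded, so the raw product is usually not a whole number.
    return round(pct * attempts / 100)


last_season = made_shots(31, 180)  # 31% on 180 attempts
this_season = made_shots(38, 65)   # 38% on 65 attempts

print("Made three-pointers, earlier season:", last_season)
print("Made three-pointers, later season:  ", this_season)
```

The counts alone do not settle whether the improvement is "real"; notice how different the two numbers of attempts are when you write your justification.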

B.5. For each of the four anchor examples (Maya, Alex, James, Sam), identify which of the four pillars of statistical investigation they are currently working on. What would the next pillar in their process be?


Part C: Statistical Literacy in the Wild ⭐⭐-⭐⭐⭐

C.1. Find a news article from the past week that references a statistic or study. Copy the relevant sentence or paragraph and answer:

a) What claim is being made?

b) Is this descriptive or inferential?

c) What is the population? What is the sample?

d) On a scale of 1-5, how confident are you in this claim? Why?

C.2. Open any social media platform (or website with reviews). Find one product, restaurant, or app for each of the following:
- A rating based on fewer than 10 reviews
- A rating based on more than 500 reviews

Compare them. Which rating do you trust more? Write a paragraph explaining your reasoning in terms of sample size and reliability. (You'll formalize this intuition in Chapters 11-13.)
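
One informal way to explore the intuition in C.2 is a small simulation. The sketch below is only an illustration under stated assumptions: the star-rating distribution in `weights`, the review counts of 8 and 500, and the function name `simulate_average_ratings` are all invented for this example and are not taken from the textbook or from any real platform. It imagines many products that share the same underlying quality and measures how much their average ratings vary from product to product.

```python
import random
import statistics


def simulate_average_ratings(n_reviews, n_products=1000, seed=0):
    """Simulate the average star rating of many identical products.

    Every product draws reviews from the same assumed distribution,
    so any difference between averages is due to chance alone.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    stars = [1, 2, 3, 4, 5]
    weights = [0.05, 0.05, 0.15, 0.35, 0.40]  # assumed; true mean is 4.0
    averages = []
    for _ in range(n_products):
        reviews = rng.choices(stars, weights=weights, k=n_reviews)
        averages.append(statistics.mean(reviews))
    return averages


small = simulate_average_ratings(n_reviews=8)
large = simulate_average_ratings(n_reviews=500)

# Standard deviation of the averages: how far a typical product's
# displayed rating strays from the shared underlying quality.
spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)

print(f"Spread of averages with 8 reviews:   {spread_small:.3f}")
print(f"Spread of averages with 500 reviews: {spread_large:.3f}")
```

Try changing `n_reviews` and watch how the spread responds. Real reviews are not independent random draws, so treat this as a thinking aid rather than a model of any actual site.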

C.3. Find an advertisement or marketing claim that uses a statistic (e.g., "9 out of 10 dentists recommend..."). What questions should a statistically literate person ask about this claim? List at least 4 specific questions.


Part D: Synthesis & Critical Thinking ⭐⭐⭐

D.1. The textbook describes a 2018 study that found a healthcare algorithm systematically underestimated the health needs of Black patients. This happened because the algorithm used healthcare spending as a proxy for health needs.

a) Why is this an example of a statistical assumption leading to real-world harm?

b) What alternative variable could the algorithm designers have used instead of spending? What challenges might that alternative create?

c) How does this example illustrate the theme "the human stories behind the data"?

D.2. Make an argument for why statistical literacy is more important now than at any previous point in human history. Your argument should reference at least three specific examples of how data-driven systems affect people's lives today.

D.3. A critic argues: "Statistics can be manipulated to prove anything. That makes it untrustworthy." Write a 2-3 paragraph response that acknowledges the legitimate concern while defending the value of statistical thinking.


Part M: Mixed Practice (Interleaved) ⭐⭐

Since this is Chapter 1, these problems preview the kind of statistical thinking we'll develop throughout the course.

M.1. A school district reports that average math scores improved by 5 points this year. The superintendent claims this proves the new curriculum is working. What questions would you ask before accepting this conclusion?

M.2. An online dating website advertises: "Couples who meet on our platform are 20% less likely to divorce." Assuming this statistic is accurately calculated, what are at least two reasons why this might NOT mean the website causes better marriages?

M.3. Your doctor tells you that a screening test for a disease is "95% accurate." Does this mean that if you test positive, there's a 95% chance you have the disease? (Hint: the answer is no, and we'll learn why in Chapter 9. For now, just explain why this feels like a tricky claim.)
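
To see why the claim in M.3 is slippery, it can help to count people rather than juggle percentages. The sketch below is a hedged illustration, not the Chapter 9 method. It assumes that "95% accurate" means the test is right 95% of the time for both sick and healthy people, and it assumes a hypothetical disease that affects 1% of the population. Neither number comes from the exercise.

```python
# All numbers below are assumptions chosen for illustration.
population = 10_000
prevalence = 0.01    # assumed: 1% of people have the disease
sensitivity = 0.95   # assumed: sick people who test positive
specificity = 0.95   # assumed: healthy people who test negative

sick = population * prevalence
healthy = population - sick

true_positives = sick * sensitivity             # sick and flagged
false_positives = healthy * (1 - specificity)   # healthy but flagged

# Among everyone who tests positive, what share is actually sick?
share_sick_given_positive = true_positives / (true_positives + false_positives)

print(f"People testing positive: {true_positives + false_positives:.0f}")
print(f"Share of positives who are sick: {share_sick_given_positive:.1%}")
```

Change `prevalence` and rerun to see how strongly the result depends on how common the disease is. That dependence is part of what makes the doctor's statement tricky.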

M.4. A friend shows you a graph where ice cream sales and drowning deaths both increase in June and July. "Ice cream causes drowning!" they joke. But seriously: why do these two variables move together? What concept from Section 1.3 explains this?


Part E: Research & Extension ⭐⭐⭐⭐

E.1. Research the "replication crisis" in psychology (or your major field). Write a 1-page summary that answers:

a) What is the replication crisis?

b) How does it relate to the concepts of descriptive and inferential statistics?

c) What statistical practices contributed to the problem?

d) What changes have been proposed to address it?

E.2. Choose one of the four anchor examples (Maya, Alex, James, or Sam) and research a real-world parallel. Find a real study, dataset, or controversy that mirrors their scenario. Write a paragraph connecting the real-world example to the concepts in this chapter.


Solutions

Selected solutions appear in appendices/answers-to-selected.md.