Quiz: Your Statistical Journey Continues
This is the final quiz. It's cumulative — drawing from all 28 chapters — with an emphasis on integration, judgment, and the ability to connect concepts across the course. Try to answer each question before revealing the answer.
1. A researcher reports a statistically significant result with p = 0.001 and Cohen's d = 0.05. The best interpretation is:
(a) The effect is both real and important
(b) The effect is probably real but very small — statistically significant but not practically significant
(c) The p-value must be wrong because the effect size is too small
(d) The result should be ignored because d < 0.20
Answer
**(b) The effect is probably real but very small — statistically significant but not practically significant.** A very small p-value tells us the result is unlikely under the null hypothesis (Theme 4: uncertainty). A very small effect size (d = 0.05 is well below Cohen's "small" benchmark of 0.20) tells us the effect, while real, is tiny. This is common with very large samples, where even trivial differences become statistically significant (Chapter 17). This illustrates the critical distinction between statistical and practical significance.

2. Which of the following study designs can support a causal claim?
(a) A cross-sectional survey with n = 50,000
(b) A longitudinal observational study following patients for 10 years
(c) A randomized controlled experiment with n = 200
(d) A multiple regression model with 15 control variables
Answer
**(c) A randomized controlled experiment with n = 200.** Only randomized experiments can support causal claims (Chapter 4, Theme 5). Randomization ensures that all potential confounders — both measured and unmeasured — are balanced between groups. Large observational studies, long follow-up periods, and statistical controls for confounders can strengthen associations, but they cannot eliminate the possibility of unmeasured confounders. Correlation does not imply causation, regardless of sample size or study duration.

3. The Central Limit Theorem states that:
(a) All populations are normally distributed
(b) Large samples always produce significant results
(c) The sampling distribution of the sample mean is approximately normal for large samples, regardless of the population's shape
(d) The standard deviation decreases as sample size increases
Answer
**(c) The sampling distribution of the sample mean is approximately normal for large samples, regardless of the population's shape.** The CLT (Chapter 11) is the bridge from probability to inference. It tells us that x-bar is approximately normal with mean mu and standard deviation sigma/sqrt(n), even if the population is skewed, bimodal, or otherwise non-normal. This theorem makes confidence intervals and hypothesis tests possible for virtually any variable.

4. Sam's analysis of Daria's shooting illustrates which important statistical concept?
(a) With a small sample, we may lack the power to detect a real effect
(b) Large effects are always statistically significant
(c) The null hypothesis is usually true
(d) Bootstrap methods are always more accurate than formula-based methods
Answer
**(a) With a small sample, we may lack the power to detect a real effect.** At n = 65, Sam had only 24% power to detect Daria's improvement (Chapter 17). The effect was real — it became significant at n = 258 — but the initial sample was too small to reliably detect it. This illustrates the relationship between sample size, power, and the ability to detect effects, and why "fail to reject" does not mean "the effect doesn't exist."

5. A 95% confidence interval for a population mean is (42, 58). Which interpretation is correct?
(a) There is a 95% probability that the population mean is between 42 and 58
(b) 95% of the data falls between 42 and 58
(c) If we repeated this procedure many times, about 95% of the resulting intervals would contain the true population mean
(d) The sample mean is 50 with a 5% error rate
Answer
**(c) If we repeated this procedure many times, about 95% of the resulting intervals would contain the true population mean.** The confidence level describes the long-run reliability of the *method*, not the probability that any single interval contains the parameter (Chapter 12). The true mean is fixed — it's either in this interval or it's not. The 95% refers to the process: if we repeated the study with new random samples, 95% of the resulting CIs would capture the true mean.

6. Simpson's paradox can occur when:
(a) The sample size is too small
(b) A confounding variable is unevenly distributed across comparison groups
(c) The data contains outliers
(d) The response variable is categorical
Answer
**(b) A confounding variable is unevenly distributed across comparison groups.** Simpson's paradox (Chapter 27) occurs when a trend in aggregated data reverses in subgroups because a lurking variable is distributed differently across the groups being compared. In the UC Berkeley example, women applied disproportionately to competitive departments, creating the appearance of overall discrimination that didn't exist within most departments.

7. In Maya's regression model, the coefficient for proximity to the Henderson plant dropped from -4.8 to -2.7 when income, smoking, and healthcare access were added. This suggests:
(a) Proximity to the plant doesn't matter
(b) The original coefficient was partially confounded by those variables
(c) The new model is overfitting
(d) The p-value increased above 0.05
Answer
**(b) The original coefficient was partially confounded by those variables.** When additional predictors are added and a coefficient decreases, it means part of the original effect was explained by the newly added variables (Chapter 23). The relationship between plant proximity and asthma is partly confounded by income, smoking, and healthcare access — but the fact that a significant effect remains (-2.7, p < 0.001) suggests that proximity has an independent association with asthma after controlling for those factors. This illustrates both "holding other variables constant" and the distinction between association and causation (Theme 5).

8. P-hacking is problematic because it:
(a) Always involves fabricating data
(b) Inflates the false positive rate above the nominal alpha level
(c) Reduces statistical power
(d) Makes confidence intervals too narrow
Answer
**(b) Inflates the false positive rate above the nominal alpha level.** P-hacking — running multiple analyses and reporting only the significant ones — dramatically inflates the probability of a false positive (Chapter 13, Chapter 17, Chapter 27). With 20 independent tests at alpha = 0.05, the probability of at least one false positive is 1 - (0.95)^20 ≈ 64%. The solution is pre-registration: committing to your analysis plan before examining the data.

9. Alex's A/B test at StreamVibe is more trustworthy than Maya's observational comparison because:
(a) Alex had a larger sample size
(b) Alex's result had a smaller p-value
(c) Alex used randomization, which balances both measured and unmeasured confounders
(d) Alex used a two-sample t-test instead of a z-test
Answer
**(c) Alex used randomization, which balances both measured and unmeasured confounders.** Randomization is the key to causal inference (Chapter 4, Chapter 16). Alex's A/B test randomly assigned users to the old or new algorithm, ensuring that any observed difference in watch time is attributable to the algorithm change rather than to differences between the groups. Maya's observational comparison cannot control for unmeasured confounders, regardless of sample size or statistical method (Theme 5).

10. An AI diagnostic system reports 95% accuracy on a disease that affects 1% of the population. If the system flags a patient as positive, the probability that the patient actually has the disease (PPV) is:
(a) 95%
(b) About 16%
(c) About 50%
(d) Cannot be determined without knowing sensitivity and specificity separately
Answer
**(d) Cannot be determined without knowing sensitivity and specificity separately.** "95% accuracy" alone is ambiguous — it could mean many different things depending on the sensitivity (true positive rate) and specificity (true negative rate). If the test simply classified everyone as negative, it would be 95% accurate but completely useless. To calculate PPV, we need to apply Bayes' theorem (Chapter 9): PPV = P(disease | positive test) = sensitivity x prevalence / P(positive), where P(positive) = sensitivity x prevalence + (1 - specificity) x (1 - prevalence). This requires knowing sensitivity and specificity, not just overall accuracy (Chapter 26, Theme 3). Distractor (b) comes from assuming both sensitivity and specificity are 95%: then PPV = 0.0095/0.0590, or about 16%.

11. Which of the following is the best way to report a statistical finding?
(a) "The result was significant (p < 0.05)."
(b) "The treatment group scored higher than the control group."
(c) "The treatment group scored 7.2 points higher (95% CI: 3.1 to 11.3, p = 0.002, d = 0.45)."
(d) "We reject the null hypothesis."
Answer
**(c) "The treatment group scored 7.2 points higher (95% CI: 3.1 to 11.3, p = 0.002, d = 0.45)."** This answer includes all the elements of responsible statistical reporting (Chapter 25): the direction and magnitude of the effect (7.2 points), the confidence interval (3.1 to 11.3, showing the range of plausible values), the p-value (0.002, showing statistical significance), and the effect size (d = 0.45, a small-to-medium effect). Options (a), (b), and (d) are all incomplete.

12. The bootstrap method works by:
(a) Collecting new samples from the population
(b) Resampling with replacement from the observed data to approximate the sampling distribution
(c) Adjusting the p-value for multiple comparisons
(d) Assuming the population is normally distributed
Answer
**(b) Resampling with replacement from the observed data to approximate the sampling distribution.** The bootstrap (Chapter 18) treats your sample as a stand-in for the population and draws thousands of resamples (with replacement) from it. The distribution of the statistic across these resamples approximates the sampling distribution, allowing you to construct confidence intervals without making distributional assumptions. It's one of the most important innovations in modern statistics.

13. Which of these is an example of the ecological fallacy?
(a) Concluding that because states with more ice cream sales have more drownings, ice cream causes drowning
(b) Concluding that because a state votes Republican, every person in that state is Republican
(c) Concluding that a drug is effective because p < 0.05
(d) Concluding that an algorithm is fair because its overall accuracy is 90%
Answer
**(b) Concluding that because a state votes Republican, every person in that state is Republican.** The ecological fallacy (Chapter 27) is the error of applying group-level statistics to individuals. A state that votes 55% Republican still has 45% of voters who didn't vote Republican. Option (a) is a correlation-causation error, option (c) is a misinterpretation of statistical significance, and option (d) is an aggregation bias problem related to Simpson's paradox (though it's close to the ecological fallacy in spirit).

14. In the context of hypothesis testing, "fail to reject H_0" means:
(a) H_0 is true
(b) H_a is false
(c) The evidence is not strong enough to conclude H_0 is false at the chosen significance level
(d) The experiment failed
Answer
**(c) The evidence is not strong enough to conclude H_0 is false at the chosen significance level.** "Fail to reject" does not mean "accept" (Chapter 13). It means the data did not provide sufficient evidence against H_0. The null hypothesis might still be false — we might simply lack the sample size or power to detect the effect (Chapter 17). This is exactly what happened with Sam's initial analysis of Daria's shooting: p = 0.097 meant insufficient evidence, not that the improvement wasn't real.

15. The data science pipeline, in the correct order, is:
(a) Model, collect, clean, visualize, communicate
(b) Ask, collect, clean, explore, model, evaluate, communicate
(c) Collect, model, test, publish
(d) Hypothesize, experiment, analyze, conclude
Answer
**(b) Ask, collect, clean, explore, model, evaluate, communicate.** The data science pipeline (Section 28.7) follows a logical sequence: (1) Ask a question, (2) Collect or obtain data, (3) Clean and wrangle, (4) Explore and visualize, (5) Model and analyze, (6) Evaluate and validate, (7) Communicate and act. Every step corresponds to skills built in this course, and the Data Detective Portfolio is a complete pipeline project.

16. James's analysis found that the algorithm's false positive rate was 31.2% for Black defendants and 13.3% for white defendants. The two-proportion z-test gave z = -4.67, p < 0.001. Which theme does this most directly illustrate?
(a) Theme 1: Statistics as a superpower
(b) Theme 3: AI and algorithms use statistics
(c) Theme 4: Uncertainty is not failure
(d) Theme 6: Ethical data practice
Answer
**(d) Theme 6: Ethical data practice.** While this example touches multiple themes (Theme 2: these are real people affected; Theme 3: the algorithm is an applied statistical model; Theme 1: statistical tools revealed the disparity), the most direct lesson is about ethics. The algorithm's differential error rates raise fundamental questions about fairness, justice, and the ethical deployment of statistical models (Chapter 16, Chapter 26, Chapter 27). The fairness impossibility theorem shows that no algorithm can be fair in all senses simultaneously — making the ethical dimension inescapable.

17. A researcher calculates the correlation between hours of study and exam scores and finds r = 0.65. Which statement is most accurate?
(a) Studying more causes higher exam scores
(b) There is a strong positive linear association; students who study more tend to score higher, but we cannot conclude causation
(c) 65% of the variation in exam scores is explained by study hours
(d) The regression equation will perfectly predict exam scores
Answer
**(b) There is a strong positive linear association; students who study more tend to score higher, but we cannot conclude causation.** Correlation quantifies the strength and direction of a linear relationship (Chapter 22) but does not establish causation (Theme 5). The data is likely observational, so confounders (motivation, prior knowledge, test anxiety) could explain part of the association. Option (c) confuses r with R-squared; r = 0.65 means R-squared = 0.42, so about 42% (not 65%) of the variation is explained. Option (d) is incorrect because R-squared < 1.

18. Which of the following is the correct interpretation of a p-value of 0.03?
(a) There is a 3% probability that the null hypothesis is true
(b) There is a 3% probability that the result occurred by chance
(c) If the null hypothesis were true, there is a 3% probability of observing data as extreme as or more extreme than what was observed
(d) The effect size is 0.03
Answer
**(c) If the null hypothesis were true, there is a 3% probability of observing data as extreme as or more extreme than what was observed.** The p-value is P(data | H_0), not P(H_0 | data) (Chapter 13). Confusing these two is the prosecutor's fallacy (Chapter 9). Option (a) flips the conditional probability. Option (b) is vague and misleading. Option (d) confuses p-value with effect size. This distinction — between the probability of the data given the hypothesis and the probability of the hypothesis given the data — is one of the most important lessons in the entire course.

19. Bayesian statistics differs from the frequentist approach used in most of this textbook primarily because:
(a) It doesn't use data
(b) It incorporates prior beliefs about parameters and updates them with observed data
(c) It doesn't use probability
(d) It can only be applied to large samples
Answer
**(b) It incorporates prior beliefs about parameters and updates them with observed data.** Bayesian statistics (previewed in Section 28.7) treats parameters as having probability distributions rather than fixed unknown values. It starts with a prior distribution reflecting existing knowledge, updates it using the likelihood of the observed data, and produces a posterior distribution. This is a direct extension of Bayes' theorem from Chapter 9, applied to continuous parameters instead of discrete events.

20. Which statement best captures the closing message of this textbook?
(a) Statistics is about memorizing formulas and running tests
(b) Statistical significance is the ultimate goal of any analysis
(c) Statistics requires certainty and definitive answers
(d) Statistics requires curiosity, honesty, and the courage to let data change your mind
Answer
**(d) Statistics requires curiosity, honesty, and the courage to let data change your mind.** The closing message is not about formulas, significance thresholds, or certainty. Statistics embraces uncertainty rather than eliminating it (Theme 4), and responsible practice depends on honest reporting and a willingness to update your conclusions when the data warrant it.