Quiz: Lies, Damn Lies, and Statistics: Ethical Data Practice
Test your understanding of Simpson's paradox, the ecological fallacy, questionable research practices, data privacy, ethical frameworks, and responsible data practice. Try to answer each question before revealing the answer.
1. Simpson's paradox occurs when:
(a) Two variables are perfectly correlated
(b) A trend in aggregated data reverses when the data is broken into subgroups
(c) A p-value is below 0.05 but the effect size is small
(d) An outlier distorts the mean of a dataset
Answer
**(b) A trend in aggregated data reverses when the data is broken into subgroups.** Simpson's paradox is the phenomenon in which a statistical trend that appears in combined data disappears or reverses when the data is separated into meaningful subgroups. It occurs because a lurking variable (confounding factor) is unevenly distributed across the groups being compared. The UC Berkeley admissions case is the classic example: women had a lower overall admission rate but were admitted at equal or higher rates in most individual departments.
2. In the UC Berkeley admissions case, the confounding variable was:
(a) Gender of the applicants
(b) Overall university admission rate
(c) The department to which applicants applied
(d) The number of applicants in each year
Answer
**(c) The department to which applicants applied.** Women disproportionately applied to more competitive departments (with low admission rates for everyone), while men disproportionately applied to less competitive departments (with high admission rates). This uneven distribution of applications across departments created the appearance of overall discrimination when none existed within most departments.
3. The ecological fallacy is the error of:
(a) Drawing conclusions about groups based on individual data
(b) Drawing conclusions about individuals based on group-level data
(c) Ignoring environmental factors in statistical analysis
(d) Using aggregate data instead of individual data
Answer
**(b) Drawing conclusions about individuals based on group-level data.** The ecological fallacy occurs when you observe a pattern at the group level (e.g., states with more immigrants have higher literacy rates) and assume it applies to individuals within those groups (immigrants are more literate). Group-level correlations can be driven by confounding variables and may not reflect individual-level relationships at all.
4. Which of the following is an example of the ecological fallacy?
(a) "Countries with higher chocolate consumption have more Nobel laureates, so eating chocolate makes you smarter."
(b) "The average income in this zip code is high, so everyone who lives there must be wealthy."
(c) "This state voted Republican and has high obesity rates, so Republican voters must be obese."
(d) All of the above
Answer
**(d) All of the above.** Each example draws conclusions about individuals from aggregate data. (a) applies country-level data to individual behavior. (b) applies zip-code-level data to individual residents. (c) applies state-level data to individual voters. In each case, the group-level pattern may not hold for individuals within those groups because of confounding variables and within-group variation.
5. P-hacking refers to:
(a) Fabricating data to achieve a desired result
(b) Trying multiple analyses until finding a statistically significant result
(c) Using an incorrect formula for the p-value
(d) Reporting a p-value that is smaller than the actual value
Answer
**(b) Trying multiple analyses until finding a statistically significant result.** P-hacking involves exploiting researcher degrees of freedom — trying different subgroups, variable definitions, outlier rules, or statistical tests — until a p < 0.05 result appears. Unlike data fabrication (which is fraud), p-hacking can be unintentional and often results from exploring data without a pre-specified analysis plan. The primary fix is pre-registration.
6. HARKing stands for:
(a) Hypothesizing And Reviewing Known findings
(b) Hypothesizing After Results are Known
(c) Having All Results Keep significance
(d) Hiding Actual Research Knowledge
Answer
**(b) Hypothesizing After Results are Known.** HARKing is the practice of presenting a post-hoc discovery — something found during exploratory analysis — as if it were a hypothesis formulated before the data was collected. This makes the result appear more convincing than it actually is, because a pre-planned test has more evidential value than a post-hoc one.
7. A researcher runs a study with 20 dependent variables and finds one significant result at p = 0.03. Without adjusting for multiple comparisons, this finding is:
(a) Definitely a true effect
(b) Likely a false positive — expected by chance alone
(c) Only valid if the effect size is large
(d) Valid because p < 0.05
Answer
**(b) Likely a false positive — expected by chance alone.** With 20 independent tests at alpha = 0.05, the expected number of false positives is 20 × 0.05 = 1. Finding exactly one significant result out of 20 is precisely what we'd expect under the null hypothesis. As we discussed in Chapters 13 and 17, running multiple tests without adjusting the significance threshold (e.g., Bonferroni correction) inflates the family-wise error rate to 1 - (0.95)^20 = 64%.
8. Which of the following is the best example of cherry-picking?
(a) Reporting a 95% confidence interval instead of a 99% interval
(b) Choosing a 15-game window where a player shot 42% from three, ignoring the full-season 33% average
(c) Using the median instead of the mean for a skewed distribution
(d) Rounding 4.7% to 5% for clearer communication
Answer
**(b) Choosing a 15-game window where a player shot 42% from three, ignoring the full-season 33% average.** Cherry-picking is selecting data, time ranges, or subgroups that support your conclusion while ignoring the broader picture. The 15-game window gives a misleading impression of the player's shooting ability. Options (a) and (c) are defensible methodological choices, and (d) is a minor rounding issue — none of these selectively omit contradictory evidence.
9. The Belmont Report (1979) established three core principles for research ethics. Which of the following is NOT one of them?
(a) Respect for persons
(b) Statistical significance
(c) Beneficence
(d) Justice
Answer
**(b) Statistical significance.** The three Belmont principles are Respect for Persons (treat individuals as autonomous agents), Beneficence (maximize benefits and minimize harms), and Justice (distribute the burdens and benefits of research fairly). Statistical significance is a methodological concept, not an ethical principle — though as this chapter argues, the misuse of statistical significance has significant ethical implications.
10. The Facebook emotional contagion study (2014) was ethically problematic primarily because:
(a) The effect size was too small to be meaningful
(b) Users were not informed they were in an experiment and could not consent or opt out
(c) The researchers used an incorrect statistical test
(d) The study was not published in a peer-reviewed journal
Answer
**(b) Users were not informed they were in an experiment and could not consent or opt out.** The core ethical violation was the lack of meaningful informed consent. Facebook manipulated the emotional content of 689,003 users' News Feeds without their knowledge. While Facebook's Terms of Service mentioned "research," this does not constitute the kind of specific, voluntary, informed consent required by research ethics standards. The study was published in PNAS, and the statistical analysis was technically sound — the problems were ethical, not methodological.
11. A dataset has names removed but retains date of birth, zip code, and gender. According to Latanya Sweeney's research, approximately what percentage of the U.S. population can be uniquely identified by these three fields?
(a) 15%
(b) 45%
(c) 67%
(d) 87%
Answer
**(d) 87%.** Sweeney's landmark 1997 research demonstrated that 87% of the U.S. population can be uniquely identified using just three fields: date of birth, five-digit zip code, and gender. She famously re-identified Massachusetts Governor William Weld's medical records from an "anonymized" hospital dataset by linking these three variables to publicly available voter registration data. This finding fundamentally changed how we think about data anonymization.
12. Under GDPR, an organization that violates data privacy regulations can face penalties of up to:
(a) $10,000 per violation
(b) $1 million total
(c) 4% of global annual revenue
(d) 10% of domestic revenue
Answer
**(c) 4% of global annual revenue.** GDPR penalties can reach up to 4% of an organization's global annual revenue or 20 million euros, whichever is higher. This is significantly more severe than CCPA penalties (up to $7,500 per intentional violation). The magnitude of GDPR penalties reflects the EU's position that data privacy is a fundamental right, not merely a regulatory compliance issue.
13. Which ethical framework focuses on producing the greatest good for the greatest number?
(a) Rights-based (deontological) ethics
(b) Care ethics
(c) Utilitarian ethics
(d) Virtue ethics
Answer
**(c) Utilitarian ethics.** Utilitarianism, associated with philosophers Jeremy Bentham and John Stuart Mill, evaluates actions based on their consequences — specifically, whether they maximize overall well-being. In data ethics, a utilitarian approach might support using an imperfect algorithm if it produces better outcomes overall than the alternative, even if some individuals bear disproportionate costs.
14. A rights-based ethicist would most likely object to predictive policing algorithms because:
(a) The algorithms are not accurate enough
(b) The algorithms violate individuals' right to be judged as individuals, not as members of statistical groups
(c) The algorithms are too expensive to implement
(d) The algorithms don't produce statistically significant predictions
Answer
**(b) The algorithms violate individuals' right to be judged as individuals, not as members of statistical groups.** Rights-based (deontological) ethics holds that certain rights are inviolable regardless of consequences. The right to individual assessment means that a person should not be detained because *people with similar characteristics* have a statistical tendency to reoffend. Even if the algorithm improves overall prediction accuracy, a rights-based ethicist would argue that it violates the dignity of individuals who are judged by group membership rather than personal behavior.
15. Which of the following practices is the most effective solution to both p-hacking and HARKing?
(a) Using a smaller significance level (alpha = 0.01)
(b) Pre-registering the analysis plan before data collection
(c) Running more statistical tests to confirm the result
(d) Reporting only statistically significant results
Answer
**(b) Pre-registering the analysis plan before data collection.** Pre-registration requires researchers to publicly commit to their hypotheses, analysis methods, and primary outcomes before collecting (or analyzing) data. This prevents p-hacking (because the analysis plan is fixed) and HARKing (because the hypotheses are documented before results are known). Option (a) reduces false positives but doesn't prevent the underlying practices. Option (c) compounds the problem. Option (d) is publication bias — the opposite of a solution.
16. A hospital compares recovery rates for two treatments. Treatment A has a higher overall recovery rate, but Treatment B has a higher recovery rate in every age group. This is an example of:
(a) The ecological fallacy
(b) Cherry-picking
(c) Simpson's paradox
(d) HARKing
Answer
**(c) Simpson's paradox.** The reversal of the trend (Treatment A better overall, Treatment B better in every subgroup) is the defining feature of Simpson's paradox. The confounding variable is age — if Treatment A is disproportionately used on younger, healthier patients (who recover regardless of treatment), while Treatment B is used on older, sicker patients, the aggregate data creates a misleading picture.
17. "Correlation vs. causation is an ethical imperative" means:
(a) It is unethical to compute correlations
(b) Presenting correlations as causal claims can lead to harmful decisions about real people
(c) Only causal studies should be published
(d) Correlations are always misleading
Answer
**(b) Presenting correlations as causal claims can lead to harmful decisions about real people.** Correlations are valuable statistical tools. The ethical imperative is to be honest about what they do and don't prove. When a policymaker says "poverty causes poor health" based on a cross-sectional correlation, they may implement policies that blame individuals rather than addressing systemic causes. When a news headline claims "violent video games cause aggression" based on r = 0.15, school boards may ban games rather than address the actual factors affecting student well-being.
18. The Tuskegee Syphilis Study (1932-1972) led directly to the creation of:
(a) The p-value
(b) The Central Limit Theorem
(c) The Belmont Report and modern IRB requirements
(d) GDPR
Answer
**(c) The Belmont Report and modern IRB requirements.** The exposure of the Tuskegee study in 1972 led to the National Research Act of 1974, which created the National Commission for the Protection of Human Subjects. That commission produced the Belmont Report (1979), establishing the ethical principles of Respect for Persons, Beneficence, and Justice. The subsequent Common Rule (1981) mandated IRB review for all federally funded human subjects research.
19. The replication crisis refers to:
(a) The inability to copy datasets between computers
(b) The finding that many published scientific results cannot be reproduced by other researchers
(c) The high cost of running multiple clinical trials
(d) The difficulty of writing clear statistical reports
Answer
**(b) The finding that many published scientific results cannot be reproduced by other researchers.** The Open Science Collaboration's 2015 attempt to replicate 100 psychology experiments found that only 36% of replications achieved statistical significance. Similar replication rates have been found in cancer biology and other fields. The crisis is driven by underpowered studies, publication bias, p-hacking, and binary threshold thinking — factors we discussed in Chapters 13, 17, and 27.
20. You discover a significant correlation in your data during exploratory analysis. The most ethical way to report this finding is:
(a) Present it as your primary hypothesis
(b) Delete the analysis and pretend you never saw it
(c) Label it as an exploratory finding that needs confirmatory replication
(d) Run additional tests until it becomes non-significant
Answer
**(c) Label it as an exploratory finding that needs confirmatory replication.** Exploratory findings are legitimate and worth reporting, but presenting one as a pre-planned hypothesis is HARKing, and deleting it wastes information. The honest approach is to report the finding transparently, label it as exploratory, and treat it as a hypothesis to be tested in a separate, ideally pre-registered, confirmatory study.
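The two quantitative ideas in this quiz — the family-wise error rate from question 7 and the subgroup reversal from questions 1 and 16 — can be checked with a short Python sketch. The admission counts below are made up for illustration; they are not the actual Berkeley figures.

```python
# Question 7: family-wise error rate for 20 independent tests at alpha = 0.05.
alpha, n_tests = 0.05, 20
fwer = 1 - (1 - alpha) ** n_tests
print(f"expected false positives: {alpha * n_tests:.0f}")  # prints 1
print(f"family-wise error rate:   {fwer:.0%}")             # prints 64%

# Questions 1 and 16: Simpson's paradox with hypothetical counts, chosen so
# women out-admit men in every department yet trail in the aggregate.
# Format: department -> (admitted, applicants)
women = {"competitive": (15, 100), "easy": (9, 10)}
men = {"competitive": (1, 10), "easy": (80, 100)}

for dept in ("competitive", "easy"):
    wa, wn = women[dept]
    ma, mn = men[dept]
    print(f"{dept}: women {wa / wn:.0%} vs men {ma / mn:.0%}")

# Aggregating erases the department (the lurking variable) and flips the trend.
w_overall = sum(a for a, _ in women.values()) / sum(n for _, n in women.values())
m_overall = sum(a for a, _ in men.values()) / sum(n for _, n in men.values())
print(f"overall: women {w_overall:.0%} vs men {m_overall:.0%}")  # 22% vs 74%
```

Because most women apply to the competitive department (low admission rate for everyone) and most men to the easy one, women win within each department (15% vs 10%, 90% vs 80%) but lose overall — exactly the reversal the paradox describes.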