Exercises: Lies, Damn Lies, and Statistics: Ethical Data Practice

These exercises progress from identifying ethical violations in data practice through applying ethical frameworks and analyzing real-world dilemmas to developing a personal code of statistical ethics. Estimated completion time: 3 hours.

Difficulty Guide:
- ⭐ Foundational (5-10 min each)
- ⭐⭐ Intermediate (10-20 min each)
- ⭐⭐⭐ Challenging (20-40 min each)
- ⭐⭐⭐⭐ Advanced/Research (40+ min each)


Part A: Conceptual Understanding ⭐

A.1. In your own words, explain Simpson's paradox. Give an example (real or hypothetical) that is different from the ones in the chapter.

A.2. True or false (explain each):

(a) If women are admitted at a lower rate than men overall, there must be gender discrimination in at least one department.

(b) The ecological fallacy involves drawing conclusions about groups based on individual-level data.

(c) P-hacking always involves fabricating data.

(d) HARKing is the practice of forming hypotheses after seeing the results of an analysis.

(e) A study can be both technically well-designed and ethically indefensible.

(f) Removing someone's name from a dataset guarantees their privacy.

A.3. Explain the difference between cherry-picking and HARKing. How are they related?

A.4. What are the three principles of the Belmont Report? For each, give an example of how it applies to statistical research.

A.5. Explain why "the risk doubles" can be both technically true and deeply misleading. Under what circumstances would this statement be informative, and under what circumstances would it be deceptive?
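
To make the relative/absolute distinction in A.5 concrete, here is a minimal sketch; the 1-in-100,000 baseline is an invented number for illustration, not from the chapter:

```python
from fractions import Fraction

# Hypothetical numbers: a rare baseline risk that "doubles" with exposure.
baseline = Fraction(1, 100_000)   # 1 case per 100,000 people
exposed = Fraction(2, 100_000)    # 2 cases per 100,000 people

relative_risk = exposed / baseline        # "the risk doubles"
absolute_increase = exposed - baseline    # 1 extra case per 100,000 people

print(f"relative risk: {float(relative_risk)}")                 # 2.0
print(f"absolute increase: {float(absolute_increase):.6%}")
```

The relative risk is exactly 2, yet the absolute increase is one case per 100,000 people; whether "the risk doubles" informs or deceives depends on which of these two numbers the audience is led to imagine.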

A.6. What is the difference between exploratory and confirmatory analysis? Why is it ethically important to distinguish between them?


Part B: Identifying Ethical Violations ⭐

B.1. For each scenario, identify the ethical violation(s) and suggest how the situation should be handled:

(a) A pharmaceutical company runs three clinical trials for a new drug. Two show no significant effect. One shows a significant effect (p = 0.04). The company publishes only the third trial.

(b) A researcher hypothesizes that a new therapy reduces anxiety. The therapy doesn't reduce anxiety (p = 0.34), but it does reduce insomnia (p = 0.02). The researcher writes: "We hypothesized that the therapy would reduce insomnia."

(c) A school district reports: "Our students' test scores increased 15% this year." They do not mention that the district changed the test to an easier version.

(d) A tech company uses employees' browsing data to predict which ones are likely to quit, without telling employees their data is being analyzed.

B.2. A news article reports: "States that voted Republican have higher obesity rates than states that voted Democratic." A commentator concludes: "Republican voters are less healthy."

(a) What logical error is the commentator making?

(b) What is this error called?

(c) What additional data would you need to evaluate the commentator's claim?

B.3. A study compares a new drug to a placebo. The researchers check for significance after every 50 patients (at 50, 100, 150, 200) and stop the trial at 150 patients because p = 0.04.

(a) What is this practice called?

(b) Why does it inflate the false positive rate?

(c) What should the researchers have done instead?
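
The inflation in B.3 can be demonstrated by simulation. This is a sketch under invented assumptions (normal outcomes, two equal arms, interim looks at 50/100/150/200 patients per arm), not the chapter's own code:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_sims = 2000
looks = [50, 100, 150, 200]  # interim analyses, patients per arm (an assumption)

false_positives = 0
for _ in range(n_sims):
    # The null is true: "drug" and "placebo" outcomes share one distribution.
    drug = rng.normal(size=200)
    placebo = rng.normal(size=200)
    # Stop and declare success at the first look with p < 0.05.
    if any(stats.ttest_ind(drug[:n], placebo[:n]).pvalue < 0.05 for n in looks):
        false_positives += 1

rate = false_positives / n_sims
print(f"false-positive rate with optional stopping: {rate:.3f}")
```

Because each of the four looks offers a fresh chance to cross the 0.05 threshold, the simulated false-positive rate comes out well above the nominal 5%.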

B.4. A data analyst working for a political campaign presents only polls from the last two weeks — a period when their candidate was leading — while ignoring polls from the previous month. Is this cherry-picking? Why or why not?

B.5. For each claim, identify whether it involves cherry-picking, misleading denominators, survivorship bias, or the ecological fallacy:

(a) "90% of successful entrepreneurs dropped out of college." (from a study of billionaires)

(b) "Crime increased 200% in our city." (from 1 incident to 3 incidents)

(c) "Countries with higher chocolate consumption have more Nobel Prize winners per capita."

(d) "Our product has a 98% customer satisfaction rate." (from voluntary reviews)


Part C: Simpson's Paradox ⭐⭐

C.1. A hospital compares survival rates for two surgeons:

Surgeon   Patients   Survived   Rate
Dr. Kim   400        360        90%
Dr. Lee   400        340        85%

Dr. Kim looks better. But when broken down by case difficulty:

             Dr. Kim                 Dr. Lee
             Patients  Survived      Patients  Survived
Easy cases   300       285 (95%)     100       96 (96%)
Hard cases   100       75 (75%)      300       244 (81.3%)

(a) Verify the aggregate numbers from the stratified data.

(b) Which surgeon is actually better? Why does the aggregate data favor Dr. Kim?

(c) If you were a patient with a difficult case, which surgeon would you choose based on this data?

(d) Identify the confounding variable and explain how it creates Simpson's paradox here.

C.2. A company tracks promotion rates by gender:

Gender   Promoted   Not Promoted   Rate
Men      180        320            36%
Women    120        380            24%

But when broken down by department:

              Men                        Women
Department    Promoted  Total  Rate     Promoted  Total  Rate
Engineering   150       400    37.5%    20        50     40%
Marketing     30        100    30%      100       450    22.2%

(a) Verify the aggregate totals.

(b) Does Simpson's paradox apply here? Explain.

(c) A lawsuit alleges gender discrimination based on the aggregate numbers. As an expert witness, what would you testify?

(d) Even if there's no discrimination within departments, could the overall system still be unfair? How?

C.3. Write Python code using the check_simpsons_paradox() function from Section 27.12 to analyze the hospital data from C.1. Interpret the output.
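
The chapter's check_simpsons_paradox() helper from Section 27.12 is not reproduced here; as a starting point, the stratified counts from C.1 can be entered and the aggregation verified by hand with a few lines:

```python
# Hospital data from C.1, stored as (survived, total) per case-difficulty stratum.
data = {
    "Dr. Kim": {"easy": (285, 300), "hard": (75, 100)},
    "Dr. Lee": {"easy": (96, 100), "hard": (244, 300)},
}

for surgeon, strata in data.items():
    survived = sum(s for s, _ in strata.values())
    total = sum(t for _, t in strata.values())
    print(f"{surgeon} aggregate: {survived}/{total} = {survived / total:.1%}")
    for stratum, (s, t) in strata.items():
        print(f"  {stratum} cases: {s}/{t} = {s / t:.1%}")
```

The printout confirms the reversal: Dr. Lee's survival rate is higher in both strata (96% easy, 81.3% hard) yet lower in aggregate (85% vs. 90%).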


Part D: Ecological Fallacy ⭐⭐

D.1. A researcher finds that countries with higher average daily cheese consumption have higher life expectancies. She concludes that eating cheese helps you live longer.

(a) Explain why this conclusion is an example of the ecological fallacy.

(b) What confounding variables could explain the country-level correlation?

(c) What type of data would you need to actually test whether cheese consumption affects individual longevity?

D.2. A school administrator notes that schools with higher percentages of students receiving free lunch have lower average test scores. She proposes cutting the free lunch program because "it's clearly not helping."

(a) Identify the ecological fallacy in her reasoning.

(b) What alternative explanation is she missing?

(c) What data would you need to properly evaluate the free lunch program's effectiveness?

D.3. In your own words, explain why the ecological fallacy is both a statistical error and an ethical concern. Give an example where the ecological fallacy could lead to a harmful policy decision.
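
A small simulation can make the group/individual gap in Part D concrete. All numbers below are invented for illustration: within each "country," individual consumption is unrelated to lifespan, while a lurking wealth variable drives both country averages upward:

```python
import numpy as np

rng = np.random.default_rng(42)

# Five simulated countries. Within each, cheese consumption and lifespan are
# independent draws; across countries, "wealth" raises the mean of both.
groups = []
for wealth in [1, 2, 3, 4, 5]:
    cheese = rng.normal(loc=10 * wealth, scale=2, size=200)
    lifespan = rng.normal(loc=60 + 4 * wealth, scale=3, size=200)
    groups.append((cheese, lifespan))

# Individual-level association inside each country: essentially zero.
within = np.mean([np.corrcoef(c, l)[0, 1] for c, l in groups])

# Country-level association of the averages: strongly positive.
means = np.array([(c.mean(), l.mean()) for c, l in groups])
between = np.corrcoef(means[:, 0], means[:, 1])[0, 1]

print(f"mean within-country correlation: {within:.2f}")   # near 0
print(f"between-country correlation:     {between:.2f}")  # near 1
```

The aggregate correlation is nearly perfect even though no individual's lifespan depends on cheese at all, which is exactly the inference gap the ecological fallacy names.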


Part E: Character-Based Scenarios ⭐⭐

E.1. Maya is analyzing a dataset linking neighborhood environmental exposures to childhood asthma rates. She finds that three neighborhoods near an industrial facility have asthma rates 4× the county average.

(a) List three stakeholders who would be affected if Maya publishes this finding with neighborhood names.

(b) Apply the utilitarian framework: should she publish? What are the expected benefits and harms?

(c) Apply the rights-based framework: what rights are in tension?

(d) Apply the care ethics framework: who is most vulnerable, and what would best serve them?

(e) What would you recommend Maya do? Justify your recommendation using elements from at least two frameworks.

E.2. Alex's team at StreamVibe wants to run an A/B test that shows some users 30% more "outrage" content (content that generates strong negative emotions) to see if it increases engagement metrics.

(a) How is this similar to the Facebook emotional contagion study?

(b) What are the ethical concerns?

(c) Would your answer change if the test only showed less outrage content to the treatment group? Why or why not?

(d) Draft a set of ethical guidelines for A/B testing at StreamVibe (5-7 guidelines).

E.3. James discovers that the predictive policing algorithm's training data from 2010-2015 reflects a period when Riverside County had a policy of increased patrols in predominantly Black neighborhoods. This means the algorithm was trained on data that already reflected racially disproportionate policing.

(a) How does this affect the validity of the algorithm's risk scores?

(b) Is the algorithm "biased" if it accurately predicts re-arrest rates — even though those rates were inflated by differential policing?

(c) Propose two specific modifications that could reduce the algorithm's racial disparity while maintaining predictive accuracy.

E.4. Sam notices that during contract negotiations, the Raptors' front office presents players with cherry-picked statistics to justify lower salary offers. Meanwhile, players' agents present cherry-picked statistics to justify higher salaries.

(a) Is cherry-picking less unethical when "everyone does it"?

(b) As an analytics intern, Sam has been asked to prepare the front office's statistical report. What ethical obligations does Sam have?

(c) Write a brief guide (3-5 bullet points) for "ethical sports analytics in contract negotiations."


Part F: Questionable Research Practices ⭐⭐

F.1. A psychology researcher collects data on whether background music affects test performance. She measures performance on math, reading, vocabulary, and spatial reasoning. She tests each with a t-test.

(a) How many tests is she running? What is the probability of finding at least one "significant" result (p < 0.05) if music has no real effect?

(b) Is this p-hacking? Why or why not?

(c) What should she have done before collecting data?

(d) If she finds a significant effect for spatial reasoning only (p = 0.03), what should she report?
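
For part (a), the familywise error rate for several independent tests follows directly from the complement rule; a quick check for the four tests in F.1:

```python
# Probability of at least one false positive across m independent tests
# when each uses threshold alpha and every null hypothesis is true.
alpha = 0.05
m = 4  # math, reading, vocabulary, spatial reasoning
p_at_least_one = 1 - (1 - alpha) ** m
print(f"P(at least one p < {alpha}): {p_at_least_one:.3f}")  # 0.185
```

So even with no real effect anywhere, roughly one run in five of this four-test design produces a "significant" result.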

F.2. Classify each practice as: (A) acceptable, (B) questionable but not fraudulent, or (C) clearly unethical. Explain your reasoning.

(a) Removing three data points that are more than 3 standard deviations from the mean, as pre-specified in the analysis plan.

(b) Removing three data points that are more than 3 standard deviations from the mean, decided after seeing that they make the result non-significant.

(c) Reporting p = 0.052 as "marginally significant."

(d) Conducting both a t-test and a Mann-Whitney test and reporting whichever gives the lower p-value.

(e) Pre-registering a study and then reporting the results exactly as planned, even though the result is p = 0.72.

(f) Running a study on 200 participants, finding p = 0.08, and then collecting 100 more participants to "increase power."

F.3. Explain why pre-registration addresses p-hacking and HARKing but does not address publication bias. What additional reforms are needed to address publication bias?


Part G: Ethical Frameworks ⭐⭐⭐

G.1. A health insurance company wants to use customers' social media posts to predict health risks and adjust premiums. Customers with posts mentioning unhealthy behaviors (smoking, excessive drinking, sedentary lifestyle) would pay higher premiums.

(a) Evaluate this practice using the utilitarian framework.

(b) Evaluate it using the rights-based framework.

(c) Evaluate it using the care ethics framework.

(d) What is your overall assessment? Should the company proceed?

G.2. A university discovers that its predictive model for student success (used for advising and early intervention) performs less accurately for first-generation college students than for students whose parents attended college. The model helps identify at-risk students for tutoring and support.

(a) Should the university continue using the model? Apply all three ethical frameworks.

(b) Is it worse to use a biased model that helps some students or to use no model at all?

(c) What steps could the university take to address the disparity while still providing early intervention?

G.3. A city government wants to publish detailed crime statistics by neighborhood, including specific crime types, times, and locations. The data would help residents make safety decisions but could also stigmatize certain neighborhoods and affect property values.

(a) Identify all stakeholders.

(b) What level of geographic detail is appropriate? (City-wide? District? Block?)

(c) How does this relate to the ecological fallacy?

(d) Draft a policy for how the data should be published (3-5 specific guidelines).


Part H: Data Privacy ⭐⭐⭐

H.1. A researcher is given a dataset of 50,000 hospital records with the following fields removed: name, Social Security number, and address. The dataset still includes: date of birth, zip code, gender, diagnosis, treatment, and length of stay.

(a) Explain why this dataset is not truly anonymous. What attack could re-identify individuals?

(b) How many people in the United States can be uniquely identified by their date of birth, zip code, and gender? (Latanya Sweeney's research found that approximately 87% of the U.S. population is uniquely identified by these three fields.)

(c) What additional steps should the researcher take to protect privacy?
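
The re-identification risk in H.1 is usually framed as k-anonymity: the size of the smallest group of records sharing the same quasi-identifier combination. A minimal sketch with invented records (the specific birthdates and zip codes are hypothetical):

```python
from collections import Counter

# Hypothetical records; (date_of_birth, zip_code, gender) are quasi-identifiers.
records = [
    ("1985-03-14", "60637", "F"),
    ("1985-03-14", "60637", "F"),
    ("1990-07-01", "60615", "M"),
    ("1972-11-30", "60615", "F"),
]

counts = Counter(records)
k = min(counts.values())  # k-anonymity: smallest quasi-identifier group size
unique = [rec for rec, c in counts.items() if c == 1]
print(f"k-anonymity level: k = {k}")
print(f"combinations held by exactly one person: {len(unique)}")
```

Any record whose quasi-identifier combination is unique (k = 1) can be matched against an outside dataset, such as a voter roll, which is the linkage attack behind Sweeney's 87% figure.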

H.2. Compare GDPR and CCPA on the following dimensions:

(a) Consent: How does each law handle consent for data collection?

(b) Right to deletion: How does each law handle requests to delete personal data?

(c) Scope: Which law is broader in its protections?

(d) Enforcement: How are violations penalized?

H.3. A fitness tracking company wants to sell aggregated health data (average heart rates, sleep patterns, activity levels by zip code) to pharmaceutical companies for market research. Individual users consented to data collection for "improving the service" but not specifically for third-party sales.

(a) Is the company violating informed consent? Why or why not?

(b) Under GDPR, would this be legal? Under CCPA?

(c) What ethical obligation does the company have, regardless of legality?


Part I: Comprehensive Analysis ⭐⭐⭐⭐

I.1. (Full case analysis — 40+ minutes)

In 2018, Amazon scrapped an AI recruiting tool after discovering it was biased against women. The tool was trained on resumes submitted to Amazon over a 10-year period — and since most of those resumes came from men (reflecting the male-dominated tech industry), the algorithm learned to penalize resumes that included the word "women's" (as in "women's chess club captain") and to downgrade graduates of all-women's colleges.

(a) What type of bias is this? How did it enter the algorithm?

(b) Is the algorithm "wrong" if it accurately reflects historical hiring patterns?

(c) Apply all three ethical frameworks (utilitarian, rights-based, care ethics) to the decision to scrap the tool.

(d) How does this case relate to Simpson's paradox? (Hint: think about aggregate vs. department-level hiring patterns.)

(e) Draft a 500-word policy brief recommending how companies should evaluate AI hiring tools for bias. Include at least three specific statistical checks.

I.2. (Debate preparation — 40+ minutes)

Prepare for the debate outlined in Section 27.10: "Is it ethical to use statistical models to make consequential decisions about individuals?"

(a) Write a 300-word argument for the pro side.

(b) Write a 300-word argument for the con side.

(c) Write a 200-word synthesis that identifies conditions under which algorithmic decision-making is and is not ethically acceptable.

(d) What role should affected communities play in deciding whether algorithms are used to make decisions about them?

I.3. (Research and reflection — 40+ minutes)

Research one of the following cases and write a 600-word ethical analysis:

(a) The Cambridge Analytica/Facebook scandal (2018) — political data misuse

(b) The COMPAS recidivism algorithm analyzed by ProPublica (2016) — algorithmic fairness

(c) The Henrietta Lacks case — use of biological data without consent

(d) The Target pregnancy prediction algorithm — commercial data inference

For your chosen case:
- Summarize the facts (200 words)
- Identify the ethical violations using concepts from this chapter (200 words)
- Apply at least two ethical frameworks to evaluate the situation (200 words)
- Propose specific reforms that would prevent similar issues (bonus: 100 words)