Key Takeaways: Lies, Damn Lies, and Statistics: Ethical Data Practice
One-Sentence Summary
Ethical data practice requires recognizing that statistics can mislead without fabrication (through cherry-picking, Simpson's paradox, and the ecological fallacy), that research integrity depends on distinguishing confirmatory from exploratory analysis (preventing p-hacking and HARKing), that data privacy and informed consent are harder to achieve than most people think, and that every data-driven decision embeds value judgments about whose welfare matters — judgments that demand transparency, accountability, and input from affected communities.
Core Concepts at a Glance
| Concept | Definition | Why It Matters |
| --- | --- | --- |
| Simpson's paradox | A trend in aggregated data that reverses when broken into subgroups | Data can tell opposite stories at different levels — always check both |
| Ecological fallacy | Drawing conclusions about individuals from group-level data | Group statistics don't describe individual people |
| Cherry-picking | Selecting data, ranges, or subgroups that support your conclusion | True statistics can tell false stories through selective presentation |
| P-hacking | Trying multiple analyses until a significant result appears | Inflates false positive rates far beyond the nominal alpha level |
| HARKing | Presenting post-hoc discoveries as pre-specified hypotheses | Misrepresents the discovery process and overstates evidence |
| Informed consent | Participants' knowing agreement to participate in research | Respects individual autonomy and protects against exploitation |
| Re-identification risk | The ability to identify individuals from "anonymized" data | 87% of Americans can be identified by birth date, zip code, and gender |
| Fairness impossibility | Equal calibration, equal FPR, and equal FNR cannot all be achieved simultaneously | Every algorithm embeds a value judgment about which fairness matters |
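The fairness impossibility result can be made concrete with a small numeric sketch (the confusion-matrix counts below are invented for illustration, not from the chapter): when two groups have different base rates, a classifier that is equally calibrated (equal PPV) and has equal false negative rates for both groups is forced to have unequal false positive rates.

```python
def rates(tp, fp, fn, tn):
    """PPV (calibration), false positive rate, false negative rate."""
    return tp / (tp + fp), fp / (fp + tn), fn / (fn + tp)

# Group A: 100 people, base rate 0.5 (50 true positives exist)
ppv_a, fpr_a, fnr_a = rates(tp=40, fp=10, fn=10, tn=40)
# Group B: 100 people, base rate 0.2 (only 20 true positives exist)
ppv_b, fpr_b, fnr_b = rates(tp=16, fp=4, fn=4, tn=76)

print("PPV:", ppv_a, ppv_b)  # equal (0.8): both groups equally calibrated
print("FNR:", fnr_a, fnr_b)  # equal (0.2): equal miss rates
print("FPR:", fpr_a, fpr_b)  # UNEQUAL (0.2 vs 0.05): forced by the base rates
```

With the first two fairness criteria pinned down, the third is fully determined by each group's base rate, which is why "which fairness metric to equalize" is a value judgment rather than a technical choice.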
Simpson's Paradox
| Element | Description |
| --- | --- |
| What it is | A trend that reverses when data is disaggregated into subgroups |
| Classic example | UC Berkeley admissions: women had lower overall admission rates but higher rates in most departments |
| Why it happens | A confounding variable is unevenly distributed across comparison groups |
| Ethical implication | Both the aggregate and disaggregated stories are "true" — choosing which to present is an ethical decision |
| The fix | Always check both aggregate and subgroup data; report both; be transparent about the level of analysis |
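The reversal can be reproduced with a few lines of arithmetic. The counts below are hypothetical numbers in the Berkeley pattern (not the actual admissions data): women out-admit men within each department, yet trail overall, because most women applied to the harder department.

```python
# Hypothetical (applicants, admits) counts per (department, gender).
data = {
    ("Easy", "M"): (800, 480), ("Easy", "F"): (100, 70),
    ("Hard", "M"): (100, 10),  ("Hard", "F"): (800, 160),
}

def overall_rate(gender):
    applied = sum(a for (d, g), (a, m) in data.items() if g == gender)
    admits = sum(m for (d, g), (a, m) in data.items() if g == gender)
    return admits / applied

# Aggregate level: men appear far ahead ...
print("overall M:", round(overall_rate("M"), 3))  # 0.544
print("overall F:", round(overall_rate("F"), 3))  # 0.256

# ... yet women have the higher admission rate in BOTH departments.
for dept in ("Easy", "Hard"):
    for gender in ("M", "F"):
        applied, admits = data[(dept, gender)]
        print(dept, gender, admits / applied)
```

The confounder is which department each group applied to: 800 of the 900 women applied to the department that admits few people of either gender.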
Questionable Research Practices
| Practice | What It Is | Why It's Wrong | The Fix |
| --- | --- | --- | --- |
| P-hacking | Trying multiple analyses until p < 0.05 | Inflates false positive rate (64% with 20 tests) | Pre-register analysis plan |
| HARKing | Presenting post-hoc findings as hypotheses | Misrepresents evidence strength | Label exploratory analyses as such |
| Cherry-picking | Selecting supportive data, ignoring contradictory data | Creates misleading impression from true facts | Report all analyses; justify any restrictions |
| Optional stopping | Checking significance repeatedly and stopping when p < 0.05 | Inflates Type I error beyond nominal alpha | Pre-specify sample size |
| Selective reporting | Reporting only significant results | File drawer problem; biases the literature | Report all results including null findings |
| Flexible outlier removal | Removing outliers only when they hurt results | Distorts data to match hypothesis | Pre-specify outlier criteria |
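The 64% figure in the p-hacking row falls straight out of the multiplication rule: if all 20 tests are independent and every null hypothesis is true, the chance of at least one false positive is one minus the chance that all 20 stay non-significant.

```python
# Family-wise false positive rate for k independent tests at alpha = 0.05,
# assuming every null hypothesis is true:
# P(at least one p < alpha) = 1 - (1 - alpha)^k
alpha = 0.05
for k in (1, 5, 20):
    print(k, "tests ->", round(1 - (1 - alpha) ** k, 2))
# 1 test keeps the nominal 0.05; 20 tests push it to 0.64
```

The same logic explains why optional stopping inflates Type I error: each extra peek at the data is another chance for a false positive.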
Ethical Frameworks for Data Practice
| Framework | Core Question | Applied to James's Algorithm |
| --- | --- | --- |
| Utilitarian | Which choice produces the greatest total good? | Use if total errors decrease, even if one group bears more cost |
| Rights-based | Does this respect every individual's fundamental rights? | Reject if it violates the right to individual assessment |
| Care ethics | What response best serves the most vulnerable? | Modify to protect communities historically harmed by the justice system |
Research Ethics Timeline
| Year | Event | Significance |
| --- | --- | --- |
| 1932 | Tuskegee Syphilis Study begins | 399 Black men denied treatment for 40 years |
| 1972 | Tuskegee exposed by journalist | Led to public outrage and policy reform |
| 1974 | National Research Act | Created National Commission for human subjects protection |
| 1979 | Belmont Report | Established three principles: Respect, Beneficence, Justice |
| 1981 | 45 CFR 46 regulations (later the Common Rule) | Required IRB review for federally funded research |
| 2014 | Facebook emotional contagion study published | Manipulated 689K users' emotions without consent |
| 2015 | Open Science Collaboration | Only 36% of psychology findings replicated |
| 2018 | GDPR takes effect | EU data privacy regulation with major penalties |
| 2020 | CCPA takes effect | California data privacy regulation |
Data Privacy
| Concept | Key Point |
| --- | --- |
| Re-identification | Removing names is not enough; date of birth + zip code + gender can identify 87% of Americans |
| Netflix attack | Narayanan and Shmatikov re-identified "anonymous" users from movie ratings |
| GDPR | Opt-in consent, right to deletion, up to 4% of global revenue in penalties |
| CCPA | Opt-out model, right to know and delete, up to $7,500 per intentional violation |
| The lesson | Any dataset with enough variables can potentially be linked to external information |
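The re-identification point can be sketched as a simple uniqueness (k-anonymity style) check. The records below are invented: names are gone, but any combination of quasi-identifiers that appears only once is a potential re-identification target via linkage to external data such as a voter roll.

```python
from collections import Counter

# Invented "anonymized" records: (birth year, zip code, gender).
records = [
    ("1987", "02139", "F"), ("1987", "02139", "F"),
    ("1990", "02139", "M"), ("1975", "94110", "F"),
    ("1990", "60614", "M"),
]

# Count how often each quasi-identifier combination occurs;
# combinations with a count of 1 are unique, hence linkable.
counts = Counter(records)
unique_rows = sum(c for c in counts.values() if c == 1)
print(f"{unique_rows} of {len(records)} records are unique on these three fields")
```

On real datasets the same check, run over birth date rather than birth year, is what produces Sweeney-style results like the 87% figure above.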
The "Lying with True Statistics" Checklist
Ask yourself before every analysis:
| Check | Question |
| --- | --- |
| Cherry-picking | Would my conclusion change if I used all the available data? |
| Denominator games | Am I reporting both absolute and relative numbers? |
| Aggregation effects | Could this trend reverse at a different level of analysis? |
| Survivorship bias | Am I only looking at the "winners"? |
| Correlation → causation | Am I implying a causal relationship that my study design can't support? |
| Missing context | Would someone who disagreed with me say I presented the data fairly? |
Perspective-Taking Framework
For any data-driven decision, consider:
| Stakeholder | Question |
| --- | --- |
| The analyst | What are my incentives? Am I under pressure? |
| The decision-maker | Who is using this? What decisions will they make? |
| The subjects | Whose data is this? Did they consent? |
| The affected community | Who will be impacted? Were they consulted? |
| The absent voices | Who is NOT in the data? |
| Future users | How might this data be used in ways I didn't intend? |
Personal Code of Statistical Ethics (Template)
| Domain | Principle |
| --- | --- |
| Collecting data | Obtain informed consent; be transparent about purpose |
| Analyzing data | Pre-register confirmatory analyses; report all results |
| Reporting results | Include effect sizes and CIs; acknowledge limitations |
| Making decisions | Consider who might be harmed; seek affected perspectives |
| Will not | Cherry-pick; present correlations as causal; suppress null results |
Key Python Code
Simpson's Paradox Detector
```python
import pandas as pd

def check_simpsons_paradox(df, outcome, group, stratify_by):
    """
    Compare the aggregate trend with the trend inside each stratum.
    Assumes `group` has exactly two levels. Returns True when more
    than half of the strata show a trend opposite to the aggregate
    trend, i.e. a Simpson's-paradox-style reversal.
    """
    # Aggregate trend: difference in mean outcome between the two groups
    agg = df.groupby(group)[outcome].mean()
    groups = sorted(agg.index)
    agg_diff = agg[groups[1]] - agg[groups[0]]

    # Count strata whose within-stratum trend points the other way
    reversal_count = 0
    for stratum in df[stratify_by].unique():
        subset = df[df[stratify_by] == stratum]
        strat_means = subset.groupby(group)[outcome].mean()
        strat_diff = strat_means[groups[1]] - strat_means[groups[0]]
        if (agg_diff > 0 and strat_diff < 0) or \
           (agg_diff < 0 and strat_diff > 0):
            reversal_count += 1

    return reversal_count > len(df[stratify_by].unique()) / 2
```
Common Mistakes
| Mistake | Correction |
| --- | --- |
| "The aggregate trend tells the whole story" | Always check for Simpson's paradox by stratifying |
| "The data is anonymized so privacy is protected" | Re-identification is possible with surprisingly few fields |
| "I found it in the data, so it must be real" | Exploratory findings need confirmatory replication |
| "p < 0.05 after my third analysis" | Multiple testing inflates false positive rates |
| "The algorithm is objective, so it's fair" | Algorithms inherit the biases in their training data |
| "Correlation = causation in this case" | Study design determines whether causal claims are justified |
| "More accurate overall = better for everyone" | Aggregate accuracy can mask group-level unfairness |
| "I removed names, so consent doesn't matter" | Using data for purposes beyond the original consent is ethically problematic |
Connections
| Connection | Details |
| --- | --- |
| Ch.4 (Study design) | Informed consent and IRB introduced; deepened here with Tuskegee and modern cases |
| Ch.13 (Hypothesis testing) | P-hacking introduced; deepened here as ethical violation, not just methodological error |
| Ch.17 (Power and effect sizes) | Publication bias and replication crisis; deepened here as systemic ethical failure |
| Ch.22 (Correlation) | Correlation vs. causation; reframed here as ethical imperative, not just statistical principle |
| Ch.23 (Multiple regression) | Simpson's paradox introduced with kidney stones; given full ethical treatment here |
| Ch.25 (Communication) | Misleading graphs; reframed here as ethical violations, not just technical errors |
| Ch.28 (Journey continues) | Personal code of ethics carries forward into all future data work |