datasets with millions or billions of observations — carries an almost magical aura. More data sounds like it should always be better. More data means smaller standard errors, tighter confidence intervals, and more power to detect effects. All of which is true — *when the data is representative*. → Chapter 26: Statistics and AI: Being a Critical Consumer of Data
$P(\text{illness} \mid \text{smoker})$ = "Among smokers, what's the probability of illness?" - $P(\text{smoker} \mid \text{illness})$ = "Among those with illness, what's the probability of being a smoker?" - $P(\text{pass} \mid \text{studied})$ = "Among students who studied, what's the probability of passing?" → Chapter 9: Conditional Probability and Bayes' Theorem
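The direction of conditioning can be sketched with made-up counts (all numbers here are invented for illustration, not from the chapter): the only thing that changes is the denominator.

```python
# Invented counts for illustration: 1,000 people cross-classified
smokers, nonsmokers = 400, 600
ill_smokers, ill_nonsmokers = 80, 30

# P(illness | smoker): the denominator is the smokers
p_ill_given_smoker = ill_smokers / smokers

# P(smoker | illness): the denominator is everyone with the illness
p_smoker_given_ill = ill_smokers / (ill_smokers + ill_nonsmokers)

print(p_ill_given_smoker)            # 0.2
print(round(p_smoker_given_ill, 3))  # 0.727
```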
"Holding other variables constant"
the conceptual leap of controlling for confounders statistically. This connects to the entire course: confounding (Ch. 4), correlation vs. causation (Ch. 22), and the logic of statistical control. Students who master this idea understand why observational data with regression can approximate (but no → Chapter 23: Multiple Regression — Instructor Notes
"P-value explained properly"
fully delivered in Sections 13.5-13.6 - ✅ **"What 'statistically significant' means"** — fully delivered in Section 13.7 - 🔄 **"Daria's shooting analysis"** — partially resolved (formal test, $z = 1.22$, $p = 0.111$, fail to reject at $\alpha = 0.05$; full two-sample test framework in Chapter 16; po → Chapter 13: Hypothesis Testing: Making Decisions with Data
"The probability of WHAT given WHAT?"
Identify which direction the conditional goes. - [ ] **"What's the base rate?"** — How common is this event *before* considering the evidence? - [ ] **"What's the alternative?"** — You must compare competing explanations, not evaluate one in isolation. - [ ] **"Was independence assumed?"** — If prob → Key Takeaways: Conditional Probability and Bayes' Theorem
"This treatment cured 90% of patients"
but the 10% who died weren't tracked, or the patients who were too sick to participate were excluded from the study. - **"These schools have a 100% college acceptance rate"** — because students who wouldn't get in were counseled out before applying. → Chapter 27: Lies, Damn Lies, and Statistics: Ethical Data Practice
the proportion of variability in $y$ explained by $x$. It's the regression analogue of $\eta^2$ from ANOVA (Chapter 20). Same concept, same formula, different context. → Chapter 22: Correlation and Simple Linear Regression
(a) Means:
Player A: $(20+22+19+21+20+23+18+21)/8 = 164/8 = 20.5$ - Player B: $(15+18+20+22+25+20+28+12)/8 = 160/8 = 20.0$ - Player C: $(30+10+25+5+35+15+20+20)/8 = 160/8 = 20.0$ → Quiz: Numerical Summaries — Center, Spread, and Shape
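The arithmetic above is easy to check in Python; adding the population standard deviation shows why players with (nearly) identical means can still be very different. The values are the quiz's own data:

```python
from statistics import mean, pstdev

players = {
    "A": [20, 22, 19, 21, 20, 23, 18, 21],
    "B": [15, 18, 20, 22, 25, 20, 28, 12],
    "C": [30, 10, 25, 5, 35, 15, 20, 20],
}
for name, scores in players.items():
    # Same (or nearly the same) center, very different spread
    print(name, mean(scores), round(pstdev(scores), 2))
```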
Created the National Commission for the Protection of Human Subjects - **1979: Belmont Report** — Established three core principles: - **Respect for persons** — Individuals must be treated as autonomous agents; those with diminished autonomy deserve additional protection - **Beneficence** — Research → Chapter 27: Lies, Damn Lies, and Statistics: Ethical Data Practice
How many observational units are in your dataset? - How many variables are there? How many are numerical? How many are categorical? - Are there any missing values? Which columns have them? - Does pandas correctly identify the variable types, or are some categorical variables stored as numbers? - Wha → Chapter 3: Your Data Toolkit: Python, Excel, and Jupyter Notebooks
Test name: ___________________________________ - Test statistic formula: ___________________________________ - Observed test statistic value: ___________ - Degrees of freedom (if applicable): ___________ - p-value: ___________ → Appendix E: Templates and Worksheets
6. CI Formula:
Formula: point estimate +/- (critical value) x (standard error) - Standard error formula: ___________________________________ - Standard error value: ___________ - Critical value (z* or t*): ___________ - Degrees of freedom (if t): ___________ - Margin of error: ___________ → Appendix E: Templates and Worksheets
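The worksheet formula can be sketched in a few lines of Python (summary numbers invented; with $s$ estimated, the critical value comes from the t-distribution):

```python
import math
from scipy import stats

# Hypothetical summary statistics, not from the book: n = 25, xbar = 50, s = 10
n, xbar, s = 25, 50.0, 10.0

se = s / math.sqrt(n)                  # standard error
t_star = stats.t.ppf(0.975, df=n - 1)  # critical value for 95% confidence
moe = t_star * se                      # margin of error
ci = (xbar - moe, xbar + moe)          # point estimate +/- margin of error
print(ci)
```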
Effect size measure: _______________ Value: ___________ - Is the effect practically meaningful? _________________________________________ - 95% CI for the parameter: ( ___________ , ___________ ) → Appendix E: Templates and Worksheets
pandas thinks it's a number, but it's actually a nominal categorical variable. You can't calculate the "average zip code" (as we saw in Chapter 2's case study on electronic health records). Maya makes a mental note not to include it in any numerical summaries. → Case Study: Exploring Public Health Data with pandas — Dr. Chen's Flu Surveillance
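In pandas, the usual fix is to store identifier-like columns as strings (or a categorical dtype). A minimal sketch with invented zip codes; note that numeric storage has also silently dropped a leading zero:

```python
import pandas as pd

# Invented values: read as integers, 02116 has already lost its leading zero
df = pd.DataFrame({"zip": [2116, 90210, 10001]})

# Store as 5-character strings so no one averages them by accident
df["zip"] = df["zip"].astype(str).str.zfill(5)
print(df["zip"].tolist())   # ['02116', '90210', '10001']
```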
The value of a standardized decision framework - P-values have clear meaning when used correctly - The problem is misuse, not the tool itself - No replacement would be immune to misuse → Exercises: Hypothesis Testing: Making Decisions with Data
extending the two-group comparison from Chapter 16 to three or more groups. The bootstrap and permutation ideas from this chapter can also be applied to multi-group comparisons, providing a useful robustness check on ANOVA results. → Further Reading: The Bootstrap and Simulation-Based Inference
Appendix Types Selected:
Statistical tables (Category A + stats focus) - Python code reference (CODE_LANGUAGE = Python) - Environment setup guide (CODE_LANGUAGE ≠ none) - Data sources guide (Category A + B with data analysis) - Templates & worksheets (Category C) - FAQ / Troubleshooting (Category A + C) - Key studies summar → Introductory Statistics: Making Sense of Data in the Age of AI
Apply the addition and multiplication rules
these are the computational workhorses of probability that students will use through Chapter 10 and beyond. 2. **Distinguish between independent and mutually exclusive events** — students frequently confuse these, and the confusion persists if not addressed directly. 3. **Construct and interpret two → Chapter 8: Probability Foundations — Instructor Notes
Arguments against:
The model systematically over-predicts risk for Black defendants - No individual should be punished because of *statistical patterns* in their demographic group - The training data reflects historical policing patterns, which themselves reflect structural racism - A tool that is 85% accurate overall → Chapter 27: Lies, Damn Lies, and Statistics: Ethical Data Practice
Arguments for using the algorithm:
It's more consistent than individual judges, who also have biases - It provides a structured framework for decisions that were previously subjective - The overall accuracy is high - Not using data means falling back on intuition, which has its own biases → Chapter 27: Lies, Damn Lies, and Statistics: Ethical Data Practice
just because it's made of digits doesn't make it numerical 2. **Treating ordinal as continuous without acknowledging the simplification** — the average of 1-5 ratings is common but technically approximate 3. **Ignoring the data dictionary** — coded values (77 = "Don't know") can corrupt calculations → Key Takeaways: Types of Data and the Language of Statistics
that even excellent tests produce false alarms when the underlying condition is rare. And you've seen that the base rate fallacy, the tendency to ignore prior probabilities, is one of the most common reasoning errors humans make. → Chapter 9: Conditional Probability and Bayes' Theorem
Bayesian updating
the idea that probability is not fixed but changes with new evidence. This is a genuine paradigm shift. Students who internalize Bayesian thinking start seeing evidence differently — they ask "How should this new information update my belief?" rather than "Does this prove or disprove my belief?" → Chapter 9: Conditional Probability and Bayes' Theorem — Instructor Notes
two modes, two peaks. The data doesn't have a single center; it has two. Bimodal distributions often mean your data contains two distinct groups behaving differently (morning visitors and afternoon visitors). → Chapter 5: Exploring Data: Graphs and Descriptive Statistics
binary outcomes
the response variable has exactly two possible values. And here's the problem: the regression models you learned in Chapters 22 and 23 don't work for binary outcomes. If you try to force a straight line through yes/no data, you'll get predictions that are impossible — probabilities below 0 or above 1. → Chapter 24: Logistic Regression: When the Outcome Is Yes or No
bounded below at zero
The **log-normal distribution** often fits right-skewed positive data better than the normal - **Power-law distributions** describe phenomena where extreme values are far more common than the normal model predicts - Assuming normality when the data isn't normal leads to **systematic errors** in pred → Case Study 2: When Normality Fails — Income, Wealth, and Power-Law Distributions
students should leave this chapter always asking "how big is the effect?" 3. **Conduct a basic power analysis** — understanding power demystifies sample size decisions in research design. → Chapter 17: Power and Effect Sizes — Instructor Notes
Calculate and interpret p-values
the most misunderstood concept in introductory statistics. Spend more time here than on any other single concept. 2. **State null and alternative hypotheses** — the framework that structures every subsequent inference chapter. 3. **Distinguish between Type I and Type II errors** — understanding the → Chapter 13: Hypothesis Testing — Instructor Notes
Calculate and interpret standard deviation
this is the measure students will use most frequently for the rest of the course (standard error, test statistics, confidence intervals all depend on it). 2. **Use the five-number summary and box plots** — box plots appear in nearly every subsequent chapter for visual comparison. 3. **Apply the Empi → Chapter 6: Numerical Summaries — Instructor Notes
not even a little bit. We never take derivatives or integrals. - ❌ **Prior statistics courses** — this book starts from zero. - ❌ **Programming experience** — we teach you Python from scratch in Chapter 3. - ❌ **A scientific calculator** — Python and Excel will handle all computation. - ❌ **A "math → Prerequisites: Are You Ready?
arguably the single most important theorem in all of statistics. It's the bridge from probability to inference, and it'll explain why everything we've learned about the normal distribution matters even more than you currently think. → Chapter 10: Probability Distributions and the Normal Curve
Ch.10 section 10.9
`stats.probplot()`, Ch.10 section 10.9 QRPs, see *questionable research practices* quartile, **Ch.6 section 6.4** questionable research practices (QRPs), **Ch.27 section 27.5** → Index
Ch.11 section 11.2
of the mean, **Ch.11 section 11.2** - of the proportion, **Ch.11 section 11.5** sampling variability, **Ch.11 section 11.2** SAT/ACT score distributions, Ch.10 case-study-01 scatterplot, Ch.5 section 5.9, **Ch.22 section 22.2** scipy.stats, see individual function names seaborn, **Ch.5 section 5.2** → Index
Are observations independent? - Create histograms or QQ-plots for each group to check normality - Run Levene's test for equal variances → Chapter 20: Analysis of Variance (ANOVA)
Checklist:
Are we still conversational? Using "you" and "I"? Contractions? ✓ - Are we still leading with stories and concrete examples before abstractions? ✓ (Blood pressure, Daria's three-pointers, StreamVibe watch time) - Are we acknowledging math anxiety without condescending? ✓ ("Take a breath — I'm showin → Chapter 10: Probability Distributions and the Normal Curve
a formula-based method for analyzing categorical data. Where this chapter used simulation to test group differences, the chi-square test uses a clever comparison of observed vs. expected frequencies. Key resources to preview: → Further Reading: The Bootstrap and Simulation-Based Inference
the paired vs. independent distinction is the primary decision point, and students frequently get it wrong. 2. **Conduct and interpret a two-sample t-test** — the most commonly used test in published research. 3. **Construct confidence intervals for the difference between two groups** — the CI for t → Chapter 16: Comparing Two Groups — Instructor Notes
Choose the threshold:
Consider the relative costs of false positives vs. false negatives - The threshold is a *values* decision, not just a statistical one → Key Takeaways: Logistic Regression
Classify variables as categorical or numerical
this determines which statistical methods are appropriate for the rest of the course. 2. **Distinguish between populations and samples** — a foundational distinction for all of inference. 3. **Read and interpret data tables and data dictionaries** — practical skill students need immediately for the → Chapter 2: Types of Data and the Language of Statistics — Instructor Notes
Cluster sampling
the sections are the clusters, and everyone within selected clusters is surveyed. > 2. **Convenience sampling** — the researcher is surveying whoever happens to be at that location at that time. The sample is not random and likely overrepresents frequent mall shoppers. > 3. A stratified sample guara → Chapter 4: Designing Studies: Sampling and Experiments
essentially, it finds users who are similar to you (who watched and liked similar shows) and recommends what *they* watched next. Statistically, this is nearest-neighbor regression: predicting your rating for an unwatched show based on the ratings of your "neighbors." > > **Step 4: Rank and serve.** → Chapter 26: Statistics and AI: Being a Critical Consumer of Data
Color key:
🔵 Light blue: Foundation — start here - 🟠 Orange: Critical bridge chapters — don't skip these - 🟢 Green: Core methods - 🔴 Pink: Capstone and reflection → How to Use This Book
Common actions to log:
Removed duplicate rows - Dropped rows with missing values in column(s) ___ - Imputed missing values in ___ using ___ method - Recoded variable ___ (original values -> new values) - Created new variable ___ from ___ - Removed outliers in ___ (criteria: ___) - Fixed inconsistent entries in ___ (e.g., → Appendix E: Templates and Worksheets
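Several of those log entries map directly onto pandas one-liners. A minimal sketch with toy data (the column names and values are invented):

```python
import numpy as np
import pandas as pd

# Toy data standing in for a real dataset
df = pd.DataFrame({"age": [23, 23, -99, 41],
                   "group": ["a", "a", "b", "b"]})

df = df.drop_duplicates()                   # removed duplicate rows
df["age"] = df["age"].replace(-99, np.nan)  # recoded placeholder -99 -> NaN
df = df.dropna(subset=["age"])              # dropped rows missing age
df["decade"] = df["age"] // 10              # created new variable from age
print(len(df))   # 2
```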
Common wrong interpretations:
"There's a 95% chance mu is between 42 and 58." (Wrong — mu is fixed, not random.) - "95% of the data falls in this interval." (Wrong — that describes the data, not the parameter.) - "If we sampled again, there's a 95% chance the new sample mean would be in this interval." (Wrong — this confuses the → Appendix F: FAQ and Troubleshooting
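What 95% confidence does mean shows up clearly in a simulation: across many repeated samples, about 95% of the intervals capture the fixed mu. A quick sketch (all parameters invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu, sigma, n, reps = 50, 10, 25, 2000
t_star = stats.t.ppf(0.975, df=n - 1)

covered = 0
for _ in range(reps):
    sample = rng.normal(mu, sigma, n)
    moe = t_star * sample.std(ddof=1) / np.sqrt(n)
    covered += (sample.mean() - moe) <= mu <= (sample.mean() + moe)

print(covered / reps)   # close to 0.95
```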
Communication
[ ] My notebook tells a coherent story from start to finish - [ ] My executive summary/policy brief is written for a non-technical audience - [ ] I've translated statistical results into plain language - [ ] My project is well-organized with clear section headers - [ ] I've proofread for clarity, gr → Capstone Rubric
the two-sample t-test, the paired t-test, and the two-proportion z-test. You'll finally be able to answer Alex's big question: "Did the new recommendation algorithm actually increase watch time compared to the old one?" And Professor Washington's: "Is the algorithm's false positive rate different fo → Chapter 15: Inference for Means
Conduct a one-sample t-test for a population mean
this is the workhorse procedure for inference about means. 2. **Understand when to use z vs. t** — the t-distribution accounts for the additional uncertainty of estimating sigma. 3. **Verify conditions for t-procedures** — randomness, approximate normality (or large sample), and independence. → Chapter 15: Inference for Means — Instructor Notes
Conduct a one-sample z-test for a proportion
this applies the hypothesis testing framework from Ch. 13 to a specific and common scenario. 2. **Verify conditions for inference about proportions** — the success-failure condition (np >= 10 and n(1-p) >= 10) is essential and often neglected. 3. **Interpret results in real-world context** — student → Chapter 14: Inference for Proportions — Instructor Notes
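A sketch of that workflow with made-up numbers (testing H0: p = 0.5 with 120 successes in n = 200; note that the standard error uses p0, the null value):

```python
import math
from scipy import stats

n, x, p0 = 200, 120, 0.5

# Success-failure condition: n*p0 >= 10 and n*(1 - p0) >= 10
assert n * p0 >= 10 and n * (1 - p0) >= 10

p_hat = x / n
se = math.sqrt(p0 * (1 - p0) / n)            # SE under the null hypothesis
z = (p_hat - p0) / se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))   # two-sided

print(round(z, 2), round(p_value, 4))
```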
Conduct and interpret a one-way ANOVA
the procedural skill, including reading an ANOVA table. 3. **Perform post-hoc pairwise comparisons** — finding a significant F-test is just the beginning; post-hoc tests tell you which groups differ. → Chapter 20: ANOVA — Instructor Notes
Conduct the test and build a CI.
Compute the z-test statistic and p-value - Construct both a Wald CI and a Wilson CI - Compare the two CIs — do they differ substantially? → Chapter 14: Inference for Proportions
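Both intervals can be built by hand from their textbook formulas. A sketch with invented counts (12 successes in 50 trials); the Wilson interval pulls its center toward 0.5, which matters most for small n or extreme p-hat:

```python
import math
from scipy import stats

def wald_ci(x, n, conf=0.95):
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    p = x / n
    moe = z * math.sqrt(p * (1 - p) / n)
    return (p - moe, p + moe)

def wilson_ci(x, n, conf=0.95):
    z = stats.norm.ppf(1 - (1 - conf) / 2)
    p = x / n
    center = (p + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (1 + z**2 / n)) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (center - half, center + half)

print(wald_ci(12, 50))
print(wilson_ci(12, 50))
```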
understanding why correlation doesn't prove causation requires grasping confounding. This is a genuine threshold concept: once students see confounders everywhere, they can't unsee them. Some students cross this threshold quickly; others need the entire semester. → Chapter 4: Designing Studies — Instructor Notes
Content Blocks Activated:
Category A: Mathematical formulation, code implementation, worked examples, debugging walkthroughs, comparison tables - Category B: Research study breakdowns, debate/discussion frameworks, ethical analysis - Category C: Action checklists, self-assessment tools, scenario walkthroughs → Introductory Statistics: Making Sense of Data in the Age of AI
contingency table
a grid showing the frequency of each combination of categories. You calculated joint probabilities (cell count / grand total), marginal probabilities (row or column total / grand total), and conditional probabilities (cell count / row or column total). > > Back then, contingency tables were tools fo → Chapter 19: Chi-Square Tests: Categorical Data Analysis
Contingency tables
the topic of this section — are built from two categorical variables. They show how many observations fall into each combination of categories. Remember: categorical variables classify observations into groups. That classification is exactly what makes probability calculations from contingency table → Chapter 8: Probability: The Foundation of Inference
$\bar{d} = 3.17$, $SE_d = 0.748$, $t = 4.24$, $p = 0.0007$ - Conclusion: Strong evidence of improvement. ✓ → Chapter 16: Comparing Two Groups
cross-sectional
a snapshot of many groups at one point in time. Approach (b) is **longitudinal** — following the same individuals over time. The longitudinal approach is better for studying interventions because you can compare each family's asthma outcomes *before and after* receiving the air purifier, using each → Chapter 4: Designing Studies: Sampling and Experiments
D
d)
Descriptive: "What is the average drink price of the four coffee shops in this dataset?" (Just summarizing the data you have.) - Inferential: "Based on this sample, are independent coffee shops in this city more expensive than chain shops, on average?" (Generalizing from 4 shops to all shops in the → Quiz: Types of Data and the Language of Statistics
Data Handling
[ ] I've inspected the data and reported its basic properties - [ ] I've created a data dictionary - [ ] I've addressed missing values with documented reasoning - [ ] I've handled outliers with documented reasoning - [ ] I've created any needed derived variables → Capstone Rubric
Age -99 and 999: Set to NaN (clearly placeholder values, not real ages) - Negative watch times: Set to NaN (impossible values) - Extreme watch times (>24 hours): Flagged but NOT removed. Rationale: the data might represent cumulative watch time over the study period, not a single day. Will investiga → Case Study: Alex's StreamVibe Cleaning Log — A Step-by-Step Template
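That pattern (recode impossible values, flag suspicious ones) looks roughly like this in pandas; the values are invented stand-ins for Alex's data:

```python
import numpy as np
import pandas as pd

watch = pd.DataFrame({"age": [34, -99, 999, 52],
                      "watch_hours": [2.5, -1.0, 30.0, 6.0]})

# Impossible values -> NaN
watch["age"] = watch["age"].where(watch["age"].between(0, 120), np.nan)
watch["watch_hours"] = watch["watch_hours"].where(watch["watch_hours"] >= 0, np.nan)

# Suspicious-but-possible values: flag, don't delete
watch["extreme_watch"] = watch["watch_hours"] > 24

print(int(watch["age"].isna().sum()), int(watch["extreme_watch"].sum()))  # 2 1
```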
Decomposing variability
the insight that total variation can be split into "explained" (between-group) and "unexplained" (within-group) components. This idea extends naturally to regression (Ch. 22-23), where R-squared is the proportion of variation explained by the model. → Chapter 20: ANOVA — Instructor Notes
Degrees of freedom for chi-square tests:
Goodness-of-fit: df = k - 1 (where k = number of categories) - Test of independence: df = (r - 1)(c - 1) (where r = rows, c = columns) → Appendix A: Statistical Tables
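The same lookups can be done in scipy, e.g. critical values at alpha = 0.05 for a 5-category goodness-of-fit test and a 3×4 test of independence:

```python
from scipy import stats

k = 5          # categories (goodness-of-fit)
r, c = 3, 4    # rows, columns (independence)

df_gof = k - 1               # 4
df_ind = (r - 1) * (c - 1)   # 6

print(round(stats.chi2.ppf(0.95, df_gof), 3))   # 9.488
print(round(stats.chi2.ppf(0.95, df_ind), 3))   # 12.592
```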
department selectivity
Women applied more heavily to selective departments (low admission rates for everyone) - Men applied more heavily to less selective departments (high admission rates for everyone) - Aggregating across departments mixed the effect of *who applied where* with the effect of *how each department treated → Chapter 27: Lies, Damn Lies, and Statistics: Ethical Data Practice
Describe the properties of the normal distribution
students need to internalize symmetry, the 68-95-99.7 rule, and the role of mean and standard deviation as parameters. 3. **Assess normality using QQ-plots** — a practical diagnostic skill used whenever t-tests, ANOVA, or regression require normality assumptions. → Chapter 10: Probability Distributions and the Normal Curve — Instructor Notes
descriptive statistics
we're summarizing the data we have. But generalizing to "all American adults" would be **inferential statistics** — we're reaching beyond our sample to make a claim about the population. The quality of that inference depends on how well our 500 respondents represent the U.S. adult population (which → Chapter 3: Your Data Toolkit: Python, Excel, and Jupyter Notebooks
Design principles:
One main idea per slide - Minimal text — let the visuals do the work - Every chart must have a clear takeaway stated in the slide title (e.g., "Customers who receive same-day shipping return 40% fewer items" — not "Return rate by shipping method") → Capstone Project 2: Business Analytics Report
the "prosecutor's fallacy" is one of the most dangerous statistical errors, and it appears in medicine, law, and everyday reasoning. 2. **Apply Bayes' theorem to update probabilities** — this is the mathematical foundation of learning from evidence. 3. **Construct tree diagrams** — tree diagrams mak → Chapter 9: Conditional Probability and Bayes' Theorem — Instructor Notes
Distribution thinking
seeing data as a distribution rather than individual numbers. This chapter is where the shift begins. A student who thinks "the data has a right-skewed distribution with a center around 45 and a spread of about 20" is thinking statistically in a way that a student who only sees individual data point → Chapter 5: Graphs and Descriptive Statistics — Instructor Notes
entire shapes with centers, spreads, peaks, tails, and outliers. Not "the average is 44%" but "the distribution of shooting percentages is symmetric and unimodal, centered around 44%, with a spread from about 15% to 70%." Not "the average age is 38" but "the age distribution is bimodal, with peaks i → Chapter 5: Exploring Data: Graphs and Descriptive Statistics
Domain 1: Education
**Civil Rights Data Collection (CRDC)**: U.S. Department of Education data on school discipline, access to advanced courses, teacher quality, and resource allocation — broken down by race, gender, and disability status. - **National Center for Education Statistics (NCES)**: Graduation rates, test sc → Capstone Project 3: Social Justice Data Audit
Domain 2: Criminal Justice
**Stanford Open Policing Project**: Traffic stop data from multiple states, including driver demographics and stop outcomes. - **The Sentencing Project / U.S. Sentencing Commission**: Federal sentencing data with demographic variables. - **Local police department open data**: Many cities publish arr → Capstone Project 3: Social Justice Data Audit
Domain 3: Employment and Hiring
**Bureau of Labor Statistics / Current Population Survey**: Employment rates, wages, and occupational data by demographics. - **EEOC charge data**: Discrimination complaint data by type and basis. - **Glassdoor or PayScale salary data** (publicly available subsets). → Capstone Project 3: Social Justice Data Audit
Domain 4: Housing and Lending
**Home Mortgage Disclosure Act (HMDA) data**: Mortgage application outcomes by race, income, and geography. - **HUD Fair Housing complaints**: Discrimination complaint data. - **Zillow / Redfin open data**: Housing prices and neighborhood demographics. → Capstone Project 3: Social Justice Data Audit
Dr. Maya Chen
public health epidemiologist tracking disease outbreak patterns across communities (CDC/WHO-style data) 2. **Alex Rivera** — marketing data analyst at StreamVibe testing whether a new recommendation algorithm increases watch time (A/B testing, tech industry) 3. **Professor James Washington** — crimi → Introductory Statistics: Making Sense of Data in the Age of AI
E
Each user is randomly assigned
the hash function is effectively random with respect to user characteristics - **Each user stays in the same group** — the hash of a given user ID always produces the same number, so users don't bounce between layouts between sessions - **The assignment is invisible to users** — they don't know they → Case Study: A/B Testing in Tech — Designing Experiments at Scale
a score of 7 means the same probability of reoffending for all groups 2. **Equal false positive rates** — the same proportion of non-reoffenders are wrongly classified as high risk across all groups 3. **Equal false negative rates** — the same proportion of reoffenders are wrongly classified as low → Case Study 2: James's Algorithmic Reckoning — Ethics of Data-Driven Criminal Justice
Ethics
[ ] I've addressed data provenance and consent - [ ] I've considered who might be harmed - [ ] I've discussed representation and missing voices - [ ] I've considered potential misuse of findings - [ ] My ethical discussion is specific to my project, not generic → Capstone Rubric
Evaluate the model:
Confusion matrix at the chosen threshold - Accuracy, sensitivity, specificity, precision, F1 score - ROC curve and AUC → Key Takeaways: Logistic Regression
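All of those metrics fall out of the four confusion-matrix counts. A sketch with invented counts at one threshold:

```python
# Hypothetical counts: true/false positives and negatives
tp, fp, fn, tn = 40, 10, 20, 130

accuracy    = (tp + tn) / (tp + fp + fn + tn)
sensitivity = tp / (tp + fn)      # true positive rate (recall)
specificity = tn / (tn + fp)      # true negative rate
precision   = tp / (tp + fp)      # positive predictive value
f1 = 2 * precision * sensitivity / (precision + sensitivity)

print(accuracy, precision, round(f1, 3))   # 0.85 0.8 0.727
```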
Every variable is either categorical or numerical
and getting this right determines which tools and analyses are appropriate. 2. **Numbers aren't always numerical variables.** Zip codes, ID numbers, and phone numbers are categorical despite being made of digits. 3. **Parameters describe populations; statistics describe samples.** Most real-world an → Chapter 2: Types of Data and the Language of Statistics
Example business questions (choose your own):
Which customer segments are most profitable, and what distinguishes high-value customers from low-value ones? - Does the new marketing campaign lead to significantly higher conversion rates compared to the control group? Is the difference large enough to justify the cost? - What factors best predict → Capstone Project 2: Business Analytics Report
Examples of independent samples:
Patients randomly assigned to a drug group vs. a placebo group - Students at School A vs. students at School B - Users who see Algorithm A vs. users who see Algorithm B (Alex's A/B test!) - Crime outcomes under algorithm-based bail vs. judge-based bail (James's study!) → Chapter 16: Comparing Two Groups
Examples:
Is the cure rate higher for Drug A than Drug B? (Maya's world) - Is the click-through rate different for two website designs? (Alex's world) - Is the recidivism rate different for algorithm-recommended vs. judge-recommended bail decisions? (James's world!) → Chapter 16: Comparing Two Groups
exercises.md
practice problems at four difficulty levels - **quiz.md** — self-assessment with answers and explanations - **case-study-01.md** — extended real-world application - **case-study-02.md** — additional deep-dive case study - **key-takeaways.md** — one-page summary card - **further-reading.md** — annota → How to Use This Book
the "big idea" is that you can learn about the population by cleverly reusing your sample. 2. **Construct bootstrap confidence intervals** — the practical skill that students can apply immediately. 3. **Compare simulation-based and formula-based approaches** — understanding when and why the two appr → Chapter 18: Bootstrap and Simulation-Based Inference — Instructor Notes
Explain when nonparametric methods are needed
the decision to use a nonparametric test is based on assumption violations, and students need to diagnose these. 2. **Conduct a Wilcoxon rank-sum test** — the most common nonparametric alternative to the two-sample t-test. 3. **Compare parametric and nonparametric approaches** — students should unde → Chapter 21: Nonparametric Methods — Instructor Notes
Scatterplots of $y$ vs. each $x$ - Correlation matrix among predictors (watch for multicollinearity) - Descriptive statistics → Key Takeaways: Multiple Regression
the error rates within each racial group. Northpointe is reporting **predictive values** — the accuracy rates within each risk category. And here's the thing that breaks people's brains: **it's mathematically impossible for both metrics to be equal across racial groups when the base rates differ.**" → Chapter 26: Statistics and AI: Being a Critical Consumer of Data
Each confirmatory workup costs approximately $800-$1,200 in specialist visits and lab tests. - Total cost of false-positive follow-ups: approximately 2,100 false positives × $1,000 each ≈ $2.1 million. - Cost per true case detected: about $265,000 (total program cost divided by 8 cases found).
Save your work to Google Drive frequently. - Re-run your notebook from the top when you reconnect (Colab doesn't preserve variables between sessions). → Appendix C: Environment Setup Guide
Flag any ambiguous variables
ones where the classification isn't clear-cut. Write a sentence explaining why you chose the classification you did. > 6. **Identify the data structure:** Is your dataset cross-sectional or longitudinal? How do you know? > > **Example:** If you chose the World Happiness Report: > - Observational uni → Chapter 2: Types of Data and the Language of Statistics
Flip sign
In extreme cases, the coefficient can reverse direction entirely. A variable that appeared to *increase* $y$ in simple regression might *decrease* $y$ once confounders are controlled. This is Simpson's Paradox in regression form. → Chapter 23: Multiple Regression: The Real World Has More Than One Variable
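A tiny simulation (all relationships invented) makes the flip concrete: here x truly lowers y, but because x travels with a confounder z that raises y, the simple regression slope comes out positive:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)              # confounder
x = z + 0.5 * rng.normal(size=n)    # predictor entangled with z
y = -1.0 * x + 3.0 * z              # x truly *decreases* y

simple_slope = np.polyfit(x, y, 1)[0]     # misleadingly positive

X = np.column_stack([np.ones(n), x, z])   # control for z
coefs, *_ = np.linalg.lstsq(X, y, rcond=None)

print(simple_slope > 0, round(coefs[1], 4))   # True -1.0
```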
Follow the style guide:
Conversational tone — write like you're explaining to a friend - Lead with intuition, then formulas - Use inclusive language and diverse examples - Keep code examples under 15-20 lines - Always explain what code does in plain English 3. **Respect the citation honesty system:** - **Tier 1:** Only for → Contributing to Introductory Statistics: Making Sense of Data in the Age of AI
df1 = k - 1 (number of groups minus 1) - df2 = N - k (total observations minus number of groups) → Appendix A: Statistical Tables
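For example, with k = 4 groups and N = 40 observations, df1 = 3 and df2 = 36, and scipy can supply the corresponding critical value:

```python
from scipy import stats

k, N = 4, 40
df1, df2 = k - 1, N - k    # 3 and 36

f_crit = stats.f.ppf(0.95, df1, df2)   # critical value at alpha = 0.05
print(round(f_crit, 2))
```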
Format and style requirements:
Use a professional business report format with clear section headers - Open with a one-paragraph executive summary stating the question, key finding, and top recommendation - Use bullet points and numbered lists for readability - Include 3-4 well-designed visualizations embedded in the report (not s → Capstone Project 2: Business Analytics Report
four different public health intervention programs
a vaccination-focused campaign, a nutrition education program, a community fitness initiative, and a standard-care control — and she wants to know: do the programs produce different health outcomes? That's not a two-group question. It's a four-group question. → Chapter 20: Analysis of Variance (ANOVA)
they systematically miss patterns — but **low variance** — they give similar predictions regardless of which specific training data you use. - **Complex models** (like a polynomial with 50 terms) have **low bias** — they can capture intricate patterns — but **high variance** — they're highly sensiti → Chapter 26: Statistics and AI: Being a Critical Consumer of Data
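The variance half of the tradeoff can be seen by refitting models on fresh noisy samples and watching how much a single prediction wobbles. A sketch under an invented data-generating setup, comparing a line with a degree-9 polynomial:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y_true = np.sin(2 * np.pi * x)

def prediction_spread(degree, trials=200):
    """Std. dev. of the fitted prediction at x = 0.5 across refits."""
    preds = []
    for _ in range(trials):
        y = y_true + rng.normal(0, 0.3, size=x.size)  # fresh noisy sample
        preds.append(np.polyval(np.polyfit(x, y, degree), 0.5))
    return np.std(preds)

simple, complex_ = prediction_spread(1), prediction_spread(9)
print(simple < complex_)   # the flexible model's predictions vary more
```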
holding other variables constant
the idea that a regression coefficient tells you the effect of one predictor *while controlling for all the others*. > > **Why does this matter for communication?** Because one of the most common misinterpretations of regression is ignoring the "all else equal" clause. When you write "for each addit → Chapter 25: Communicating with Data: Telling Stories with Numbers
Hospital discharge data
matched to voter registration records - **Web browsing histories** — matched to social media profiles - **Genome data** — relatives' DNA can identify "anonymous" donors - **Location data** — just four spatiotemporal points can uniquely identify 95% of people → Chapter 27: Lies, Damn Lies, and Statistics: Ethical Data Practice
How to use this table:
For P(Z <= z): Read the value directly. - For P(Z > z): Compute 1 - P(Z <= z). - For P(-z < Z < z): Compute 2 * P(Z <= z) - 1. - For P(Z < -z): By symmetry, P(Z < -z) = P(Z > z) = 1 - P(Z <= z). → Appendix A: Statistical Tables
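All four lookups reduce to one function call plus arithmetic. A minimal sketch using `scipy.stats.norm` (the value 1.96 is just a familiar example):

```python
from scipy.stats import norm

z = 1.96
p_le = norm.cdf(z)          # P(Z <= z): read directly
p_gt = 1 - p_le             # P(Z > z)
p_between = 2 * p_le - 1    # P(-z < Z < z)
p_lt_neg = 1 - p_le         # P(Z < -z), by symmetry
print(round(p_between, 3))  # close to 0.95, as expected for z = 1.96
```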
Hypotheses:
$H_0$: The two populations have the same distribution (the values from one group are equally likely to be larger or smaller than values from the other) - $H_a$: The values from one group tend to be systematically larger (or smaller) than the other → Chapter 21: Nonparametric Methods: When Assumptions Fail
the decision of how to handle missing values can change results, and students need to understand the tradeoffs. 2. **Document data cleaning decisions for reproducibility** — this is a professional practice that separates careful analysis from sloppy analysis. 3. **Use pandas for common cleaning task → Chapter 7: Data Wrangling — Instructor Notes
sampling bias, response bias, and confounding are concepts students will use for the rest of the course and their lives. 3. **Evaluate whether a study design supports causal conclusions** — the highest-order objective in this chapter. → Chapter 4: Designing Studies — Instructor Notes
Parents received a phone call saying their newborn may have a serious metabolic disorder. - They were told to bring the baby in for confirmatory testing (blood draws, specialist appointments). - The waiting period for confirmatory results was 5-10 days. → Case Study 1: Medical Screening — When a Positive Test Doesn't Mean What You Think
Explain why statistics matters in your life and career, regardless of your major - Tell the difference between descriptive and inferential statistics - Start seeing statistical reasoning in the news, conversations, and decisions around you → Chapter 1: Why Statistics Matters (and Why You Might Actually Enjoy This)
the idea that participants should know they're in a study and agree to participate. This principle emerged from horrific historical abuses: the Tuskegee syphilis study (where Black men with syphilis were deliberately left untreated for decades), Nazi medical experiments, and others. > > Today, any s → Chapter 4: Designing Studies: Sampling and Experiments
the predicted value of $y$ when $x = 0$ - $b_1$ is the **slope** — the predicted change in $y$ for each one-unit increase in $x$ - $x$ is the explanatory (predictor) variable → Chapter 22: Correlation and Simple Linear Regression
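The intercept and slope can be estimated with `scipy.stats.linregress`. The (x, y) values below are made up purely for illustration:

```python
import numpy as np
from scipy import stats

# Made-up data with a roughly linear pattern
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 4.1, 5.9, 8.2, 9.8])

result = stats.linregress(x, y)
b0, b1 = result.intercept, result.slope   # y-hat = b0 + b1 * x
print(round(b0, 2), round(b1, 2))
```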
Interpret coefficients:
Exponentiate each coefficient: $e^{b_i}$ = odds ratio - "For each one-unit increase in $x_i$, the odds of the outcome are multiplied by $e^{b_i}$, holding all other variables constant" - Check p-values and 95% CIs for the odds ratios → Key Takeaways: Logistic Regression
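The exponentiation step is one line. The coefficient value below is hypothetical, picked so the resulting odds ratio is easy to read:

```python
import math

# Hypothetical logistic regression coefficient for predictor x_i
b = 0.405                    # change in log-odds per one-unit increase in x_i
odds_ratio = math.exp(b)     # e^b: the odds are multiplied by this factor
print(round(odds_ratio, 2))  # about 1.5: each unit of x_i multiplies the odds by ~1.5
```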
Interpret individual predictors:
Coefficient: "For each one-unit increase in $x_i$, predicted $y$ changes by $b_i$, **holding all other variables constant**" - t-test / p-value: Is this specific predictor significant? - 95% CI: Plausible range for the true effect → Key Takeaways: Multiple Regression
[ ] I've interpreted every result in context (not just "significant" or "not significant") - [ ] I've correctly interpreted confidence intervals and p-values - [ ] I've discussed limitations and confounders - [ ] I've been careful with causal language - [ ] I've synthesized results into an overall c → Capstone Rubric
Interpretation:
Poverty rate and AQI are statistically significant predictors of ER visit rates after controlling for the other variables. - The uninsured percentage has a p-value of 0.057 — just barely above the $\alpha = 0.05$ threshold. This is a borderline result. The coefficient suggests a real effect, but we → Chapter 23: Multiple Regression: The Real World Has More Than One Variable
Intervention strategies:
**The courtroom analogy.** In a criminal trial, the jury does not calculate "the probability the defendant is innocent." They evaluate "how likely is this evidence, assuming the defendant is innocent?" If the evidence would be very unlikely under innocence (small p-value), they reject the innocence → Common Student Struggles and Intervention Strategies
Investigation examples:
Do suspension rates differ significantly by race after controlling for school size and poverty level? - Is there an association between the percentage of students of color in a school and access to AP courses? - Do students from different income backgrounds have significantly different loan repaymen → Capstone Project 3: Social Justice Data Audit
Day 1: Chapters 1-2. Statistical claims in headlines activity. Variable classification exercise. - Day 2: Chapter 3. Guided Jupyter lab: load data, explore with `.head()`, `.describe()`. Excel parallel demo. → 10-Week Quarter / Accelerated Syllabus
Key features:
28 chapters covering the complete introductory statistics curriculum - Conversational, intuition-first approach — formulas serve understanding, not the other way around - Python and Excel/Google Sheets examples side by side - Progressive portfolio project — leave with a real data analysis you can sh → Introductory Statistics: Making Sense of Data in the Age of AI
`watch_time_min` and `sessions` have identical missing counts (743) — they're missing together, which makes sense (if a user has no watch data, sessions would also be missing) - `satisfaction_score` has 22.4% missing — above the 20% threshold. This variable may not be reliable enough for primary ana → Case Study: Alex's StreamVibe Cleaning Log — A Step-by-Step Template
**Layer 1 (what she says aloud):** "Poverty is correlated with ER overcrowding. But when we look deeper, the real drivers are insurance access and primary care availability. Communities with similar poverty levels have very different ER rates depending on how many doctors they have." → Case Study 1: Maya's Public Health Brief for the City Council
Look back
trace the arc of what you've learned across all eight parts of this textbook 2. **Look around** — see where Maya, Alex, James, and Sam ended up 3. **Look forward** — map the roads that branch out from here, depending on where your curiosity leads → Chapter 28: Your Statistical Journey Continues
M
making decisions under uncertainty
it's the most practical course you'll take regardless of your major 2. **Descriptive statistics** summarizes what you have; **inferential statistics** reaches beyond your data to the bigger picture 3. Every statistical investigation follows **four pillars:** question → data → analysis → interpretati → Chapter 1: Why Statistics Matters (and Why You Might Actually Enjoy This)
mean, median, and mode — each answer the question "What's the typical value?" in different ways. The mean uses every value but is sensitive to outliers. The median is resistant to outliers. The mode identifies the most common value. → Chapter 6: Numerical Summaries: Center, Spread, and Shape
Measures of spread
range, IQR, variance, and standard deviation — quantify how much values differ from each other. **Standard deviation** is the most important: it measures the typical distance of values from the mean. → Chapter 6: Numerical Summaries: Center, Spread, and Shape
Minimum
the smallest value > 2. **Q1** — the first quartile (25th percentile) > 3. **Median (Q2)** — the middle value (50th percentile) > 4. **Q3** — the third quartile (75th percentile) > 5. **Maximum** — the largest value → Chapter 6: Numerical Summaries: Center, Spread, and Shape
using random sampling to approximate quantities that are difficult to compute analytically. The name comes from the Monte Carlo Casino in Monaco, because the methods rely on random chance (like gambling) to produce reliable answers. → Chapter 18: The Bootstrap and Simulation-Based Inference
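A classic illustration of the idea: estimating π by throwing random points at a unit square and counting how many land inside the quarter circle. This is a sketch of the general technique, not an example from the chapter:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible
n = 100_000

# A point (x, y) with x, y ~ Uniform(0, 1) lands inside the quarter
# circle when x^2 + y^2 <= 1; that happens with probability pi/4.
inside = sum(
    1 for _ in range(n)
    if random.random() ** 2 + random.random() ** 2 <= 1
)
pi_estimate = 4 * inside / n
print(pi_estimate)  # close to 3.14159
```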
Among users who watched a recommendation, 40% browsed for 30+ minutes: $P(\text{browsed} \mid \text{watch}) = 0.40$. - Among users who didn't watch, 10% browsed that long: $P(\text{browsed} \mid \text{didn't watch}) = 0.10$. → Chapter 9: Conditional Probability and Bayes' Theorem
nonparametric methods
distribution-free alternatives to the $t$-test and ANOVA that make fewer assumptions about the data. Some nonparametric tests are closely related to chi-square tests; for example, the Kruskal-Wallis test (a nonparametric alternative to one-way ANOVA) is essentially a chi-square test applied to ranke → Further Reading: Chi-Square Tests: Categorical Data Analysis
variables where the values are numbers on a meaningful scale (ages, incomes, test scores, watch times). And understanding histograms is the gateway to one of the most powerful ideas in all of statistics: distribution thinking. → Chapter 5: Exploring Data: Graphs and Descriptive Statistics
O
observational data
people weren't randomly assigned to be vaccinated or not. Maybe vaccinated people tend to be younger, healthier, or have better access to healthcare in general. The association between vaccination and lower hospitalization is real in this data, but calling it *causal* would require a controlled stud → Case Study: Exploring Public Health Data with pandas — Dr. Chen's Flu Surveillance
why this chapter matters 2. **"In this chapter, you will learn to..."** — concrete skills 3. **Learning path annotations** — 🏃 Fast Track and 🔬 Deep Dive guidance 4. **Main content sections** — concepts, examples, code, and practice 5. **Project checkpoint** — apply it to your portfolio 6. **Practic → How to Use This Book
ordinal
the categories have a natural order (free < basic < premium) that reflects increasing levels of service. `user_id` is **nominal** — it's a label with no meaningful order. Both are categorical variables, but the distinction matters because ordinal variables preserve rank information that nominal vari → Chapter 3: Your Data Toolkit: Python, Excel, and Jupyter Notebooks
P
P(A|B) ≠ P(B|A)
and why confusing the two is called the **prosecutor's fallacy**. You've seen how this confusion can have real consequences in medicine, criminal justice, and everyday reasoning. → Chapter 9: Conditional Probability and Bayes' Theorem
P-hacking
exploring many analyses and reporting only the significant ones — inflates the false positive rate far beyond the nominal $\alpha$. It is one of the primary causes of the replication crisis. → Chapter 13: Hypothesis Testing: Making Decisions with Data
the disease prevalence of 1%. But here's the thing: that 1% is itself an estimate. Maya's county might have a different prevalence than the national average. The inference tools in this chapter let you estimate the *actual* prevalence in a specific population and test whether it differs from the ass → Chapter 14: Inference for Proportions
pre-registration
publicly committing to your hypotheses and analysis plan before collecting data — has become a cornerstone of credible science. When a study is pre-registered, you know the researchers didn't explore dozens of paths and cherry-pick the one that "worked." > > **The ethical principle:** A p-value is o → Chapter 13: Hypothesis Testing: Making Decisions with Data
Probability as long-run frequency
the conceptual shift from certainty to probabilistic thinking. Many students think a probability of 0.7 means "it will happen" and 0.3 means "it won't." The idea that probability describes the long-run behavior of a random process (not a prediction about a single event) is a threshold concept. → Chapter 8: Probability Foundations — Instructor Notes
probability is not fixed; it changes with evidence
is the conceptual shift that separates casual probability thinking from the kind of reasoning that actually works in the real world. Every confidence interval (Chapter 12), hypothesis test (Chapter 13), and regression model (Chapters 22-24) you'll encounter builds on this foundation. → Chapter 9: Conditional Probability and Bayes' Theorem
Potential health improvements for ~2,500 residents in three communities - $4.2 million in federal remediation funding - Environmental justice for communities historically ignored - Long-term reduction in healthcare costs - Regulatory accountability → Case Study 1: Maya's Public Health Data Dilemma — Privacy vs. Public Good
randomly assign treatments so that confounders balance out across groups. But what about observational data, where you *can't* randomly assign? > > Multiple regression offers a partial solution: it lets you **statistically control** for confounders by including them in the model. It's not as strong → Chapter 23: Multiple Regression: The Real World Has More Than One Variable
ranks
first, second, third, and so on — and analyzing the ranks instead of the raw values. This simple trick sidesteps the normality assumption entirely. It also makes the methods naturally resistant to outliers, because whether that extreme value is 100 or 1,000,000, it gets the same rank: the highest on → Chapter 21: Nonparametric Methods: When Assumptions Fail
Reading a box plot:
Box length = IQR (spread of the middle 50%) - Median position in box = symmetry or skew - Whisker lengths = range of non-outlier data - Dots beyond whiskers = potential outliers - Comparing box plots side by side = comparing distributions → Key Takeaways: Numerical Summaries — Center, Spread, and Shape
Reading the CI:
Contains zero → no significant difference - Entirely positive → Group 1 plausibly higher - Entirely negative → Group 1 plausibly lower - Width → precision of the estimate → Key Takeaways: Comparing Two Groups
this is a threshold concept that shows how data can tell opposite stories at different levels of aggregation. 3. **Apply ethical frameworks to data analysis** — from collection through reporting, every step involves ethical choices. → Chapter 27: Ethical Data Practice — Instructor Notes
Recommended Data Sources:
**CDC WONDER** (wonder.cdc.gov): Mortality data, birth data, environmental health data. Example datasets include cause-of-death by county, infant mortality rates, cancer incidence. - **Behavioral Risk Factor Surveillance System (BRFSS)**: The largest continuously conducted health survey in the world → Capstone Project 1: Public Health Data Investigation
Recommended datasets (all free and public):
**CDC BRFSS** — health behaviors and outcomes across U.S. states - **Gapminder** — life expectancy, GDP, and population across countries and decades - **U.S. College Scorecard** — college costs, graduation rates, and earnings - **World Happiness Report** — national happiness scores and contributing → How to Use This Book
Red flags from `.describe()`:
`watch_time_min` has a minimum of **-5.2** (impossible — can't watch negative minutes) - `watch_time_min` maximum is **15,840 minutes** = 264 hours = 11 days straight. Possible bot or data error. - `age` minimum is **-99** (impossible — likely a placeholder for "unknown") - `age` maximum is **999** → Case Study: Alex's StreamVibe Cleaning Log — A Step-by-Step Template
a subtle but powerful idea. Students scoring in the top 10% on one exam will, on average, score lower on the next exam — not because they got worse, but because extreme scores tend to be partly due to chance. This concept explains many real-world phenomena (sports "slumps," the "sophomore jinx") and → Chapter 22: Correlation and Simple Linear Regression — Instructor Notes
[ ] My notebook runs from top to bottom without errors (Restart and Run All) - [ ] All data files are included or download instructions are provided - [ ] All imports are at the top - [ ] Code is commented - [ ] Random seeds are set → Capstone Rubric
Required elements (choose at least two):
**Alternative test:** If you used a parametric test, also run the nonparametric equivalent (or vice versa). Do the conclusions change? - **Subgroup analysis:** Does the disparity vary across subgroups? (e.g., does a racial disparity in sentencing look different for drug offenses vs. violent offenses → Capstone Project 3: Social Justice Data Audit
Required elements:
Clearly define the groups being compared and the metric of interest - State null and alternative hypotheses - Choose the appropriate test: - Two-sample t-test (for comparing means of two independent groups) - Paired t-test (for before/after or matched comparisons) - Two-proportion z-test (for compar → Capstone Project 2: Business Analytics Report
Requirements:
Open with the problem and why it matters - Summarize key findings in plain language (no jargon, no formulas) - Include 2-3 well-designed visualizations that support your narrative - Clearly state what the data shows and, equally important, what it does *not* show - End with actionable recommendation → Capstone Project 1: Public Health Data Investigation
Resampling
the insight that you can learn about the population by cleverly reusing the sample. This is a modern computational approach that was impossible before computers. Students who grasp this idea have a deeper understanding of what inference is actually doing. → Chapter 18: Bootstrap and Simulation-Based Inference — Instructor Notes
you can't have a negative F (it's a ratio of two variance estimates, which are never negative) - It starts at 0 and has a long right tail - As $df_2$ gets large, the distribution becomes more concentrated around $F = 1$ - It was named in honor of Ronald A. Fisher, the statistician who developed ANOVA in the 1920s → Chapter 20: Analysis of Variance (ANOVA)
the t-test's ability to give approximately correct results even when assumptions aren't perfectly met. The guidelines were: for $n \geq 30$, the CLT handles most non-normality. For $15 \leq n < 30$, check for outliers and strong skew. For $n < 15$, you really need approximate normality. > > Now we f → Chapter 21: Nonparametric Methods: When Assumptions Fail
Rules of thumb:
For small datasets (< 50 observations): 5-7 bins - For medium datasets (50-300): 8-15 bins - For large datasets (300+): 15-25 bins - A popular formula: number of bins ≈ √n (the square root of the number of observations) → Chapter 5: Exploring Data: Graphs and Descriptive Statistics
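The square-root rule is simple enough to sketch directly (the sample sizes below are arbitrary examples):

```python
import math

# Square-root rule: number of bins is approximately sqrt(n)
for n in (40, 150, 400):
    bins = round(math.sqrt(n))
    print(n, bins)  # 40 -> 6 bins, 150 -> 12, 400 -> 20
```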
Run `.info()`
Check data types and non-null counts - [ ] **Run `.describe()`** — Look for impossible min/max, suspicious means, large standard deviations - [ ] **Run `.value_counts()`** on categorical columns — Check for inconsistent categories - [ ] **Check for duplicates** — `df.duplicated().sum()` - [ ] **Coun → Key Takeaways: Data Wrangling — Cleaning and Preparing Real Data
the very thing we couldn't derive a formula for. The spread of this distribution tells you how much the sample median varies from sample to sample. And that gives you everything you need to build a confidence interval. → Chapter 18: The Bootstrap and Simulation-Based Inference
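The bootstrap for a median can be sketched in a few lines. The skewed sample here is simulated for illustration; the resampling loop and percentile interval are the general recipe:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.exponential(scale=10, size=60)  # skewed data; no formula for the median's SE

# Resample with replacement many times, recording the median each time
boot_medians = [
    np.median(rng.choice(sample, size=len(sample), replace=True))
    for _ in range(5000)
]

# Percentile bootstrap 95% confidence interval for the population median
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(round(lo, 1), round(hi, 1))
```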
sampling variability
the natural variation that occurs because different random samples contain different individuals. We've been aware of this concept since Chapter 1, when we noticed that a product's 4.2-star rating based on 47 reviews was more trustworthy than a 4.5-star rating based on 3 reviews. Now we're going to → Chapter 11: Sampling Distributions and the Central Limit Theorem
Set up and use a Jupyter notebook
students need this working by the end of class to stay on pace. 2. **Perform basic pandas operations** (load, view, filter, sort) — these are the building blocks for every subsequent lab. 3. **Navigate between Python and spreadsheet approaches** — students should know both exist and when each is app → Chapter 3: Your Data Toolkit — Instructor Notes
Setup (from Washington's expanded dataset):
The base rate of re-offense in the studied population is 20%: $P(\text{re-offense}) = 0.20$. - The algorithm flags 75% of people who will re-offend as "high risk": $P(\text{high risk} \mid \text{re-offense}) = 0.75$ (the algorithm's sensitivity). - The algorithm flags 22% of people who will NOT re-o → Chapter 9: Conditional Probability and Bayes' Theorem
Setup:
Overall, 15% of users who are shown a recommendation click and watch it. This is the **prior**: $P(\text{watch}) = 0.15$. - Among users who watched, 60% had previously watched a movie in the same genre. This is the **likelihood**: $P(\text{same genre} \mid \text{watch}) = 0.60$. - Among users who di → Chapter 9: Conditional Probability and Bayes' Theorem
Shape
Is it symmetric? Skewed? How many peaks? 2. **Center** — Where is the "middle" of the data? 3. **Spread** — How far does the data stretch? 4. **Unusual features** — Outliers? Gaps? Clusters? → Chapter 5: Exploring Data: Graphs and Descriptive Statistics
Shrink
This is the most common case. The simple regression coefficient was "bloated" because it included the effects of correlated omitted variables. Adding those variables deflates the original coefficient to its "true" partial effect. (This happened with Maya's poverty rate: 11.4 → 5.81.) → Chapter 23: Multiple Regression: The Real World Has More Than One Variable
Similar distribution shapes
the two populations should have roughly the same shape, just shifted horizontally (if you want to interpret the test as comparing medians; otherwise, it's a general "stochastic dominance" test) 4. **At least ordinal data** — the observations need to be rankable → Chapter 21: Nonparametric Methods: When Assumptions Fail
Simpson's paradox
data can tell opposite stories at different levels of aggregation. This is deeply counterintuitive and challenges students' trust in simple summaries. Once understood, it permanently changes how students think about aggregated data. → Chapter 27: Ethical Data Practice — Instructor Notes
skewed right
a long tail to the right is pulling the mean above the median. > 3. Standard deviation measures the **typical distance** of values from the mean. It tells you how spread out the data is around the center. > 4. For **bell-shaped, symmetric** distributions: about **68%** of data falls within 1 SD of t → Chapter 6: Numerical Summaries: Center, Spread, and Shape
spurious correlation
two variables that track each other over time purely by coincidence. Both happened to increase over the same period. The correlation is real (the numbers genuinely co-vary), but the relationship is meaningless. → Chapter 22: Correlation and Simple Linear Regression
they strip away the original units and put everything on the same "standard deviations from the mean" scale. You'll use z-scores throughout this course — they're the foundation of hypothesis testing in Chapter 13. → Chapter 6: Numerical Summaries: Center, Spread, and Shape
State hypotheses:
$H_0$: The variable follows the specified distribution - $H_a$: The variable does not follow the specified distribution 2. **Calculate expected frequencies:** $E_i = n \times p_i$ where $p_i$ is the hypothesized proportion for category $i$ 3. **Check conditions:** All expected counts $\geq 5$ 4. **C → Key Takeaways: Chi-Square Tests: Categorical Data Analysis
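The four steps can be sketched with `scipy.stats.chisquare`. The die-rolling counts below are hypothetical, with H₀ saying each face is equally likely ($p_i = 1/6$):

```python
from scipy.stats import chisquare

# Hypothetical fairness check: 120 die rolls
observed = [25, 17, 15, 23, 24, 16]
n = sum(observed)                # 120
expected = [n * (1 / 6)] * 6     # E_i = n * p_i = 20 per face; all >= 5, condition met

stat, p = chisquare(observed, f_exp=expected)
print(round(stat, 2), round(p, 3))  # large p: no evidence against fairness
```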
statistic
a number calculated from a sample of shots (the 65 she's taken so far). It's his best estimate of the parameter, but it's not exactly right. If Daria took another 65 shots, she might shoot 35% or 41%. The statistic *varies* from sample to sample; the parameter does not. → Chapter 2: Types of Data and the Language of Statistics
Statistical Analysis
[ ] I've used at least three distinct statistical methods - [ ] Each method is appropriate for the data type and question - [ ] I've verified conditions/assumptions for each method - [ ] I've calculated and interpreted effect sizes - [ ] I've distinguished between statistical and practical significa → Capstone Rubric
Statistical thinking
seeing the world through the lens of variation and uncertainty. This is a gradual shift that begins here and deepens throughout the course. Don't expect it to click fully in Chapter 1; plant the seed. → Chapter 1: Why Statistics Matters — Instructor Notes
Statistics is about decisions under uncertainty
not just formulas and calculations. Frame this as the course's central idea from day one. 2. **Descriptive vs. inferential statistics** — students need this distinction immediately; it structures the entire course. 3. **AI systems run on statistics** — this motivates the course for students who thin → Chapter 1: Why Statistics Matters — Instructor Notes
StatQuest: "Chi-Square Tests" (YouTube)
visual walkthrough of the goodness-of-fit and independence tests - **Khan Academy: "Chi-Square Distribution" (khanacademy.org)** — the distribution behind the test - **StatKey: Chi-Square Test module** — you can compare the chi-square test to a simulation-based version, connecting Chapter 18's ideas → Further Reading: The Bootstrap and Simulation-Based Inference
StatQuest: "Confidence Intervals" (YouTube)
Josh Starmer's explanation of what "95% confident" really means - **OnlineStatBook: Confidence Interval Simulation** (https://onlinestatbook.com/stat_sim/conf_interval/index.html) — build confidence intervals interactively and watch the coverage probability → Further Reading: Sampling Distributions and the Central Limit Theorem
StatQuest: "Hypothesis Testing" (YouTube)
Josh Starmer's explanation of the logic behind hypothesis tests - **Seeing Theory — Hypothesis Testing** (https://seeing-theory.brown.edu/frequentist-inference/) — interactive visualization of p-values and rejection regions - **Wheelan, *Naked Statistics*, Chapter 9** — accessible introduction to hy → Further Reading: Confidence Intervals: Estimating with Uncertainty
StatQuest: "One-Proportion Z-Test" (YouTube)
focused walkthrough of the proportion test - **Khan Academy: "Hypothesis Test for a Proportion" (khanacademy.org)** — multiple worked examples - **Seeing Theory: Hypothesis Testing module** — interactive p-value visualization for proportions → Further Reading: Hypothesis Testing: Making Decisions with Data
StatQuest: "One-Way ANOVA" (YouTube)
clear visual walkthrough of the $F$-test - **Khan Academy: "ANOVA" (khanacademy.org)** — step-by-step introduction to the decomposition of variability - **SciPy documentation: `scipy.stats.f_oneway`** — the Python function for one-way ANOVA → Further Reading: Chi-Square Tests: Categorical Data Analysis
StatQuest: "Statistical Power" (YouTube)
clear visual explanation of what power is and why it matters - **Khan Academy: "Effect Size" (khanacademy.org)** — Cohen's d and its interpretation - **Seeing Theory: Power module** — interactive power curve visualization → Further Reading: Comparing Two Groups
StatQuest: "Student's t-test" (YouTube)
focused walkthrough of the one-sample t-test - **Khan Academy: "One-sample t-test" (khanacademy.org)** — multiple worked examples - **Seeing Theory: Frequentist Inference module** — interactive t-test visualization → Further Reading: Inference for Proportions
StatQuest: "Two-Sample t-Test" (YouTube)
clear walkthrough of the independent-samples t-test - **Khan Academy: "Paired t-Test" (khanacademy.org)** — multiple worked examples of before-and-after designs - **Seeing Theory: Frequentist Inference module** — interactive visualization of two-sample comparisons → Further Reading: Inference for Means
*Random sample?* Yes — Maya used a random sample from the county's health records. - *Independence?* The county has 500,000 adults. Is $120 \leq 0.10 \times 500{,}000 = 50{,}000$? Yes, easily. - *Nearly normal or large $n$?* $n = 120 \geq 30$, so the CLT guarantees the sampling distribution of $\bar → Chapter 12: Confidence Intervals: Estimating with Uncertainty
Of the 100 with disease: 99% test positive → **99 true positives**, 1 false negative. - Of the 99,900 without disease: 2% test positive → **1,998 false positives**, 97,902 true negatives. → Chapter 9: Conditional Probability and Bayes' Theorem
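The punchline of the natural-frequency table is one division. Carrying the counts above forward:

```python
# Counts from the 100,000-person natural-frequency breakdown above
true_pos = 99       # 99% of the 100 people with the disease
false_pos = 1_998   # 2% of the 99,900 people without it

# P(disease | positive test): true positives over all positives
ppv = true_pos / (true_pos + false_pos)
print(round(ppv, 3))  # about 0.047 -- under 5%, despite a 99% sensitive test
```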
CDC's Behavioral Risk Factor Surveillance System (BRFSS) - Gapminder (global health and economics) - U.S. College Scorecard (education outcomes) - World Happiness Report - NOAA Climate Data Online → Introductory Statistics: Making Sense of Data in the Age of AI
they fit the model on one portion and evaluate it on another. If you'd tested your Chapter 22 regression model on a held-out sample and the $R^2$ dropped dramatically, that would have been a sign of overfitting. → Chapter 26: Statistics and AI: Being a Critical Consumer of Data
What are my incentives? Am I under pressure to find certain results? > 2. **The decision-maker** — Who is using this analysis? What decisions will they make? > 3. **The subjects** — Whose data is being analyzed? Did they consent? Can they be harmed? > 4. **The affected community** — Who will be impa → Chapter 27: Lies, Damn Lies, and Statistics: Ethical Data Practice
this is the most important threshold concept in the entire course. It is the bridge from probability to inference. Once students truly understand the CLT, confidence intervals (Ch. 12) and hypothesis tests (Ch. 13) make logical sense. Without it, those chapters are just recipes to follow. → Chapter 11: Sampling Distributions and the Central Limit Theorem — Instructor Notes
The courtroom analogy:
$H_0$ = presumption of innocence - Data = prosecution's evidence - p-value = how convincing the evidence is - $\alpha$ = "beyond a reasonable doubt" threshold - Reject $H_0$ = guilty verdict - Fail to reject $H_0$ = not guilty (NOT the same as innocent) → Key Takeaways: Hypothesis Testing: Making Decisions with Data
The data type is the same
continuous numerical, ratio level — but the **operational definition** changes what the numbers mean. This is why data dictionaries are essential: two teams working with "watch time" could be measuring fundamentally different things. → Case Study: Classifying Data at Scale — When Every Click Becomes Data
one of the most misunderstood concepts in all of science. Getting this right transforms statistical reasoning. Getting it wrong leads to the kind of errors documented in the replication crisis. Plan to spend 15-20 minutes specifically on what the p-value does NOT mean. → Chapter 13: Hypothesis Testing — Instructor Notes
They tend to agree when:
Sample sizes are moderate to large ($n \geq 20$ per group) - The data are approximately normal or at least symmetric - There are no extreme outliers - The data are on an interval or ratio scale → Chapter 21: Nonparametric Methods: When Assumptions Fail
They tend to disagree when:
Sample sizes are small ($n < 15$) and the data are skewed - Heavy outliers are present (these inflate the parametric test's standard error) - The data are ordinal (means may not be meaningful) - The distributions have very different shapes across groups → Chapter 21: Nonparametric Methods: When Assumptions Fail
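Running both tests side by side on small, skewed, simulated samples is an easy way to see the comparison in practice. The data below are generated for illustration, not taken from the chapter:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Small, right-skewed samples: the setting where the two tests can disagree
group_a = rng.exponential(scale=5, size=12)
group_b = rng.exponential(scale=5, size=12) + 4  # shifted upward

t_stat, t_p = stats.ttest_ind(group_a, group_b)     # parametric (two-sample t)
u_stat, u_p = stats.mannwhitneyu(group_a, group_b)  # rank-based (Mann-Whitney U)
print(round(t_p, 3), round(u_p, 3))
```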
Thinking in odds
the shift from probability to odds/log-odds is a genuine conceptual leap. Students who can move fluidly between probability, odds, and log-odds have the foundation for more advanced modeling. → Chapter 24: Logistic Regression — Instructor Notes
which is what Alex has been waiting for with her A/B test, and what Professor Washington needs for his algorithm audit. → Chapter 14: Inference for Proportions
they simply don't have enough participants to reliably detect the effects they're looking for. An underpowered study is like trying to spot a bird with binoculars that are out of focus. The bird might be right there, but you'll never see it. → Chapter 17: Power, Effect Sizes, and What "Significant" Really Means
Use a spreadsheet when:
You're doing quick, one-off calculations on small data (under ~1,000 rows) - You need to manually enter or edit data - You're sharing results with someone who doesn't know Python - You want to quickly eyeball data by scrolling through it → Chapter 3: Your Data Toolkit: Python, Excel, and Jupyter Notebooks
Use both when:
You want to check whether the formula-based and bootstrap results agree (they should for standard statistics under good conditions) - You're learning statistics and want to build intuition about sampling distributions → Chapter 18: The Bootstrap and Simulation-Based Inference
Use formula-based methods when:
You're computing a CI or test for a mean or proportion - The sample is large enough for the CLT - The data are reasonably normal (or you have $n \geq 30$) - You want a quick answer without programming → Chapter 18: The Bootstrap and Simulation-Based Inference
Use Python when:
Your dataset has more than ~1,000 rows - You need to reproduce your analysis later (or share exact steps) - You're doing anything that requires multiple steps (filter, then calculate, then graph) - You'll need to do the same analysis again on new data - You need statistical tests beyond basic averag → Chapter 3: Your Data Toolkit: Python, Excel, and Jupyter Notebooks
Use simulation-based methods when:
You need inference for a non-standard statistic (median, ratio, correlation, etc.) - Your data are non-normal and your sample is moderate-sized - You want to avoid distributional assumptions - The formula-based conditions are questionable → Chapter 18: The Bootstrap and Simulation-Based Inference
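The agreement check described under "Use both when" can be sketched in a few lines. This is a minimal illustration with made-up data (assuming NumPy and SciPy), comparing a formula-based t-interval with a percentile-bootstrap interval for a mean:

```python
# Sketch: formula-based 95% CI vs. percentile-bootstrap 95% CI for a mean.
# Under good conditions (large-ish normal sample) they should roughly agree.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(50, 10, size=100)   # simulated sample
n = len(x)

# Formula-based t-interval
se = x.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
formula_ci = (x.mean() - t_crit * se, x.mean() + t_crit * se)

# Percentile bootstrap: resample with replacement, record each mean
boot_means = [rng.choice(x, size=n, replace=True).mean() for _ in range(5000)]
boot_ci = (np.percentile(boot_means, 2.5), np.percentile(boot_means, 97.5))

print("formula:  ", formula_ci)
print("bootstrap:", boot_ci)
```

When the two intervals diverge noticeably, that itself is diagnostic: it suggests the formula-based conditions are questionable.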
V
Visualization
[ ] Every graph has a title, axis labels, and legend (if needed) - [ ] I've used appropriate graph types for each variable type - [ ] My graphs clearly communicate their intended message - [ ] I've included a variety of visualization types - [ ] My visualizations are integrated with my narrative → Capstone Rubric
W
What "95% confidence" really means
it's about the process, not the specific interval. This interpretation issue is the most commonly tested misconception on AP Statistics exams and in intro stats courses nationwide. → Chapter 12: Confidence Intervals — Instructor Notes
What a p-value is NOT:
It is NOT the probability that the null hypothesis is true - It is NOT the probability that the result happened by chance - It is NOT the probability that you'll get the same result if you repeat the study → Chapter 25: Communicating with Data: Telling Stories with Numbers
What makes this book different:
**Conversational tone** — like learning from a friend who happens to be great at explaining things - **Real-world examples** from healthcare, sports, technology, criminal justice, and everyday life - **Python and Excel** side by side — learn the tools you'll actually use - **Progressive portfolio pr → Introductory Statistics
What research tells us:
A 2012 study in *Pediatrics* found that parents who received false-positive newborn screening results experienced significantly elevated anxiety and depression levels even after the results were cleared. - A follow-up study found that some parents continued to perceive their children as "vulnerable" → Case Study 1: Medical Screening — When a Positive Test Doesn't Mean What You Think
When normality DOES matter:
Small sample sizes ($n < 30$): With small samples, the CLT hasn't kicked in, so the shape of the data matters more - Extreme outliers: Even robust procedures break down when there are extreme outliers - Prediction intervals: If you're predicting *individual* outcomes (not averages), you need the und → Chapter 10: Probability Distributions and the Normal Curve
When normality DOESN'T matter much:
Large sample sizes ($n > 30$ or so): The CLT rescues you - Mild skewness: A little skewness usually doesn't cause problems - Sample means and proportions: Even if individual observations aren't normal, their averages tend to be → Chapter 10: Probability Distributions and the Normal Curve
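The CLT rescue described above is easy to watch happen. A sketch (simulated data, not from the chapter): exponential observations are strongly skewed, but means of samples of $n = 30$ are already close to symmetric:

```python
# Sketch: the CLT at work. Individual exponential observations have
# skewness 2, but means of n = 30 observations are much closer to symmetric.
import numpy as np

rng = np.random.default_rng(7)
means = rng.exponential(scale=1.0, size=(10_000, 30)).mean(axis=1)

# Sample skewness of the 10,000 simulated sample means
centered = means - means.mean()
skew_of_means = (centered**3).mean() / means.std()**3
print(f"skewness of the sample means ≈ {skew_of_means:.2f}")
```

The theoretical skewness of these means is $2/\sqrt{30} \approx 0.37$, a fraction of the raw distribution's skewness of 2, and it keeps shrinking as $n$ grows.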
When pie charts don't work:
You have more than 5-6 categories (the slices become impossible to compare) - Categories have similar proportions (can you tell the difference between 22% and 24% slices? Neither can anyone else) - The data don't represent parts of a whole - You need precise comparisons (bar charts are always bett → Chapter 5: Exploring Data: Graphs and Descriptive Statistics
When pie charts work:
You have a small number of categories (3-5) - You want to show parts of a whole (must sum to 100%) - One or two categories dominate, and that dominance is the main story - Your audience is non-technical and familiar with pie charts → Chapter 5: Exploring Data: Graphs and Descriptive Statistics
$\bar{x}$ = sample mean - $\mu_0$ = hypothesized population mean (from $H_0$) - $s$ = sample standard deviation - $n$ = sample size - $df = n - 1$ (degrees of freedom) → Key Takeaways: Inference for Means
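The pieces listed above assemble into the one-sample t-statistic $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$. A sketch with hypothetical summary numbers (assuming SciPy for the p-value):

```python
# Sketch: building the one-sample t-statistic from its pieces.
# x_bar, mu_0, s, and n are made-up summary statistics.
import math
from scipy import stats

x_bar, mu_0, s, n = 52.3, 50.0, 8.1, 36
df = n - 1                                # degrees of freedom

t = (x_bar - mu_0) / (s / math.sqrt(n))   # signal / noise
p = 2 * stats.t.sf(abs(t), df)            # two-sided p-value

print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
```

The numerator is the signal (how far the sample mean sits from the hypothesized mean); the denominator is the noise (the standard error of the mean).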
When $b_0 + b_1 x$ is a large positive number (say, +10), $e^{-10}$ is tiny, so $P \approx \frac{1}{1+0} \approx 1$ - When $b_0 + b_1 x$ is a large negative number (say, -10), $e^{10}$ is huge, so $P \approx \frac{1}{1+22026} \approx 0$ - When $b_0 + b_1 x = 0$, $e^0 = 1$, so $P = \frac{1}{1+1} = 0. → Chapter 24: Logistic Regression: When the Outcome Is Yes or No
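The three cases above can be verified in a couple of lines. A sketch of the logistic curve $P = 1/(1 + e^{-(b_0 + b_1 x)})$, evaluated at the linear-part values the text uses:

```python
# Sketch: the logistic function at a large positive, large negative,
# and zero value of the linear part b0 + b1*x.
import math

def logistic(linear_part):
    """Map any real number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-linear_part))

print(logistic(10))    # ≈ 0.99995  (large positive → near 1)
print(logistic(-10))   # ≈ 0.000045 (large negative → near 0)
print(logistic(0))     # = 0.5      (zero → exactly one half)
```

No matter how extreme the input, the output stays pinned between 0 and 1, which is exactly why logistic regression uses this curve for yes/no outcomes.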
Lead with the conclusion, not the methodology - Use plain language (no jargon, no formulas, no p-values in the main text) - Present numbers in context ("Black applicants were denied mortgages at 1.8 times the rate of White applicants with similar incomes and credit scores" — not "the chi-square test → Capstone Project 3: Social Justice Data Audit
Y
Yes
| **Yes** — same person, same location, matched pair | **Paired t-test** | The pairing captures within-pair change |
| **No** — different people, different units, no matching | **Two-sample t-test** | The groups are independent | → Chapter 16: Comparing Two Groups
Your business question must:
Be specific enough to guide analysis but broad enough to require multiple techniques - Have clear implications for a business decision - Involve at least one comparison between groups or one predictive relationship - Be answerable with the data available (don't promise what the data can't deliver) → Capstone Project 2: Business Analytics Report
Your investigation question must:
Involve a clear comparison between groups defined by a protected or socially relevant characteristic (race, gender, income, disability, geography, etc.) - Be answerable with the data available — don't claim to measure what the data doesn't contain - Be framed neutrally: you're investigating whether → Capstone Project 3: Social Justice Data Audit
Your research question must:
Be specific and answerable with the data you have - Involve at least one numerical variable and at least one categorical variable - Be relevant to a real public health concern - Require more than descriptive statistics to answer (i.e., it should call for inference) → Capstone Project 1: Public Health Data Investigation
each observation's distance from the mean, measured in standard deviations. So the correlation coefficient is the average product of paired z-scores. > > That's all it is. The standard deviation from Chapter 6 is doing the heavy lifting inside the correlation formula. → Chapter 22: Correlation and Simple Linear Regression
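The claim that $r$ is the average product of paired z-scores can be verified numerically. A sketch with made-up data (assuming NumPy), using the sample standard deviation and the matching $n-1$ divisor:

```python
# Sketch: Pearson's r equals the sum of paired z-score products
# divided by n - 1 (when z-scores use the sample SD, ddof=1).
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=50)
y = 0.6 * x + rng.normal(size=50)   # y is noisily related to x

# z-scores: each observation's distance from the mean, in SD units
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)

r_from_z = (zx * zy).sum() / (len(x) - 1)
r_builtin = np.corrcoef(x, y)[0, 1]

print(r_from_z, r_builtin)   # the two values match
```

That the hand-built value matches `np.corrcoef` confirms the point: the standard deviation from Chapter 6 really is doing the heavy lifting inside the correlation formula.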