Exercises: Your Statistical Journey Continues

These exercises are more reflective and integrative than computational. They're designed to help you consolidate what you've learned, identify areas for growth, and finalize your Data Detective Portfolio. Estimated completion time: 2 hours.

Difficulty Guide:
- * Foundational (5-10 min each)
- ** Intermediate (10-20 min each)
- *** Challenging (20-40 min each)
- **** Advanced/Portfolio (40+ min each)

Part A: Cumulative Review *

These questions span the entire course. Answer from memory first, then check the relevant chapter if needed.

A.1. For each term, write a one-sentence definition and name the chapter where it was first introduced:

(a) Standard error

(b) Confidence interval

(c) P-value

(d) Effect size

(e) Confounding variable

(f) Simpson's paradox

A.2. Match each scenario to the most appropriate statistical test:

Scenarios:
(a) Comparing average exam scores between two independent sections of a course
(b) Testing whether a coin is fair after 200 flips
(c) Determining whether there's an association between political party and opinion on a policy
(d) Comparing students' test scores before and after a tutoring program
(e) Comparing average wait times across four different hospital emergency rooms
(f) Testing whether average blood pressure in a community exceeds the national average

Test Options:
A. One-sample t-test
B. Two-sample t-test
C. Paired t-test
D. One-sample z-test for proportions
E. Chi-square test of independence
F. One-way ANOVA

A.3. True or false (explain each):

(a) A statistically significant result is always practically important.

(b) Doubling the sample size cuts the margin of error in half.

(c) A 95% confidence interval means there's a 95% probability that the true parameter is inside the interval.

(d) Correlation implies causation when the correlation coefficient is above 0.90.

(e) A p-value of 0.03 means there's a 3% chance the null hypothesis is true.

(f) Nonparametric tests are always better than parametric tests because they make fewer assumptions.

A.4. The Central Limit Theorem is often called the most important theorem in statistics. In three to four sentences, explain what it says and why it matters for inference.

A.5. Explain the difference between Type I and Type II errors. For each of the following contexts, which type of error has worse consequences? Justify your answer.

(a) A medical screening test for a serious but treatable disease

(b) A criminal trial (think of "innocent until proven guilty" as H_0)

(c) An A/B test deciding whether to launch a new product feature

Part B: Integrative Thinking **

These questions require you to connect ideas across multiple chapters.

B.1. Trace Sam's analysis of Daria's shooting from Chapter 1 through Chapter 28. For each chapter listed below, describe what tool or concept Sam used and what was learned:

(a) Chapter 6 (summary statistics)

(b) Chapter 11 (sampling distributions and standard error)

(c) Chapter 13 (hypothesis testing)

(d) Chapter 14 (inference for proportions)

(e) Chapter 17 (power analysis)

(f) Chapter 28 (resolution with 258 attempts)

B.2. Consider this claim from a news article: "People who use standing desks have lower rates of back pain."

Using concepts from at least five different chapters, write a paragraph analyzing this claim. Your analysis should address: study design, potential confounders, statistical vs. practical significance, the appropriate statistical test, and ethical considerations.

B.3. A pharmaceutical company reports that its new drug reduced headache duration by an average of 12 minutes compared to a placebo (p = 0.001, n = 50,000).

(a) Is this result statistically significant? How do you know?

(b) Is it necessarily practically significant? Why or why not?

(c) Why might the very large sample size be important to consider?

(d) What additional information would you want before recommending this drug?

(e) Connect this to at least three of the six recurring themes.

B.4. Compare and contrast how Maya and James used multiple regression (Chapter 23). For each:

(a) What was the research question?

(b) What was the key predictor variable?

(c) What confounders were included?

(d) How did the coefficient of the key predictor change when confounders were added?

(e) Could either make a causal claim? Why or why not?

Part C: Theme Synthesis **

C.1. For each of the six themes, provide one concrete example from your own life or field of interest (not from the textbook) where the theme applies. Write 2-3 sentences for each.

(a) Theme 1: Statistics as a superpower

(b) Theme 2: Human stories behind the data

(c) Theme 3: AI and algorithms use statistics

(d) Theme 4: Uncertainty is not failure

(e) Theme 5: Correlation does not imply causation

(f) Theme 6: Ethical data practice

C.2. Choose one of the six themes and write a short essay (400-500 words) arguing that it is the most important theme in the book. Use at least three specific examples from different chapters to support your argument.

C.3. A friend who hasn't taken statistics asks you: "What's the most important thing you learned in your stats class?" Write your answer in 100-150 words, using language a non-statistician would understand. Do not use any technical jargon.

Part D: Career Connections **

D.1. Choose the career pathway from Section 28.6 that is closest to your own interests. Then:

(a) Identify one specific question in your field that could be answered with a statistical method you learned in this course.

(b) Describe which method you would use and why.

(c) What data would you need? How would you collect it?

(d) What ethical considerations would be involved?

(e) How would you communicate the results to a non-technical audience in your field?

D.2. For each advanced topic previewed in Section 28.7, name one real-world question it could help answer in your field of interest:

(a) Bayesian statistics

(b) Machine learning

(c) Causal inference

(d) Time series analysis

(e) Survival analysis

Part E: Advanced Topics Preview ***

E.1. Read the preview of Bayesian statistics in Section 28.7. Then answer:

(a) What is the key philosophical difference between frequentist and Bayesian approaches to statistics?

(b) In what situations might a Bayesian approach be more useful than a frequentist approach? In what situations might it be less useful?

(c) How does the concept of a "prior" connect to what you already learned about Bayes' theorem in Chapter 9?

E.2. The data science pipeline described in Section 28.7 has seven steps. For your Data Detective Portfolio project, describe what you did at each step. Where were you most confident? Where were you least confident? What would you do differently?

E.3. A researcher has data on patient recovery times after two different surgical procedures. Some patients dropped out of the study before recovery was observed (censoring).

(a) Why can't the researcher simply calculate the average recovery time for each group, ignoring the patients who dropped out?

(b) How does this connect to concepts you learned about missing data in Chapter 7?

(c) Which advanced method from Section 28.7 is designed to handle this problem?

Part F: Portfolio Finalization ****

F.1. Complete the Portfolio Polishing Checklist (Section 28.8). For any item you had to go back and fix or add, describe what was missing and how you addressed it.

F.2. Write the Executive Summary for your portfolio (300-500 words), following the guidelines in Section 28.12, Step 2.

F.3. Write the Personal Reflection for your portfolio (300-500 words), following the guidelines in Section 28.12, Step 3.

F.4. If you're doing a peer review, complete the following for your partner's portfolio:

(a) What was the strongest statistical analysis in this portfolio? Why?

(b) Identify one place where the interpretation could be more nuanced. What would you add or change?

(c) Is there a finding that could be misinterpreted by a general audience? How would you suggest clarifying it?

(d) Does the ethics section adequately address potential harms and limitations?

(e) What is one thing this portfolio does that you want to incorporate into your own?

Part G: The Last Questions ***

G.1. Look back at the Productive Struggle puzzle from Section 28.1 (the commencement speech about statistics majors earning more). Write a complete, thorough analysis (300-400 words) using concepts from across the course. Your analysis should demonstrate statistical thinking, not just vocabulary.

G.2. Design a study. Choose a question you're genuinely curious about and outline a study that could answer it:

(a) State the research question and hypotheses (Ch. 13)

(b) Describe the study design — observational or experimental? What sampling method? (Ch. 4)

(c) Identify the key variables and their types (Ch. 2)

(d) What statistical test(s) would you use? Why? (Ch. 14-21)

(e) How large a sample would you need? Base this on a power analysis with a reasonable effect size. (Ch. 17)

(f) What confounders should you worry about? How would you handle them? (Ch. 4, 23)

(g) What ethical considerations apply? (Ch. 27)

(h) How would you communicate your findings to a non-technical audience? (Ch. 25)

This is the most comprehensive exercise in the textbook. It asks you to use everything. Take your time with it.
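
For part (e), one possible starting point is the rough sketch below. It assumes your design compares the means of two independent groups with a two-sided test, and it uses the normal approximation n = 2 * ((z_(1-alpha/2) + z_power) / d)^2 per group, where d is a standardized effect size (Cohen's d). The function name `sample_size_per_group` and the defaults (alpha = 0.05, power = 0.80) are illustrative choices, not values prescribed by this textbook. If your study uses a different test (proportions, paired data, ANOVA), the formula will differ.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must be between 0 and 1")

    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the target power

    # Round up: a fractional participant is not possible.
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Conventional "small", "medium", and "large" values of Cohen's d.
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {sample_size_per_group(d)} per group")
```

With the defaults, this gives roughly 393, 63, and 25 participants per group for d = 0.2, 0.5, and 0.8. Because it relies on the normal approximation, the result runs slightly below what an exact t-based power calculation would report, so treat it as a lower bound and justify your chosen effect size in your write-up.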

G.3. Write a letter to a future student taking this course. In 200-300 words, tell them: What should they expect? What should they not worry about? What concept will be harder than they think? What concept will be easier? What advice would you give them for the portfolio project?