Key Takeaways: Designing Studies — Sampling and Experiments
One-Sentence Summary
How data is collected determines what it can tell you — observational studies reveal associations, experiments reveal causes, and bias in sampling can make even the largest dataset misleading.
Core Concepts at a Glance
| Concept | Definition | Why It Matters |
|---|---|---|
| Observational study | A study that measures without intervening | Can show association, but not causation — confounders may lurk |
| Experiment | A study that imposes a treatment and measures the response | With randomization, can establish cause-and-effect |
| Confounding variable | A variable related to both the explanatory and response variables | The reason "correlation does not imply causation" — confounders create false signals |
| Bias | A systematic tendency to produce results that are wrong in a particular direction | Doesn't shrink with larger samples; must be prevented by design |
| Randomization | Using chance to select samples or assign treatments | Protects against known and unknown biases — the single most powerful tool in statistics |
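The claim that randomization protects against unknown biases is easiest to see in a simulation. The sketch below uses only the standard library and entirely hypothetical data: 1,000 subjects, 40% of whom carry some unmeasured trait (a potential confounder), are randomly split into two groups, and the trait ends up roughly balanced between them.

```python
import random

random.seed(42)

# Hypothetical subjects: 40% carry a lurking trait the researcher
# never measured (a potential confounder).
subjects = [{"has_trait": random.random() < 0.4} for _ in range(1000)]

# Random assignment: shuffle, then split in half.
random.shuffle(subjects)
treatment, control = subjects[:500], subjects[500:]

def rate(group):
    return sum(s["has_trait"] for s in group) / len(group)

print(f"trait rate in treatment group: {rate(treatment):.2f}")
print(f"trait rate in control group:   {rate(control):.2f}")
# The two rates come out close: chance assignment tends to balance
# even variables nobody thought to measure.
```

This is why random assignment supports causal claims: any confounder, known or unknown, is spread across groups by chance rather than by a systematic process.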
Decision Flowchart: What Type of Study Is This?
Does the researcher impose a treatment?
│
├── YES → It's an EXPERIMENT
│ │
│ ├── Was there random assignment to groups?
│ │ ├── YES → Can support CAUSAL claims ✓
│ │ └── NO → Confounding may explain results ✗
│ │
│ ├── Was there a control group?
│ │ ├── YES → Has a baseline for comparison ✓
│ │ └── NO → No way to separate treatment from time/other factors ✗
│ │
│ └── Was blinding used?
│ ├── Double-blind → Minimal bias ✓✓
│ ├── Single-blind → Some protection ✓
│ └── None → Placebo effect and researcher bias possible ✗
│
└── NO → It's an OBSERVATIONAL STUDY
│
├── Shows ASSOCIATION only — not causation
├── Confounding variables are always a concern
└── Can provide strong evidence when:
• Large sample
• Replicated across studies
• Dose-response relationship
• Biologically/theoretically plausible
Decision Flowchart: Which Sampling Method?
Do you need your results to generalize to a population?
│
├── YES → You need a probability sample (random element)
│ │
│ ├── Do you need specific subgroups represented?
│ │ ├── YES → STRATIFIED sampling
│ │ │ (divide into strata, randomly sample within each)
│ │ └── NO → ↓
│ │
│ ├── Is the population geographically spread out?
│ │ ├── YES → CLUSTER sampling
│ │ │ (randomly select entire groups)
│ │ └── NO → ↓
│ │
│ ├── Do you have a complete list of the population?
│ │ ├── YES → SIMPLE RANDOM sampling
│ │ │ (every possible sample of size n equally likely)
│ │ └── NO → SYSTEMATIC sampling
│ │ (every kth member from available list)
│ │
│ └── Always check for bias regardless of method
│
└── NO → CONVENIENCE sampling is acceptable
(but clearly state limitations — results may not generalize)
Sampling Methods Quick Reference
| Method | The Idea | Strength | Weakness |
|---|---|---|---|
| Simple Random | Every possible sample of size n is equally likely | Unbiased | Need a complete list |
| Stratified | Random within defined subgroups | Guarantees representation | Must know subgroups in advance |
| Cluster | Select entire groups randomly | Cost-effective for spread-out populations | Less precise per observation |
| Convenience | Whoever is easiest to reach | Cheap and fast | Probably biased — use with extreme caution |
| Systematic | Every kth member from a list | Simple to do | Risk if list has periodic pattern |
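Each of the methods in the table can be sketched in a few lines of standard-library Python. The population, strata, and cluster sizes below are toy values chosen purely for illustration:

```python
import random

random.seed(1)
population = list(range(100))  # a toy, fully enumerated population

# Simple random sample: every possible sample of size 10 equally likely.
srs = random.sample(population, 10)

# Stratified: divide into (hypothetical) strata, sample within each.
strata = {"even": [x for x in population if x % 2 == 0],
          "odd":  [x for x in population if x % 2 == 1]}
stratified = [x for group in strata.values()
              for x in random.sample(group, 5)]

# Cluster: partition into groups of 10, then select whole clusters.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [x for c in random.sample(clusters, 2) for x in c]

# Systematic: random start, then every kth member (k = 10 here).
start = random.randrange(10)
systematic = population[start::10]

print(len(srs), len(stratified), len(cluster_sample), len(systematic))
```

Note that stratified and cluster sampling look superficially similar but invert the logic: stratified samples *within* every group, while cluster sampling takes *all* members of a few randomly chosen groups.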
Bias Cheat Sheet
| Type of Bias | What It Is | How to Spot It | Example |
|---|---|---|---|
| Selection bias | Sample systematically excludes groups | Ask: "Who is missing from this data?" | Phone surveys miss people without phones |
| Response bias | Questions or context influence answers | Look for leading questions, social desirability | "Don't you agree that..." |
| Nonresponse bias | Non-respondents differ from respondents | Check response rates; ask who didn't reply | Only 24% return a survey — who are the 76%? |
| Survivorship bias | Only "survivors" are visible in the data | Ask: "What am I not seeing?" | Studying bullet holes on planes that returned |
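Selection bias is simple to demonstrate in code. In the sketch below (all proportions are made up for illustration), support for a policy differs between phone owners and non-owners; a phone survey can only reach the first group, so its estimate misses the true population value no matter how carefully the 500 respondents are chosen:

```python
import random

random.seed(7)

# Hypothetical population of 10,000: support is 70% among phone
# owners (80% of people) but only 30% among non-owners.
population = []
for _ in range(10_000):
    has_phone = random.random() < 0.8
    supports = random.random() < (0.7 if has_phone else 0.3)
    population.append((has_phone, supports))

true_rate = sum(s for _, s in population) / len(population)

# A phone survey can only reach phone owners: selection bias.
reachable = [p for p in population if p[0]]
survey = random.sample(reachable, 500)
survey_rate = sum(s for _, s in survey) / len(survey)

print(f"true support:    {true_rate:.2f}")   # near 0.62
print(f"survey estimate: {survey_rate:.2f}") # near 0.70, systematically high
```

The survey's random sampling is flawless *within* the reachable group; the bias comes from who is missing from the sampling frame, which is exactly the "Who is missing from this data?" question in the table.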
The Causal Claims Checklist
When someone says "X causes Y," ask:
- [ ] Was this an experiment or observational study?
- [ ] Was there random assignment?
- [ ] Was there a control group?
- [ ] Was blinding used?
- [ ] Can I think of a confounding variable?
- [ ] How was the sample selected?
- [ ] How large was the sample?
- [ ] Has this been replicated?
Rule of thumb: if fewer than five of the eight boxes check out favorably, be skeptical of the causal claim.
Key Terms
| Term | Definition |
|---|---|
| Observational study | Observes without intervening |
| Experiment | Imposes a treatment to observe responses |
| Simple random sample (SRS) | Every possible sample of a given size is equally likely to be chosen |
| Stratified sampling | Random sampling within defined subgroups |
| Cluster sampling | Randomly selecting entire groups |
| Convenience sample | Sampling whoever is easiest to reach |
| Systematic sampling | Selecting every kth member |
| Bias | Systematic tendency toward wrong results in one direction |
| Confounding variable | Related to both explanatory and response variables |
| Randomization | Using chance for selection or assignment |
| Control group | Receives no treatment, a placebo, or the standard treatment; serves as the baseline |
| Treatment group | Receives the treatment |
| Placebo | Inactive treatment that looks real |
| Blinding | Hiding group assignments from participants and/or researchers |
| Double-blind | Neither participants nor researchers know group assignments |
The One Thing to Remember
If you forget everything else from this chapter, remember this:
The design of a study determines what it can show. Observational studies reveal association. Randomized experiments can establish causation. And a biased sample, no matter how large, gives you a precise wrong answer.
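That last point, that bias does not shrink as the sample grows, can be checked in a quick simulation (toy numbers, standard library only). Here the sampling frame systematically excludes the low end of the population, so the estimate converges ever more tightly to the wrong value:

```python
import random

random.seed(3)

# Toy population: 100,000 values with mean near 50.
population = [random.uniform(0, 100) for _ in range(100_000)]
true_mean = sum(population) / len(population)

# A biased frame that can only "see" values above 40,
# like a survey that only reaches one kind of respondent.
frame = [x for x in population if x > 40]

for n in (100, 1_000, 10_000):
    est = sum(random.sample(frame, n)) / n
    print(f"n={n:>6}: biased estimate = {est:.1f} (true mean = {true_mean:.1f})")

# The estimate grows more *precise* with n, but it converges near 70,
# not 50. More data sharpens the wrong answer; only better design fixes it.
```

This is the "precise wrong answer" in miniature: increasing n reduces random error but does nothing to the systematic error baked in by the frame.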