Quiz: The Bootstrap and Simulation-Based Inference
Test your understanding of bootstrap resampling, bootstrap confidence intervals, permutation tests, and the comparison between formula-based and simulation-based inference. Try to answer each question before revealing the answer.
1. The bootstrap method estimates the sampling distribution of a statistic by:
(a) Drawing repeated samples from the population (b) Resampling from the original sample with replacement (c) Resampling from the original sample without replacement (d) Using the Central Limit Theorem to derive the distribution mathematically
Answer
**(b) Resampling from the original sample with replacement.** The bootstrap creates new "virtual samples" by drawing from the original sample *with replacement*. Option (a) would give the actual sampling distribution, but we can't do this in practice (we only have one sample). Option (c) would just rearrange the same data and give the same statistic every time. Option (d) describes the formula-based approach, not the bootstrap.

2. Why must bootstrap sampling be done with replacement?
(a) To ensure each bootstrap sample is larger than the original (b) To allow some observations to appear multiple times and others to be left out, creating genuine variation (c) To make the bootstrap distribution exactly match the sampling distribution (d) To avoid violating the independence assumption
Answer
**(b) To allow some observations to appear multiple times and others to be left out, creating genuine variation.** Without replacement, every bootstrap sample would contain exactly the same observations (just reordered), and the statistic would be identical every time. With replacement, different values are emphasized in different samples, which mimics the variation that would occur if we could actually draw new samples from the population.

3. Approximately what percentage of the original observations appear in a single bootstrap sample (for large $n$)?
(a) 100% (b) 50% (c) 63.2% (d) 95%
Answer
**(c) 63.2%.** Each observation has a probability of $(1 - 1/n)^n \approx 1/e \approx 0.368$ of *not* appearing in a bootstrap sample. So approximately $1 - 1/e \approx 63.2\%$ of observations appear at least once. This means about 36.8% of observations are left out of any single bootstrap sample.

4. The bootstrap distribution is centered at:
(a) The population parameter (b) The sample statistic (c) Zero (d) The critical value
Answer
**(b) The sample statistic.** The bootstrap distribution is centered at the sample statistic (e.g., $\bar{x}$, the sample median), not the population parameter (e.g., $\mu$, the population median). This is a key difference from the true sampling distribution, which is centered at the population parameter. The bootstrap captures the right *spread* but is centered at the sample estimate.

5. A researcher computes 10,000 bootstrap medians and finds that the 2.5th percentile is 14.3 and the 97.5th percentile is 22.8. The 95% bootstrap confidence interval (percentile method) for the population median is:
(a) (14.3, 22.8) (b) (22.8, 14.3) (c) (14.3 - 22.8, 14.3 + 22.8) (d) Cannot be determined without knowing the sample median
Answer
**(a) (14.3, 22.8).** The percentile method is straightforward: the 95% CI is simply the interval between the 2.5th and 97.5th percentiles of the bootstrap distribution. No additional calculation is needed. This is one of the appeals of the percentile method — it's the simplest bootstrap CI to compute and interpret.

6. A student increases the number of bootstrap samples from 5,000 to 50,000. This will:
(a) Make the confidence interval narrower (b) Make the bootstrap standard error smaller (c) Make the CI endpoints more precise (less Monte Carlo noise) (d) Correct for bias in the original sample
Answer
**(c) Make the CI endpoints more precise (less Monte Carlo noise).** Increasing $B$ reduces the randomness in the bootstrap procedure itself (Monte Carlo noise), making the CI endpoints more stable — if you ran the bootstrap again with a different random seed, you'd get a very similar CI. But it does *not* make the CI narrower. The width of the CI is determined by the original sample size $n$ and the variability in the data. Only increasing $n$ (collecting more data) would make the CI narrower.

7. Which of the following statistics can be handled by the bootstrap but NOT by standard formula-based methods?
(a) The sample mean (b) The sample proportion (c) The sample median (d) All of the above can be handled by both methods
Answer
**(c) The sample median.** Standard formula-based methods ($t$-intervals, $z$-intervals) work well for means and proportions, which have well-known standard error formulas. The median does not have a simple, widely used standard error formula in introductory statistics. The bootstrap can handle all three — means, proportions, and medians — making it especially valuable for the median and other complex statistics.

8. In a permutation test, the group labels are shuffled to simulate:
(a) The alternative hypothesis (there is a difference) (b) The null hypothesis (there is no difference between groups) (c) The bootstrap distribution (d) The Central Limit Theorem
Answer
**(b) The null hypothesis (there is no difference between groups).** The permutation test directly operationalizes $H_0$: if the groups really come from the same population, then the group labels are meaningless and could be randomly reassigned. By shuffling the labels many times and computing the test statistic each time, we build the null distribution — the distribution of the test statistic under $H_0$.

9. A permutation test shuffles group labels 10,000 times. The observed difference in means is 4.8. Of the 10,000 shuffled differences, 320 are greater than or equal to 4.8. The one-sided p-value is:
(a) 4.8 / 10,000 = 0.00048 (b) 320 / 10,000 = 0.032 (c) (10,000 - 320) / 10,000 = 0.968 (d) 320 / 4.8 = 66.7
Answer
**(b) 320 / 10,000 = 0.032.** The p-value is the proportion of permuted test statistics that are as extreme as or more extreme than the observed value. Since 320 out of 10,000 shuffled differences are $\geq 4.8$, the one-sided p-value is $320/10{,}000 = 0.032$. For a two-sided test, you would count permuted differences with $|\text{diff}| \geq 4.8$.

10. Which statement best describes the relationship between the bootstrap and the permutation test?
(a) They are the same method applied in different ways (b) The bootstrap resamples within groups to estimate variability; the permutation test shuffles between groups to test $H_0$ (c) The bootstrap tests hypotheses; the permutation test builds confidence intervals (d) The bootstrap is for proportions; the permutation test is for means
Answer
**(b) The bootstrap resamples within groups to estimate variability; the permutation test shuffles between groups to test $H_0$.** These are related but distinct methods. The bootstrap resamples *with replacement* from the data to approximate the sampling distribution and build confidence intervals. The permutation test shuffles *without replacement* between groups to build the null distribution and compute p-values. The bootstrap is primarily for estimation (CIs), while the permutation test is primarily for hypothesis testing.

11. The bootstrap is LEAST appropriate in which scenario?
(a) Constructing a CI for the median with $n = 40$ (b) Constructing a CI for the mean with $n = 6$ from a heavily skewed population (c) Constructing a CI for the correlation coefficient with $n = 50$ (d) Constructing a CI for the mean with $n = 200$ from a normal population
Answer
**(b) Constructing a CI for the mean with $n = 6$ from a heavily skewed population.** With $n = 6$, the sample is too small to reliably represent the population — especially a heavily skewed one. The bootstrap resamples from the sample, so if the sample is a poor model of the population, the bootstrap distribution will also be poor. Options (a), (c), and (d) all have sample sizes large enough for the bootstrap to work well (though (d) could also be handled perfectly by the $t$-interval).

12. A researcher conducts both a Welch's $t$-test and a permutation test on the same data (two groups, $n_1 = 50$, $n_2 = 50$, approximately normal data). The $t$-test gives $p = 0.023$ and the permutation test gives $p = 0.025$. This close agreement:
(a) Is surprising and suggests an error in one method (b) Is expected — both methods are estimating the same thing when conditions are met (c) Proves that the permutation test is more accurate (d) Proves that the $t$-test is more accurate
Answer
**(b) Is expected — both methods are estimating the same thing when conditions are met.** When the $t$-test conditions are satisfied (independence, approximate normality, adequate sample sizes), both methods approximate the same null distribution. The slight difference ($0.023$ vs. $0.025$) is due to Monte Carlo variability in the permutation test — this would shrink with more permutations. This agreement is reassuring, not alarming.

13. Bradley Efron introduced the bootstrap in:
(a) 1908 (b) 1943 (c) 1979 (d) 2001
Answer
**(c) 1979.** Bradley Efron published "Bootstrap Methods: Another Look at the Jackknife" in 1979 while at Stanford University. The method required significant computing power, which limited its practical use until the 1990s when computers became powerful enough for routine bootstrap calculations. For reference, William Gosset introduced the $t$-test in 1908 (option a), and Monte Carlo methods were developed in the 1940s (option b).

14. The name "bootstrap" comes from:
(a) A type of computer programming loop (b) The phrase "pulling yourself up by your own bootstraps" — learning about the population from the sample itself (c) The name of the first computer that could run the method (d) Bootstrap sampling without replacement
Answer
**(b) The phrase "pulling yourself up by your own bootstraps" — learning about the population from the sample itself.** Just as pulling yourself up by your own bootstraps is a seemingly impossible feat, learning about sampling variability from a single sample seems paradoxical. Yet the bootstrap does exactly this — it "pulls" information about the sampling distribution from the sample itself. The name captures both the impossibility and the cleverness of the idea.

15. A Monte Carlo simulation is:
(a) A method that always uses the normal distribution (b) Any method that uses repeated random sampling to approximate a numerical result (c) A method specific to gambling applications (d) Another name for the bootstrap
Answer
**(b) Any method that uses repeated random sampling to approximate a numerical result.** Monte Carlo simulation is a broad class of methods that use randomness to compute quantities that are hard to calculate analytically. The bootstrap and permutation tests are both examples of Monte Carlo methods. The name comes from the Monte Carlo Casino, reflecting the role of chance. Monte Carlo methods were developed in the 1940s by Ulam, von Neumann, and Metropolis.

16. You have a sample of 30 observations and want a 90% bootstrap CI. You should take the ___ and ___ percentiles of the bootstrap distribution.
(a) 2.5th and 97.5th (b) 5th and 95th (c) 10th and 90th (d) 0.5th and 99.5th
Answer
**(b) 5th and 95th.** For a 90% CI, $\alpha = 0.10$, so you take the $\alpha/2 = 5$th percentile and the $1 - \alpha/2 = 95$th percentile. This leaves 5% in each tail, capturing the middle 90%. Option (a) gives a 95% CI, option (c) gives an 80% CI, and option (d) gives a 99% CI.

17. Which statement about the bootstrap standard error is TRUE?
(a) It equals $s/\sqrt{n}$ for the mean (b) It is the standard deviation of the bootstrap distribution (c) It decreases as the number of bootstrap samples $B$ increases (d) It is always larger than the formula-based standard error
Answer
**(b) It is the standard deviation of the bootstrap distribution.** The bootstrap SE is simply the SD of the $B$ bootstrap statistics. For the mean, it *approximates* $s/\sqrt{n}$ but is not exactly equal to it (option a). It does not systematically decrease with $B$ (option c) — what decreases is the Monte Carlo noise in the *estimate* of the bootstrap SE. And it is not always larger than the formula-based SE (option d).

18. When would you use the bootstrap instead of a formula-based method?
(a) When you have a very large sample from a normal population (b) When you need a CI for a proportion with $n = 1{,}000$ (c) When you need a CI for the median of a skewed distribution (d) When you want the fastest possible computation
Answer
**(c) When you need a CI for the median of a skewed distribution.** The bootstrap shines when formula-based methods don't exist or are unreliable. The median has no standard formula-based CI in introductory statistics, making the bootstrap the natural choice. Options (a) and (b) describe situations where formula-based methods work perfectly well ($t$-interval and $z$-interval, respectively). Option (d) favors formula-based methods, which are computationally cheaper.

19. In a permutation test, you observe a difference of 3.1 between two group means. After 10,000 permutations, the largest shuffled difference is 2.8. Your p-value is:
(a) Exactly 0 (b) Less than 0.0001 (c) Approximately $1/10{,}001 = 0.0001$ (d) Cannot be determined
Answer
**(b) Less than 0.0001.** Since none of the 10,000 permuted differences reached 3.1, the p-value is approximately $0/10{,}000 = 0$. However, we should report this as $p < 0.0001$ rather than $p = 0$, because a finite number of permutations can only establish an upper bound on the p-value. With more permutations, we might eventually see a difference as large as 3.1, so the true p-value is not exactly zero — it's just very small. Some statisticians use $(0 + 1)/(10{,}000 + 1) \approx 0.0001$ as a conservative estimate.

20. A researcher computes a bootstrap CI for the mean and gets (12.3, 15.8). She also computes a $t$-interval and gets (12.1, 16.0). The bootstrap CI is slightly narrower. This is most likely because:
(a) The bootstrap is always more precise than the $t$-interval (b) The bootstrap is capturing the actual (possibly non-normal) shape of the sampling distribution rather than assuming a $t$-distribution (c) The researcher used too few bootstrap samples (d) The $t$-interval is always wider than the bootstrap
Answer
**(b) The bootstrap is capturing the actual (possibly non-normal) shape of the sampling distribution rather than assuming a $t$-distribution.** The $t$-interval builds in extra width through $t$ critical values, which account for estimating the standard deviation; the percentile bootstrap CI instead reflects the empirical shape of the bootstrap distribution and can come out slightly narrower (or wider). Options (a) and (d) overstate the case — neither method is *always* narrower. Option (c) is wrong because too few bootstrap samples would add Monte Carlo noise to the endpoints, not systematically shrink the interval.
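The two core techniques in this quiz — the bootstrap percentile CI and the permutation test — can be tried directly in a few lines of Python. This is a sketch using NumPy; the data are simulated and the variable names are my own, not from any real study.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Bootstrap percentile CI for the median (questions 1-6, 16) ---
sample = rng.exponential(scale=10, size=40)  # one skewed sample, n = 40
boot_medians = np.array([
    # resample WITH replacement, same size as the original sample
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(10_000)
])
# percentile method: middle 95% of the bootstrap distribution
ci_low, ci_high = np.percentile(boot_medians, [2.5, 97.5])

# --- Permutation test for a difference in means (questions 8-9) ---
group_a = rng.normal(loc=52, scale=8, size=50)
group_b = rng.normal(loc=48, scale=8, size=50)
observed = group_a.mean() - group_b.mean()
pooled = np.concatenate([group_a, group_b])
perm_diffs = np.empty(10_000)
for i in range(10_000):
    shuffled = rng.permutation(pooled)  # shuffle group labels under H0
    perm_diffs[i] = shuffled[:50].mean() - shuffled[50:].mean()
# one-sided p-value: proportion of shuffles at least as extreme as observed
p_one_sided = np.mean(perm_diffs >= observed)

print(f"95% bootstrap CI for median: ({ci_low:.1f}, {ci_high:.1f})")
print(f"one-sided permutation p-value: {p_one_sided:.4f}")
```

Note the contrast the quiz emphasizes: the bootstrap resamples *with* replacement to estimate variability, while the permutation test reshuffles *without* replacement to build the null distribution.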