Case Study 1: Maya's Bootstrap Analysis of Emergency Department Wait Times

The Setup

Dr. Maya Chen faces a familiar problem: hospital administrators want a single number to summarize emergency department performance, and they keep citing the mean wait time. But Maya knows — and you know from Chapter 6 — that the mean is a terrible summary for skewed data. Wait times are heavily right-skewed: most patients are seen within 30 minutes, but a few complex trauma cases or mental health holds can push individual wait times past three hours.

The mean wait time at Maya's hospital is 47 minutes. The median is 28 minutes. These tell very different stories. The mean says "patients wait almost 50 minutes on average." The median says "the typical patient waits less than half an hour." For a patient walking into the ED wondering how long they'll wait, the median is far more informative.

Maya's challenge: she needs to present a confidence interval for the median wait time to the hospital quality committee. The committee understands confidence intervals (she taught them, using the principles from Chapter 12). But the $t$-interval is for means. There is no "median interval" in the standard toolkit.

Enter the bootstrap.

The Data

Maya collected wait times (in minutes) for 50 consecutive patients during a typical weekday. Here's the full analysis:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

np.random.seed(314)

# Simulated ED wait times — realistic right-skewed distribution
# Mix of routine (exponential ~20 min) and complex cases (exponential ~90 min)
routine = np.random.exponential(scale=20, size=38)
complex_cases = np.random.exponential(scale=90, size=12)
wait_times = np.concatenate([routine, complex_cases])
wait_times = np.round(np.clip(wait_times, 3, 280), 1)  # Min 3 min, max 280

n = len(wait_times)
print("=" * 55)
print("Maya's ED Wait Time Data — Summary")
print("=" * 55)
print(f"  n         = {n}")
print(f"  Mean      = {np.mean(wait_times):.1f} minutes")
print(f"  Median    = {np.median(wait_times):.1f} minutes")
print(f"  SD        = {np.std(wait_times, ddof=1):.1f} minutes")
print(f"  Min       = {np.min(wait_times):.1f} minutes")
print(f"  Max       = {np.max(wait_times):.1f} minutes")
print(f"  Q1        = {np.percentile(wait_times, 25):.1f} minutes")
print(f"  Q3        = {np.percentile(wait_times, 75):.1f} minutes")
print(f"  Skewness  = {stats.skew(wait_times):.2f}")
print()
print(f"Notice: The mean ({np.mean(wait_times):.1f}) is much higher than the "
      f"median ({np.median(wait_times):.1f}).")
print("This confirms strong right-skewness.")

The Problem with the t-Interval Here

Let's first check why the $t$-interval isn't ideal — even for the mean:

# t-interval for the mean
se_mean = np.std(wait_times, ddof=1) / np.sqrt(n)
t_star = stats.t.ppf(0.975, df=n-1)
ci_mean_lower = np.mean(wait_times) - t_star * se_mean
ci_mean_upper = np.mean(wait_times) + t_star * se_mean

print("t-interval for the MEAN:")
print(f"  ({ci_mean_lower:.1f}, {ci_mean_upper:.1f}) minutes")
print()
print("But Maya wants a CI for the MEDIAN.")
print("The t-interval formula doesn't apply to medians.")
print("There's no simple 'median interval' formula in the standard toolkit.")

The $t$-interval gives a CI for the mean wait time. But Maya needs a CI for the median — and the formula $\bar{x} \pm t^* \cdot s/\sqrt{n}$ doesn't work for medians. The standard error formula $s/\sqrt{n}$ is specific to the sampling distribution of $\bar{x}$, not the sampling distribution of the median.
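To see concretely that $s/\sqrt{n}$ describes the mean and not the median, here is a small side simulation — separate from Maya's data, drawing from a standard normal population where both quantities are well understood:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 20000

# Draw many independent samples of size n and record each sample's median.
medians = np.array([np.median(rng.normal(size=n)) for _ in range(reps)])

se_median = medians.std()   # empirical SE of the sample median
se_mean = 1 / np.sqrt(n)    # sigma/sqrt(n) with sigma = 1

print(f"SE of the sample median (simulated): {se_median:.3f}")
print(f"sigma/sqrt(n), the mean's SE:        {se_mean:.3f}")
# For normal data the median's SE is roughly 25% larger
# (asymptotically sqrt(pi/2) * sigma/sqrt(n)), so reusing s/sqrt(n)
# in a t-style interval for the median would give intervals that
# are too narrow.
```

The gap between the two numbers is the point: each statistic has its own sampling distribution, and the bootstrap estimates that distribution directly instead of borrowing the mean's formula.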

The Bootstrap Solution

# Bootstrap CI for the median
np.random.seed(42)
n_boot = 10000
boot_medians = np.zeros(n_boot)

for i in range(n_boot):
    boot_sample = np.random.choice(wait_times, size=n, replace=True)
    boot_medians[i] = np.median(boot_sample)

# 95% CI — percentile method
ci_lower = np.percentile(boot_medians, 2.5)
ci_upper = np.percentile(boot_medians, 97.5)
boot_se = np.std(boot_medians)

print("=" * 55)
print("Bootstrap CI for the MEDIAN Wait Time")
print("=" * 55)
print(f"  Observed median:  {np.median(wait_times):.1f} minutes")
print(f"  Bootstrap SE:     {boot_se:.1f} minutes")
print(f"  95% CI:           ({ci_lower:.1f}, {ci_upper:.1f}) minutes")

Comparing Bootstrap CIs for Mean vs. Median

Let's also compute the bootstrap CI for the mean, to compare it with the $t$-interval:

# Bootstrap CI for the mean (for comparison with t-interval)
boot_means = np.zeros(n_boot)
for i in range(n_boot):
    boot_sample = np.random.choice(wait_times, size=n, replace=True)
    boot_means[i] = np.mean(boot_sample)

ci_mean_boot_lower = np.percentile(boot_means, 2.5)
ci_mean_boot_upper = np.percentile(boot_means, 97.5)

print("\n--- Comparison ---")
print(f"{'Method':<35} {'95% CI':<25} {'Width'}")
print("-" * 70)
print(f"{'t-interval for mean':<35} ({ci_mean_lower:.1f}, {ci_mean_upper:.1f})"
      f"{'':>5} {ci_mean_upper - ci_mean_lower:.1f} min")
print(f"{'Bootstrap CI for mean':<35} ({ci_mean_boot_lower:.1f}, "
      f"{ci_mean_boot_upper:.1f}){'':>5} "
      f"{ci_mean_boot_upper - ci_mean_boot_lower:.1f} min")
print(f"{'Bootstrap CI for median':<35} ({ci_lower:.1f}, {ci_upper:.1f})"
      f"{'':>5} {ci_upper - ci_lower:.1f} min")

Notice that:

  1. The $t$-interval and bootstrap CI for the mean are similar — as expected when both methods are applicable.

  2. The bootstrap CI for the median is narrower and lower than the CI for the mean — reflecting the fact that the median is less affected by the extreme wait times.

Visualization

fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Top left: Original data
axes[0, 0].hist(wait_times, bins=20, edgecolor='black', alpha=0.7,
                color='lightcoral')
axes[0, 0].axvline(np.mean(wait_times), color='blue', linewidth=2,
                    linestyle='--', label=f'Mean = {np.mean(wait_times):.1f}')
axes[0, 0].axvline(np.median(wait_times), color='red', linewidth=2,
                    linestyle='-', label=f'Median = {np.median(wait_times):.1f}')
axes[0, 0].set_xlabel('Wait Time (minutes)')
axes[0, 0].set_ylabel('Frequency')
axes[0, 0].set_title('ED Wait Times (Right-Skewed)')
axes[0, 0].legend()

# Top right: Bootstrap distribution of the median
axes[0, 1].hist(boot_medians, bins=40, edgecolor='black', alpha=0.7,
                color='steelblue')
axes[0, 1].axvline(ci_lower, color='red', linewidth=2, linestyle='--',
                    label=f'2.5th pctl = {ci_lower:.1f}')
axes[0, 1].axvline(ci_upper, color='red', linewidth=2, linestyle='--',
                    label=f'97.5th pctl = {ci_upper:.1f}')
axes[0, 1].axvline(np.median(wait_times), color='darkred', linewidth=2,
                    label=f'Original median = {np.median(wait_times):.1f}')
axes[0, 1].set_xlabel('Bootstrap Median (minutes)')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].set_title('Bootstrap Distribution of the Median')
axes[0, 1].legend(fontsize=9)

# Bottom left: Bootstrap distribution of the mean
axes[1, 0].hist(boot_means, bins=40, edgecolor='black', alpha=0.7,
                color='lightgreen')
axes[1, 0].axvline(ci_mean_boot_lower, color='red', linewidth=2,
                    linestyle='--', label=f'2.5th = {ci_mean_boot_lower:.1f}')
axes[1, 0].axvline(ci_mean_boot_upper, color='red', linewidth=2,
                    linestyle='--', label=f'97.5th = {ci_mean_boot_upper:.1f}')
axes[1, 0].set_xlabel('Bootstrap Mean (minutes)')
axes[1, 0].set_ylabel('Frequency')
axes[1, 0].set_title('Bootstrap Distribution of the Mean')
axes[1, 0].legend(fontsize=9)

# Bottom right: Comparison of CIs
methods = ['t-interval\n(mean)', 'Bootstrap\n(mean)', 'Bootstrap\n(median)']
lowers = [ci_mean_lower, ci_mean_boot_lower, ci_lower]
uppers = [ci_mean_upper, ci_mean_boot_upper, ci_upper]
centers = [np.mean(wait_times), np.mean(wait_times), np.median(wait_times)]
colors_ci = ['green', 'blue', 'red']

for i, (method, lo, hi, center, color) in enumerate(
        zip(methods, lowers, uppers, centers, colors_ci)):
    axes[1, 1].plot([lo, hi], [i, i], color=color, linewidth=3)
    axes[1, 1].plot(center, i, 'o', color=color, markersize=10)
    axes[1, 1].text(hi + 1, i, f'({lo:.1f}, {hi:.1f})', va='center',
                     fontsize=10)

axes[1, 1].set_yticks(range(3))
axes[1, 1].set_yticklabels(methods)
axes[1, 1].set_xlabel('Wait Time (minutes)')
axes[1, 1].set_title('Comparison of Confidence Intervals')
axes[1, 1].grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.show()

Additional Analyses: 75th Percentile and IQR

Maya realizes that the hospital committee would also benefit from understanding the worst-case waits and the spread of waits. She uses the bootstrap for two more statistics:

# Bootstrap CI for the 75th percentile
boot_q75 = np.zeros(n_boot)
for i in range(n_boot):
    boot_sample = np.random.choice(wait_times, size=n, replace=True)
    boot_q75[i] = np.percentile(boot_sample, 75)

ci_q75 = (np.percentile(boot_q75, 2.5), np.percentile(boot_q75, 97.5))
print(f"75th percentile: {np.percentile(wait_times, 75):.1f} minutes")
print(f"  95% Bootstrap CI: ({ci_q75[0]:.1f}, {ci_q75[1]:.1f}) minutes")
print("  Interpretation: roughly 75% of patients are seen within this time")

# Bootstrap CI for the IQR
boot_iqr = np.zeros(n_boot)
for i in range(n_boot):
    boot_sample = np.random.choice(wait_times, size=n, replace=True)
    boot_iqr[i] = np.percentile(boot_sample, 75) - np.percentile(boot_sample, 25)

ci_iqr = (np.percentile(boot_iqr, 2.5), np.percentile(boot_iqr, 97.5))
observed_iqr = np.percentile(wait_times, 75) - np.percentile(wait_times, 25)
print(f"\nIQR: {observed_iqr:.1f} minutes")
print(f"  95% Bootstrap CI: ({ci_iqr[0]:.1f}, {ci_iqr[1]:.1f}) minutes")
print("  Interpretation: the middle 50% of wait times span an interval this wide")
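The hand-rolled resampling loop is the clearest way to learn the method, but for day-to-day work SciPy (version 1.7 and later) ships a ready-made implementation, scipy.stats.bootstrap. A minimal sketch on a stand-in right-skewed sample (substitute Maya's wait_times array built earlier; the generated data here is just for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(314)
# Stand-in right-skewed sample; replace with Maya's wait_times array.
wait_times = np.round(rng.exponential(scale=30, size=50), 1)

# data must be a sequence of samples, hence the one-element tuple
res = stats.bootstrap((wait_times,), np.median,
                      n_resamples=10000,
                      confidence_level=0.95,
                      method='percentile',
                      random_state=rng)

ci = res.confidence_interval
print(f"Observed median:  {np.median(wait_times):.1f} minutes")
print(f"95% CI (SciPy):   ({ci.low:.1f}, {ci.high:.1f}) minutes")
print(f"Bootstrap SE:     {res.standard_error:.1f} minutes")
```

Note that method='percentile' is chosen here to match the approach in this chapter; SciPy's default, method='BCa', applies a bias-and-skewness correction that is often more accurate in practice.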

Maya's Report to the Hospital Quality Committee

Here's the paragraph Maya writes for her committee report:

Emergency Department Wait Time Analysis

We analyzed wait times for 50 consecutive patients during a standard weekday shift. Due to the strongly right-skewed nature of ED wait times — where a small number of complex cases create very long waits — we report the median wait time rather than the mean, as the median better represents the typical patient experience.

The observed median wait time was approximately 28 minutes. Using bootstrap resampling (10,000 iterations), we constructed a 95% confidence interval for the population median wait time: approximately (18, 37) minutes. This means we are 95% confident that this range captures the true median wait time, i.e., the wait experienced by the typical patient.

For context, the mean wait time was 47 minutes — substantially higher than the median due to several waits exceeding 100 minutes. A confidence interval for the mean is misleading here because it answers the wrong question: patients want to know their likely wait, not the average of all waits including rare extreme cases.

Recommendation: Report median wait times in patient-facing communications. Use the mean only for internal resource planning (where the total burden, including rare long waits, is relevant).

Key Takeaways from This Case Study

  1. The right statistic matters. The mean and median tell different stories for skewed data. Maya chose the median because it better represents the typical patient experience — and the bootstrap made inference possible for that statistic.

  2. The bootstrap handles any statistic. Maya computed CIs for the median, the 75th percentile, and the IQR — three statistics with no standard formula-based CI. The same np.random.choice() approach worked for all three.

  3. The bootstrap CI for the mean closely matches the t-interval. This is exactly what we'd expect. When both methods are applicable, they should agree — and the agreement provides mutual validation.

  4. Communicating results matters. The CI is only useful if stakeholders understand it. Maya translated the statistical results into actionable language for her committee. This connects directly to the communication skills we'll develop further in Chapter 25.

Questions for Discussion

  1. If Maya had only 12 patients instead of 50, would you still trust the bootstrap CI for the median? Why or why not?

  2. Maya's data include a few waits over 100 minutes. How would the bootstrap CI for the mean compare to the $t$-interval if there were even more extreme outliers (say, a 5-hour wait)? Which would be more affected?

  3. The hospital tracks wait times continuously. If Maya used the last 500 patients' wait times, would the bootstrap CI be narrower or wider than with $n = 50$? Would the bootstrap still be necessary, or could the CLT handle the mean at that sample size?

  4. An administrator asks: "What's the 95th percentile wait time? I want to know the longest wait we should plan for." Could Maya use the bootstrap for this? What concerns might arise when bootstrapping extreme percentiles?