Case Study 2: Sam's Opponent-by-Opponent Scoring Analysis
The Setup
Sam Okafor has a coaching question he wants to answer with data.
The Riverside Raptors' coaching staff has a hunch: Daria Williams plays differently depending on the opponent. Against some teams, she dominates. Against others, she struggles. But is this real — or is the coaching staff just remembering the highs and lows while forgetting the average games?
This is exactly the kind of question where human intuition can be misleading. People are pattern-seeking creatures, and we're notoriously bad at distinguishing real patterns from random noise (a theme that's been running through this textbook since Chapter 1). The coaches might be right. But Sam wants to test it.
He has Daria's scoring data from every conference game this season — six games against each of the five conference opponents. The question: does Daria's scoring average depend on the opponent?
The Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd
np.random.seed(2026)
# ============================================================
# SAM'S OPPONENT ANALYSIS — COMPLETE ANOVA
# ============================================================
# Daria's points per game against each conference opponent
# 6 games per opponent, 30 games total
hawks = [22, 18, 25, 20, 24, 19] # Solid perimeter defense
wolves = [28, 32, 25, 30, 27, 34] # Fast-paced, transition-heavy
bears = [15, 18, 12, 20, 16, 17] # Physical interior defense
eagles = [26, 22, 28, 24, 30, 26] # Average defense
tigers = [20, 23, 19, 22, 18, 24] # Aggressive full-court press
opponents = {
    'Hawks': hawks,
    'Wolves': wolves,
    'Bears': bears,
    'Eagles': eagles,
    'Tigers': tigers
}
print("=" * 65)
print("SAM'S ANALYSIS: DARIA'S SCORING BY OPPONENT")
print("=" * 65)
# Descriptive statistics
print(f"\n{'Opponent':<12} {'Games':>6} {'Mean':>8} {'SD':>8} "
      f"{'Min':>6} {'Max':>6} {'Range':>7}")
print("-" * 55)
for name, data in opponents.items():
    d = np.array(data)
    print(f"{name:<12} {len(d):>6} {d.mean():>8.1f} {d.std(ddof=1):>8.2f} "
          f"{d.min():>6} {d.max():>6} {d.max()-d.min():>7}")
all_scores = np.concatenate(list(opponents.values()))
grand_mean = np.mean(all_scores)
print(f"\n{'Overall':<12} {len(all_scores):>6} {grand_mean:>8.1f} "
      f"{np.std(all_scores, ddof=1):>8.2f}")
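The same descriptive table can also be produced with a pandas groupby; the long one-row-per-game format built here is the same shape that statsmodels' Tukey routine expects later. A standalone sketch (re-declaring the data so it runs on its own):

```python
import pandas as pd

# Same data as above, re-declared so this sketch runs standalone
opponents = {
    'Hawks':  [22, 18, 25, 20, 24, 19],
    'Wolves': [28, 32, 25, 30, 27, 34],
    'Bears':  [15, 18, 12, 20, 16, 17],
    'Eagles': [26, 22, 28, 24, 30, 26],
    'Tigers': [20, 23, 19, 22, 18, 24],
}

# Long format: one row per game, one column for the group label
long_df = pd.DataFrame(
    [(name, pts) for name, games in opponents.items() for pts in games],
    columns=['opponent', 'points'],
)

# One groupby-aggregate replaces the manual loop over opponents
summary = long_df.groupby('opponent')['points'].agg(
    ['count', 'mean', 'std', 'min', 'max'])
print(summary.round(2))
```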
The Multiple t-Test Temptation
Before running ANOVA, Sam considers the naive approach: run a t-test for every pair of opponents.
# ---- What happens with multiple t-tests? ----
print("\n" + "=" * 65)
print("THE TEMPTATION: PAIRWISE t-TESTS (NO CORRECTION)")
print("=" * 65)
opponent_names = list(opponents.keys())
k = len(opponent_names)
n_comparisons = k * (k - 1) // 2
print(f"\n Number of groups: {k}")
print(f" Number of pairwise comparisons: {n_comparisons}")
print(f" P(at least one false positive): "
      f"{1 - 0.95**n_comparisons:.3f} = {(1 - 0.95**n_comparisons)*100:.1f}%")
print(f"\n{'Comparison':<25} {'t-stat':>8} {'p-value':>10} {'Sig?':>6}")
print("-" * 53)
sig_count = 0
for i in range(k):
    for j in range(i+1, k):
        t, p = stats.ttest_ind(opponents[opponent_names[i]],
                               opponents[opponent_names[j]])
        sig = "Yes" if p < 0.05 else "No"
        if p < 0.05:
            sig_count += 1
        print(f"{opponent_names[i] + ' vs ' + opponent_names[j]:<25} "
              f"{t:>8.3f} {p:>10.4f} {sig:>6}")
print(f"\n Significant at α = 0.05: {sig_count} of {n_comparisons}")
print(f"\n ⚠ WARNING: These p-values are NOT corrected for multiple")
print(f" comparisons. The effective α across all tests is")
print(f" approximately {1 - 0.95**n_comparisons:.3f}, not 0.05.")
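The 40% figure assumes the ten tests are independent, which they are not (each pair of tests shares a group), so a simulation is a useful sanity check. This sketch uses illustrative parameters (2,000 simulated seasons, normal scoring with mean 22.8 and SD 3): it draws five groups from the same distribution, so the null hypothesis is true by construction, and counts how often at least one uncorrected pairwise t-test comes out "significant".

```python
import numpy as np
from scipy import stats
from itertools import combinations

rng = np.random.default_rng(0)
n_sims, k, n = 2000, 5, 6        # illustrative simulation settings
false_alarms = 0
for _ in range(n_sims):
    # Five groups from the SAME distribution: any "significant" pair is a false positive
    groups = [rng.normal(loc=22.8, scale=3.0, size=n) for _ in range(k)]
    pvals = [stats.ttest_ind(a, b).pvalue for a, b in combinations(groups, 2)]
    if min(pvals) < 0.05:
        false_alarms += 1
rate = false_alarms / n_sims
print(f"Empirical familywise error rate: {rate:.3f} (nominal alpha was 0.05)")
```

The empirical rate lands well above 0.05, though somewhat below the independence-based 40%, because tests that share a group are positively correlated.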
The Right Way: One-Way ANOVA
# ---- One-Way ANOVA ----
print("\n" + "=" * 65)
print("THE RIGHT WAY: ONE-WAY ANOVA")
print("=" * 65)
# Assumptions
print("\nAssumption Checks:")
# Normality
print(" Normality (Shapiro-Wilk):")
for name, data in opponents.items():
    stat, p = stats.shapiro(data)
    print(f" {name:<10} W = {stat:.4f}, p = {p:.4f} "
          f"{'✓' if p > 0.05 else '⚠'}")
print(" Note: With n=6 per group, Shapiro-Wilk has little power, so a")
print(" non-significant result is only weak evidence of normality.")
print(" Visual inspection shows no extreme skewness or outliers.")
# Equal variances
stat, p_lev = stats.levene(*opponents.values())
sds = [np.std(d, ddof=1) for d in opponents.values()]
print(f"\n Levene's test: F = {stat:.3f}, p = {p_lev:.4f}")
print(f" SD ratio: {max(sds)/min(sds):.2f} "
      f"({'✓ < 2' if max(sds)/min(sds) < 2 else '⚠ > 2'})")
# ANOVA
F_stat, p_value = stats.f_oneway(*opponents.values())
# Manual calculations for ANOVA table
k = len(opponents)
N = len(all_scores)
ss_between = sum(len(d) * (np.mean(d) - grand_mean)**2
                 for d in opponents.values())
ss_within = sum(np.sum((np.array(d) - np.mean(d))**2)
                for d in opponents.values())
ss_total = ss_between + ss_within
df_b = k - 1
df_w = N - k
ms_b = ss_between / df_b
ms_w = ss_within / df_w
F_calc = ms_b / ms_w
eta_sq = ss_between / ss_total
print(f"\nANOVA Table:")
print(f"{'Source':<12} {'SS':>10} {'df':>5} {'MS':>10} {'F':>8} {'p':>10}")
print("-" * 58)
print(f"{'Between':<12} {ss_between:>10.1f} {df_b:>5} {ms_b:>10.2f} "
      f"{F_calc:>8.2f} {p_value:>10.6f}")
print(f"{'Within':<12} {ss_within:>10.1f} {df_w:>5} {ms_w:>10.2f}")
print(f"{'Total':<12} {ss_total:>10.1f} {N-1:>5}")
print(f"\n F({df_b}, {df_w}) = {F_calc:.2f}")
print(f" p = {p_value:.6f}")
print(f" η² = {eta_sq:.3f}")
print(f" Interpretation: {'Small' if eta_sq < 0.06 else 'Medium' if eta_sq < 0.14 else 'Large'} effect")
print(f" {eta_sq*100:.1f}% of scoring variability explained by opponent")
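The hand-built table is worth cross-checking against scipy, and with only 6 games per group it is also worth reporting omega-squared, which corrects eta-squared's known upward bias in small samples (the omega-squared formula below is the standard one-way version, added here rather than taken from the chapter). A standalone sketch:

```python
import numpy as np
from scipy import stats

# Same data as above, re-declared so this sketch runs standalone
groups = [
    [22, 18, 25, 20, 24, 19],  # Hawks
    [28, 32, 25, 30, 27, 34],  # Wolves
    [15, 18, 12, 20, 16, 17],  # Bears
    [26, 22, 28, 24, 30, 26],  # Eagles
    [20, 23, 19, 22, 18, 24],  # Tigers
]
k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.mean(np.concatenate(groups))

ss_b = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
ss_w = sum(np.sum((np.array(g) - np.mean(g)) ** 2) for g in groups)
df_b, df_w = k - 1, N - k
ms_w = ss_w / df_w
F_manual = (ss_b / df_b) / ms_w
p_manual = stats.f.sf(F_manual, df_b, df_w)   # right tail of F(df_b, df_w)

# scipy should reproduce the manual result
F_scipy, p_scipy = stats.f_oneway(*groups)
assert np.isclose(F_manual, F_scipy) and np.isclose(p_manual, p_scipy)

eta_sq = ss_b / (ss_b + ss_w)
omega_sq = (ss_b - df_b * ms_w) / (ss_b + ss_w + ms_w)  # bias-corrected effect size
print(f"F = {F_manual:.2f}, p = {p_manual:.6f}")
print(f"eta^2 = {eta_sq:.3f}, omega^2 = {omega_sq:.3f}")
```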
Post-Hoc: Which Opponents Matter?
# ---- Tukey's HSD ----
print("\n" + "=" * 65)
print("POST-HOC: TUKEY'S HSD")
print("=" * 65)
data_all = np.concatenate(list(opponents.values()))
groups_all = []
for name, data in opponents.items():
    groups_all.extend([name] * len(data))
tukey = pairwise_tukeyhsd(endog=data_all, groups=groups_all, alpha=0.05)
print(tukey)
# Organize results for coaching report
print("\n\nScouting Summary:")
print("-" * 50)
print("Daria's scoring, ranked by opponent:")
ranked = sorted(opponents.items(), key=lambda x: np.mean(x[1]), reverse=True)
for i, (name, data) in enumerate(ranked, 1):
    print(f" {i}. {name:<10} {np.mean(data):>5.1f} ppg")
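Tukey's HSD is the standard choice for all-pairs comparisons, but the Bonferroni correction is the easiest alternative to do by hand: divide α by the number of comparisons and apply the ordinary t-tests at that stricter threshold. A standalone sketch on the same data:

```python
from itertools import combinations
from scipy import stats

# Same data as above, re-declared so this sketch runs standalone
opponents = {
    'Hawks':  [22, 18, 25, 20, 24, 19],
    'Wolves': [28, 32, 25, 30, 27, 34],
    'Bears':  [15, 18, 12, 20, 16, 17],
    'Eagles': [26, 22, 28, 24, 30, 26],
    'Tigers': [20, 23, 19, 22, 18, 24],
}

pairs = list(combinations(opponents, 2))
alpha_adj = 0.05 / len(pairs)          # 0.05 / 10 = 0.005 per comparison
print(f"Bonferroni-adjusted alpha: {alpha_adj:.4f}\n")
for a, b in pairs:
    t, p = stats.ttest_ind(opponents[a], opponents[b])
    verdict = "significant" if p < alpha_adj else "n.s."
    print(f"{a:>6} vs {b:<6}  t = {t:6.2f}  p = {p:.4f}  {verdict}")
```

Bonferroni controls the family-wise error rate but is more conservative than Tukey's HSD for all-pairs comparisons, so it can miss borderline differences that Tukey flags.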
Sam's Coaching Report
# ---- Visualization for the coaching staff ----
fig, axes = plt.subplots(1, 2, figsize=(14, 6))
# Box plot
ax1 = axes[0]
data_list = [opponents[name] for name in
             ['Wolves', 'Eagles', 'Hawks', 'Tigers', 'Bears']]
labels = ['Wolves', 'Eagles', 'Hawks', 'Tigers', 'Bears']
bp = ax1.boxplot(data_list, labels=labels, patch_artist=True)
colors = ['#2ecc71', '#3498db', '#e74c3c', '#f39c12', '#9b59b6']
for patch, color in zip(bp['boxes'], colors):
    patch.set_facecolor(color)
    patch.set_alpha(0.7)
ax1.set_ylabel("Points Per Game")
ax1.set_title("Daria's Scoring by Opponent (Ranked)")
ax1.axhline(y=grand_mean, color='gray', linestyle='--', alpha=0.5,
            label=f'Season Avg = {grand_mean:.1f}')
ax1.legend()
# Bar chart with error bars
ax2 = axes[1]
means = [np.mean(opponents[name]) for name in labels]
sds = [np.std(opponents[name], ddof=1) for name in labels]
ses = [s / np.sqrt(6) for s in sds]
t_crit = stats.t.ppf(0.975, 5)  # t critical value, not 1.96: only n-1 = 5 df per group
bars = ax2.bar(labels, means, yerr=[t_crit * se for se in ses],
               color=colors, alpha=0.7, capsize=5)
ax2.set_ylabel("Mean Points Per Game (±95% CI)")
ax2.set_title("Mean Scoring with Confidence Intervals")
ax2.axhline(y=grand_mean, color='gray', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()
Sam drafts the following memo for the coaching staff:
Scouting Report: Daria Williams — Opponent-Specific Performance
Bottom Line: Daria's scoring average varies significantly by opponent ($F(4, 25) = 18.78$, $p < 0.001$). This is not just a coaching hunch — the data confirm it. Opponent identity explains approximately 75% of the game-to-game variability in her scoring.
Best Matchups:
- vs. Wolves: 29.3 ppg (fast pace = transition opportunities)
- vs. Eagles: 26.0 ppg (average defense, exploitable mismatches)

Worst Matchups:
- vs. Bears: 16.3 ppg (physical interior defense neutralizes her inside game)
- vs. Tigers: 21.0 ppg (full-court press disrupts rhythm)
Statistically Significant Gaps (Tukey HSD): The biggest gap is Wolves vs. Bears: 13.0 points per game. Daria scores nearly twice as much against the Wolves as the Bears. The Hawks and Tigers produce similar scoring (not significantly different from each other), forming a "middle tier."
Recommendation: Against the Bears, the team should design more plays featuring Daria on the perimeter, where the Bears' interior defense is less effective. Against the Tigers, focus on press-breaking sets to get Daria the ball in her preferred spots. Against the Wolves and Eagles, the current offensive scheme is working — let Daria play her natural game.
The Statistical Literacy Lesson
There's a deeper lesson here that goes beyond basketball.
The coaching staff's original hunch — "Daria plays differently against different opponents" — is exactly the kind of pattern that human intuition often gets wrong. We remember the blowouts and forget the average games. We see patterns in randomness. We construct narratives after the fact.
What makes Sam's analysis different is that it controls for this tendency:
- ANOVA tested the overall pattern in a single test, avoiding the inflated false positive rate that would come from multiple t-tests. With 5 opponents and 10 pairwise comparisons, the naive approach would have given a 40% chance of "finding" a significant difference even if Daria scored the same against everyone.
- Tukey's HSD identified specific matchups while controlling the family-wise error rate. The coaching staff can trust that the Wolves vs. Bears difference isn't a statistical mirage.
- Eta-squared quantified the effect. Knowing that opponent explains 75% of scoring variability is genuinely useful for game planning — it tells the coaches that opponent matchup is a major factor, not just noise.
This is statistics as a superpower. Not replacing the coaches' intuition — testing it, quantifying it, and protecting it from the multiple comparisons problem that bedevils unaided human pattern recognition.
A Caution About Small Samples
Sam includes a methodological note:
Limitation: With only 6 games per opponent, these results should be interpreted cautiously. Small samples make the normality assumption harder to verify and reduce the power to detect real but modest differences. Additionally, game outcomes are influenced by many factors beyond the opponent (home vs. away, injuries, rest days, etc.) that this analysis doesn't control for. A more complete model might use multiple regression (Chapter 23) to account for these confounders.
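One way to act on this caution is a nonparametric cross-check: the Kruskal-Wallis test compares the groups by ranks and does not assume normality. If it agrees with the ANOVA, the conclusion is unlikely to be an artifact of the normality assumption. A standalone sketch on the same data:

```python
from scipy import stats

# Same data as above, re-declared so this sketch runs standalone
groups = [
    [22, 18, 25, 20, 24, 19],  # Hawks
    [28, 32, 25, 30, 27, 34],  # Wolves
    [15, 18, 12, 20, 16, 17],  # Bears
    [26, 22, 28, 24, 30, 26],  # Eagles
    [20, 23, 19, 22, 18, 24],  # Tigers
]

# Kruskal-Wallis works on ranks, so non-normal scoring distributions are fine
H, p = stats.kruskal(*groups)
print(f"Kruskal-Wallis: H = {H:.2f}, p = {p:.6f}")
```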
Discussion Questions
- Sam used $n = 6$ games per opponent. If next season provides $n = 12$ games per opponent, how might the results change? Consider both statistical power and the stability of the effect size estimates.
- The coaching staff asks Sam: "If Daria's scoring doesn't depend on the opponent ($H_0$ is true), what would the ANOVA table look like?" Describe what you'd expect for $F$, $p$, and $\eta^2$.
- One coach argues: "We should just look at the means and pick the best matchups." Using what you learned about the multiple comparisons problem, explain why this informal approach can lead to incorrect conclusions.
- Sam's analysis treats each game as independent. But games within a season might not be independent — Daria might improve as the season progresses, or she might have hot and cold stretches. How would this violate the ANOVA assumptions, and what alternative analysis could address it?
- If the Raptors play a non-conference opponent next week, can Sam's ANOVA results help predict Daria's scoring? Why or why not? (Hint: think about the scope of inference.)