Case Study: Probability in Sports — From Fantasy Leagues to the Vegas Line

The Setup

Sam Okafor's internship with the Riverside Raptors has given him front-row access to the world of sports analytics. But one evening, scrolling through his phone, he notices something: the same probability concepts he's been studying in his statistics class are powering an entire industry.

Sports betting in the U.S. became a $120 billion industry after the Supreme Court's 2018 ruling in Murphy v. NCAA allowed states to legalize it. Fantasy sports platforms like DraftKings and FanDuel — which frame themselves as "games of skill" rather than gambling — generate billions more. And beneath all of it — every line, every spread, every over/under — is probability.

Sam decides to investigate. What he finds is a masterclass in every concept from Chapter 8.

How Betting Odds Are Really Probabilities

When you see a sports betting line, you're looking at implied probabilities — even if they don't look like it.

American Odds to Probability

A sportsbook offers the following odds on a basketball game:

Riverside Raptors: -150 (favorites)
Central City Comets: +130 (underdogs)

What does this mean?

Negative odds (-150): You must bet $150 to win $100. This implies the team is more likely to win.

Positive odds (+130): A $100 bet wins $130. This implies the team is less likely to win.

To convert to probabilities:

$$P(\text{favorite}) = \frac{|\text{negative odds}|}{|\text{negative odds}| + 100} = \frac{150}{150 + 100} = \frac{150}{250} = 0.60$$

$$P(\text{underdog}) = \frac{100}{|\text{positive odds}| + 100} = \frac{100}{130 + 100} = \frac{100}{230} = 0.435$$

Wait — 0.60 + 0.435 = 1.035. That's more than 1!

This "extra" 3.5% is the vigorish (or "vig"), the sportsbook's built-in profit margin. It's their version of the house edge. If the money wagered is balanced across both sides, the sportsbook doesn't care who wins; it profits from that gap either way.

Sam runs the numbers:

def american_to_probability(odds):
    """Convert American odds to implied probability."""
    if odds < 0:
        return abs(odds) / (abs(odds) + 100)
    else:
        return 100 / (odds + 100)

# Raptors game
raptors_odds = -150
comets_odds = 130

p_raptors = american_to_probability(raptors_odds)
p_comets = american_to_probability(comets_odds)

print(f"Raptors (-150): Implied P(win) = {p_raptors:.4f}")
print(f"Comets  (+130): Implied P(win) = {p_comets:.4f}")
print(f"Sum: {p_raptors + p_comets:.4f}")
print(f"Vigorish (overround): {(p_raptors + p_comets - 1) * 100:.1f}%")

# Adjust to "true" probabilities by removing the vig
total = p_raptors + p_comets
true_p_raptors = p_raptors / total
true_p_comets = p_comets / total
print(f"\nVig-adjusted probabilities:")
print(f"Raptors: {true_p_raptors:.4f}")
print(f"Comets:  {true_p_comets:.4f}")
print(f"Sum: {true_p_raptors + true_p_comets:.4f}")

After removing the vig, the "true" implied probabilities are approximately 0.580 for the Raptors and 0.420 for the Comets. These sum to 1, as probabilities should.

Connection to Section 8.4: The probabilities of all outcomes MUST sum to 1 (Rule 2). When they sum to more than 1, someone is skimming a profit off the top. The vig is essentially a tax on uncertainty — you're paying the sportsbook for the privilege of betting.
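To see what the line would look like without the vig, the vig-adjusted probabilities can be converted back into American odds. The following is a minimal sketch, not a sportsbook's actual pricing method: `probability_to_american` is a hypothetical helper that simply inverts the two conversion formulas above, and the proportional vig removal is the same simplification used in Sam's code.

```python
def american_to_probability(odds):
    """Convert American odds to implied probability (same formulas as above)."""
    if odds < 0:
        return abs(odds) / (abs(odds) + 100)
    return 100 / (odds + 100)


def probability_to_american(p):
    """Hypothetical helper: invert the conversion for a probability 0 < p < 1."""
    if p >= 0.5:
        # Favorite: negative odds, the stake needed to win $100
        return -100 * p / (1 - p)
    # Underdog: positive odds, the winnings on a $100 stake
    return 100 * (1 - p) / p


# Remove the vig proportionally, then convert back to odds
p_raptors = american_to_probability(-150)
p_comets = american_to_probability(130)
total = p_raptors + p_comets

fair_raptors = probability_to_american(p_raptors / total)
fair_comets = probability_to_american(p_comets / total)

print(f"Fair line, Raptors: {fair_raptors:+.0f}")
print(f"Fair line, Comets:  {fair_comets:+.0f}")
```

Under these assumptions the vig-free line works out to about -138 / +138. The posted -150 asks Raptors backers to risk more, and the posted +130 pays Comets backers less, than that fair line would.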

The Law of Large Numbers in Action: Why the House Always Wins

Sam asks his supervisor, an analytics veteran: "If the odds correctly reflect the true probabilities, doesn't the sportsbook have a real chance of losing on each bet?"

His supervisor smiles. "On each bet, sure. But the sportsbook doesn't make one bet. They make millions."

This is the law of large numbers (Section 8.3) applied to business. On any single game, the sportsbook might lose. But over a very large number of bets, its built-in edge, the vigorish, makes a long-run profit all but certain.

Let's simulate it:

import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

def simulate_sportsbook(n_bets, vig=0.035, true_prob=0.55):
    """Simulate a sportsbook's cumulative profit over n_bets.

    Each bet: bettor wagers $100.
    If bettor wins: sportsbook returns the stake and pays the winnings
    (the fair payout, reduced by the vig), so it loses the payout.
    If bettor loses: sportsbook keeps the $100.
    """
    bet_amount = 100
    # Fair winnings on a $100 stake would be (1 - true_prob) / true_prob * 100.
    # The book pays slightly less. Using vig/2 is a simplification that
    # splits the total vig across the two sides of the market.
    fair_payout = (1 - true_prob) / true_prob * bet_amount
    actual_payout = fair_payout * (1 - vig / 2)

    cumulative_profit = np.zeros(n_bets)
    profit = 0.0

    for i in range(n_bets):
        # The bettor wins with the true probability
        if np.random.random() < true_prob:
            # Bettor wins: the stake goes back to the bettor, so the
            # sportsbook's net result on this bet is minus the payout
            profit -= actual_payout
        else:
            # Bettor loses: sportsbook keeps the wager
            profit += bet_amount
        cumulative_profit[i] = profit

    return cumulative_profit

# The edge is under $1 per $100 bet, so use enough bets for it to show
n_bets = 100_000
cumulative = simulate_sportsbook(n_bets)

plt.figure(figsize=(10, 5))
plt.plot(range(1, n_bets + 1), cumulative, color='green', linewidth=0.8)
plt.axhline(y=0, color='red', linestyle='--', linewidth=1)
plt.xlabel('Number of Bets', fontsize=12)
plt.ylabel('Cumulative Sportsbook Profit ($)', fontsize=12)
plt.title('Law of Large Numbers: Sportsbook Profit Over Time', fontsize=14)
plt.tight_layout()
plt.show()

print(f"After {n_bets:,} bets:")
print(f"  Total profit: ${cumulative[-1]:,.2f}")
print(f"  Average profit per bet: ${cumulative[-1]/n_bets:.2f}")

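The sportsbook's profit line can dip below zero, sometimes for long stretches, because the edge on each bet is small compared with the swing on any single result. But as the number of bets grows, the law of large numbers takes over: the average profit per bet settles near its small positive expected value, and the cumulative profit trends upward. The house wins in the long run not because of luck, but because of math.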
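The size of the house's edge can be worked out directly and checked against a long simulation. This sketch assumes a simplified model: a $100 stake, a bettor who wins with probability 0.55, winnings equal to the fair payout reduced by half the vig, and the stake returned to a winning bettor. The function names `house_expected_profit` and `average_profit` are illustrative.

```python
import random


def payout_with_vig(bet=100, true_prob=0.55, vig=0.035):
    """Winnings paid to a winning bettor: fair payout shaved by vig/2 (assumption)."""
    fair_payout = (1 - true_prob) / true_prob * bet
    return fair_payout * (1 - vig / 2)


def house_expected_profit(bet=100, true_prob=0.55, vig=0.035):
    """Expected sportsbook profit per bet under the simplified model."""
    # House keeps the stake when the bettor loses, pays the winnings otherwise
    return (1 - true_prob) * bet - true_prob * payout_with_vig(bet, true_prob, vig)


def average_profit(n_bets, rng, bet=100, true_prob=0.55, vig=0.035):
    """Simulated average sportsbook profit per bet over n_bets."""
    payout = payout_with_vig(bet, true_prob, vig)
    total = 0.0
    for _ in range(n_bets):
        if rng.random() < true_prob:
            total -= payout   # bettor wins
        else:
            total += bet      # bettor loses
    return total / n_bets


rng = random.Random(0)
expected = house_expected_profit()
print(f"Expected profit per $100 bet: ${expected:.2f}")
for n in (100, 10_000, 1_000_000):
    print(f"Average over {n:>9,} bets: ${average_profit(n, rng):.2f}")
```

Under these assumptions the expected profit is about $0.79 per $100 bet. Averages over a small number of bets can land far from that figure, or below zero; it is only over very long runs that they settle near it.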

Fantasy Sports: Probability as Skill

Sam's roommate plays daily fantasy sports (DFS) and insists it's "pure skill." Sam decides to analyze this claim through the lens of probability.

In DFS, you draft a virtual team of real players and earn points based on their actual performance. To win, your team needs to outscore other contestants' teams.

Building a Lineup with Probability

Suppose Sam is drafting for a fantasy basketball contest. He needs to decide between two players for his lineup:

	Average points per game	Games this season	Standard deviation
Player A	22.5	60	8.3
Player B	21.8	45	4.1

Sam applies the relative frequency approach. Based on season-long data:

$$P(\text{Player A scores 25+}) \approx \frac{\text{Games with 25+ points}}{\text{Total games}} = \frac{24}{60} = 0.40$$

$$P(\text{Player B scores 25+}) \approx \frac{\text{Games with 25+ points}}{\text{Total games}} = \frac{16}{45} = 0.356$$

Player A has a higher probability of a big game — but also more volatility (higher standard deviation, as Sam learned in Chapter 6). Player B is more consistent but has a lower ceiling.

Connection to Chapter 6: The standard deviation is the "typical distance from the mean" — Sam's threshold concept from Chapter 6. Here it captures how much a player's performance varies game to game. In fantasy sports, you sometimes want high-variance (boom-or-bust) players in tournaments (where you need to beat thousands of opponents) and low-variance players in head-to-head matchups (where consistency wins). Standard deviation isn't just a number — it's a strategic variable.

Independence and Correlation in Sports

Sam's supervisor points out a subtlety: player performances aren't always independent.

If Sam drafts a quarterback AND his favorite receiver, their fantasy scores are positively correlated — when the QB throws a touchdown, the receiver scores too. This violates the independence assumption of the multiplication rule.
If two players are on opposing teams and the game goes to overtime, both might have higher-than-normal stats. Their performances are linked through the game script.

This means Sam can't simply use $P(A \text{ and } B) = P(A) \times P(B)$ to estimate the probability of both players having big games. He'd need to account for the correlation — a topic that connects to Chapter 22 (correlation and regression).

For now, the key lesson is: the multiplication rule for independent events requires actual independence. In real data, always ask: "Are these events truly independent?"
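A quick simulation shows how far the multiplication rule can miss when events are linked. This is a sketch with made-up numbers, not estimates from real data: it assumes the quarterback has a big game 40% of the time, and that the receiver's chance of a big game is 60% when the quarterback has one and 20% when he doesn't.

```python
import random

rng = random.Random(7)
n_games = 100_000

qb_big = 0     # games where the quarterback has a big game
wr_big = 0     # games where the receiver has a big game
both_big = 0   # games where both do

for _ in range(n_games):
    qb = rng.random() < 0.40   # hypothetical P(QB big game)
    # Hypothetical link: the receiver's chances depend on the quarterback's day
    wr = rng.random() < (0.60 if qb else 0.20)
    qb_big += qb
    wr_big += wr
    both_big += qb and wr

p_qb = qb_big / n_games
p_wr = wr_big / n_games
p_both = both_big / n_games

print(f"P(QB big)               = {p_qb:.3f}")
print(f"P(WR big)               = {p_wr:.3f}")
print(f"P(QB big) x P(WR big)   = {p_qb * p_wr:.3f}")
print(f"P(both big), simulated  = {p_both:.3f}")
```

With these inputs the true joint probability is 0.40 × 0.60 = 0.24, while the product of the marginals is 0.40 × 0.36 = 0.144. Treating the two players as independent would understate the chance of a double big game by a wide margin.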

The Gambler's Fallacy in the Wild

Sam visits a sports bar during the NBA playoffs. A fan at the next table groans: "LeBron has missed his last four three-pointers. He's due to make one!"

Sam recognizes the gambler's fallacy (Section 8.3) in real time. Each shot is not a perfectly independent event — fatigue, defensive pressure, and game flow all matter — but the logic of "he's due" is still flawed. A player doesn't become more likely to make a shot simply because he's missed several in a row.

In fact, extensive research on the "hot hand" in basketball has produced a nuanced conclusion:

The original 1985 study by Gilovich, Vallone, and Tversky found NO evidence of the hot hand — players were not more likely to make a shot after making the previous one. The perceived "hot hand" appeared to be the gambler's fallacy in reverse (people seeing patterns in randomness).
Later research (Miller and Sanjurjo, 2018) identified a subtle statistical bias in the original study and found weak evidence that the hot hand might exist, but the effect is much smaller than people perceive.

The lesson: even if a small hot-hand effect exists, the perception of hot streaks vastly exceeds reality. Our brains are pattern-seekers — they find streaks even in purely random sequences.
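The "he's due" claim can be checked against a simulated shooter whose shots really are independent. This sketch assumes a hypothetical 35% three-point shooter (the figure is illustrative, not any player's actual percentage) and compares the make rate on all shots with the make rate right after four straight misses.

```python
import random

rng = random.Random(2024)
p_make = 0.35       # hypothetical shooting percentage
n_shots = 500_000

# Each shot is an independent make/miss with the same probability
shots = [rng.random() < p_make for _ in range(n_shots)]

attempts_after_streak = 0   # shots taken right after four or more straight misses
makes_after_streak = 0
miss_streak = 0             # current run of consecutive misses

for made in shots:
    if miss_streak >= 4:
        attempts_after_streak += 1
        makes_after_streak += made
    miss_streak = 0 if made else miss_streak + 1

overall_rate = sum(shots) / n_shots
rate_after_misses = makes_after_streak / attempts_after_streak

print(f"Overall make rate:           {overall_rate:.3f}")
print(f"Make rate after four misses: {rate_after_misses:.3f}")
```

Because the simulated shots are independent, the two rates agree up to random noise: four misses in a row do nothing to make the next shot more likely to go in.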

Contingency Tables in Sports Analytics

Sam creates a contingency table to analyze the Raptors' performance by game location and result:

	Win	Loss	Total
Home	28	13	41
Away	18	23	41
Total	46	36	82

Using the techniques from Section 8.7:

Marginal probability: $$P(\text{win}) = \frac{46}{82} = 0.561$$

Joint probability: $$P(\text{home and win}) = \frac{28}{82} = 0.341$$

Addition rule: $$P(\text{home or win}) = P(\text{home}) + P(\text{win}) - P(\text{home and win}) = \frac{41}{82} + \frac{46}{82} - \frac{28}{82} = \frac{59}{82} = 0.720$$

Independence check: If location and winning were independent: $$P(\text{home}) \times P(\text{win}) = 0.500 \times 0.561 = 0.280$$

But the actual joint probability is: $$P(\text{home and win}) = 0.341$$

Since $0.341 \neq 0.280$, location and winning are not independent in this season's data, which is consistent with a real home court advantage. The Raptors win a higher proportion of home games (28/41 = 0.683) than away games (18/41 = 0.439).

Connection to Chapter 9 (Preview): The values P(win | home) = 0.683 and P(win | away) = 0.439 are conditional probabilities. Each one conditions on the game location. Chapter 9 will formalize this idea, and it's exactly the kind of analysis that professional sports teams do every day.
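The same calculations can be scripted so they are easy to repeat for other splits. This is a minimal sketch using the counts from the table above; the dictionary layout and variable names are illustrative choices.

```python
# Counts from the Raptors' home/away table
table = {
    "Home": {"Win": 28, "Loss": 13},
    "Away": {"Win": 18, "Loss": 23},
}

total = sum(sum(row.values()) for row in table.values())
home_games = sum(table["Home"].values())
away_games = sum(table["Away"].values())
wins = sum(row["Win"] for row in table.values())

p_home = home_games / total                      # marginal
p_win = wins / total                             # marginal
p_home_and_win = table["Home"]["Win"] / total    # joint
p_home_or_win = p_home + p_win - p_home_and_win  # addition rule

# Independence check: compare the joint probability with the product
p_if_independent = p_home * p_win

# Win rates within each location (the conditional probabilities)
p_win_given_home = table["Home"]["Win"] / home_games
p_win_given_away = table["Away"]["Win"] / away_games

print(f"P(win)               = {p_win:.3f}")
print(f"P(home and win)      = {p_home_and_win:.3f}")
print(f"P(home or win)       = {p_home_or_win:.3f}")
print(f"P(home) x P(win)     = {p_if_independent:.3f}")
print(f"P(win | home)        = {p_win_given_home:.3f}")
print(f"P(win | away)        = {p_win_given_away:.3f}")
```

Swapping in a different table (for example, wins and losses split by rest days) only requires changing the counts at the top.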

What Sam Learned

After his deep dive into sports probability, Sam draws several conclusions:

Probability is everywhere in sports — not just in betting, but in team strategy, player evaluation, game planning, and fan behavior.
The law of large numbers is the sportsbook's business model. They don't need to predict individual games correctly. They need the vig and volume.
Independence matters. Whether it's stacking a quarterback with his receiver in fantasy sports or assuming shot attempts are independent, the multiplication rule only works when the assumption holds.
The gambler's fallacy is persistent. Even in a setting where people supposedly understand randomness (sports), the "due" fallacy thrives.
Contingency tables reveal patterns. Home court advantage, clutch performance, matchup data — all of it can be analyzed with the two-way tables from Section 8.7.

Discussion Questions

A sportsbook offers odds of -200 on Team A and +170 on Team B. Convert these to implied probabilities. What is the vigorish? Do you think these odds accurately reflect the true probabilities?
Why do sportsbooks adjust their odds as more bets come in? How does this relate to the idea that the sportsbook wants to guarantee a profit regardless of the game's outcome?
A fantasy sports player argues: "I've studied the matchups, so I have an edge over other players. This is skill, not gambling." Using the concepts of relative frequency and subjective probability, evaluate this claim. Can it be partially true?
Sam notices that some NBA players shoot better in the fourth quarter of close games ("clutch" performance). How would you use a contingency table to test whether "clutch" performance is real or just random variation? What would the table look like?
The sports betting industry is built on the assumption that most bettors will lose over time. Is this an ethical business model? How does the law of large numbers create an inherent information and structural advantage for the house?

The Takeaway

Sports analytics is probability in action. Every prediction, every line, every draft pick is a statement about how likely something is to happen. The rules from Chapter 8 — complement, addition, multiplication, independence, the law of large numbers — aren't abstract formulas. They're the operating system of a multibillion-dollar industry.

And here's the most important lesson Sam takes away: the value of probability isn't in predicting any single event correctly. It's in making consistently better decisions over many events. A 60% probability doesn't mean you'll win this bet. It means that if you make similar bets 1,000 times, you'll win about 600 of them, and whether you come out ahead depends on whether the payouts are worth it. That's the law of large numbers, and it's as true in sports as it is in medicine, business, and every other domain where uncertainty reigns.