Case Study 1 — Ranking Sports Teams When Everyone Plays a Different Schedule

DataField.Dev

Case Study 1 — Ranking Sports Teams When Everyone Plays a Different Schedule

Field: sports analytics and ranking systems. This case study takes the chapter's anchor — the dominant eigenvector of a stochastic matrix — out of the web and onto the playing field, where the same mathematics solves a problem that has caused real controversy: how do you rank teams fairly when they do not all play each other?

The problem: wins are not created equal

Imagine a league of teams that, over a season, each play only a fraction of the possible opponents. Some teams feast on a weak schedule and rack up wins; others lose a few games but only against the very best. At the end of the season you must produce a single ranking — to seed a tournament, to crown a champion, to decide who is left out. The naive answer, "rank by number of wins," is exactly the flawed link-counting of §29.1.1 in a new costume. A team with twelve wins against cupcakes may be weaker than a team with nine wins against a brutal slate, but win-counting cannot see the difference. It treats every victory as worth the same, ignoring whom you beat.

This is not a hypothetical annoyance. College sports in the United States, where hundreds of teams play wildly uneven schedules and a playoff includes only a handful, have fought for decades over how to rank fairly. Computer rankings — several of which are eigenvector methods of exactly the kind we are about to build — have been part of official selection systems, and the arguments they settle (and provoke) are real. The core mathematical insight is the chapter's: a win should be worth more when it comes against a strong opponent, and "strong" is itself defined recursively by who that opponent beat. That recursion is an eigenvector.

Step 1: turn a season into a directed graph

Model each team as a node. For every game, draw an arrow from the loser to the winner — the loser "endorses" the team that beat it, exactly as a web page endorses the pages it links to. A team accumulates incoming arrows by beating others, and its endorsements flow outward to the teams that beat it. This is a directed graph, and we can build its link matrix precisely as in §29.2: column $j$ (team $j$) distributes team $j$'s rank equally among the teams that defeated it, each getting $1/(\text{number of losses of } j)$.

Why point from loser to winner rather than the reverse? Because we want rank (importance, quality) to flow toward the teams that win, just as web rank flows toward the pages that get linked. A team's quality should be vouched for by the quality of the teams it beat — so the teams it beat should pass rank to it, which means arrows point from the beaten team to the victor. A team that loses to a very strong team is passing along a very valuable endorsement; a team that loses to a weak team passes along little. The recursion writes itself.

Consider a concrete four-team season. The "beaten-by" record — for each team, the list of teams that defeated it — is:

Team $A$ lost to $\{C, D\}$,
Team $B$ lost to $\{A, D\}$,
Team $C$ lost to $\{B, D\}$,
Team $D$ lost to $\{B\}$.

So team $D$ beat $A$, $B$, and $C$ (it lost only to $B$); team $A$ beat $B$; team $B$ beat $C$ and $D$; team $C$ beat $A$. Reading the "beaten-by" lists as outgoing endorsements, team $A$ endorses $C$ and $D$ (each $\tfrac12$), team $B$ endorses $A$ and $D$ (each $\tfrac12$), team $C$ endorses $B$ and $D$ (each $\tfrac12$), and team $D$ endorses $B$ (all of it, since $D$ lost only once).

Step 2: build the matrix and rank by the dominant eigenvector

Order the teams $A, B, C, D$. The link matrix, column by column (where does each team's endorsement go?), is

$$M = \begin{bmatrix} 0 & \tfrac12 & 0 & 0 \\[2pt] 0 & 0 & \tfrac12 & 1 \\[2pt] \tfrac12 & 0 & 0 & 0 \\[2pt] \tfrac12 & \tfrac12 & \tfrac12 & 0 \end{bmatrix}.$$

Every column sums to $1$ — no team in this league is undefeated (which would create a dangling node, the analog of a page with no outgoing links). We then form the Google matrix $G = 0.85\,M + \tfrac{0.15}{4}\mathbf{1}\mathbf{1}^{\mathsf{T}}$ exactly as in the chapter, damping to guarantee a unique positive ranking even if the season's results were oddly disconnected, and compute its dominant eigenvector by power iteration.

# Rank teams by the dominant eigenvector of the loss->win endorsement matrix.
import numpy as np
np.set_printoptions(precision=4, suppress=True)
teams = ['A', 'B', 'C', 'D']
beaten_by = {'A': ['C', 'D'], 'B': ['A', 'D'], 'C': ['B', 'D'], 'D': ['B']}
idx = {t: i for i, t in enumerate(teams)}
n = 4
M = np.zeros((n, n))
for loser, winners in beaten_by.items():
    j = idx[loser]
    for w in winners:
        M[idx[w], j] += 1.0 / len(winners)     # endorsement splits over losses
G = 0.85 * M + 0.15 / n * np.ones((n, n))
w, V = np.linalg.eig(G)
r = np.abs(V[:, np.argmax(w.real)].real); r /= r.sum()
for t in teams:
    print(f"{t}: rank {r[idx[t]]:.4f}  (wins this season: "
          f"{sum(t in v for v in beaten_by.values())})")
print("ranking:", [teams[i] for i in np.argsort(-r)])
# A: rank 0.1922  (wins: 1)
# B: rank 0.3640  (wins: 2)
# C: rank 0.1192  (wins: 1)
# D: rank 0.3246  (wins: 3)
# ranking: ['B', 'D', 'A', 'C']

The result is striking, and it is the whole point of the case study. Team $D$ won the most games (three) but does not rank first — team $B$ does, with only two wins. Win-counting would crown $D$; the eigenvector crowns $B$. Why? Because $B$'s two wins include a victory over $D$ itself, the team that beat everyone else. Beating the strongest team is worth a great deal of rank, and $B$ did it. Meanwhile $D$'s three wins came against $A$, $B$, and $C$ — but two of those ($A$ and $C$) are weak teams whose endorsements carry little rank. The recursion has done exactly what we asked: it weighted each win by the strength of the beaten opponent, and a single win over the kingpin outweighed a pile of wins over also-rans.

Step 3: why this is the same mathematics as PageRank

Strip away the sports vocabulary and the structure is identical to the web. Teams are pages; a loss is a link; "quality" is rank; and the ranking is the dominant eigenvector of a damped stochastic matrix, found by power iteration. Every conceptual move transfers: the column-stochastic normalization (a team's endorsement is split evenly over its losses, just as a page's rank splits over its links), the dangling-node problem (an undefeated team endorses no one — patch it), the damping factor (guaranteeing a unique ranking even from a strange or disconnected schedule), and the Perron-Frobenius guarantee that exactly one positive ranking exists. The reader who understands PageRank understands sports ranking for free; only the interpretation of the arrows changed.

This transferability is recurring theme #4 of the book — linear algebra is the most applied branch of pure mathematics; learn it once, use it everywhere. The dominant eigenvector that ranked the web ranks sports teams, and (in the next case study) ranks scientific papers, with no change to the underlying computation. The eigenvector does not know or care whether its matrix came from hyperlinks, game results, or citations. It computes self-consistent importance from a network of endorsements, full stop.

A caution, honestly stated

Real sports-ranking systems are more careful than our toy in ways worth naming, lest you over-trust the four-line script. They often weight games by margin of victory (a blowout endorses more than a squeaker) or by where the game was played (a road win is harder), and they may incorporate the recency of results. They also confront the genuine pathology of an undefeated team: a team that never lost endorses no one, a dangling node whose handling (damping, or excluding it from the flow) measurably changes the ranking — and over which fans will argue bitterly. And there is a deeper philosophical limit: any ranking method encodes a choice about what "better" means, and no eigenvector can settle a question that is partly about values rather than facts. What the linear algebra does provide is a principled, transparent, manipulation-resistant core: importance defined recursively and solved as a fixed point, immune to the "beat up on weak opponents" gaming that defeats naive win-counting. That core is the same dominant eigenvector you computed for the web, and it is why eigenvector methods keep reappearing wherever a fair ranking from uneven, networked competition is needed.