Case Study 1 — The SIR Epidemic Model, Built and Solved End-to-End (Biology Track)

DataField.Dev

Case Study 1 — The SIR Epidemic Model, Built and Solved End-to-End (Biology Track)

This is the flagship worked portfolio: the SIR model from first principles to validated prediction, every tool traced home. Use it as the template for your own Biology-track portfolio.

1. The problem

A new respiratory illness appears in an isolated town of $N = 1000$ people. The health office has one question with budget and policy riding on it: how many will be sick at the worst moment, when will that moment come, and how many will be infected before it is over? No one has handed us a formula. We must build one.

This is the SIR model, the biology anchor of this book — introduced in Chapter 19 and brought to its full payoff here. We walk the entire modeling cycle of §39.2: assumptions, equations, solution, validation, communication.

2. The assumptions (this step is the modeling)

Split the population into three compartments:

$S(t)$ — susceptible: can still catch it.
$I(t)$ — infectious: currently have it and can transmit.
$R(t)$ — recovered/removed: immune or no longer transmitting.

We assume:

Closed population. $S + I + R = N$ for all time; we ignore births and natural deaths over the short outbreak. (Reasonable for weeks; wrong for years.)
Homogeneous mixing. Everyone is equally likely to contact everyone. (Wrong in detail — towns have households and schools — but a useful aggregate.)
Permanent immunity. Recovery confers lasting protection. (Approximate; many diseases wane.)

Each assumption is false in detail and useful in aggregate. Stating them is not a disclaimer; it is the model's definition of what matters.

3. The equations, every term explained

The model is a system of three coupled differential equations — the tool from Chapter 19. Each equation states a rate of change, the derivative of Chapter 6:

$$\frac{dS}{dt} = -\beta\,\frac{SI}{N}, \qquad \frac{dI}{dt} = \beta\,\frac{SI}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I.$$

$\dfrac{dS}{dt} = -\beta SI/N$: the product $SI$ counts susceptible–infectious encounters; dividing by $N$ makes it a fraction; $\beta$ is the transmission rate (adequate contacts per infectious person per day). Susceptibles only decrease, hence the minus sign.
$\dfrac{dI}{dt} = \beta SI/N - \gamma I$: the first term is exactly the people leaving $S$ (they flow into $I$); the second is recovery, where $\gamma$ is the recovery rate and $1/\gamma$ is the mean infectious period.
$\dfrac{dR}{dt} = \gamma I$ collects the recovered.

The three rates sum to zero, $\frac{dS}{dt}+\frac{dI}{dt}+\frac{dR}{dt}=0$ — the differential statement that $N$ is conserved. That is our first built-in check.

Our parameters. Suppose contact tracing estimates $\beta = 0.3$ per day and the illness lasts about ten days, so $\gamma = 0.1$ per day ($1/\gamma = 10$ days — biologically plausible).

4. The decisive number: $R_0$

At the very start almost everyone is susceptible, $S \approx N$, so

$$\frac{dI}{dt} \approx \beta\frac{N\cdot I}{N} - \gamma I = (\beta - \gamma)\,I.$$

Infections grow iff $\beta > \gamma$, i.e. iff the basic reproduction number

$$R_0 = \frac{\beta}{\gamma} = \frac{0.3}{0.1} = 3$$

exceeds 1. With $R_0 = 3$, one case produces three new cases in a fully susceptible town: this outbreak takes off. More precisely, $\frac{dI}{dt} > 0$ while $S/N > 1/R_0$, so the epidemic peaks when the susceptible fraction falls to $1/R_0 = 1/3 \approx 33\%$. The herd-immunity threshold is

$$1 - \frac{1}{R_0} = 1 - \frac13 = \frac23 \approx 66.7\%.$$

Once two-thirds are immune, each case spawns fewer than one new case and the epidemic turns around.

5. Solving it numerically

The SIR system has no elementary closed-form solution (a recurring lesson — Chapter 14, §14.12: most ODEs escape elementary functions). So we integrate numerically with scipy.integrate.solve_ivp — an adaptive Runge–Kutta method, the sophisticated descendant of the Euler stepping from Chapter 19, itself rooted in the Riemann sums of Chapter 13.

# SIR epidemic model: solve numerically and read off predictions.
import numpy as np
from scipy.integrate import solve_ivp

N = 1000.0
beta, gamma = 0.3, 0.1          # per day

def sir(t, y):
    S, I, R = y
    dS = -beta * S * I / N
    dI =  beta * S * I / N - gamma * I
    dR =  gamma * I
    return [dS, dI, dR]

y0 = [N - 1, 1, 0]              # one initial infection
sol = solve_ivp(sir, (0, 160), y0, t_eval=np.linspace(0, 160, 161),
                rtol=1e-8, atol=1e-8)
S, I, R = sol.y
R0 = beta / gamma
print(f"R0 = {R0:.1f}")                                          # R0 = 3.0
print(f"Peak infected = {I.max():.0f} on day {sol.t[I.argmax()]:.0f}")  # ~301 on day 38
print(f"Final attack rate = {R[-1]/N:.1%}")                      # ~94.1%

Headline predictions. With $R_0 = 3$, the model forecasts a peak of about 301 simultaneously infected people around day 38, and a final attack rate of ~94%. The town should plan for roughly 300 simultaneous cases mid-outbreak.

The startling part: the epidemic turns around at 67% infected, yet ~94% are infected in the end. That gap is overshoot — infections keep happening past the turnaround because a large infectious pool is still burning through the remaining susceptibles. This is the model's most important and least intuitive output.

Real-World Application — Epidemic forecasting. This exact model drove the early COVID-19 projections behind 2020 lockdown timing. "Flatten the curve" was a statement about $I(t)$: interventions that lower $\beta$ (masking, distancing) reduce and broaden the peak so it stays under hospital capacity, even when they barely change the eventual attack rate. The effective reproduction number $R_t = R_0\,S/N$ was reported daily worldwide.

6. Validation: does the model obey its own laws?

A model you have not checked is a story, not science. We run three independent checks.

Conservation. Summing the equations gives $\frac{d}{dt}(S+I+R)=0$, so $S+I+R \equiv 1000$. In the simulation it holds to solver tolerance; a drift would mean a sign error.

The final-size relation (an independent analytic cross-check). Divide the $S$ equation by the $R$ equation — the $I$'s cancel:

$$\frac{dS}{dR} = \frac{-\beta SI/N}{\gamma I} = -\frac{\beta}{\gamma N}S = -\frac{R_0}{N}S.$$

This is a separable first-order ODE (Chapter 19). Separating and integrating (Chapter 13 — the integral recovers a total from a rate) and pushing $t\to\infty$ gives the final-size equation for the surviving susceptible fraction $s_\infty = S(\infty)/N$:

$$s_\infty = e^{-R_0(1-s_\infty)}.$$

This is transcendental — solve it with the root-finder of Chapter 11 (Newton's method) or brentq:

from scipy.optimize import brentq
s_inf = brentq(lambda s: s - np.exp(-R0 * (1 - s)), 1e-9, 0.999)
print(f"s_inf = {s_inf:.3f}, attack rate = {1 - s_inf:.3f}")
# s_inf = 0.060, attack rate = 0.940  -> matches the simulation's 0.941

For $R_0 = 3$ it gives $s_\infty \approx 0.06$, an attack rate of 0.94 — matching the numerical simulation. Two independent routes (integrating the full system, and reducing it to one equation) agree. That is validation.

Sensitivity. Re-running with $\beta$ raised 10% (to $0.33$, so $R_0 = 3.3$) moves the peak earlier and higher and nudges the attack rate up a few points — informative but not explosive, so the qualitative story is robust.

Common Pitfall. Do not read the 67% herd-immunity threshold as the final infected fraction. The threshold is the turnaround; the final attack rate (~94% here) is larger because of overshoot. Relaxing interventions exactly at the threshold still lets a large second wave occur — a real policy trap.

7. The tools, traced home

Tool	Where it appears here	Home chapter
Derivative (rate of change)	each $dS/dt$, $dI/dt$, $dR/dt$	6
Differential equation	the SIR system itself	19
Series / approximation	early growth $\propto e^{(\beta-\gamma)t}$	23
Integral / FTC	the final-size relation	13–14
Root-finding	solving $s_\infty = e^{-R_0(1-s_\infty)}$	11
Numerical ODE solver	Runge–Kutta via `solve_ivp`	13, 19

8. Limitations (stated plainly)

Homogeneous mixing is false in any real town (households, schools, super-spreaders break it); immunity may wane; $\beta$ and $\gamma$ are estimated with real uncertainty; and we ignored deaths and demographics. The model is a first approximation — useful for the shape and scale of the outbreak, not for street-level prediction. The honest extensions (SIRD, vital dynamics, age-structured contact matrices) are further loops of the modeling cycle (§39.3.6).

9. Conclusion (the punchline first)

The town should prepare for roughly 300 simultaneous infections around day 38, and expect about 94% to be infected before the outbreak ends — even though it begins to recede once two-thirds are immune. That single sentence, defensible and validated, is what a month of calculus bought. Behind it sits a rate (derivative), a dynamic law (ODE), an accumulation (integral), and a threshold ($R_0$) — the whole toolkit, on a problem nobody pre-digested.

Discussion Questions

The herd-immunity threshold (67%) and the final attack rate (94%) differ because of overshoot. Explain to a non-mathematician why the epidemic keeps growing past its turnaround point.
Two independent methods gave the same attack rate. Why is independent agreement stronger validation than re-running the same simulation more carefully?
A mayor proposes lifting all restrictions the moment 67% immunity is reached. Using this model, argue why that is risky.
Which assumption (closed population, homogeneous mixing, permanent immunity) do you think is most dangerous for a year-long disease, and how would you relax it?

Short Annotated Reading

Kermack & McKendrick (1927), "A Contribution to the Mathematical Theory of Epidemics." The original SIR paper; the source of the threshold theorem and the final-size relation.
Hethcote (2000), "The Mathematics of Infectious Diseases," SIAM Review. A clear modern survey of SIR and its many extensions; excellent for portfolio extensions.
Keeling & Rohani, Modeling Infectious Diseases in Humans and Animals. The standard graduate reference; chapters on age structure and contact matrices map directly onto §39.3.6.