Prerequisites
- Chapters 1–38
Learning Objectives
- Synthesize calculus skills into a coherent, end-to-end modeling project.
- Choose appropriate mathematical tools for a real-world problem and justify each choice.
- Build, solve, and validate the SIR epidemic model from first principles.
- Trace every tool in a model back to the chapter that introduced it.
- Produce a portfolio demonstrating fluency across the calculus toolkit.
In This Chapter
- 39.1 The Capstone
- 39.2 The Modeling Cycle
- 39.3 Biology Track — The SIR Model, Built and Solved (Flagship)
- 39.4 Economics Track — Constrained Production, Built and Solved
- 39.5 Physics Track — The Hohmann Transfer, Built and Solved
- 39.6 Data Science Track — Gradient-Descent Curve Fitting, Built and Solved
- 39.7 Cross-Track Synthesis: One Toolkit, Four Worlds
- 39.8 Validation, Sensitivity, and Honesty
- 39.9 Assembling and Communicating the Portfolio
- Looking Ahead
- Reflection
Chapter 39 — The Mathematical Modeling Portfolio
39.1 The Capstone
For thirty-eight chapters you have been collecting tools. The derivative gave you rate of change (Chapters 6–9). The integral gave you accumulation (Chapters 13–18). Optimization gave you the best choice (Chapters 10 and 31). The differential equation gave you dynamics — laws that say how a system changes from one instant to the next (Chapter 19). Series gave you approximation of functions that have no clean formula (Chapters 23–24). The gradient gave you learning — a direction to step to reduce error (Chapter 30).
This chapter is where those tools stop being exercises and become a working machine. Your capstone is a modeling portfolio: a collection of mathematical models you build, solve, validate, and explain. You will choose one of four tracks that has been threaded through the entire book:
- Biology — epidemics, population dynamics, ecology.
- Economics — production, optimization, market dynamics.
- Physics — motion, fields, orbits.
- Data Science — gradient descent, curve fitting, optimization.
This chapter does not merely describe the tracks. It builds one complete, end-to-end model in each, so you can see exactly what "finished" looks like before you build your own. The flagship is the SIR epidemic model, the anchor example introduced in Chapter 19 and reaching its climax here: we will derive it, solve it numerically with scipy, extract its three headline predictions, and validate it against the mathematics it must obey.
The Key Insight. A model is not a calculation; it is a choice. The equations are the easy part — calculus has already taught you those. The hard part is judgment: which features of the world to capture, which to ignore, what to compute, and how to know whether the answer means anything. The portfolio demonstrates that you can exercise that judgment, not just turn a crank.
This is also the chapter where the six recurring themes of the book converge. Calculus is the mathematics of change (Theme 1): every model here is a statement about how something changes. Geometry and algebra are inseparable (Theme 2): every model has a phase portrait or a trajectory you can see. The Fundamental Theorem of Calculus (Theme 3, Chapter 14) is the silent engine that turns every rate into a total. Hand computation builds understanding; machine computation builds power (Theme 4): you will derive each model by hand, then hand it to Python. Calculus appears in every quantitative field (Theme 5): that is the whole point of having four tracks. And approximation is the soul of calculus (Theme 6): numerical ODE solvers, Taylor expansions, and gradient descent are all "close enough" made rigorous.
39.2 The Modeling Cycle
Every model in this chapter — and every model you will ever build — passes through the same loop.
- Identify the problem. What real phenomenon are you trying to understand or predict?
- Make assumptions. What do you include, and what do you deliberately ignore? This step is the modeling; everything else is bookkeeping.
- Build the model. Translate the assumptions into equations — usually derivatives (rates), integrals (totals), or an optimization (a best choice).
- Solve or simulate. Find an analytic solution if one exists; otherwise integrate numerically.
- Validate. Compare against data, known special cases, conservation laws, or dimensional analysis. A model you have not checked is a story, not science.
- Refine. Loop back. Every pass teaches you what the previous assumptions cost you.
- Communicate. A model nobody understands or trusts changes nothing.
The cycle is not linear. You will build, discover the model predicts nonsense, return to your assumptions, and rebuild. That loop is the work. Hold the cycle in mind as we walk the four tracks — each worked example below is one full pass around it.
Geometric Intuition. Picture the modeling cycle as a spiral, not a circle. Each loop returns you to the same seven stages, but one level higher: a richer model, a tighter fit, a clearer story. The basic SIR model is the innermost loop; age structure and vaccination are the loops further out. You never "finish" — you stop when the model is good enough for the decision it must support.
39.3 Biology Track — The SIR Model, Built and Solved (Flagship)
This is the centerpiece of the chapter. We will build the SIR model of epidemics from scratch, solve it, and read off everything it predicts. The SIR model is the anchor example first developed in Chapter 19; here it reaches its full payoff.
39.3.1 The assumptions
Split a population of fixed size $N$ into three compartments:
- $S(t)$ — susceptible: people who can catch the disease.
- $I(t)$ — infectious: people who currently have it and can transmit it.
- $R(t)$ — recovered (or removed): people who have had it and are now immune (or have died).
At all times $S(t) + I(t) + R(t) = N$. We assume the population mixes homogeneously (everyone equally likely to contact everyone), the timescale is short enough to ignore births and natural deaths, and recovery confers permanent immunity. These assumptions are wrong in detail and useful in aggregate — exactly what a model should be.
39.3.2 The equations, every term explained
The model is a system of three coupled differential equations (the tool from Chapter 19). Each one is a statement about a rate of change — a derivative (Chapter 6):
$$\frac{dS}{dt} = -\,\beta\,\frac{S I}{N}, \qquad \frac{dI}{dt} = \beta\,\frac{S I}{N} - \gamma I, \qquad \frac{dR}{dt} = \gamma I.$$
Read every symbol:
- $\dfrac{dS}{dt}$ is the rate at which susceptibles are being depleted. The product $SI$ is proportional to the number of possible susceptible–infectious encounters (it grows with both populations); dividing by $N$ turns $I$ into the infectious fraction $I/N$, the chance that a given contact is with an infectious person. $\beta$ is the transmission rate: the average number of adequate contacts (contacts close enough to transmit) one infectious person makes per unit time. The minus sign says susceptibles only ever decrease.
- $\dfrac{dI}{dt}$ has two terms. The first, $+\beta SI/N$, is exactly the people leaving $S$ — they flow into $I$. The second, $-\gamma I$, is recovery: $\gamma$ is the recovery rate, and $1/\gamma$ is the average duration of infectiousness.
- $\dfrac{dR}{dt} = \gamma I$ collects the recovered. Notice the three rates sum to zero: $\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = 0$, which is the differential statement that $N$ is conserved — our first built-in validation check.
Geometric Intuition. Think of the disease as a fluid flowing downhill through three tanks: $S \to I \to R$. The pipe from $S$ to $I$ has a valve whose opening depends on how full both the $S$ and $I$ tanks are (that is the $SI$ term — transmission needs both a spark and fuel). The pipe from $I$ to $R$ has a fixed-rate valve ($\gamma$). The epidemic is the transient slosh of fluid through the middle tank: $I$ rises while the $S \to I$ inflow beats the $I \to R$ outflow, peaks when they balance, and falls once $S$ runs too low to sustain it.
39.3.3 The single most important number: $R_0$
The epidemic either takes off or it doesn't, and one number decides. At the very start, almost everyone is susceptible, so $S \approx N$ and
$$\frac{dI}{dt} \approx \beta\,\frac{N \cdot I}{N} - \gamma I = (\beta - \gamma)\,I.$$
Infections grow if and only if $\beta - \gamma > 0$, i.e. $\beta/\gamma > 1$. Define the basic reproduction number
$$R_0 = \frac{\beta}{\gamma}.$$
$R_0$ is the average number of new infections one infectious person causes in a fully susceptible population. If $R_0 > 1$, $I$ grows exponentially at first (an outbreak); if $R_0 < 1$, it dies out. This is the threshold that defined a generation's worth of public-health policy.
More precisely, infection grows while $\frac{dI}{dt} > 0$, i.e. while $\beta S/N > \gamma$, i.e. while $S/N > 1/R_0$. The epidemic peaks exactly when the susceptible fraction falls to $1/R_0$. The complementary fraction is the herd immunity threshold: once a fraction
$$1 - \frac{1}{R_0}$$
of the population is immune, each infection produces less than one new infection on average and the epidemic recedes — without every remaining susceptible getting sick. For $R_0 = 3$, that threshold is $1 - 1/3 = 2/3 \approx 66.7\%$.
Check Your Understanding. A disease has transmission rate $\beta = 0.4$ per day and recovery rate $\gamma = 0.1$ per day. (a) What is $R_0$? (b) What susceptible fraction triggers the peak of infections? (c) What is the herd immunity threshold?
Answer
(a) $R_0 = \beta/\gamma = 0.4/0.1 = 4$. (b) The peak occurs when $S/N = 1/R_0 = 1/4 = 25\%$. (c) Herd immunity threshold $= 1 - 1/R_0 = 1 - 1/4 = 75\%$. Once three-quarters of the population is immune, each case spawns fewer than one new case and the epidemic turns around.
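All three quantities in this exercise follow directly from $\beta$ and $\gamma$, so hand answers are easy to check in code. A minimal sketch; the helper name sir_thresholds is hypothetical, not part of any library:

```python
def sir_thresholds(beta, gamma):
    """Return (R0, peak susceptible fraction, herd immunity threshold).

    Assumes the basic SIR model, with beta and gamma in the same time unit.
    """
    R0 = beta / gamma
    peak_fraction = 1 / R0       # infections peak when S/N falls to 1/R0
    herd_threshold = 1 - 1 / R0  # immune fraction beyond which I(t) declines
    return R0, peak_fraction, herd_threshold

R0, peak_fraction, herd_threshold = sir_thresholds(0.4, 0.1)
print(f"R0 = {R0:.1f}, peak at S/N = {peak_fraction:.0%}, "
      f"herd immunity threshold = {herd_threshold:.0%}")
# R0 = 4.0, peak at S/N = 25%, herd immunity threshold = 75%
```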
39.3.4 Solving the SIR model with scipy
The SIR system has no elementary closed-form solution, a recurring lesson of this book (Chapter 14, §14.12: most integrals and most ODEs escape elementary functions). So we integrate it numerically with scipy.integrate.solve_ivp, the modern successor to the odeint solver you met in Chapter 19. By default it uses RK45, an adaptive explicit Runge–Kutta method of order 5(4): a sophisticated cousin of the Euler stepping you first saw in Chapter 19, itself a descendant of the Riemann-sum idea from Chapter 13.
# SIR epidemic model: solve the system numerically and read off its predictions.
import numpy as np
from scipy.integrate import solve_ivp
N = 1000.0 # total population
beta, gamma = 0.3, 0.1 # transmission and recovery rates (per day)
def sir(t, y):
    S, I, R = y
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dR = gamma * I
    return [dS, dI, dR]
y0 = [N - 1, 1, 0] # one initial infection in a fully susceptible town
t_span = (0, 160)
sol = solve_ivp(sir, t_span, y0, t_eval=np.linspace(0, 160, 161),
                rtol=1e-8, atol=1e-8)
S, I, R = sol.y
R0 = beta / gamma
peak_day = sol.t[I.argmax()]
print(f"R0 = {R0:.1f}") # R0 = 3.0
print(f"Peak infected = {I.max():.0f} on day {peak_day:.0f}") # ~301 on day 38
print(f"Final recovered = {R[-1]:.0f} (attack rate {R[-1]/N:.1%})") # ~941 (94.1%)
print(f"Herd immunity threshold = {1 - 1/R0:.1%}") # 66.7%
With $R_0 = 3$, the model predicts a peak of about 301 simultaneously infected people around day 38, and a final attack rate of ~94% — almost everyone is eventually infected, even though the epidemic turns around when only 67% have been infected. That gap between the 67% herd-immunity threshold and the 94% final attack rate is the model's most important and least intuitive prediction: the epidemic overshoots. Infections keep happening after the turnaround because there is still a large pool of infectious people working through the remaining susceptibles.
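The overshoot can be measured directly from the simulation. The sketch below reruns the same model on a finer time grid (0.01 day, an arbitrary choice) and compares the fraction no longer susceptible at the peak with the fraction ever infected by day 160:

```python
import numpy as np
from scipy.integrate import solve_ivp

N, beta, gamma = 1000.0, 0.3, 0.1      # same parameters as the simulation above

def sir(t, y):
    S, I, R = y
    new_infections = beta * S * I / N
    return [-new_infections, new_infections - gamma * I, gamma * I]

t = np.linspace(0, 160, 16001)         # 0.01-day grid so the peak is located precisely
sol = solve_ivp(sir, (0, 160), [N - 1, 1, 0], t_eval=t, rtol=1e-8, atol=1e-8)
S, I, R = sol.y
k = I.argmax()                         # index of the epidemic peak

infected_by_peak = 1 - S[k] / N        # no longer susceptible at the turnaround
infected_by_end = 1 - S[-1] / N        # ever infected by day 160
print(f"S/N at peak = {S[k] / N:.3f}  (theory: 1/R0 = {gamma / beta:.3f})")
print(f"ever infected: {infected_by_peak:.1%} at the peak, "
      f"{infected_by_end:.1%} by day 160")
print(f"overshoot: {infected_by_end - infected_by_peak:.1%} "
      f"of the town infected after the turnaround")
```

The first line confirms that the peak occurs where $S/N = 1/R_0$; the gap between the two percentages in the second line is the overshoot.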
Real-World Application — Epidemic forecasting (public health). Models built on these same compartments (the basic SIR model and its extensions) drove early COVID-19 projections that informed lockdown timing in 2020. The "flatten the curve" slogan was a statement about this model: interventions that reduce $\beta$ (masking, distancing) lower and broaden the $I(t)$ peak so it stays under hospital capacity, even when they do not change the eventual attack rate much. The single quantity $R_0$ (and its time-varying cousin, the effective reproduction number $R_t = R_0\, S/N$) was reported daily by health agencies worldwide.
39.3.5 Validation: does the model obey its own laws?
Before trusting any model, check it against truths it must satisfy.
Conservation. Sum the three equations: $\frac{d}{dt}(S+I+R) = 0$, so $S+I+R \equiv N$ for all time. In the code above, S + I + R stays equal to 1000 to within solver tolerance. If it drifts, you have a bug.
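The conservation check holds even for the crude fixed-step Euler method from Chapter 19. Because the three rates sum to zero, each Euler step moves people between compartments without creating or destroying any, so $S+I+R$ stays at $N$ up to floating-point rounding (the same is true of any Runge–Kutta method, including the one inside solve_ivp). A sketch, assuming a step of 0.1 day; the function name sir_euler is illustrative:

```python
import numpy as np

def sir_euler(beta, gamma, N, y0, dt, steps):
    """Integrate the SIR system with forward Euler; returns an array of [S, I, R] rows."""
    traj = np.empty((steps + 1, 3))
    traj[0] = y0
    for k in range(steps):
        S, I, R = traj[k]
        infections = beta * S * I / N      # flow S -> I, people per day
        recoveries = gamma * I             # flow I -> R, people per day
        traj[k + 1] = [S - dt * infections,
                       I + dt * (infections - recoveries),
                       R + dt * recoveries]
    return traj

traj = sir_euler(0.3, 0.1, 1000.0, [999.0, 1.0, 0.0], dt=0.1, steps=1600)  # 160 days
drift = np.abs(traj.sum(axis=1) - 1000.0).max()
print(f"largest |S + I + R - N| over 160 days: {drift:.1e}")
```

Introducing a deliberate bug, such as dropping the $-\gamma I$ term from the $I$ update, makes the printed drift grow at once, which is exactly the symptom the conservation check is designed to catch.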
The final-size relation. There is one beautiful analytic check the numerical solution must reproduce. Divide the $S$ equation by the $R$ equation:
$$\frac{dS}{dR} = \frac{-\beta S I/N}{\gamma I} = -\frac{\beta}{\gamma N}\,S = -\frac{R_0}{N}\,S.$$
This is a separable first-order ODE (Chapter 19): the $I$'s cancelled. Separating and integrating (Chapter 13, the integral as the tool that recovers a total from a rate) gives $S = S(0)\, e^{-R_0 (R - R(0))/N}$. Now push $t \to \infty$: $I \to 0$, so everyone is in $S$ or $R$ and $R(\infty) = N - S(\infty)$. For a start with almost everyone susceptible ($S(0) \approx N$, $R(0) = 0$), this yields the final-size equation for the surviving susceptible fraction $s_\infty = S(\infty)/N$:
$$s_\infty = e^{-R_0\,(1 - s_\infty)}.$$
This is a transcendental equation: solve it with the root-finder from Chapter 11 (Newton's method) or scipy.optimize.brentq. For $R_0 = 3$ it gives $s_\infty \approx 0.06$, i.e. an attack rate of $1 - s_\infty \approx 0.94$, matching the numerical simulation to within about a tenth of a percentage point (the final-size equation approximates $S(0)$ by $N$, so a tiny gap is expected). Two independent routes (numerical integration of the full system, and analytic reduction to a single equation) agree. That is validation.
# Validation: the final-size relation must match the simulation's attack rate.
from scipy.optimize import brentq
s_inf = brentq(lambda s: s - np.exp(-R0 * (1 - s)), 1e-9, 0.999)
print(f"Final-size equation: S_inf/N = {s_inf:.3f}, attack rate = {1 - s_inf:.3f}")
# S_inf/N = 0.060, attack rate = 0.940 -> matches R[-1]/N = 0.941 from simulation
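The paragraph above also names Newton's method (Chapter 11) as a way to solve the final-size equation. A minimal sketch, applied to $g(s) = s - e^{-R_0(1-s)}$ with $g'(s) = 1 - R_0 e^{-R_0(1-s)}$. The starting point matters: $g$ is concave and $g(0) < 0$, so for $R_0 > 1$ the iterates starting at $s = 0$ climb monotonically to the nontrivial root and never reach the trivial root $s = 1$. The function name is hypothetical:

```python
import math

def final_size_newton(R0, tol=1e-12, max_iter=100):
    """Nontrivial root of g(s) = s - exp(-R0 * (1 - s)) for R0 > 1, via Newton's method."""
    if R0 <= 1:
        raise ValueError("a nontrivial root below s = 1 exists only for R0 > 1")
    s = 0.0                                   # g(0) < 0: start left of the root
    for _ in range(max_iter):
        e = math.exp(-R0 * (1 - s))
        step = (s - e) / (1 - R0 * e)         # g(s) / g'(s)
        s -= step
        if abs(step) < tol:
            return s
    raise RuntimeError("Newton's method did not converge")

for R0 in (1.5, 2.0, 3.0, 4.0):
    s_inf = final_size_newton(R0)
    print(f"R0 = {R0}: S_inf/N = {s_inf:.4f}, attack rate = {1 - s_inf:.1%}")
```

Scanning several $R_0$ values like this doubles as a quick sensitivity check: the attack rate climbs quickly as $R_0$ moves above 1.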
Common Pitfall. Students often read the herd immunity threshold $1 - 1/R_0$ as the final fraction infected. It is not. The threshold is where the epidemic peaks and starts to decline; the final attack rate is larger (given by the final-size equation above) because of overshoot. For $R_0 = 3$ the threshold is 67% but ~94% are eventually infected. Confusing the turnaround point with the end point is the most common error in interpreting an SIR model, and it has real policy consequences: relaxing interventions exactly at the herd-immunity threshold still allows a large number of further infections, because the many people who are infectious at that moment keep transmitting as the epidemic overshoots.
39.3.6 Where this model contributes to your portfolio
The SIR model is your portfolio's flagship if you choose Biology. The natural extensions, each a further loop of the modeling cycle:
- SIRD — split $R$ into recovered and dead to track mortality.
- SIR with vital dynamics — add birth and death terms for long-timescale diseases (measles, endemic equilibria).
- Age-structured SIR — replace each compartment by a vector indexed by age group $i$, coupled through a contact matrix $C_{ij}$ measured from survey data: $$\frac{dS_i}{dt} = -S_i\sum_j \beta\,C_{ij}\,\frac{I_j}{N_j}, \qquad \frac{dI_i}{dt} = S_i\sum_j \beta\,C_{ij}\,\frac{I_j}{N_j} - \gamma I_i.$$ This is the structure many real COVID-19 models built on: schoolchildren and retirees have very different contact rates, and the matrix encodes that.
- Vaccination — start with a fraction already in $R$, or add a vaccination rate; compare random, targeted (high-contact individuals first), and ring (contacts of the infected) strategies by simulation.
Each extension is a chance to compare strategies quantitatively: which minimizes total cases for a fixed vaccine supply? That comparison — an optimization (Chapter 31) layered on top of a differential equation (Chapter 19) — is the kind of integrative result a strong portfolio shows.
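As a first pass at the vaccination extension, the sketch below uses the simplest option from the list: a fraction $p$ of the town starts in $R$ (immune) and one person starts infectious. It reuses the parameters of §39.3.4; the function name attack_rate and the 400-day horizon are assumptions chosen for illustration:

```python
import numpy as np
from scipy.integrate import solve_ivp

N, beta, gamma = 1000.0, 0.3, 0.1      # R0 = 3, herd immunity threshold = 2/3

def sir(t, y):
    S, I, R = y
    new_infections = beta * S * I / N
    return [-new_infections, new_infections - gamma * I, gamma * I]

def attack_rate(p, days=400):
    """Fraction of the whole town infected when a fraction p starts vaccinated (in R)."""
    y0 = [(1 - p) * N - 1, 1.0, p * N]          # one initial infection
    sol = solve_ivp(sir, (0, days), y0, rtol=1e-8, atol=1e-8)
    return (sol.y[2, -1] - p * N) / N           # people who passed through I

for p in (0.0, 0.3, 0.5, 0.6, 0.7):
    print(f"vaccinated {p:.0%}: attack rate {attack_rate(p):.1%}")
```

Once $p$ passes the herd immunity threshold $1 - 1/R_0 = 2/3$, the single introduced case no longer sparks an outbreak. Comparing targeted or ring strategies would need the age-structured or contact-network versions of the model.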
39.4 Economics Track — Constrained Production, Built and Solved
The economics flagship is constrained optimization: a firm choosing inputs to maximize output (or minimize cost) subject to a budget. This synthesizes partial derivatives (Chapter 29), the gradient (Chapter 30), and Lagrange multipliers (Chapter 31).
39.4.1 The model
A firm produces output using labor $L$ and capital $K$ through a Cobb–Douglas production function
$$Q(L, K) = A\,L^{a}\,K^{b},$$
where $A > 0$ is total-factor productivity and $a, b > 0$ are the output elasticities of labor and capital. The firm has a fixed budget $B$ to spend, with wage $w$ per unit labor and rental rate $r$ per unit capital:
$$wL + rK = B.$$
Goal: choose $L, K$ to maximize $Q$ subject to the budget constraint.
39.4.2 Solving by hand with Lagrange multipliers
Form the Lagrangian (Chapter 31):
$$\mathcal{L}(L, K, \lambda) = A L^{a} K^{b} - \lambda\,(wL + rK - B).$$
The first-order conditions set each partial derivative to zero:
$$\frac{\partial \mathcal{L}}{\partial L} = a A L^{a-1} K^{b} - \lambda w = 0, \qquad \frac{\partial \mathcal{L}}{\partial K} = b A L^{a} K^{b-1} - \lambda r = 0.$$
Each partial derivative is itself a marginal product — the extra output from one more unit of input, the rate-of-change idea from Chapter 6 applied in two variables. Dividing the first condition by the second eliminates $\lambda$:
$$\frac{a A L^{a-1}K^{b}}{b A L^{a} K^{b-1}} = \frac{w}{r} \;\;\Longrightarrow\;\; \frac{a}{b}\cdot\frac{K}{L} = \frac{w}{r}.$$
This is the famous economic principle: at the optimum, the ratio of marginal products equals the ratio of input prices — the firm equalizes "bang per buck" across inputs. Solving for $K$ gives $K = \dfrac{b}{a}\dfrac{w}{r}L$. Substituting into the budget $wL + rK = B$:
$$wL + r\cdot\frac{b}{a}\frac{w}{r}L = B \;\;\Longrightarrow\;\; wL\Big(1 + \frac{b}{a}\Big) = B \;\;\Longrightarrow\;\; L^{*} = \frac{a}{a+b}\cdot\frac{B}{w}.$$
By the symmetric argument,
$$K^{*} = \frac{b}{a+b}\cdot\frac{B}{r}.$$
Every equation explained: the firm spends the fraction $\frac{a}{a+b}$ of its budget on labor, and $L^*$ is that spend divided by the wage. The normalized Cobb–Douglas exponents $\frac{a}{a+b}$ and $\frac{b}{a+b}$ are the budget shares (when $a + b = 1$ they are the exponents themselves), a clean, testable prediction. If labor's elasticity $a$ is large, the firm spends proportionally more on labor. Geometrically, the optimum is the point where the budget line is tangent to a level curve (isoquant) of $Q$: the gradient $\nabla Q$ is parallel to the gradient of the constraint, which is precisely what the Lagrange condition $\nabla Q = \lambda \nabla g$ says (Chapter 31).
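The tangency condition can be checked numerically: at the optimum, the marginal product per dollar should be the same for both inputs, and that common value is the multiplier $\lambda$ (the extra output one more dollar of budget would buy). A sketch using the parameter values of §39.4.3 ($A = 100$, $a = 0.3$, $b = 0.7$, $w = 10$, $r = 20$, $B = 1000$):

```python
A, a, b = 100.0, 0.3, 0.7          # productivity and elasticities
w, r, B = 10.0, 20.0, 1000.0       # wage, rental rate, budget

L_star = a / (a + b) * B / w       # hand-derived optimum
K_star = b / (a + b) * B / r
Q_star = A * L_star**a * K_star**b

MP_L = a * Q_star / L_star         # marginal product of labor, a*A*L^(a-1)*K^b
MP_K = b * Q_star / K_star         # marginal product of capital, b*A*L^a*K^(b-1)
lam_L, lam_K = MP_L / w, MP_K / r  # output per extra dollar spent on each input

print(f"L* = {L_star:.1f}, K* = {K_star:.1f}, Q* = {Q_star:.1f}")
print(f"lambda from labor = {lam_L:.4f}, from capital = {lam_K:.4f}")
```

For Cobb–Douglas both ratios simplify to $(a+b)Q^*/B$, so they agree for any parameter values; an algebra slip in the hand formulas for $L^*$ or $K^*$ would show up as a mismatch.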
39.4.3 Verifying numerically
# Cobb-Douglas output maximization under a budget constraint.
import numpy as np
from scipy.optimize import minimize
A, a, b = 100.0, 0.3, 0.7 # productivity and elasticities
w, r, B = 10.0, 20.0, 1000.0 # wage, rental, budget
# Minimize -Q (i.e. maximize Q) subject to wL + rK = B.
neg_Q = lambda x: -A * x[0]**a * x[1]**b
budget = {'type': 'eq', 'fun': lambda x: w*x[0] + r*x[1] - B}
res = minimize(neg_Q, x0=[1, 1], constraints=budget,
               bounds=[(1e-6, None), (1e-6, None)])
L_star, K_star = res.x
print(f"Numerical: L* = {L_star:.2f}, K* = {K_star:.2f}")
# Closed-form prediction:
L_hand = a/(a+b) * B/w
K_hand = b/(a+b) * B/r
print(f"Hand form: L* = {L_hand:.2f}, K* = {K_hand:.2f}")
# Both give L* = 30.00, K* = 35.00 -> hand and machine agree
The hand formula gives $L^* = \frac{0.3}{1.0}\cdot\frac{1000}{10} = 30$ and $K^* = \frac{0.7}{1.0}\cdot\frac{1000}{20} = 35$, and scipy.optimize.minimize lands on the same point. Hand computation built the understanding; the machine confirmed it (Theme 4).
Real-World Application — Factor demand in manufacturing (economics). Economists estimate Cobb–Douglas and CES production functions from real industry data (output, hours worked, capital stock) to predict how firms substitute between labor and capital when relative prices change — for example, how an increase in the minimum wage $w$ shifts firms toward capital $K$ (automation). The comparative-statics question "how does $L^*$ move when $w$ rises?" is answered by differentiating $L^* = \frac{a}{a+b}\frac{B}{w}$: $\frac{\partial L^*}{\partial w} = -\frac{a}{a+b}\frac{B}{w^2} < 0$ — higher wages reduce labor demand, the rate measured by a derivative (Chapter 6).
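The comparative-statics derivative can be double-checked with a central finite difference. A sketch at $w = 10$ with the §39.4.3 values $a = 0.3$, $b = 0.7$, $B = 1000$; the step size $h$ is an arbitrary small choice:

```python
a, b, B = 0.3, 0.7, 1000.0         # elasticities and budget

def L_star(w):
    """Optimal labor from the Lagrange solution, as a function of the wage w."""
    return a / (a + b) * B / w

w, h = 10.0, 1e-4
finite_diff = (L_star(w + h) - L_star(w - h)) / (2 * h)   # central difference
formula = -a / (a + b) * B / w**2                          # derivative from the text
print(f"dL*/dw at w = {w}: finite difference {finite_diff:.6f}, formula {formula:.6f}")
# dL*/dw at w = 10.0: finite difference -3.000000, formula -3.000000
```

At these values, each extra dollar of wage cuts optimal labor by about 3 units, the negative sign the text predicts.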
A second economics model that completes the track is consumer and producer surplus as definite integrals (Chapter 18): given a demand curve $p_d(q)$ and supply curve $p_s(q)$ meeting at equilibrium $(q^*, p^*)$, consumer surplus is $\int_0^{q^*}\big(p_d(q) - p^*\big)\,dq$ and producer surplus is $\int_0^{q^*}\big(p^* - p_s(q)\big)\,dq$. Both are areas between curves, evaluated by the Fundamental Theorem of Calculus (Chapter 14). The portfolio combines the two ideas: the equilibrium quantity is the one that maximizes total surplus (where $p_d(q) = p_s(q)$), and integrating then gives the welfare.
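A sketch of the equilibrium-then-welfare pipeline, assuming hypothetical linear curves $p_d(q) = 100 - 2q$ and $p_s(q) = 10 + q$, chosen only because the answers can be checked by hand as triangle areas:

```python
from scipy.integrate import quad
from scipy.optimize import brentq

def p_d(q):
    """Hypothetical demand curve: buyers' willingness to pay."""
    return 100.0 - 2.0 * q

def p_s(q):
    """Hypothetical supply curve: sellers' marginal cost."""
    return 10.0 + 1.0 * q

q_star = brentq(lambda q: p_d(q) - p_s(q), 0.0, 50.0)   # equilibrium quantity
p_star = p_d(q_star)                                      # equilibrium price

cs, _ = quad(lambda q: p_d(q) - p_star, 0.0, q_star)     # consumer surplus
ps, _ = quad(lambda q: p_star - p_s(q), 0.0, q_star)     # producer surplus
print(f"q* = {q_star:.2f}, p* = {p_star:.2f}, CS = {cs:.2f}, PS = {ps:.2f}")
# q* = 30.00, p* = 40.00, CS = 900.00, PS = 450.00
```

By hand, consumer surplus is the triangle $\tfrac12 \cdot 30 \cdot (100 - 40) = 900$ and producer surplus is $\tfrac12 \cdot 30 \cdot (40 - 10) = 450$, so the numerical integrals and the Fundamental Theorem agree.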
39.5 Physics Track — The Hohmann Transfer, Built and Solved
The physics flagship is a Hohmann transfer orbit: the minimum-fuel two-burn maneuver between two circular orbits. It synthesizes conic sections (Chapter 27), vector-valued functions and Newton's laws (Chapters 19 and 28), and the energy bookkeeping that integration provides (Chapter 18).
39.5.1 The physics, every equation explained
For a satellite of negligible mass orbiting a body of gravitational parameter $\mu = GM$, the vis-viva equation relates speed $v$ at radius $r$ to the orbit's semimajor axis $a$:
$$v^2 = \mu\Big(\frac{2}{r} - \frac{1}{a}\Big).$$
This is not an axiom — it is the conservation of energy ($\frac12 v^2 - \mu/r = -\mu/(2a) = \text{const}$), and energy is itself the integral of force over displacement (work, Chapter 18). For a circular orbit, $a = r$, so $v_{\text{circ}} = \sqrt{\mu/r}$.
The Hohmann transfer rides an ellipse tangent to the inner circle (radius $r_1$) at its perigee and to the outer circle (radius $r_2$) at its apogee. The transfer ellipse therefore has semimajor axis
$$a_t = \frac{r_1 + r_2}{2},$$
so $\dfrac{1}{a_t} = \dfrac{2}{r_1 + r_2}$. Two engine burns ($\Delta v$'s) are needed:
- Burn 1, at radius $r_1$: speed up from the inner circular speed $v_1 = \sqrt{\mu/r_1}$ to the transfer-ellipse perigee speed $v_p = \sqrt{\mu(2/r_1 - 2/(r_1+r_2))}$. So $\Delta v_1 = v_p - v_1$.
- Burn 2, at radius $r_2$: speed up from the transfer-ellipse apogee speed $v_a = \sqrt{\mu(2/r_2 - 2/(r_1+r_2))}$ to the outer circular speed $v_2 = \sqrt{\mu/r_2}$. So $\Delta v_2 = v_2 - v_a$.
39.5.2 Computing it
# Hohmann transfer: LEO to geostationary orbit. Total delta-v budget.
import numpy as np
mu = 398600.0 # Earth's GM, km^3/s^2
r1, r2 = 6678.0, 42164.0 # LEO and GEO radii (km from Earth's center)
v1 = np.sqrt(mu / r1) # circular speed, inner orbit
v2 = np.sqrt(mu / r2) # circular speed, outer orbit
vp = np.sqrt(mu * (2/r1 - 2/(r1 + r2))) # transfer perigee speed
va = np.sqrt(mu * (2/r2 - 2/(r1 + r2))) # transfer apogee speed
dv1, dv2 = vp - v1, v2 - va
print(f"dv1 = {dv1:.4f} km/s, dv2 = {dv2:.4f} km/s") # 2.4258, 1.4668
print(f"Total dv = {dv1 + dv2:.4f} km/s") # 3.8926 km/s
# Transfer time = half the period of the transfer ellipse (Kepler's third law)
a_t = (r1 + r2) / 2
T_transfer = np.pi * np.sqrt(a_t**3 / mu) / 3600
print(f"Transfer time = {T_transfer:.2f} hours") # 5.28 hours
The model predicts a total budget of about 3.89 km/s and a transfer lasting about 5.28 hours. These are the numbers a mission planner starts from: real missions to geostationary orbit begin with this Hohmann value, then add the cost of a plane change when the launch site is off the equator, plus small corrections for Earth's oblateness ($J_2$) and the finite duration of each burn.
Geometric Intuition. Picture three nested shapes sharing a focus at Earth's center: the small inner circle, the large outer circle, and the transfer ellipse kissing both — touching the inner circle at one end and the outer circle at the diametrically opposite end. Each burn is a sudden stretch of the velocity vector tangent to the path: the first burn turns a circle into an ellipse by adding speed at perigee; the second re-circularizes by adding speed at apogee. The whole maneuver is a half-lap of the ellipse, which is why the transfer time is half its orbital period — Kepler's third law (a conic-section fact from Chapter 27) doing the timing.
The portfolio extension is to simulate the trajectory by numerically integrating Newton's law $\ddot{\mathbf r} = -\mu\,\mathbf r/\|\mathbf r\|^3$ as a vector ODE (Chapters 19 and 28) with solve_ivp, confirming that the two computed burns actually deliver the satellite from one circle to the other.
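A sketch of that extension, assuming a planar orbit (state $[x, y, v_x, v_y]$), the LEO and GEO radii from §39.5.2, and the DOP853 integrator in solve_ivp with tight tolerances (a reasonable but arbitrary choice). It flies the transfer ellipse for half a period, checks the apogee against the vis-viva prediction, applies burn 2, and confirms that the new orbit stays circular:

```python
import numpy as np
from scipy.integrate import solve_ivp

mu = 398600.0                       # Earth's GM, km^3/s^2
r1, r2 = 6678.0, 42164.0            # LEO and GEO radii, km

def two_body(t, state):
    """Newton's law r'' = -mu r / |r|^3 in the plane; state = [x, y, vx, vy]."""
    x, y, vx, vy = state
    r3 = np.hypot(x, y) ** 3
    return [vx, vy, -mu * x / r3, -mu * y / r3]

vp = np.sqrt(mu * (2/r1 - 2/(r1 + r2)))               # speed right after burn 1
va = np.sqrt(mu * (2/r2 - 2/(r1 + r2)))               # predicted apogee speed
v2 = np.sqrt(mu / r2)                                 # circular speed at GEO
T_half = np.pi * np.sqrt(((r1 + r2) / 2) ** 3 / mu)   # half the transfer period, s
opts = dict(method="DOP853", rtol=1e-10, atol=1e-8)

# Leg 1: start at perigee on the +x axis, velocity along +y after burn 1.
leg1 = solve_ivp(two_body, (0, T_half), [r1, 0.0, 0.0, vp], **opts)
x, y, vx, vy = leg1.y[:, -1]
r_apo, v_apo = np.hypot(x, y), np.hypot(vx, vy)
print(f"apogee: r = {r_apo:.3f} km (target {r2}), "
      f"v = {v_apo:.5f} km/s (predicted {va:.5f})")

# Burn 2: stretch the tangent velocity to circular speed, then fly one GEO period.
scale = v2 / v_apo
T_geo = 2 * np.pi * np.sqrt(r2 ** 3 / mu)
leg2 = solve_ivp(two_body, (0, T_geo), [x, y, vx * scale, vy * scale],
                 t_eval=np.linspace(0, T_geo, 500), **opts)
drift = np.abs(np.hypot(leg2.y[0], leg2.y[1]) - r2).max()
print(f"after burn 2: radius stays within {drift:.4f} km of GEO")
```

If either burn were mis-sized, the apogee radius would miss $r_2$ or the radius after burn 2 would oscillate, which is exactly what this check is meant to reveal.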
39.6 Data Science Track — Gradient-Descent Curve Fitting, Built and Solved
The data-science flagship is fitting a model to data by gradient descent — the climax of the gradient-descent anchor that began in Chapter 6 (the derivative as a direction to step) and matured in Chapter 30 (the multivariable gradient and machine learning).
39.6.1 The model and the loss
Suppose we have data points $(x_i, y_i)$ and we want the best-fitting line $\hat y = mx + c$. "Best" means minimizing the mean squared error loss:
$$L(m, c) = \frac{1}{n}\sum_{i=1}^{n}\big(m x_i + c - y_i\big)^2.$$
This is a function of two parameters $(m, c)$. To minimize it we follow the negative gradient downhill (Chapter 30). The gradient components are partial derivatives (Chapter 29), each computed by the chain rule (Chapter 7):
$$\frac{\partial L}{\partial m} = \frac{2}{n}\sum_{i=1}^{n}\big(m x_i + c - y_i\big)\,x_i, \qquad \frac{\partial L}{\partial c} = \frac{2}{n}\sum_{i=1}^{n}\big(m x_i + c - y_i\big).$$
Gradient descent updates the parameters by stepping against the gradient, scaled by a learning rate $\eta$:
$$m \leftarrow m - \eta\,\frac{\partial L}{\partial m}, \qquad c \leftarrow c - \eta\,\frac{\partial L}{\partial c}.$$
Every term is calculus: the loss is a sum (an accumulation, Chapter 13), its gradient is the vector of partial derivatives (Chapter 30), and each update is a linear approximation step (the tangent-line idea from Chapter 11) that improves the fit a little.
39.6.2 Implementing it from scratch
# Fit a line by gradient descent and confirm it matches the exact least-squares solution.
import numpy as np
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.5, x.size) # true slope 2, intercept 1, plus noise
m, c, eta, n = 0.0, 0.0, 0.005, x.size
for _ in range(20000):
    err = (m * x + c) - y
    grad_m = (2/n) * np.sum(err * x) # dL/dm
    grad_c = (2/n) * np.sum(err) # dL/dc
    m -= eta * grad_m # step downhill
    c -= eta * grad_c
print(f"Gradient descent: m = {m:.4f}, c = {c:.4f}") # m = 2.174, c = 0.323
# Exact least-squares solution for comparison (the closed form GD is converging to):
A = np.vstack([x, np.ones_like(x)]).T
m_ls, c_ls = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"Least squares: m = {m_ls:.4f}, c = {c_ls:.4f}") # m = 2.174, c = 0.323
Gradient descent converges to $m \approx 2.174$, $c \approx 0.323$, exactly the closed-form least-squares answer. The fit is not $2.0$ and $1.0$ because this particular noise sample moves the optimum away from the true values; gradient descent finds the true minimizer of the loss, whatever it is. The iterative learner and the one-shot linear-algebra solution agree, which is the validation: this loss is strictly convex, so it has a single minimum, and following the gradient downhill (with a small enough $\eta$) must reach it.
Real-World Application — Training every machine-learning model (data science). This three-line update is how neural networks, logistic regressions, and large language models are trained. The only differences in a deep network are that the loss is non-convex, the gradient is computed by backpropagation (the multivariable chain rule of Chapter 30 applied layer by layer), and the optimizer is a refined gradient descent (momentum, Adam). Strip a billion-parameter model down to its mathematical core and you find exactly the update above: parameter $\leftarrow$ parameter $-\,\eta\,\nabla(\text{loss})$.
Common Pitfall. The learning rate $\eta$ is the most error-prone knob in all of gradient descent. Too small, and the loop above needs millions of iterations to converge. Too large, and the steps overshoot the minimum and the loss diverges to infinity: the parameters explode. Students who see nan values in their training output have almost always set $\eta$ too high. The cure is to scale your features (so all partial derivatives have comparable magnitude) and reduce $\eta$ until the loss decreases monotonically. There is no universally correct $\eta$; it is the first hyperparameter you tune.
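A sketch of both failure modes on the same synthetic data as §39.6.2. For a quadratic loss like this one, plain gradient descent is stable only when $\eta < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the loss's Hessian; the code computes that limit, then runs 100 steps with one rate below it and one above. The rates 0.005 and 0.05 and the helper names are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.5, x.size)   # same synthetic data as 39.6.2
n = x.size

def mse(m, c):
    return np.mean((m * x + c - y) ** 2)

def run_gd(eta, steps=100):
    """Plain gradient descent on the MSE loss; returns the loss after each step."""
    m = c = 0.0
    losses = []
    for _ in range(steps):
        err = (m * x + c) - y
        # simultaneous update: both gradients use the current (m, c)
        m, c = m - eta * (2/n) * np.sum(err * x), c - eta * (2/n) * np.sum(err)
        losses.append(mse(m, c))
    return np.array(losses)

# Hessian of the MSE loss in (m, c); steps are stable only if eta < 2 / lambda_max.
H = (2/n) * np.array([[np.sum(x**2), np.sum(x)], [np.sum(x), n]])
eta_max = 2 / np.linalg.eigvalsh(H).max()
print(f"stability limit: eta < {eta_max:.4f}")

for eta in (0.005, 0.05):
    losses = run_gd(eta)
    print(f"eta = {eta}: loss after 1 step {losses[0]:.3g}, after 100 steps {losses[-1]:.3g}")
```

The Hessian here is dominated by $\sum x_i^2$, which is why scaling the features (shrinking those large entries) raises the usable $\eta$.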
39.7 Cross-Track Synthesis: One Toolkit, Four Worlds
Look back at the four flagship models. They study epidemics, factories, spacecraft, and data — four worlds with nothing physical in common. Yet each was built from the same six tools, each tool tracing to a specific chapter:
| Tool | What it provides | Home chapter | Biology (SIR) | Economics | Physics | Data Science |
|---|---|---|---|---|---|---|
| Derivative | rate of change | 6–9 | $dI/dt$ | marginal product | velocity | $\partial L/\partial m$ |
| Integral | accumulation / total | 13–18 | final size relation | consumer surplus | work = energy | loss = sum of errors |
| Optimization | the best choice | 10, 31 | optimal vaccination | max output / min cost | min-fuel orbit | min loss |
| Differential equation | dynamics over time | 19 | the SIR system | capital accumulation | Newton's $\ddot{\mathbf r}$ | gradient flow |
| Series / approximation | functions with no formula | 23–24 | early-growth $e^{(\beta-\gamma)t}$ | log-linearization | perturbation terms | Taylor of activations |
| Gradient | direction to improve | 30 | parameter calibration | $\nabla Q = \lambda\nabla g$ | n/a | the learning step |
This table is the thesis of the book (Theme 5): calculus appears in every quantitative field, and it is the same calculus. The epidemiologist, the economist, the aerospace engineer, and the machine-learning researcher are using one toolkit. Once you can read this table across any single row — "the integral means accumulation, whether of immune individuals, of consumer welfare, of energy, or of error" — you have understood what this whole textbook was for.
Three of the four anchor examples that threaded the book also reach their resolution in this synthesis: the SIR model (introduced Chapter 19) is solved in full here in §39.3; gradient descent (introduced Chapter 6, matured Chapter 30) becomes a working curve-fitter in §39.6; and the area-under-a-curve idea that powered the normal distribution (Chapter 13) reappears as economic surplus and as the loss-as-accumulated-error in §39.4 and §39.6. The fourth anchor, Euler's formula (Chapter 24), is pure mathematics and lives in Chapter 40's reflection.
Geometric Intuition. Every one of these four models, however different its subject, ultimately produces a picture you can see: the SIR model gives a trajectory in $S$-$I$-$R$ space (a curve threading three tanks); the economics model gives a budget line tangent to an isoquant; the Hohmann transfer gives nested conics; gradient descent gives a path rolling downhill on a bowl-shaped loss surface. Geometry and algebra are inseparable (Theme 2): if you cannot draw your model, you do not yet fully understand it.
39.8 Validation, Sensitivity, and Honesty
A model you have not stress-tested is a story. Four kinds of check turn a story into science, and your portfolio should apply each:
- Sanity checks. Limiting cases (does SIR with $R_0 < 1$ produce no epidemic? does the firm spend its entire budget?), dimensional analysis (do the units of $\beta SI/N$ work out to people-per-day?), and conservation laws ($S+I+R = N$ always).
- Comparison with data. Fit parameters to real observations (an outbreak curve, an industry's input choices, a published mission $\Delta v$) and measure the residual error.
- Sensitivity analysis. Vary each parameter and watch the output. For SIR, how does the peak day move as $\beta$ rises 10%? A model whose conclusions flip under tiny parameter changes is fragile and you must say so.
- Cross-validation (data science). Hold out part of the data; fit on the rest; test on the held-out part. A model that fits its training data perfectly but fails on new data has overfit — it memorized noise instead of learning signal.
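The SIR checks above can be run mechanically. The sketch below is in plain Python and makes three assumptions. First, it uses the standard equations $\frac{dS}{dt} = -\beta SI/N$, $\frac{dI}{dt} = \beta SI/N - \gamma I$, $\frac{dR}{dt} = \gamma I$. Second, the parameters are illustrative: $N = 10{,}000$, $\gamma = 0.1$ per day, $\beta = 0.3$ per day, and 10 initial infections. Third, it integrates with a hand-written fourth-order Runge–Kutta stepper. `simulate_sir` and `peak_day` are hypothetical helper names, not a library API.

```python
# SIR sanity checks and a one-parameter sensitivity test (illustrative values only).

def sir_rhs(S, I, R, beta, gamma, N):
    """dS/dt, dI/dt, dR/dt for the SIR model."""
    infection = beta * S * I / N   # units: (1/day) * people * people / people = people/day
    recovery = gamma * I           # units: (1/day) * people = people/day
    return -infection, infection - recovery, recovery


def simulate_sir(beta, gamma, N=10_000.0, I0=10.0, days=300.0, dt=0.05):
    """Classical RK4 integration; returns a list of (t, S, I, R) rows."""
    S, I, R, t = N - I0, I0, 0.0, 0.0
    rows = [(t, S, I, R)]
    for _ in range(int(round(days / dt))):
        k1 = sir_rhs(S, I, R, beta, gamma, N)
        k2 = sir_rhs(S + dt / 2 * k1[0], I + dt / 2 * k1[1], R + dt / 2 * k1[2], beta, gamma, N)
        k3 = sir_rhs(S + dt / 2 * k2[0], I + dt / 2 * k2[1], R + dt / 2 * k2[2], beta, gamma, N)
        k4 = sir_rhs(S + dt * k3[0], I + dt * k3[1], R + dt * k3[2], beta, gamma, N)
        S += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        I += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        R += dt / 6 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2])
        t += dt
        rows.append((t, S, I, R))
    return rows


def peak_day(rows):
    """Time at which the infected count I(t) is largest."""
    return max(rows, key=lambda row: row[2])[0]


N, gamma = 10_000.0, 0.1   # 10,000 people; 10-day mean infectious period (assumed)

# Check 1: conservation. S + I + R should stay equal to N up to round-off.
base = simulate_sir(beta=0.3, gamma=gamma, N=N)
max_drift = max(abs(S + I + R - N) for _, S, I, R in base)

# Check 2: limiting case. With R0 = beta/gamma = 0.8 < 1, I(t) should never rise above I0.
sub = simulate_sir(beta=0.08, gamma=gamma, N=N)
max_I_sub = max(I for _, _, I, _ in sub)

# Sensitivity: raise beta by 10% and see how far the peak day moves.
base_peak = peak_day(base)
bumped_peak = peak_day(simulate_sir(beta=0.33, gamma=gamma, N=N))

print(f"max |S + I + R - N| = {max_drift:.2e}")
print(f"R0 = 0.8: largest I = {max_I_sub:.2f} (started at 10)")
print(f"peak day moves from {base_peak:.1f} to {bumped_peak:.1f} when beta rises 10%")
```

A drift of essentially zero, an infected count that never exceeds its starting value when $R_0 < 1$, and a peak that arrives earlier when $\beta$ rises are the behaviors the sanity and sensitivity checks look for. The comments on `sir_rhs` record the dimensional analysis.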
Common Pitfall — Overfitting. When fitting a model to data, you can always drive the training error to zero by adding parameters: given $n$ points with distinct $x$-values, some polynomial of degree at most $n-1$ passes through all of them exactly. But that curve typically wiggles wildly between the points and predicts new data poorly. The fix is to prefer the simplest model that fits (Occam's razor made quantitative) and to validate on data the model never saw. Training error measures memorization; held-out error measures understanding.
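The pitfall is easy to reproduce. This sketch invents eight noisy points on the line $y = 2x + 1$ and fits two models to them: a least-squares line and the exact degree-7 interpolating polynomial (in Lagrange form). It then scores each model on the training points and on held-out points halfway between them. The data, the noise pattern, and the helper names are all made up for illustration, and only the Python standard library is used.

```python
# Overfitting sketch: exact polynomial interpolation vs. a least-squares line.
# The data are invented: y = 2x + 1 plus a fixed, hand-chosen noise pattern.

def interpolate(xs, ys, x):
    """Evaluate the interpolating polynomial (degree <= len(xs) - 1) at x, in Lagrange form."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


noise = [0.3, -0.4, 0.2, -0.3, 0.5, -0.2, 0.4, -0.5]
x_train = [float(k) for k in range(8)]
y_train = [2 * x + 1 + e for x, e in zip(x_train, noise)]

# Held-out points lie between the training points; their targets are noise-free for clarity.
x_test = [k + 0.5 for k in range(7)]
y_test = [2 * x + 1 for x in x_test]

slope, intercept = fit_line(x_train, y_train)


def line(x):
    return slope * x + intercept


def poly(x):
    return interpolate(x_train, y_train, x)   # degree-7 curve through all 8 points


line_train, line_test = mse(line, x_train, y_train), mse(line, x_test, y_test)
poly_train, poly_test = mse(poly, x_train, y_train), mse(poly, x_test, y_test)
print(f"line (2 parameters):     train MSE {line_train:.3f}, held-out MSE {line_test:.3f}")
print(f"degree 7 (8 parameters): train MSE {poly_train:.3f}, held-out MSE {poly_test:.3f}")
```

The polynomial's training error is zero, as promised. Its held-out error, however, is far larger than the line's, because the polynomial has memorized the noise. Holding out data is what exposes the difference.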
Honesty is the final, non-negotiable ingredient. Every model in this chapter is wrong in ways worth stating plainly: SIR assumes homogeneous mixing (false in any real city); Cobb–Douglas assumes smooth substitution between labor and capital (false at the level of a single machine); the Hohmann transfer assumes instantaneous burns and a two-body universe (false for any real, multi-body solar system); the line fit assumes the relationship is linear (rarely exactly true). Overconfident models cause real harm — mispriced financial risk in 2008, brittle epidemic forecasts. State your assumptions, state where the model breaks, and quantify your uncertainty. A model that knows its own limits is trustworthy precisely because it is humble.
39.9 Assembling and Communicating the Portfolio
Your finished portfolio, whichever track you chose, should contain:
- Statement of the problem — one or two paragraphs naming the real phenomenon.
- Assumptions — the modeling choices, stated explicitly and defended.
- The model — the equations, with every symbol defined, as we did for SIR in §39.3.2.
- Calculus tools used — a table like §39.7, citing the chapter behind each tool.
- Implementation — a reproducible Python/Jupyter notebook.
- Results — labeled plots and the headline numbers (the SIR peak, the firm's $L^*, K^*$, the $\Delta v$ budget, the fitted slope).
- Validation — sanity checks, data comparison, sensitivity, and at least one independent cross-check (as the final-size relation validated the SIR simulation).
- Limitations — what the model ignores and where it breaks.
- Conclusion — the punchline, in plain language.
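The independent SIR cross-check in the Validation item can be sketched as follows. The sketch assumes $R(0) = 0$, $R_0 = \beta/\gamma$, and illustrative parameters. Dividing $dS/dt = -\beta SI/N$ by $dR/dt = \gamma I$ gives $dS/dR = -R_0 S/N$. Once the epidemic has burned out, $I_\infty = 0$ and so $R_\infty = N - S_\infty$, and integrating gives one form of the final-size relation, $\ln(S_0/S_\infty) = R_0(1 - S_\infty/N)$. The code solves that relation by bisection and compares the result with a long RK4 run of the ODEs. The helper names are hypothetical.

```python
import math


def sir_final_S(beta, gamma, N, I0, days=400.0, dt=0.05):
    """RK4-integrate SIR until the outbreak has burned out; return the final S."""
    def rhs(y):
        S, I, _ = y
        new_infections = beta * S * I / N
        return (-new_infections, new_infections - gamma * I, gamma * I)

    y = (N - I0, I0, 0.0)
    for _ in range(int(round(days / dt))):
        k1 = rhs(y)
        k2 = rhs(tuple(v + dt / 2 * k for v, k in zip(y, k1)))
        k3 = rhs(tuple(v + dt / 2 * k for v, k in zip(y, k2)))
        k4 = rhs(tuple(v + dt * k for v, k in zip(y, k3)))
        y = tuple(v + dt / 6 * (p + 2 * q + 2 * u + w)
                  for v, p, q, u, w in zip(y, k1, k2, k3, k4))
    return y[0]


def final_size_S(R0, s0, tol=1e-12):
    """Bisection for s_inf in ln(s0 / s) = R0 * (1 - s), assuming R(0) = 0 and R0 > 1."""
    def f(s):
        return math.log(s0 / s) - R0 * (1.0 - s)

    lo, hi = 1e-12, s0          # f(lo) > 0 and f(hi) < 0 bracket the single root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


N, I0, beta, gamma = 10_000.0, 10.0, 0.3, 0.1   # illustrative values (assumed)
R0 = beta / gamma
S_sim = sir_final_S(beta, gamma, N, I0)
S_theory = N * final_size_S(R0, (N - I0) / N)
print(f"final susceptibles: simulation {S_sim:.2f}, final-size relation {S_theory:.2f}")
print(f"share ever infected: {1 - S_theory / N:.1%}")
```

The two numbers come from different mathematics: one from an integrated conservation law, the other from step-by-step numerical integration. Agreement to within a fraction of a person is therefore the kind of independent confirmation the Validation item asks for.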
Communication is not decoration; a model the world cannot understand changes nothing. Lead with the punchline, not the algebra. Compare:
Opaque: "We computed $\partial Q/\partial L = aAL^{a-1}K^{b}$ at the constrained optimum..."
Clear: "Increasing labor by 1% raises output by about $a$%, and at the optimum, the firm spends exactly the fraction $\frac{a}{a+b}$ of its budget on labor."
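The "Clear" sentence makes two claims that can be checked numerically. This sketch assumes the general Cobb–Douglas form $Q = AL^aK^b$ and a budget line $wL + rK = B$. The wage $w$, the rental rate $r$, the budget $B$, and every number below are illustrative. (With $b = 1 - a$ this is the constant-returns case, and the predicted share reduces to $a$.) The code maximizes output along the budget line by golden-section search. It then compares the labor share of spending with $\frac{a}{a+b}$, and the effect of 1% more labor with $a$%.

```python
import math


def golden_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:                      # the maximum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = f(x2)
        else:                            # the maximum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = f(x1)
    return 0.5 * (lo + hi)


A, a, b = 1.0, 0.6, 0.3          # Q = A * L**a * K**b (illustrative exponents)
w, r, B = 20.0, 50.0, 1000.0     # wage, rental rate of capital, total budget (assumed)


def output_on_budget(L):
    """Output when L units of labor are hired and the rest of the budget buys capital."""
    K = (B - w * L) / r
    return A * L**a * K**b


L_star = golden_max(output_on_budget, 0.0, B / w)
K_star = (B - w * L_star) / r
labor_share = w * L_star / B
print(f"L* = {L_star:.3f}, K* = {K_star:.3f}")
print(f"labor share of budget = {labor_share:.4f}; a/(a+b) = {a / (a + b):.4f}")

# "1% more labor raises output by about a%": compare with the exact percentage change (A cancels).
pct_gain = 100.0 * ((1.01 * L_star) ** a * K_star ** b / (L_star ** a * K_star ** b) - 1.0)
print(f"output gain from 1% more labor = {pct_gain:.4f}% (a = {a})")
```

Golden-section search appears here only because $Q$ is unimodal along the budget line. The chapter's own tool for this problem is the Lagrange multiplier, and the numeric check merely confirms what that derivation predicts.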
Show shape with plots, not just numbers. Quantify uncertainty. Cite your data sources. Make the code run for someone else on the first try. These are the habits that separate a calculation from a contribution.
Add to Your Modeling Portfolio — The Capstone Assembly. This is the moment every prior "Add to Your Modeling Portfolio" prompt was building toward. Assemble your complete portfolio now, integrating all four threads of calculus across the chosen track:
- Biology: the SIR (or logistic) model, with differential equations (Ch. 19) for the dynamics, the integral (Ch. 13–14) for the final-size relation, optimization (Ch. 31) for vaccination strategy, and series (Ch. 23) for the early-exponential phase.
- Economics: constrained production, with partial derivatives (Ch. 29) for marginal products, Lagrange multipliers (Ch. 31) for the optimum, the integral (Ch. 18) for surplus, and the derivative (Ch. 6) for comparative statics.
- Physics: the trajectory, with conic sections (Ch. 27) for orbit geometry, vector ODEs (Ch. 19, 28) for the motion, the integral (Ch. 18) for energy and work, and optimization (Ch. 10) for minimum fuel.
- Data Science: the trained model, with the gradient (Ch. 30) for the learning step, the chain rule (Ch. 7) for backpropagation, optimization (Ch. 31) for the loss minimum, and cross-validation for honesty.
Whichever track you chose, your portfolio now demonstrates the same arc: a rate became a model (derivative), the model accumulated a total (integral), dynamics unfolded over time (ODE), and a best choice emerged (optimization), with every equation explained and every tool traced home. That arc, built once for real, is the proof that you can do calculus, not just recite it.
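For the Data Science thread, the chain rule behind backpropagation is easiest to see in the smallest possible model: a line $\hat y = mx + c$ scored by mean squared error. This sketch uses invented data, an assumed learning rate of 0.02, and hypothetical function names. Each gradient component is the outer derivative $2u$ of the squared residual times the inner derivative of $u = \hat y - y$ with respect to the parameter. Backpropagation performs the same bookkeeping layer by layer.

```python
def loss_and_grad(m, c, xs, ys):
    """Mean squared error of y_hat = m*x + c, with its gradient built by the chain rule."""
    n = len(xs)
    loss = dm = dc = 0.0
    for x, y in zip(xs, ys):
        u = m * x + c - y          # inner function: the residual u = y_hat - y
        loss += u * u / n          # outer function: u^2, averaged
        dm += 2.0 * u * x / n      # d(u^2)/du * du/dm = 2u * x
        dc += 2.0 * u / n          # d(u^2)/du * du/dc = 2u * 1
    return loss, dm, dc


def gradient_descent(xs, ys, lr=0.02, steps=5000):
    """Plain gradient descent from (m, c) = (0, 0); returns the fit and the loss history."""
    m = c = 0.0
    losses = []
    for _ in range(steps):
        loss, dm, dc = loss_and_grad(m, c, xs, ys)
        losses.append(loss)
        m -= lr * dm               # step downhill, against the gradient
        c -= lr * dc
    return m, c, losses


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8, 11.0]    # invented measurements
m, c, losses = gradient_descent(xs, ys)
print(f"fitted line: y = {m:.4f} x + {c:.4f}")
print(f"loss: {losses[0]:.3f} at the start, {losses[-1]:.4f} after {len(losses)} steps")
```

The result should agree with the closed-form least-squares line to many decimal places, and the loss history should never increase. The learning rate is chosen to be small enough for this data. A much larger step makes the iterates overshoot the bottom of the bowl and diverge.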
Looking Ahead
You have now assembled a complete mathematical model and traced every tool in it back to the chapter that forged it. That is the practical summit of the book. Chapter 40 — The Big Picture is the conceptual summit: we step back from any single model and reflect on the whole calculus journey — revisiting all six recurring themes, resolving the last anchor (Euler's formula), and asking what calculus is, where it leads next, and why, four centuries after Newton and Leibniz, it remains the language in which the changing world is written.
Reflection
Thirty-eight chapters of tools converged in this one chapter into four working machines. The epidemiologist's $\frac{dI}{dt}$, the economist's Lagrangian, the engineer's vis-viva equation, and the data scientist's gradient step are, underneath, the same handful of ideas: a rate, a total, a dynamic law, a best choice — derivative, integral, differential equation, optimization. You did not just learn to differentiate and integrate; you learned to model — to look at a piece of the changing world, decide what matters, write it in the language of calculus, compute the consequences, and check whether they are true. That capability is the whole point. Build something with it that you are proud of.