Chapter 31 — Key Takeaways
Finding Critical Points (Section 31.2)
Set $\nabla f = \mathbf{0}$. For $f(x, y)$ this means $f_x = 0$ and $f_y = 0$, solved simultaneously — they are a system, not two independent equations. Geometrically, the tangent plane (built in Chapter 30) is horizontal there. Extrema can also occur where $\nabla f$ fails to exist (corners, cusps); handle those points directly, as in Chapter 10.
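As a worked illustration, take $f(x, y) = x^3 + y^3 - 3xy$ (a function chosen here for the example, not one fixed by the chapter). The two equations are solved together, each feeding the other:

$$
f_x = 3x^2 - 3y = 0, \quad f_y = 3y^2 - 3x = 0 \;\Longrightarrow\; y = x^2,\; x = y^2 \;\Longrightarrow\; x^4 = x \;\Longrightarrow\; x \in \{0, 1\}.
$$

Back-substituting into $y = x^2$ gives exactly two real critical points, $(0, 0)$ and $(1, 1)$. Pairs such as $(0, 1)$ satisfy neither equation, which is why the solutions cannot be mixed and matched.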
Three Types of Critical Points (Section 31.4)
| Type | Visual | Test |
|---|---|---|
| Local minimum | Bowl | $D > 0$ and $f_{xx} > 0$ |
| Local maximum | Inverted bowl | $D > 0$ and $f_{xx} < 0$ |
| Saddle | Saddle / Pringles chip | $D < 0$ |
| Inconclusive | – | $D = 0$ |
where $D = f_{xx}f_{yy} - f_{xy}^2 = \det H_f$. The saddle point is the genuinely new phenomenon: $f$ rises along some directions and falls along others, with no single-variable analog. On a contour map, a saddle is the point where two level curves cross in an X.
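The table can be sketched as a small function. The example below is illustrative only: it assumes the function $f(x, y) = x^3 + y^3 - 3xy$, whose critical points $(0,0)$ and $(1,1)$ come from solving $3x^2 = 3y$ and $3y^2 = 3x$ together. The names `classify_critical_point` and `second_partials` and the tolerance `tol` are hypothetical choices, not notation from the chapter.

```python
def classify_critical_point(fxx, fyy, fxy, tol=1e-12):
    """Apply the second-derivative test to the second partials at a critical point."""
    D = fxx * fyy - fxy ** 2  # discriminant, the determinant of the Hessian
    if D > tol:
        # D > 0 forces fxx != 0, so its sign decides between bowl and inverted bowl
        return "local minimum" if fxx > 0 else "local maximum"
    if D < -tol:
        return "saddle"
    return "inconclusive"  # D is (numerically) zero: the test is silent


def second_partials(x, y):
    """Second partials (f_xx, f_yy, f_xy) of the assumed f(x, y) = x^3 + y^3 - 3xy."""
    return 6.0 * x, 6.0 * y, -3.0


critical_points = [(0.0, 0.0), (1.0, 1.0)]
labels = {pt: classify_critical_point(*second_partials(*pt)) for pt in critical_points}
for pt, label in labels.items():
    print(pt, label)
```

At $(0,0)$ the discriminant is $0 \cdot 0 - 9 = -9$, and at $(1,1)$ it is $36 - 9 = 27$ with $f_{xx} = 6 > 0$, so the two points land in different rows of the table.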
The Hessian Matrix (Section 31.3)
$$H_f = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}.$$
By Clairaut's theorem (Chapter 29) it is symmetric. Its determinant $\det H_f = D$ is the discriminant. The test is just the completed square of the second-order Taylor quadratic form $Q(h,k) = f_{xx}h^2 + 2f_{xy}hk + f_{yy}k^2$: $D > 0$ forces both squared coefficients to share a sign (definite — max or min), $D < 0$ makes them disagree (saddle).
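Written out, and assuming $f_{xx} \neq 0$ so that the division is legal, the completed square is

$$
Q(h,k) = f_{xx}\left(h + \frac{f_{xy}}{f_{xx}}\,k\right)^2 + \frac{D}{f_{xx}}\,k^2 .
$$

The two squares carry the coefficients $f_{xx}$ and $D/f_{xx}$. When $D > 0$ these have the same sign as $f_{xx}$, so $Q$ is positive for every $(h,k) \neq (0,0)$ if $f_{xx} > 0$ and negative if $f_{xx} < 0$. When $D < 0$ the coefficients have opposite signs, so $Q$ takes both signs and the point is a saddle.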
Higher Dimensions (Section 31.6)
For $f(x_1, \ldots, x_n)$, the single number $D$ does not generalize, but the eigenvalue criterion does. Classify by the eigenvalues of the $n \times n$ Hessian:
- all positive (positive-definite) → local minimum;
- all negative (negative-definite) → local maximum;
- at least one positive and at least one negative (indefinite) → saddle;
- some zero and the rest of one sign (semidefinite) → degenerate, inconclusive.
In high dimensions, a typical critical point of a non-convex function is a saddle rather than a minimum, because a minimum needs all $n$ eigenvalues positive at once. This is a central fact behind neural-network training.
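A minimal sketch of the eigenvalue criterion follows. It assumes the Hessian's eigenvalues have already been computed by some other means; the function name `classify_by_eigenvalues` and the tolerance `tol` are hypothetical. Mixed signs are checked first, since one positive and one negative eigenvalue already rule out a local extremum.

```python
def classify_by_eigenvalues(eigenvalues, tol=1e-12):
    """Classify a critical point from the eigenvalues of its Hessian."""
    n = len(eigenvalues)
    if n == 0:
        raise ValueError("need at least one eigenvalue")
    positive = sum(1 for lam in eigenvalues if lam > tol)
    negative = sum(1 for lam in eigenvalues if lam < -tol)
    if positive and negative:
        return "saddle"          # indefinite: rises one way, falls another
    if positive == n:
        return "local minimum"   # positive-definite
    if negative == n:
        return "local maximum"   # negative-definite
    return "inconclusive"        # semidefinite with a zero eigenvalue


# Diagonal Hessians, so the eigenvalues can be read off directly:
# f = x^2 + y^2 - z^2 has Hessian diag(2, 2, -2)
print(classify_by_eigenvalues([2.0, 2.0, -2.0]))
# f = x^2 + y^2 + z^2 has Hessian diag(2, 2, 2)
print(classify_by_eigenvalues([2.0, 2.0, 2.0]))
# f = x^2 + y^4 + z^2 has Hessian diag(2, 0, 2) at the origin
print(classify_by_eigenvalues([2.0, 0.0, 2.0]))
```

The last case shows why a zero eigenvalue is inconclusive: the function $x^2 + y^4 + z^2$ has a minimum at the origin, while $x^2 - y^4 + z^2$ has the same Hessian there and no minimum.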
Absolute Extrema on Closed Regions (Section 31.5)
The Extreme Value Theorem (the 2-D cousin of the Chapter 9 result) guarantees that a continuous $f$ on a closed, bounded region attains both an absolute maximum and an absolute minimum. Find them in three steps:
- Find interior critical points; list $f$ there.
- Find extrema on the boundary (parametrize + single-variable calculus, or Lagrange).
- Compare all values; largest is the absolute max, smallest the absolute min.
The boundary is the most-forgotten step — the true extreme often lives on the edge while the interior critical point is a saddle decoy.
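The three steps can be sketched numerically. The example assumes $f(x, y) = x^2 - y^2$ on the closed unit disk, a region chosen here for illustration. The boundary is handled by sampling the parametrization $(\cos t, \sin t)$ on a grid, which only approximates the exact single-variable work (on the circle, $f = \cos 2t$).

```python
import math


def f(x, y):
    """Assumed example function: f(x, y) = x^2 - y^2."""
    return x * x - y * y


# Step 1: interior critical points. grad f = (2x, -2y) vanishes only at the origin.
candidates = [((0.0, 0.0), f(0.0, 0.0))]

# Step 2: boundary of the unit disk, parametrized as (cos t, sin t)
# and sampled on a uniform grid of N angles.
N = 3600
for k in range(N):
    t = 2.0 * math.pi * k / N
    x, y = math.cos(t), math.sin(t)
    candidates.append(((x, y), f(x, y)))

# Step 3: compare every candidate value.
abs_max = max(candidates, key=lambda c: c[1])
abs_min = min(candidates, key=lambda c: c[1])
print("absolute max:", abs_max)
print("absolute min:", abs_min)
```

Here the only interior critical point is a saddle with $f = 0$, and both absolute extremes ($f = 1$ near $(\pm 1, 0)$ and $f = -1$ near $(0, \pm 1)$) sit on the boundary.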
Lagrange Multipliers — One Constraint (Section 31.8)
To optimize $f(x, y)$ subject to $g(x, y) = 0$:
$$\nabla f = \lambda \nabla g, \qquad g = 0.$$
The geometry: at a constrained extremum, a level curve of $f$ is tangent to the constraint, so the gradients (each perpendicular to its curve, per Chapter 30) are parallel. The condition is necessary, not sufficient — evaluate $f$ at every candidate and compare. Assumes $\nabla g \ne \mathbf 0$.
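A short worked instance, using the illustrative choice $f(x, y) = xy$ with constraint $g(x, y) = x + y - 10 = 0$ (not an example taken from the chapter):

$$
\nabla f = (y, x), \quad \nabla g = (1, 1) \;\Longrightarrow\; y = \lambda,\; x = \lambda,\; x + y = 10 \;\Longrightarrow\; x = y = \lambda = 5, \quad f = 25 .
$$

On the constraint, $f = x(10 - x)$ is a downward parabola, so this single candidate is the constrained maximum. There is no constrained minimum, since $f \to -\infty$ along the line. The Lagrange system alone does not say which of these is the case.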
Lagrange Multipliers — Multiple Constraints (Section 31.9)
For constraints $g_1 = 0, \ldots, g_k = 0$:
$$\nabla f = \lambda_1 \nabla g_1 + \cdots + \lambda_k \nabla g_k,$$
one multiplier per constraint. In 3-D with two constraints: five equations in $x, y, z, \lambda_1, \lambda_2$.
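For $f(x, y, z)$ with constraints $g_1 = 0$ and $g_2 = 0$, the vector equation expands component by component into the five scalar equations:

$$
\begin{aligned}
f_x &= \lambda_1 (g_1)_x + \lambda_2 (g_2)_x, \\
f_y &= \lambda_1 (g_1)_y + \lambda_2 (g_2)_y, \\
f_z &= \lambda_1 (g_1)_z + \lambda_2 (g_2)_z, \\
g_1(x, y, z) &= 0, \\
g_2(x, y, z) &= 0 .
\end{aligned}
$$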
Economic Interpretation of $\lambda$ (Section 31.10)
$\lambda = \partial(\text{optimal value})/\partial(\text{constraint level})$ — the shadow price, the marginal value of relaxing the constraint by one unit. In consumer theory, $\lambda$ is the marginal utility of income; the optimum satisfies the equal-marginal-utility-per-dollar condition $U_x/p_x = U_y/p_y = \lambda$. Cobb–Douglas $U = x^a y^b$ splits the budget in proportion to the exponents.
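Both claims can be checked numerically on a small example. The sketch below assumes the standard closed-form Cobb–Douglas demands $x^* = \frac{a}{a+b}\frac{M}{p_x}$ and $y^* = \frac{b}{a+b}\frac{M}{p_y}$ for budget $p_x x + p_y y = M$. The numbers and the function names are hypothetical. The shadow price is estimated by a central finite difference in income.

```python
def cobb_douglas_optimum(a, b, px, py, income):
    """Closed-form optimum of U = x^a * y^b subject to px*x + py*y = income."""
    x = a / (a + b) * income / px
    y = b / (a + b) * income / py
    return x, y


def utility(x, y, a, b):
    return x ** a * y ** b


# Hypothetical numbers, chosen so the arithmetic is easy to follow by hand
a, b, px, py, income = 1.0, 2.0, 2.0, 5.0, 90.0
x, y = cobb_douglas_optimum(a, b, px, py, income)

# Marginal utility per dollar on each good; at the optimum both equal lambda
mu_x_per_dollar = a * x ** (a - 1) * y ** b / px
mu_y_per_dollar = b * x ** a * y ** (b - 1) / py
lam = mu_x_per_dollar

# Shadow price: derivative of the optimal utility with respect to income,
# estimated by a central finite difference
h = 1e-4
u_plus = utility(*cobb_douglas_optimum(a, b, px, py, income + h), a, b)
u_minus = utility(*cobb_douglas_optimum(a, b, px, py, income - h), a, b)
shadow_price = (u_plus - u_minus) / (2 * h)

print("bundle:", x, y)
print("spending on x, y:", px * x, py * y)
print("lambda:", lam, "finite-difference shadow price:", shadow_price)
```

With these numbers the budget of 90 splits as 30 on $x$ and 60 on $y$, matching the exponent ratio $1 : 2$, and both marginal utilities per dollar agree with the finite-difference estimate up to rounding.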
Statistics and Machine Learning (Section 31.11)
- Least squares (unconstrained): minimize $S(m,c) = \sum (y_i - mx_i - c)^2$ via $\nabla S = \mathbf 0$, giving the linear normal equations. Because $S$ is a sum of squares of functions linear in $(m, c)$, its Hessian is constant and positive-semidefinite, and it is positive-definite as long as the $x_i$ are not all equal → the critical point is then the unique global minimum (no saddles).
- Maximum likelihood (often constrained): least squares = MLE under Gaussian noise. Constrained MLE (probabilities summing to 1) is solved by Lagrange multipliers; e.g., $\hat p_i = n_i/n$.
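For the least-squares item above, setting $\partial S/\partial m = 0$ and $\partial S/\partial c = 0$ gives the normal equations $m\sum x_i^2 + c\sum x_i = \sum x_i y_i$ and $m\sum x_i + cn = \sum y_i$. A minimal sketch of solving them directly is below; the function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Solve the 2x2 normal equations for the least-squares line y = m*x + c."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # det is one quarter of the determinant of the Hessian of S;
    # it is zero exactly when all x_i are equal
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    m = (n * sxy - sx * sy) / det
    c = (sy - m * sx) / n
    return m, c


# Made-up data lying exactly on y = 2x + 1
m_exact, c_exact = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(m_exact, c_exact)

# Made-up data that no line fits exactly
m_fit, c_fit = fit_line([0.0, 1.0, 2.0], [0.0, 1.0, 1.0])
print(m_fit, c_fit)
```

The guard on `det` corresponds to the degenerate case in which the Hessian of $S$ is only semidefinite and the minimizer is not unique.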
Convexity (Section 31.12)
A twice-differentiable $f$ is convex if its Hessian is positive-semidefinite everywhere (all eigenvalues $\ge 0$). A convex function has no saddle points and no spurious local minima: every critical point is a global minimum, and gradient descent (Chapter 30) with a suitably small step size approaches it from any start. Linear/ridge/logistic regression, SVMs, and the Markowitz portfolio are convex; neural-network training is not.
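A small experiment illustrates the "from any start" claim. It assumes the convex quadratic $f(x, y) = x^2 + xy + y^2$, whose Hessian has eigenvalues 1 and 3, and a fixed step size of 0.1. Both are choices made for this sketch, as are the starting points.

```python
def grad(x, y):
    """Gradient of the assumed convex function f(x, y) = x^2 + x*y + y^2."""
    return 2.0 * x + y, x + 2.0 * y


def gradient_descent(x, y, step=0.1, iters=500):
    """Plain fixed-step gradient descent; returns the final iterate."""
    for _ in range(iters):
        gx, gy = grad(x, y)
        x, y = x - step * gx, y - step * gy
    return x, y


# Hypothetical starting points scattered around the plane
starts = [(5.0, -3.0), (-10.0, 10.0), (0.5, 8.0)]
finals = [gradient_descent(x0, y0) for x0, y0 in starts]
for start, final in zip(starts, finals):
    print(start, "->", final)
```

Every run ends at the unique critical point $(0, 0)$ up to rounding. The step size matters: for this quadratic the iteration shrinks the error only when the step is below $2/3$, which is 2 divided by the largest Hessian eigenvalue.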
Newton's Method, Multivariable (Section 31.7, optional)
$$\mathbf{x}_{k+1} = \mathbf{x}_k - H_f(\mathbf x_k)^{-1} \nabla f(\mathbf{x}_k).$$
The inverse Hessian replaces $1/f''$ from Chapter 11. Quadratic convergence near a nondegenerate critical point, but inverting an $n\times n$ matrix costs $\sim n^3$ — hence quasi-Newton (BFGS, L-BFGS) in practice.
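In two variables the update can be sketched without any matrix library by solving the $2 \times 2$ system $H\,\mathbf{d} = \nabla f$ with Cramer's rule. The example assumes $f(x, y) = x^3 + y^3 - 3xy$ and the starting point $(1.3, 0.9)$, both chosen for illustration; the function names are hypothetical.

```python
def newton_2d(grad, hess, x, y, tol=1e-12, max_iter=50):
    """Newton's method for a critical point of a two-variable function."""
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        if (gx * gx + gy * gy) ** 0.5 < tol:
            break  # gradient is numerically zero: stop
        a, b, d = hess(x, y)  # symmetric Hessian [[a, b], [b, d]]
        det = a * d - b * b
        if det == 0.0:
            raise ZeroDivisionError("singular Hessian; Newton step undefined")
        # Solve H * (dx, dy) = (gx, gy) by Cramer's rule
        dx = (d * gx - b * gy) / det
        dy = (a * gy - b * gx) / det
        x, y = x - dx, y - dy
    return x, y


def grad_f(x, y):
    """Gradient of the assumed f(x, y) = x^3 + y^3 - 3xy."""
    return 3.0 * x * x - 3.0 * y, 3.0 * y * y - 3.0 * x


def hess_f(x, y):
    """Entries (f_xx, f_xy, f_yy) of the Hessian of the same f."""
    return 6.0 * x, -3.0, 6.0 * y


x_star, y_star = newton_2d(grad_f, hess_f, 1.3, 0.9)
print(x_star, y_star)
```

From this start the iterates move to roughly $(1.047, 1.032)$ after one step and $(1.002, 1.001)$ after two, consistent with fast convergence to the critical point $(1, 1)$. Newton's method seeks any critical point, so the limit still has to be classified separately.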
Common Pitfalls
- Stopping at $\nabla f = \mathbf 0$: a critical point alone does not reveal its type — always apply the second-derivative test.
- Reading $D = 0$ as "no extremum": it only means the test is silent; check $f$ along well-chosen paths, including curves, not just lines.
- Pairing $f_x = 0$ and $f_y = 0$ solutions independently: solve the system; back-substitute and confirm both vanish.
- Forgetting the boundary in closed-region problems.
- Treating the Lagrange point as automatically optimal: it is a candidate; evaluate and compare. Keep the sign of $\lambda$: it can be negative, and the sign matters for the shadow-price interpretation. Use one $\lambda_i$ per constraint.
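The $D = 0$ pitfall, and the advice to test curves as well as lines, can be illustrated with the classic example $f(x, y) = (y - x^2)(y - 3x^2)$, brought in here for the sketch. Its gradient vanishes at the origin and $D = 0$ there. The code samples $f$ at a small distance from the origin, which is suggestive rather than a proof.

```python
def f(x, y):
    """Assumed example: f(x, y) = (y - x^2)(y - 3x^2), with D = 0 at the origin."""
    return (y - x * x) * (y - 3.0 * x * x)


t_values = [1e-3, -1e-3]  # small steps on either side of the origin

# Straight lines through the origin: y = m*x for several slopes, plus the y-axis
slopes = (0.0, 0.5, 1.0, -1.0, 2.0)
line_values = [f(t, m * t) for m in slopes for t in t_values]
line_values += [f(0.0, t) for t in t_values]

# A curved path: the parabola y = 2x^2, which runs between y = x^2 and y = 3x^2
curve_values = [f(t, 2.0 * t * t) for t in t_values]

print("min along lines:", min(line_values))
print("max along parabola:", max(curve_values))
```

Along each sampled line $f$ is positive near the origin, which looks like a minimum, yet along $y = 2x^2$ the value is $-x^4 < 0$. The origin is therefore not a local minimum, and only the curved path reveals it.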
Connections
- Chapter 9: 1-D Extreme Value Theorem → its 2-D form here.
- Chapter 10: single-variable optimization and the closed-interval method → critical-point + boundary method here.
- Chapter 11: Newton's method (1-D) → multivariable Newton (Section 31.7).
- Chapter 29: partial derivatives and Clairaut's theorem → the symmetric Hessian.
- Chapter 30: the gradient and gradient descent → $\nabla f = \mathbf 0$ and the tangency argument behind Lagrange.
- Chapter 32: double and triple integrals — accumulation over the regions we optimized over.
Skills You Should Have
- Find critical points by solving the system $\nabla f = \mathbf 0$.
- Build the Hessian and compute $D$; apply the second-derivative test; handle $D = 0$ by path-testing.
- Classify in $n$ dimensions via Hessian eigenvalues.
- Find absolute extrema on closed regions (interior + boundary + compare).
- Apply Lagrange multipliers with one and with two constraints, and interpret $\lambda$.
- Recognize least squares / MLE as multivariable optimization and use convexity to certify a global optimum.
What's Next
Chapter 32 introduces double and triple integrals — accumulation over two- and three-dimensional regions, the natural generalization of the definite integral of Chapter 13, with applications to volume, mass, center of mass, and probability. Chapter 33 then develops the change of variables and the Jacobian that make those integrals computable.
Reflection
Single-variable calculus gave you peaks and valleys. Several variables gave you the saddle point, a place that is level yet neither highest nor lowest, and the constrained optimum, where a level curve kisses a constraint and two gradients fall into line. The Hessian reads the local geometry off the second derivatives; Lagrange multipliers optimize while tethered. From a household budget to a regression line to a trained network, the same equations are quietly doing the choosing under constraints, which is the only kind of optimization the real world ever asks for.