Prerequisites
- chapter-27-spectral-theorem
Learning Objectives
- Read a quadratic form $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ for symmetric $A$ as a surface over the plane, and recognize a positive definite form as an upward bowl with a unique minimum at the origin.
- Define positive definite, positive semidefinite, indefinite, and negative definite, and classify a symmetric matrix into one of these categories.
- State and apply the three definiteness tests — all eigenvalues positive, all pivots positive, all leading principal minors positive (Sylvester's criterion) — and explain via the spectral theorem why they agree.
- Describe the level sets of a positive definite form as ellipses whose axes point along the eigenvectors with half-lengths set by the eigenvalues.
- Connect positive definiteness to optimization (the Hessian and the second-derivative test, convexity) and to statistics (covariance matrices are positive semidefinite).
- State the Cholesky factorization $A = LL^{\mathsf{T}}$ and explain why it exists exactly for positive definite matrices.
In This Chapter
- 28.1 What does a quadratic form look like as a surface?
- 28.2 What does "positive definite" actually mean?
- 28.3 Why do the eigenvalues decide the shape? (the spectral picture)
- 28.4 What are the three definiteness tests, and why do they agree?
- 28.5 What do the level sets look like? Ellipses and their axes
- 28.6 How does positive definiteness power optimization?
- 28.7 Why are covariance matrices always positive semidefinite?
- 28.8 How do you check positive definiteness in code? Cholesky and the toolkit
- 28.9 What have we built, and where does it lead?
Positive Definite Matrices and Quadratic Forms: Energy, Curvature, and the Geometry of Optimization
Learning paths. Math majors: read everything, especially the spectral proof that the three tests agree, the congruence argument behind the pivot test, and the Math-Major Sidebars on Sylvester's law and the Cholesky existence proof. CS / Data Science: focus on the Geometric Intuition (the bowl and its elliptical contours), the NumPy code, the optimization link, and the covariance/Mahalanobis case study; the deepest sidebars are optional. Physics / Engineering: focus on the energy interpretation, the curvature picture, the ellipse-of-stiffness, and the second-derivative test that decides whether an equilibrium is stable.
Roll a marble into a salad bowl and let go. It rattles around for a moment, then settles at the lowest point and stays there. Tip the bowl or stretch it into an oval, and the marble still finds the bottom, because the surface curves upward in every direction. Now turn the bowl over. The marble will not balance on the dome; the slightest nudge sends it rolling off, because the surface curves downward in every direction. And a Pringles chip, a saddle, does something stranger still: it curves up along one axis and down along another, so a marble placed at the center is stable if you push it one way and unstable if you push it the other. These three shapes (the bowl, the dome, and the saddle) are the entire subject of this chapter, and the astonishing fact we will prove is that which shape you get is decided by the eigenvalues of a single symmetric matrix.
This is the chapter where the spectral theorem of Chapter 27 stops being beautiful and starts being useful. We spent that chapter learning that a symmetric matrix is orthogonally diagonalizable: it has a full set of perpendicular eigenvectors, and in coordinates aligned with them it becomes a pure diagonal stretch. Here we ask the natural follow-up question: if those stretch factors, the eigenvalues, are all positive, what does that mean? The answer is positive definiteness, and it is one of the most important ideas in applied mathematics. A positive definite matrix is the algebraic encoding of an upward bowl, and bowls are exactly the surfaces that have a single, findable minimum. That is why positive definiteness is the silent partner in nearly every optimization algorithm, every least-squares fit, every covariance matrix in statistics, and every stable equilibrium in physics. When a machine-learning model trains, it is rolling a marble down a high-dimensional landscape; near a minimum, whether that landscape is genuinely bowl-shaped (positive definite) governs whether the optimizer settles into the minimum and how quickly it gets there.
The vehicle for the whole story is the quadratic form $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$, the simplest nonlinear function a matrix can produce. We have spent twenty-seven chapters studying the linear map $\mathbf{x}\mapsto A\mathbf{x}$; now we let the matrix produce a single number, a scalar that rises and falls as $\mathbf{x}$ moves around the plane, and we graph that number as a surface. The shape of that surface — bowl, dome, or saddle — is the geometry of the matrix. By the end of the chapter you will be able to look at a symmetric matrix and see its surface, classify it with three different tests that all secretly agree, read its contour lines as a family of ellipses, and connect it to the curvature of any smooth function near a critical point. Let us draw the picture first, exactly as the book always does.
28.1 What does a quadratic form look like as a surface?
Before any algebra, fix the picture in your mind. Take the simplest possible example, $f(x, y) = x^2 + y^2$. For every point $(x, y)$ in the plane, this hands you back a single number — its squared distance from the origin — and if you plot that number as a height above the plane, you get a perfect circular bowl, a paraboloid, with its lowest point at the origin where $f = 0$. Walk away from the origin in any direction and the height climbs. There is exactly one bottom, and the surface cups upward all around it. That bowl is the mental image to attach to the phrase "positive definite," and we will keep returning to it.
Now ask: what is the matrix behind this bowl? Watch the algebra. We can write $x^2 + y^2$ as a matrix sandwich: $$f(x,y) = x^2 + y^2 = \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \mathbf{x}^{\mathsf{T}}I\,\mathbf{x}.$$ The bowl is the identity matrix in disguise. If we replace $I$ with a different symmetric matrix, we get a different surface. Replace it with $\begin{psmallmatrix}3 & 0\\ 0 & 1\end{psmallmatrix}$ and the form becomes $3x^2 + y^2$ — still a bowl, but one that climbs three times faster in the $x$-direction than the $y$-direction, an elongated bowl whose cross-sections are ellipses, not circles. Replace it with $\begin{psmallmatrix}1 & 0\\ 0 & -1\end{psmallmatrix}$ and the form becomes $x^2 - y^2$ — and now we have a saddle, climbing along $x$ but plunging along $y$. The diagonal entries are doing exactly what your geometric intuition expects: each one sets the curvature along its own axis, and their signs decide bowl versus dome versus saddle.
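To see the three matrix sandwiches reproduce their polynomials numerically, here is a minimal sketch assuming NumPy. The helper `quad_form` is a name introduced here for illustration; it simply evaluates $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ with the `@` operator.

```python
import numpy as np

def quad_form(A, x):
    """Evaluate the quadratic form x^T A x (hypothetical helper for this sketch)."""
    return x @ A @ x

I = np.eye(2)                   # x^2 + y^2: circular bowl
stretch = np.diag([3.0, 1.0])   # 3x^2 + y^2: elongated bowl
saddle = np.diag([1.0, -1.0])   # x^2 - y^2: saddle

rng = np.random.default_rng(0)
for x, y in rng.standard_normal((5, 2)):
    v = np.array([x, y])
    # each matrix sandwich reproduces its polynomial at random points
    assert np.isclose(quad_form(I, v), x**2 + y**2)
    assert np.isclose(quad_form(stretch, v), 3 * x**2 + y**2)
    assert np.isclose(quad_form(saddle, v), x**2 - y**2)

# the saddle goes up along the x-axis and down along the y-axis: prints 1.0 -1.0
print(quad_form(saddle, np.array([1.0, 0.0])), quad_form(saddle, np.array([0.0, 1.0])))
```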
Geometric Intuition — A quadratic form $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ is a landscape: it sits at sea level (height zero) at the origin, and as you walk away from the origin the ground rises or falls. Positive definite is the landscape that rises in every direction: a bowl, a valley with one lowest point. Negative definite falls in every direction: a hilltop. Indefinite rises some ways and falls others: a mountain pass, a saddle. The whole chapter is about reading this landscape off the matrix.
The reason a quadratic form is the right nonlinear object to study is that it is the simplest one that still has curvature. A linear function $\mathbf{c}^{\mathsf{T}}\mathbf{x}$ graphs as a flat tilted plane: no minimum, no maximum, no curve. To get a surface that cups, you need degree-two terms, and the most general function built only from degree-two terms (squares $x_i^2$ and products $x_i x_j$) is precisely a quadratic form. So when we study these forms we are studying curvature in its purest shape, and that is exactly what we will need for the second-derivative test, where the curvature of any smooth function near a critical point is captured by a quadratic form built from its second derivatives.
28.1.1 The general quadratic form and why the matrix is symmetric
Let us write the general quadratic form in two variables and read off its matrix. A quadratic form is any function that is a sum of degree-two terms: $$q(x, y) = a\,x^2 + b\,xy + c\,y^2.$$ We can package this as $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$, but there is a subtlety in how the cross term $b\,xy$ gets distributed. Expanding $\begin{psmallmatrix} x & y\end{psmallmatrix}\begin{psmallmatrix} a_{11} & a_{12}\\ a_{21} & a_{22}\end{psmallmatrix}\begin{psmallmatrix} x\\ y\end{psmallmatrix}$ gives $$\mathbf{x}^{\mathsf{T}}A\mathbf{x} = a_{11}x^2 + (a_{12} + a_{21})\,xy + a_{22}y^2.$$ The $x^2$ coefficient is $a_{11}$ and the $y^2$ coefficient is $a_{22}$, but the $xy$ coefficient is the sum $a_{12} + a_{21}$ — only the sum of the off-diagonal entries matters, not how the cross term is split between them. To make the matrix unique, we always split it evenly: $a_{12} = a_{21} = b/2$. That choice makes $A$ symmetric, and it is the convention we lock in for the rest of the chapter: $$q(x,y) = a x^2 + b xy + c y^2 \quad\Longleftrightarrow\quad A = \begin{bmatrix} a & b/2 \\ b/2 & c \end{bmatrix}, \qquad A = A^{\mathsf{T}}.$$
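The convention of splitting the cross term evenly can be checked in a few lines. This is a minimal sketch assuming NumPy; `form_matrix` and `q` are hypothetical helper names. It builds the symmetric matrix for $4x^2 + 6xy + y^2$ and confirms that the sandwich matches the polynomial at random points.

```python
import numpy as np

def form_matrix(a, b, c):
    """Symmetric matrix of q(x, y) = a x^2 + b xy + c y^2, cross term split evenly."""
    return np.array([[a, b / 2],
                     [b / 2, c]])

def q(a, b, c, x, y):
    """The polynomial itself, for comparison."""
    return a * x**2 + b * x * y + c * y**2

A = form_matrix(4.0, 6.0, 1.0)   # 4x^2 + 6xy + y^2
print(A)                         # off-diagonal entries are 3, not 6

rng = np.random.default_rng(1)
for x, y in rng.standard_normal((5, 2)):
    v = np.array([x, y])
    assert np.isclose(v @ A @ v, q(4.0, 6.0, 1.0, x, y))
```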
Why insist on symmetry? Partly for uniqueness — without it, infinitely many matrices give the same form, since you could shuffle weight between $a_{12}$ and $a_{21}$ without changing the function. But the deeper reason is that only the symmetric part of a matrix affects its quadratic form at all. For any square matrix $M$, the value $\mathbf{x}^{\mathsf{T}}M\mathbf{x}$ depends only on the symmetric matrix $\tfrac12(M + M^{\mathsf{T}})$, because the skew-symmetric part contributes nothing: if $S^{\mathsf{T}} = -S$ then $\mathbf{x}^{\mathsf{T}}S\mathbf{x} = 0$ for every $\mathbf{x}$ (the scalar equals its own negative, since $\mathbf{x}^{\mathsf{T}}S\mathbf{x} = (\mathbf{x}^{\mathsf{T}}S\mathbf{x})^{\mathsf{T}} = \mathbf{x}^{\mathsf{T}}S^{\mathsf{T}}\mathbf{x} = -\mathbf{x}^{\mathsf{T}}S\mathbf{x}$, forcing it to be zero). So we lose nothing by taking $A$ symmetric, and we gain everything: a symmetric $A$ is exactly the kind of matrix the spectral theorem of Chapter 27 governs. This is the hinge of the entire chapter — quadratic forms live on symmetric matrices, and symmetric matrices are precisely the ones with orthogonal eigenvectors and real eigenvalues. Every result that follows is the spectral theorem cashing out as geometry.
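The claim that the skew-symmetric part is invisible to the form is easy to test numerically. A minimal sketch, assuming NumPy and an arbitrary random $4\times 4$ matrix `M`: split `M` into its symmetric and skew-symmetric parts and evaluate the form on random vectors.

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))   # an arbitrary, non-symmetric matrix
sym = (M + M.T) / 2               # symmetric part
skew = (M - M.T) / 2              # skew-symmetric part: skew.T == -skew

for _ in range(5):
    x = rng.standard_normal(4)
    # the skew part contributes nothing to the form (up to rounding) ...
    assert abs(x @ skew @ x) < 1e-12
    # ... so the form only sees the symmetric part
    assert np.isclose(x @ M @ x, x @ sym @ x)
```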
Common Pitfall — When you read a quadratic form, remember to halve the cross-term coefficient when filling in the off-diagonal entries. The form $4x^2 + 6xy + y^2$ has matrix $\begin{psmallmatrix}4 & 3\\ 3 & 1\end{psmallmatrix}$, not $\begin{psmallmatrix}4 & 6\\ 6 & 1\end{psmallmatrix}$. Forgetting the factor of $\tfrac12$ is a very common error in this topic: it doubles your off-diagonal entries and corrupts every eigenvalue and minor you compute afterward. The diagonal entries, the coefficients of $x^2$ and $y^2$, are not halved; only the mixed term is split.
The Key Insight — A quadratic form $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ is fully described by a symmetric matrix $A$, and the shape of its surface — bowl, dome, or saddle — is governed entirely by the eigenvalues of $A$. Because $A$ is symmetric, the spectral theorem applies, and that is the source of everything we prove in this chapter.
28.2 What does "positive definite" actually mean?
Now the central definition, the one the whole chapter orbits. We want to capture, in algebra, the geometric idea of a bowl: a surface whose height is positive everywhere except at the single lowest point. Here it is, with conditions stated carefully.
Definition (definiteness). Let $A$ be a real symmetric $n\times n$ matrix. The form $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ and the matrix $A$ are called:
- positive definite if $\mathbf{x}^{\mathsf{T}}A\mathbf{x} > 0$ for every nonzero $\mathbf{x}\in\mathbb{R}^n$;
- positive semidefinite if $\mathbf{x}^{\mathsf{T}}A\mathbf{x} \ge 0$ for every $\mathbf{x}$ (equality allowed for some nonzero $\mathbf{x}$);
- negative definite if $\mathbf{x}^{\mathsf{T}}A\mathbf{x} < 0$ for every nonzero $\mathbf{x}$;
- negative semidefinite if $\mathbf{x}^{\mathsf{T}}A\mathbf{x} \le 0$ for every $\mathbf{x}$;
- indefinite if $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ takes both positive and negative values.
Read these as statements about the surface. Positive definite means the height is strictly above zero everywhere you go (except at the origin itself, where every quadratic form vanishes because $\mathbf{0}^{\mathsf{T}}A\mathbf{0} = 0$): a true bowl. Positive semidefinite relaxes this to at least zero, which allows a flat-bottomed trough: a valley that, instead of a single lowest point, has a whole flat line or plane of lowest points. Negative definite flips the bowl into a dome. Indefinite is the saddle, climbing in some directions and falling in others. Every symmetric matrix fits one of these descriptions. The semidefinite labels also include the definite cases (a positive definite matrix is in particular positive semidefinite, and the zero matrix is both positive and negative semidefinite), so describe a matrix by the strongest label that applies; that label is its geometric personality.
The phrase "for every nonzero $\mathbf{x}$" is doing enormous work, and it is what makes definiteness hard to check directly. You cannot test infinitely many vectors by hand. The simplest example, $A = I$, is plainly positive definite because $\mathbf{x}^{\mathsf{T}}I\mathbf{x} = \lVert\mathbf{x}\rVert^2 > 0$ for any nonzero $\mathbf{x}$ — the squared length is positive unless the vector is zero. But for a general matrix, with cross terms tangling the variables together, it is not at all obvious whether the form ever dips negative. The whole point of the tests in §28.4 is to replace this impossible "check every vector" with a finite computation: count some eigenvalues, or some pivots, or some determinants. That those finite tests correctly capture the infinite condition is the spectral theorem's gift.
Geometric Intuition — Think of standing at the origin and walking outward along every possible ray. Positive definite means no matter which ray you pick, the ground rises. Indefinite means there exists at least one ray that goes up and at least one that goes down. Semidefinite means the ground never goes down, but along at least one ray it stays perfectly level — you can walk along the floor of the valley without climbing. The definiteness of a matrix is the answer to: "from the bottom, which way is up — and is it always up?"
28.2.1 A first computation: detecting a saddle by hand
Let us make the saddle concrete, because it teaches the most. Take $A = \begin{psmallmatrix}1 & 0\\ 0 & -1\end{psmallmatrix}$, whose form is $q(x,y) = x^2 - y^2$. Is it definite? Test a few vectors. Along the $x$-axis, at $\mathbf{x} = (1, 0)$, we get $q = 1 > 0$. Along the $y$-axis, at $\mathbf{x} = (0, 1)$, we get $q = -1 < 0$. The form takes both signs, so $A$ is indefinite — a saddle, exactly as the $x^2 - y^2$ surface suggested. We did not need to check every vector; finding one positive and one negative value is enough to certify indefiniteness.
This is the easy direction: to prove a matrix is not definite, you just exhibit a vector that breaks the rule. To prove it is positive definite is the hard direction, because no finite list of successful test vectors can ever rule out a sneaky direction you did not try. Here is a slightly less obvious example to feel the difficulty. Take $A = \begin{psmallmatrix}2 & 3\\ 3 & 2\end{psmallmatrix}$, with form $q(x,y) = 2x^2 + 6xy + 2y^2$. Along the axes it looks positive: $q(1,0) = 2$ and $q(0,1) = 2$. But try the direction $\mathbf{x} = (1, -1)$: $q(1,-1) = 2 - 6 + 2 = -2 < 0$. So this matrix is indefinite too, even though both diagonal entries are positive and the axis tests passed. The cross term $6xy$ was strong enough to drag the form negative along the anti-diagonal. This example is a warning we will formalize shortly: positive diagonal entries do not make a matrix positive definite. You have to account for the off-diagonal coupling, and that is precisely what the tests will do.
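Hunting for a bad direction can be automated, at least in two dimensions. The sketch below (assuming NumPy) samples unit vectors around the circle and evaluates the form of $\begin{psmallmatrix}2 & 3\\ 3 & 2\end{psmallmatrix}$ on each. As the text stresses, a search like this can certify indefiniteness by finding a negative value, but no number of passing samples can ever certify positive definiteness.

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [3.0, 2.0]])   # q = 2x^2 + 6xy + 2y^2

# Sample unit vectors around the circle and record the form's value on each.
theta = np.linspace(0, 2 * np.pi, 360, endpoint=False)
dirs = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # shape (360, 2)
values = np.einsum("ij,jk,ik->i", dirs, A, dirs)           # x^T A x for each row

print("axis values:", values[0], values[90])   # along (1, 0) and (0, 1): both 2
worst = dirs[np.argmin(values)]
print("most negative direction:", worst, values.min())   # a multiple of (1, -1)
```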
Common Pitfall — Positive entries do not mean positive definite. The matrix $\begin{psmallmatrix}2 & 3\\ 3 & 2\end{psmallmatrix}$ has every entry positive, yet it is indefinite: its form goes negative along $(1,-1)$. Definiteness is about the eigenvalues, not the entries; a matrix bristling with positive numbers can still describe a saddle if the off-diagonal coupling overwhelms the diagonal. In the other direction, a positive definite matrix can have negative entries. Its diagonal entries must be positive (because $\mathbf{e}_i^{\mathsf{T}}A\mathbf{e}_i = a_{ii}$), but its off-diagonal entries may be negative as long as they are not too large in magnitude. Never judge definiteness by glancing at the entries; run one of the three tests in §28.4.
28.3 Why do the eigenvalues decide the shape? (the spectral picture)
Here is the heart of the matter, and it is where Chapter 27 does all the work. The spectral theorem says that a real symmetric matrix $A$ can be written as $$A = QDQ^{\mathsf{T}},$$ where $Q$ is orthogonal (its columns are the orthonormal eigenvectors of $A$) and $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ holds the real eigenvalues. This single factorization, applied to the quadratic form, dissolves all the cross terms and lays the geometry bare. Watch what happens when we substitute it in.
1. Why we care. We want to know when $\mathbf{x}^{\mathsf{T}}A\mathbf{x} > 0$ for all nonzero $\mathbf{x}$ — when the surface is a bowl. If we can show this depends only on the signs of the eigenvalues, then the impossible "check every vector" collapses into the trivial "look at $n$ numbers."
2. Key idea. Rotating to the eigenvector coordinate system turns the tangled form $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ into a clean sum of squares with the eigenvalues as coefficients. In those coordinates the bowl is obvious.
3. Derivation. Substitute $A = QDQ^{\mathsf{T}}$ into the form and define new coordinates $\mathbf{y} = Q^{\mathsf{T}}\mathbf{x}$ (the components of $\mathbf{x}$ in the eigenvector basis): $$\mathbf{x}^{\mathsf{T}}A\mathbf{x} = \mathbf{x}^{\mathsf{T}}QDQ^{\mathsf{T}}\mathbf{x} = (Q^{\mathsf{T}}\mathbf{x})^{\mathsf{T}}D\,(Q^{\mathsf{T}}\mathbf{x}) = \mathbf{y}^{\mathsf{T}}D\mathbf{y}.$$ Because $D$ is diagonal, the form $\mathbf{y}^{\mathsf{T}}D\mathbf{y}$ has no cross terms at all — it is a pure weighted sum of squares: $$\mathbf{x}^{\mathsf{T}}A\mathbf{x} = \lambda_1 y_1^2 + \lambda_2 y_2^2 + \cdots + \lambda_n y_n^2.$$ Now the sign is transparent. Since each $y_i^2 \ge 0$, the entire sum is positive for all nonzero $\mathbf{y}$ exactly when every coefficient $\lambda_i$ is positive. (And $\mathbf{y} = \mathbf{0}$ if and only if $\mathbf{x} = \mathbf{0}$, because $Q^{\mathsf{T}}$ is invertible — orthogonal change of variables sends nonzero to nonzero.)
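The derivation can be replayed numerically. A minimal sketch assuming NumPy, whose `np.linalg.eigh` returns the eigenvalues of a symmetric matrix in ascending order together with orthonormal eigenvectors as the columns of `Q`. It uses the indefinite example from §28.2.1 and checks that $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = \sum_i \lambda_i y_i^2$ with $\mathbf{y} = Q^{\mathsf{T}}\mathbf{x}$.

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [3.0, 2.0]])
lam, Q = np.linalg.eigh(A)   # eigenvalues ascending, eigenvectors in the columns of Q

rng = np.random.default_rng(3)
x = rng.standard_normal(2)
y = Q.T @ x                  # coordinates of x in the eigenvector basis

lhs = x @ A @ x
rhs = np.sum(lam * y**2)     # lambda_1 y_1^2 + lambda_2 y_2^2: no cross terms
print(lam, lhs, rhs)         # eigenvalues -1 and 5; lhs and rhs agree
```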
4. What this means. The quadratic form, viewed in the eigenvector coordinate system, is just a stack of independent parabolas — one per eigen-direction, each with curvature set by its eigenvalue. The cross terms in the original $A$ were never anything but the tilt of these parabolas relative to the standard axes; rotate to the right axes and they vanish. So the surface is a bowl precisely when every parabola opens upward, which is precisely when every eigenvalue is positive. We have reduced "is this an upward bowl?" to "are all the eigenvalues positive?" — a question about $n$ signs. $\blacksquare$
This argument is worth pausing on, because it is the engine of the whole chapter and it makes the eigenvalue test self-evident rather than mysterious. The eigenvalues are the curvatures of the surface along its principal axes; their signs are the directions of curving. All positive: a bowl, positive definite. All negative: a dome, negative definite. Mixed signs: a saddle, indefinite. A zero eigenvalue: a flat direction, semidefinite (the valley has a level floor along that eigenvector). Read off the signs of the eigenvalues and you have read off the shape.
Theorem (eigenvalue test for definiteness). A real symmetric matrix $A$ is positive definite $\iff$ all its eigenvalues are positive; positive semidefinite $\iff$ all eigenvalues are $\ge 0$; negative definite $\iff$ all eigenvalues are negative; indefinite $\iff$ it has both a positive and a negative eigenvalue.
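The theorem translates directly into a small classifier. This is a sketch, not a library routine: `classify` is a hypothetical name, and the relative cutoff `tol` is an assumption needed because a mathematically zero eigenvalue usually comes out of floating-point arithmetic as a tiny positive or negative number.

```python
import numpy as np

def classify(A, tol=1e-10):
    """Classify a real symmetric matrix by the signs of its eigenvalues.

    tol is an assumed relative cutoff below which an eigenvalue counts as zero.
    """
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("the eigenvalue test needs a symmetric matrix")
    lam = np.linalg.eigvalsh(A)
    cutoff = tol * max(1.0, np.abs(lam).max())
    pos = lam > cutoff
    neg = lam < -cutoff
    if pos.all():
        return "positive definite"
    if neg.all():
        return "negative definite"
    if pos.any() and neg.any():
        return "indefinite"
    # only zeros remain alongside one sign (or none at all)
    return "negative semidefinite" if neg.any() else "positive semidefinite"

for M in ([[2, -1], [-1, 2]], [[2, 3], [3, 2]], [[1, 1], [1, 1]], [[-2, 0], [0, -1]]):
    print(M, classify(M))
```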
Warning — This test, and every result in this chapter, requires $A$ to be symmetric. For a non-symmetric matrix the eigenvalues can be complex (Chapter 26), so "all eigenvalues positive" may not even make sense, and a non-symmetric matrix with positive real eigenvalues can still have a quadratic form that goes negative. Positive definiteness is a property of symmetric matrices. If someone hands you a non-symmetric $M$ and asks about $\mathbf{x}^{\mathsf{T}}M\mathbf{x}$, first replace $M$ by its symmetric part $\tfrac12(M + M^{\mathsf{T}})$ — that is the only part the form sees (§28.1.1) — and then apply the test.
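A concrete instance of the warning, sketched with NumPy: the upper-triangular matrix below has both eigenvalues equal to $1$, yet its form is negative at $(1,-1)$. Its symmetric part, the only part the form sees, turns out to be indefinite.

```python
import numpy as np

M = np.array([[1.0, 10.0],
              [0.0, 1.0]])      # upper triangular: both eigenvalues equal 1
print(np.linalg.eigvals(M))     # real and positive

x = np.array([1.0, -1.0])
print(x @ M @ x)                # 1 - 10 + 1 = -8: the form still goes negative

S = (M + M.T) / 2               # symmetric part [[1, 5], [5, 1]]
print(np.linalg.eigvalsh(S))    # -4 and 6: indefinite, which explains the sign
```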
Math-Major Sidebar — The eigenvalue characterization also pins down the extreme values of the form on the unit sphere. Restricting $\mathbf{x}$ to $\lVert\mathbf{x}\rVert = 1$ and writing $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = \sum \lambda_i y_i^2$ with $\sum y_i^2 = 1$ (since $Q^{\mathsf{T}}$ preserves length, Chapter 21), the form is a weighted average of the eigenvalues with weights $y_i^2$. Its maximum over the sphere is therefore $\lambda_{\max}$ (achieved at the top eigenvector) and its minimum is $\lambda_{\min}$ (at the bottom eigenvector). This is the Rayleigh quotient characterization, $\lambda_{\min} \le \frac{\mathbf{x}^{\mathsf{T}}A\mathbf{x}}{\mathbf{x}^{\mathsf{T}}\mathbf{x}} \le \lambda_{\max}$, and it is the variational gateway to the Courant–Fischer min-max theorem and to how PCA (Chapter 32) maximizes variance. Positive definiteness is then simply $\lambda_{\min} > 0$: the smallest possible value of the form on the unit sphere is still strictly positive.
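The Rayleigh bounds can be observed empirically. A minimal sketch assuming NumPy: build a random symmetric $5\times 5$ matrix, sample many vectors, and check that every quotient lies between the smallest and largest eigenvalues, with the bounds attained at the corresponding eigenvectors. The helper name `rayleigh` is introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                # a random symmetric matrix
lam, Q = np.linalg.eigh(A)       # ascending: lam[0] = min, lam[-1] = max

def rayleigh(A, x):
    """Rayleigh quotient x^T A x / x^T x (hypothetical helper)."""
    return (x @ A @ x) / (x @ x)

X = rng.standard_normal((10_000, 5))
r = np.array([rayleigh(A, x) for x in X])

# every sampled quotient lies between the extreme eigenvalues ...
print(lam[0], r.min(), r.max(), lam[-1])
# ... and the bounds are attained at the bottom and top eigenvectors
print(rayleigh(A, Q[:, 0]), rayleigh(A, Q[:, -1]))
```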
28.3.1 The semidefinite case: the flat-bottomed valley
The boundary between definite and indefinite deserves its own look, because it is exactly the case that statistics and data compression live on. A matrix that is positive semidefinite but not positive definite has a zero eigenvalue but no negative ones: all $\lambda_i \ge 0$ with at least one $\lambda_i = 0$. Geometrically, a zero eigenvalue is a flat direction. Along its eigenvector the surface neither rises nor falls, so instead of a single lowest point the bowl has a whole trough, a line (or in higher dimensions a subspace) of equally-lowest points. Picture a half-pipe or a rain gutter: it curves upward across its width but runs perfectly level along its length. That level direction is the eigenvector for $\lambda = 0$.
The simplest example is $A = \begin{psmallmatrix}1 & 0\\ 0 & 0\end{psmallmatrix}$, with form $q(x,y) = x^2$. This is never negative ($x^2 \ge 0$ always), so it is positive semidefinite, but it is not positive definite, because it equals zero along the entire $y$-axis: every vector $(0, y)$ gives $q = 0$ without being the zero vector. The surface is a parabolic trough running along the $y$-axis, flat as far as you walk in that direction. A subtler example with a tilted flat direction is $A = \begin{psmallmatrix}1 & 1\\ 1 & 1\end{psmallmatrix}$, whose form factors as $q(x,y) = x^2 + 2xy + y^2 = (x+y)^2$; it vanishes along the whole line $x + y = 0$, which is spanned by the eigenvector for its zero eigenvalue (its eigenvalues are $2$ and $0$). The matrix is positive semidefinite, and its single flat direction is exactly the null space we studied in Chapter 13: the eigenvectors for $\lambda = 0$ are precisely the nonzero vectors the matrix annihilates.
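The trough of $\begin{psmallmatrix}1 & 1\\ 1 & 1\end{psmallmatrix}$ can be located numerically. In this NumPy sketch, the eigenvector for the smallest eigenvalue (returned first by `np.linalg.eigh`) points along the line $x + y = 0$, and walking along that line leaves the form at zero.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])           # q = (x + y)^2
lam, Q = np.linalg.eigh(A)
print(lam)                           # approximately [0, 2]

flat = Q[:, 0]                       # eigenvector for the zero eigenvalue
print(flat, A @ flat)                # proportional to (1, -1); A sends it to ~0

for t in (-3.0, 0.5, 10.0):
    v = t * np.array([1.0, -1.0])    # walk along the line x + y = 0
    print(t, v @ A @ v)              # the form stays at 0: the floor of the trough
```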
This is why the distinction matters so much downstream. A covariance matrix (§28.7) is always positive semidefinite, and it fails to be positive definite precisely when the data has a direction of zero variance: a perfect linear dependence among the features, so the cloud of points lies flat in a lower-dimensional subspace. That flat direction is a redundant feature, the thing PCA (Chapter 32) will discard. So "positive definite versus merely semidefinite" is the difference between data that genuinely fills its space and data that secretly lives on a thinner slab. The zero eigenvalues are the redundant dimensions, and recognizing them is the first step of dimensionality reduction.
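To see a zero-variance direction appear, here is a sketch on synthetic data (an assumption of this example, not a dataset from the text). The third feature is built as the exact combination $x_3 = x_1 + 2x_2$, so the sample covariance from `np.cov` should have a zero eigenvalue whose eigenvector is proportional to $(1, 2, -1)$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x3 = x1 + 2 * x2                      # hypothetical redundant feature: exact linear combination

data = np.column_stack([x1, x2, x3])  # rows are observations, columns are features
C = np.cov(data, rowvar=False)        # 3x3 sample covariance matrix
lam, Q = np.linalg.eigh(C)

print(lam)                            # smallest eigenvalue is zero up to rounding
print(Q[:, 0])                        # zero-variance direction, proportional to (1, 2, -1)
```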
28.4 What are the three definiteness tests, and why do they agree?
Checking eigenvalues is conceptually clean, but eigenvalues can be a chore to compute by hand — they require the characteristic polynomial of Chapter 24, which for a $3\times 3$ matrix means factoring a cubic. So mathematicians developed two more tests that are often faster, especially for hand computation, and the remarkable thing is that all three tests give the same answer. For a symmetric matrix $A$, the following are equivalent statements of positive definiteness:
- The eigenvalue test. All $n$ eigenvalues of $A$ are positive.
- The pivot test. All $n$ pivots are positive (the pivots are the diagonal entries you get from Gaussian elimination without row swaps, equivalently the diagonal of $D$ in the $A = LDL^{\mathsf{T}}$ factorization).
- The leading-principal-minor test (Sylvester's criterion). All $n$ leading principal minors are positive — that is, the determinants of the top-left $1\times 1$, $2\times 2$, …, $n\times n$ submatrices are all positive.
Three completely different computations — eigenvalues, elimination, determinants — and yet they always agree on whether the matrix is a bowl. That agreement is not a coincidence; it is the spectral theorem and the structure of elimination working in concert. Let us see why, because understanding the agreement is far more valuable than memorizing three rules.
28.4.1 Why the pivot test works
Recall from Chapters 4 and 10 that Gaussian elimination factors a matrix as a product involving its pivots. For a symmetric matrix that needs no row swaps, this becomes the symmetric factorization $A = LDL^{\mathsf{T}}$, where $L$ is lower triangular with $1$'s on its diagonal and $D$ is the diagonal matrix of pivots. (This is the symmetric cousin of the $LU$ decomposition of Chapter 10: when $A$ is symmetric, the $U$ factor is just $DL^{\mathsf{T}}$, so the pivots sit in $D$.) Now substitute this into the quadratic form, exactly as we did with the spectral factorization: $$\mathbf{x}^{\mathsf{T}}A\mathbf{x} = \mathbf{x}^{\mathsf{T}}LDL^{\mathsf{T}}\mathbf{x} = (L^{\mathsf{T}}\mathbf{x})^{\mathsf{T}}D\,(L^{\mathsf{T}}\mathbf{x}) = \mathbf{z}^{\mathsf{T}}D\mathbf{z} = d_1 z_1^2 + d_2 z_2^2 + \cdots + d_n z_n^2,$$ where $\mathbf{z} = L^{\mathsf{T}}\mathbf{x}$ and the $d_i$ are the pivots. Since $L^{\mathsf{T}}$ is invertible (triangular with $1$'s on the diagonal, so $\det = 1 \ne 0$), the change of variables $\mathbf{z} = L^{\mathsf{T}}\mathbf{x}$ is a bijection sending nonzero to nonzero. So again the form is a weighted sum of squares, but now the weights are the pivots instead of the eigenvalues, and the form is positive for all nonzero $\mathbf{x}$ exactly when all the pivots are positive. The no-swap assumption costs nothing here: if elimination without swaps stalls on a zero pivot at step $k$, the top-left $k\times k$ block is singular (its determinant is the product of the first $k$ pivots, as §28.4.2 shows), so some nonzero $\mathbf{w}\in\mathbb{R}^k$ is sent to zero by that block, and padding $\mathbf{w}$ with zeros gives a nonzero $\mathbf{x}$ with $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = 0$; such a matrix was never positive definite. This is the completing-the-square procedure you may have met for a single quadratic, generalized to $n$ variables: elimination is the systematic completion of the square, and the pivots are the coefficients that come out.
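Elimination without row swaps can be written out directly to produce $L$ and the pivots. The function below is a teaching sketch with the hypothetical name `ldl_no_pivot`, assuming NumPy and a small dense symmetric input. It is not a production routine: SciPy's `scipy.linalg.ldl` applies symmetric pivoting, so its output is not the plain no-swap factorization described here. The example uses the $3\times 3$ second-difference matrix, whose pivots differ from its eigenvalues but share their signs.

```python
import numpy as np

def ldl_no_pivot(A):
    """Return (L, d) with A = L @ diag(d) @ L.T, by elimination without row swaps.

    A sketch for small symmetric matrices; raises if a zero pivot appears.
    """
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for k in range(n):
        d[k] = U[k, k]                    # k-th pivot
        if d[k] == 0.0:
            raise ZeroDivisionError(f"zero pivot at step {k}: a row swap would be needed")
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / d[k]      # multiplier l_ik
            U[i, :] -= L[i, k] * U[k, :]  # subtract l_ik times the pivot row
    return L, d

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
L, d = ldl_no_pivot(A)
print(d)                         # pivots 2, 3/2, 4/3
print(np.linalg.eigvalsh(A))     # different numbers, also all positive
```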
Geometric Intuition — Both the spectral factorization $A = QDQ^{\mathsf{T}}$ and the elimination factorization $A = LDL^{\mathsf{T}}$ turn the form into a sum of squares — but along different axes. The spectral one uses orthogonal axes (the eigenvectors, a rigid rotation); the elimination one uses skewed axes (the columns of $(L^{\mathsf{T}})^{-1}$, a shear). The eigenvalues and the pivots are generally different numbers — they are the curvatures measured in different coordinate systems. What they share is their signs: you cannot turn a bowl into a saddle by skewing your viewpoint, so all-positive eigenvalues and all-positive pivots are the same fact about the surface, seen two ways.
28.4.2 Why Sylvester's criterion works, and a key warning
The leading-principal-minor test connects to the pivots through a clean determinant identity. The $k$-th pivot equals the ratio of consecutive leading principal minors: $$d_k = \frac{\Delta_k}{\Delta_{k-1}}, \qquad \Delta_k = \det(\text{top-left } k\times k \text{ block}), \quad \Delta_0 = 1.$$ The reason is that elimination without swaps only adds multiples of earlier rows to later rows, which leaves every leading principal minor unchanged; and once the first $k-1$ columns are cleared, the top-left $k\times k$ block is upper triangular with $d_1, \dots, d_k$ on its diagonal. So $\Delta_k = d_1 d_2 \cdots d_k$, and dividing consecutive products isolates a single pivot. Now the logic chains together: if all the leading minors $\Delta_1, \dots, \Delta_n$ are positive, then every ratio $d_k = \Delta_k/\Delta_{k-1}$ is positive, so all pivots are positive, so (by §28.4.1) the matrix is positive definite. Conversely, if the matrix is positive definite, all pivots are positive, so each $\Delta_k = d_1\cdots d_k$ is a product of positives and hence positive. The three tests are linked in a ring: eigenvalues ↔ the surface ↔ pivots ↔ minors. They cannot disagree, because each is a faithful reading of the same bowl.
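The identity $d_k = \Delta_k/\Delta_{k-1}$ can be checked on the $3\times 3$ second-difference matrix. A minimal sketch assuming NumPy: compute the nested top-left determinants, prepend $\Delta_0 = 1$, and take consecutive ratios.

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

n = A.shape[0]
minors = [1.0] + [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]   # Delta_0 = 1
pivots = [minors[k] / minors[k - 1] for k in range(1, n + 1)]           # d_k = Delta_k / Delta_{k-1}

print("leading minors:", np.round(minors[1:], 12))   # 2, 3, 4
print("pivots:", np.round(pivots, 12))               # 2, 3/2, 4/3
```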
Warning — Sylvester's criterion uses the leading principal minors, the nested top-left blocks $1\times1, 2\times 2, \dots$, and for positive definiteness this nested sequence is exactly right. But two traps lurk. First, for testing positive semidefiniteness the leading minors are not enough; you must check that all principal minors are nonnegative, meaning the determinants of every submatrix formed by keeping the same set of rows and columns, not just the nested top-left ones. The matrix $\begin{psmallmatrix}0 & 0\\ 0 & -1\end{psmallmatrix}$ has leading minors $\Delta_1 = 0$ and $\Delta_2 = 0$, which might fool you into thinking it is positive semidefinite, yet it is not: its form $-y^2$ is never positive, so it is negative semidefinite (the principal minor from the second row and column alone is $-1 < 0$, which exposes this). Second, for negative definiteness the leading minors do not stay positive; they alternate in sign, starting negative: $\Delta_1 < 0, \Delta_2 > 0, \Delta_3 < 0, \dots$ (because $A$ is negative definite exactly when $-A$ is positive definite, and negating a $k\times k$ block multiplies its determinant by $(-1)^k$). Always state which definiteness you are testing before reaching for the minors.
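Both traps can be made visible in code. In this NumPy sketch, `leading_minors` and `all_principal_minors` are hypothetical helper names; the second enumerates every index set with `itertools.combinations` and uses `np.ix_` to pull out the matching rows and columns. The dictionary keys are 0-based index tuples.

```python
import numpy as np
from itertools import combinations

def leading_minors(A):
    """Determinants of the nested top-left blocks."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

def all_principal_minors(A):
    """Determinants of every principal submatrix (same index set for rows and columns)."""
    n = A.shape[0]
    return {idx: np.linalg.det(A[np.ix_(idx, idx)])
            for k in range(1, n + 1) for idx in combinations(range(n), k)}

A = np.array([[0.0, 0.0],
              [0.0, -1.0]])
print(leading_minors(A))          # both zero: no warning sign from the leading minors
print(all_principal_minors(A))    # the minor for index (1,) is -1 < 0: not positive semidefinite

# Negative definite: leading minors alternate in sign, starting negative.
N = -np.array([[2.0, -1.0, 0.0],
               [-1.0, 2.0, -1.0],
               [0.0, -1.0, 2.0]])
print(np.sign(leading_minors(N)))   # -1, +1, -1
```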
28.4.3 A complete worked classification by all three tests
Let us run all three tests on one matrix to watch them agree. Take $$A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix},$$ the symmetric matrix whose form is $2x^2 - 2xy + 2y^2$. (This matrix is famous: it is the "second-difference" matrix that appears when you discretize the second derivative, and it governs a chain of springs, as we will see later.)
Test 1 — eigenvalues. The characteristic polynomial (Chapter 24) is $\det(A - \lambda I) = (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda - 1)(\lambda - 3)$, so the eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 3$. Both positive ⇒ positive definite.
Test 2 — pivots. Eliminate: subtract $-\tfrac12$ times row 1 from row 2 (i.e. add half of row 1 to row 2). The first pivot is $2$; the second pivot becomes $2 - \frac{(-1)(-1)}{2} = 2 - \tfrac12 = \tfrac32$. Pivots $2$ and $\tfrac32$, both positive ⇒ positive definite. Notice the pivots ($2, \tfrac32$) differ from the eigenvalues ($1, 3$) — different numbers, same signs, exactly as §28.4.1 promised.
Test 3 — leading principal minors. $\Delta_1 = \det[2] = 2 > 0$, and $\Delta_2 = \det A = (2)(2) - (-1)(-1) = 4 - 1 = 3 > 0$. Both positive ⇒ positive definite. And the pivot–minor identity checks out: $d_1 = \Delta_1 = 2$ and $d_2 = \Delta_2/\Delta_1 = 3/2$, matching the pivots we computed.
All three tests render the same verdict — a bowl — by three independent routes. This is the moment to internalize that they are not three separate facts to memorize but three windows onto one geometric truth.
```python
# Three definiteness tests agree on A = [[2,-1],[-1,2]]: eigenvalues, pivots, minors.
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

# Test 1: eigenvalues (eigvalsh is for symmetric/Hermitian matrices)
print("eigenvalues:", np.linalg.eigvalsh(A))

# Test 3: leading principal minors (nested top-left determinants)
print("minor 1x1:", np.linalg.det(A[:1, :1]))
print("minor 2x2:", np.linalg.det(A))

# Test 2: pivots, the diagonal of D in A = L D L^T, from one elimination step
d1 = A[0, 0]
d2 = A[1, 1] - A[1, 0] * A[0, 1] / A[0, 0]
print("pivots:", d1, d2)
```

It prints:

```
eigenvalues: [1. 3.]
minor 1x1: 2.0
minor 2x2: 2.9999999999999996
pivots: 2.0 1.5
```
The eigenvalues are $\{1, 3\}$, the leading minors are $\{2, 3\}$, and the pivots are $\{2, 1.5\}$: three different positive pairs, each certifying positive definiteness. (The minor printing as $2.9999999999999996$ instead of exactly $3$ is ordinary floating-point rounding inside np.linalg.det; see the Computational Note in §28.8.)
Math-Major Sidebar — Sylvester's law of inertia is the deep theorem behind why the signs are coordinate-independent while the values are not. It states that for any invertible $C$, the congruent matrix $C^{\mathsf{T}}AC$ has the same number of positive, negative, and zero eigenvalues as $A$. In other words, the inertia $(n_+, n_-, n_0)$ is invariant under congruence, the transformation $A \mapsto C^{\mathsf{T}}AC$ that a change of variables $\mathbf{x} = C\mathbf{y}$ induces on a quadratic form. Both $A = QDQ^{\mathsf{T}}$ and $A = LDL^{\mathsf{T}}$ exhibit $A$ as congruent to a diagonal matrix, $A = C^{\mathsf{T}}DC$ with $C = Q^{\mathsf{T}}$ (orthogonal) or $C = L^{\mathsf{T}}$ (triangular and invertible). That is why the eigenvalue-diagonal and the pivot-diagonal must share their sign pattern even though their entries differ. The law is named for James Joseph Sylvester, who published it in 1852. It is the rigorous statement of the Geometric Intuition box in §28.4.1: skewing the coordinates cannot change a bowl into a saddle.
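Sylvester's law can be watched in action numerically. As a sketch assuming NumPy and randomly drawn matrices $C$ (invertible with probability one, which the code double-checks), we build a diagonal $A$ with inertia $(2, 1, 1)$ and count the eigenvalue signs of $C^{\mathsf{T}}AC$ for a few random $C$. The eigenvalues change from trial to trial; the sign counts should not. The helper inertia and its tolerance 1e-9, used to treat rounding residue as zero, are illustrative choices of ours.

```python
# Sylvester's law of inertia: congruence C^T A C preserves (n_plus, n_minus, n_zero).
import numpy as np

def inertia(M, tol=1e-9):
    """Return (n_plus, n_minus, n_zero) for symmetric M, treating |eigenvalue| <= tol as zero."""
    ev = np.linalg.eigvalsh(M)
    return (int(np.sum(ev > tol)), int(np.sum(ev < -tol)), int(np.sum(np.abs(ev) <= tol)))

A = np.diag([3.0, 1.0, -2.0, 0.0])     # inertia (2, 1, 1) by construction
rng = np.random.default_rng(0)         # arbitrary seed
results = []
for trial in range(3):
    C = rng.standard_normal((4, 4))    # random square matrix, invertible with probability one
    assert abs(np.linalg.det(C)) > 1e-8, "unlucky singular draw"
    B = C.T @ A @ C                    # a matrix congruent to A
    results.append(inertia(B))
    print("eigenvalues of C^T A C:", np.round(np.linalg.eigvalsh(B), 3), "-> inertia", results[-1])
print("inertia of A:", inertia(A))
```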
28.5 What do the level sets look like? Ellipses and their axes
We have been graphing the form as a surface in three dimensions. There is a complementary picture, often more useful, that lives entirely in the plane: the level sets, the curves where the form takes a constant value. Slice the bowl horizontally at height $c$ and look down at the curve you cut — that is the level set $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = c$. For a positive definite form these slices are ellipses, nested one inside another like the contour lines of a hill on a topographic map, shrinking to the single point at the origin as $c\to 0$.
The geometry of these ellipses is read directly off the eigen-decomposition, and it is gorgeous. In the eigenvector coordinates the form is $\lambda_1 y_1^2 + \lambda_2 y_2^2 = c$, which is the standard equation of an ellipse. Its semi-axes lie along the $y_1$ and $y_2$ directions — that is, along the eigenvectors of $A$ — and their half-lengths are $\sqrt{c/\lambda_1}$ and $\sqrt{c/\lambda_2}$. So:
- the axes of the ellipse point along the eigenvectors of $A$ (a rigid rotation of the standard axes, since the eigenvectors are orthonormal);
- the half-length of each axis is inversely proportional to the square root of its eigenvalue, $\propto 1/\sqrt{\lambda}$.
Read that second bullet carefully, because the inverse relationship is the source of a persistent confusion. A large eigenvalue means the form rises steeply in that direction, so you reach the contour level $c$ after only a short walk: the ellipse is narrow along a steep (large-$\lambda$) eigen-direction, and wide along a shallow (small-$\lambda$) one. The steepest direction is the short axis of the ellipse; the gentlest direction is the long axis. The eigenvectors set the orientation of the ellipse; the eigenvalues set its aspect ratio, with the long axis pointing along the eigenvector of the smallest eigenvalue.
Common Pitfall — The long axis of the contour ellipse points along the eigenvector of the smallest eigenvalue, not the largest. Because the half-length is $\sqrt{c/\lambda}$, a small $\lambda$ gives a long axis. Students routinely get this backwards by associating "big eigenvalue" with "big axis." The correct mnemonic: a big eigenvalue is steep curvature, and steep curvature means the contour is reached quickly, so the axis is short. Steep ⇒ short; shallow ⇒ long.
Let us see this on $A = \begin{psmallmatrix}3 & 1\\ 1 & 3\end{psmallmatrix}$. Its eigenvalues are $\lambda = 4$ (eigenvector $(1,1)/\sqrt2$, the diagonal direction) and $\lambda = 2$ (eigenvector $(1,-1)/\sqrt2$, the anti-diagonal). So the contour ellipses of $3x^2 + 2xy + 3y^2 = c$ are tilted $45°$, with their short axis along the diagonal $(1,1)$ (the steep $\lambda = 4$ direction) and their long axis along the anti-diagonal $(1,-1)$ (the gentle $\lambda = 2$ direction). The matrix's off-diagonal entry — the cross term — is exactly what tilts the ellipse off the coordinate axes; a diagonal matrix would give an ellipse aligned with the $x$- and $y$-axes.
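The axis rule can be confirmed numerically for this matrix. A short sketch assuming NumPy and an arbitrarily chosen level $c = 12$: for each eigenpair it computes the predicted half-length $\sqrt{c/\lambda}$, walks that far along the eigenvector, and evaluates the form at the tip. Both tips should land on the level $c$, and the $\lambda = 2$ half-length ($\sqrt6 \approx 2.449$) should exceed the $\lambda = 4$ one ($\sqrt3 \approx 1.732$), as the steep-means-short rule predicts.

```python
# Semi-axes of x^T A x = c: direction = eigenvector, half-length = sqrt(c / lambda).
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
c = 12.0                                   # an arbitrary contour level for this check
w, V = np.linalg.eigh(A)                   # ascending eigenvalues (2, 4), unit eigenvectors as columns
halves, levels = [], []
for lam, v in zip(w, V.T):
    half = np.sqrt(c / lam)                # predicted half-length of the axis along v
    tip = half * v                         # end point of that semi-axis
    halves.append(half)
    levels.append(tip @ A @ tip)           # equals c if the tip lies on the contour
    print(f"lambda = {lam:.0f}: axis direction {np.round(v, 4)}, half-length {half:.4f}, "
          f"form at tip {levels[-1]:.4f}")
```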
28.5.1 The anchor figure: the bowl and its elliptical contours
This is the chapter's anchor image, and it deserves a full plot. We draw the positive definite form $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ for $A = \begin{psmallmatrix}3 & 1\\ 1 & 3\end{psmallmatrix}$ two ways at once: as a 3D bowl-shaped surface, and as the family of nested elliptical contours looking straight down. The two panels are the same object — the contours are the level slices of the surface — and together they are the picture to carry out of this chapter.
```python
# Figure 28.1 — the anchor: a positive definite bowl and its elliptical contours.
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])   # positive definite: eigenvalues 4 and 2
g = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(g, g)
Z = A[0,0]*X**2 + 2*A[0,1]*X*Y + A[1,1]*Y**2   # the quadratic form x^T A x

fig = plt.figure(figsize=(11, 5))
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
ax1.plot_surface(X, Y, Z, cmap="viridis", alpha=0.9, linewidth=0)
ax1.set_title("Surface: x$^T$A x is an upward bowl")
ax1.set_xlabel("x"); ax1.set_ylabel("y"); ax1.set_zlabel("height")

ax2 = fig.add_subplot(1, 2, 2)
cs = ax2.contour(X, Y, Z, levels=[2, 6, 12, 20, 30], cmap="viridis")
ax2.clabel(cs, inline=True, fontsize=8)
# overlay the eigenvectors (axes of the ellipses)
w, V = np.linalg.eigh(A)     # w = eigenvalues, V = eigenvectors (columns)
for i in range(2):
    vec = V[:, i] * 2.0
    ax2.arrow(0, 0, vec[0], vec[1], color="C3", width=0.04, length_includes_head=True)
ax2.set_aspect("equal"); ax2.grid(True, alpha=0.3)
ax2.set_title("Contours: nested ellipses, axes = eigenvectors")

print("eigenvalues:", w)
print("eigenvectors (columns):\n", np.round(V, 4))
plt.tight_layout(); plt.show()
```

```
eigenvalues: [2. 4.]
eigenvectors (columns):
 [[-0.7071  0.7071]
 [ 0.7071  0.7071]]
```
Figure 28.1. A positive definite quadratic form, two ways. Left: the surface $z = 3x^2 + 2xy + 3y^2$ is an upward-opening bowl (a paraboloid) with its single lowest point at the origin — height is positive everywhere else, the geometric signature of positive definiteness. Right: the level sets of the same form, looking straight down, are nested ellipses (contours labeled with their height $c$); the red arrows are the eigenvectors of $A$, which lie exactly along the axes of every ellipse. The eigenvector for $\lambda = 4$ (the diagonal $(1,1)$ direction) is the short axis — steep curvature — and the eigenvector for $\lambda = 2$ (the anti-diagonal) is the long axis. Alt-text: a 3D bowl-shaped surface on the left; on the right, concentric tilted ellipses with two perpendicular arrows along their principal axes.
The figure makes the chapter's claim unforgettable: a positive definite matrix is a bowl, its contours are ellipses, and the eigenvectors are the axes of those ellipses with the eigenvalues setting their lengths. Eigenvalues and geometry have fused. If you took the same code and fed it the indefinite matrix $\begin{psmallmatrix}1 & 0\\ 0 & -1\end{psmallmatrix}$, the surface would become a saddle and the contours would become hyperbolas — the level sets of an indefinite form are hyperbolas, not ellipses, and that change in the contour shape is the visible fingerprint of indefiniteness.
In fact the contour shape gives you a complete visual taxonomy of definiteness, one worth memorizing alongside the surface shapes. A positive (or negative) definite form has closed, nested ellipses for contours, shrinking to a point at the center — both eigenvalues the same sign, so every level set is bounded. An indefinite form has hyperbolas. Positive levels $c > 0$ give hyperbolas whose two branches open outward along the uphill (positive-eigenvalue) eigenvector, negative levels give hyperbolas opening along the downhill one, and both families share as asymptotes the two lines along which the form is zero. Those lines, $\lambda_1 y_1^2 + \lambda_2 y_2^2 = 0$, are themselves the contour through the saddle point: a degenerate pair of crossing lines. A semidefinite (but not definite) form, with a zero eigenvalue, has contours that are parallel straight lines running along the flat direction — the level sets of the trough $x^2 = c$ are the pair of lines $x = \pm\sqrt c$, never closing up because the surface never rises along the $y$-axis. Ellipses, hyperbolas, parallel lines: the three conic types are the three definiteness classes, and a single glance at a contour plot tells you which surface you are standing on. This is the same conic-section trichotomy you may recall from analytic geometry, now revealed as a statement about the signs of two eigenvalues.
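The taxonomy reduces to counting eigenvalue signs, so it fits in a few lines. A minimal sketch assuming NumPy; the function name contour_type and the tolerance 1e-12 for "numerically zero" are our own illustrative choices.

```python
# Contour taxonomy for a symmetric 2x2 matrix: eigenvalue signs decide the conic type.
import numpy as np

def contour_type(A, tol=1e-12):
    """Name the shape of the level sets x^T A x = c for a symmetric 2x2 matrix A."""
    lam = np.linalg.eigvalsh(A)
    pos = int(np.sum(lam > tol))
    neg = int(np.sum(lam < -tol))
    if pos == 2 or neg == 2:
        return "ellipses (definite)"
    if pos == 1 and neg == 1:
        return "hyperbolas (indefinite)"
    if pos + neg == 1:
        return "parallel lines (semidefinite, not definite)"
    return "no curves (the form is identically zero)"

examples = {
    "bowl   [[3,1],[1,3]]": np.array([[3.0, 1.0], [1.0, 3.0]]),
    "saddle [[1,0],[0,-1]]": np.array([[1.0, 0.0], [0.0, -1.0]]),
    "trough [[1,0],[0,0]]": np.array([[1.0, 0.0], [0.0, 0.0]]),
}
for name, M in examples.items():
    print(name, "->", contour_type(M))
```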
Real-World Application — Confidence ellipses in statistics. When you fit a model and estimate two parameters jointly, the uncertainty in the estimate is described by a positive (semi)definite covariance matrix, and the region of plausible parameter values is exactly one of these contour ellipses. The eigenvectors tell you the directions of correlated uncertainty (a long, tilted ellipse means the two parameters trade off — you cannot pin down one without the other), and the eigenvalues tell you how much uncertainty there is along each direction. Every "error ellipse" you have seen plotted around a best-fit point is a level set of a quadratic form, read exactly as we read Figure 28.1. We develop this fully in Case Study 28.1, and the same ellipse reappears as the principal-axis picture of PCA in Chapter 32.
28.6 How does positive definiteness power optimization?
We now arrive at the reason positive definiteness is one of the most consequential ideas in applied mathematics: it is the exact condition under which a smooth function has a genuine, findable minimum. This is the bridge from the abstract bowl to the concrete business of minimizing things — fitting models, training networks, designing structures, allocating resources. The link runs through calculus, specifically through the matrix of second derivatives.
Recall the one-variable second-derivative test from calculus: at a critical point where $f'(x) = 0$, you have a local minimum if $f''(x) > 0$ (the curve cups upward) and a local maximum if $f''(x) < 0$. In several variables the first derivative becomes the gradient $\nabla f$ (a vector of first partials, zero at a critical point) and the second derivative becomes the Hessian matrix $H$, the symmetric matrix of all second partial derivatives: $$H = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots \\[2mm] \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots \\ \vdots & & \ddots \end{bmatrix}.$$ The Hessian is symmetric whenever the mixed partials are continuous (Clairaut's theorem from multivariable calculus, which says $\partial^2 f/\partial x\,\partial y = \partial^2 f/\partial y\,\partial x$), so it is a legitimate subject for this chapter's machinery. And near a critical point $\mathbf{x}_0$, Taylor's theorem says the function is approximately a quadratic form built from the Hessian: $$f(\mathbf{x}_0 + \mathbf{h}) \approx f(\mathbf{x}_0) + \tfrac12\,\mathbf{h}^{\mathsf{T}}H\mathbf{h},$$ where the linear term vanished because $\nabla f(\mathbf{x}_0) = \mathbf{0}$ at the critical point. So the local shape of any smooth surface near a critical point is a quadratic form — the very objects we have been studying. The Hessian's definiteness is the surface's shape.
The Key Insight — The multivariable second-derivative test is the definiteness test. At a critical point of a smooth function $f$: if the Hessian is positive definite, the point is a local minimum (the surface is locally a bowl); if negative definite, a local maximum (a dome); if indefinite, a saddle point (neither — up some ways, down others). If the Hessian is only semidefinite, with a zero eigenvalue, the quadratic approximation is flat in some direction and the test is inconclusive; higher-order terms decide. Apart from that borderline case, the classification of critical points in calculus is nothing but the classification of symmetric matrices in this chapter.
This is a profound unification. The reason a second-derivative test for functions of many variables even exists is that the local curvature is encoded in a symmetric Hessian, and the spectral theorem guarantees that symmetric matrix has real eigenvalues whose signs classify the critical point. Single-variable calculus has the easy case $n = 1$, where the $1\times 1$ "Hessian" is just $f''$ and "positive definite" means "$f'' > 0$." Everything generalizes through linear algebra.
28.6.1 Convexity and why optimization loves bowls
There is a global version of this story too, and it is the foundation of an entire field. A twice-differentiable function is convex if it curves upward everywhere, that is, if its Hessian is positive semidefinite at every point. Convex functions are the dream of optimization, because they have no bad local minima: any local minimum is automatically a global minimum, since a surface that curves upward everywhere has no separate valleys to get trapped in. (There can be several global minimizers, as along the flat floor of a trough, but they all share the same lowest value.) This is why so much effort in machine learning, operations research, and engineering goes into formulating problems as convex ones — once your loss function is convex, an algorithm that rolls downhill (with suitable step sizes) is guaranteed to find the best answer, not get stuck in a false valley.
The simplest convex function of all is a quadratic function $q(\mathbf{x}) = \tfrac12\mathbf{x}^{\mathsf{T}}A\mathbf{x} - \mathbf{b}^{\mathsf{T}}\mathbf{x}$ with symmetric $A$, whose Hessian is the constant matrix $A$. When $A$ is positive definite this is a perfect bowl, and its unique minimum is found by setting the gradient to zero: $\nabla q = A\mathbf{x} - \mathbf{b} = \mathbf{0}$, i.e. $A\mathbf{x} = \mathbf{b}$. Minimizing a positive definite quadratic is the same as solving a linear system — a connection that runs in both directions and powers the conjugate gradient method, one of the most important algorithms in scientific computing, which solves enormous positive definite systems $A\mathbf{x} = \mathbf{b}$ by rolling a marble down the corresponding bowl. The least-squares problem of Chapter 17 is the headline example: its normal equations $A^{\mathsf{T}}A\,\hat{\mathbf{x}} = A^{\mathsf{T}}\mathbf{b}$ feature the matrix $A^{\mathsf{T}}A$, which (as we prove next) is positive semidefinite, and positive definite whenever $A$ has independent columns — which is exactly why the least-squares bowl has a unique bottom.
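The equivalence between descending the bowl and solving the system can be seen in a few lines. Below is a minimal, unpreconditioned conjugate gradient loop written from the standard textbook recurrence, assuming NumPy and a small dense symmetric positive definite $A$; for real problems one would reach for a library routine such as scipy.sparse.linalg.cg. Each step moves downhill on $q(\mathbf{x}) = \tfrac12\mathbf{x}^{\mathsf{T}}A\mathbf{x} - \mathbf{b}^{\mathsf{T}}\mathbf{x}$, and in exact arithmetic it reaches the solution of $A\mathbf{x} = \mathbf{b}$ in at most $n$ steps.

```python
# Conjugate gradient sketch: minimize q(x) = 1/2 x^T A x - b^T x, i.e. solve A x = b.
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive definite A by descending the bowl q(x)."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x                  # residual = minus the gradient of q at x
    p = r.copy()                   # first search direction: steepest descent
    rs_old = r @ r
    for _ in range(max_iter or n):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)  # exact line search along p (p @ Ap > 0 since A is PD)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p   # next direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
b = np.array([2.0, 1.0])
x_cg = conjugate_gradient(A, b)
print("CG minimizer:        ", x_cg)
print("np.linalg.solve:     ", np.linalg.solve(A, b))
print("gradient A x - b:    ", A @ x_cg - b)
```

For this $2\times2$ example the minimizer is $(0.5, 0)$, and the gradient at the returned point is zero up to rounding: the bottom of the bowl and the solution of the system are the same point.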
Real-World Application — Training machine-learning models. When you train a model by gradient descent, you are minimizing a loss function by repeatedly stepping downhill, and the local geometry that determines how fast you converge is the Hessian of the loss. Where the loss is locally positive definite, descent behaves beautifully and homes in on the minimum; the condition number of the Hessian (the ratio $\lambda_{\max}/\lambda_{\min}$ of largest to smallest eigenvalue, Chapter 38) controls the speed — a near-spherical bowl (condition number near 1) converges fast, while a long thin valley (large condition number, a badly elongated ellipse) makes gradient descent zigzag slowly down the valley floor. This is precisely why techniques like feature normalization and preconditioning exist: they reshape the loss bowl to be rounder. The full machinery is the subject of optimization, but the geometry is the ellipse of Figure 28.1.
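The effect of the condition number can be measured on a toy bowl. A sketch assuming NumPy: fixed-step gradient descent on $\tfrac12\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ for diagonal $A = \mathrm{diag}(1, \kappa)$, started at $(1, 1)$. It assumes the eigenvalues are known so the step can be set to $2/(\lambda_{\min} + \lambda_{\max})$, the classical best fixed step for a quadratic; with that step the error shrinks by a factor of about $(\kappa - 1)/(\kappa + 1)$ per iteration, and the steep coordinate flips sign each step, which is the zigzag. The function name gd_iterations and the stopping tolerance are our own choices.

```python
# Gradient descent on 1/2 x^T A x: the iteration count grows with the condition number of A.
import numpy as np

def gd_iterations(A, x0, tol=1e-8, max_iter=100_000):
    """Fixed-step gradient descent on 1/2 x^T A x; return the number of steps until ||x|| < tol."""
    lam = np.linalg.eigvalsh(A)            # ascending: lam[0] smallest, lam[-1] largest
    step = 2.0 / (lam[0] + lam[-1])        # classical optimal fixed step for a quadratic
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        if np.linalg.norm(x) < tol:        # the minimizer of this bowl is the origin
            return k
        x = x - step * (A @ x)             # the gradient of 1/2 x^T A x is A x
    return max_iter

x0 = [1.0, 1.0]
counts = {}
for kappa in [1.0, 10.0, 100.0]:
    A = np.diag([1.0, kappa])              # a diagonal bowl with condition number kappa
    counts[kappa] = gd_iterations(A, x0)
    print(f"condition number {kappa:5.0f}: {counts[kappa]} iterations")
```

A perfectly round bowl ($\kappa = 1$) is solved in a single step; raising $\kappa$ tenfold multiplies the iteration count roughly tenfold, which is the "long thin valley" penalty that preconditioning tries to remove.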
28.6.2 A worked critical-point classification
Let us make the second-derivative test concrete on a function with a genuine saddle, because the saddle is where multivariable calculus most needs linear algebra. Consider $$f(x, y) = x^3 - 3xy + y^2.$$ Its gradient is $\nabla f = (3x^2 - 3y,\; -3x + 2y)$. Setting both partials to zero gives $y = x^2$ from the first equation and $y = \tfrac32 x$ from the second; substituting, $x^2 = \tfrac32 x$, so $x = 0$ or $x = \tfrac32$. The two critical points are $(0, 0)$ and $\left(\tfrac32, \tfrac94\right)$. To classify each, we need the Hessian, the matrix of second partials: $$H(x, y) = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} 6x & -3 \\ -3 & 2 \end{bmatrix}.$$ Notice the Hessian depends on the point — its top-left entry $6x$ changes as we move — which is the whole reason a function can be a bowl in one place and a saddle in another. We evaluate it at each critical point and apply this chapter's definiteness tests.
At $(0,0)$: $H = \begin{psmallmatrix}0 & -3\\ -3 & 2\end{psmallmatrix}$. The leading minors are $\Delta_1 = 0$ and $\Delta_2 = (0)(2) - 9 = -9 < 0$. A negative $\Delta_2$ (the full determinant, which equals the product of the eigenvalues) means the two eigenvalues have opposite signs — one positive, one negative — so the Hessian is indefinite and $(0,0)$ is a saddle point. At $\left(\tfrac32, \tfrac94\right)$: $H = \begin{psmallmatrix}9 & -3\\ -3 & 2\end{psmallmatrix}$, with $\Delta_1 = 9 > 0$ and $\Delta_2 = 18 - 9 = 9 > 0$. Both leading minors positive ⇒ positive definite ⇒ $\left(\tfrac32, \tfrac94\right)$ is a local minimum. The function dips into a bowl at one critical point and threads a mountain pass at the other, and a single $2\times 2$ determinant told us which is which.
```python
# Classify the critical points of f = x^3 - 3xy + y^2 via the Hessian's definiteness.
import numpy as np

def hessian(x, y):
    return np.array([[6*x, -3.0],
                     [-3.0, 2.0]])

for (x, y) in [(0.0, 0.0), (1.5, 2.25)]:
    H = hessian(x, y)
    ev = np.linalg.eigvalsh(H)   # symmetric -> use eigvalsh
    # (a zero eigenvalue would make the test inconclusive; neither point has one)
    kind = ("local min" if (ev > 0).all() else
            "local max" if (ev < 0).all() else
            "saddle")
    print(f"point ({x}, {y}): eigenvalues {np.round(ev,3)} -> {kind}")
```

```
point (0.0, 0.0): eigenvalues [-2.162  4.162] -> saddle
point (1.5, 2.25): eigenvalues [ 0.89 10.11] -> local min
```
The eigenvalues confirm the determinant test: at the origin they straddle zero ($-2.16$ and $4.16$, opposite signs, a saddle), and at $\left(\tfrac32, \tfrac94\right)$ they are both positive ($0.89$ and $10.11$, a bowl, a local minimum). Multivariable critical-point classification is, start to finish, an exercise in matrix definiteness.
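The Taylor claim of §28.6, that near a critical point $f$ behaves like $f(\mathbf{x}_0) + \tfrac12\mathbf{h}^{\mathsf{T}}H\mathbf{h}$, can be checked at the local minimum just found. A sketch assuming NumPy, with small random displacements $\mathbf{h}$ drawn from an arbitrary seed. For this particular cubic, expanding $(x_0 + h_1)^3$ shows the only term beyond the quadratic model is exactly $h_1^3$, so the relative gap between the true change and the model should shrink roughly in proportion to $\lVert\mathbf{h}\rVert$.

```python
# Near (3/2, 9/4), compare f(x0 + h) - f(x0) with the quadratic model 1/2 h^T H h.
import numpy as np

def f(x, y):
    return x**3 - 3*x*y + y**2

x0 = np.array([1.5, 2.25])                   # the local minimum found above
H = np.array([[9.0, -3.0],
              [-3.0, 2.0]])                  # Hessian of f at x0
rng = np.random.default_rng(1)               # arbitrary seed for the random directions
records = []
for scale in [1e-1, 1e-2, 1e-3]:
    h = scale * rng.standard_normal(2)
    actual = f(*(x0 + h)) - f(*x0)           # true change in f
    model = 0.5 * h @ H @ h                  # Taylor model: the gradient term vanishes at x0
    records.append((h, actual, model))
    print(f"scale {scale:g}: actual {actual:.3e}, model {model:.3e}, "
          f"relative gap {abs(actual - model) / model:.1e}")
```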
Check Your Understanding — A smooth function of two variables has Hessian $H = \begin{psmallmatrix}-2 & 0\\ 0 & -5\end{psmallmatrix}$ at a critical point. Is the point a maximum, minimum, or saddle?
Answer
The matrix is diagonal, so its eigenvalues are the diagonal entries, $-2$ and $-5$ — both negative. A negative definite Hessian means the surface is a dome (curving downward in every direction), so the critical point is a local maximum. Equivalently by the alternating-sign minor test for negative definiteness: $\Delta_1 = -2 < 0$ and $\Delta_2 = 10 > 0$, the negative-then-positive pattern that certifies negative definiteness. The form $-2x^2 - 5y^2$ is plainly $\le 0$ with a single highest point at the origin.
28.7 Why are covariance matrices always positive semidefinite?
Now a second great application, from statistics and data science, and it is the one that makes Part VI's principal component analysis possible. The covariance matrix of a collection of data is the symmetric matrix $\Sigma$ whose $(i,j)$ entry is the covariance between feature $i$ and feature $j$ — how the two features vary together, with the variances of each feature on the diagonal. Covariance matrices are everywhere in data: they summarize the spread and the correlations of a dataset in a single symmetric object. And they are always positive semidefinite — never indefinite, never a saddle. Let us see why, because the reason is a clean one-line argument that also explains the deep link to the least-squares matrix $A^{\mathsf{T}}A$.
Claim. Any covariance matrix $\Sigma$ is positive semidefinite.
Why we care. Positive semidefiniteness is what guarantees that variances are never negative, that PCA's eigenvalues (which are variances) come out $\ge 0$, and that the data's "spread ellipsoid" is a genuine ellipsoid and not some impossible saddle shape. It is the structural fact underneath all of multivariate statistics.
Key idea. A covariance matrix has the form $\Sigma = \frac{1}{N}B^{\mathsf{T}}B$ for the centered data matrix $B$ (each column a mean-subtracted feature), and any matrix of the form $B^{\mathsf{T}}B$, a Gram matrix from Chapter 20, is automatically positive semidefinite. (Many texts divide by $N-1$ instead of $N$ for the sample covariance; a positive scale factor does not change definiteness.)
Proof. Take any vector $\mathbf{w}$ and compute the quadratic form of $\Sigma$: $$\mathbf{w}^{\mathsf{T}}\Sigma\mathbf{w} = \tfrac{1}{N}\,\mathbf{w}^{\mathsf{T}}B^{\mathsf{T}}B\,\mathbf{w} = \tfrac{1}{N}(B\mathbf{w})^{\mathsf{T}}(B\mathbf{w}) = \tfrac{1}{N}\lVert B\mathbf{w}\rVert^2 \ge 0.$$ The form is a squared length divided by $N$, and squared lengths are never negative, so $\mathbf{w}^{\mathsf{T}}\Sigma\mathbf{w} \ge 0$ for every $\mathbf{w}$ — positive semidefinite. $\blacksquare$
What this means. The quantity $\mathbf{w}^{\mathsf{T}}\Sigma\mathbf{w}$ has a beautiful statistical meaning: it is the variance of the data projected onto the direction $\mathbf{w}$. The proof says this projected variance can never be negative — obviously, since variance is an average of squares — and that obvious fact, expressed in matrix language, is exactly positive semidefiniteness. The matrix is positive definite (strictly, with no zero eigenvalues) precisely when no direction has zero variance, i.e. when the data genuinely spreads out in all directions and is not trapped in a lower-dimensional subspace. A zero eigenvalue signals a direction of no variation — a perfect linear dependence among the features — which is the semidefinite, flat-bottomed-valley case.
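Both claims, that $\mathbf{w}^{\mathsf{T}}\Sigma\mathbf{w}$ is the variance of the projected data and that a perfect linear dependence produces a zero eigenvalue, can be checked on synthetic data. A sketch assuming NumPy; the data-generating recipe and seed are arbitrary choices of ours, with the third feature built as an exact combination $z = 2x - y$ so that the direction $(2, -1, -1)$ carries no variance. Expect the smallest printed eigenvalue to be a tiny number of either sign (around $10^{-16}$) rather than an exact zero, which is floating-point rounding.

```python
# Covariance is PSD: w^T Sigma w equals the variance of the data projected on w.
import numpy as np

rng = np.random.default_rng(42)               # synthetic data; the seed is arbitrary
N = 500
x = rng.standard_normal(N)
y = 0.8 * x + 0.3 * rng.standard_normal(N)    # a feature correlated with x
z = 2.0 * x - y                               # exact linear combination: a zero-variance direction
data = np.column_stack([x, y, z])             # rows = samples, columns = features

B = data - data.mean(axis=0)                  # centered data matrix
Sigma = B.T @ B / N                           # covariance with the 1/N convention of the text
w = rng.standard_normal(3)                    # an arbitrary direction
print("w^T Sigma w         :", w @ Sigma @ w)
print("variance of B w     :", np.mean((B @ w) ** 2))
print("eigenvalues of Sigma:", np.linalg.eigvalsh(Sigma))
# np.cov divides by N - 1 by default; rescaling does not change definiteness.
print("matches np.cov:", np.allclose(Sigma, np.cov(data, rowvar=False) * (N - 1) / N))
```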
Geometric Intuition — A cloud of data points has a shape, and that shape is an ellipsoid (in 2D, an ellipse) whose axes are the eigenvectors of the covariance matrix and whose extents are set by the eigenvalues — the same ellipse-from-a-positive-definite-matrix picture as Figure 28.1, now describing the spread of data rather than the contours of a bowl. The longest axis of the data ellipse is the direction of greatest variance; that is the first principal component (Chapter 32). Positive semidefiniteness is the guarantee that this ellipsoid is a real ellipsoid — bounded, convex, never turning inside-out into a hyperbolic saddle. Covariance and curvature are the same geometry.
This is the moment to see how tightly the book's threads are woven. The matrix $A^{\mathsf{T}}A$ from least squares (Chapter 17), the Gram matrix from Gram–Schmidt (Chapter 20), and the covariance matrix from statistics are the same kind of object: symmetric and positive semidefinite by the identical one-line argument ($\mathbf{w}^{\mathsf{T}}B^{\mathsf{T}}B\mathbf{w} = \lVert B\mathbf{w}\rVert^2 \ge 0$). That shared structure is why least squares has a unique solution whenever the columns of $A$ are independent, why Gram–Schmidt's projections are well-posed, and why PCA's variances are non-negative. One linear-algebra fact, three flagship applications. You can read more about how these matrices organize data in the treatment of covariance matrices, which develops the statistical side in detail.
28.7.1 A physical reading: energy and stiffness
For physicists and engineers there is a third interpretation, and it is perhaps the most tangible of all. In mechanics, the energy stored in a system displaced from equilibrium is a quadratic form $E(\mathbf{x}) = \tfrac12\mathbf{x}^{\mathsf{T}}K\mathbf{x}$, where $\mathbf{x}$ is the displacement and $K$ is the stiffness matrix. A stable equilibrium (a system that springs back when you push it) is exactly one where this energy is positive for every nonzero displacement, i.e. where $K$ is positive definite. Push the system any way you like and it costs energy, so it wants to return to the bottom of the bowl. The matrix $\begin{psmallmatrix}2 & -1\\ -1 & 2\end{psmallmatrix}$ we classified in §28.4.3 is precisely the stiffness matrix of two masses held between two fixed walls by three identical unit springs, and its positive definiteness ($\lambda = 1, 3$) is the statement that the spring system is stable: every way of displacing the masses stores positive energy. Its eigenvectors are the normal modes of vibration (the symmetric and antisymmetric oscillations) and its eigenvalues set their frequencies (for unit masses, $\omega^2 = \lambda$). The energy bowl and the optimization bowl and the covariance ellipsoid are, mathematically, one and the same.
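The spring system is small enough to check directly. A sketch assuming NumPy, unit masses, and unit spring constants (the text fixes only the matrix; the masses are our assumption): np.linalg.eigh gives the normal modes and $\omega = \sqrt{\lambda/m}$ gives the natural frequencies, and evaluating the energy on random displacements illustrates that it is always positive.

```python
# Two unit masses between fixed walls, three unit springs: stiffness K = [[2,-1],[-1,2]].
import numpy as np

K = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
m = 1.0                                       # unit masses (an assumption for this sketch)
lam, modes = np.linalg.eigh(K)                # eigenvalues ascending, mode shapes as columns
omega = np.sqrt(lam / m)                      # natural frequencies: omega^2 = lambda / m
for l, om, v in zip(lam, omega, modes.T):
    print(f"lambda = {l:.0f}, omega = {om:.4f}, mode shape {np.round(v, 4)}")

# Every nonzero displacement stores positive energy E = 1/2 x^T K x.
rng = np.random.default_rng(7)
energies = [0.5 * d @ K @ d for d in rng.standard_normal((1000, 2))]
print("smallest energy over 1000 random displacements:", min(energies))
```

The $\lambda = 1$ mode moves both masses in the same direction (the symmetric oscillation) and the $\lambda = 3$ mode moves them oppositely, at frequency $\sqrt3$.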
Real-World Application — Structural engineering and stability. When engineers analyze whether a bridge, a building, or a mechanical linkage is stable, they assemble its stiffness matrix and check that it is positive definite. A non-positive-definite stiffness matrix has a displacement direction that costs no energy (a zero eigenvalue, a mechanism that moves freely) or releases energy (a negative eigenvalue, an instability that runs away, a collapse). The eigenvalues are the squared natural frequencies (for unit masses; with a mass matrix $M$ the frequencies come from the generalized problem $K\mathbf{x} = \omega^2 M\mathbf{x}$), so a tiny or negative eigenvalue is a warning of a dangerous low-frequency or unstable mode. The same definiteness test that classifies a critical point in optimization decides whether a structure stands up.
28.8 How do you check positive definiteness in code? Cholesky and the toolkit
We close with the computational side, which has a delightful twist: the fastest way to check positive definiteness in practice is not to compute eigenvalues at all, but to attempt a special factorization and see if it succeeds. That factorization is the Cholesky factorization, the crown jewel of positive definite matrices.
Definition (Cholesky factorization). A symmetric positive definite matrix $A$ factors uniquely as $$A = LL^{\mathsf{T}},$$ where $L$ is a lower-triangular matrix with positive diagonal entries.
The Cholesky factorization is the symmetric, positive definite specialization of the $LU$ decomposition from Chapter 10 — instead of $A = LU$ with two separate triangular factors, the symmetry lets you fold them into a single factor and its transpose, $A = LL^{\mathsf{T}}$. Geometrically, $L$ is the "square root" of the matrix in the sense that it is the linear map turning the unit ball into the data ellipsoid: if $A = LL^{\mathsf{T}}$, then the form $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = \lVert L^{\mathsf{T}}\mathbf{x}\rVert^2$ is again a plain squared length in the coordinates $L^{\mathsf{T}}\mathbf{x}$. (This is the same completing-the-square idea as the pivot test — indeed $L$ here is the Cholesky factor, related to the $A = LDL^{\mathsf{T}}$ pivots by absorbing $\sqrt{D}$ into $L$, so $L_{\text{Chol}} = L_{LDL}\sqrt{D}$.) It is cheaper than the $LU$ factorization, about half the work, because you only compute one triangular factor.
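The relation $L_{\text{Chol}} = L_{LDL}\sqrt{D}$ can be verified on a small example. A sketch assuming NumPy: the helper ldl_no_pivot (a hypothetical name, our own code) computes $A = LDL^{\mathsf{T}}$ by elimination without row swaps, which is safe here because the matrix is positive definite; absorbing $\sqrt{D}$ into the unit triangular factor should reproduce np.linalg.cholesky.

```python
# L_chol = L_LDL * sqrt(D): build A = L D L^T by elimination, then compare with Cholesky.
import numpy as np

def ldl_no_pivot(A):
    """Unit-lower-triangular L and pivots d with A = L diag(d) L^T (assumes nonzero pivots)."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - np.sum(L[j, :j] ** 2 * d[:j])          # j-th pivot
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L, d

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
L_ldl, d = ldl_no_pivot(A)
L_chol = L_ldl @ np.diag(np.sqrt(d))      # absorb sqrt(D) into the unit triangular factor
print("pivots d:", d)
print("L_LDL sqrt(D):\n", L_chol)
print("np.linalg.cholesky:\n", np.linalg.cholesky(A))
```

For this matrix the pivots are $4$ and $2$, and both routes give the factor $\begin{psmallmatrix}2 & 0\\ 1 & \sqrt2\end{psmallmatrix}$.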
The key fact — and the reason Cholesky is the practitioner's definiteness test — is an existence theorem with a sharp condition:
Theorem (Cholesky existence). A symmetric matrix $A$ has a Cholesky factorization $A = LL^{\mathsf{T}}$, with $L$ real, lower-triangular, and having positive diagonal entries, if and only if $A$ is positive definite.
So the factorization exists exactly when the matrix is a bowl. If you ask a computer to Cholesky-factor a matrix and it succeeds, the matrix is positive definite; if the algorithm hits a non-positive number under a square root and fails, the matrix is not. This is why np.linalg.cholesky raising a LinAlgError is the standard, fast test for positive definiteness: it is far cheaper than computing all the eigenvalues, and it delivers a direct yes/no verdict instead of a list of eigenvalues you must threshold yourself. (For a matrix that is nearly singular, every floating-point test sits on a knife edge, Cholesky included.)
Math-Major Sidebar — Why existence is equivalent to positive definiteness. If $A = LL^{\mathsf{T}}$ with $L$ having positive diagonal, then $L$ is invertible (triangular, nonzero diagonal), so $L^{\mathsf{T}}\mathbf{x} \neq \mathbf{0}$ and $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = \lVert L^{\mathsf{T}}\mathbf{x}\rVert^2 > 0$ for nonzero $\mathbf{x}$, which is positive definiteness. Conversely, if $A$ is positive definite, all its leading principal minors are positive (Sylvester), so Gaussian elimination needs no row swaps and produces positive pivots $d_k = \Delta_k/\Delta_{k-1} > 0$ (with $\Delta_0 = 1$); taking square roots of those pivots and absorbing them into the triangular factor builds $L$ explicitly, entry by entry, via the recurrence $L_{jj} = \sqrt{a_{jj} - \sum_{k<j} L_{jk}^2}$, whose radicands are exactly the pivots $d_j$: positive quantities precisely because the matrix is positive definite. The algorithm's success and the matrix's definiteness are the same condition. The factorization is named for André-Louis Cholesky, a French military officer and geodesist who developed it for surveying computations; it was published posthumously in 1924, after his death in the First World War.
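The sidebar's key step, that each radicand in the recurrence equals a pivot $\Delta_k/\Delta_{k-1}$, can be checked directly. The sketch below assumes NumPy and is meant for exposition, not as the pure-Python toolkit version; cholesky_with_radicands is a hypothetical helper that records every radicand and raises ValueError the moment one is not positive. The $3\times3$ test matrix is our own choice (its leading minors are $4, 16, 64$, so every pivot is $4$).

```python
# Cholesky by the recurrence, recording each radicand a_jj - sum_{k<j} L_jk^2,
# then comparing the radicands with the pivots Delta_k / Delta_{k-1}.
import numpy as np

def cholesky_with_radicands(A):
    n = A.shape[0]
    L = np.zeros((n, n))
    radicands = []
    for j in range(n):
        rad = A[j, j] - np.sum(L[j, :j] ** 2)
        radicands.append(rad)
        if rad <= 0:
            raise ValueError(f"radicand {rad} <= 0 at step {j}: not positive definite")
        L[j, j] = np.sqrt(rad)
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j])) / L[j, j]
    return L, radicands

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 5.0, 3.0],
              [2.0, 3.0, 6.0]])
L, rads = cholesky_with_radicands(A)
minors = [np.linalg.det(A[:k, :k]) for k in range(1, 4)]
pivots = [minors[0]] + [minors[k] / minors[k - 1] for k in range(1, 3)]
print("radicands:", np.round(rads, 6))
print("pivots Delta_k/Delta_(k-1):", np.round(pivots, 6))
print("matches numpy:", np.allclose(L, np.linalg.cholesky(A)))
```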
Here is the factorization at work, and the verification against numpy that this chapter's numbers demand:
```python
# Cholesky A = L L^T exists iff A is positive definite; verify against eigvalsh.
import numpy as np

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])   # symmetric; is it positive definite?
L = np.linalg.cholesky(A)    # succeeds => A is positive definite
print("L =\n", np.round(L, 4))
print("L L^T reconstructs A:\n", np.round(L @ L.T, 4))
print("eigenvalues (all > 0?):", np.round(np.linalg.eigvalsh(A), 4))

# Contrast: an indefinite matrix makes Cholesky fail.
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])   # eigenvalues 3 and -1: indefinite
try:
    np.linalg.cholesky(B)
except np.linalg.LinAlgError as e:
    print("B Cholesky failed ->", e)
print("B eigenvalues:", np.round(np.linalg.eigvalsh(B), 4))
```

```
L =
 [[2.     0.    ]
 [1.     1.4142]]
L L^T reconstructs A:
 [[4. 2.]
 [2. 3.]]
eigenvalues (all > 0?): [1.4384 5.5616]
B Cholesky failed -> Matrix is not positive definite
B eigenvalues: [-1.  3.]
```
The factor $L = \begin{psmallmatrix}2 & 0\\ 1 & \sqrt2\end{psmallmatrix}$ multiplies back to $A$ exactly, and $A$'s eigenvalues come out positive ($\approx 1.44$ and $5.56$), so the Cholesky success and the positive eigenvalues agree — $A$ is positive definite. The matrix $B$, with eigenvalues $-1$ and $3$, is indefinite, and Cholesky correctly refuses to factor it, raising the error that we use as our test. Two independent diagnoses, one verdict each.
Computational Note — Use np.linalg.eigvalsh (the h is for "Hermitian/symmetric"), not np.linalg.eig, when your matrix is symmetric. eigvalsh exploits symmetry to return real eigenvalues in sorted order and is faster and more accurate; plain eig may return them as complex numbers (with tiny spurious imaginary parts like 1e-17j from rounding) and unsorted. For testing strict positive definiteness, prefer the Cholesky approach (try: np.linalg.cholesky(A)) over thresholding eigenvalues by hand. A borderline-singular positive semidefinite matrix can produce a smallest eigenvalue like -2e-16 purely from floating-point error, which would fool a naive all(eigs >= 0) semidefiniteness check; when you must threshold, compare against a small tolerance rather than exactly zero. And always confirm symmetry first (np.allclose(A, A.T)): every test in this chapter silently assumes it, and np.linalg.cholesky reads only one triangle of its input, so it will not notice an asymmetric matrix.

Build Your Toolkit — Implement is_positive_definite(A, tol=1e-12) in toolkit/positive_definite.py, in pure Python (no numpy inside the implementation; numpy only to check). Your function should (1) first verify the matrix is symmetric, returning False immediately if $\lvert a_{ij} - a_{ji}\rvert > \texttt{tol}$ for any $i, j$, since definiteness is undefined otherwise, and then (2) decide positive definiteness, ideally by attempting a Cholesky factorization from scratch with the recurrence $L_{jj} = \sqrt{a_{jj} - \sum_{k<j} L_{jk}^2}$ and $L_{ij} = \bigl(a_{ij} - \sum_{k<j} L_{ik}L_{jk}\bigr)/L_{jj}$ for $i > j$, returning False the moment a diagonal radicand is $\le 0$ (the factorization has failed, so the matrix is not positive definite) and True if every diagonal entry came out positive. As an alternative or a cross-check, you may instead test that all leading principal minors are positive using your det_cofactor from Chapter 11, or that all pivots are positive using elimination from Chapter 4. Verify three ways: against np.linalg.cholesky (which succeeds iff your function returns True), against all(np.linalg.eigvalsh(A) > 0), and against Sylvester's criterion, on a positive definite matrix, an indefinite one, and a positive semidefinite one (which should return False for strict positive definiteness). This module sits beside lu.py from Chapter 10 and feeds directly into pca.py in Chapter 32, where the covariance matrix's positive semidefiniteness guarantees real, non-negative variances.

Check Your Understanding — Without computing eigenvalues, decide whether $A = \begin{psmallmatrix}5 & 2\\ 2 & 1\end{psmallmatrix}$ is positive definite, using Sylvester's criterion.
Answer
Check the leading principal minors. $\Delta_1 = \det[5] = 5 > 0$ ✓. $\Delta_2 = \det A = (5)(1) - (2)(2) = 5 - 4 = 1 > 0$ ✓. Both positive, so by Sylvester's criterion $A$ is positive definite. (As a sanity check, the pivots are $d_1 = \Delta_1 = 5$ and $d_2 = \Delta_2/\Delta_1 = 1/5$, both positive; and the eigenvalues, which you did not need, are $\approx 0.17$ and $5.83$ — both positive, agreeing with all three tests. Note that even though one eigenvalue and one pivot are small, they are strictly positive, so the bowl is genuine, just very elongated — a long thin valley.)
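To see the three tests side by side on the exercise matrix, here is a minimal NumPy sketch. It also uses an indefinite example and a singular semidefinite example, mirroring the verification step of the toolkit callout.

- The helper names `pd_by_cholesky`, `pd_by_eigenvalues`, `pd_by_sylvester`, and `check_all` are hypothetical. They are not the toolkit's `is_positive_definite`.
- The tolerance `1e-12` in the eigenvalue and minor checks is an assumption.
- The Cholesky check relies on `np.linalg.cholesky` raising `np.linalg.LinAlgError` when it cannot factor the matrix.
- The last lines use a standard fact: if $A = L_1 D L_1^{\mathsf{T}}$ with $L_1$ unit lower triangular, then the Cholesky factor is $L = L_1 D^{1/2}$. So the squared diagonal of $L$ gives the pivots.

```python
import numpy as np

# Hypothetical helper names, for illustration only (not the toolkit's is_positive_definite).


def pd_by_cholesky(A):
    """Positive definite iff NumPy's Cholesky succeeds; it raises LinAlgError otherwise."""
    try:
        np.linalg.cholesky(A)
        return True
    except np.linalg.LinAlgError:
        return False


def pd_by_eigenvalues(A, tol=1e-12):
    """All eigenvalues positive; tol is an assumed threshold that absorbs rounding noise."""
    return bool(np.linalg.eigvalsh(A).min() > tol)


def pd_by_sylvester(A, tol=1e-12):
    """Sylvester's criterion: every leading principal minor det(A[:k, :k]) is positive."""
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > tol for k in range(1, n + 1))


def check_all(M):
    """Run all three tests on a symmetric matrix and return their verdicts as a tuple."""
    A = np.asarray(M, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("the definiteness tests assume a symmetric matrix")
    return pd_by_cholesky(A), pd_by_eigenvalues(A), pd_by_sylvester(A)


examples = {
    "positive definite": [[5.0, 2.0], [2.0, 1.0]],  # the matrix from the exercise above
    "indefinite": [[1.0, 2.0], [2.0, 1.0]],         # eigenvalues 3 and -1
    "psd, singular": [[1.0, 1.0], [1.0, 1.0]],      # eigenvalues 2 and 0
}
for name, M in examples.items():
    print(f"{name:18s}", check_all(M))

# The pivots of A are the squared diagonal entries of its Cholesky factor L.
A = np.array([[5.0, 2.0], [2.0, 1.0]])
L = np.linalg.cholesky(A)
print("pivots from diag(L)**2:", np.round(np.diag(L) ** 2, 4))  # expect 5 and 1/5
print("eigenvalues:           ", np.round(np.linalg.eigvalsh(A), 4))
```

On these three matrices the tests should agree. All three verdicts should be `True` for the first matrix and `False` for the indefinite and singular semidefinite ones. The squared Cholesky diagonal should come out as $5$ and $1/5$, matching $d_1$ and $d_2$ in the answer above.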
28.9 What have we built, and where does it lead?
We started by rolling a marble into a bowl and end with a unifying principle that reaches into optimization, statistics, physics, and numerical computing. The through-line was a single geometric picture made precise: a positive definite matrix is one whose quadratic form $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ is an upward bowl with a unique minimum. Because the matrix is symmetric, the spectral theorem of Chapter 27 lets us read that shape directly off the eigenvalues:

- all eigenvalues positive gives a bowl;
- mixed signs give a saddle;
- a zero eigenvalue (with the rest positive) gives a flat-bottomed trough.

From that one fact flowed the three definiteness tests: eigenvalues, pivots, and leading principal minors (Sylvester's criterion). They agree not by coincidence but because each is a sign-faithful reading of the same surface, a consequence of Sylvester's law of inertia. We saw that the contours of the bowl are ellipses. Their axes point along the eigenvectors, with half-lengths $1/\sqrt{\lambda}$ on the level set $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = 1$. We cashed that geometry out four ways:

1. the Hessian and the second-derivative test in optimization;
2. convexity and the guarantee of a unique global minimum;
3. the positive semidefinite covariance matrix that underwrites all of multivariate statistics;
4. the energy/stiffness matrix that decides whether a physical equilibrium is stable.

Finally, the Cholesky factorization $A = LL^{\mathsf{T}}$ gave us both a "square root" of a positive definite matrix and the fastest practical test for definiteness.
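The recap's claim about contours (axes along the eigenvectors, half-lengths $1/\sqrt{\lambda}$) can be spot-checked numerically. This is a minimal sketch under two assumptions: it uses the level set $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = 1$, and it reuses the matrix from the Check Your Understanding exercise as an illustrative positive definite example.

The sketch works in two steps:

1. It steps out along each eigenvector and confirms that the point at distance $1/\sqrt{\lambda}$ lies on the level set.
2. Independently, it sweeps directions $\mathbf{u}$ around the unit circle. Along each direction, the level set is reached at radius $1/\sqrt{\mathbf{u}^{\mathsf{T}}A\mathbf{u}}$. The sketch confirms that the longest and shortest radii match $1/\sqrt{\lambda_{\min}}$ and $1/\sqrt{\lambda_{\max}}$.

The 100,000-point angular grid is an arbitrary choice.

```python
import numpy as np

# Illustrative positive definite matrix (from the Check Your Understanding exercise).
A = np.array([[5.0, 2.0], [2.0, 1.0]])
eigvals, eigvecs = np.linalg.eigh(A)  # ascending eigenvalues; eigenvectors are the columns

# Predicted axes: along each eigenvector v, the level set x^T A x = 1
# is reached at distance 1/sqrt(lambda).
for lam, v in zip(eigvals, eigvecs.T):
    half_axis = 1.0 / np.sqrt(lam)
    tip = half_axis * v
    print(f"lambda = {lam:.4f}, half-axis = {half_axis:.4f}, tip^T A tip = {tip @ A @ tip:.6f}")

# Independent check: for every unit direction u, the level set lies at radius 1/sqrt(u^T A u).
theta = np.linspace(0.0, 2.0 * np.pi, 100_000)
U = np.stack([np.cos(theta), np.sin(theta)])  # unit directions, shape (2, N)
quad = np.einsum("in,ij,jn->n", U, A, U)       # u^T A u for every column u
r = 1.0 / np.sqrt(quad)                        # ellipse radius in direction u

print("longest radius :", round(r.max(), 4), "vs 1/sqrt(lambda_min) =", round(1.0 / np.sqrt(eigvals[0]), 4))
print("shortest radius:", round(r.min(), 4), "vs 1/sqrt(lambda_max) =", round(1.0 / np.sqrt(eigvals[-1]), 4))
```

For this matrix the eigenvalues are $3 \mp 2\sqrt{2} = (\sqrt{2} \mp 1)^2$. The half-axes should therefore come out as $\sqrt{2} + 1 \approx 2.4142$ and $\sqrt{2} - 1 \approx 0.4142$, a ratio of about $5.8$. That ratio is the "long thin valley" noted in the exercise.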
This chapter is, more than any other in Part V, the place where eigenvalues stop being abstract and become curvature — a quantity you can see, optimize against, and build structures on. Keep the vocabulary close: a quadratic form, the four flavors of definiteness, the three tests and Sylvester's criterion, the leading principal minors and pivots, the Hessian and convexity, the covariance matrix and Mahalanobis distance, and the Cholesky factorization. These are the terms that Part VI will lean on at every step.
The forward references are immediate and important. The covariance matrix we just proved is positive semidefinite is the exact object whose eigenvectors PCA will extract in Chapter 32 — the principal components are its eigenvectors, the variances are its eigenvalues, and the data ellipsoid is Figure 28.1's ellipse. The "square root" and "rotate–stretch" intuition of the Cholesky factor previews the deepest factorization of all, the Singular Value Decomposition of Chapter 30, which generalizes everything in this chapter to non-symmetric and even non-square matrices: where a positive definite symmetric matrix factors as $A = QDQ^{\mathsf{T}}$ with positive eigenvalues, every matrix factors as $A = U\Sigma V^{\mathsf{T}}$ with non-negative singular values — and those singular values are precisely the square roots of the eigenvalues of the positive semidefinite matrix $A^{\mathsf{T}}A$ we met in §28.7. Positive definiteness is the bridge from the spectral theorem to the SVD, and through the SVD to nearly all of modern data science. The marble in the bowl turns out to be rolling toward the heart of machine learning.
```python
# Preview of Chapter 30: singular values of A are sqrt of the eigenvalues of A^T A,
# which is symmetric positive semidefinite (this chapter's §28.7).
import numpy as np

A = np.array([[3.0, 0.0],
              [4.0, 5.0]])  # a general (non-symmetric) matrix
AtA = A.T @ A
print("A^T A eigenvalues (>= 0, psd):", np.round(np.linalg.eigvalsh(AtA), 4))
print("sqrt of those:                ", np.round(np.sqrt(np.linalg.eigvalsh(AtA)), 4))
print("singular values of A:         ", np.round(np.linalg.svd(A, compute_uv=False), 4))
```

```text
A^T A eigenvalues (>= 0, psd): [ 5. 45.]
sqrt of those:                 [2.2361 6.7082]
singular values of A:          [6.7082 2.2361]
```
The eigenvalues of $A^{\mathsf{T}}A$ are non-negative — positive semidefinite, exactly as §28.7 guaranteed — and their square roots are precisely the singular values of $A$ (the same two numbers, $6.71$ and $2.24$, just sorted oppositely). That identity is the doorway to Chapter 30, where the bowl-and-ellipse geometry of this chapter becomes the universal language of matrix factorization. Hold onto the picture of the bowl; it is about to describe every matrix there is.