Prerequisites
- chapter-25-diagonalization
- chapter-21-orthogonal-matrices-and-rotations
Learning Objectives
- State the Spectral Theorem with its CONDITIONS: a real **symmetric** matrix ($A = A^{\mathsf{T}}$) is always orthogonally diagonalizable as $A = Q\Lambda Q^{\mathsf{T}}$ with real eigenvalues and an orthonormal eigenbasis.
- Explain geometrically why symmetric means pure stretch along orthogonal axes — no shear, no rotation — and see it in the visualizer.
- Prove that a real symmetric matrix has only real eigenvalues, and that eigenvectors for distinct eigenvalues are orthogonal.
- Write the spectral decomposition $A = \sum_i \lambda_i \mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}$ as a sum of rank-1 orthogonal projectors and interpret each term.
- State the **Hermitian** complex analogue ($A = A^{*}$, real eigenvalues, unitary diagonalization) and connect it to quantum observables and the qubit.
- Verify orthogonal diagonalization in numpy with `np.linalg.eigh`, and explain why the theorem FAILS for non-symmetric matrices.
In This Chapter
- 27.1 What does a symmetric matrix DO to space?
- 27.2 What exactly does the Spectral Theorem say?
- 27.3 How do you orthogonally diagonalize a matrix? (a first worked example)
- 27.4 Why are the eigenvalues of a symmetric matrix always real?
- 27.5 Why are eigenvectors for distinct eigenvalues orthogonal?
- 27.6 What is the spectral decomposition? (a sum of rank-one projectors)
- 27.7 What can you DO with the spectral decomposition?
- 27.8 Why does the theorem fail for non-symmetric matrices?
- 27.9 What is the Hermitian analogue, and why does quantum mechanics depend on it?
- 27.10 What do the perpendicular eigen-axes look like? (the visualizer returns)
- 27.11 How does the Spectral Theorem describe a quadratic form?
- 27.12 How do you compute the spectral decomposition in code?
- 27.13 What have we built, and where does it lead?
The Spectral Theorem: Symmetric Matrices Are Always Diagonalizable (and That's Profound)
Learning paths. Math majors: read everything, especially the two motivated proofs (real eigenvalues in §27.4, orthogonal eigenvectors in §27.5) and the Math-Major Sidebars on normal matrices and the full spectral theorem. CS / Data Science: focus on the Geometric Intuition, the visualizer, the spectral decomposition of §27.6, and the PCA application; the proofs build intuition but the sidebars are optional. Physics / Engineering: focus on the geometry of principal axes, the Hermitian section §27.9 (this is the mathematics under quantum observables), and the stress-tensor and PCA case studies.
Of all the matrices in linear algebra, one family is so well-behaved that it borders on the miraculous. Take any symmetric matrix — one that equals its own transpose, $A = A^{\mathsf{T}}$ — and no matter how its entries are arranged, three guarantees fall out together. Its eigenvalues are all real (no complex numbers ever appear, even though Chapter 24 taught us a general matrix can have complex eigenvalues). Its eigenvectors can be chosen orthonormal (mutually perpendicular and unit length, the cleanest basis there is). And as a consequence it is diagonalized not by some arbitrary invertible $P$ but by an orthogonal matrix $Q$ — a rotation. Packaged together, these say that every symmetric matrix factors as
$$A = Q\Lambda Q^{\mathsf{T}},$$
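where $Q$ is orthogonal ($Q^{\mathsf{T}}Q = I$) and $\Lambda$ is a real diagonal matrix of eigenvalues. (Strictly, an orthogonal $Q$ is a rotation or a rotation composed with a reflection; flipping the sign of one eigenvector column switches between the two, so $Q$ can always be chosen to be a rotation.) This is the Spectral Theorem, and it is one of the summits of the entire book.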
We have spent two parts building toward this moment. Part IV gave us orthogonality — the geometry of perpendicular axes and the rotation matrices $Q$ that satisfy $Q^{\mathsf{T}}Q = I$ (Chapter 21). Part V gave us eigenvalues and eigenvectors — the invariant directions of a transformation (Chapter 23), the characteristic polynomial that finds them (Chapter 24), and diagonalization $A = PDP^{-1}$ when a matrix has enough independent eigenvectors (Chapter 25). The Spectral Theorem is where those two streams fuse. For a symmetric matrix the eigenvectors are not merely independent (which is all general diagonalization promised) — they are orthogonal, so the change-of-basis matrix is a rotation, and a rotation's inverse is just its transpose. The messy $P^{-1}$ of Chapter 25 collapses into a clean $Q^{\mathsf{T}}$.
Why should you care beyond the elegance? Because symmetric matrices are everywhere the real world is. A covariance matrix is symmetric, and its spectral decomposition is precisely Principal Component Analysis (Chapter 32). The Hessian that controls whether an optimization landscape curves up or down is symmetric (Chapter 28). A stress tensor in a beam, a moment-of-inertia tensor in a spinning body, the adjacency matrix of an undirected graph, the kernel matrix at the heart of a support vector machine — all symmetric, all governed by this one theorem. And its complex twin, the Hermitian matrix ($A = A^{*}$), is the mathematical bedrock of quantum mechanics: every observable quantity an experiment can measure is a Hermitian operator, and the theorem's promise of real eigenvalues is the promise that measurements come out real. The Spectral Theorem is not a corner of the subject. It is the structural reason a huge swath of applied mathematics works.
A word of warning before we begin, because it is the single most important caveat in this chapter and we will repeat it relentlessly: all of this requires symmetry. A general square matrix need not be orthogonally diagonalizable; it may not be diagonalizable at all, and when it is, its eigenvectors are generally skew, not perpendicular. The phrase "matrices are diagonalizable" is simply false, and the phrase "symmetric matrices are orthogonally diagonalizable" is the precise, true statement. State the condition every time. With that promise in place, let us see what makes symmetric matrices so special and, as always, let us look at the picture first.
27.1 What does a symmetric matrix DO to space?
Before any theorem, fix the geometric picture, because it explains everything that follows. We have a transformation $T(\mathbf{x}) = A\mathbf{x}$ acting on the plane, and we want to know what it looks like when $A$ is symmetric. The answer is the cleanest possible motion: a symmetric transformation is a pure stretch along a set of mutually perpendicular axes. It picks out some orthogonal directions, stretches space by one factor along the first, by another factor along the second, and so on — and it does nothing else. No rotation of the axes, no shearing, no twisting.
Contrast this with the general matrices we met earlier. A shear (Chapter 1) slants the grid, dragging vertical lines into diagonals. A rotation (Chapter 21) spins every direction. A general matrix (Chapter 7) does some unholy mixture of stretching, shearing, and rotating all at once, and untangling the pieces was hard. A symmetric matrix refuses all of that complication. It has a set of preferred perpendicular directions — its eigenvectors — and along each one it simply scales. If you stand on an eigen-axis, you only move toward or away from the origin; you never get swung sideways. That is the whole geometric content of symmetry.
Geometric Intuition — Picture an ellipse drawn on a rubber sheet. A symmetric transformation grabs the sheet along two perpendicular axes and stretches: maybe by a factor of 3 along one axis and a factor of 1 (no change) along the perpendicular one. A circle becomes an ellipse whose major and minor axes are exactly those two perpendicular eigen-directions. Crucially, the axes of the ellipse line up with the directions you stretched — there is no rotation mixed in. That perpendicular-stretch picture is the Spectral Theorem, drawn before any algebra.
Now connect that picture to the algebra of diagonalization from Chapter 25. To diagonalize $A = PDP^{-1}$ means: change to the coordinate system of the eigenvectors (that is $P^{-1}$), apply the pure scaling $D$ in those coordinates, and change back ($P$). For a general matrix the eigenvector coordinate system is skewed — $P$ is some arbitrary invertible matrix, and changing into and out of skewed coordinates is the awkward part. For a symmetric matrix the eigenvector axes are perpendicular, so the coordinate change is a rotation, $Q$. And rotating into eigen-coordinates, scaling, and rotating back is exactly "rotate–stretch–rotate-back" — a motion you can see in your mind's eye. The eigenvectors being orthogonal is what turns the abstract $A = PDP^{-1}$ into the concrete, visual $A = Q\Lambda Q^{\mathsf{T}}$.
The Key Insight — Symmetry of a matrix is the algebraic shadow of orthogonality of its eigen-axes. A symmetric matrix stretches space along perpendicular directions and does nothing else, which is exactly why it is diagonalized by a rotation: $A = Q\Lambda Q^{\mathsf{T}}$.
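The pure-stretch picture can be checked numerically by pushing points of the unit circle through a symmetric matrix and asking which input directions are stretched most and least. The sketch below is only an illustration: the matrix `A` and the number of sample points are assumptions chosen for convenience, not something fixed by the discussion above.

```python
import numpy as np

# Illustrative symmetric matrix (an assumption for this sketch).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Sample the unit circle and push every point through A.
theta = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # shape (2, N)
image = A @ circle                                   # points on the image ellipse
lengths = np.linalg.norm(image, axis=0)

longest = circle[:, np.argmax(lengths)]    # input direction stretched the most
shortest = circle[:, np.argmin(lengths)]   # input direction stretched the least

print("max stretch:", lengths.max(), "along", longest)
print("min stretch:", lengths.min(), "along", shortest)
print("dot product of the two directions:", longest @ shortest)
```

For this matrix the extreme stretch factors should come out close to 3 and 1, and the two extreme directions should be perpendicular up to rounding, which is the "ellipse axes line up with the stretch directions" claim in numerical form.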
27.1.1 Why "symmetric" is exactly the right condition
It is worth pausing on why the condition is symmetry, specifically, and not something else. Recall from Chapter 8 that the transpose has a geometric meaning tied to the dot product: $A^{\mathsf{T}}$ is the unique matrix satisfying $(A\mathbf{x})\cdot\mathbf{y} = \mathbf{x}\cdot(A^{\mathsf{T}}\mathbf{y})$ for all $\mathbf{x}, \mathbf{y}$. So when $A = A^{\mathsf{T}}$, we get the symmetric relationship
$$(A\mathbf{x})\cdot\mathbf{y} = \mathbf{x}\cdot(A\mathbf{y}) \qquad\text{for all } \mathbf{x}, \mathbf{y}.$$
This little identity — "$A$ can move from one slot of the dot product to the other for free" — is the engine behind every proof in this chapter. It is sometimes called the self-adjoint property, and it is the abstract heart of what "symmetric" means. A symmetric matrix interacts with the dot product (and therefore with lengths and angles) in a perfectly balanced way, and from that balance flow the real eigenvalues, the orthogonal eigenvectors, and the whole theorem. We will use this identity twice in the proofs below; keep it in your pocket.
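A quick numerical sanity check of the self-adjoint identity may help fix it in mind. The sketch below assumes a randomly generated symmetric matrix (built as $B + B^{\mathsf{T}}$, a construction chosen here for convenience) and, for contrast, a small shear matrix; the seed, sizes, and variable names are all illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed so the run is repeatable

B = rng.standard_normal((4, 4))
S = B + B.T                          # B + B^T is always symmetric
x = rng.standard_normal(4)
y = rng.standard_normal(4)

# Symmetric case: (Sx).y and x.(Sy) agree up to rounding.
sym_gap = abs((S @ x) @ y - x @ (S @ y))
print("symmetric gap:", sym_gap)

# Non-symmetric contrast: a shear, tested on the standard basis vectors.
N = np.array([[1.0, 2.0],
              [0.0, 1.0]])
e1 = np.array([1.0, 0.0])
e2 = np.array([0.0, 1.0])
lhs = (N @ e1) @ e2                  # (N e1) . e2
rhs = e1 @ (N @ e2)                  # e1 . (N e2)
via_transpose = e1 @ (N.T @ e2)      # e1 . (N^T e2), which always matches lhs
print("shear:", lhs, rhs, via_transpose)
```

For the shear, the two slots of the dot product give different numbers (0 and 2), while moving the transpose across restores equality. That is the general transpose identity, with symmetry being the special case where the transpose changes nothing.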
27.2 What exactly does the Spectral Theorem say?
Now the precise statement, with its conditions stated front and center, because the conditions are the theorem.
The Spectral Theorem (real symmetric case). Let $A$ be a real $n\times n$ matrix that is symmetric, $A = A^{\mathsf{T}}$. Then:
1. All eigenvalues of $A$ are real.
2. $A$ has an orthonormal basis of eigenvectors: $n$ mutually perpendicular unit eigenvectors spanning $\mathbb{R}^n$.
3. $A$ is orthogonally diagonalizable: there is an orthogonal matrix $Q$ (its columns the orthonormal eigenvectors) and a real diagonal matrix $\Lambda$ (its entries the eigenvalues) with
$$A = Q\Lambda Q^{\mathsf{T}}, \qquad Q^{\mathsf{T}}Q = I.$$
Conversely, any matrix of the form $Q\Lambda Q^{\mathsf{T}}$ with $Q$ orthogonal and $\Lambda$ real diagonal is symmetric. So "real symmetric" and "real orthogonally diagonalizable" are the same class of matrices.
Several things in this statement deserve emphasis. First, the word "always": every symmetric matrix satisfies all three parts, with no exceptions, no fine print about distinct eigenvalues, no defective cases. This is dramatically stronger than the diagonalization theorem of Chapter 25, which only worked when a matrix happened to have enough independent eigenvectors and could fail (the defective matrices). Symmetry is the hypothesis that guarantees diagonalizability — and not just diagonalizability, but the best possible kind, by an orthogonal matrix.
Second, notice that $Q^{-1} = Q^{\mathsf{T}}$, so we can write the factorization with a transpose instead of an inverse. Compared with the general $A = PDP^{-1}$ of Chapter 25, the symmetric case replaces the expensive, possibly unstable $P^{-1}$ (requiring Gaussian elimination, Chapter 9) with a free transpose. This is the same windfall we celebrated for orthogonal matrices in Chapter 21: inverting an orthogonal matrix costs nothing. The Spectral Theorem hands you a diagonalization in which the change of basis is a rotation, so undoing it is trivial.
Third, the "converse" half is easy and worth checking now, because it shows the symmetry is genuinely forced. If $A = Q\Lambda Q^{\mathsf{T}}$, then transposing gives $A^{\mathsf{T}} = (Q\Lambda Q^{\mathsf{T}})^{\mathsf{T}} = (Q^{\mathsf{T}})^{\mathsf{T}}\Lambda^{\mathsf{T}}Q^{\mathsf{T}} = Q\Lambda Q^{\mathsf{T}} = A$, using $(ABC)^{\mathsf{T}} = C^{\mathsf{T}}B^{\mathsf{T}}A^{\mathsf{T}}$ from Chapter 8 and the fact that a diagonal matrix is its own transpose, $\Lambda^{\mathsf{T}} = \Lambda$. So anything built as "rotate, scale by real numbers, rotate back" is automatically symmetric. The two descriptions are airtight equivalents.
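The converse direction is easy to try out numerically: build a matrix as "rotate, scale, rotate back" and check that it comes out symmetric. In the sketch below the orthogonal factor is obtained from a QR factorization of a random matrix and the diagonal entries are arbitrary; both are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# The Q factor of a QR factorization has orthonormal columns.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
Lam = np.diag([4.0, -1.0, 0.5])      # arbitrary real diagonal entries

A = Q @ Lam @ Q.T

print("Q^T Q = I ?", np.allclose(Q.T @ Q, np.eye(3)))
print("A symmetric ?", np.allclose(A, A.T))
print("eigenvalues of A:", np.linalg.eigvalsh(A))   # returned in ascending order
```

The recovered eigenvalues should match the chosen diagonal entries (sorted), which is the equivalence of the two descriptions seen from the other side.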
Warning
The Spectral Theorem requires the matrix to be symmetric ($A = A^{\mathsf{T}}$) over the reals, or Hermitian ($A = A^{*}$) over the complexes. It is false for a general matrix. A non-symmetric matrix may have complex eigenvalues (the rotation of Chapter 26), may fail to be diagonalizable at all (a defective matrix, Chapter 25), and even when it is diagonalizable its eigenvectors are generally not orthogonal. Never say "matrices are diagonalizable" or "every matrix has orthogonal eigenvectors"; both are wrong. We will see concrete counterexamples in §27.8. State the symmetry condition every single time you invoke this theorem; it is the whole point.
27.3 How do you orthogonally diagonalize a matrix? (a first worked example)
Let us make all of this concrete with the friendliest symmetric matrix there is, and carry it as the running example of the chapter:
$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$
It is symmetric — the off-diagonal entries match, $a_{12} = a_{21} = 1$ — so the Spectral Theorem applies, and we expect real eigenvalues and perpendicular eigenvectors. Let us find them by hand using the characteristic polynomial of Chapter 24.
The eigenvalues solve $\det(A - \lambda I) = 0$:
$$\det\begin{bmatrix} 2-\lambda & 1 \\ 1 & 2-\lambda \end{bmatrix} = (2-\lambda)^2 - 1 = \lambda^2 - 4\lambda + 3 = (\lambda - 1)(\lambda - 3) = 0.$$
So the eigenvalues are $\lambda_1 = 3$ and $\lambda_2 = 1$ — both real, exactly as the theorem promised (part 1). Now the eigenvectors. For $\lambda_1 = 3$, solve $(A - 3I)\mathbf{v} = \mathbf{0}$:
$$\begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \mathbf{0} \;\Longrightarrow\; v_1 = v_2 \;\Longrightarrow\; \mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
For $\lambda_2 = 1$, solve $(A - I)\mathbf{v} = \mathbf{0}$:
$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \mathbf{0} \;\Longrightarrow\; v_1 = -v_2 \;\Longrightarrow\; \mathbf{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$
Now look at what the theorem predicted (part 2): the eigenvectors should be orthogonal. Check their dot product: $\mathbf{v}_1\cdot\mathbf{v}_2 = (1)(1) + (1)(-1) = 0$. They are perpendicular — automatically, without our doing anything to arrange it. This is no accident; it is part 2 of the Spectral Theorem in action, and we will prove in §27.5 that distinct eigenvalues of any symmetric matrix force orthogonal eigenvectors.
To build the orthogonal $Q$, we only need to normalize these perpendicular eigenvectors to unit length. Each has length $\sqrt{1^2 + 1^2} = \sqrt 2$, so we divide by $\sqrt 2$:
$$\mathbf{q}_1 = \frac{1}{\sqrt 2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad \mathbf{q}_2 = \frac{1}{\sqrt 2}\begin{bmatrix} 1 \\ -1 \end{bmatrix}, \qquad Q = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \Lambda = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$
That $Q$ is orthogonal — its columns are perpendicular unit vectors, so $Q^{\mathsf{T}}Q = I$ (you can check the three dot products as in Chapter 21). And the Spectral Theorem claims $A = Q\Lambda Q^{\mathsf{T}}$. Let us verify the reconstruction by hand:
$$Q\Lambda Q^{\mathsf{T}} = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}\frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 3 & 1 \\ 3 & -1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$
Exactly $A$. We have orthogonally diagonalized our first symmetric matrix: rotate into the eigen-axes (the $\pm 45°$ diagonals), stretch by 3 along one and by 1 along the other, rotate back. Notice the recipe is easier than ordinary diagonalization, because the final step is a transpose, not a matrix inverse — there is no elimination to do.
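The hand computation can be replayed in a few lines of numpy. This sketch deliberately types in the $Q$ and $\Lambda$ found by hand instead of calling an eigen-solver, so it checks the arithmetic of this section and nothing more; the variable names are illustrative.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

s = 1.0 / np.sqrt(2.0)
Q = s * np.array([[1.0,  1.0],
                  [1.0, -1.0]])      # columns q1, q2 from the hand computation
Lam = np.diag([3.0, 1.0])            # eigenvalues in the same column order

# Each column of Q should satisfy A q = lambda q.
for lam, q in zip(np.diag(Lam), Q.T):
    assert np.allclose(A @ q, lam * q)

print(Q.T @ Q)          # identity, up to rounding
print(Q @ Lam @ Q.T)    # reproduces A, up to rounding
print(np.linalg.det(Q))
```

One detail worth noticing: the determinant of this particular $Q$ is $-1$, so strictly speaking it is a reflection. Negating either column gives a determinant $+1$ rotation that diagonalizes $A$ just as well, since the negative of a unit eigenvector is still a unit eigenvector.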
Common Pitfall — When you build $Q$ for a symmetric matrix, do not forget to normalize the eigenvectors to unit length. Ordinary diagonalization (Chapter 25) lets you use any eigenvectors as the columns of $P$, scaled however you like, because $P^{-1}$ adjusts automatically. But orthogonal diagonalization needs $Q^{\mathsf{T}}Q = I$, which requires orthonormal columns — perpendicular and unit length. The perpendicularity comes free from the Spectral Theorem; the unit length you must impose by dividing each eigenvector by its norm. Skip the normalization and you get $A = P\Lambda P^{-1}$ with a non-orthogonal $P$, losing the entire benefit ($P^{-1} \ne P^{\mathsf{T}}$).
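The pitfall is easy to reproduce. The sketch below uses the unnormalized eigenvectors $(1,1)$ and $(1,-1)$ of the running example as the columns of a matrix `P` (the names are illustrative) and compares the correct inverse-based reconstruction with the tempting transpose shortcut.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
Lam = np.diag([3.0, 1.0])

# Unnormalized eigenvectors (1, 1) and (1, -1) as columns.
P = np.array([[1.0,  1.0],
              [1.0, -1.0]])

with_inverse = P @ Lam @ np.linalg.inv(P)   # ordinary diagonalization
with_transpose = P @ Lam @ P.T              # wrong shortcut: P is not orthogonal

print(with_inverse)      # reproduces A
print(with_transpose)    # off by a factor of 2 for this P

# Normalizing each column to unit length repairs the shortcut.
Q = P / np.linalg.norm(P, axis=0)
print(Q @ Lam @ Q.T)     # reproduces A
```

Here the columns of `P` are perpendicular with length $\sqrt 2$, so $P^{\mathsf{T}}P = 2I$ and the transpose shortcut returns $2A$ instead of $A$. In general the error is not a clean scalar multiple.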
27.4 Why are the eigenvalues of a symmetric matrix always real?
Here is the first of the chapter's two signature proofs, and it is genuinely surprising. Chapter 24 warned us that the characteristic polynomial of a real matrix can have complex roots: the rotation matrix of Chapter 26 had eigenvalues $e^{\pm i\theta}$, with no real eigenvalue at all unless $\theta$ is a multiple of $\pi$. Yet for a symmetric matrix, the complex numbers never appear: every eigenvalue is real. Why should symmetry rule out complex eigenvalues? Let us earn the answer with the full proof treatment.
1. Why we care. Real eigenvalues are what make the geometric picture of §27.1 possible: a real eigenvalue means a genuine stretch factor along a real direction in space. If a symmetric matrix could have a complex eigenvalue, it would have a hidden rotational component (Chapter 26), and the "pure stretch along perpendicular axes" picture would collapse. Real eigenvalues are also non-negotiable in applications: the principal stresses in a beam, the variances along principal components, and — most strikingly — the possible outcomes of a quantum measurement are all eigenvalues of symmetric (or Hermitian) matrices, and every one of those must be a real number. The theorem is what guarantees it.
2. Key idea. To even ask whether an eigenvalue is real, we have to allow the possibility that it is complex and then show that its imaginary part must vanish. So we temporarily work over $\mathbb{C}$, using the conjugate transpose $A^{*}$ (Chapter 21). The crux is a one-line computation of the quantity $\mathbf{v}^{*}A\mathbf{v}$ done two ways: symmetry forces it to equal its own conjugate, and a number that equals its own conjugate must be real.
3. Proof. Let $A$ be real symmetric, so $A = A^{\mathsf{T}}$, and because its entries are real, $\overline{A} = A$, which together give $A^{*} = \overline{A}^{\mathsf{T}} = A^{\mathsf{T}} = A$. (A real symmetric matrix is its own conjugate transpose, that is, it is Hermitian; hold that thought for §27.9.) Suppose $\lambda$ is an eigenvalue with a (possibly complex) eigenvector $\mathbf{v} \ne \mathbf{0}$, so $A\mathbf{v} = \lambda\mathbf{v}$. Consider the scalar
$$\alpha = \mathbf{v}^{*}A\mathbf{v}.$$
Compute it one way using the eigen-equation: $\alpha = \mathbf{v}^{*}(A\mathbf{v}) = \mathbf{v}^{*}(\lambda\mathbf{v}) = \lambda\,(\mathbf{v}^{*}\mathbf{v}) = \lambda\lVert\mathbf{v}\rVert^2$. Now compute the conjugate $\overline{\alpha}$. Since $\alpha$ is a $1\times 1$ matrix, its conjugate is the same as its conjugate transpose, and the conjugate transpose of a product reverses the order of the factors: $\overline{\alpha} = (\mathbf{v}^{*}A\mathbf{v})^{*} = \mathbf{v}^{*}A^{*}(\mathbf{v}^{*})^{*} = \mathbf{v}^{*}A^{*}\mathbf{v}$. But $A^{*} = A$, so
$$\overline{\alpha} = \mathbf{v}^{*}A\mathbf{v} = \alpha.$$
The number $\alpha$ equals its own complex conjugate, so $\alpha$ is real. Therefore $\lambda\lVert\mathbf{v}\rVert^2$ is real. And $\lVert\mathbf{v}\rVert^2 = \mathbf{v}^{*}\mathbf{v} = \sum_i \lvert v_i\rvert^2$ is a positive real number (the eigenvector is nonzero). A real number divided by a positive real number is real, so $\lambda = \alpha / \lVert\mathbf{v}\rVert^2$ is real. $\blacksquare$
4. What this means. The single fact $A^{*} = A$ (the matrix is its own conjugate transpose) squeezed the complex possibility out of the eigenvalue. Geometrically, this is the algebraic guarantee behind §27.1's picture: because every eigenvalue is a real stretch factor, a symmetric matrix has no rotational component anywhere, and its action decomposes into honest stretches along real directions. Notice that the proof actually used only $A^{*} = A$, not $A^{\mathsf{T}} = A$ specifically, which is exactly why the same argument will give real eigenvalues for complex Hermitian matrices in §27.9, the matrices of quantum mechanics.
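The key step of the proof, that $\mathbf{v}^{*}A\mathbf{v}$ equals its own conjugate, holds for every complex vector $\mathbf{v}$, not only for eigenvectors, and that makes it easy to test. The sketch below assumes a random symmetric matrix and a random complex vector (the seed and names are illustrative), and contrasts the result with the eigenvalues of a 90-degree rotation.

```python
import numpy as np

rng = np.random.default_rng(2)

B = rng.standard_normal((3, 3))
A = B + B.T                                   # real symmetric, so A* = A

# An arbitrary complex vector; it need not be an eigenvector.
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

alpha = np.vdot(v, A @ v)                     # v* A v (vdot conjugates its first argument)
print("alpha:", alpha)                        # imaginary part is zero up to rounding

# Contrast: a 90-degree rotation is not symmetric, and its eigenvalues are +i and -i.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])
rot_eigs = np.linalg.eigvals(R)
print("rotation eigenvalues:", rot_eigs)
```

When $\mathbf{v}$ happens to be an eigenvector, the same real number $\alpha$ equals $\lambda\lVert\mathbf{v}\rVert^2$, which is how the proof transfers realness from $\alpha$ to $\lambda$.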
Math-Major Sidebar — The proof quietly used that a complex eigenvalue exists in the first place. This is guaranteed by the Fundamental Theorem of Algebra: the characteristic polynomial $\det(A - \lambda I)$ has degree $n$ and therefore has $n$ roots in $\mathbb{C}$, counted with multiplicity (Chapter 24). The Spectral Theorem then says, for symmetric $A$, that all $n$ of those roots happen to land on the real line. A slicker, coordinate-free proof avoids characteristic polynomials entirely: it maximizes the Rayleigh quotient $R(\mathbf{x}) = \dfrac{\mathbf{x}^{\mathsf{T}}A\mathbf{x}}{\mathbf{x}^{\mathsf{T}}\mathbf{x}}$ over the unit sphere, shows the maximizer is an eigenvector with the largest eigenvalue (a real number, since the quotient is real), then restricts to the orthogonal complement and repeats — building the orthonormal eigenbasis one direction at a time. That variational view is the one that generalizes to infinite-dimensional Hilbert spaces (Chapter 34) and is the rigorous foundation of PCA's "direction of maximum variance" (Chapter 32).
27.5 Why are eigenvectors for distinct eigenvalues orthogonal?
The second signature proof explains the most beautiful feature of symmetric matrices: their eigenvectors are not just independent but perpendicular. We saw it happen in §27.3 — the eigenvectors $(1,1)$ and $(1,-1)$ came out orthogonal on their own. Now we prove it always happens, for any symmetric matrix, whenever the eigenvalues differ.
1. Why we care. Orthogonality of the eigenvectors is precisely what upgrades ordinary diagonalization ($A = PDP^{-1}$) to orthogonal diagonalization ($A = Q\Lambda Q^{\mathsf{T}}$). If the eigenvectors are perpendicular, we can normalize them into an orthonormal basis, the change-of-basis matrix becomes a rotation $Q$, and its inverse becomes a free transpose. Every computational and conceptual advantage of the Spectral Theorem — the cheap inversion, the stable numerics, the clean "rotate–stretch–rotate-back" geometry, the spectral decomposition of §27.6 — rests on this orthogonality. Without it, we would have eigenvectors but no rotation.
2. Key idea. Use the self-adjoint identity from §27.1.1, $(A\mathbf{x})\cdot\mathbf{y} = \mathbf{x}\cdot(A\mathbf{y})$, on two eigenvectors. Evaluating it makes the two different eigenvalues appear on the two sides; since they differ, the dot product they multiply is forced to be zero.
3. Proof. Let $A$ be real symmetric, and let $\mathbf{u}, \mathbf{w}$ be eigenvectors for distinct eigenvalues $\lambda \ne \mu$:
$$A\mathbf{u} = \lambda\mathbf{u}, \qquad A\mathbf{w} = \mu\mathbf{w}.$$
Compute the scalar $(A\mathbf{u})\cdot\mathbf{w}$ in two ways. First, substitute the eigen-equation for $\mathbf{u}$:
$$(A\mathbf{u})\cdot\mathbf{w} = (\lambda\mathbf{u})\cdot\mathbf{w} = \lambda\,(\mathbf{u}\cdot\mathbf{w}).$$
Second, use the symmetry of $A$ to slide it across the dot product. Because $A = A^{\mathsf{T}}$, the identity $(A\mathbf{u})\cdot\mathbf{w} = \mathbf{u}\cdot(A^{\mathsf{T}}\mathbf{w}) = \mathbf{u}\cdot(A\mathbf{w})$ holds, and now substitute the eigen-equation for $\mathbf{w}$:
$$(A\mathbf{u})\cdot\mathbf{w} = \mathbf{u}\cdot(A\mathbf{w}) = \mathbf{u}\cdot(\mu\mathbf{w}) = \mu\,(\mathbf{u}\cdot\mathbf{w}).$$
We have computed the same number two ways, so the results are equal:
$$\lambda\,(\mathbf{u}\cdot\mathbf{w}) = \mu\,(\mathbf{u}\cdot\mathbf{w}) \;\Longrightarrow\; (\lambda - \mu)\,(\mathbf{u}\cdot\mathbf{w}) = 0.$$
By hypothesis the eigenvalues are distinct, so $\lambda - \mu \ne 0$. The only way the product can vanish is if the other factor is zero:
$$\mathbf{u}\cdot\mathbf{w} = 0.$$
The eigenvectors are orthogonal. $\blacksquare$
4. What this means. The symmetry of $A$ let us move it from one side of the dot product to the other, and that single move made the two distinct eigenvalues collide. Their difference, being nonzero, had nowhere to hide except by forcing the eigenvectors perpendicular. Geometrically: along each eigen-direction the symmetric matrix stretches by a different factor, and a transformation that stretches by different amounts along non-perpendicular directions would have to shear — but symmetry forbids shear (§27.1), so the directions are pinned at right angles. The algebra ("$\lambda \ne \mu$ forces $\mathbf{u}\cdot\mathbf{w} = 0$") and the geometry ("different stretch factors live on perpendicular axes") are the same fact.
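To see that symmetry, and not merely distinct eigenvalues, is doing the work, it helps to put a symmetric and a non-symmetric matrix side by side. The sketch below uses the running example and a small upper-triangular matrix chosen for illustration; the eigenpairs are entered by hand and confirmed with assertions.

```python
import numpy as np

# Symmetric: eigenpairs (3, (1, 1)) and (1, (1, -1)).
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])
u = np.array([1.0, 1.0])
w = np.array([1.0, -1.0])
assert np.allclose(S @ u, 3 * u) and np.allclose(S @ w, 1 * w)
print("symmetric:     u . w =", u @ w)

# Non-symmetric with distinct eigenvalues: eigenpairs (2, (1, 0)) and (3, (1, 1)).
N = np.array([[2.0, 1.0],
              [0.0, 3.0]])
a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
assert np.allclose(N @ a, 2 * a) and np.allclose(N @ b, 3 * b)
print("non-symmetric: a . b =", a @ b)

# The step of the proof that breaks: N cannot slide across the dot product.
print("(Na).b =", (N @ a) @ b, "   a.(Nb) =", a @ (N @ b))
```

For the non-symmetric matrix the two sides of the sliding step are $2$ and $3$, so the equation $(\lambda - \mu)(\mathbf{u}\cdot\mathbf{w}) = 0$ never arises and nothing forces the eigenvectors apart to a right angle.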
Check Your Understanding — A symmetric matrix has eigenvalues $5, 2, -3$ with eigenvectors $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$. Without computing anything, what is $\mathbf{u}_1 \cdot \mathbf{u}_3$? What about the angle between $\mathbf{u}_2$ and $\mathbf{u}_3$?
Answer
Since the eigenvalues are all distinct, §27.5 guarantees the eigenvectors are mutually orthogonal. So $\mathbf{u}_1 \cdot \mathbf{u}_3 = 0$, and the angle between $\mathbf{u}_2$ and $\mathbf{u}_3$ is $90°$ — for free, with no calculation. (This is the payoff of symmetry: orthogonality is automatic for distinct eigenvalues, which is what lets us build the rotation $Q$.)
27.5.1 What about repeated eigenvalues?
The proof in §27.5 handled distinct eigenvalues. What if an eigenvalue is repeated — say $\lambda$ appears twice? Then the two eigenvectors share an eigenvalue, $\lambda - \mu = 0$, and the argument no longer forces them orthogonal. Does the Spectral Theorem break?
It does not, and here is why. When an eigenvalue $\lambda$ is repeated with multiplicity $k$, a symmetric matrix always has a full $k$-dimensional eigenspace for it (its geometric multiplicity equals its algebraic multiplicity — symmetric matrices are never defective, unlike the general case of Chapter 25). Within that $k$-dimensional eigenspace, every nonzero vector is an eigenvector for $\lambda$, so we are free to choose an orthonormal basis of the eigenspace — using Gram–Schmidt from Chapter 20 if we like. Eigenvectors from different eigenspaces are orthogonal by §27.5; eigenvectors within one eigenspace we orthogonalize by hand. Either way we end up with a full orthonormal basis of eigenvectors. The cleanest illustration is the identity matrix $I$: every eigenvalue is $1$ (maximally repeated), every vector is an eigenvector, and we simply pick any orthonormal basis of $\mathbb{R}^n$ — the standard basis will do. The theorem holds; repeated eigenvalues give us freedom in choosing the eigen-axes, not an obstruction.
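A small example with a genuinely repeated eigenvalue shows the "orthogonalize by hand" step. The matrix below is an illustrative choice (not from the text): it has eigenvalue $4$ once and eigenvalue $1$ twice, and the two obvious eigenvectors for $1$ are not perpendicular until one Gram–Schmidt step is applied.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 1.0],
              [1.0, 1.0, 2.0]])   # eigenvalues 4, 1, 1

v1 = np.array([1.0, 1.0, 1.0])    # eigenvector for 4
v2 = np.array([1.0, -1.0, 0.0])   # eigenvector for 1
v3 = np.array([1.0, 0.0, -1.0])   # eigenvector for 1, not orthogonal to v2

print("v2 . v3 before:", v2 @ v3)

# One Gram-Schmidt step inside the eigenspace for lambda = 1.
q2 = v2 / np.linalg.norm(v2)
r3 = v3 - (q2 @ v3) * q2          # remove the component of v3 along q2
q3 = r3 / np.linalg.norm(r3)
q1 = v1 / np.linalg.norm(v1)

Q = np.column_stack([q1, q2, q3])
Lam = np.diag([4.0, 1.0, 1.0])

print("q3 still an eigenvector ?", np.allclose(A @ q3, q3))
print("Q^T Q = I ?", np.allclose(Q.T @ Q, np.eye(3)))
print("Q Lam Q^T = A ?", np.allclose(Q @ Lam @ Q.T, A))
```

The Gram–Schmidt step stays inside the eigenspace, so the adjusted vector is still an eigenvector for $1$. A different starting pair would give a different but equally valid $Q$, which is the freedom the paragraph above describes.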
Math-Major Sidebar: The claim that a symmetric matrix is never defective (that geometric multiplicity always equals algebraic multiplicity) is itself a consequence of the Spectral Theorem and is false for general matrices (the shear of §27.8 is defective). The cleanest rigorous route proves the whole theorem by induction on $n$: extract one unit eigenvector $\mathbf{q}_1$ for a real eigenvalue (which exists by §27.4), restrict $A$ to the orthogonal complement $\mathbf{q}_1^{\perp}$, observe that symmetry makes this complement invariant under $A$ (if $\mathbf{x}\perp\mathbf{q}_1$ then $A\mathbf{x}\perp\mathbf{q}_1$, because $(A\mathbf{x})\cdot\mathbf{q}_1 = \mathbf{x}\cdot(A\mathbf{q}_1) = \lambda_1(\mathbf{x}\cdot\mathbf{q}_1) = 0$), and apply the inductive hypothesis to the $(n-1)\times(n-1)$ symmetric restriction. The orthogonal-complement-is-invariant step is precisely where symmetry is indispensable; it fails for non-symmetric matrices, which is the deep reason they need not be orthogonally diagonalizable.
27.6 What is the spectral decomposition? (a sum of rank-one projectors)
There is a second, even more illuminating way to write the Spectral Theorem, and it reveals what $A$ "really is" with unusual clarity. Instead of the factored form $A = Q\Lambda Q^{\mathsf{T}}$, we expand the product into a sum. The result, the spectral decomposition, is
$$\boxed{\,A = \sum_{i=1}^{n} \lambda_i\, \mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}\, } \;=\; \lambda_1\,\mathbf{q}_1\mathbf{q}_1^{\mathsf{T}} + \lambda_2\,\mathbf{q}_2\mathbf{q}_2^{\mathsf{T}} + \cdots + \lambda_n\,\mathbf{q}_n\mathbf{q}_n^{\mathsf{T}},$$
where the $\mathbf{q}_i$ are the orthonormal eigenvectors and the $\lambda_i$ their eigenvalues. Let us first understand each piece, then verify the formula, then see why it is so useful.
27.6.1 What is a rank-one projector $\mathbf{q}\mathbf{q}^{\mathsf{T}}$?
Each term contains the object $\mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}$, a column vector times a row vector. By the outer-product rule of Chapter 8, this is an $n\times n$ matrix (not a scalar; that would be $\mathbf{q}_i^{\mathsf{T}}\mathbf{q}_i = 1$). And it is a familiar one: when $\mathbf{q}_i$ is a unit vector, $P_i = \mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}$ is exactly the orthogonal projection matrix onto the line spanned by $\mathbf{q}_i$, the projection we built in Chapter 19. Applying it to any vector $\mathbf{x}$ gives
$$P_i\mathbf{x} = \mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}\mathbf{x} = \mathbf{q}_i\,(\mathbf{q}_i^{\mathsf{T}}\mathbf{x}) = (\mathbf{q}_i\cdot\mathbf{x})\,\mathbf{q}_i,$$
which is "the component of $\mathbf{x}$ along $\mathbf{q}_i$." It is a rank-one matrix (its column space is the single line through $\mathbf{q}_i$), it is symmetric ($P_i^{\mathsf{T}} = P_i$), and it is idempotent ($P_i^2 = P_i$ — projecting twice is the same as projecting once). These projectors interlock perfectly: because the eigenvectors are orthonormal, $P_i P_j = 0$ for $i \ne j$ (projecting onto one axis then a perpendicular one annihilates everything), and they sum to the identity, $\sum_i P_i = I$ (every vector is the sum of its components along a complete set of perpendicular axes — the orthonormal-basis expansion of Chapter 20). The set $\{P_1, \dots, P_n\}$ is called a resolution of the identity.
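Each of the projector properties listed above can be confirmed directly. The sketch below uses the orthonormal pair from §27.3 as a concrete case; `np.outer` forms $\mathbf{q}\mathbf{q}^{\mathsf{T}}$, and the dictionary of named checks is just a convenient way to print the results.

```python
import numpy as np

s = 1.0 / np.sqrt(2.0)
q1 = s * np.array([1.0, 1.0])
q2 = s * np.array([1.0, -1.0])

P1 = np.outer(q1, q1)   # q1 q1^T, a 2x2 matrix
P2 = np.outer(q2, q2)   # q2 q2^T

checks = {
    "symmetric":   np.allclose(P1, P1.T),
    "idempotent":  np.allclose(P1 @ P1, P1),
    "trace 1":     np.isclose(np.trace(P1), 1.0),   # for a projector, trace equals rank
    "P1 P2 = 0":   np.allclose(P1 @ P2, 0.0),
    "P1 + P2 = I": np.allclose(P1 + P2, np.eye(2)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

All five checks should print `True`; the last two together are the resolution of the identity for this two-dimensional example.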
27.6.2 Reading the decomposition
With the projectors understood, the spectral decomposition tells a vivid story. The formula $A = \sum_i \lambda_i P_i$ says: to apply $A$ to a vector, split the vector into its components along the orthonormal eigen-axes, scale the $i$-th component by $\lambda_i$, and add the pieces back up. That is the entire action of the matrix, laid out as a recipe. Apply $A$ to $\mathbf{x}$:
$$A\mathbf{x} = \sum_i \lambda_i\,\mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}\mathbf{x} = \sum_i \lambda_i (\mathbf{q}_i\cdot\mathbf{x})\,\mathbf{q}_i.$$
Decompose $\mathbf{x}$ along the eigen-axes, stretch each coordinate by its eigenvalue, reassemble. This is the same "pure stretch along perpendicular axes" of §27.1, now written as an explicit sum. A symmetric matrix is nothing but a weighted sum of perpendicular projections, the weights being the eigenvalues.
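The "split, scale, reassemble" recipe can be written as a three-step computation and compared with an ordinary matrix-vector product. The sketch below uses the eigenpairs of the §27.3 example and an arbitrary test vector; the names are illustrative.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
s = 1.0 / np.sqrt(2.0)
eigenpairs = [(3.0, s * np.array([1.0, 1.0])),
              (1.0, s * np.array([1.0, -1.0]))]

x = np.array([1.0, 0.0])   # arbitrary test vector

# Split x along each eigen-axis, scale by the eigenvalue, reassemble.
Ax_spectral = sum(lam * (q @ x) * q for lam, q in eigenpairs)

print("via A @ x:       ", A @ x)
print("via spectral sum:", Ax_spectral)
```

For this `x` the components along the two axes are both $1/\sqrt 2$, so the sum is $\tfrac32(1,1) + \tfrac12(1,-1) = (2,1)$, matching the first column of $A$.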
Let us verify the decomposition on the running example $A = \begin{bmatrix}2 & 1\\ 1 & 2\end{bmatrix}$, with $\lambda_1 = 3$, $\mathbf{q}_1 = \tfrac{1}{\sqrt2}(1,1)$ and $\lambda_2 = 1$, $\mathbf{q}_2 = \tfrac{1}{\sqrt2}(1,-1)$. The two rank-one projectors are
$$\mathbf{q}_1\mathbf{q}_1^{\mathsf{T}} = \frac12\begin{bmatrix} 1 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 1 \end{bmatrix} = \frac12\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad \mathbf{q}_2\mathbf{q}_2^{\mathsf{T}} = \frac12\begin{bmatrix} 1 \\ -1 \end{bmatrix}\begin{bmatrix} 1 & -1 \end{bmatrix} = \frac12\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}.$$
Now form the weighted sum $\lambda_1\mathbf{q}_1\mathbf{q}_1^{\mathsf{T}} + \lambda_2\mathbf{q}_2\mathbf{q}_2^{\mathsf{T}}$:
$$3\cdot\frac12\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} + 1\cdot\frac12\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} = \frac12\begin{bmatrix} 3 & 3 \\ 3 & 3 \end{bmatrix} + \frac12\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} = \frac12\begin{bmatrix} 4 & 2 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} = A.$$
It reconstructs $A$ exactly. The matrix is the projector onto the $45°$ line weighted by 3, plus the projector onto the $-45°$ line weighted by 1. That is the most honest description of what this matrix does: stretch by 3 along $(1,1)$, leave $(1,-1)$ alone.
Geometric Intuition — The spectral decomposition is the Spectral Theorem's deepest geometric statement: a symmetric matrix is a weighted sum of perpendicular projections. Each eigenvalue $\lambda_i$ is the weight, and each projector $\mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}$ picks out one perpendicular axis. Eigenvalues of large magnitude mark the directions the matrix stretches hard; eigenvalues near zero mark the directions it nearly flattens (a zero eigenvalue collapses that direction entirely), and a negative eigenvalue flips its direction. This is why dropping the terms with the smallest $|\lambda_i|$ gives the best low-rank approximation of $A$, the foundation of PCA (Chapter 32) and, generalized to non-symmetric matrices, the SVD-based image compression of Chapter 31.
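To make the low-rank remark concrete, here is a hedged sketch of truncating a spectral sum. The $3\times 3$ matrix `B` is a hypothetical example chosen for illustration, and the error is measured in the spectral norm (`np.linalg.norm(..., 2)`), where for a symmetric matrix the error of the truncated sum equals the largest dropped $|\lambda_i|$.

```python
import numpy as np

# Hypothetical symmetric matrix, chosen only for illustration.
B = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 0.5]])

w, Q = np.linalg.eigh(B)
order = np.argsort(-np.abs(w))    # indices sorted by decreasing |lambda|

k = 2                             # keep the k terms with largest |lambda|
B_k = sum(w[i] * np.outer(Q[:, i], Q[:, i]) for i in order[:k])

err = np.linalg.norm(B - B_k, 2)  # spectral-norm error of the truncation
dropped = np.abs(w[order[k]])     # largest |lambda| that was dropped

print("rank of B_k:", np.linalg.matrix_rank(B_k))
print("error      :", err)
print("dropped |λ|:", dropped)
```

The two printed numbers should agree: what you lose by truncating is exactly the weight of the first projector you discarded.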
27.6.3 numpy verification with eigh
Now confirm everything in code, using the function numpy provides specifically for symmetric (and Hermitian) matrices: np.linalg.eigh. (Recall the indexing convention from earlier chapters: mathematics writes $\mathbf{q}_1, \lambda_1$ one-indexed, while numpy's Q[:,0], w[0] are zero-indexed — the same objects, shifted labels.)
# Orthogonal diagonalization of a symmetric matrix via np.linalg.eigh.
import numpy as np
A = np.array([[2.0, 1.0],
[1.0, 2.0]]) # symmetric: A == A.T
w, Q = np.linalg.eigh(A) # eigh: for symmetric/Hermitian A
print("symmetric? ", np.allclose(A, A.T))
print("eigenvalues w =", w) # ascending order
print("Q =\n", np.round(Q, 6))
print("Q^T Q =\n", np.round(Q.T @ Q, 12)) # orthonormal columns
print("Q Λ Q^T =\n", np.round(Q @ np.diag(w) @ Q.T, 10)) # reconstructs A
symmetric? True
eigenvalues w = [1. 3.]
Q =
[[-0.707107 0.707107]
[ 0.707107 0.707107]]
Q^T Q =
[[1. 0.]
[0. 1.]]
Q Λ Q^T =
[[2. 1.]
[1. 2.]]
Three confirmations: the eigenvalues are real ([1, 3]), $Q$ has orthonormal columns ($Q^{\mathsf{T}}Q = I$ to twelve decimals), and $Q\Lambda Q^{\mathsf{T}}$ reconstructs $A$. Now the rank-one sum:
# The spectral decomposition: A = Σ λ_i q_i q_i^T (sum of rank-1 projectors).
recon = np.zeros_like(A)
for i in range(A.shape[0]):
    q_i = Q[:, i]                    # the i-th orthonormal eigenvector
    P_i = np.outer(q_i, q_i)         # rank-1 projector q_i q_i^T
    recon += w[i] * P_i
    print(f"λ={w[i]:.0f}, q q^T =\n{np.round(P_i, 4)}")
print("Σ λ_i q_i q_i^T =\n", np.round(recon, 8)) # equals A
λ=1, q q^T =
[[ 0.5 -0.5]
[-0.5 0.5]]
λ=3, q q^T =
[[0.5 0.5]
[0.5 0.5]]
Σ λ_i q_i q_i^T =
[[2. 1.]
[1. 2.]]
The sum of the two weighted rank-one projectors reproduces $A$ to machine precision. The hand computation of §27.6.2 and the code agree completely.
Computational Note — Always use np.linalg.eigh (the "h" is for Hermitian) for symmetric or Hermitian matrices, not the general np.linalg.eig. Three reasons. First, eigh exploits symmetry to run faster and far more accurately. Second, eigh is guaranteed to return real eigenvalues and orthonormal eigenvectors, exactly the Spectral Theorem's promise, whereas eig may return eigenvalues with tiny spurious imaginary parts like 3+2e-16j from rounding, and eigenvectors that are not quite orthonormal. Third, eigh returns the eigenvalues in ascending order (here [1, 3]), which is convenient and predictable; eig returns them in no particular order. Note that the sign of each eigenvector column is arbitrary: eigh returned $\mathbf{q}$ for $\lambda=1$ as $(-0.707, 0.707)$, the negative of our hand choice $(0.707, -0.707)$. That is fine, since $-\mathbf{q}$ is just as valid an eigenvector and the projector $\mathbf{q}\mathbf{q}^{\mathsf{T}} = (-\mathbf{q})(-\mathbf{q})^{\mathsf{T}}$ is identical either way.
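Two practical points about eigh are worth seeing in code. The sketch below first confirms that flipping an eigenvector's sign leaves its projector unchanged. It then illustrates a caveat from NumPy's documented behavior: eigh reads only one triangle of its input (the lower one by default, selectable with the `UPLO` argument) and does not check symmetry for you. The matrix `A_bad` is a hypothetical non-symmetric input used only to show this.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
w, Q = np.linalg.eigh(A)

# Sign ambiguity: q and -q give the same rank-one projector.
q = Q[:, 0]
same_projector = np.allclose(np.outer(q, q), np.outer(-q, -q))
print("projector unchanged by sign flip:", same_projector)

# eigh does NOT verify symmetry; it uses one triangle and ignores the other.
A_bad = np.array([[2.0, 5.0],
                  [1.0, 2.0]])                 # hypothetical, not symmetric
w_lower = np.linalg.eigh(A_bad)[0]             # default UPLO="L": sees [[2, 1], [1, 2]]
w_upper = np.linalg.eigh(A_bad, UPLO="U")[0]   # UPLO="U": sees [[2, 5], [5, 2]]
print("using lower triangle:", w_lower)
print("using upper triangle:", w_upper)
print("actually symmetric?  ", np.allclose(A_bad, A_bad.T))
```

The design lesson is to test `np.allclose(A, A.T)` yourself before trusting eigh's output on a matrix whose symmetry is not guaranteed by construction.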
27.7 What can you DO with the spectral decomposition?
The spectral decomposition is not just a pretty way to write a symmetric matrix — it is a computational superpower, because once you know the eigenvalues and the orthonormal eigenvectors, a whole family of hard matrix operations become trivial. The reason is simple and worth stating as a slogan: in eigen-coordinates a symmetric matrix is just a list of numbers (the eigenvalues), so anything you can do to numbers you can do to the matrix. Let us see three payoffs, each of which we will lean on later.
27.7.1 Powers of a symmetric matrix are free
Suppose you need $A^k$ for a symmetric $A$, perhaps to run a dynamical system $\mathbf{x}_{k} = A^k\mathbf{x}_0$ forward many steps, as in the Markov chains and population models of Chapter 25. Forming the product $A\cdot A\cdots A$ directly costs $k-1$ matrix multiplications. But the spectral form collapses the work. Because $Q^{\mathsf{T}}Q = I$, the middle factors telescope:
$$A^2 = (Q\Lambda Q^{\mathsf{T}})(Q\Lambda Q^{\mathsf{T}}) = Q\Lambda(Q^{\mathsf{T}}Q)\Lambda Q^{\mathsf{T}} = Q\Lambda^2 Q^{\mathsf{T}},$$
and the same cancellation, repeated, gives the general rule
$$A^k = Q\Lambda^k Q^{\mathsf{T}}, \qquad \Lambda^k = \operatorname{diag}(\lambda_1^k, \dots, \lambda_n^k).$$
Raising a diagonal matrix to the $k$-th power means raising each eigenvalue to the $k$-th power — trivial. So computing $A^k$ reduces to raising $n$ scalars to a power, no matter how large $k$ is. Equivalently, in spectral-sum form, $A^k = \sum_i \lambda_i^k\,\mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}$: each perpendicular projection is simply re-weighted by $\lambda_i^k$. For our running matrix $A = \begin{psmallmatrix}2&1\\1&2\end{psmallmatrix}$ with eigenvalues $3$ and $1$, the cube is
$$A^3 = 3^3\,\mathbf{q}_1\mathbf{q}_1^{\mathsf{T}} + 1^3\,\mathbf{q}_2\mathbf{q}_2^{\mathsf{T}} = \frac{27}{2}\begin{bmatrix}1&1\\1&1\end{bmatrix} + \frac{1}{2}\begin{bmatrix}1&-1\\-1&1\end{bmatrix} = \begin{bmatrix}14&13\\13&14\end{bmatrix},$$
which you can confirm equals $A\cdot A\cdot A$ directly. Notice what the eigenvalues predict about the long run: the $\lambda = 3$ direction grows like $3^k$ while the $\lambda = 1$ direction stays put, so for large $k$ the matrix $A^k$ is dominated by its largest-eigenvalue projector — the entries grow like $3^k/2$ and the direction $(1,1)$ takes over. This is exactly the dominant-eigenvector phenomenon that powers PageRank in Chapter 29, here made precise by the spectral decomposition.
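The power rule and the long-run dominance claim can both be checked in a few lines. This is a sketch assuming the running matrix and an arbitrary illustrative exponent `k = 10`; `np.linalg.matrix_power` serves as the direct comparison.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
w, Q = np.linalg.eigh(A)                  # w = [1, 3]

k = 10                                    # illustrative exponent
A_k = Q @ np.diag(w**k) @ Q.T             # spectral route: raise eigenvalues to the k-th power
direct = np.linalg.matrix_power(A, k)     # repeated multiplication, for comparison

# Dominance: A^k / 3^k should approach the projector onto the (1, 1) axis.
P_dom = np.outer(Q[:, -1], Q[:, -1])      # projector for the largest eigenvalue
gap = np.abs(A_k / w[-1]**k - P_dom).max()

print("spectral == direct:", np.allclose(A_k, direct))
print("max |A^k/3^k - P_dom| =", gap)
```

The gap shrinks like $(1/3)^k$, because the only other term in the spectral sum carries weight $1^k/3^k$.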
27.7.2 Square roots, exponentials, and functions of a matrix
The same trick does far more than powers. Apply any scalar function $f$ to a symmetric matrix by applying it to the eigenvalues:
$$f(A) = Q\,f(\Lambda)\,Q^{\mathsf{T}} = \sum_i f(\lambda_i)\,\mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}, \qquad f(\Lambda) = \operatorname{diag}\bigl(f(\lambda_1),\dots,f(\lambda_n)\bigr).$$
This is the definition of a matrix function via the spectral decomposition, and it is the cleanest way to make sense of objects that would otherwise be mysterious. Two examples matter throughout the rest of the book. The matrix square root: if every eigenvalue is non-negative, $\sqrt{A} = Q\,\operatorname{diag}(\sqrt{\lambda_1},\dots,\sqrt{\lambda_n})\,Q^{\mathsf{T}}$ is a symmetric matrix whose square is $A$. For our running example, $\sqrt{A} = \begin{psmallmatrix}1.366 & 0.366\\ 0.366 & 1.366\end{psmallmatrix}$ (taking $\sqrt 3 \approx 1.732$ and $\sqrt 1 = 1$ along the eigen-axes), and squaring it returns $\begin{psmallmatrix}2&1\\1&2\end{psmallmatrix}$ exactly. The matrix square root is what makes "whitening" a covariance matrix possible (Chapter 32) and underlies the geometry of the SVD (Chapter 30). The matrix exponential $e^{A} = Q\,\operatorname{diag}(e^{\lambda_1},\dots,e^{\lambda_n})\,Q^{\mathsf{T}}$ is the tool that solves systems of differential equations $\mathbf{x}'(t) = A\mathbf{x}(t)$ in Chapter 37; for symmetric $A$ the eigenvalues are real, so $e^{\lambda_i t}$ is a clean growth or decay, and the spectral decomposition tells you the system's behavior is just independent exponentials along the perpendicular eigen-axes.
# Functions of a symmetric matrix act on its eigenvalues: f(A) = Q f(Λ) Q^T.
import numpy as np
A = np.array([[2.0, 1.0], [1.0, 2.0]])
w, Q = np.linalg.eigh(A) # w = [1, 3]
A3 = Q @ np.diag(w**3) @ Q.T # A cubed
sqrtA = Q @ np.diag(np.sqrt(w)) @ Q.T # symmetric square root
expA = Q @ np.diag(np.exp(w)) @ Q.T # matrix exponential
print("A^3 =\n", np.round(A3, 6))
print("sqrt(A) =\n", np.round(sqrtA, 6))
print("sqrt(A)^2 =\n", np.round(sqrtA @ sqrtA, 6)) # back to A
print("e^A =\n", np.round(expA, 6))
A^3 =
[[14. 13.]
[13. 14.]]
sqrt(A) =
[[1.366025 0.366025]
[0.366025 1.366025]]
sqrt(A)^2 =
[[2. 1.]
[1. 2.]]
e^A =
[[11.401909 8.683628]
[ 8.683628 11.401909]]
The cube matches §27.7.1, the square root squares back to $A$, and the exponential (whose entries come from $e^1 \approx 2.718$ and $e^3 \approx 20.09$ spread across the eigen-axes) agrees with scipy.linalg.expm. Every one of these would be painful to compute directly; the spectral decomposition reduces each to "do it to the eigenvalues."
Real-World Application — Graph diffusion and network analysis. The adjacency matrix of an undirected graph is symmetric, as is the closely related graph Laplacian. Applying a function of the Laplacian — most commonly the matrix exponential $e^{-tL}$, the heat kernel — models how information, influence, or heat diffuses across a network over time, and is computed exactly by the recipe above: exponentiate the eigenvalues and reassemble. Spectral clustering, which partitions a social network or an image into communities, works by reading off the eigenvectors of the symmetric Laplacian (the small-eigenvalue ones reveal the cluster structure). These are direct, large-scale uses of the Spectral Theorem in data science — no physics in sight.
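A toy version of both ideas fits in a short sketch. The graph below (two triangles joined by a single edge) and the diffusion time `t` are hypothetical choices for illustration. The heat kernel is built by exponentiating the Laplacian's eigenvalues, and the clusters are read off from the signs of the eigenvector for the second-smallest eigenvalue.

```python
import numpy as np

# Hypothetical toy graph: triangles {0,1,2} and {3,4,5}, joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
Adj = np.zeros((n, n))
for i, j in edges:
    Adj[i, j] = Adj[j, i] = 1.0           # undirected graph -> symmetric adjacency

L = np.diag(Adj.sum(axis=1)) - Adj        # graph Laplacian: degrees minus adjacency
w, Q = np.linalg.eigh(L)                  # real eigenvalues, ascending; w[0] is ~0

t = 0.5                                   # illustrative diffusion time
H = Q @ np.diag(np.exp(-t * w)) @ Q.T     # heat kernel e^{-tL} via the spectral recipe

second = Q[:, 1]                          # eigenvector of the second-smallest eigenvalue
labels = second > 0                       # its sign pattern splits the graph in two

print("Laplacian eigenvalues:", np.round(w, 4))
print("heat-kernel row sums :", np.round(H.sum(axis=1), 8))
print("cluster labels       :", labels.astype(int))
```

Which triangle receives label 1 depends on the arbitrary sign of the eigenvector, but the split itself (nodes 0 to 2 versus nodes 3 to 5) does not.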
27.7.3 Trace and determinant are the eigenvalues' sum and product
The spectral decomposition also makes two familiar scalars transparent. Because $A = Q\Lambda Q^{\mathsf{T}}$ is similar to the diagonal matrix $\Lambda$, and similar matrices share trace and determinant (Chapter 25), we get the clean identities
$$\operatorname{tr}(A) = \sum_{i=1}^{n}\lambda_i, \qquad \det(A) = \prod_{i=1}^{n}\lambda_i.$$
The trace — the sum of the diagonal entries — equals the sum of the eigenvalues, and the determinant equals their product. For $A = \begin{psmallmatrix}2&1\\1&2\end{psmallmatrix}$ this reads $\operatorname{tr}(A) = 2 + 2 = 4 = 3 + 1$ and $\det(A) = 4 - 1 = 3 = 3\cdot 1$ — both check. These hold for any matrix's eigenvalues, but for symmetric matrices they are especially useful because the eigenvalues are guaranteed real, so the trace is a real sum of real stretch factors and the determinant is a real product. We will use the determinant-as-product fact in Chapter 28 to test positive definiteness (all eigenvalues positive $\Rightarrow$ positive determinant), and the trace-as-sum fact in Chapter 32, where the total variance of a dataset is the trace of its covariance matrix — equivalently, the sum of the variances along the principal axes.
Check Your Understanding — A symmetric $3\times 3$ matrix $A$ has eigenvalues $4, 4, 1$. What are its trace and determinant, and what are the eigenvalues of $A^2$? Is $A$ invertible?
Answer
Trace $= 4 + 4 + 1 = 9$; determinant $= 4\cdot 4\cdot 1 = 16$. The eigenvalues of $A^2$ are the squares, $16, 16, 1$. Since the determinant is $16 \ne 0$ (equivalently, no eigenvalue is $0$), $A$ is invertible — and its inverse is $A^{-1} = Q\operatorname{diag}(\tfrac14,\tfrac14,1)Q^{\mathsf{T}}$, with eigenvalues $\tfrac14,\tfrac14,1$, by the matrix-function rule with $f(\lambda) = 1/\lambda$. (The repeated eigenvalue $4$ has a two-dimensional eigenspace, in which we are free to pick any orthonormal pair, per §27.5.1.)
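If you want to check this answer numerically, one way is to manufacture a symmetric matrix with the prescribed eigenvalues. The sketch below assumes an arbitrary orthogonal matrix obtained from a QR factorization of seeded random numbers; any orthogonal matrix would do.

```python
import numpy as np

rng = np.random.default_rng(0)                       # fixed seed, illustrative
Qm, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # an arbitrary orthogonal matrix
lam = np.array([4.0, 4.0, 1.0])                      # the prescribed eigenvalues

A = Qm @ np.diag(lam) @ Qm.T                         # symmetric, eigenvalues 4, 4, 1
A_inv = Qm @ np.diag(1.0 / lam) @ Qm.T               # matrix-function rule, f(λ) = 1/λ

print("trace      :", np.trace(A))
print("determinant:", np.linalg.det(A))
print("eig of A^2 :", np.round(np.linalg.eigvalsh(A @ A), 8))
print("A_inv A = I:", np.allclose(A_inv @ A, np.eye(3)))
```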
27.8 Why does the theorem fail for non-symmetric matrices?
The Spectral Theorem's power lives entirely in its hypothesis, so the most instructive thing we can do is watch what goes wrong when we drop the symmetry. There are three distinct ways a non-symmetric matrix can violate the theorem's conclusions, and seeing all three cements why "symmetric" is not a decorative assumption but the load-bearing one.
Failure 1: complex eigenvalues (no real eigen-axes). The rotation $R = \begin{psmallmatrix}0 & -1\\ 1 & 0\end{psmallmatrix}$ is not symmetric ($R^{\mathsf{T}} = -R$). Its eigenvalues are $\pm i$ — purely imaginary, no real eigenvalue at all (Chapter 26). There is no real direction it merely stretches; it rotates every real vector by $90°$. Part 1 of the theorem (real eigenvalues) fails the moment symmetry is dropped.
Failure 2: not diagonalizable at all (defective). The shear $S = \begin{psmallmatrix}1 & 1\\ 0 & 1\end{psmallmatrix}$ is not symmetric. Its only eigenvalue is $\lambda = 1$ (repeated), but it has a single independent eigenvector, $(1,0)$ — the eigenspace is one-dimensional even though the eigenvalue has algebraic multiplicity two. Such a matrix is defective (Chapter 25); it cannot be diagonalized by any invertible $P$, let alone an orthogonal $Q$. Part 2 of the theorem (a full eigenbasis) fails.
Failure 3: real eigenvalues, but non-orthogonal eigenvectors. This is the subtle one. Consider $M = \begin{psmallmatrix}2 & 1\\ 0 & 3\end{psmallmatrix}$, which is not symmetric. It does have two distinct real eigenvalues, $2$ and $3$, and two independent eigenvectors, so it is diagonalizable (Chapter 25). But its eigenvectors are $(1,0)$ and $(1,1)$ — and their dot product is $1 \ne 0$, so they are not perpendicular. There is no orthogonal $Q$ here; the change-of-basis matrix $P$ is genuinely skewed, and $P^{-1} \ne P^{\mathsf{T}}$. Part 3 of the theorem (orthogonal diagonalizability) fails even though ordinary diagonalizability holds. This is the case that traps people: diagonalizable does not imply orthogonally diagonalizable. Only symmetry guarantees the eigenvectors come out perpendicular.
# Three ways a NON-symmetric matrix violates the Spectral Theorem.
import numpy as np
R = np.array([[0., -1.], [1., 0.]]) # rotation: NOT symmetric
S = np.array([[1., 1.], [0., 1.]]) # shear: NOT symmetric
M = np.array([[2., 1.], [0., 3.]]) # NOT symmetric
print("R eigenvalues:", np.linalg.eigvals(R)) # complex: no real eigen-axis
w_S, V_S = np.linalg.eig(S)
print("S eigenvalues:", w_S, " eigenvectors:\n", np.round(V_S, 4)) # defective
w_M, V_M = np.linalg.eig(M)
print("M eigenvectors dot product:",
round(V_M[:, 0] @ V_M[:, 1], 4), "(≠ 0 → NOT orthogonal)")
R eigenvalues: [0.+1.j 0.-1.j]
S eigenvalues: [1. 1.] eigenvectors:
[[ 1. -1.]
[ 0. 0.]]
M eigenvectors dot product: 0.7071 (≠ 0 → NOT orthogonal)
The rotation has complex eigenvalues; the shear's two eigenvector columns are both multiples of $(1,0)$ (defective — only one direction); and $M$'s eigenvectors meet at a non-right angle ($\cos\theta \approx 0.707$, i.e. $45°$). Each non-symmetric matrix breaks the theorem in its own way. Only symmetry buys you all three guarantees at once.
Common Pitfall — "Diagonalizable" and "orthogonally diagonalizable" are different properties, and conflating them is a classic error. Every symmetric matrix is orthogonally diagonalizable (Spectral Theorem). Many non-symmetric matrices are diagonalizable but not orthogonally so — their eigenvectors are independent but skew, as with $M$ above. And some matrices (the defective ones) are not diagonalizable at all. The hierarchy is: orthogonally diagonalizable (symmetric/Hermitian) $\subsetneq$ diagonalizable (enough independent eigenvectors) $\subsetneq$ all square matrices. When in doubt, check $A = A^{\mathsf{T}}$ before claiming an orthonormal eigenbasis.
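The gap between the two properties shows up directly in code. In this sketch, the matrix $M$ of Failure 3 is diagonalized by $P^{-1}MP$ but not by $P^{\mathsf{T}}MP$, because its eigenvector matrix $P$ is not orthogonal. The variable names are ours.

```python
import numpy as np

M = np.array([[2.0, 1.0], [0.0, 3.0]])   # diagonalizable, but not symmetric
w, P = np.linalg.eig(M)                  # columns of P are (unit-length) eigenvectors

D_inv = np.linalg.inv(P) @ M @ P         # ordinary diagonalization: P^{-1} M P
D_T = P.T @ M @ P                        # what you would get if P were orthogonal

def off_diagonal(X):
    """Return X with its diagonal zeroed out."""
    return X - np.diag(np.diag(X))

print("P^{-1} M P diagonal?", np.allclose(off_diagonal(D_inv), 0))
print("P^T M P diagonal?   ", np.allclose(off_diagonal(D_T), 0))
print("P^T P = I?          ", np.allclose(P.T @ P, np.eye(2)))
```

Only the first check should succeed: the inverse works, the transpose does not, and the difference is exactly the failure of $P^{\mathsf{T}}P = I$.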
27.9 What is the Hermitian analogue, and why does quantum mechanics depend on it?
Everything so far lived in real space $\mathbb{R}^n$, where the transpose $A^{\mathsf{T}}$ is the right tool and "symmetric" is the right condition. But quantum mechanics — and signal processing, and much of pure mathematics — lives in complex space $\mathbb{C}^n$. There the correct analogue of the transpose is the conjugate transpose $A^{*}$ (transpose and conjugate every entry), which we met for unitary matrices in Chapter 21. The complex cousin of a symmetric matrix is a Hermitian matrix, and the Spectral Theorem carries over almost word for word.
Definition (Hermitian matrix). A complex square matrix $A \in \mathbb{C}^{n\times n}$ is Hermitian if it equals its own conjugate transpose: $$A = A^{*}, \qquad\text{i.e.}\qquad a_{ij} = \overline{a_{ji}} \text{ for all } i, j.$$ Equivalently, its diagonal entries are real and its off-diagonal entries are complex conjugates across the diagonal. (A real symmetric matrix is the special case where the entries are already real, so $A^{*} = A^{\mathsf{T}} = A$.)
The Spectral Theorem (Hermitian case). Let $A$ be Hermitian, $A = A^{*}$. Then its eigenvalues are all real, it has an orthonormal basis of eigenvectors (with respect to the complex inner product $\langle\mathbf{u},\mathbf{v}\rangle = \mathbf{u}^{*}\mathbf{v}$), and it is unitarily diagonalizable: $$A = U\Lambda U^{*}, \qquad U^{*}U = I, \qquad \Lambda \text{ real diagonal,}$$ where $U$ is a unitary matrix (Chapter 21) whose columns are the orthonormal eigenvectors. The spectral decomposition becomes $A = \sum_i \lambda_i\,\mathbf{q}_i\mathbf{q}_i^{*}$.
The proofs are the same proofs with $\mathsf{T}\to{*}$. Indeed, the real-eigenvalue proof of §27.4 already used only $A^{*} = A$, so it applies to Hermitian matrices verbatim — that was the foreshadowing. The orthogonality proof of §27.5 carries over by replacing the real dot product with the complex inner product and using the self-adjoint identity $\langle A\mathbf{u}, \mathbf{w}\rangle = \langle\mathbf{u}, A\mathbf{w}\rangle$, which holds exactly when $A = A^{*}$. The geometric story is identical too: a Hermitian matrix is a pure stretch (by real factors) along orthogonal complex directions, diagonalized by a unitary "rotation."
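The self-adjoint identity can be spot-checked numerically. This is a sketch on seeded random data, so it illustrates the identity rather than proving it: `H` is the Hermitian part of a random complex matrix `B`. Note that `np.vdot` conjugates its first argument, which matches the inner product $\langle\mathbf{u},\mathbf{v}\rangle = \mathbf{u}^{*}\mathbf{v}$.

```python
import numpy as np

rng = np.random.default_rng(1)                                   # fixed seed, illustrative
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
H = (B + B.conj().T) / 2                                         # Hermitian: H* = H

u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# <Hu, v> versus <u, Hv>: equal because H is Hermitian.
lhs_H, rhs_H = np.vdot(H @ u, v), np.vdot(u, H @ v)
# The same comparison for the non-Hermitian B generally fails.
lhs_B, rhs_B = np.vdot(B @ u, v), np.vdot(u, B @ v)

print("Hermitian H:     ", np.isclose(lhs_H, rhs_H))
print("non-Hermitian B: ", np.isclose(lhs_B, rhs_B))
print("eigenvalues of H:", np.round(np.linalg.eigvalsh(H), 4))   # real
```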
Warning — State the field, every time. For a real matrix the condition is symmetry, $A = A^{\mathsf{T}}$, and the diagonalizer is orthogonal, $A = Q\Lambda Q^{\mathsf{T}}$. For a complex matrix the condition is Hermitian, $A = A^{*}$, and the diagonalizer is unitary, $A = U\Lambda U^{*}$. Using $A = A^{\mathsf{T}}$ (plain transpose) on a genuinely complex matrix is a real error: a complex symmetric matrix (equal to its plain transpose) need not have real eigenvalues and need not be unitarily diagonalizable. The conjugate is not optional; it is exactly what forces the eigenvalues real. The general condition that makes a matrix unitarily diagonalizable is normality, $A A^{*} = A^{*}A$, of which Hermitian ($A = A^{*}$) and unitary ($A^{*}A = I$, Chapter 21) are the two famous special cases.
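Two small counterexamples make the warning concrete. Both matrices below are hypothetical examples chosen for illustration. Each equals its plain transpose but not its conjugate transpose: the first has purely imaginary eigenvalues, and the second squares to zero, so it is not diagonalizable at all.

```python
import numpy as np

C1 = np.array([[0, 1j], [1j, 0]])     # complex symmetric; eigenvalues are +i and -i
C2 = np.array([[1, 1j], [1j, -1]])    # complex symmetric; C2 @ C2 = 0 (nilpotent)

def is_hermitian(A):
    return np.allclose(A, A.conj().T)

def is_normal(A):
    return np.allclose(A @ A.conj().T, A.conj().T @ A)

for name, C in [("C1", C1), ("C2", C2)]:
    print(name, "| plain-symmetric:", np.allclose(C, C.T),
          "| Hermitian:", is_hermitian(C),
          "| normal:", is_normal(C),
          "| eigenvalues:", np.round(np.linalg.eigvals(C), 6))

print("C2 squared is zero:", np.allclose(C2 @ C2, 0))
```

`C1` happens to be normal (it is unitary), so it is still unitarily diagonalizable, just with non-real eigenvalues; `C2` is not normal and fails even ordinary diagonalizability.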
Now the anchor we have tracked since Chapter 1: the qubit and quantum measurement. In quantum mechanics, the state of a system is a unit vector in a complex space (a qubit is a unit vector $\begin{psmallmatrix}\alpha\\\beta\end{psmallmatrix} \in \mathbb{C}^2$), and every observable — every physical quantity you can measure, such as energy, position, or a spin component — is represented by a Hermitian operator. This is not a convention; it is forced by physics through the Spectral Theorem, and the link is exactly part 1 of the theorem. When you measure an observable, the only possible outcomes are its eigenvalues, and the system jumps into the corresponding eigenvector ("collapse"). Measured energies, measured positions, measured spins are real numbers — so the operator's eigenvalues must be real — so the operator must be Hermitian. The mathematical guarantee that measurements come out real is §27.4. And because eigenvectors of distinct eigenvalues are orthogonal (§27.5), the distinct measurement outcomes correspond to perpendicular states, which is why a measurement of "spin up versus spin down" is an unambiguous, mutually exclusive question. The orthonormal eigenbasis is the set of states with definite values of the observable.
The simplest concrete examples are the Pauli matrices, the Hermitian operators for the three spin components of a qubit:
$$X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad Y = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}, \qquad Z = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$
Each is Hermitian: $X$ and $Z$ are real symmetric, and $Y$ satisfies $Y^{*} = Y$ because the off-diagonal entries $-i$ and $i$ are complex conjugates across the diagonal. The Spectral Theorem guarantees real eigenvalues, and indeed each Pauli matrix has eigenvalues exactly $+1$ and $-1$ — the two possible outcomes of a spin measurement ("up" and "down"), in natural units.
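A quick loop confirms both claims for all three Pauli matrices at once. This is a minimal sketch using `np.linalg.eigvalsh`, the eigenvalues-only sibling of eigh.

```python
import numpy as np

paulis = {
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}

pauli_spectra = {}
for name, P in paulis.items():
    hermitian = np.allclose(P, P.conj().T)        # P* == P ?
    pauli_spectra[name] = np.linalg.eigvalsh(P)   # real eigenvalues, ascending
    print(name, "Hermitian:", hermitian, " eigenvalues:", pauli_spectra[name])
```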
The spectral decomposition of §27.6 is what turns this into quantitative predictions, and it is worth seeing the connection explicitly, because it shows the rank-one projectors are not an abstraction but the very thing an experiment measures. Write the observable in spectral form, $A = \sum_i \lambda_i\,\mathbf{q}_i\mathbf{q}_i^{*}$, with orthonormal eigenstates $\mathbf{q}_i$. When you measure $A$ on a state $\boldsymbol\psi$, quantum mechanics says the probability of getting the outcome $\lambda_i$ is the squared length of the projection of $\boldsymbol\psi$ onto the corresponding eigenstate — exactly the rank-one projector applied to $\boldsymbol\psi$:
$$\Pr(\text{outcome } \lambda_i) = \lVert \mathbf{q}_i\mathbf{q}_i^{*}\boldsymbol\psi\rVert^2 = \lvert\mathbf{q}_i^{*}\boldsymbol\psi\rvert^2.$$
This is the Born rule, and the orthonormality of the eigenstates (§27.5) is what makes these probabilities sum to 1: $\sum_i \lvert\mathbf{q}_i^{*}\boldsymbol\psi\rvert^2 = \lVert\boldsymbol\psi\rVert^2 = 1$, because $\sum_i \mathbf{q}_i\mathbf{q}_i^{*} = I$ is the resolution of the identity. As a concrete example, measure the spin-$Z$ observable on the superposition state $\boldsymbol\psi = \tfrac{1}{\sqrt2}\begin{psmallmatrix}1\\1\end{psmallmatrix}$ (the "$|+\rangle$" state). The eigenstates of $Z$ are $\begin{psmallmatrix}1\\0\end{psmallmatrix}$ (eigenvalue $+1$) and $\begin{psmallmatrix}0\\1\end{psmallmatrix}$ (eigenvalue $-1$), and the squared overlaps are $\lvert\tfrac{1}{\sqrt2}\rvert^2 = \tfrac12$ each — so the qubit returns "up" or "down" with equal probability $\tfrac12$. The eigenvalues are what you can measure; the squared projections onto the eigenstates are how likely each is. The entire predictive content of a quantum measurement is the spectral decomposition of a Hermitian matrix. Let us verify the genuinely complex Pauli operator, $Y$, with eigh:
# A qubit observable is Hermitian (A* = A) and has REAL eigenvalues. Pauli-Y.
import numpy as np
Y = np.array([[0, -1j],
[1j, 0]]) # Pauli-Y: genuinely complex
print("Hermitian? (Y* == Y):", np.allclose(Y.conj().T, Y))
w, U = np.linalg.eigh(Y) # eigh handles Hermitian too
print("eigenvalues:", np.round(w, 6)) # REAL: the measurement outcomes
print("U* U =\n", np.round(U.conj().T @ U, 8)) # unitary: orthonormal eigenstates
print("reconstructs Y?", np.allclose(U @ np.diag(w) @ U.conj().T, Y))
Hermitian? (Y* == Y): True
eigenvalues: [-1. 1.]
U* U =
[[1.+0.j 0.+0.j]
[0.+0.j 1.+0.j]]
reconstructs Y? True
The Pauli-$Y$ matrix is Hermitian, its eigenvalues are exactly the real numbers $-1$ and $+1$ (the two spin outcomes), its eigenvectors are orthonormal ($U^{*}U = I$), and $U\Lambda U^{*}$ reconstructs it. Every promise of the Spectral Theorem holds in the complex case. You can pursue this thread much further into the physics in the companion volume's treatment of Hermitian operators in quantum mechanics, where the same $A = A^{*}$ condition and the same real-eigenvalue guarantee underpin the entire theory of measurement.
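The Born-rule calculation from earlier in this section can be sketched the same way. The helper `born_probabilities` is a hypothetical name of ours, and it assumes the observable has distinct eigenvalues, so that each eigenvector corresponds to one outcome. For contrast, the sketch also measures $X$ on the same state, which is an eigenvector of $X$ and therefore yields a certain outcome.

```python
import numpy as np

def born_probabilities(observable, psi):
    """Outcomes (eigenvalues) and probabilities |q_i^* psi|^2.

    Hypothetical helper; assumes a Hermitian observable with distinct eigenvalues.
    """
    w, U = np.linalg.eigh(observable)
    amplitudes = U.conj().T @ psi        # overlaps q_i^* psi with each eigenstate
    return w, np.abs(amplitudes) ** 2

Z = np.array([[1, 0], [0, -1]], dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)   # the |+> state

outcomes_Z, probs_Z = born_probabilities(Z, plus)
outcomes_X, probs_X = born_probabilities(X, plus)

print("Z outcomes", outcomes_Z, "probabilities", np.round(probs_Z, 6))
print("X outcomes", outcomes_X, "probabilities", np.round(probs_X, 6))
```

Measuring $Z$ gives each outcome with probability $\tfrac12$, as computed by hand above, while measuring $X$ returns $+1$ with probability 1; in both cases the probabilities sum to 1.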
Real-World Application — Quantum computing and chemistry. The single most important computation in quantum chemistry and materials science is finding the eigenvalues of a Hermitian matrix, the Hamiltonian, because its eigenvalues are the allowed energy levels of a molecule and its lowest eigenvalue (the ground-state energy) determines the molecule's stability and reactivity. Algorithms from classical eigensolvers to the variational quantum eigensolver (VQE) on today's quantum computers exist precisely to extract the spectrum of a Hermitian operator. The Spectral Theorem is the reason the question "what energies can this system have?" has a clean, real-valued answer at all.
27.10 What do the perpendicular eigen-axes look like? (the visualizer returns)
It is time to bring back the recurring 2D transformation visualizer from Chapter 1 — the same tool, unchanged, that showed us shears, scalings, rotations, and (in Chapter 23) invariant directions. For a symmetric matrix it tells the cleanest story of all: the transformation stretches the plane along two perpendicular eigen-axes, and the eigenvectors are visibly at right angles. We use our running example $A = \begin{psmallmatrix}2 & 1\\ 1 & 2\end{psmallmatrix}$.
# toolkit/visualizer.py — the recurring 2D transformation visualizer.
# Shows what a 2x2 matrix A does to the unit square and the basis vectors.
import numpy as np
import matplotlib.pyplot as plt
def visualize_2d(A, title="", ax=None):
    """Plot the action of 2x2 matrix A on the unit square and i-hat, j-hat."""
    A = np.asarray(A, dtype=float)
    square = np.array([[0, 1, 1, 0, 0],
                       [0, 0, 1, 1, 0]])          # unit-square corners (closed)
    out = A @ square                              # transformed square
    e1, e2 = A @ np.array([1, 0]), A @ np.array([0, 1])  # images of basis vectors
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))
    ax.plot(square[0], square[1], "b--", lw=1, label="input (unit square)")
    ax.fill(out[0], out[1], alpha=0.25, color="C1")
    ax.plot(out[0], out[1], "C1-", lw=2, label="A · (unit square)")
    ax.arrow(0, 0, *e1, color="C3", width=0.02, length_includes_head=True)  # A e1
    ax.arrow(0, 0, *e2, color="C2", width=0.02, length_includes_head=True)  # A e2
    ax.axhline(0, color="gray", lw=0.5)
    ax.axvline(0, color="gray", lw=0.5)
    ax.set_aspect("equal")
    ax.grid(True, alpha=0.3)
    ax.set_title(title or f"det = {np.linalg.det(A):.2f}")
    ax.legend(loc="best", fontsize=8)
    return ax
# Example: a horizontal shear
# visualize_2d([[1, 1], [0, 1]], title="Shear")
# plt.show()
Now the experiment that makes this chapter visible. We draw the symmetric matrix's action on the unit square, and overlay its two orthonormal eigenvectors and their stretched images:
# A symmetric matrix stretches along PERPENDICULAR eigen-axes. Overlay the eigenvectors.
import numpy as np, matplotlib.pyplot as plt
from visualizer import visualize_2d
A = np.array([[2.0, 1.0], [1.0, 2.0]]) # symmetric
w, Q = np.linalg.eigh(A) # w = [1, 3]; columns of Q orthonormal
ax = visualize_2d(A, title="Symmetric A: stretch along ⊥ eigen-axes")
for i in range(2):
    q = Q[:, i]
    ax.plot([-q[0], q[0]], [-q[1], q[1]], "k:", lw=1)   # the eigen-axis (a line)
    ax.arrow(0, 0, *(w[i]*q), color="C4", width=0.03,   # A q_i = λ_i q_i
             length_includes_head=True)
plt.show()
Figure 27.1. A symmetric transformation stretches the plane along two perpendicular eigen-axes. The dashed blue unit square (area 1) is carried to the solid orange parallelogram; the basis-vector images $\mathbf{e}_1\mapsto(2,1)$ (red) and $\mathbf{e}_2\mapsto(1,2)$ (green) are not the eigenvectors. The two dotted black lines are the eigen-axes, at exactly $+45°$ and $-45°$ — perpendicular, as the Spectral Theorem guarantees. Along the $+45°$ axis (eigenvector $\tfrac{1}{\sqrt2}(1,1)$) the transformation stretches by $\lambda = 3$; along the $-45°$ axis (eigenvector $\tfrac{1}{\sqrt2}(1,-1)$) it leaves length unchanged, $\lambda = 1$. The purple arrows show each eigenvector scaled by its eigenvalue, $A\mathbf{q}_i = \lambda_i\mathbf{q}_i$, pointing along its own dotted axis with no sideways swing. Alt-text: a plot of a unit square stretched into a parallelogram, with two perpendicular dotted lines at ±45° marking the eigen-axes and arrows along them showing pure stretching by factors 3 and 1.
The picture is the whole theorem in one image. The two eigen-axes meet at a right angle: that is part 2 (orthogonal eigenvectors). The arrows along them point straight out along their own axes, scaled and not rotated: that is the "pure stretch, no shear" geometry of §27.1. And because the eigen-axes are perpendicular, the orthogonal matrix $Q$ that aligns the standard axes with them (a rotation, possibly composed with a reflection) is what diagonalizes $A$. Contrast this with the non-symmetric matrix $M$ of §27.8: if you ran the visualizer on $M = \begin{psmallmatrix}2 & 1\\ 0 & 3\end{psmallmatrix}$, its eigen-axes would meet at $45°$, not $90°$ (visibly not perpendicular), which is exactly why $M$ has no orthogonal diagonalization. Symmetry is what makes the eigen-axes square up.
Geometric Intuition — Compare this figure with the rotation and shear figures from Chapters 21 and 1. A rotation moves every direction (no real eigen-axis to draw). A shear has a single eigen-direction and drags everything else sideways. A symmetric matrix has a full set of perpendicular eigen-axes and moves each one purely outward or inward. Watching the purple arrows stay glued to their dotted axes — never swinging off them — is watching "no rotation, no shear, pure stretch" with your own eyes. That is what symmetry looks like.
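If you prefer a number to a picture, the angle between the two eigen-axes can be computed directly. The helper `eigen_axis_angle_deg` is a hypothetical name of ours, and it assumes a $2\times 2$ real matrix with two real, independent eigenvectors (so it does not apply to the rotation or the shear of §27.8).

```python
import numpy as np

def eigen_axis_angle_deg(B):
    """Angle in degrees between the two eigen-axes of a 2x2 matrix.

    Hypothetical helper; assumes two real, linearly independent eigenvectors.
    """
    _, V = np.linalg.eig(B)
    V = V / np.linalg.norm(V, axis=0)          # normalize columns defensively
    cos_angle = abs(V[:, 0] @ V[:, 1])         # axes are lines, so the sign is irrelevant
    return np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0)))

A = np.array([[2.0, 1.0], [1.0, 2.0]])         # symmetric
M = np.array([[2.0, 1.0], [0.0, 3.0]])         # not symmetric

angle_A = eigen_axis_angle_deg(A)
angle_M = eigen_axis_angle_deg(M)
print("symmetric A    :", round(angle_A, 4), "degrees")
print("non-symmetric M:", round(angle_M, 4), "degrees")
```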
27.11 How does the Spectral Theorem describe a quadratic form?
There is one more picture worth drawing before we close, because it is the bridge to the very next chapter and to a great deal of optimization and statistics. A quadratic form is a function $q:\mathbb{R}^n\to\mathbb{R}$ built from a symmetric matrix,
$$q(\mathbf{x}) = \mathbf{x}^{\mathsf{T}}A\mathbf{x} = \sum_{i,j} a_{ij}\,x_i x_j, \qquad A = A^{\mathsf{T}},$$
and it is the multivariable generalization of the humble parabola $q(x) = ax^2$. Quadratic forms are everywhere applications need to measure "size" or "energy": the kinetic energy of a system, the variance of a projection, the squared error in a regression, the curvature of a loss landscape near a minimum. And the reason we always take $A$ symmetric is that only the symmetric part of a matrix contributes to $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ — so we lose nothing by assuming symmetry, and we gain the entire Spectral Theorem. For our running matrix, $q(\mathbf{x}) = 2x_1^2 + 2x_1 x_2 + 2x_2^2$, which evaluates to $q(1,0) = 2$ and $q(1,1) = 6$.
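Both claims in this paragraph can be checked in a few lines. In the sketch below, `B` is a hypothetical non-symmetric matrix chosen so that its symmetric part $\tfrac12(B + B^{\mathsf{T}})$ is exactly the running matrix $A$; the two should then define the same quadratic form.

```python
import numpy as np

def quad_form(A, x):
    """Evaluate q(x) = x^T A x."""
    x = np.asarray(x, dtype=float)
    return x @ A @ x

A = np.array([[2.0, 1.0], [1.0, 2.0]])
B = np.array([[2.0, 3.0], [-1.0, 2.0]])   # hypothetical; not symmetric
B_sym = (B + B.T) / 2                     # symmetric part of B

print("q(1,0) =", quad_form(A, [1, 0]))
print("q(1,1) =", quad_form(A, [1, 1]))
print("symmetric part of B equals A:", np.allclose(B_sym, A))
print("B and A give the same form at (1,1):",
      np.isclose(quad_form(B, [1, 1]), quad_form(A, [1, 1])))
```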
The cross-term $2x_1 x_2$ is what makes a quadratic form awkward to read: it couples the variables, so you cannot tell at a glance whether the form is a bowl, a saddle, or a ridge. The Spectral Theorem removes the cross-terms. Substitute $A = Q\Lambda Q^{\mathsf{T}}$ and change to the eigen-coordinates $\mathbf{y} = Q^{\mathsf{T}}\mathbf{x}$ (a rigid change of axes, that is, a rotation possibly combined with a reflection, since $Q$ is orthogonal):
$$q(\mathbf{x}) = \mathbf{x}^{\mathsf{T}}Q\Lambda Q^{\mathsf{T}}\mathbf{x} = (Q^{\mathsf{T}}\mathbf{x})^{\mathsf{T}}\Lambda(Q^{\mathsf{T}}\mathbf{x}) = \mathbf{y}^{\mathsf{T}}\Lambda\mathbf{y} = \sum_{i=1}^{n}\lambda_i\,y_i^2.$$
In the rotated coordinates the form is a pure sum of squares, with the eigenvalues as coefficients — no cross-terms at all. This is the principal axis theorem: every quadratic form becomes $\lambda_1 y_1^2 + \cdots + \lambda_n y_n^2$ once you rotate to the eigen-axes of its matrix. For $A = \begin{psmallmatrix}2&1\\1&2\end{psmallmatrix}$, the messy $2x_1^2 + 2x_1x_2 + 2x_2^2$ becomes the clean $3y_1^2 + 1y_2^2$ along the $\pm 45°$ eigen-axes.
Geometric Intuition — The level set $q(\mathbf{x}) = 1$ of a quadratic form with a symmetric, all-positive-eigenvalue matrix is an ellipse (or ellipsoid), and the Spectral Theorem tells you everything about it: its axes point along the eigenvectors, and its semi-axis lengths are $1/\sqrt{\lambda_i}$. A large eigenvalue makes a short axis (the form rises steeply, so the level curve is close in); a small eigenvalue makes a long axis. For our example the ellipse $3y_1^2 + y_2^2 = 1$ has a short semi-axis $1/\sqrt 3 \approx 0.577$ along the $\lambda = 3$ eigenvector $(1,1)$ and a long semi-axis $1/\sqrt 1 = 1$ along the $\lambda = 1$ eigenvector $(1,-1)$. The eigen-axes of the matrix are the principal axes of the ellipse — which is the geometric meaning of the name "principal axis theorem."
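The principal axis theorem and the ellipse picture can be verified together. This is a sketch under stated assumptions: the test point `x` is seeded random data, and the ellipse is sampled at 401 parameter values, both illustrative choices.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
w, Q = np.linalg.eigh(A)                     # w = [1, 3]

# Principal axis theorem: x^T A x equals sum_i lambda_i y_i^2 with y = Q^T x.
rng = np.random.default_rng(2)               # fixed seed, illustrative
x = rng.standard_normal(2)
y = Q.T @ x
q_x = x @ A @ x
q_y = np.sum(w * y**2)

# Level set q = 1: in eigen-coordinates, y_i = (cos t, sin t)_i / sqrt(lambda_i).
t = np.linspace(0.0, 2.0 * np.pi, 401)
Y = np.vstack([np.cos(t) / np.sqrt(w[0]), np.sin(t) / np.sqrt(w[1])])
X = Q @ Y                                    # same points in the original coordinates
q_vals = np.einsum("ik,ij,jk->k", X, A, X)   # x^T A x for every sampled point
radii = np.linalg.norm(X, axis=0)

print("x^T A x == Σ λ_i y_i^2:", np.isclose(q_x, q_y))
print("all sampled points have q = 1:", np.allclose(q_vals, 1.0))
print("longest semi-axis :", radii.max(), "(expected 1/sqrt(1))")
print("shortest semi-axis:", radii.min(), "(expected 1/sqrt(3))")
```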
This is exactly the doorway into Chapter 28. The signs of the eigenvalues classify the form: if all $\lambda_i > 0$ the form is positive definite (a bowl opening upward, level sets are ellipses, the origin is a minimum); if all $\lambda_i < 0$ it is a dome; and if the signs are mixed it is a saddle (level sets are hyperbolas). Because the Spectral Theorem guarantees the eigenvalues are real, asking "what sign are they?" is a well-posed question with a clean answer — and that question is the second-derivative test for whether an optimization landscape curves up or down, the test that decides whether a critical point of a machine-learning loss is a minimum or a saddle. The Spectral Theorem turns the geometry of every quadratic form into a list of real numbers and their signs; Chapter 28 makes that the foundation of positive-definite matrices, energy, and optimization.
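The sign test described above is short enough to sketch in code. The function name `classify_quadratic_form`, the tolerance `tol`, and the example matrices are assumptions for illustration. The final branch, for eigenvalues that are zero within tolerance, goes beyond the three cases named in the text and is included only so the function always returns something.

```python
import numpy as np

def classify_quadratic_form(A, tol=1e-10):
    """Classify x^T A x by the signs of the eigenvalues of a symmetric A."""
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("expected a symmetric matrix")
    lam = np.linalg.eigvalsh(A)             # real eigenvalues, ascending
    if np.all(lam > tol):
        return "positive definite (bowl)"
    if np.all(lam < -tol):
        return "negative definite (dome)"
    if np.any(lam > tol) and np.any(lam < -tol):
        return "indefinite (saddle)"
    return "semidefinite (an eigenvalue is zero within tolerance)"

bowl = classify_quadratic_form([[2.0, 1.0], [1.0, 2.0]])      # eigenvalues 1, 3
dome = classify_quadratic_form([[-2.0, 1.0], [1.0, -2.0]])    # eigenvalues -3, -1
saddle = classify_quadratic_form([[1.0, 2.0], [2.0, 1.0]])    # eigenvalues -1, 3
print(bowl, dome, saddle, sep="\n")
```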
Check Your Understanding — A quadratic form $q(\mathbf{x}) = \mathbf{x}^{\mathsf{T}}A\mathbf{x}$ has a symmetric $A$ with eigenvalues $5$ and $-2$. Is the level set $q(\mathbf{x}) = 1$ an ellipse or a hyperbola? Is the origin a minimum, maximum, or saddle of $q$?
Answer
The eigenvalues have mixed signs ($5 > 0$, $-2 < 0$), so in eigen-coordinates $q = 5y_1^2 - 2y_2^2$ — a difference of squares, whose level set $q = 1$ is a hyperbola, not an ellipse. The origin is a saddle: moving along the $\lambda = 5$ eigen-axis increases $q$ (a valley climbing up), while moving along the $\lambda = -2$ eigen-axis decreases it (a ridge falling away). Mixed-sign eigenvalues always mean a saddle — the precise statement we develop in Chapter 28.
27.12 How do you compute the spectral decomposition in code?
We close the exposition with the chapter's contribution to the from-scratch toolkit you have been assembling since Chapter 2. The natural deliverable here is a function that takes a symmetric matrix and returns its complete spectral data — the orthonormal $Q$, the real eigenvalues $\Lambda$, and the list of rank-one projectors that sum back to $A$ — together with verification that $A = Q\Lambda Q^{\mathsf{T}}$ and $Q^{\mathsf{T}}Q = I$.
A subtlety worth stating plainly, in the spirit of the rest of Part V: the hard part of this computation, actually finding the eigenvalues and eigenvectors of a symmetric matrix, is exactly what the QR algorithm of `toolkit/eigen.py` (Chapters 23 and 29) does, and computing eigenvalues from scratch is genuinely involved. So the toolkit function here is allowed to call your earlier eigen-routine (or numpy's `eigh`) to obtain the orthonormal eigenpairs, and then its own job is to assemble the spectral decomposition: normalize, sort, stack the eigenvectors into $Q$, build the rank-one projectors $\mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}$, and verify the two identities. The assembly and verification are pure linear algebra you can write by hand.
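The "normalize, sort, stack" step might look like the following sketch. Everything named here is hypothetical: `assemble_orthonormal_basis` is not a toolkit function, and the raw eigenpairs are hand-written stand-ins for whatever an eigen-routine returns. The sketch assumes distinct eigenvalues, so the eigenvectors are already mutually orthogonal and only need scaling; a repeated eigenvalue would need an extra orthogonalization pass.

```python
import numpy as np

def assemble_orthonormal_basis(eigenpairs):
    """Sort (eigenvalue, eigenvector) pairs, normalize, and stack into Q.

    Assumes the eigenvalues are distinct, so the eigenvectors of a
    symmetric matrix are already orthogonal and only need unit length.
    """
    ordered = sorted(eigenpairs, key=lambda pair: pair[0])   # ascending eigenvalues
    lam = np.array([value for value, _ in ordered], dtype=float)
    columns = [np.asarray(v, dtype=float) / np.linalg.norm(v) for _, v in ordered]
    Q = np.column_stack(columns)                              # eigenvectors as columns
    return lam, Q

# Unsorted, unnormalized eigenpairs of the running matrix [[2, 1], [1, 2]].
raw = [(3.0, [2.0, 2.0]), (1.0, [1.0, -1.0])]
lam, Q = assemble_orthonormal_basis(raw)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
ok_orthonormal = np.allclose(Q.T @ Q, np.eye(2))
ok_factor = np.allclose(Q @ np.diag(lam) @ Q.T, A)
print(lam, ok_orthonormal, ok_factor)
```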
Build Your Toolkit — Implement `spectral_decomposition(A)` in `toolkit/spectral.py` for a symmetric matrix $A$ (state and check this precondition: raise an error if $A \ne A^{\mathsf{T}}$ within tolerance). It should return three things: the orthogonal matrix $Q$ whose columns are the orthonormal eigenvectors, the real eigenvalues $\Lambda$ (as a diagonal matrix or a sorted list), and the list of rank-one projectors $[\,\lambda_1\mathbf{q}_1\mathbf{q}_1^{\mathsf{T}},\dots,\lambda_n\mathbf{q}_n\mathbf{q}_n^{\mathsf{T}}\,]$ whose sum is $A$. You may obtain the raw eigenpairs from your `power_iteration`/`qr_algorithm` in `toolkit/eigen.py` (or, for now, from `np.linalg.eigh`); the from-scratch part is the assembly: Gram–Schmidt (Chapter 20) any repeated-eigenvalue block into an orthonormal set, normalize, and form the outer products $\mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}$ in pure Python. Then verify with numpy: confirm $Q^{\mathsf{T}}Q = I$ and $A = Q\Lambda Q^{\mathsf{T}} = \sum_i \lambda_i\mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}$, both to tolerance, against `np.linalg.eigh`. This module joins `eigen.py` from Chapters 23/29 and `orthogonal.py` from Chapter 21, and it is the direct ancestor of the PCA routine you will write in Chapter 32.
Here is the kind of check your finished function should pass — written with numpy here so you can confirm the expected behavior before coding the from-scratch assembly:
```python
# Expected behavior of spectral_decomposition(A); verify against np.linalg.eigh.
import numpy as np

def spectral_decomposition(A):
    """Return (Q, Lambda, projectors) for a symmetric matrix A."""
    A = np.asarray(A, dtype=float)
    # Precondition: the Spectral Theorem only applies to symmetric input.
    if not np.allclose(A, A.T):
        raise ValueError("spectral_decomposition requires a SYMMETRIC matrix")
    w, Q = np.linalg.eigh(A)  # real eigenvalues (ascending), orthonormal columns of Q
    # Weighted rank-one terms lambda_i * q_i q_i^T; their sum rebuilds A.
    projectors = [w[i] * np.outer(Q[:, i], Q[:, i]) for i in range(len(w))]
    return Q, np.diag(w), projectors

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])  # symmetric 3x3

Q, Lam, projs = spectral_decomposition(A)
print("eigenvalues:", np.round(np.diag(Lam), 6))
print("Q^T Q = I?", np.allclose(Q.T @ Q, np.eye(3)))
print("A = QΛQ^T?", np.allclose(Q @ Lam @ Q.T, A))
print("A = Σ λ q q^T?", np.allclose(sum(projs), A))
```

```
eigenvalues: [2.585786 4.       5.414214]
Q^T Q = I? True
A = QΛQ^T? True
A = Σ λ q q^T? True
```
The symmetric tridiagonal matrix has three real eigenvalues ($4 - \sqrt2,\ 4,\ 4 + \sqrt2$) and an orthonormal eigenvector matrix, and both reconstructions, the factored $Q\Lambda Q^{\mathsf{T}}$ and the summed $\sum_i\lambda_i\mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}$, recover $A$ to floating-point tolerance (which is what `np.allclose` tests). That is the Spectral Theorem, operationalized.
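For the from-scratch half of the exercise, the outer products and their sum need nothing beyond nested lists. The following is a minimal sketch, not the toolkit's implementation: the helper names `outer`, `scale`, and `add` are assumptions, and the orthonormal eigenpairs of the running matrix $\begin{psmallmatrix}2&1\\1&2\end{psmallmatrix}$ are typed in by hand rather than computed.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def scale(c, M):
    """Multiply every entry of the matrix M by the scalar c."""
    return [[c * entry for entry in row] for row in M]

def add(M, N):
    """Entrywise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(row_m, row_n)] for row_m, row_n in zip(M, N)]

s = 2 ** -0.5                                   # 1 / sqrt(2)
eigenpairs = [(1.0, [s, -s]), (3.0, [s, s])]    # (eigenvalue, unit eigenvector)

# One weighted rank-one term lambda_i * q_i q_i^T per eigenpair.
terms = [scale(lam, outer(q, q)) for lam, q in eigenpairs]

total = [[0.0, 0.0], [0.0, 0.0]]
for term in terms:
    total = add(total, term)

expected = [[2.0, 1.0], [1.0, 2.0]]
max_error = max(abs(total[i][j] - expected[i][j])
                for i in range(2) for j in range(2))
print("largest entrywise error:", max_error)
```

The error is on the order of machine precision rather than exactly zero, because $1/\sqrt2$ is not exactly representable in floating point.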
Historical Note — The term "spectrum" for the set of eigenvalues was introduced by David Hilbert in the early 1900s, in his work on integral equations and infinite-dimensional operators. Strikingly, this was before the spectral lines of atomic physics were understood to be eigenvalues of a quantum Hamiltonian, so the later physical "spectrum" and Hilbert's mathematical "spectrum" turned out to be the same idea. The finite-dimensional theorem for symmetric matrices is older, with roots in Cauchy's 1829 study of quadratic forms and the principal-axis theorem. The full operator-theoretic spectral theorem in Hilbert space, which powers quantum mechanics, grew out of Hilbert's work and was extended to the unbounded operators of quantum mechanics by von Neumann around 1930.
27.13 What have we built, and where does it lead?
We began with a picture — a rubber sheet stretched along perpendicular axes — and ended with the mathematical bedrock of quantum measurement, and one idea ran through all of it: a symmetric matrix is a pure stretch along orthogonal eigen-axes, so it is diagonalized by a rotation, $A = Q\Lambda Q^{\mathsf{T}}$. From the single hypothesis $A = A^{\mathsf{T}}$ we proved three guarantees that no general matrix enjoys: real eigenvalues (§27.4, because $A^{*} = A$ forces $\mathbf{v}^{*}A\mathbf{v}$ real), orthogonal eigenvectors for distinct eigenvalues (§27.5, because symmetry slides $A$ across the dot product and collides the eigenvalues), and therefore orthogonal diagonalizability. We rewrote the factorization as the spectral decomposition $A = \sum_i \lambda_i\mathbf{q}_i\mathbf{q}_i^{\mathsf{T}}$ — a weighted sum of perpendicular rank-one projectors — which is the most honest description of what a symmetric matrix does. We watched the perpendicular eigen-axes in the visualizer, saw exactly how non-symmetric matrices break each guarantee, and carried the whole theorem into complex space, where Hermitian matrices ($A = A^{*}$) govern the observables of quantum mechanics and guarantee that measurements come out real.
This chapter is the high point of the book's deepest theme — that eigenvalues and eigenvectors reveal what a matrix really does, stripped of coordinate-system artifacts — and of its other great theme, that geometry and algebra are two views of one object. "Symmetric" is a statement about symbols ($a_{ij} = a_{ji}$); "pure stretch along perpendicular axes" is a statement about pictures; we proved they are the same statement. That equivalence is the threshold idea to carry forward: once you see that the symmetry of a matrix is the algebraic shadow of the orthogonality of its eigen-axes, symmetric matrices stop being a special case to memorize and become a geometric fact you can picture. Keep the vocabulary close — symmetric and Hermitian matrices, orthogonally and unitarily diagonalizable, the orthonormal eigenbasis, the spectral decomposition as a sum of rank-one projectors, real eigenvalues, and the normal matrices that are the most general unitarily diagonalizable class — because the rest of the book leans hard on all of it.
The forward references write themselves, and they are some of the most important in the book. Chapter 28 (Positive Definite Matrices and Quadratic Forms) is the immediate sequel: a quadratic form $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ has a symmetric $A$, and the Spectral Theorem turns it into a sum of squares $\sum_i \lambda_i y_i^2$ in eigen-coordinates — so the signs of the eigenvalues (all positive? mixed?) decide whether the form is a bowl, a dome, or a saddle, which is exactly the second-derivative test for optimization and the curvature of an energy landscape. Chapter 32 (Principal Component Analysis) is the data-science payoff we have teased throughout: the covariance matrix of a dataset is symmetric, its spectral decomposition's eigenvectors are the principal components (the perpendicular directions of greatest variance), and its eigenvalues are the variances along them — PCA is the Spectral Theorem applied to a covariance matrix, and the orthogonality of the components is the §27.5 orthogonality we just proved. And Chapter 30 (the Singular Value Decomposition) generalizes the whole picture to every matrix, symmetric or not, square or not, by applying the Spectral Theorem to the always-symmetric $A^{\mathsf{T}}A$ — the SVD is, at its heart, the spectral theorem in disguise. The orthogonal diagonalization you learned to perform here is the engine inside the three most consequential tools in the back half of this book.
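The remark about the SVD can be previewed numerically. This is only a sketch of the connection, not how Chapter 30 or numpy computes the SVD: the rectangular matrix `M` is an arbitrary example chosen for illustration. Its singular values should equal the square roots of the eigenvalues of the symmetric matrix $M^{\mathsf{T}}M$.

```python
import numpy as np

M = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 1.0]])                 # hypothetical 3x2, non-square matrix

G = M.T @ M                                # always symmetric, here 2x2
lam, V = np.linalg.eigh(G)                 # real eigenvalues, ascending

# Clip guards against tiny negative round-off before taking square roots.
sing_from_spectral = np.sqrt(np.clip(lam, 0.0, None))[::-1]   # descending order
sing_from_svd = np.linalg.svd(M, compute_uv=False)            # descending order

match = np.allclose(sing_from_spectral, sing_from_svd)
print("G symmetric?", np.allclose(G, G.T))
print("sqrt(eig(M^T M)) equals singular values?", match)
```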
```python
# Forward look: the covariance matrix is symmetric → its spectral decomposition IS PCA.
import numpy as np

C = np.array([[3.0, 1.0],
              [1.0, 3.0]])  # a (symmetric) covariance matrix
w, Q = np.linalg.eigh(C)    # ascending eigenvalues; column signs may vary by platform
print("variances along principal axes:", w)  # eigenvalues = variances
print("principal components (⊥ columns of Q):\n", np.round(Q, 4))
print("fraction of variance on the top component:", round(w[-1] / w.sum(), 4))
```

```
variances along principal axes: [2. 4.]
principal components (⊥ columns of Q):
 [[-0.7071  0.7071]
 [ 0.7071  0.7071]]
fraction of variance on the top component: 0.6667
```
The covariance matrix's eigenvalues are the variances along its perpendicular principal axes ($2$ and $4$), the larger one capturing two-thirds of the total spread — a first glimpse of the dimensionality reduction that Chapter 32 builds into full Principal Component Analysis. The same orthogonal diagonalization that we proved makes symmetric matrices special is, it turns out, how machines find the hidden structure in data. Hold onto the perpendicular-eigen-axes picture; it is about to do extraordinary work.