Chapter 25 — Key Takeaways

DataField.Dev

The big ideas

A diagonalizable matrix is a stretch in disguise. In its own eigenbasis, the transformation does nothing but scale each axis independently. The messy entries of $A$ are an artifact of the wrong coordinate system; the eigenbasis is the coordinate system where the transformation's true simplicity shows. This is recurring theme #6 (eigenvalues reveal what a matrix really does) at its operational peak.
The factorization is $A = PDP^{-1}$. The columns of $P$ are the eigenvectors; the diagonal of $D$ holds the matching eigenvalues. This is exactly Chapter 16's similarity $D = P^{-1}AP$ applied to the best possible basis — the eigenbasis. Read right to left: $P^{-1}$ goes into eigenbasis coordinates, $D$ stretches each axis, $P$ comes back.
Diagonalization is precisely the existence of an eigenbasis. An $n \times n$ matrix is diagonalizable if and only if it has $n$ linearly independent eigenvectors. Equivalently, every eigenvalue's geometric multiplicity equals its algebraic multiplicity (Chapter 24).
$n$ distinct eigenvalues is sufficient (not necessary). Distinct eigenvalues force independent eigenvectors, so the matrix diagonalizes. But repeated eigenvalues are fine too, as long as each supplies its full quota of eigenvectors (the identity matrix is the extreme case).
A defective matrix cannot be diagonalized. When some eigenvalue is short on eigenvectors (geometric $<$ algebraic), there is no eigenbasis, so any $P$ assembled from the available eigenvectors is singular and $A = PDP^{-1}$ is impossible. The scaled shear $\begin{bmatrix}2&1\\0&2\end{bmatrix}$ is the canonical example; Jordan normal form (Chapter 36) is the repair.
The payoff is fast powers: $A^k = PD^kP^{-1}$. Raising $D$ to the $k$-th power is just $n$ scalar exponentiations, so once $P$ and $P^{-1}$ are in hand, $A^{1000}$ costs essentially no more than $A^2$. The exponent all but drops out of the cost.
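The long run is the dominant eigenvalue. Writing $\mathbf{x}_0 = \sum_i c_i\mathbf{v}_i$ in the eigenbasis gives $A^k\mathbf{x}_0 = \sum_i c_i\lambda_i^k\mathbf{v}_i$, so the largest-magnitude eigenvalue $\lambda_1$ wins as $k \to \infty$ (provided $c_1 \neq 0$): $|\lambda_1| > 1$ means growth, $|\lambda_1| < 1$ means decay, and $|\lambda_1| = 1$ means the dominant component neither grows nor decays (a true steady state when $\lambda_1 = 1$). The dominant eigenvector is the limiting direction; the ratio $|\lambda_2/\lambda_1|$ to the next eigenvalue sets the convergence speed.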
Functions of a matrix work the same way: $f(A) = Pf(D)P^{-1}$. Apply $f$ to each eigenvalue and sandwich. This previews the matrix exponential $e^A = Pe^DP^{-1}$ (Chapter 37), the engine of continuous dynamical systems.
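The ideas above can be checked numerically. The following is a minimal sketch, assuming NumPy's `np.linalg.eig` as the eigen-solver and using the $\begin{bmatrix}2&1\\1&2\end{bmatrix}$ matrix from Chapter 16 as the test case. The variable names and the choice $f = \sqrt{\cdot}$ are illustrative, not taken from the chapter.

```python
import numpy as np

# Test matrix: eigenvalues 3 and 1, so it is diagonalizable.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Columns of P are eigenvectors; eigvals[i] pairs with column P[:, i].
eigvals, P = np.linalg.eig(A)
D = np.diag(eigvals)
P_inv = np.linalg.inv(P)

# Reconstruct A = P D P^{-1}.
A_rebuilt = P @ D @ P_inv

# Fast power: only the diagonal entries are exponentiated.
k = 10
A_pow = P @ np.diag(eigvals ** k) @ P_inv
A_pow_direct = np.linalg.matrix_power(A, k)

# Function of a matrix, f(A) = P f(D) P^{-1}, here with f = sqrt.
sqrt_A = P @ np.diag(np.sqrt(eigvals)) @ P_inv

# Defective example: eigenvalue 2 has algebraic multiplicity 2, but the
# eigenspace null(S - 2I) has dimension n - rank(S - 2I) = 1.
S = np.array([[2.0, 1.0],
              [0.0, 2.0]])
geometric_mult = S.shape[0] - np.linalg.matrix_rank(S - 2.0 * np.eye(2))

print(np.allclose(A_rebuilt, A))          # reconstruction holds
print(np.allclose(A_pow, A_pow_direct))   # P D^k P^{-1} matches repeated multiplication
print(np.allclose(sqrt_A @ sqrt_A, A))    # the square root squares back to A
print(geometric_mult)                     # 1, short of the algebraic multiplicity 2
```

For this matrix $A^k = \tfrac12\begin{bmatrix}3^k+1 & 3^k-1\\ 3^k-1 & 3^k+1\end{bmatrix}$, which makes the dominance of $\lambda_1 = 3$ visible in closed form.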

Skills you should now have

Diagonalize a small matrix end to end: characteristic polynomial $\to$ eigenvalues $\to$ eigenvectors $\to$ assemble $P$, $D$ $\to$ compute $P^{-1}$ $\to$ reconstruct and confirm $PDP^{-1} = A$.
State the diagonalizability condition precisely and decide whether a given matrix satisfies it (including recognizing a defective matrix).
Compute $A^k$ via $PD^kP^{-1}$ by hand and in numpy, and explain why it is cheap.
Turn a linear recurrence into a companion matrix, diagonalize it, and read off the closed form and the asymptotic growth rate (Fibonacci, and the CS recurrence of Case Study 2).
Find a Markov chain's steady state as the $\lambda = 1$ eigenvector and its convergence rate from the magnitude of the second-largest eigenvalue (Case Study 1).
Avoid the two ordering errors: $P$ versus $P^{-1}$ direction (Chapter 16) and column-of-$P$ versus diagonal-of-$D$ pairing.
Implement diagonalize(A) and matrix_power_via_diag(A, k) from scratch and verify against numpy.
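Several of these skills fit in one short sketch. The functions below reuse the names `diagonalize` and `matrix_power_via_diag` from the list, but they are an assumption about what such functions might look like: they lean on `np.linalg.eig` rather than building the eigen-solver from scratch, and the conditioning threshold `1e12` is an arbitrary illustrative cutoff. The 2-state Markov matrix `M` is a made-up example (columns sum to 1), not the one from Case Study 1.

```python
import numpy as np

def diagonalize(A):
    """Return (P, D) with A ~ P D P^{-1}; raise if no usable eigenbasis is found."""
    eigvals, P = np.linalg.eig(A)
    # A numerically singular P signals a defective (or nearly defective) matrix.
    if np.linalg.cond(P) > 1e12:  # illustrative threshold, an assumption
        raise ValueError("matrix appears defective: eigenvectors are not independent")
    return P, np.diag(eigvals)

def matrix_power_via_diag(A, k):
    """Compute A^k as P D^k P^{-1}."""
    P, D = diagonalize(A)
    return P @ np.diag(np.diag(D) ** k) @ np.linalg.inv(P)

# Fibonacci: the companion matrix C satisfies C^k = [[F_{k+1}, F_k], [F_k, F_{k-1}]].
C = np.array([[1.0, 1.0],
              [1.0, 0.0]])
fib_10 = int(round(float(matrix_power_via_diag(C, 10)[0, 1])))
growth_rate = float(np.max(np.abs(np.linalg.eigvals(C))))  # the golden ratio

# Markov chain (hypothetical column-stochastic matrix).
M = np.array([[0.9, 0.5],
              [0.1, 0.5]])
vals, vecs = np.linalg.eig(M)
i_one = int(np.argmin(np.abs(vals - 1.0)))       # locate the eigenvalue 1
steady = np.real(vecs[:, i_one])
steady = steady / steady.sum()                   # normalize to a probability vector
mixing = float(np.sort(np.abs(vals))[-2])        # second-largest magnitude

print(fib_10)        # 55
print(growth_rate)   # about 1.618
print(steady)        # about [0.8333, 0.1667]
print(mixing)        # about 0.4
```

Keeping eigenvalue `vals[i]` paired with column `vecs[:, i]` is exactly the column-of-$P$ versus diagonal-of-$D$ pairing that the ordering warning above refers to.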

Terms to know

diagonalization, diagonalizable, $A = PDP^{-1}$ (eigendecomposition), eigenbasis, defective matrix, distinct eigenvalues (sufficient condition), geometric vs. algebraic multiplicity (the diagonalizability test), matrix power ($A^k = PD^kP^{-1}$), decoupling (into independent stretches/modes), mode (an eigenvector of a dynamical system), dominant eigenvalue, companion matrix, steady state, mixing rate (second eigenvalue), function of a matrix ($f(A) = Pf(D)P^{-1}$), minimal polynomial (the polynomial diagonalizability test).

How this connects to the rest of the book

Back to Chapter 16. Diagonalization is the direct payoff of change of basis. The re-gridding that turned $\begin{bmatrix}2&1\\1&2\end{bmatrix}$ into the diagonal $\begin{bmatrix}3&0\\0&1\end{bmatrix}$ was a preview of this chapter; now the special basis has a name (the eigenbasis) and the factorization has a use (powers and dynamics). Trace and determinant, the similarity invariants of Chapter 16, are revealed here as the sum and product of the eigenvalues — read straight off the diagonal of $D$.
Built on Chapters 23–24. Chapter 23 gave the eigenvectors as invariant directions; Chapter 24 gave the characteristic polynomial and the multiplicity distinction. Diagonalization is those two chapters fused: collect the eigenvectors into $P$, the eigenvalues into $D$, and the multiplicity bookkeeping of Chapter 24 becomes the exact condition for the fusion to succeed.
Forward to the Spectral Theorem (Chapter 27), the next summit. This chapter's diagonalizing $P$ was, in general, a skewed basis, so $P^{-1}$ was a genuine inverse and the eigenvectors were not perpendicular (witness the non-orthogonal $(1,1), (1,-2)$ of §25.2.2). The spectral theorem reveals the spectacular special case: when $A$ is real symmetric, eigenvectors belonging to distinct eigenvalues are automatically orthogonal, and an orthonormal eigenbasis can always be chosen. The diagonalizing matrix can then be taken orthogonal, so $P^{-1} = P^{\mathsf{T}}$ (a free transpose instead of an inversion) and the change of basis is a rigid rotation, possibly combined with a reflection. For complex Hermitian $A$ the same holds with a unitary $P$ and the conjugate transpose, $P^{-1} = P^{*}$. That is the cleanest, most stable diagonalization possible. It underlies PCA and the geometry of covariance (Chapter 28), and it is the mathematical foundation of changing representations in quantum mechanics, where Hermitian operators represent observables, their eigenvalues are the possible measurement outcomes, and their orthonormal eigenbases are the corresponding states.
Forward to dynamics and decompositions. Chapter 26 handles the matrices with no real eigenbasis (rotations) via complex eigenvalues. Chapter 29 (PageRank) is the dominant-eigenvector limit of this chapter applied to a web-sized stochastic matrix. Chapter 36 (Jordan form) is what to do when diagonalization fails — the canonical form for defective matrices. Chapter 37 (matrix exponential) is the continuous twin: $e^{At}$ decouples a system of ODEs into independent exponentials, exactly as $A^k$ decoupled the discrete iteration here.

The one image to keep

The three-panel visualizer of §25.2.3: a transformation that looks like a lopsided parallelogram on the standard grid is, in its eigenbasis, just a clean axis-aligned stretch — a $3 \times 1$ rectangle. The outer panels ($P^{-1}$ and $P$) are merely the round trip into and out of the right coordinate system; the middle panel ($D$) is the transformation's true, simple nature. Diagonalization is opening the right window — and once it is open, powers, exponentials, and the long-run fate of the system are all read off the diagonal.