47 min read

> Learning paths. Math majors — read everything, especially the diagonalizability proof in §25.3, the defective-matrix analysis, and the Math-Major Sidebar on the eigenspace decomposition; the precise condition ($n$ independent eigenvectors) and the...

Prerequisites

  • chapter-24-characteristic-polynomial
  • chapter-16-change-of-basis

Learning Objectives

  • Write a diagonalizable matrix as $A = PDP^{-1}$ with eigenvectors in the columns of $P$ and eigenvalues on the diagonal of $D$, and explain the factorization as a change to the eigenbasis.
  • State precisely the condition for diagonalizability — $n$ linearly independent eigenvectors — and the sufficient condition of $n$ distinct eigenvalues, and recognize a defective matrix that fails it.
  • Compute matrix powers efficiently with $A^k = PD^kP^{-1}$, and explain why this is the payoff of diagonalization.
  • Use diagonalization to find a closed form for a linear recurrence (Fibonacci) and the long-run behavior of a Markov chain.
  • Preview functions of a matrix $f(A) = Pf(D)P^{-1}$ and tie it to the matrix exponential of Chapter 37.
  • Implement diagonalize(A) and matrix_power_via_diag(A, k) and verify $A = PDP^{-1}$ against numpy.

Diagonalization: When a Matrix Reveals Its True Nature

Learning paths. Math majors — read everything, especially the diagonalizability proof in §25.3, the defective-matrix analysis, and the Math-Major Sidebar on the eigenspace decomposition; the precise condition ($n$ independent eigenvectors) and the role of multiplicity from Chapter 24 are the heart of the chapter. CS / Data Science — focus on the Geometric Intuition callouts, the power formula $A^k = PD^kP^{-1}$ in §25.4, the Fibonacci and Markov worked examples, the numpy verifications, and the two case studies (long-run economics; a recurrence in CS); the idea that "powers are trivial in the eigenbasis" is the engine behind dynamical systems and PageRank. Physics / Engineering — focus on the geometry of decoupling into independent stretches, the functions-of-a-matrix preview in §25.7, and the recurrence/oscillation applications that lead into the matrix exponential of Chapter 37. This chapter assumes the eigenvalues and eigenvectors of Chapters 23–24 and the change-of-basis viewpoint of Chapter 16.

25.1 What does it mean to diagonalize a matrix?

In Chapter 16 we performed a small piece of magic and then walked away from it. We took the symmetric matrix $A = \begin{bmatrix}2&1\\1&2\end{bmatrix}$ — a dense thing with cross-terms, whose action on the plane looked like a lopsided parallelogram — and we re-gridded it into the basis $\{(1,1),\,(-1,1)\}$. In that basis the very same transformation became the diagonal matrix $\begin{bmatrix}3&0\\0&1\end{bmatrix}$: a pure stretch by $3$ along one new axis and by $1$ along the other. We said then that this was a preview of "the most powerful idea ahead." This is that idea, and now we have the vocabulary to say exactly what happened. The directions $(1,1)$ and $(-1,1)$ were the eigenvectors of $A$ (Chapter 23) — the directions it scaled without rotating — and the diagonal entries $3$ and $1$ were the eigenvalues, the scaling factors. To diagonalize a matrix is to choose its eigenbasis as the coordinate system, and watch the matrix collapse into the diagonal of its eigenvalues.

That sentence is the whole chapter. Everything else is making it precise, proving when it is possible, and cashing in the payoff. The payoff is enormous, because a diagonal matrix is the easiest object in all of linear algebra: it does nothing but scale each coordinate axis on its own, independently of the others. There are no interactions, no cross-terms, no coupling. If you can re-describe a complicated transformation as "stretch axis one by $\lambda_1$, axis two by $\lambda_2$, and so on," then anything you want to do to that transformation — apply it a thousand times, take its square root, run it backward, follow it as a continuous flow — you can do one axis at a time, on a single number each. The tangle becomes a list of independent scalars.

The Key Insight — A diagonalizable matrix is a stretch in disguise. The messy entries of $A$ are an artifact of describing independent stretches in the wrong coordinate system — the standard basis. Switch to the eigenbasis, and the disguise drops: the transformation is revealed to be nothing but a set of independent one-dimensional scalings, recorded on the diagonal of $D$. "Diagonalizing $A$" and "finding the coordinate system in which $A$ does the simplest possible thing" are the same act.

This is recurring theme #6 of the book reaching its operational peak: eigenvalues and eigenvectors reveal what a matrix really does, stripped of coordinate-system artifacts. Chapter 23 gave us the invariant directions as a geometric phenomenon you could watch in the visualizer; Chapter 24 gave us the characteristic polynomial to compute them and the crucial distinction between algebraic and geometric multiplicity. Diagonalization is where those two threads are tied into a single factorization — and where we discover that the multiplicity bookkeeping of Chapter 24 was not pedantry but the exact condition for the magic to work.

The motivating question of the chapter is brutally practical. Suppose you have a transformation you must apply many times: a Markov chain stepped forward a hundred months, a population model iterated across fifty generations, the Fibonacci recurrence pushed out to the thousandth term, a discrete dynamical system run to its long-run limit. Computing $A^{100}$ by repeated multiplication is a hundred matrix products — expensive, and worse, opaque: the answer is a wall of numbers that tells you nothing about why the system behaves as it does. Diagonalization answers both complaints at once. It makes $A^{100}$ almost free to compute, and it makes the answer legible — the long-run behavior is dictated by the largest eigenvalue, and you can read the system's fate straight off the diagonal of $D$. Let us build the factorization first, then collect the reward.

25.1.1 The factorization, stated

Here is the central object of the chapter, named and frozen. We met its grammar in Chapter 16 as similarity; now we apply that grammar to the one basis that matters most.

A square matrix $A$ ($n \times n$) is diagonalizable if it can be written $$\boxed{\;A = PDP^{-1}\;}$$ where $D$ is a diagonal matrix and $P$ is invertible. When this holds, the diagonal entries of $D$ are the eigenvalues of $A$, and the columns of $P$ are the corresponding eigenvectors, in matching order. This factorization is called the eigendecomposition of $A$. Equivalently (Chapter 16), $A$ is similar to a diagonal matrix: $D = P^{-1}AP$.

Notice that this is exactly the similarity relation $B = P^{-1}AP$ from Chapter 16, solved the other way around and with the special demand that the result $B$ be diagonal. In Chapter 16 we asked "what does $A$ look like in some arbitrary new basis?" and got a (usually messy) similar matrix. Here we ask the sharper question: "is there a basis in which $A$ looks diagonal, and if so, which one?" The answer — when it exists — is the eigenbasis, and the change-of-basis matrix $P$ is built from the eigenvectors exactly as in Chapter 16's recipe: its columns are the new basis vectors written in the old (standard) coordinates.

25.2 Why are the columns of $P$ the eigenvectors? (the eigenbasis-as-coordinates story)

We could verify the formula $A = PDP^{-1}$ by plugging in and checking, but that would betray the book's method. Let us instead derive it, geometry first, so that it is forced rather than memorized — and so that the columns of $P$ being eigenvectors is something you could have predicted.

Start from the geometric fact we already own from Chapter 23. An eigenvector $\mathbf{v}_i$ with eigenvalue $\lambda_i$ satisfies $A\mathbf{v}_i = \lambda_i \mathbf{v}_i$: the transformation $A$ does nothing to $\mathbf{v}_i$ but scale it by $\lambda_i$. Now suppose $A$ has $n$ eigenvectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ that are linearly independent — enough of them to form a basis of $\mathbb{R}^n$. Call this the eigenbasis. The whole point of a basis is that we may use it as a coordinate system, and the whole point of this basis is that $A$ acts on each of its axes in the simplest possible way.

Geometric Intuition — Stand inside the eigenbasis and ask what $A$ does. Along the first eigen-axis (the direction of $\mathbf{v}_1$), $A$ just stretches by $\lambda_1$ — the axis does not tilt, does not leak into the other axes, it simply scales. Along the second eigen-axis, $A$ stretches by $\lambda_2$, independently. And so on. From inside the eigenbasis, $A$ is not a tangled mixing of coordinates at all — it is $n$ separate, independent stretches, one per axis. A transformation that scales each coordinate axis independently is a diagonal matrix. So in the eigenbasis, $A$ is diagonal, with the stretch factors $\lambda_1, \dots, \lambda_n$ down the diagonal. The eigenbasis is the coordinate system in which $A$ stops mixing and starts merely scaling.

Now let us turn that picture into the factorization, using nothing but Chapter 16's change-of-basis machinery. Build the matrix $P$ whose columns are the eigenvectors, written in standard coordinates: $$P = \big[\,\mathbf{v}_1 \mid \mathbf{v}_2 \mid \cdots \mid \mathbf{v}_n\,\big].$$ By Chapter 16, this $P$ is precisely the change-of-basis matrix that converts eigenbasis coordinates into standard coordinates (its columns are the new basis vectors in old coordinates). Its inverse $P^{-1}$ converts the other way, standard into eigenbasis coordinates. And the matrix of the transformation $A$ in the eigenbasis is the similarity $D = P^{-1}AP$ — which, by the geometric argument above, must be diagonal. Rearranging $D = P^{-1}AP$ by multiplying on the left by $P$ and on the right by $P^{-1}$ gives the boxed formula: $$A = PDP^{-1}.$$

There is an even cleaner way to see that $D$ is the diagonal of eigenvalues, one that makes the whole thing a single line of algebra. Apply $A$ to the matrix $P$ — that is, to all the eigenvectors at once. Because matrix multiplication acts column by column (Chapter 8), $AP$ has the columns $A\mathbf{v}_1, A\mathbf{v}_2, \dots, A\mathbf{v}_n$. But each of those is just $\lambda_i \mathbf{v}_i$, by the eigen-equation. So $$AP = \big[\,A\mathbf{v}_1 \mid \cdots \mid A\mathbf{v}_n\,\big] = \big[\,\lambda_1\mathbf{v}_1 \mid \cdots \mid \lambda_n\mathbf{v}_n\,\big] = \big[\,\mathbf{v}_1 \mid \cdots \mid \mathbf{v}_n\,\big]\begin{bmatrix}\lambda_1 & & \\ & \ddots & \\ & & \lambda_n\end{bmatrix} = PD.$$ The last equality is worth pausing on: scaling each column of $P$ by its own factor $\lambda_i$ is exactly what multiplying $P$ on the right by the diagonal matrix $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ does. (Right-multiplying by a diagonal matrix scales columns; left-multiplying scales rows — a fact worth committing to memory.) So we have shown $AP = PD$ in one stroke. If the eigenvectors are independent, $P$ is invertible, and we may multiply on the right by $P^{-1}$ to land on $A = PDP^{-1}$, or on the left by $P^{-1}$ to land on $D = P^{-1}AP$. The two boxed forms are the same statement.

The Key Insight — Read $A = PDP^{-1}$ right to left, as the recipe it is: $P^{-1}$ translates a vector from standard coordinates into eigenbasis coordinates; $D$ stretches each eigen-axis independently; $P$ translates the stretched result back into standard coordinates. The middle matrix $D$ is the real transformation, laid bare as pure scaling. The outer $P$ and $P^{-1}$ are just the round-trip into and out of the coordinate system where that scaling is visible. This is the exact "go to good coordinates, act simply, come back" pattern of Chapter 16's similarity — now applied to the best coordinates of all.

25.2.1 The equation $AP = PD$ is the whole story

The relation $AP = PD$ deserves its own name in your head, because it is the diagonalization condition in its most transparent form. It says: applying $A$ to the eigenvector matrix is the same as scaling the columns of that matrix by the eigenvalues. Read column by column, it is just $n$ copies of the eigen-equation $A\mathbf{v}_i = \lambda_i\mathbf{v}_i$ stacked side by side. The only extra ingredient needed to get from $AP = PD$ to $A = PDP^{-1}$ is that $P$ be invertible — and $P$ is invertible exactly when its columns (the eigenvectors) are linearly independent. That single requirement is the entire content of "when does diagonalization work," and we examine it next.

25.2.2 A complete diagonalization, computed by hand

The running example $\begin{bmatrix}2&1\\1&2\end{bmatrix}$ borrowed its eigenbasis from Chapter 16, so let us diagonalize a fresh matrix from nothing — eigenvalues, eigenvectors, $P^{-1}$, and the reconstruction — to see every step of the recipe in one place. Take the non-symmetric matrix $$A = \begin{bmatrix}4 & 1 \\ 2 & 3\end{bmatrix},$$ which we choose deliberately because it is not symmetric: its eigenvectors will not be orthogonal (that perpendicular bonus is reserved for symmetric matrices in Chapter 27), so this is the honest, general case where $P^{-1}$ must be computed as a true inverse.

Step 1 — Eigenvalues (Chapter 24). The characteristic polynomial is $$\det(A - \lambda I) = \det\begin{bmatrix}4-\lambda & 1 \\ 2 & 3-\lambda\end{bmatrix} = (4-\lambda)(3-\lambda) - 2 = \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2).$$ The eigenvalues are $\lambda_1 = 5$ and $\lambda_2 = 2$ — distinct, so by §25.3.1 (which we are about to prove) the matrix is diagonalizable, and we are guaranteed two independent eigenvectors. As a quick sanity check using the invariants of Chapter 16: the eigenvalues sum to $5 + 2 = 7 = \operatorname{tr}(A)$ and multiply to $5 \cdot 2 = 10 = \det(A)$. ✓

Step 2 — Eigenvectors (Chapter 23). For $\lambda_1 = 5$, solve $(A - 5I)\mathbf{v} = \mathbf{0}$: $$A - 5I = \begin{bmatrix}-1 & 1 \\ 2 & -2\end{bmatrix} \;\Longrightarrow\; -v_1 + v_2 = 0 \;\Longrightarrow\; v_1 = v_2,$$ so $\mathbf{v}_1 = (1, 1)$ spans the eigenspace. For $\lambda_2 = 2$, solve $(A - 2I)\mathbf{v} = \mathbf{0}$: $$A - 2I = \begin{bmatrix}2 & 1 \\ 2 & 1\end{bmatrix} \;\Longrightarrow\; 2v_1 + v_2 = 0 \;\Longrightarrow\; v_2 = -2v_1,$$ so $\mathbf{v}_2 = (1, -2)$ spans that eigenspace. The two eigenvectors $(1,1)$ and $(1,-2)$ are visibly not perpendicular (their dot product is $1 - 2 = -1 \neq 0$) — exactly as promised for a non-symmetric matrix — but they are independent (neither is a multiple of the other), which is all diagonalization requires.

Step 3 — Assemble $P$, $D$, and $P^{-1}$. Stack the eigenvectors as columns of $P$, matching eigenvalues on the diagonal of $D$ in the same order: $$P = \begin{bmatrix}1 & 1 \\ 1 & -2\end{bmatrix}, \qquad D = \begin{bmatrix}5 & 0 \\ 0 & 2\end{bmatrix}, \qquad P^{-1} = \frac{1}{\det P}\begin{bmatrix}-2 & -1 \\ -1 & 1\end{bmatrix} = \frac{1}{-3}\begin{bmatrix}-2 & -1 \\ -1 & 1\end{bmatrix} = \begin{bmatrix}\tfrac{2}{3} & \tfrac{1}{3} \\[2pt] \tfrac{1}{3} & -\tfrac{1}{3}\end{bmatrix},$$ using $\det P = (1)(-2) - (1)(1) = -3$ and the $2\times 2$ inverse formula from Chapter 9 (swap the diagonal, negate the off-diagonal, divide by the determinant).

Step 4 — Reconstruct and confirm. The claim $A = PDP^{-1}$ should return the original matrix. Multiply $PD$ first (scaling the columns of $P$ by $5$ and $2$ — the right-multiply-by-diagonal rule from §25.2): $$PD = \begin{bmatrix}1 & 1 \\ 1 & -2\end{bmatrix}\begin{bmatrix}5 & 0 \\ 0 & 2\end{bmatrix} = \begin{bmatrix}5 & 2 \\ 5 & -4\end{bmatrix}, \qquad PDP^{-1} = \begin{bmatrix}5 & 2 \\ 5 & -4\end{bmatrix}\begin{bmatrix}\tfrac{2}{3} & \tfrac{1}{3} \\[2pt] \tfrac{1}{3} & -\tfrac{1}{3}\end{bmatrix} = \begin{bmatrix}4 & 1 \\ 2 & 3\end{bmatrix}. ✓$$ The original $A$ is recovered exactly. Let numpy confirm the whole pipeline, including the ordering of eigenvalues against eigenvectors:

# A full diagonalization of a non-symmetric matrix, verified end to end.
import numpy as np
np.set_printoptions(suppress=True, precision=6)
A = np.array([[4., 1.], [2., 3.]])
P = np.array([[1., 1.], [1., -2.]])      # columns: eigenvectors (1,1) and (1,-2)
D = np.array([[5., 0.], [0., 2.]])       # eigenvalues 5 and 2, MATCHING order
Pinv = np.linalg.inv(P)
print("P^-1 =\n", Pinv)                   # [[ 0.666667  0.333333] [ 0.333333 -0.333333]]
print("reconstruct PDP^-1 =\n", P @ D @ Pinv)          # [[4. 1.] [2. 3.]]
print("matches A:", np.allclose(P @ D @ Pinv, A))      # True
# numpy's own eigendecomposition (eigenvectors normalized, possibly reordered):
w, V = np.linalg.eig(A)
print("numpy eigenvalues:", w)            # [5. 2.]

The output confirms reconstruct PDP^-1 = [[4. 1.] [2. 3.]] with matches A: True, and numpy independently reports the eigenvalues [5. 2.]. This is the entire diagonalization workflow — characteristic polynomial, eigenvectors, assemble, invert, reconstruct — and you will run it, in this exact order, every time you diagonalize by hand.

Common PitfallMismatched ordering of $P$'s columns and $D$'s diagonal. The factorization $A = PDP^{-1}$ is correct only when the $i$-th column of $P$ is the eigenvector belonging to the $i$-th diagonal entry of $D$. If you list the eigenvectors as columns of $P$ in the order $(1,1), (1,-2)$ but write $D = \operatorname{diag}(2, 5)$ — eigenvalues in the wrong order — then $PDP^{-1}$ produces some other matrix entirely, not $A$. The eigenvector and its eigenvalue are a married pair; you may reorder the pairs freely (any ordering of the eigenbasis gives a valid diagonalization), but you may never separate a column of $P$ from its matching diagonal entry of $D$. When you swap two columns of $P$, you must swap the two corresponding entries of $D$ in lockstep. This pairing error and the direction error of Chapter 16 ($P$ versus $P^{-1}$) are the two mistakes that account for nearly every wrong diagonalization.

25.2.3 What does diagonalization look like in the visualizer?

This is the chapter's promised return of the recurring 2D visualizer from Chapter 1, and it makes the "stretch in disguise" slogan something you can see. Recall the visualizer draws the unit square and its image under a $2\times 2$ matrix, with the images of the two basis vectors as arrows. We use it exactly as in Chapter 16's §16.4, reusing the frozen tool verbatim and changing only the matrix and the narration.

The factorization $A = PDP^{-1}$ says that the single transformation $A$ is the composition of three simpler transformations, read right to left: first $P^{-1}$ (rotate-and-skew the plane into eigenbasis coordinates), then $D$ (a clean axis-aligned stretch), then $P$ (skew back). Let us watch each stage for the running example $A = \begin{bmatrix}2&1\\1&2\end{bmatrix}$, whose middle stage is the diagonal $D = \begin{bmatrix}3&0\\0&1\end{bmatrix}$:

# Using the visualizer from Chapter 1: A = P D P^-1 as three stages on the unit square.
import numpy as np
import matplotlib.pyplot as plt
from toolkit.visualizer import visualize_2d        # the FROZEN 2D visualizer (Ch.1)
A = np.array([[2., 1.], [1., 2.]])
P = np.array([[1., -1.], [1., 1.]])                # eigenvectors (1,1), (-1,1) as columns
D = np.array([[3., 0.], [0., 1.]])                 # eigenvalues 3, 1 on the diagonal
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
visualize_2d(np.linalg.inv(P), title="(1) P^-1: into eigenbasis coords", ax=axes[0])
visualize_2d(D,                title="(2) D: stretch axes by 3 and 1",   ax=axes[1])
visualize_2d(A,                title="(3) A = P D P^-1: the whole map",  ax=axes[2])
# plt.show()
print("det at each stage:", round(np.linalg.det(np.linalg.inv(P)), 3),
      round(np.linalg.det(D), 3), round(np.linalg.det(A), 3))  # 0.5 3.0 3.0

Figure 25.1 — Diagonalization as stretch-in-disguise. Three panels showing the unit square under (1) $P^{-1}$, which re-grids the plane into eigenbasis coordinates; (2) $D = \begin{bmatrix}3&0\\0&1\end{bmatrix}$, a pure axis-aligned stretch tripling one direction and fixing the other; and (3) the full $A$, identical in net effect to applying the three stages in sequence. Alt-text: three side-by-side transformation plots; the middle one is a clean 3-by-1 rectangle stretch, and the rightmost shows the composite parallelogram that A produces.

Geometric Intuition — The middle panel is the heart of diagonalization: in eigenbasis coordinates, $A$ is just the stretch $D$ — a $3\times 1$ rectangle, nothing tilted, no cross-terms. The outer panels are merely the round trip into and out of those special coordinates. The reason $A$ on the standard grid (panel 3) looks like a lopsided parallelogram is that we are watching a clean stretch through the wrong window. Diagonalization opens the right window. This is the same picture as Chapter 16's Figures 16.1–16.2 — the dense $\begin{bmatrix}2&1\\1&2\end{bmatrix}$ and the clean $\begin{bmatrix}3&0\\0&1\end{bmatrix}$ being two views of one transformation — now named: the clean view is the diagonalization, and the eigenvectors $(1,1), (-1,1)$ are the axes of the right window.

There is a quiet confirmation hidden in the determinants printed above: $\det(P^{-1}) \cdot \det(D) \cdot \det(P) = 0.5 \cdot 3 \cdot 2 = 3 = \det(A)$, with the $\det(P^{-1})$ and $\det(P)$ canceling to leave $\det(D) = 3$. The product of the eigenvalues is the determinant ($3 \times 1 = 3$), and it equals the area-scaling factor you can watch in any panel — a fact we first met as a similarity invariant in Chapter 16 and now read straight off the diagonal of $D$. Likewise the sum of the eigenvalues $3 + 1 = 4$ is the trace. The diagonal of $D$ literally displays the invariants: trace is the sum of the diagonal, determinant is the product, no longer hidden inside the entries of $A$.

Check Your Understanding — In Figure 25.1, the middle panel ($D$) stretches one axis by $3$ and leaves the other fixed. If instead $A$ had eigenvalues $0.5$ and $2$, what would the middle panel look like, and would $A$ shrink or expand area overall?

Answer

The middle panel would stretch one eigen-axis by $2$ (expanding it) and shrink the other by $0.5$ (compressing it to half), turning the unit square into a $2 \times 0.5$ rectangle. The net area scaling is the product of the eigenvalues, $\det = 2 \cdot 0.5 = 1$, so $A$ would preserve area exactly — even though it stretches one direction and squeezes another. A determinant of $1$ does not mean "no change"; it means the expansion in one eigen-direction is exactly compensated by compression in another. This is precisely the behavior of an area-preserving "squeeze" map.

25.3 When is a matrix diagonalizable?

Not every matrix can be diagonalized. The factorization $A = PDP^{-1}$ demands an invertible $P$, and $P$ is built from eigenvectors, so the question reduces to: does $A$ have enough independent eigenvectors to fill the columns of an invertible $n \times n$ matrix? The answer is the precise condition of the chapter, and we state it carefully because — per the style of this book — a special case must never be presented as the general rule.

Diagonalizability Theorem (the precise condition). An $n \times n$ matrix $A$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors — equivalently, if and only if $\mathbb{R}^n$ (or $\mathbb{C}^n$, allowing complex eigenvalues) has a basis consisting of eigenvectors of $A$. When this holds, gather those eigenvectors as the columns of $P$ and their eigenvalues on the diagonal of $D$, and $A = PDP^{-1}$.

The "if and only if" is exactly the derivation of §25.2 run in both directions. If there are $n$ independent eigenvectors, then $P$ is invertible and $AP = PD$ gives the factorization. Conversely, if $A = PDP^{-1}$ for some invertible $P$ and diagonal $D$, then rearranging to $AP = PD$ shows each column of $P$ is an eigenvector with eigenvalue the matching diagonal entry of $D$; and $P$ invertible means those $n$ columns are independent. There is no third possibility. Diagonalizability is precisely the existence of an eigenbasis.

This is where Chapter 24's distinction between algebraic multiplicity (how many times $\lambda$ is a root of the characteristic polynomial) and geometric multiplicity (the dimension of the eigenspace $N(A - \lambda I)$, i.e. how many independent eigenvectors $\lambda$ actually supplies) becomes the load-bearing idea. We always have $1 \le \text{geometric} \le \text{algebraic}$ for each eigenvalue (Chapter 24). The total number of independent eigenvectors we can collect is the sum of the geometric multiplicities, and we have a full eigenbasis exactly when that sum reaches $n$.

The Key Insight — A matrix is diagonalizable if and only if, for every eigenvalue, the geometric multiplicity equals the algebraic multiplicity — that is, every eigenvalue supplies as many independent eigenvectors as its multiplicity as a root demands. When some eigenvalue is "short" — geometric multiplicity strictly less than algebraic — the eigenvectors do not add up to a basis, $P$ cannot be made invertible, and the matrix is defective: it cannot be diagonalized. The defect is a deficit of eigenvectors, and Chapter 36's Jordan normal form is the repair.

25.3.1 The sufficient condition: distinct eigenvalues

Checking multiplicities is real work, so it helps to have a quick test that often settles the question in one glance. Here it is, and it is the most useful sufficient condition in the subject.

Sufficient condition. If an $n \times n$ matrix has $n$ distinct eigenvalues, it is diagonalizable. (Distinct eigenvalues guarantee independent eigenvectors, which is all diagonalization needs.)

Why distinct eigenvalues force independence. Suppose $\lambda_1, \dots, \lambda_k$ are distinct eigenvalues with eigenvectors $\mathbf{v}_1, \dots, \mathbf{v}_k$, and suppose, for contradiction, that the eigenvectors were dependent. Take a dependence relation with the fewest nonzero terms, say $c_1\mathbf{v}_1 + \dots + c_m\mathbf{v}_m = \mathbf{0}$ with all $c_i \neq 0$ and $m$ minimal. Apply $A$ to both sides: since $A\mathbf{v}_i = \lambda_i\mathbf{v}_i$, we get $c_1\lambda_1\mathbf{v}_1 + \dots + c_m\lambda_m\mathbf{v}_m = \mathbf{0}$. Now subtract $\lambda_m$ times the original relation: the $\mathbf{v}_m$ term cancels, leaving $c_1(\lambda_1 - \lambda_m)\mathbf{v}_1 + \dots + c_{m-1}(\lambda_{m-1} - \lambda_m)\mathbf{v}_{m-1} = \mathbf{0}$. Because the eigenvalues are distinct, each factor $(\lambda_i - \lambda_m) \neq 0$, so this is a shorter nonzero dependence relation — contradicting minimality. Hence no dependence exists: eigenvectors for distinct eigenvalues are always independent. $\blacksquare$

So $n$ distinct eigenvalues immediately give $n$ independent eigenvectors and a guaranteed diagonalization. But note the logic carefully — this is sufficient, not necessary. A matrix can have repeated eigenvalues and still be perfectly diagonalizable, provided each repeated eigenvalue is generous enough to supply its full quota of eigenvectors. The cleanest example is the identity matrix $I$: its only eigenvalue is $1$, repeated $n$ times (algebraic multiplicity $n$), yet every nonzero vector is an eigenvector, so its eigenspace is all of $\mathbb{R}^n$ (geometric multiplicity $n$). The identity is already diagonal — gloriously diagonalizable — despite a wildly repeated eigenvalue. Repetition is not the enemy; a shortage of eigenvectors is.

Common Pitfall"A repeated eigenvalue means the matrix is not diagonalizable." False, and the identity matrix refutes it instantly. Repeated eigenvalues only risk non-diagonalizability; the matrix fails to diagonalize only if some repeated eigenvalue's geometric multiplicity falls short of its algebraic multiplicity. The matrix $2I = \begin{bmatrix}2&0\\0&2\end{bmatrix}$ has the doubly-repeated eigenvalue $2$ and is trivially diagonal; the shear $\begin{bmatrix}2&1\\0&2\end{bmatrix}$ has the same doubly-repeated eigenvalue $2$ but only a one-dimensional eigenspace, and is therefore defective. Same eigenvalues, opposite fates — the difference is the eigenvector count, never the eigenvalue count alone.

25.3.2 The defective case, seen concretely

Let us make the defect tangible, because the abstract phrase "geometric multiplicity less than algebraic" hides a vivid geometric story. Consider the shear $$A = \begin{bmatrix}2 & 1 \\ 0 & 2\end{bmatrix}.$$ Its characteristic polynomial is $\det(A - \lambda I) = (2-\lambda)^2 = 0$ (Chapter 24), so $\lambda = 2$ with algebraic multiplicity $2$. Now hunt for eigenvectors by solving $(A - 2I)\mathbf{v} = \mathbf{0}$: $$A - 2I = \begin{bmatrix}0 & 1 \\ 0 & 0\end{bmatrix}, \qquad \begin{bmatrix}0 & 1 \\ 0 & 0\end{bmatrix}\begin{bmatrix}v_1 \\ v_2\end{bmatrix} = \mathbf{0} \;\Longrightarrow\; v_2 = 0.$$ The only constraint is $v_2 = 0$, leaving a single free parameter $v_1$. The eigenspace is the line spanned by $(1, 0)$ — one dimension, geometric multiplicity $1$. So this matrix offers only one independent eigenvector, but a $2\times 2$ matrix needs two to fill an invertible $P$. There is no eigenbasis, no invertible $P$ built from eigenvectors, and therefore no diagonalization. The matrix is defective.

Warning

— The defective case is a genuine mathematical wall, not a computational nuisance you can work around. A defective matrix has strictly fewer independent eigenvectors than its dimension, so the columns of any eigenvector matrix $P$ are dependent, $P$ is singular, and $A = PDP^{-1}$ is impossible — there is no diagonal matrix similar to it. This is not floating-point trouble; it is a structural fact about the transformation. The remedy is to relax the goal from "diagonal" to "as close to diagonal as possible," which yields the Jordan normal form of Chapter 36: eigenvalues on the diagonal with a few $1$'s parked just above it, exactly accounting for the missing eigenvectors. For this chapter, simply learn to recognize the defect: a repeated eigenvalue whose eigenspace is too small.

Geometric Intuition — Why can't a shear be a clean set of independent stretches? Picture the shear $\begin{bmatrix}2&1\\0&2\end{bmatrix}$ acting on the plane. Along the horizontal line $(1,0)$ it merely scales by $2$ — that is its one eigen-direction. But every other direction gets tilted toward the horizontal as it is stretched: the shear genuinely rotates those vectors a little, so they are not scaled-without-rotating, so they are not eigenvectors. A pure diagonal transformation has $n$ directions it scales without rotating; a shear has only one and smears the rest. There is no coordinate system that turns this smearing into independent stretches, because the smearing is the transformation's true nature. Some matrices are simply not stretches in disguise — they are shears in disguise, and Chapter 36 gives them their own canonical form.

Math-Major Sidebar — Diagonalizability has a clean structural restatement: $A$ is diagonalizable if and only if $\mathbb{R}^n$ (or $\mathbb{C}^n$) is the direct sum of the eigenspaces, $\mathbb{R}^n = E_{\lambda_1} \oplus E_{\lambda_2} \oplus \cdots \oplus E_{\lambda_m}$, where $E_{\lambda} = N(A - \lambda I)$ is the eigenspace for eigenvalue $\lambda$. A direct sum means every vector decomposes uniquely as a sum of eigenvectors, one from each eigenspace — which is exactly an eigenbasis. Because eigenvectors for distinct eigenvalues are independent (§25.3.1), these eigenspaces always meet only at $\mathbf{0}$; the question is solely whether their dimensions add up to $n$, i.e. whether $\sum_i \dim E_{\lambda_i} = n$. Since $\dim E_{\lambda_i}$ is the geometric multiplicity and the algebraic multiplicities always sum to $n$ (the characteristic polynomial has degree $n$, counting complex roots — Chapter 24), the condition $\sum \dim E_{\lambda_i} = n$ is equivalent to "geometric $=$ algebraic for every eigenvalue," recovering the §25.3 criterion. There is also an elegant polynomial test, which Chapter 36 develops: $A$ is diagonalizable if and only if its minimal polynomial — the lowest-degree monic polynomial $m(x)$ with $m(A) = 0$ — factors into distinct linear factors, $m(x) = (x - \lambda_1)(x - \lambda_2)\cdots(x - \lambda_m)$ with no repeated roots. The defective shear fails this: its minimal polynomial is $(x-2)^2$, the squared factor, which is precisely the algebraic signature of the missing eigenvector.

25.4 How do you compute powers of a matrix with diagonalization?

Now the payoff, and it is the reason diagonalization earns a whole chapter. The single most common thing we want to do with a transformation is apply it repeatedly — and that is exactly the operation diagonalization makes nearly free.

Begin with the geometric picture, because it makes the algebra obvious. If $A$ is a stretch by $\lambda_i$ along each eigen-axis, then applying $A$ twice stretches each axis by $\lambda_i$ and then by $\lambda_i$ again — total $\lambda_i^2$. Applying it $k$ times stretches axis $i$ by $\lambda_i^k$. The eigen-axes never interact, so iterating the transformation just iterates each independent scaling. In the eigenbasis, $A^k$ is the diagonal matrix of $k$-th powers of the eigenvalues. The algebra confirms it in one beautiful collapse. Watch what happens when we square $A = PDP^{-1}$: $$A^2 = (PDP^{-1})(PDP^{-1}) = PD\,\underbrace{(P^{-1}P)}_{I}\,DP^{-1} = PD^2P^{-1}.$$ The inner $P^{-1}P$ annihilates to the identity, and we are left with $D^2$ sandwiched by $P$ and $P^{-1}$. The same telescoping happens for any power: every adjacent $P^{-1}P$ in the middle collapses, leaving only the outer $P \cdots P^{-1}$ and $D$ multiplied by itself $k$ times.

The power formula (the payoff of diagonalization). If $A = PDP^{-1}$, then for any positive integer $k$, $$\boxed{\;A^k = PD^kP^{-1}\;}$$ where $D^k$ is computed by raising each diagonal entry to the $k$-th power: $D^k = \operatorname{diag}(\lambda_1^k, \lambda_2^k, \dots, \lambda_n^k)$. Computing $D^k$ is $n$ scalar exponentiations — trivial. The cost of $A^k$ collapses from $k$ matrix multiplications to one eigendecomposition plus two matrix multiplications, regardless of how large $k$ is.

To feel the asymptotic win concretely: computing $A^{1000}$ by naive repeated multiplication is $999$ matrix products, each costing on the order of $n^3$ scalar multiplications — a thousandfold pile of work that grows linearly with the exponent. (Repeated squaring is cleverer, cutting it to about $\log_2 1000 \approx 10$ products, but still tied to the exponent.) Diagonalization severs that tie entirely: one eigendecomposition up front, then $A^{1000}$ costs the same as $A^2$ — raise $n$ diagonal numbers to the $1000$th power (a handful of scalar operations) and do two matrix multiplications to sandwich. The exponent $k$ has vanished from the cost. When you must evaluate a transformation at many different powers — say, to plot a population over a century of years — you diagonalize once and read off every power for free, which is exactly what makes long-run simulation tractable.

This is a genuine asymptotic win, but the deeper prize is insight. The power $A^k$ is dominated by the largest eigenvalue: if $|\lambda_1| > |\lambda_i|$ for all other $i$, then as $k$ grows, $\lambda_1^k$ dwarfs the rest, and the long-run behavior of $A^k$ is governed almost entirely by $\lambda_1$ and its eigenvector. That single observation explains the long-run fate of every linear dynamical system — whether a population settles, explodes, or oscillates; whether a Markov chain converges to a steady state; which web page PageRank ranks highest (Chapter 29). The eigenvalues are the system's growth rates, and diagonalization is the lens that isolates them.

Let us verify the formula on the running example $A = \begin{bmatrix}2&1\\1&2\end{bmatrix}$ from Chapter 16, whose eigenbasis we already know: eigenvectors $(1,1)$ and $(-1,1)$, eigenvalues $3$ and $1$. So $P = \begin{bmatrix}1&-1\\1&1\end{bmatrix}$, $D = \begin{bmatrix}3&0\\0&1\end{bmatrix}$, and $P^{-1} = \begin{bmatrix}0.5&0.5\\-0.5&0.5\end{bmatrix}$. To find $A^5$ we raise $D$ to the fifth — $D^5 = \begin{bmatrix}3^5&0\\0&1^5\end{bmatrix} = \begin{bmatrix}243&0\\0&1\end{bmatrix}$ — and sandwich: $$A^5 = PD^5P^{-1} = \begin{bmatrix}1&-1\\1&1\end{bmatrix}\begin{bmatrix}243&0\\0&1\end{bmatrix}\begin{bmatrix}0.5&0.5\\-0.5&0.5\end{bmatrix} = \begin{bmatrix}243&-1\\243&1\end{bmatrix}\begin{bmatrix}0.5&0.5\\-0.5&0.5\end{bmatrix} = \begin{bmatrix}122&121\\121&122\end{bmatrix}.$$ No fifth powers of the original matrix were taken — only $3^5$ and $1^5$, two scalar exponentiations. Confirm against direct multiplication with numpy:

# Powers via diagonalization: A^5 = P D^5 P^-1, checked against direct multiplication.
import numpy as np
A = np.array([[2., 1.], [1., 2.]])
P = np.array([[1., -1.], [1., 1.]])     # columns are eigenvectors (1,1), (-1,1)
D = np.array([[3., 0.], [0., 1.]])      # eigenvalues 3 and 1 on the diagonal
Pinv = np.linalg.inv(P)

A5_diag = P @ np.diag(np.diag(D)**5) @ Pinv      # P D^5 P^-1
print("A^5 via PD^5P^-1 =\n", A5_diag)            # [[122. 121.] [121. 122.]]
print("matrix_power(A,5) =\n", np.linalg.matrix_power(A, 5))   # [[122 121] [121 122]]
print("match:", np.allclose(A5_diag, np.linalg.matrix_power(A, 5)))  # True

The output is A^5 = [[122. 121.] [121. 122.]] from both routes, and match: True. The hand computation, the diagonalization, and np.linalg.matrix_power all agree.

Computational Notenumpy's np.linalg.eig(A) returns the eigenvalues in a 1-D array w and the eigenvectors as the columns of a 2-D array V — already arranged as your $P$. But beware three things. First, the eigenvalues come back in no guaranteed order, and V's columns are ordered to match w, so never reorder one without the other. Second, numpy normalizes each eigenvector to unit length, so its $P$ will differ from a hand-chosen $P$ by per-column scale factors — which is fine, because any nonzero scaling of an eigenvector is still an eigenvector, and the factorization $PDP^{-1}$ is invariant to it (the scales cancel between $P$ and $P^{-1}$). Third, for a real matrix with complex eigenvalues, w and V come back complex — the subject of Chapter 26. Remember that math indexes eigenvalues from $1$ ($\lambda_1$) while w[0] is the first in numpy.

Check Your Understanding — Using the same diagonalization of $A = \begin{bmatrix}2&1\\1&2\end{bmatrix}$, what is $A^{10}$, and which eigenvalue dominates it? (You do not need to multiply matrices — reason from $D^{10}$.)

Answer

$D^{10} = \operatorname{diag}(3^{10}, 1^{10}) = \operatorname{diag}(59049, 1)$, so $A^{10} = PD^{10}P^{-1}$ is dominated overwhelmingly by the eigenvalue $3$: the $1^{10}=1$ contribution is utterly negligible beside $3^{10}=59049$. Concretely $A^{10} = \begin{bmatrix}29525 & 29524 \\ 29524 & 29525\end{bmatrix}$ — every entry is about $\tfrac{1}{2}\cdot 3^{10}$, because for large $k$ the matrix $A^k \approx 3^k \cdot (\text{the projection onto the } \lambda=3 \text{ eigenvector})$. The largest eigenvalue sets the growth rate; the eigenvector it belongs to sets the direction everything is pulled toward. This is the seed of power iteration and PageRank (Chapter 29).

25.5 A worked example: closed-form Fibonacci via diagonalization

Few results show off diagonalization as beautifully as a closed-form formula for the Fibonacci numbers. The Fibonacci sequence $0, 1, 1, 2, 3, 5, 8, 13, \dots$ is defined by the recurrence $F_{n+1} = F_n + F_{n-1}$ — each term is the sum of the two before. A recurrence couples each step to the previous ones, which is exactly the kind of coupling diagonalization is built to decouple. The trick, used constantly in computer science and applied math, is to package the recurrence as a single matrix step.

Stack two consecutive Fibonacci numbers into a state vector $\mathbf{x}_n = \begin{bmatrix}F_{n+1}\\F_n\end{bmatrix}$. One step of the recurrence is one multiplication by a fixed matrix: $$\begin{bmatrix}F_{n+1}\\F_n\end{bmatrix} = \begin{bmatrix}1 & 1 \\ 1 & 0\end{bmatrix}\begin{bmatrix}F_n\\F_{n-1}\end{bmatrix}, \qquad A = \begin{bmatrix}1 & 1 \\ 1 & 0\end{bmatrix}.$$ Check the top row: $1\cdot F_n + 1\cdot F_{n-1} = F_{n+1}$ ✓; the bottom row $1\cdot F_n + 0\cdot F_{n-1} = F_n$ just shifts the window forward. (A matrix of this shape, built to turn a scalar recurrence into a one-step vector map, is called a companion matrix.) Starting from $\mathbf{x}_1 = \begin{bmatrix}F_2\\F_1\end{bmatrix} = \begin{bmatrix}1\\1\end{bmatrix}$, the $n$-th state is $\mathbf{x}_n = A^{n-1}\mathbf{x}_1$. The Fibonacci numbers are powers of a matrix — and we now know powers are easy in the eigenbasis.

Diagonalize $A$. The characteristic polynomial (Chapter 24) is $\det(A - \lambda I) = \lambda^2 - \lambda - 1 = 0$, whose roots are the famous golden ratio and its conjugate: $$\lambda_1 = \varphi = \frac{1 + \sqrt5}{2} \approx 1.618034, \qquad \lambda_2 = \psi = \frac{1 - \sqrt5}{2} \approx -0.618034.$$ Two distinct eigenvalues, so by §25.3.1 the matrix is diagonalizable. The eigenvector for $\lambda$ solves $(A - \lambda I)\mathbf{v} = \mathbf{0}$; a short computation gives $\mathbf{v} = (\lambda, 1)$, so the eigenvectors are $(\varphi, 1)$ and $(\psi, 1)$. Now form $A^{n} = PD^{n}P^{-1}$ and read off the bottom-left entry, which is $F_n$. The independent stretches by $\varphi^n$ and $\psi^n$ combine, after the sandwich, into a single clean formula.

The Key Insight (Binet's formula). Diagonalization turns the coupled Fibonacci recurrence into two independent geometric sequences — one growing as $\varphi^n$, one shrinking as $\psi^n$ — and recombines them into the closed form $$F_n = \frac{\varphi^n - \psi^n}{\sqrt5}, \qquad \varphi = \frac{1+\sqrt5}{2},\;\; \psi = \frac{1-\sqrt5}{2}.$$ The $n$-th Fibonacci number, an integer, equals a difference of irrational powers divided by $\sqrt5$. Because $|\psi| < 1$, the $\psi^n$ term vanishes rapidly, so $F_n \approx \varphi^n/\sqrt5$ — the Fibonacci numbers grow geometrically at the golden ratio, and the ratio of consecutive terms $F_{n+1}/F_n \to \varphi$. This is why $\varphi$, the dominant eigenvalue, governs the sequence's growth: it is the largest stretch factor of $A$.

Let us verify both the closed form and the matrix-power picture against numpy, with matching numbers:

# Fibonacci by diagonalization: A^k powers and Binet's closed form agree.
import numpy as np
A = np.array([[1., 1.], [1., 0.]])
w, V = np.linalg.eig(A)
print("eigenvalues (phi, psi):", np.sort(w)[::-1])    # [ 1.618034 -0.618034]

A10 = np.linalg.matrix_power(A, 10)
print("A^10 =\n", A10.astype(int))                    # [[89 55] [55 34]]  -> F11=89, F10=55

phi, psi = (1 + 5**0.5) / 2, (1 - 5**0.5) / 2
F10 = (phi**10 - psi**10) / 5**0.5                    # Binet's formula
print("Binet F_10 =", round(F10, 6))                  # 55.0
print("ratio F11/F10 =", 89/55, " vs phi =", round(phi, 6))  # 1.6181... ~ 1.618034

The output reads eigenvalues (phi, psi): [1.618034 -0.618034], then A^10 = [[89 55] [55 34]] — so $F_{11} = 89$ and $F_{10} = 55$ sit in the matrix exactly as the state-vector picture predicts — and Binet F_10 = 55.0, with the consecutive ratio $89/55 \approx 1.6181$ already hugging the golden ratio $\varphi \approx 1.618034$. The integer Fibonacci numbers fall out of irrational eigenvalue powers, and the dominant eigenvalue $\varphi$ is visibly the growth rate. Diagonalization did not just compute Fibonacci faster; it explained it.

Real-World Application — The same companion-matrix-plus-diagonalization recipe solves any linear recurrence with constant coefficients, which is the bread and butter of algorithm analysis in computer science. When you solve a recurrence like $T(n) = 3T(n-1) - 2T(n-2)$ to find an algorithm's running time, you are diagonalizing its companion matrix: the eigenvalues are the bases of the exponential terms in the closed form, and the largest eigenvalue is the asymptotic growth rate $\Theta(\lambda_{\max}^n)$. Case Study 2 works exactly this kind of recurrence end to end. The dominant eigenvalue of the recurrence is its big-O.

25.6 A second worked example: the long-run behavior of a Markov chain

The Fibonacci example grew without bound; the next one settles, and the contrast is instructive. Markov chains — random processes that hop between states with fixed probabilities — are everywhere: weather models, web surfers (PageRank, Chapter 29), customer churn, board-game positions. Their entire long-run story is an eigenvalue story, and diagonalization tells it cleanly.

Take a two-state weather model. Each day is Sunny or Rainy. If today is sunny, tomorrow is sunny with probability $0.9$ and rainy with probability $0.1$; if today is rainy, tomorrow is sunny with probability $0.5$ and rainy with probability $0.5$. Encode the state as a probability vector $\mathbf{x} = \begin{bmatrix}\text{P(sunny)}\\\text{P(rainy)}\end{bmatrix}$, and one day's evolution is multiplication by the transition matrix $$P_{\text{trans}} = \begin{bmatrix}0.9 & 0.5 \\ 0.1 & 0.5\end{bmatrix}.$$ (We call it $P_{\text{trans}}$ to avoid colliding with the eigenvector matrix $P$.) Each column sums to $1$ — every starting state must go somewhere tomorrow — which makes this a column-stochastic matrix. After $k$ days, the distribution is $P_{\text{trans}}^k\,\mathbf{x}_0$, again a matrix power. We want the limit as $k \to \infty$, and diagonalization delivers it.

Diagonalize. The characteristic polynomial gives eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 0.4$ (two distinct values, so diagonalizable). The eigenvalue $1$ is no accident: every column-stochastic matrix has $1$ as an eigenvalue, because the all-ones row vector is a left eigenvector (the columns sum to $1$), and that forces $1$ into the spectrum [verify]. The eigenvector for $\lambda = 1$, normalized so its entries sum to $1$, is the steady state $\mathbf{x}_\infty = \begin{bmatrix}5/6 \\ 1/6\end{bmatrix} \approx \begin{bmatrix}0.8333\\0.1667\end{bmatrix}$. Now watch the power formula explain the convergence. In the eigenbasis, $$P_{\text{trans}}^k = P\begin{bmatrix}1^k & 0 \\ 0 & 0.4^k\end{bmatrix}P^{-1} = P\begin{bmatrix}1 & 0 \\ 0 & 0.4^k\end{bmatrix}P^{-1}.$$ The first eigenvalue contributes $1^k = 1$ forever; the second contributes $0.4^k$, which races to $0$. So as $k \to \infty$, the second eigen-direction dies and only the steady-state direction survives. No matter where you start, the distribution converges to $\mathbf{x}_\infty$.

Geometric Intuition — A Markov chain's long-run behavior is the eigenvalue $1$ winning a war of attrition. The transition matrix stretches the steady-state eigen-direction by $1$ (it is preserved exactly) and shrinks every other eigen-direction by a factor less than $1$ in magnitude. Iterate, and the shrinking directions vanish while the preserved direction stands; the state is funneled, step by step, onto the line of the dominant eigenvector. The second-largest eigenvalue (here $0.4$) controls the speed of convergence — the smaller it is, the faster the chain forgets where it started. The steady state is what the chain converges to; the spectral gap $1 - |\lambda_2|$ is how fast.

Verify the eigenvalues, the steady state, and the convergence numerically:

# Markov chain steady state and convergence, via the dominant eigenvector.
import numpy as np
np.set_printoptions(suppress=True, precision=6)
Pt = np.array([[0.9, 0.5],
               [0.1, 0.5]])               # column-stochastic transition matrix
w, V = np.linalg.eig(Pt)
print("eigenvalues:", np.sort(w)[::-1])   # [1.  0.4]

i = np.argmin(np.abs(w - 1))              # the eigenvalue equal to 1
ss = np.real(V[:, i]); ss = ss / ss.sum()  # normalize so entries sum to 1
print("steady state:", ss)                # [0.833333 0.166667]  = [5/6, 1/6]

x0 = np.array([1., 0.])                    # start certain it is Sunny
for k in (1, 2, 5, 20):
    print(f"after {k:>2} days:", np.linalg.matrix_power(Pt, k) @ x0)

The output is eigenvalues: [1. 0.4], steady state: [0.833333 0.166667], then the distribution marching toward it: after 1 days: [0.9 0.1], after 2 days: [0.86 0.14], after 5 days: [0.83504 0.16496], and after 20 days: [0.833333 0.166667] — fully converged to $[5/6, 1/6]$. The chain forgets its sunny start and settles into "sunny five-sixths of the time." That $5/6$ is an eigenvector entry, and the convergence is the geometric decay of $0.4^k$. This exact mechanism, scaled to a matrix the size of the web, is how Google's PageRank finds the importance of every page (Chapter 29) — a connection we will make precise there, and one that Case Study 1 develops in an economic setting.

25.6.1 Reading long-run behavior straight off the eigenvalues

The two worked examples — Fibonacci exploding at rate $\varphi$, the weather chain settling onto a steady state — are two faces of a single principle that diagonalization makes obvious. Write any initial vector $\mathbf{x}_0$ in the eigenbasis as a combination of eigenvectors, $\mathbf{x}_0 = c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \dots + c_n\mathbf{v}_n$. Because each eigenvector is merely scaled when $A$ acts, applying $A$ a total of $k$ times scales each piece by its eigenvalue to the $k$-th power: $$A^k\mathbf{x}_0 = c_1\lambda_1^k\mathbf{v}_1 + c_2\lambda_2^k\mathbf{v}_2 + \dots + c_n\lambda_n^k\mathbf{v}_n.$$ This formula — the eigenbasis-expansion of a matrix power — is arguably the most useful single equation in applied linear algebra, because it decomposes the future of a linear system into independent modes. Each eigenvector $\mathbf{v}_i$ is a mode of the system; each eigenvalue $\lambda_i$ is that mode's per-step growth factor. The whole trajectory is a superposition of modes growing or shrinking at their own rates, never interacting — the decoupling promised by the eigenbasis, written out in time.

The Key Insight — Order the eigenvalues by magnitude, $|\lambda_1| \ge |\lambda_2| \ge \dots$. As $k$ grows, the term with the largest $|\lambda_i|$ dominates the sum, because $\lambda_1^k$ outpaces every other power. So for large $k$, $A^k\mathbf{x}_0 \approx c_1\lambda_1^k\mathbf{v}_1$ — the system's long-run state is the dominant eigenvector $\mathbf{v}_1$, growing at rate $\lambda_1$, regardless of the starting point (as long as $c_1 \neq 0$). The magnitude of $\lambda_1$ decides the system's fate: $|\lambda_1| > 1$ means unbounded growth, $|\lambda_1| < 1$ means decay to zero, $|\lambda_1| = 1$ means a finite steady state. This is the entire theory of long-run behavior for linear systems, and it falls out of diagonalization in one line.

The three regimes are worth fixing concretely, because they classify every linear dynamical system you will ever meet:

  • $|\lambda_1| > 1$ (growth). Fibonacci is this case: $\lambda_1 = \varphi \approx 1.618 > 1$, so $F_n$ grows geometrically without bound, and the ratio of consecutive terms tends to $\varphi$. Population models, compound interest, and unstable feedback loops live here. The dominant eigenvector is the stable age distribution a population converges to even as its total size explodes.
  • $|\lambda_1| = 1$ (steady state). Markov chains are this case: the dominant eigenvalue is exactly $1$, so the dominant mode neither grows nor shrinks — it is the steady-state distribution — while every other mode (here $0.4^k$) decays away. The system settles.
  • $|\lambda_1| < 1$ (decay). A damped system, a contraction, a process that loses energy each step. Every mode shrinks, so $A^k\mathbf{x}_0 \to \mathbf{0}$: the system relaxes to the origin. The slowest decay (largest $|\lambda_i| < 1$) sets how long relaxation takes.

Real-World Application — In population ecology and economics, the Leslie matrix models an age-structured population: entry by entry it encodes how many offspring each age class produces and what fraction survives to the next age class. Iterating it projects the population forward year by year — another matrix power. Its dominant eigenvalue $\lambda_1$ is the population's long-run growth rate (the number ecologists call $\lambda$, with $\lambda_1 > 1$ meaning the population grows, $< 1$ meaning it declines toward extinction), and its dominant eigenvector is the stable age distribution — the fixed proportions of young, middle-aged, and old individuals the population approaches no matter its initial makeup. Demographers use exactly this to project national populations; economists use the same eigen-analysis on input–output models, where the dominant eigenvalue governs whether an economy's sectors amplify or dampen a shock. Case Study 1 works a Markov version of this in detail for a labor market.

Notice how this reframes the two worked examples as one. Fibonacci grows because its dominant eigenvalue $\varphi$ exceeds $1$; the weather chain settles because its dominant eigenvalue equals exactly $1$. The Markov steady state we computed is nothing but the dominant-eigenvector limit with $\lambda_1 = 1$ holding it fixed while the subordinate mode $0.4^k$ evaporates. Both stories — and PageRank's, and a population's, and a damped circuit's — are the same eigenbasis expansion read at large $k$. Diagonalization does not merely speed up matrix powers; it tells you, in advance and without any computation, what a repeated linear process will eventually do.

25.7 What is a function of a matrix? (a preview)

Once you see $A^k = PD^kP^{-1}$, an irresistible generalization suggests itself. We applied the function "raise to the $k$" to each eigenvalue and sandwiched the result. Why stop at powers? Any function we can apply to the eigenvalues, we can apply to the matrix the same way.

Functions of a matrix (preview). If $A = PDP^{-1}$ is diagonalizable and $f$ is a function defined on its eigenvalues, define $$f(A) = P\,f(D)\,P^{-1}, \qquad f(D) = \operatorname{diag}\big(f(\lambda_1), \dots, f(\lambda_n)\big).$$ The function acts independently on each eigenvalue, exactly because the eigenbasis decouples $A$ into independent one-dimensional pieces — and on a single number, "apply $f$" is unambiguous.

This is not a formal trick; it is forced by the same telescoping that gave the power formula. Any function expressible as a power series $f(x) = c_0 + c_1 x + c_2 x^2 + \cdots$ has $f(A) = c_0 I + c_1 A + c_2 A^2 + \cdots$, and substituting $A^k = PD^kP^{-1}$ into every term and factoring out the common $P$ and $P^{-1}$ leaves exactly $P\,f(D)\,P^{-1}$. Two cases matter enormously downstream:

  • Matrix square root. $\sqrt{A} = P\sqrt{D}\,P^{-1}$, taking $\sqrt{\lambda_i}$ on the diagonal (when the eigenvalues permit). For a positive definite matrix (Chapter 28) all eigenvalues are positive, so a real symmetric square root exists — the foundation of "whitening" data and of defining the geometry of covariance.
  • Matrix exponential. $e^{A} = Pe^{D}P^{-1}$, taking $e^{\lambda_i}$ on the diagonal. This is the single most important function of a matrix in all of applied mathematics, because $e^{At}$ solves the continuous linear system $\mathbf{x}'(t) = A\mathbf{x}(t)$ — the differential-equation analogue of the discrete powers we computed here. Chapter 37 builds the matrix exponential and the theory of linear ODEs on exactly this foundation: in the eigenbasis, a coupled system of differential equations decouples into independent scalar exponentials $e^{\lambda_i t}$, and the eigenvalues become growth/decay rates and (for complex eigenvalues, Chapter 26) oscillation frequencies.

Real-World Application — In continuous dynamical systems — chemical kinetics, electrical circuits, population dynamics, control systems — the state evolves as $\mathbf{x}'(t) = A\mathbf{x}(t)$, and the solution is $\mathbf{x}(t) = e^{At}\mathbf{x}(0)$. Diagonalizing $A$ turns this coupled system into $n$ uncoupled equations $y_i' = \lambda_i y_i$, each solved by a pure exponential $y_i(t) = y_i(0)e^{\lambda_i t}$. The sign of each eigenvalue's real part decides stability: negative real parts mean the mode decays (the system returns to equilibrium), positive means it blows up, imaginary parts mean it oscillates. An engineer reads a system's entire qualitative fate — stable, unstable, oscillatory — directly from the eigenvalues of $A$. The decoupling-into-independent-stretches anchor of this chapter is the decoupling-into-independent-exponentials of Chapter 37; powers and exponentials are the discrete and continuous faces of the same eigenbasis idea.

The pattern to internalize is that every operation on a matrix becomes the same operation done coordinate-by-coordinate on its eigenvalues, once you pass to the eigenbasis. Powers $A^k$ became $\lambda_i^k$; the exponential $e^A$ becomes $e^{\lambda_i}$; the inverse $A^{-1}$ becomes $1/\lambda_i$ (which immediately reproves that $A$ is invertible exactly when no eigenvalue is zero — Chapter 24); a polynomial $p(A)$ becomes $p(\lambda_i)$. This is the unifying reason eigenvalues are called the spectrum of the matrix, by analogy with the frequencies a physical system resonates at: just as decomposing light into its spectrum lets you understand how a prism acts on each color independently, decomposing a matrix into its eigenbasis lets you understand how the transformation acts on each mode independently. The function $f$ never sees the coupling of $A$; it sees only the clean list of eigenvalues, acts on each, and the eigenbasis reassembles the result. Diagonalization is what makes "a function of a matrix" a sensible idea at all.

25.8 How do you build diagonalization from scratch?

By hand, diagonalizing a matrix is the assembly of two pieces you already own from Part V: find the eigenvalues (Chapter 24, roots of the characteristic polynomial) and the eigenvectors (Chapter 23, null space of $A - \lambda I$), then stack the eigenvectors as the columns of $P$ and the eigenvalues on the diagonal of $D$. The factorization $A = PDP^{-1}$ is then just one matrix inversion and two multiplications — operations the toolkit has known since Chapters 8 and 9. It is time to package diagonalization and fast matrix powers into reusable code, in pure Python, and verify against numpy exactly as we verified every hand computation above.

Build Your Toolkit — Add two functions to toolkit/eigen.py (the module that already holds power_iteration from Chapter 23 and will gain qr_algorithm in Chapter 29):

  • diagonalize(A) — given a square matrix A, return the pair (P, D) where P's columns are the eigenvectors and D is the diagonal matrix of eigenvalues, so that A == P @ D @ inverse(P). Use your from-scratch eigenvalue routine (Chapter 24's characteristic-polynomial roots for small matrices, or the qr_algorithm of Chapter 29 for larger ones) to get the eigenvalues, then solve $(A - \lambda I)\mathbf{x} = \mathbf{0}$ with the toolkit's row_reduce (Chapter 4) to get each eigenvector. Raise an exception when A is defective — when the eigenvectors collected do not number $n$ independent vectors, there is no diagonalization, and your code must say so rather than return a singular P. (Test the count: the assembled P must be invertible.)
  • matrix_power_via_diag(A, k) — return $A^k$ by calling diagonalize(A) to get (P, D), raising the diagonal entries of D to the k-th power, and returning P @ D_to_the_k @ inverse(P). This must work for k in the hundreds with no more cost than k = 2, because the only k-dependence is n scalar exponentiations.

Verify (numpy only for checking, never inside the implementation): for $A = \begin{bmatrix}2&1\\1&2\end{bmatrix}$, confirm np.allclose(P @ D @ inverse(P), A) and that D holds the eigenvalues $\{3, 1\}$; confirm matrix_power_via_diag(A, 5) matches np.linalg.matrix_power(A, 5) (the $\begin{bmatrix}122&121\\121&122\end{bmatrix}$ we computed by hand); and confirm your diagonalize raises on the defective shear $\begin{bmatrix}2&1\\0&2\end{bmatrix}$. The exception on the defective case is the most important test — it is the difference between a correct implementation and one that silently lies.

A sketch of the verification harness, to show how cleanly the pieces fit (the full module is assembled separately; implement it yourself before peeking):

# Sketch: diagonalize then verify A = P D P^-1 and fast powers. numpy used ONLY to check.
import numpy as np
A = np.array([[4., 1.], [2., 3.]])            # distinct eigenvalues 5, 2 -> diagonalizable
P, D = np.linalg.eig(A)[1], np.diag(np.linalg.eig(A)[0])   # stand-in for your diagonalize
print("A = P D P^-1 ?", np.allclose(P @ D @ np.linalg.inv(P), A))         # True
k = 3
fast = P @ np.diag(np.diag(D)**k) @ np.linalg.inv(P)        # P D^k P^-1
print("A^3 matches matrix_power:", np.allclose(fast, np.linalg.matrix_power(A, k)))  # True

The implementation never imports numpy; numpy appears only in the tests, to confirm the reconstruction $PDP^{-1}$ and the fast power. That is the toolkit's contract throughout the book (Chapter 4): build it from scratch, check it against the library. This contribution will be reused implicitly whenever a later chapter needs powers, exponentials, or long-run behavior — most directly in the PageRank power iteration of Chapter 29 and the matrix exponential of Chapter 37.

Historical Note — The diagonalization of a matrix grew out of the classification of quadratic forms in the 18th and 19th centuries — the problem of finding principal axes that eliminate the cross-terms of a quadric surface, which Euler and Lagrange studied for the motion of rigid bodies and Cauchy advanced in the 1820s [verify]. Augustin-Louis Cauchy proved that a real symmetric matrix has real eigenvalues and can be diagonalized by an orthogonal change of axes — the principal-axis theorem that becomes the spectral theorem of Chapter 27 [verify]. The word "eigen-" (German for "own" or "characteristic") and the systematic matrix formulation came later, with David Hilbert's work on integral equations around 1904 popularizing Eigenwert [verify]. The closed form for Fibonacci numbers (§25.5) is named for Jacques Philippe Marie Binet, who published it in 1843, though it was known earlier to Abraham de Moivre and Daniel Bernoulli [verify].

25.9 What should you carry forward from this chapter?

We learned to diagonalize a matrix — to factor $A = PDP^{-1}$ by stacking its eigenvectors as the columns of $P$ and its eigenvalues on the diagonal of $D$ — and we saw that this is nothing more than changing to the eigenbasis, the coordinate system in which the transformation reveals itself as a set of independent stretches. The factorization works if and only if $A$ has $n$ linearly independent eigenvectors (an eigenbasis exists); having $n$ distinct eigenvalues guarantees this, but is not required, while a defective matrix — one whose geometric multiplicity falls short of its algebraic multiplicity for some eigenvalue — fails it and must wait for Jordan form (Chapter 36). The payoff is the power formula $A^k = PD^kP^{-1}$: matrix powers, which encode the iteration of any linear process, become $n$ scalar exponentiations on the diagonal, and the largest eigenvalue dictates the long-run behavior. We cashed this in twice — a closed-form Binet formula for Fibonacci, where the golden-ratio eigenvalue is the growth rate, and the steady state of a Markov chain, where the eigenvalue $1$ is the survivor and the second eigenvalue sets the convergence speed. Finally we previewed functions of a matrix, $f(A) = Pf(D)P^{-1}$, the door to the matrix exponential and continuous dynamics.

The single image to keep is the eigenbasis as the matrix's native coordinate system. In the standard basis a transformation can look like a dense, coupled mess; switch to its eigenbasis and the mess dissolves into independent one-dimensional scalings, recorded as the diagonal $D$. This completes the arc opened in Chapter 16, where we re-gridded $\begin{bmatrix}2&1\\1&2\end{bmatrix}$ into $\begin{bmatrix}3&0\\0&1\end{bmatrix}$ and called it a preview — it was a preview of this. And it advances two of the book's recurring themes at once: eigenvalues and eigenvectors reveal what a matrix really does (the diagonal $D$ is the transformation's essential action, stripped of coordinate artifacts), and geometry and algebra are two views of one object (the algebraic factorization $A = PDP^{-1}$ is the geometric act of viewing the transformation along the directions it merely stretches).

This sets up the next summit directly. We have learned when a matrix can be diagonalized and what it buys us — but the diagonalizing basis $P$ was, in general, a skewed (oblique) basis, so $P^{-1}$ was a genuine inverse to compute. Chapter 27, the Spectral Theorem, reveals the spectacular special case: when $A$ is symmetric (real) or Hermitian (complex), its eigenvectors are not merely independent but mutually orthogonal, so the diagonalizing $P$ can be taken orthogonal ($P^{-1} = P^{\mathsf{T}}$, a free transpose instead of an inversion), and the change of basis is a pure rotation. That is the cleanest diagonalization possible, it underlies Principal Component Analysis and the geometry of covariance, and it is the mathematical bedrock of changing representations in quantum mechanics, where Hermitian operators represent observable quantities and their orthogonal eigenbases are the measurement outcomes. Before that, Chapter 26 confronts the matrices with no real eigenbasis — rotations — and shows their complex eigenvalues are how the algebra encodes spinning. Diagonalization is the key; the next chapters reveal how beautifully it turns for the most important matrices in science.