
Prerequisites

  • chapter-25-diagonalization
  • chapter-24-characteristic-polynomial

Learning Objectives

  • Explain why a matrix is defective — geometric multiplicity strictly less than algebraic multiplicity means too few eigenvectors to form a basis — and connect this directly to the failure of diagonalization in Chapter 25.
  • Define a generalized eigenvector and build a Jordan chain by solving (A - λI)w = v, then assemble the chain into the columns of P.
  • Recognize a Jordan block as an eigenvalue on the diagonal with 1's on the superdiagonal, and read its geometric action as a shear in the eigen-direction.
  • State the Jordan canonical form theorem with its conditions — every square matrix over the complex numbers is similar to a Jordan form, unique up to the order of its blocks — and compute A = PJP^{-1} for a defective 2×2 and 3×3 matrix.
  • Explain why the Jordan form makes powers and the matrix exponential of defective matrices computable, and why it is numerically fragile and rarely computed on a real machine.

Jordan Normal Form: When a Matrix Can't Be Diagonalized

Learning paths. Math majors: read everything, especially the structure theorem in §36.5, the motivated proof sketch of existence in §36.6, and the Math-Major Sidebar on the generalized eigenspace decomposition; this chapter is the honest completion of the eigenvalue story begun in Part V. CS / Data Science: focus on the Geometric Intuition callouts, the full worked example in §36.4, the numpy/sympy verification, and the "why it matters for powers and expm" discussion in §36.7; the existence proof is optional. Physics / Engineering: focus on the shear picture in §36.3, the critically-damped and transient-decay examples, and the §36.7 bridge to the matrix exponential, which Chapter 37 turns into the solution of every constant-coefficient linear system of ODEs. This chapter assumes diagonalization $A=PDP^{-1}$ from Chapter 25 and the algebraic-versus-geometric multiplicity distinction from Chapter 24. It is, in a sense, the chapter that Chapter 24 promised when it first flagged the defective case.

Chapter 25 ended on a quiet caveat. We learned to diagonalize: to write $A = PDP^{-1}$, decoding a matrix into a clean diagonal $D$ of eigenvalues in a coordinate system $P$ built from eigenvectors, so that the tangled action of $A$ became, in the right basis, nothing but independent stretching along independent axes. It is one of the most powerful ideas in the book. But it came with a condition we could not wish away: it works only when there are enough eigenvectors — enough, that is, to form a basis. Chapter 24 had already named the matrices that fail this test. They are the defective matrices: those for which some eigenvalue's geometric multiplicity (the number of independent eigenvectors it owns) falls strictly short of its algebraic multiplicity (how many times it appears as a root of the characteristic polynomial). For such a matrix, $P$ cannot be assembled, and diagonalization simply stops.

This chapter is about what to do then. The wrong response — the one a beginner reaches for — is to treat a defective matrix as broken, a degenerate accident to be perturbed away. The right response, and the beautiful one, is to ask what the matrix is doing that diagonalization could not capture, and to build exactly the right tool to record it. That tool is the Jordan normal form (equivalently, the Jordan canonical form), and the headline result of this chapter is as clean as it is profound: every square matrix over the complex numbers is similar to a Jordan form — an almost-diagonal matrix that is as close to diagonal as the matrix will allow. Diagonalizable matrices turn out to be merely the lucky special case. The defective ones are not exceptions to the theory; they are the general theory, finally told in full.

We will earn this in the book's usual way: the picture first. Before any formula, we will look at the simplest matrix that cannot be diagonalized — a $2\times 2$ shear — and watch precisely what it does to space that a single eigenvector fails to describe. That leftover motion has a name, generalized eigenvectors, and an organizing structure, the Jordan chain; the chains assemble into Jordan blocks, and the blocks stack into the Jordan form. By the end you will be able to take a defective matrix, find its generalized eigenvectors by solving one extra linear system, build the change-of-basis $P$ by hand, and verify against numpy and sympy that $P^{-1}AP$ really is the promised Jordan form. And you will see why this matters: the Jordan form is what lets us compute powers $A^k$ and the matrix exponential $e^{At}$ of a defective matrix — the very thing Chapter 37 needs to solve a system of differential equations whose matrix has a repeated root.

36.1 Why can't some matrices be diagonalized?

Start with the picture, and start with the one matrix every linear algebra student should be able to summon on demand: the shear

$$ A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}. $$

Geometrically — using the visualizer from Chapter 1, with this $A$ — this map scales everything by $2$ and then slides the plane horizontally by an amount proportional to height. The bottom edge of the unit square (height zero) does not move sideways at all; the top edge (height one) slides one unit to the right. The square becomes a leaning parallelogram. There is exactly one direction this map leaves pointing the same way: the horizontal axis, $\mathbf{e}_1 = (1,0)$. Vectors along it get stretched by $2$ and not turned. Every other direction gets tilted by the shear.

Geometric Intuition — A shear is the geometric shape of a defective eigenvalue. Picture the plane sliding past itself: one line of vectors (the eigen-direction) is preserved, scaled but never turned, while everything off that line gets tilted toward it. There is room in $\mathbb{R}^2$ for two invariant directions, but the shear supplies only one. The second direction the matrix "should" have is consumed by the sliding motion itself — there is no honest eigenvector pointing along the slide, because the slide does not preserve any direction except the one we already found. The missing eigenvector is missing because the matrix spent it on a shear.

Now do the algebra and watch it confirm the picture. The characteristic polynomial (Chapter 24) is $\det(A - \lambda I) = (2-\lambda)^2$, so $\lambda = 2$ is a root twice: its algebraic multiplicity is $2$. How many independent eigenvectors does it have? Eigenvectors for $\lambda = 2$ are the nonzero solutions of $(A - 2I)\mathbf{v} = \mathbf{0}$, and

$$ A - 2I = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}. $$

This matrix has rank $1$, so by rank–nullity (Chapter 14) its null space — the eigenspace — has dimension $2 - 1 = 1$. The equation forces the second component to be zero and leaves the first free, so every eigenvector is a multiple of $\mathbf{e}_1 = (1,0)$. The geometric multiplicity is $1$. We have a $2\times 2$ matrix whose only eigenvalue offers a single line of eigenvectors. There is no way to build a basis of $\mathbb{R}^2$ out of eigenvectors of $A$, because they all lie on one line. So $A$ is defective, and — exactly as Chapter 25 warned — it cannot be diagonalized.

The Key Insight — A matrix is diagonalizable if and only if every eigenvalue's geometric multiplicity equals its algebraic multiplicity, because that is precisely the condition under which the eigenvectors are numerous enough to form a basis. A defective matrix is one where, for at least one eigenvalue, geometric multiplicity is strictly less than algebraic multiplicity: the eigenvalue is "owed" more eigenvectors than it actually has, and the shortfall is exactly the number of independent eigenvectors we are missing.

Chapter 24 set this up precisely, and it is worth recalling the exact statement, because the whole chapter hinges on it. There we proved the multiplicity inequality: for every eigenvalue $\lambda$ of any square matrix, $1 \le \text{(geometric multiplicity)} \le \text{(algebraic multiplicity)}$. The lower bound says every eigenvalue has at least one eigenvector (the eigenspace is never empty — that is what makes it an eigenvalue). The upper bound says it can have at most as many independent eigenvectors as its multiplicity as a root. The two extremes are the two stories of Part V. When the inequality is an equality for every eigenvalue, the eigenvectors fill out a basis and Chapter 25's diagonalization succeeds. When it is strict for even a single eigenvalue, we fall short, and that shortfall — the defect, hence "defective" — is the number $d_\lambda = \text{(algebraic)} - \text{(geometric)}$ of basis vectors that ordinary eigenvectors cannot supply. For our shear, $d_\lambda = 2 - 1 = 1$: we are short exactly one vector, and the entire remainder of this chapter is the project of manufacturing it.
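The defect is cheap to compute exactly, so it is worth seeing once in code. Here is a minimal sketch, assuming sympy is available. For each eigenvalue, `Matrix.eigenvects()` returns a triple (eigenvalue, algebraic multiplicity, basis of the eigenspace), and the defect is that multiplicity minus the size of the basis. The helper name `defects` is our own illustration, not a sympy function.

```python
from sympy import Matrix, eye

def defects(M):
    """Map each eigenvalue of M to its defect d = algebraic - geometric (hypothetical helper)."""
    # eigenvects() yields (eigenvalue, algebraic multiplicity, basis of the eigenspace)
    return {lam: alg - len(basis) for lam, alg, basis in M.eigenvects()}

shear = Matrix([[2, 1],
                [0, 2]])
print(defects(shear))     # {2: 1}: short exactly one basis vector
print(defects(eye(2)))    # {1: 0}: the double root owns two eigenvectors
```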

It pays to notice where the defect can possibly occur, because it rules out most of the matrices you have met. An eigenvalue with algebraic multiplicity $1$ — a simple eigenvalue — can never be defective: the inequality $1 \le \text{geometric} \le 1$ forces geometric multiplicity $1$ as well. So a matrix can only be defective if it has a repeated eigenvalue, and even then only if that repeated eigenvalue fails to gather its full quota of eigenvectors. A matrix with $n$ distinct eigenvalues is automatically diagonalizable (Chapter 25), so defectiveness lives entirely in the world of repeated roots that are short on eigenvectors. This is why the symmetric matrices of Chapter 27 are never defective — the spectral theorem guarantees them a full orthonormal eigenbasis no matter how often an eigenvalue repeats — and why defectiveness is a phenomenon of general, non-symmetric matrices, the kind that arise from coupled differential equations and asymmetric transition models rather than from the symmetric Gram matrices of least squares.
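The following exact check covers all three claims using sympy's `Matrix.is_diagonalizable()`. It tests a symmetric matrix whose eigenvalue $1$ repeats, a non-symmetric matrix with distinct eigenvalues, and the shear. The two example matrices `S` and `D` are our own choices for illustration.

```python
from sympy import Matrix

S = Matrix([[2, 1, 1],
            [1, 2, 1],
            [1, 1, 2]])        # symmetric; eigenvalues 4, 1, 1 (the 1 repeats)
D = Matrix([[1, 2],
            [0, 3]])           # non-symmetric, distinct eigenvalues 1 and 3
shear = Matrix([[2, 1],
                [0, 2]])       # repeated eigenvalue 2, only one eigenvector

for name, M in [("symmetric", S), ("distinct", D), ("shear", shear)]:
    print(name, M.eigenvals(), M.is_diagonalizable())
# last column: True, True, False
```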

Let numpy confirm the eigenvector shortage, because this is the kind of claim the book always checks by computation rather than by trust.

# The shear A = [[2,1],[0,2]] is defective: alg mult 2 but only one eigenvector.
import numpy as np
A = np.array([[2., 1.],
              [0., 2.]])
vals, vecs = np.linalg.eig(A)
print("eigenvalues:", vals)            # [2. 2.]  -> algebraic multiplicity 2
print("eigenvectors (columns):")
print(np.round(vecs, 6))               # both columns are +/- (1, 0): same line!
print("geometric multiplicity:", 2 - np.linalg.matrix_rank(A - 2*np.eye(2)))  # 1

The two "eigenvectors" numpy returns are $(1,0)$ and $(-1,0)$ (the second only up to rounding error): the same line, reported twice. The geometric multiplicity prints as $1$. Algebraic $2$, geometric $1$: the gap is one, and that one missing dimension is the whole story of this chapter. This also shows a numpy quirk we will return to in §36.8. numpy did return a $2\times 2$ matrix of "eigenvectors," but its columns are not independent, so it is not a usable diagonalizing $P$. The floating-point world does not raise an alarm when a matrix is defective; it just hands back a degenerate basis and moves on.
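To see that numpy's eigenvector matrix cannot serve as $P$, this small sketch checks its determinant. The exact tiny value depends on the LAPACK build. The threshold $10^{-12}$ used below is an arbitrary illustrative cutoff, not a numpy setting.

```python
import numpy as np

A = np.array([[2., 1.],
              [0., 2.]])
vals, vecs = np.linalg.eig(A)
det_vecs = np.linalg.det(vecs)
print("det of the returned eigenvector matrix:", det_vecs)   # zero up to rounding error
# 1e-12 is an illustrative cutoff chosen here, not a numpy default.
print("usable as an invertible P?", abs(det_vecs) > 1e-12)   # False
```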

Common Pitfall — algebraic versus geometric multiplicity. These two numbers are not the same, and the entire chapter lives in the gap between them. The algebraic multiplicity of an eigenvalue $\lambda_0$ is how many times the factor $(\lambda_0 - \lambda)$ divides the characteristic polynomial $\det(A - \lambda I)$: a count of roots. The geometric multiplicity is $\dim N(A - \lambda_0 I)$, the number of independent eigenvectors: a count of dimensions. The theorem you must hold onto is an inequality: for every eigenvalue, $1 \le (\text{geometric}) \le (\text{algebraic})$. Equality for every eigenvalue means diagonalizable; strict inequality for even one eigenvalue means defective. The frequent mistake is to assume a double root automatically supplies two eigenvectors. It might (the $2\times 2$ identity matrix has $\lambda = 1$ with both multiplicities equal to $2$), or it might not (our shear has algebraic $2$, geometric $1$). You must always compute $\dim N(A - \lambda_0 I)$ to know. The multiplicity of the root tells you only the most eigenvectors you could hope for, never how many you actually have.

FAQ: Does "defective" mean the matrix is somehow invalid or pathological?

Not at all, and this is worth saying plainly because the word defective sounds like an insult. A defective matrix is a perfectly good, perfectly common transformation — the shear above is one of the most basic maps in all of geometry. "Defective" is purely a technical term meaning "has fewer eigenvectors than its eigenvalues' multiplicities promise," and therefore "cannot be diagonalized." It does not mean singular (our shear is invertible — its determinant is $4$), nor numerically bad in any moral sense, nor rare. Many matrices that arise from real differential equations at the boundary between two behaviors — the critically-damped oscillator of this chapter's first case study is the canonical example — are exactly defective. The honest reaction to a defective matrix is not dismay but curiosity: what is it doing that diagonalization could not see? That question is what the rest of the chapter answers.

36.2 What is a generalized eigenvector, and how does a Jordan chain repair the missing basis?

We are short one basis vector. Diagonalization wanted two independent eigenvectors and got one. The defining move of this chapter is to relax the demand: if we cannot find a second vector that $A$ leaves exactly in its own direction, let us find the next best thing — a vector that $A$ almost leaves in the eigen-direction, missing only by a copy of the eigenvector we already have.

Here is the precise idea. An eigenvector $\mathbf{v}$ for $\lambda$ satisfies $(A - \lambda I)\mathbf{v} = \mathbf{0}$ — applying $(A - \lambda I)$ annihilates it in one step. A generalized eigenvector is a vector that $(A - \lambda I)$ does not kill in one step, but does kill if you apply it enough times. The simplest and most important case is a vector $\mathbf{w}$ that takes two steps: $(A - \lambda I)\mathbf{w} \neq \mathbf{0}$, but $(A - \lambda I)^2\mathbf{w} = \mathbf{0}$. Equivalently — and this is how we will actually find it — $\mathbf{w}$ is a solution of

$$ (A - \lambda I)\,\mathbf{w} = \mathbf{v}, $$

where $\mathbf{v}$ is a genuine eigenvector. Apply $(A - \lambda I)$ once and you land on the eigenvector $\mathbf{v}$; apply it again and $\mathbf{v}$ (being an eigenvector) goes to $\mathbf{0}$. The pair $\{\mathbf{v}, \mathbf{w}\}$ — the eigenvector and the generalized eigenvector that maps onto it — is called a Jordan chain of length $2$. The eigenvector $\mathbf{v}$ is the bottom of the chain (the part $A$ handles cleanly); the generalized eigenvector $\mathbf{w}$ sits above it (the part $A$ shears).

The Key Insight — A Jordan chain is a relay. Start at the top with the generalized eigenvector $\mathbf{w}$. Apply $A - \lambda I$ and you slide down to the eigenvector $\mathbf{v}$. Apply it once more and you fall off the bottom to $\mathbf{0}$. Writing $A - \lambda I$ as the arrow, the chain reads $\mathbf{w} \to \mathbf{v} \to \mathbf{0}$. This is the structure that replaces the missing eigenvector. $\mathbf{v}$ and $\mathbf{w}$ are guaranteed independent, so together they finish the basis that the eigenvectors alone could not.

Why are $\mathbf{v}$ and $\mathbf{w}$ independent? Suppose $\mathbf{w} = c\,\mathbf{v}$ for some scalar $c$. Then $(A - \lambda I)\mathbf{w} = c\,(A - \lambda I)\mathbf{v} = c\cdot\mathbf{0} = \mathbf{0}$, contradicting the requirement $(A - \lambda I)\mathbf{w} = \mathbf{v} \neq \mathbf{0}$. So $\mathbf{w}$ cannot be a multiple of $\mathbf{v}$, and the two vectors are linearly independent. This little argument is the engine of the whole construction: every step up a Jordan chain produces a vector provably independent of everything below it.
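For the shear itself, the chain can be built and checked numerically. In this minimal sketch, $A - 2I$ is singular, so `np.linalg.lstsq` picks one solution of $(A - 2I)\mathbf{w} = \mathbf{v}$ (the minimum-norm one). Any other solution would serve equally well. The rank test at the end is the independence argument above, done by computer.

```python
import numpy as np

A = np.array([[2., 1.],
              [0., 2.]])
lam = 2.0
N = A - lam * np.eye(2)            # A - lambda I = [[0, 1], [0, 0]]
v = np.array([1., 0.])             # the genuine eigenvector

# N is singular but N w = v is consistent; lstsq returns the minimum-norm solution.
w, *_ = np.linalg.lstsq(N, v, rcond=None)
print("w =", np.round(w, 12))                                  # approximately (0, 1)
print("(A - lam I) w == v:", np.allclose(N @ w, v))            # True
print("(A - lam I)^2 w == 0:", np.allclose(N @ (N @ w), 0))    # True
print("rank [v | w] =", np.linalg.matrix_rank(np.column_stack([v, w])))  # 2: independent
```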

For longer chains the same logic compounds, and it is worth seeing the structure once in full because the largest blocks come from the longest chains. A Jordan chain of length $k$ is a list $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k$ in which $\mathbf{v}_1$ is a genuine eigenvector and each later vector maps onto the previous one under $A - \lambda I$:

$$ (A - \lambda I)\mathbf{v}_1 = \mathbf{0}, \quad (A - \lambda I)\mathbf{v}_2 = \mathbf{v}_1, \quad (A - \lambda I)\mathbf{v}_3 = \mathbf{v}_2, \quad \dots, \quad (A - \lambda I)\mathbf{v}_k = \mathbf{v}_{k-1}. $$

Apply $A - \lambda I$ repeatedly to the top vector $\mathbf{v}_k$ and you walk straight down the chain — $\mathbf{v}_k \to \mathbf{v}_{k-1} \to \cdots \to \mathbf{v}_1 \to \mathbf{0}$ — so $(A - \lambda I)^k \mathbf{v}_k = \mathbf{0}$ but $(A - \lambda I)^{k-1}\mathbf{v}_k = \mathbf{v}_1 \neq \mathbf{0}$. The number of applications it takes to annihilate a generalized eigenvector is its rank in the chain, and the longest chain's length is the smallest power $p$ with $(A - \lambda I)^p = \mathbf{0}$ on the whole generalized eigenspace — a quantity called the index of the eigenvalue. A length-$k$ chain produces $k$ basis vectors that are provably independent (the same multiple-of-a-lower-vector argument, applied at each rung), and they will become a single $k \times k$ Jordan block. So the length of a chain is exactly how many independent directions one eigenvector can anchor above itself, and the index of an eigenvalue is the size of its largest block. Our $2\times 2$ shear has index $2$: $(A - 2I)^2 = 0$ but $A - 2I \neq 0$, which is why its largest (and only) block has size $2$.
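The index can be found without knowing any chains in advance. The rank of $(A - \lambda I)^p$ drops with each power until $p$ reaches the index, then stays constant. Here is a sketch of that test in sympy, which uses exact arithmetic so the ranks are reliable. `eigenvalue_index` is a hypothetical helper name, and it assumes `lam` really is an eigenvalue of `A`.

```python
from sympy import Matrix, eye

def eigenvalue_index(A, lam):
    """Smallest p at which rank((A - lam I)^p) stops decreasing (hypothetical helper)."""
    n = A.shape[0]
    N = A - lam * eye(n)
    p, power = 1, N
    while power.rank() != (power * N).rank():   # rank still dropping: go one power higher
        p, power = p + 1, power * N
    return p

shear = Matrix([[2, 1],
                [0, 2]])
J3 = Matrix([[5, 1, 0],
             [0, 5, 1],
             [0, 0, 5]])
print(eigenvalue_index(shear, 2))   # 2
print(eigenvalue_index(J3, 5))      # 3
```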

This reframes the defect $d_\lambda$ from §36.1 in a useful way. The geometric multiplicity counts the bottoms of chains (each chain starts from one eigenvector), so it equals the number of blocks. The algebraic multiplicity counts all the vectors across every chain, so it equals the total size of the blocks. The defect $d_\lambda = \text{(algebraic)} - \text{(geometric)}$ is therefore the number of generalized eigenvectors above the bottoms — the number of superdiagonal $1$'s that will appear for that eigenvalue. For the shear, one block of size $2$ means one bottom (geometric $1$), total size $2$ (algebraic $2$), and one superdiagonal $1$ (defect $1$). The bookkeeping is exact and it is the same bookkeeping the theorem in §36.5 states formally.
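The bookkeeping can be tested on a matrix whose Jordan structure is hidden. This sketch assumes sympy. It conjugates a Jordan matrix with blocks $J_2(3)$ and $J_1(3)$ by an invertible $Q$ of our own choosing, then reads the multiplicities from `eigenvects()`. Next it asks `jordan_form()` for $J$ and counts the superdiagonal $1$'s. sympy may order the blocks differently from us, but the count does not change.

```python
from sympy import Matrix

# A Jordan matrix with blocks J_2(3) and J_1(3), hidden by an invertible change of basis Q.
J_hidden = Matrix([[3, 1, 0],
                   [0, 3, 0],
                   [0, 0, 3]])
Q = Matrix([[1, 1, 0],
            [0, 1, 1],
            [1, 0, 1]])                 # det(Q) = 2, so Q is invertible
A = Q * J_hidden * Q.inv()

[(lam, alg, basis)] = A.eigenvects()    # a single eigenvalue: (value, algebraic mult., eigenbasis)
geo = len(basis)                        # geometric multiplicity = number of blocks

P, J = A.jordan_form()                  # sympy may order the two blocks differently
ones = sum(1 for i in range(J.rows - 1) if J[i, i + 1] == 1)
print(lam, alg, geo, alg - geo, ones)   # 3 3 2 1 1
```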

Geometric Intuition — Think of the generalized eigenvector as the direction the shear slides along. The eigenvector $\mathbf{v}$ is the line the shear preserves; the generalized eigenvector $\mathbf{w}$ points across that line, into the motion. Applying $A$ to $\mathbf{w}$ does not return a multiple of $\mathbf{w}$ — that is exactly why $\mathbf{w}$ is not an eigenvector — but it returns $\lambda\mathbf{w}$ plus a kick of $\mathbf{v}$: the scaling part ($\lambda \mathbf{w}$) and the shear part (the extra $\mathbf{v}$) cleanly separated. The eigenvector captures the scaling; the generalized eigenvector captures the leftover shear the eigenvector could not.

That last sentence is worth pinning down with algebra, because it is the bridge to the Jordan block. From $(A - \lambda I)\mathbf{w} = \mathbf{v}$ we get, just by moving the $\lambda I$ across,

$$ A\mathbf{w} = \lambda\mathbf{w} + \mathbf{v}. $$

Read it carefully. Acting on the eigenvector, $A\mathbf{v} = \lambda\mathbf{v}$: pure scaling. Acting on the generalized eigenvector, $A\mathbf{w} = \lambda\mathbf{w} + \mathbf{v}$: scaling plus a contribution along the eigenvector. In the basis $\{\mathbf{v}, \mathbf{w}\}$, the first column of the matrix (the image of $\mathbf{v}$) is $(\lambda, 0)$, and the second column (the image of $\mathbf{w}$) is $(1, \lambda)$ — a $\lambda$ from the scaling and a $1$ from the kick of $\mathbf{v}$. That matrix is

$$ \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}, $$

the Jordan block we are about to meet. The $1$ in the corner is not decoration; it is the shear, the leftover, written down.

Check Your Understanding — Suppose $\lambda = 5$ and $\{\mathbf{v}, \mathbf{w}\}$ is a Jordan chain with $A\mathbf{v} = 5\mathbf{v}$ and $(A - 5I)\mathbf{w} = \mathbf{v}$. Compute $A\mathbf{w}$ in terms of $\mathbf{v}$ and $\mathbf{w}$, and confirm $\mathbf{w}$ is not an eigenvector.

Answer From $(A - 5I)\mathbf{w} = \mathbf{v}$, add $5\mathbf{w}$ to both sides: $A\mathbf{w} = 5\mathbf{w} + \mathbf{v}$. For $\mathbf{w}$ to be an eigenvector we would need $A\mathbf{w} = \mu\mathbf{w}$ for some scalar $\mu$, i.e. $5\mathbf{w} + \mathbf{v} = \mu\mathbf{w}$, forcing $\mathbf{v} = (\mu - 5)\mathbf{w}$ — a multiple of $\mathbf{w}$. But $\mathbf{v}$ is independent of $\mathbf{w}$ (proved above), so no such $\mu$ exists. $\mathbf{w}$ is genuinely not an eigenvector; the extra $\mathbf{v}$ is exactly what disqualifies it, and exactly what the Jordan block's off-diagonal $1$ records.
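Here is a concrete instance of this exercise, sketched in sympy. It builds $A = PJP^{-1}$ from $J = J_2(5)$ and an arbitrary invertible $P$ of our choosing. By construction, the columns of $P$ form a Jordan chain, and the code tests both relations.

```python
from sympy import Matrix

P = Matrix([[1, 2],
            [1, 3]])                 # any invertible P will do; det(P) = 1
J = Matrix([[5, 1],
            [0, 5]])                 # the block J_2(5)
A = P * J * P.inv()
v, w = P[:, 0], P[:, 1]              # columns of P form a Jordan chain for lambda = 5

print(A * v == 5 * v)                # True: v is an eigenvector
print(A * w == 5 * w + v)            # True: A w = 5 w + v
print((A * w).row_join(w).rank())    # 2: A w is not a multiple of w, so w is not an eigenvector
```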

FAQ: How is solving (A − λI)w = v different from finding an eigenvector?

An eigenvector solves the homogeneous system $(A - \lambda I)\mathbf{v} = \mathbf{0}$ — you are looking in the null space of $A - \lambda I$. A generalized eigenvector solves the inhomogeneous system $(A - \lambda I)\mathbf{w} = \mathbf{v}$ — the right-hand side is the eigenvector you already found, not zero. Mechanically it is the same Gaussian elimination from Chapter 4, just with a nonzero right-hand side. There is one subtlety that will matter in the worked examples: this inhomogeneous system is not always solvable for an arbitrary eigenvector $\mathbf{v}$. It is solvable precisely when $\mathbf{v}$ lies in the column space of $A - \lambda I$ (recall from Chapter 13 that $(A-\lambda I)\mathbf{w} = \mathbf{v}$ has a solution exactly when $\mathbf{v} \in C(A - \lambda I)$). When an eigenvalue has several independent eigenvectors but a chain of length greater than one, you must start the chain from the right eigenvector — the one in the image of $A - \lambda I$ — and §36.4's 3×3 example shows exactly how that plays out.
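The solvability test is a rank comparison. $(A - \lambda I)\mathbf{w} = \mathbf{v}$ has a solution exactly when appending $\mathbf{v}$ as an extra column does not raise the rank. The sketch below, in sympy, uses a small $3\times 3$ matrix of our own that is already in Jordan form, so its column space is easy to read off. `in_column_space` is a hypothetical helper name.

```python
from sympy import Matrix, eye

def in_column_space(N, b):
    """True when N x = b is solvable, i.e. rank [N | b] == rank N (hypothetical helper)."""
    return N.rank() == N.row_join(b).rank()

# lambda = 2 with two independent eigenvectors (e1, e3) but one chain of length 2.
A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 2]])
N = A - 2 * eye(3)
e1, e3 = Matrix([1, 0, 0]), Matrix([0, 0, 1])

print(N * e1 == Matrix.zeros(3, 1), N * e3 == Matrix.zeros(3, 1))  # True True: both eigenvectors
print(in_column_space(N, e1))        # True: a chain can start from e1
print(in_column_space(N, e3))        # False: (A - 2I) w = e3 has no solution
print(in_column_space(N, e1 + e3))   # False: neither does this mixture
```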

36.3 What does a Jordan block do geometrically?

We have seen the algebra force the matrix $\begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}$ out of a length-2 chain. Now let us understand it on its own terms, because the Jordan block is the atom from which every Jordan form is built, and its geometry is the geometry of the entire chapter.

A Jordan block $J_k(\lambda)$ is a $k \times k$ matrix with the single eigenvalue $\lambda$ repeated all down the main diagonal, $1$'s on the superdiagonal (the diagonal just above the main one), and zeros everywhere else. The blocks of sizes $1, 2, 3$ are

$$ J_1(\lambda) = \begin{bmatrix} \lambda \end{bmatrix}, \qquad J_2(\lambda) = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}, \qquad J_3(\lambda) = \begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix}. $$

A size-$1$ block is just a $1\times 1$ scalar — a genuine eigenvalue with a genuine eigenvector, the diagonalizable case. The interesting structure begins at size $2$. The "almost-diagonal" description is exact: a Jordan block is a diagonal matrix $\lambda I$ plus a matrix $N$ that has $1$'s on the superdiagonal and zeros elsewhere. That $N$ is nilpotent — apply it enough times and it becomes the zero matrix, because each application pushes the $1$'s one diagonal further up and out of the matrix. For the $2\times 2$ case, $N = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ and $N^2 = 0$; for the $3\times 3$ case, $N^3 = 0$. So a Jordan block splits cleanly into a scaling part $\lambda I$ (the diagonal) and a nilpotent part $N$ (the superdiagonal of $1$'s) — and that split, $J_k(\lambda) = \lambda I + N$, is the key to computing its powers and its exponential in §36.7.
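The split $J_k(\lambda) = \lambda I + N$ and the nilpotency of $N$ are easy to confirm numerically. In this numpy sketch, `np.eye(size, k=1)` places $1$'s on the superdiagonal, and `jordan_block` is a hypothetical helper, not a numpy function. For each size, the code checks that $N^{k-1}$ is still nonzero while $N^k$ vanishes.

```python
import numpy as np

def jordan_block(lam, size):
    """J_size(lam): lam on the diagonal, 1's on the superdiagonal (hypothetical helper)."""
    return lam * np.eye(size) + np.eye(size, k=1)

lam = 5.0
for size in (2, 3, 4):
    J = jordan_block(lam, size)
    N = J - lam * np.eye(size)                         # nilpotent part: J = lam*I + N
    before = np.linalg.matrix_power(N, size - 1)       # N^(size-1)
    at = np.linalg.matrix_power(N, size)               # N^size
    print(size, np.any(before != 0), np.all(at == 0))  # prints "<size> True True"
```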

Geometric Intuition — A $2\times 2$ Jordan block is a shear in the eigen-direction. Set $\lambda = 1$ for a moment to isolate the shear: $J_2(1) = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ is the pure horizontal shear that fixes the $x$-axis and slides every other point horizontally in proportion to its height, exactly the §36.1 picture. Now restore a general $\lambda \neq 0$: $J_2(\lambda) = \lambda\!\begin{bmatrix} 1 & 1/\lambda \\ 0 & 1 \end{bmatrix}$ first shears (with slide factor $1/\lambda$), then scales the whole plane by $\lambda$. So a Jordan block does two things at once. It scales along the eigen-direction (the diagonal $\lambda$) and shears the second direction into the first (the off-diagonal $1$). The shear is the "leftover" that no eigenvector could capture: precisely the part of the transformation that prevents diagonalization, made visible as a slide.
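The factorization $J_2(\lambda) = \lambda S$, with the shear $S = \begin{bmatrix} 1 & 1/\lambda \\ 0 & 1 \end{bmatrix}$, can be spot-checked for a few nonzero values of $\lambda$. Here is a minimal numpy sketch; the sample values are arbitrary.

```python
import numpy as np

def scaled_shear_matches(lam):
    """Check J_2(lam) == lam * S, where S is the shear [[1, 1/lam], [0, 1]] (needs lam != 0)."""
    J = np.array([[lam, 1.0],
                  [0.0, lam]])
    S = np.array([[1.0, 1.0 / lam],
                  [0.0, 1.0]])
    return np.allclose(J, lam * S)

for lam in (2.0, -3.0, 0.5):
    print(lam, scaled_shear_matches(lam))    # True for each nonzero lambda
```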

Let us see the shear with the visualizer, the recurring tool from Chapter 1, so this Jordan block joins the gallery of transformations the book has been drawing all along.

# Figure 36.1: a Jordan block is a shear. Using the visualizer from Chapter 1.
from toolkit.visualizer import visualize_2d
import matplotlib.pyplot as plt
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
visualize_2d([[1, 1], [0, 1]], title="J(1): pure shear (lambda = 1)", ax=ax1)
visualize_2d([[2, 1], [0, 2]], title="J(2): shear then scale by 2", ax=ax2)
plt.tight_layout(); plt.show()

Figure 36.1. Left: the Jordan block $J_2(1)$ is a pure shear. The unit square tips into a parallelogram of the same area (determinant $1$), with the bottom edge fixed and the top edge slid right by one unit; the only invariant direction is the horizontal axis. Right: the block $J_2(2) = 2\begin{bmatrix} 1 & 1/2 \\ 0 & 1 \end{bmatrix}$ applies a half-strength shear and then scales everything by $2$. The top edge lands at height $2$, slid one unit right, and the parallelogram has four times the area (determinant $4$). Alt-text: two side-by-side plots of a dashed unit square and its sheared image, the left preserving area, the right enlarging it, both leaning to the right and fixing the horizontal axis.
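For readers without the book's `toolkit.visualizer`, the numbers behind Figure 36.1 can be checked directly. The sketch maps the four corners of the unit square through $J_2(1)$ and $J_2(2)$, and computes the determinant, which is the area scale factor. It uses numpy; `block_image` is a hypothetical helper.

```python
import numpy as np

# Corners of the unit square, one per column: (0,0), (1,0), (1,1), (0,1).
square = np.array([[0., 1., 1., 0.],
                   [0., 0., 1., 1.]])

def block_image(lam):
    """Image of the unit square under J_2(lam), plus the area factor det(J_2(lam))."""
    J = np.array([[lam, 1.0],
                  [0.0, lam]])
    return J @ square, np.linalg.det(J)

for lam in (1.0, 2.0):
    corners, area = block_image(lam)
    print(f"J_2({lam:g}) corners:", corners.T.tolist(), "area factor:", round(area, 12))
```

For $J_2(1)$ the top edge moves from $(0,1)$–$(1,1)$ to $(1,1)$–$(2,1)$. For $J_2(2)$ it lands on $(1,2)$–$(3,2)$: one unit of slide over a height of two.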

The picture explains the name "almost diagonal." A diagonal matrix $\begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix}$ scales the plane uniformly and turns nothing — every direction is invariant. The Jordan block $\begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}$ is one entry away from that, and the entire difference between "scales everything cleanly" and "scales while shearing" is that single off-diagonal $1$. The Jordan form is the matrix's confession that it is one shear away from being diagonalizable, and the size of its blocks measures exactly how much shearing it does.

The nilpotent part deserves one more look, because it carries the geometric meaning of block size. On the standard basis $\mathbf{e}_1, \dots, \mathbf{e}_k$ of a single block, the nilpotent matrix $N$ acts as a one-step backshift. It sends $\mathbf{e}_k \mapsto \mathbf{e}_{k-1} \mapsto \cdots \mapsto \mathbf{e}_1 \mapsto \mathbf{0}$, marching each basis vector one rung toward the bottom of the chain and pushing the last one off the edge into zero. That is exactly the Jordan chain of §36.2, viewed inside the block's own coordinates: the chain is the orbit of the top basis vector under repeated backshifting. After $k$ steps everything has fallen to zero, which is why $N^k = 0$ while $N^{k-1} \neq 0$. So the block size $k$ is the length of that block's "fall to zero," and the largest block size for an eigenvalue is its index. Geometrically, $k - 1$ counts how many shear layers stack above the eigen-direction before the motion exhausts itself. A size-$2$ block has one shear layer; a size-$3$ block has two layers, a shear of a shear; and the diagonal case (size $1$) has none. Read this way, the Jordan form is a stack of these nested shears, one tower per independent eigenvector, grouped by eigenvalue. It is the complete anatomy of how a transformation fails to be a pure scaling.
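The backshift is easy to watch. This numpy sketch takes a single block of size $4$ (an arbitrary choice), starts at the top basis vector $\mathbf{e}_4$, and applies $N$ until the vector becomes zero.

```python
import numpy as np

size = 4
N = np.eye(size, k=1)              # 1's on the superdiagonal: the nilpotent part of J_4
x = np.eye(size)[:, size - 1]      # top basis vector e_4
orbit = [x]
while np.any(x):                   # backshift until the vector falls to zero
    x = N @ x
    orbit.append(x)

for vec in orbit:
    print(vec)                     # e_4, e_3, e_2, e_1, then the zero vector
print("steps to reach zero:", len(orbit) - 1)   # 4, the block size
```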

Real-World Application — In computer graphics and physics simulation, a shear matrix is a building block in its own right (the italic slant of a font, the simple-shear deformation of a material). The Jordan-block viewpoint says something deeper. Whenever a transformation has a repeated mode that it cannot fully decouple (two coupled oscillators tuned to the same frequency, a control system at the boundary of stability, a structure at a buckling threshold), the matrix governing it is defective, and the Jordan block is the mathematics of that coupling. The off-diagonal $1$ is not a flaw in the model; it is the model telling you two modes have merged. §36.7 will show that this merging is exactly what produces the slow $t\,e^{\lambda t}$ creep seen in critically-damped systems, instead of the clean exponential of a diagonalizable one.

FAQ: Why a 1 on the superdiagonal — why not some other number?

The $1$ is a normalization, a choice of how long to make the generalized eigenvector. Recall $A\mathbf{w} = \lambda\mathbf{w} + \mathbf{v}$: the coefficient of $\mathbf{v}$ is $1$ because we defined $\mathbf{w}$ by $(A - \lambda I)\mathbf{w} = \mathbf{v}$, landing exactly on $\mathbf{v}$. We could instead have solved $(A - \lambda I)\mathbf{w}' = c\,\mathbf{v}$ for some scalar $c \neq 0$, and then the block would show $c$ in the corner. The convention is to scale so the off-diagonal entry is $1$, giving the canonical form — the same way we usually normalize eigenvectors to unit length. So the superdiagonal $1$ is not forced by the matrix; it is forced by the standard, agreed-upon way of writing the answer, which is what makes the Jordan form canonical (a unique normal form) rather than merely a block-triangular form. Some textbooks place the $1$'s on the subdiagonal instead (lower-triangular blocks); the content is identical, just transposed.
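Both conventions can be seen on the shear from §36.1, as this sympy sketch shows. Scaling the generalized eigenvector by $c = 3$ (an arbitrary nonzero choice) puts $3$ in the corner. Listing the chain in reverse order, $[\,\mathbf{w} \mid \mathbf{v}\,]$, moves the $1$ below the diagonal.

```python
from sympy import Matrix, eye

A = Matrix([[2, 1],
            [0, 2]])          # the shear of §36.1, lambda = 2
v = Matrix([1, 0])            # eigenvector: (A - 2I) v = 0
w = Matrix([0, 1])            # generalized eigenvector: (A - 2I) w = v
c = 3                         # any nonzero scale works
w_c = c * w                   # solves (A - 2I) w_c = c v instead
print((A - 2 * eye(2)) * w_c == c * v)   # True

P_c = v.row_join(w_c)         # columns [v | w_c]
print(P_c.inv() * A * P_c)    # Matrix([[2, 3], [0, 2]]): c appears in the corner

P_rev = w.row_join(v)         # chain in reverse order: [w | v]
print(P_rev.inv() * A * P_rev)  # Matrix([[2, 0], [1, 2]]): the 1 moves below the diagonal
```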

36.4 How do you actually compute the Jordan form? A full worked example

Enough structure — let us do one completely by hand, find every generalized eigenvector, build $P$, and verify against numpy and sympy that $P^{-1}AP$ is the Jordan form. We will deliberately not use the matrix $\begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$, because that one is already in Jordan form (its $P$ is the identity), so it teaches the procedure nothing. Instead take a defective matrix that hides its structure:

$$ A = \begin{bmatrix} 1 & -1 \\ 1 & 3 \end{bmatrix}. $$

Step 1 — eigenvalues. The characteristic polynomial is $\det(A - \lambda I) = (1-\lambda)(3-\lambda) - (-1)(1) = \lambda^2 - 4\lambda + 4 = (\lambda - 2)^2$. So $\lambda = 2$ with algebraic multiplicity $2$. (A quick sanity check from Chapter 24: trace $= 1 + 3 = 4 = 2 + 2$ and determinant $= (1)(3) - (-1)(1) = 4 = 2\cdot 2$.)
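This sympy sketch computes the same characteristic polynomial exactly. One caution about conventions: `Matrix.charpoly` returns $\det(\lambda I - A)$. That differs from $\det(A - \lambda I)$ by the factor $(-1)^n$, so the two agree for this $2\times 2$ matrix.

```python
from sympy import Matrix, symbols, factor, expand

lam = symbols("lambda")
A = Matrix([[1, -1],
            [1,  3]])
p = A.charpoly(lam).as_expr()      # det(lam*I - A); equals det(A - lam*I) because n = 2
print(p)                           # lambda**2 - 4*lambda + 4
print(factor(p))                   # (lambda - 2)**2
print(expand(p - (lam - 2)**2))    # 0
print(A.trace(), A.det())          # 4 4
```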

Step 2 — eigenvectors, and the diagnosis. Form

$$ A - 2I = \begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix}, $$

which has rank $1$ (the second row is $-1$ times the first). So the eigenspace has dimension $2 - 1 = 1$: geometric multiplicity $1$, less than the algebraic $2$. The matrix is defective. The eigenvectors solve $-v_1 - v_2 = 0$, i.e. $v_2 = -v_1$, so $\mathbf{v} = (1, -1)$ spans the eigenspace. We have our one genuine eigenvector — and we are owed one more direction.
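sympy's `nullspace()` returns a basis of $N(A - 2I)$ directly, as this short sketch shows. Its basis vector may be scaled differently from our $(1,-1)$, so the check below only asks that the two be parallel.

```python
from sympy import Matrix, eye

A = Matrix([[1, -1],
            [1,  3]])
basis = (A - 2 * eye(2)).nullspace()   # basis of the eigenspace N(A - 2I)
print(len(basis))                      # 1: geometric multiplicity 1, algebraic 2
print(basis[0].T)                      # a scalar multiple of (1, -1)
print(basis[0].row_join(Matrix([1, -1])).rank() == 1)   # True: parallel to our v
```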

Step 3 — the generalized eigenvector. Solve the inhomogeneous system $(A - 2I)\mathbf{w} = \mathbf{v}$, that is,

$$ \begin{bmatrix} -1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}. $$

The first row says $-w_1 - w_2 = 1$; the second says $w_1 + w_2 = -1$ — the same equation (consistent, as it must be, since $\mathbf{v}$ lies in the column space of $A - 2I$). There is a free variable: pick $w_1 = 0$, giving $w_2 = -1$, so $\mathbf{w} = (0, -1)$. (Any choice of the free variable gives a valid generalized eigenvector; different choices differ by a multiple of $\mathbf{v}$ and yield the same Jordan form. We pick the simplest.)
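The free variable can also be kept symbolic. This sketch uses sympy's `gauss_jordan_solve`, which returns a general solution together with its free parameters. Every value of the parameter gives a valid generalized eigenvector, and each one differs from the hand-picked $\mathbf{w} = (0,-1)$ by a multiple of $\mathbf{v}$.

```python
from sympy import Matrix, eye

A = Matrix([[1, -1],
            [1,  3]])
N = A - 2 * eye(2)
v = Matrix([1, -1])                      # eigenvector from Step 2
w_hand = Matrix([0, -1])                 # the hand-picked generalized eigenvector

sol, params = N.gauss_jordan_solve(v)    # general solution; params holds the free symbol(s)
print(sol.T)                             # expressed in one free parameter

for t in (0, 1, -5):
    w_t = sol.subs(params[0], t)         # fix the free variable at t
    print(t, w_t.T, N * w_t == v)        # every choice solves (A - 2I) w = v
    shift = w_t - w_hand
    print("  differs from w_hand by a multiple of v:", shift.row_join(v).rank() <= 1)
```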

Step 4 — assemble $P$ and read off $J$. The change-of-basis matrix has the chain in its columns, eigenvector first, then generalized eigenvector:

$$ P = \big[\, \mathbf{v} \mid \mathbf{w} \,\big] = \begin{bmatrix} 1 & 0 \\ -1 & -1 \end{bmatrix}. $$

The order matters: with the eigenvector in column $1$ and the generalized eigenvector in column $2$, the off-diagonal $1$ lands above the diagonal (superdiagonal), giving the standard upper-triangular block. We claim

$$ P^{-1} A P = J = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}. $$

We could invert $P$ by hand (Chapter 9), but the cleaner check uses the defining relations of §36.2, which avoid any inversion: $AP$ should equal $PJ$. Column by column, $AP = [\,A\mathbf{v} \mid A\mathbf{w}\,] = [\,2\mathbf{v} \mid 2\mathbf{w} + \mathbf{v}\,]$ and $PJ = [\,2\mathbf{v} \mid \mathbf{v} + 2\mathbf{w}\,]$, which are identical. The relation $AP = PJ$ holds, and since $P$ is invertible (its determinant is $-1 \neq 0$), $P^{-1}AP = J$. Done by hand.

Now verify with numpy, computing $P^{-1}AP$ directly and checking it matches the $J$ we predicted:

# Worked example: A = [[1,-1],[1,3]] is defective; build P from the Jordan chain.
import numpy as np
A = np.array([[1., -1.],
              [1.,  3.]])
v = np.array([1., -1.])                 # eigenvector for lambda = 2
w = np.array([0., -1.])                 # generalized: (A - 2I) w = v
print("check (A - 2I) v == 0 :", np.allclose((A - 2*np.eye(2)) @ v, 0))   # True
print("check (A - 2I) w == v :", np.allclose((A - 2*np.eye(2)) @ w, v))   # True
P = np.column_stack([v, w])             # columns: [v | w]
J = np.linalg.inv(P) @ A @ P
print("P^{-1} A P =\n", np.round(J, 10))

The printed matrix is

P^{-1} A P =
 [[2. 1.]
 [0. 2.]]

— exactly the Jordan form we built by hand. The off-diagonal $1$ is the shear that the lone eigenvector could not capture, now sitting in its canonical place.

Computational Note — Ask sympy for the Jordan form symbolically and you get the same $J$ but possibly a different $P$. sympy.Matrix([[1,-1],[1,3]]).jordan_form() returns $J = \begin{bmatrix}2&1\\0&2\end{bmatrix}$ with $P = \begin{bmatrix}-1&1\\1&0\end{bmatrix}$: its chain starts from a sign-flipped eigenvector and uses a different (equally valid) choice of the free variable. This is not a disagreement; it is the freedom in the construction. The Jordan form $J$ is unique (up to the order of blocks, §36.5), but the transforming matrix $P$ is not. A whole chain can be rescaled by a nonzero constant, and each generalized eigenvector can be shifted by adding multiples of the chain members below it. When you compare your $P$ to a library's, expect the $J$ to match exactly and the $P$ to differ.

# sympy: same J, a different (equally valid) P.
import sympy as sp
P_sym, J_sym = sp.Matrix([[1, -1], [1, 3]]).jordan_form()
print("sympy J =", J_sym.tolist())     # [[2, 1], [0, 2]]
print("sympy P =", P_sym.tolist())     # [[-1, 1], [1, 0]]  -- differs from ours, both correct
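You can confirm that both transforming matrices are valid without trusting either library: check $AP = PJ$ in exact arithmetic for each one. This short sketch assumes sympy. The sympy $P$ below is the one quoted in the Computational Note, typed in by hand rather than recomputed, so the check does not depend on which $P$ a particular sympy version returns.

```python
import sympy as sp

A = sp.Matrix([[1, -1], [1, 3]])
J = sp.Matrix([[2, 1], [0, 2]])

P_ours = sp.Matrix([[1, 0], [-1, -1]])    # [v | w] from the hand computation
P_sympy = sp.Matrix([[-1, 1], [1, 0]])    # the P quoted in the Computational Note

for name, P in [("ours", P_ours), ("sympy's", P_sympy)]:
    # AP = PJ with P invertible is equivalent to P^{-1} A P = J.
    ok = (A * P == P * J) and P.det() != 0
    print(f"{name} P valid:", ok)

# Relation between the chains: sympy's is ours scaled by -1, with w shifted by v.
v, w = P_ours[:, 0], P_ours[:, 1]
print("sympy column 1 == -v:", P_sympy[:, 0] == -v)
print("sympy column 2 == -w + v:", P_sympy[:, 1] == -w + v)
```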

A 3×3 example: two blocks, and the trap of the wrong eigenvector

The $2\times 2$ case hides a subtlety that only appears with three dimensions, so let us meet it. Take

$$ A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 1 & 2 \end{bmatrix}. $$

Its characteristic polynomial is $(2 - \lambda)^3$, so $\lambda = 2$ has algebraic multiplicity $3$. Compute

$$ A - 2I = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, $$

which has rank $1$, so the eigenspace has dimension $3 - 1 = 2$: geometric multiplicity $2$. The gap between algebraic $3$ and geometric $2$ is $1$, so we are short exactly one basis vector and need exactly one generalized eigenvector. The eigenspace ($N(A-2I)$) is the set of vectors with second component zero — spanned by $\mathbf{e}_1 = (1,0,0)$ and $\mathbf{e}_3 = (0,0,1)$.

Here is the trap. To build a length-2 Jordan chain we must solve $(A - 2I)\mathbf{w} = \mathbf{v}$ for an eigenvector $\mathbf{v}$, but not every eigenvector works: $\mathbf{v}$ must also lie in the column space of $A - 2I$, or the system has no solution. The column space of $A - 2I$ is spanned by its nonzero column, $(1, 0, 1)$. Neither $\mathbf{e}_1$ nor $\mathbf{e}_3$ lies in that span, but their combination $\mathbf{v} = \mathbf{e}_1 + \mathbf{e}_3 = (1, 0, 1)$ does. That is the eigenvector that starts the chain. Solving $(A - 2I)\mathbf{w} = (1,0,1)$: both nonzero rows say $w_2 = 1$, and $w_1, w_3$ are free; pick $\mathbf{w} = (0, 1, 0)$. The chain is $\mathbf{v} = (1,0,1) \to \mathbf{w} = (0,1,0)$.
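The trap can be detected mechanically. The system $(A - 2I)\mathbf{w} = \mathbf{b}$ has a solution exactly when appending $\mathbf{b}$ as an extra column does not raise the rank. This minimal sketch assumes sympy for exact ranks. The helper name `is_consistent` is hypothetical, introduced only for illustration.

```python
import sympy as sp

A = sp.Matrix([[2, 1, 0],
               [0, 2, 0],
               [0, 1, 2]])
N = A - 2 * sp.eye(3)

def is_consistent(M, b):
    """Hypothetical helper: True if M x = b has a solution (rank test)."""
    return M.rank() == M.row_join(b).rank()

e1 = sp.Matrix([1, 0, 0])
e3 = sp.Matrix([0, 0, 1])
v = e1 + e3

for name, b in [("e1", e1), ("e3", e3), ("e1 + e3", v)]:
    print(f"(A-2I) w = {name} solvable:", is_consistent(N, b))
# e1 and e3 fail; only e1 + e3 can start a chain.

# Solve the consistent system; sympy parametrizes the free variables w_1, w_3.
w_general, params = N.gauss_jordan_solve(v)
print("general solution w =", w_general.T)
```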

We now have two of three basis vectors. The third comes from the rest of the eigenspace: pick any eigenvector independent of $\mathbf{v}$, say $\mathbf{u} = (1, 0, 0)$, to fill the size-$1$ block. The Jordan form will therefore have two blocks for $\lambda = 2$: a $2\times 2$ block (from the chain) and a $1\times 1$ block (from $\mathbf{u}$). Assemble $P$ with the chain first, then the lone eigenvector:

$$ P = \big[\, \mathbf{v} \mid \mathbf{w} \mid \mathbf{u} \,\big] = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}, \qquad J = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} = J_2(2) \oplus J_1(2). $$

# 3x3 defective: alg mult 3, geom mult 2 -> blocks J2(2) (+) J1(2).
import numpy as np
A = np.array([[2., 1., 0.],
              [0., 2., 0.],
              [0., 1., 2.]])
N = A - 2*np.eye(3)
v = np.array([1., 0., 1.])              # eigenvector IN the column space of A-2I (starts the chain)
w = np.array([0., 1., 0.])              # generalized: (A-2I) w = v
u = np.array([1., 0., 0.])              # second eigenvector, independent of v (the 1x1 block)
print("(A-2I)v == 0 :", np.allclose(N @ v, 0))    # True
print("(A-2I)w == v :", np.allclose(N @ w, v))    # True
print("(A-2I)u == 0 :", np.allclose(N @ u, 0))    # True
P = np.column_stack([v, w, u])
print("J = P^{-1} A P =\n", np.round(np.linalg.inv(P) @ A @ P, 10))

which prints

J = P^{-1} A P =
 [[2. 1. 0.]
 [0. 2. 0.]
 [0. 0. 2.]]

The Jordan form stacks a $2\times 2$ block on a $1\times 1$ block along the diagonal, with zeros elsewhere. The lesson of the trap: when an eigenvalue has several independent eigenvectors and also needs a chain of length $2$ or more, you cannot start the chain from just any eigenvector. You must find one that also lies in the image of $A - \lambda I$. (Choosing $\mathbf{v} = \mathbf{e}_1$, which is not in the column space, would make $(A-2I)\mathbf{w}=\mathbf{e}_1$ inconsistent, with no solution, and the construction would stall. This is why §36.6's systematic method works down from the highest powers of $A - \lambda I$ rather than guessing eigenvectors.)

Warning

The sizes of the Jordan blocks for an eigenvalue $\lambda$ are not, in general, determined by the algebraic and geometric multiplicities alone. The number of blocks equals the geometric multiplicity ($\dim N(A - \lambda I)$: one block per independent eigenvector), and the total size of those blocks equals the algebraic multiplicity. The individual sizes, however, require more information: the ranks of the higher powers $(A - \lambda I)^2, (A - \lambda I)^3, \dots$. For our $3\times 3$, geometric multiplicity $2$ gives two blocks and algebraic multiplicity $3$ gives total size $3$, which forces sizes $2 + 1$, uniquely determined here. But an eigenvalue with algebraic multiplicity $4$ and geometric multiplicity $2$ could have blocks of sizes $3 + 1$ or $2 + 2$, and only the rank of $(A - \lambda I)^2$ distinguishes them. Never assume the block sizes; compute the chain lengths. (§36.6 gives the exact rank formula.)
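The ambiguous case in this warning can be checked directly. This sketch assumes numpy and scipy. `scipy.linalg.block_diag` assembles two nilpotent test matrices, $J_3(0)\oplus J_1(0)$ and $J_2(0)\oplus J_2(0)$, which stand in for $A - \lambda I$ on a single eigenvalue. Both have algebraic multiplicity $4$ and geometric multiplicity $2$ for the eigenvalue $0$; only the rank of the square tells them apart. The helper `jordan_block` is our own.

```python
import numpy as np
from scipy.linalg import block_diag

def jordan_block(lam, size):
    """Jordan block J_size(lam): lam on the diagonal, 1's on the superdiagonal."""
    return lam * np.eye(size) + np.eye(size, k=1)

# Two nilpotent 4x4 matrices (eigenvalue 0, algebraic multiplicity 4).
N_31 = block_diag(jordan_block(0., 3), jordan_block(0., 1))   # sizes 3 + 1
N_22 = block_diag(jordan_block(0., 2), jordan_block(0., 2))   # sizes 2 + 2

rank = np.linalg.matrix_rank
for name, N in [("3+1", N_31), ("2+2", N_22)]:
    print(f"{name}: geometric multiplicity = {4 - rank(N)}, "
          f"rank(N) = {rank(N)}, rank(N^2) = {rank(N @ N)}")
# 3+1: geometric multiplicity = 2, rank(N) = 2, rank(N^2) = 1
# 2+2: geometric multiplicity = 2, rank(N) = 2, rank(N^2) = 0
```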

36.5 What is the Jordan canonical form, and does every matrix have one?

We can now state the central theorem of the chapter. Assemble all the Jordan chains for all the eigenvalues into the columns of a single matrix $P$, and the conjugated matrix $P^{-1}AP$ comes out block-diagonal, each block a Jordan block. This block-diagonal matrix is the Jordan normal form (or Jordan canonical form) of $A$.

Theorem (Jordan Canonical Form). Let $A$ be an $n \times n$ matrix over the complex numbers $\mathbb{C}$. Then there exists an invertible matrix $P$ such that $$ A = P\,J\,P^{-1}, $$ where $J$ is block-diagonal, $J = \operatorname{diag}\!\big(J_{k_1}(\lambda_1),\, J_{k_2}(\lambda_2),\, \dots,\, J_{k_m}(\lambda_m)\big)$, and each $J_{k_i}(\lambda_i)$ is a Jordan block: the eigenvalue $\lambda_i$ down the diagonal with $1$'s on the superdiagonal. The form $J$ is unique up to the order of the blocks. For each eigenvalue $\lambda$, the number of blocks for $\lambda$ equals its geometric multiplicity, and the sum of their sizes equals its algebraic multiplicity.

The structure of the statement deserves attention because it follows the proof template of §10 in spirit: it buys us a normal form (every matrix reduced to an almost-diagonal canonical representative), and it states its conditions loudly. Read the conditions carefully, because they are the whole point.

Warning — the form exists over the complex numbers. The theorem is stated, and is true, over $\mathbb{C}$. Existence relies on the characteristic polynomial splitting completely into linear factors $\prod (\lambda - \lambda_i)$. The Fundamental Theorem of Algebra guarantees this for every polynomial over $\mathbb{C}$, but it can fail over $\mathbb{R}$. A real matrix with complex eigenvalues, such as a rotation (Chapter 26), has no Jordan form with real entries; its eigenvalues are genuinely complex, and you must work in $\mathbb{C}^n$ to find $J$. This is the same expansion to complex scalars that Chapter 26 made to handle rotations, and it is why the theorem must say "over $\mathbb{C}$." The honest statement is: every square complex matrix has a Jordan form, and a real matrix has a Jordan form with real entries exactly when all its eigenvalues are real. (For the complex-eigenvalue case there is a real "block" variant, usually called the real Jordan form, which mirrors the real canonical form of Chapter 26, but the clean theorem is the complex one.)

The Key Insight — Diagonalization is the special case of the Jordan canonical form in which every block has size $1$. A diagonalizable matrix is one whose Jordan form is actually diagonal — no superdiagonal $1$'s, because no eigenvalue is defective. So we have not replaced Chapter 25; we have completed it. The question "is $A$ diagonalizable?" becomes "are all of $A$'s Jordan blocks $1\times 1$?", and the answer is yes exactly when geometric multiplicity equals algebraic multiplicity for every eigenvalue — the Chapter 25 criterion, recovered.

The uniqueness clause is what makes the form canonical and therefore genuinely useful. Two matrices are similar (Chapter 16 — related by $B = P^{-1}AP$, the same transformation in different coordinates) if and only if they have the same Jordan form up to block order. This makes the Jordan form a complete invariant of similarity: it captures everything about a transformation that does not depend on the choice of coordinates, and discards everything that does. The eigenvalues, their multiplicities, and the sizes of the blocks together form the full fingerprint — and the block sizes are the new information beyond what eigenvalues alone could tell you. Two matrices can share every eigenvalue and every multiplicity yet be genuinely different transformations, distinguished only by their Jordan block structure.
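Both directions of the similarity claim can be spot-checked in exact arithmetic. This sketch assumes sympy, and the conjugating matrix $S$ is an arbitrary illustrative choice. Conjugating the worked-example matrix leaves its Jordan form unchanged. By contrast, $2I$ has the same characteristic polynomial $(\lambda-2)^2$ but a different Jordan form, so it is not similar to that matrix.

```python
import sympy as sp

lam = sp.symbols("lambda")
A = sp.Matrix([[1, -1], [1, 3]])
S = sp.Matrix([[1, 2], [0, 1]])     # an arbitrary invertible matrix (det = 1)
B = S * A * S.inv()                 # the same transformation in other coordinates
C = 2 * sp.eye(2)                   # same characteristic polynomial, but not defective

jordan = {}
for name, M in [("A", A), ("B", B), ("2I", C)]:
    _, J = M.jordan_form()          # jordan_form() returns (P, J)
    jordan[name] = J
    print(name, "| charpoly:", M.charpoly(lam).as_expr(), "| J =", J.tolist())
# A and B share J = [[2, 1], [0, 2]]; 2I has J = [[2, 0], [0, 2]].
```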

FAQ: If the Jordan form exists for every complex matrix, why did Chapter 25 say some matrices can't be diagonalized?

Because "diagonalizable" and "has a Jordan form" are different bars. "Diagonalizable" demands a strictly diagonal $D$, with every block of size $1$. The defective matrices fail that bar; that is the whole content of Chapter 25's caveat. But they clear the lower bar of Jordan form, where blocks of size $2, 3, \dots$ are allowed. The Jordan form answers the question "if I cannot get all the way to diagonal, how close can I get?", and the answer, over $\mathbb{C}$, is "always to block-diagonal, with these specific almost-diagonal blocks." So both statements are true: not every matrix is diagonalizable (Chapter 25), but every complex matrix has a Jordan form (this chapter). The second is the honest generalization of the first.

36.6 Why does every complex matrix have a Jordan form? (A motivated sketch)

A full proof of the Jordan canonical form is a substantial piece of algebra — most courses spend a week on it — so we give the motivated version of §10: why we care, the key idea, the shape of the argument, and what it means. A reader content with using the theorem can skip to §36.7; a math major should read on and then chase the full proof in Axler or Hoffman–Kunze (see Further Reading).

Why we care. Existence is what licenses every application: if some matrices had no normal form, we could not claim to compute powers or exponentials of every defective matrix. The theorem promises that the construction of §36.2–§36.4 never stalls — there are always enough generalized eigenvectors to finish the basis.

The key idea, in one sentence. For a single eigenvalue $\lambda$, the space splits into "the part $(A - \lambda I)$ eventually kills" — the generalized eigenspace — and the chains within it organize that part into Jordan blocks; doing this for every eigenvalue tiles the whole space.

The shape of the argument. First, restrict to one eigenvalue $\lambda$ and study $N = A - \lambda I$. The generalized eigenspace is $K_\lambda = \{\mathbf{x} : N^p\mathbf{x} = \mathbf{0} \text{ for some } p\}$, that is, everything $N$ eventually annihilates. A foundational fact is that $\mathbb{C}^n$ decomposes as a direct sum of the generalized eigenspaces, one per distinct eigenvalue, with $\dim K_\lambda$ equal to the algebraic multiplicity of $\lambda$. (The generalized eigenspaces form a direct sum over any field. What makes the sum fill the whole space is the complete splitting of the characteristic polynomial, which is guaranteed over $\mathbb{C}$.) This is why the algebraic multiplicity, not the geometric one, measures the room available: the generalized eigenspace is large enough to hold a full set of chain vectors even when the ordinary eigenspace is too small. Second, within $K_\lambda$ the operator $N$ is nilpotent ($N^p = 0$ on $K_\lambda$). The structure theorem for a single nilpotent operator says its space breaks into Jordan chains (strings $\mathbf{x}, N\mathbf{x}, N^2\mathbf{x}, \dots$ ending at $\mathbf{0}$) whose lengths and count are pinned down by the ranks of the powers $N, N^2, N^3, \dots$. Each chain becomes one Jordan block. Stacking the blocks from all eigenvalues gives $J$, and the chains' vectors are the columns of $P$.

The rank formula (the practical payoff). The number of Jordan blocks of size $\ge j$ for the eigenvalue $\lambda$ is $\operatorname{rank}(N^{j-1}) - \operatorname{rank}(N^{j})$ (with $N^0 = I$). In particular the number of blocks (size $\ge 1$) is $n - \operatorname{rank}(N) = \dim N(A - \lambda I)$, the geometric multiplicity — exactly as the theorem claims. This formula is the systematic way to find block sizes without guessing, and it is why the §36.4 warning insisted you compute ranks of higher powers when the multiplicities alone leave the sizes ambiguous.
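The rank formula translates directly into code. This minimal sketch assumes numpy and an eigenvalue that is known exactly. The function name `jordan_block_sizes` is hypothetical. `np.linalg.matrix_rank` decides rank with a floating-point tolerance, so treat the result as trustworthy only for small matrices with exact small-integer entries; the fragility described in §36.8 applies.

```python
import numpy as np

def jordan_block_sizes(A, lam):
    """Hypothetical helper: {size: count} of Jordan blocks for eigenvalue lam, using
    (#blocks of size >= j) = rank(N^(j-1)) - rank(N^j), with N = A - lam I."""
    n = A.shape[0]
    N = A - lam * np.eye(n)
    # ranks[j] = rank(N^j) for j = 0 .. n+1, with N^0 = I.
    ranks, power = [], np.eye(n)
    for _ in range(n + 2):
        ranks.append(np.linalg.matrix_rank(power))
        power = power @ N
    # at_least[j-1] = number of blocks of size >= j, for j = 1 .. n+1.
    at_least = [ranks[j - 1] - ranks[j] for j in range(1, n + 2)]
    sizes = {}
    for j in range(1, n + 1):
        exactly = at_least[j - 1] - at_least[j]   # blocks of size exactly j
        if exactly:
            sizes[j] = int(exactly)
    return sizes

A2 = np.array([[1., -1.], [1., 3.]])
A3 = np.array([[2., 1., 0.], [0., 2., 0.], [0., 1., 2.]])
print(jordan_block_sizes(A2, 2.0))   # {2: 1}: one 2x2 block
print(jordan_block_sizes(A3, 2.0))   # {1: 1, 2: 1}: J_2(2) (+) J_1(2)
```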

Math-Major Sidebar — the generalized eigenspace decomposition. The clean modern statement (Axler's approach, avoiding determinants) is: for an operator $T$ on a finite-dimensional complex vector space, $V = \bigoplus_\lambda K_\lambda$ where $K_\lambda = \ker (T - \lambda I)^{\dim V}$ is the generalized eigenspace, and $T$ restricted to $K_\lambda$ equals $\lambda I + N_\lambda$ with $N_\lambda$ nilpotent. The Jordan form is then the matrix of $T$ in a basis adapted to the chains of each $N_\lambda$. Two facts make it canonical. First, the decomposition into generalized eigenspaces is forced: each $K_\lambda$ is determined by $T$ alone, as the largest $T$-invariant subspace on which $T - \lambda I$ is nilpotent. Second, the multiset of chain lengths of a nilpotent operator is fixed by the rank formula above, so the blocks are unique up to ordering. The price of avoiding determinants is that existence must come from elsewhere: one proves directly that every operator on a nonzero finite-dimensional complex vector space has an eigenvalue, which is, ultimately, the Fundamental Theorem of Algebra wearing a linear-algebra hat. Over a field that is not algebraically closed, one gets instead the rational canonical form, which always exists but uses companion blocks rather than Jordan blocks.

FAQ: What is a generalized eigenspace, in one picture?

It is everything the operator $N = A - \lambda I$ eventually sends to zero, even if not in one step. The ordinary eigenspace $N(A - \lambda I)$ is what dies in one application. The generalized eigenspace is what dies after some finite number of applications, so it adds the vectors that $N$ needs two or more steps to annihilate. Geometrically, the eigenspace is the line (or plane) the matrix truly fixes in direction; the generalized eigenspace is that line plus the shear-directions stacked above it. Its dimension is the algebraic multiplicity, which is exactly why there is always enough room to complete the chains: because the generalized eigenspace is sized by the algebraic multiplicity rather than the geometric one, it can hold a full basis even when genuine eigenvectors run short.
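The dimension count can be checked on a matrix with two eigenvalues. This sketch assumes sympy. The test matrix is our own illustrative choice: $J_2(2)\oplus J_1(5)$, conjugated by an arbitrary invertible $S$ so that the block structure is hidden in the entries. For $\lambda = 2$ the eigenspace is $1$-dimensional, while $\ker (A - 2I)^n$ is $2$-dimensional, matching the algebraic multiplicity.

```python
import sympy as sp

J = sp.diag(sp.Matrix([[2, 1], [0, 2]]), 5)        # J_2(2) (+) J_1(5)
S = sp.Matrix([[1, 1, 0], [0, 1, 1], [1, 0, 1]])   # arbitrary invertible matrix (det 2)
A = S * J * S.inv()                                 # hides the block structure
n = A.shape[0]
alg = sp.roots(A.charpoly().as_expr())              # {eigenvalue: algebraic multiplicity}

dims = {}
for lam in [2, 5]:
    N = A - lam * sp.eye(n)
    eig_dim = len(N.nullspace())             # killed in one step: ordinary eigenspace
    gen_dim = len((N ** n).nullspace())      # killed within n steps: generalized eigenspace
    dims[lam] = (eig_dim, gen_dim, alg[sp.Integer(lam)])
    print(f"lambda={lam}: eigenspace dim={eig_dim}, "
          f"generalized eigenspace dim={gen_dim}, algebraic multiplicity={dims[lam][2]}")
```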

36.7 Why does the Jordan form matter? Powers and the matrix exponential of a defective matrix

The payoff that justifies the whole machinery is computational, and it is the bridge to Chapter 37. Recall why diagonalization was so powerful in Chapter 25: if $A = PDP^{-1}$ then $A^k = PD^kP^{-1}$, and $D^k$ is trivial to compute, since you just raise each diagonal entry to the $k$th power. Powers of a diagonalizable matrix are easy. But a defective matrix has no such $D$. The Jordan form rescues exactly this computation: $A = PJP^{-1}$ gives $A^k = PJ^kP^{-1}$, and the only remaining question is how to raise a Jordan block to a power. Thanks to the clean split $J_k(\lambda) = \lambda I + N$ with $N$ nilpotent, that power has a closed form.

For a $2\times 2$ block, write $J = \lambda I + N$ with $N = \begin{bmatrix}0&1\\0&0\end{bmatrix}$ and $N^2 = 0$. Since $\lambda I$ and $N$ commute, the binomial theorem applies and terminates (every $N^2$ and higher vanishes):

$$ J^k = (\lambda I + N)^k = \lambda^k I + k\lambda^{k-1} N = \begin{bmatrix} \lambda^k & k\lambda^{k-1} \\ 0 & \lambda^k \end{bmatrix}. $$

There it is — the signature of a defective eigenvalue. A diagonalizable matrix's powers grow like $\lambda^k$ alone; a defective one grows like $\lambda^k$ plus a $k\lambda^{k-1}$ term — a polynomial-in-$k$ times a geometric factor. That extra $k$ is the nilpotent part making itself felt, and it is the leftover shear, now visible in the growth rate. Let numpy confirm the formula for $\lambda = 2$:

# Powers of the Jordan block J = [[2,1],[0,2]] vs the closed form [[2^k, k 2^{k-1}],[0,2^k]].
import numpy as np
J = np.array([[2., 1.], [0., 2.]])
for k in [2, 3, 4]:
    formula = np.array([[2**k, k * 2**(k-1)], [0, 2**k]], float)
    print(f"k={k}: J^k =", np.linalg.matrix_power(J, k).tolist(),
          "| formula =", formula.tolist())
# k=2: J^k = [[4.0, 4.0], [0.0, 4.0]]  | formula = [[4.0, 4.0], [0.0, 4.0]]
# k=3: J^k = [[8.0, 12.0], [0.0, 8.0]] | formula = [[8.0, 12.0], [0.0, 8.0]]
# k=4: J^k = [[16.0, 32.0], [0.0, 16.0]] | formula = [[16.0, 32.0], [0.0, 16.0]]

The off-diagonal entries $4, 12, 32$ are exactly $k\,2^{k-1}$ for $k = 2,3,4$ — the polynomial-times-geometric growth, matched to the digit.
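The closed form for $J^k$ carries back to the original defective matrix through $A^k = PJ^kP^{-1}$. This sketch assumes numpy. It reuses the $P$ from the worked example (whose eigenvalue is also $\lambda = 2$) and checks the result against repeated multiplication. The helper `jordan_power_2x2` is our own.

```python
import numpy as np

A = np.array([[1., -1.], [1., 3.]])
P = np.array([[1., 0.], [-1., -1.]])     # columns [v | w] from the worked example
P_inv = np.linalg.inv(P)

def jordan_power_2x2(lam, k):
    """Closed form J^k = [[lam^k, k lam^(k-1)], [0, lam^k]] for a 2x2 Jordan block."""
    return np.array([[lam**k, k * lam**(k - 1)],
                     [0.,     lam**k]])

for k in [1, 5, 10]:
    via_jordan = P @ jordan_power_2x2(2., k) @ P_inv
    direct = np.linalg.matrix_power(A, k)
    print(f"k={k}: A^k =", direct.tolist(), "| match:", np.allclose(via_jordan, direct))
# k=5: A^k = [[-48.0, -80.0], [80.0, 112.0]] | match: True
```

Multiplying out $PJ^kP^{-1}$ gives $A^k = \begin{bmatrix} 2^k - k\,2^{k-1} & -k\,2^{k-1} \\ k\,2^{k-1} & 2^k + k\,2^{k-1}\end{bmatrix}$. The $k\,2^{k-1}$ growth of the block therefore shows up in every entry of $A^k$.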

The same idea handles a block of any size, and the pattern is the heart of why the Jordan form is computationally useful rather than merely classificatory. For a $k \times k$ block $J = \lambda I + N$, the nilpotent part satisfies $N^k = 0$ (its index is the block size), so the binomial expansion of $(\lambda I + N)^m$ keeps only the terms up to $N^{k-1}$:

$$ J^m = \sum_{j=0}^{k-1} \binom{m}{j}\lambda^{m-j} N^{j}. $$

Each power $N^j$ is the matrix with $1$'s on the $j$-th superdiagonal, so this places $\binom{m}{j}\lambda^{m-j}$ on the $j$-th superdiagonal of $J^m$ — a tidy, finite formula. The crucial structural fact, visible directly in the binomial coefficients, is that the entries of $J^m$ are polynomials in $m$ of degree up to $k-1$, times the geometric factor $\lambda^{m}$. A size-$2$ block gives a degree-$1$ polynomial (the $m\lambda^{m-1}$ we saw); a size-$3$ block gives a degree-$2$ polynomial (an $\binom{m}{2}\lambda^{m-2}$ term appears); and so on. The bigger the block, the higher the polynomial — and the polynomial degree is one less than the block size. This is the precise sense in which a defective eigenvalue's contribution to $A^m$ is "$\lambda^m$ with a polynomial correction": the correction's degree measures how defective the eigenvalue is.
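The terminating binomial formula is easy to test on a block larger than $2\times 2$. This sketch assumes numpy and Python's `math.comb`. It builds $J^m = \sum_{j=0}^{k-1}\binom{m}{j}\lambda^{m-j}N^j$ for a $3\times 3$ block and compares the result with repeated multiplication. The values $\lambda = 2$ and $m = 6$ are arbitrary, and the helper names are our own.

```python
import numpy as np
from math import comb

def jordan_block(lam, size):
    """J_size(lam) = lam I + N, with N the nilpotent shift (1's on the superdiagonal)."""
    return lam * np.eye(size) + np.eye(size, k=1)

def jordan_block_power(lam, size, m):
    """J^m via the terminating binomial sum over N^0, ..., N^(size-1)."""
    N = np.eye(size, k=1)
    result = np.zeros((size, size))
    for j in range(size):                  # N^size = 0, so higher terms vanish
        result += comb(m, j) * lam**(m - j) * np.linalg.matrix_power(N, j)
    return result

lam, size, m = 2.0, 3, 6
formula = jordan_block_power(lam, size, m)
direct = np.linalg.matrix_power(jordan_block(lam, size), m)
print(formula)   # top row: 2^6 = 64, 6 * 2^5 = 192, C(6,2) * 2^4 = 240
print("matches repeated multiplication:", np.allclose(formula, direct))
```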

The same split powers the matrix exponential, $e^{A} = \sum_{k=0}^\infty A^k/k!$, which Chapter 37 will use to solve $\mathbf{x}' = A\mathbf{x}$. Because $\lambda I$ and $N$ commute, $e^{Jt} = e^{\lambda t I}\,e^{Nt}$, and $e^{Nt}$ is a finite sum (again $N^2 = 0$ for the $2\times 2$ block): $e^{Nt} = I + tN$. So for the $2\times 2$ block,

$$ e^{Jt} = e^{\lambda t}\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}. $$

Geometric Intuition — Compare a diagonalizable mode and a defective mode evolving in time. A diagonalizable eigenvalue $\lambda$ contributes a pure $e^{\lambda t}$ — clean exponential growth or decay. A defective eigenvalue contributes $e^{\lambda t}$ times a polynomial in $t$, here the factor $t$ in $t\,e^{\lambda t}$. Physically, that polynomial is why a critically-damped system (this chapter's case study) does not just decay — it first creeps (the $t$ grows) before the exponential wins and pulls it to zero. The shear in the Jordan block, invisible to the eigenvalue alone, becomes the slow transient you can see in the system's response. Eigenvalues tell you the rates; the Jordan block tells you about the $t\,e^{\lambda t}$ creep the bare eigenvalue would miss.

Confirm the exponential against scipy, which computes $e^{Jt}$ by a general algorithm (not our formula), so agreement is a genuine check:

# Matrix exponential of the Jordan block, vs the closed form e^{lambda t}[[1,t],[0,1]].
import numpy as np
from scipy.linalg import expm
J, t = np.array([[2., 1.], [0., 2.]]), 0.5
formula = np.exp(2*t) * np.array([[1., t], [0., 1.]])
print("expm(Jt) =", np.round(expm(J*t), 8).tolist())
print("formula  =", np.round(formula,   8).tolist())
# expm(Jt) = [[2.71828183, 1.35914091], [0.0, 2.71828183]]
# formula  = [[2.71828183, 1.35914091], [0.0, 2.71828183]]

The entries match to eight digits: the diagonal $e^{2\cdot 0.5} = e \approx 2.71828183$, and the off-diagonal $t\,e^{2t} = 0.5\,e \approx 1.35914091$. This $t\,e^{\lambda t}$ term, which only a defective matrix produces, is precisely what Chapter 37 needs to write the solution of a linear ODE system with a repeated eigenvalue. Everything in this chapter has been building the tool that makes that solution possible. We will see the payoff in full when we connect it to differential equations, where the matrix exponential turns a coupled system into the single formula $\mathbf{x}(t) = e^{At}\mathbf{x}(0)$.
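To see the $t\,e^{\lambda t}$ term reach the original defective matrix, combine the closed form with $e^{At} = Pe^{Jt}P^{-1}$. This sketch assumes numpy and scipy, reuses $A$ and $P$ from the worked example, and takes $t = 0.5$ as an arbitrary time.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1., -1.], [1., 3.]])
P = np.array([[1., 0.], [-1., -1.]])    # columns [v | w] from the worked example
lam, t = 2.0, 0.5

# Closed form for the 2x2 block: e^{Jt} = e^{lam t} [[1, t], [0, 1]].
exp_Jt = np.exp(lam * t) * np.array([[1., t], [0., 1.]])
via_jordan = P @ exp_Jt @ np.linalg.inv(P)

print("P e^{Jt} P^{-1} =", np.round(via_jordan, 8).tolist())
print("expm(A t)       =", np.round(expm(A * t), 8).tolist())
print("match:", np.allclose(via_jordan, expm(A * t)))
```

Multiplying out gives $e^{At} = e^{2t}\begin{bmatrix}1-t & -t\\ t & 1+t\end{bmatrix}$. The $t\,e^{2t}$ contribution from the shear therefore appears in every entry, not just in a corner.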

FAQ: Why not just avoid defective matrices by perturbing them slightly?

You can, and §36.8 shows the perturbation makes the eigenvalues distinct and the matrix diagonalizable — which is tempting. But there are two reasons not to. First, honesty of the model: a critically-damped shock absorber is defective by design (it is tuned to the boundary between oscillating and sluggish), and perturbing it changes the physics into either under- or over-damping — you would be solving a different problem. Second, numerical danger: the perturbed matrix is diagonalizable only in the technical sense; its eigenvector matrix is wildly ill-conditioned (§36.8 shows a condition number near $10^5$ for a perturbation of $10^{-10}$), so computing with it is less accurate, not more. The Jordan structure is real information about the system, and the $t\,e^{\lambda t}$ behavior it predicts is exactly what you would measure in the lab. Perturbing it away discards the truth to dodge an inconvenience.

36.8 Is the Jordan form actually used in practice? An honest note on numerical fragility

Here is the candid disclosure the book owes you, and it points straight to Chapter 38. The Jordan canonical form is a triumph of theory — it classifies every matrix up to similarity, and it is indispensable for proving things and for understanding defective behavior. But it is almost never computed numerically, because it is catastrophically unstable under the rounding errors of floating-point arithmetic. The reason is exactly the gap this chapter celebrates.

Defectiveness is infinitely fragile. A defective matrix sits on a razor's edge: the slightest perturbation splits its repeated eigenvalue into distinct ones and makes the matrix diagonalizable. Watch our shear $\begin{bmatrix}2&1\\0&2\end{bmatrix}$ wobble under a perturbation of size $10^{-10}$ in the corner that "should" be zero:

# Defectiveness is infinitely fragile: a tiny perturbation splits the double eigenvalue.
import numpy as np
A = np.array([[2., 1.], [0., 2.]])
A[1, 0] += 1e-10                        # perturb the zero entry by 10^-10
vals, vecs = np.linalg.eig(A)
print("perturbed eigenvalues:", vals)               # ~ 2.00001 and 1.99999
print("split size:", abs(vals[0] - vals[1]))        # ~ 2e-5, NOT 1e-10!
print("eigenvector-matrix condition number:", np.linalg.cond(vecs))  # ~ 1e5

The eigenvalues split by about $2\times 10^{-5}$ — not $10^{-10}$. A perturbation of size $\epsilon$ moves a defective eigenvalue by roughly $\sqrt{\epsilon}$: shrink the input error by a factor of $100$ and the output error shrinks by only $10$. This square-root sensitivity is the fingerprint of a defective eigenvalue, and it means floating-point computation — which carries roughly $10^{-16}$ relative error in every entry — can never reliably tell whether a matrix is exactly defective or merely close to it. The eigenvector matrix becomes nearly singular (condition number $\approx 10^5$ here, and unbounded as the perturbation shrinks), so the very $P$ a Jordan computation needs is the least trustworthy object in numerical linear algebra.
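The square-root law can be watched across several perturbation sizes. This sketch assumes numpy. For this particular perturbation the exact eigenvalues are $2 \pm \sqrt{\epsilon}$, because $(2-\lambda)^2 = \epsilon$. The printed ratio should therefore stay near $1$, even though the shift itself is many orders of magnitude larger than $\epsilon$.

```python
import numpy as np

A0 = np.array([[2., 1.], [0., 2.]])      # the defective shear block
ratios = []
for eps in [1e-4, 1e-6, 1e-8, 1e-10]:
    A = A0.copy()
    A[1, 0] = eps                          # perturb the zero corner by eps
    shift = np.max(np.abs(np.linalg.eigvals(A) - 2.0))   # how far the eigenvalues moved
    ratios.append(shift / np.sqrt(eps))
    print(f"eps={eps:.0e}: shift={shift:.2e}, shift/sqrt(eps)={ratios[-1]:.6f}")
```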

Warning — the Jordan form is numerically fragile; use it for theory, not computation. Because an arbitrarily small perturbation destroys defectiveness, no general-purpose numerical library computes the Jordan form of a floating-point matrix; numpy does not even offer it. Robust software answers the questions the Jordan form would answer by using stable surrogates. The Schur decomposition ($A = QTQ^{*}$ with $Q$ unitary and $T$ upper-triangular, which always exists and is computed by backward-stable algorithms) replaces it for most purposes, and the singular value decomposition of Chapter 30 replaces eigen-analysis whenever orthogonality is available. Chapter 38 makes this precise with the language of condition number and backward stability; the one-line summary is that the Jordan form's discontinuous dependence on the matrix entries makes it unusable on a finite-precision machine. sympy can compute it exactly (we used it above) precisely because it works with exact rational arithmetic and never rounds. That is also why sympy's Jordan form is for small symbolic matrices, not large numerical ones.
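The stable surrogate is one call away. This sketch assumes scipy: `scipy.linalg.schur` returns a triangular factor $T$ and a unitary $Z$ with $A = ZTZ^{*}$. We pass `output="complex"` so that $T$ is strictly upper triangular; the default real form could instead produce a $2\times 2$ diagonal block if roundoff made the computed eigenvalues a complex pair. Even for the defective worked-example matrix, the factorization reproduces $A$ to roundoff. The diagonal of $T$ (the eigenvalues) need not be exactly $2$, however, because the sensitivity discussed above is a property of the problem, not of the algorithm.

```python
import numpy as np
from scipy.linalg import schur

A = np.array([[1., -1.], [1., 3.]])           # defective: lambda = 2 twice, one eigenvector
T, Z = schur(A, output="complex")             # A = Z T Z^*, Z unitary, T upper triangular

print("T =\n", np.round(T, 8))
print("reconstruction error:", np.linalg.norm(Z @ T @ Z.conj().T - A))   # near machine precision
print("Z unitary:", np.allclose(Z.conj().T @ Z, np.eye(2)))
print("T upper triangular:", np.allclose(np.tril(T, -1), 0))
print("distance of diag(T) from 2:", np.abs(np.diag(T) - 2))
```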

This is not a contradiction with the chapter's main message; it is the mature version of it. The Jordan form is true and important — it is the right way to understand a defective matrix, to compute its powers and exponential symbolically, and to classify transformations up to similarity. It is simply not something you ask a floating-point routine to produce. The same tension runs through all of applied mathematics, and Chapter 38 is devoted to it: exact theory and finite-precision computation are different disciplines, and the linear algebraist must be fluent in both. When you reach the numerical methods that production software actually runs, you will find the Schur and singular-value decompositions doing the Jordan form's job, stably.

Build Your Toolkit — Implement jordan_chain(A, lam, tol=1e-9) in toolkit/jordan.py, from scratch (pure Python, no numpy in the body; reuse your toolkit/linear_systems.py solver from Chapter 4 and your null-space routine). Given a defective matrix A and a repeated eigenvalue lam, it should: (1) find an eigenvector v by solving the homogeneous system $(A - \lambda I)\mathbf{v} = \mathbf{0}$, choosing one in the column space of $A - \lambda I$ when a chain is needed; (2) solve the inhomogeneous system $(A - \lambda I)\mathbf{w} = \mathbf{v}$ for a generalized eigenvector; (3) assemble $P = [\mathbf{v}\mid\mathbf{w}\mid\dots]$ and return both P and J = P^{-1} A P. Verify on this chapter's $\begin{bmatrix}1&-1\\1&3\end{bmatrix}$ and the $3\times 3$ example that your J matches the hand result and sympy.Matrix(A).jordan_form() (the $J$'s agree; expect your $P$ to differ, per the Computational Note). Add a guard that warns when lam's geometric and algebraic multiplicities are equal: in that case lam is not defective and needs no chain. (The matrix as a whole is diagonalizable only when this holds for every eigenvalue.)

FAQ: If we never compute it numerically, why learn the Jordan form at all?

For the same reason we learn the definition of the derivative even though we use the chain rule in practice: it is the conceptual bedrock. The Jordan form is why defective matrices behave as they do — why a critically-damped system creeps, why $A^k$ of a defective matrix carries a polynomial factor, why $e^{At}$ grows like $t\,e^{\lambda t}$, why two matrices with identical eigenvalues can be genuinely different transformations. Every one of those facts is a consequence of the block structure, and you cannot reason about them without it. Numerical analysts also need it to understand the instabilities they design around — you cannot appreciate why the Schur form is the stable substitute without knowing what unstable thing it substitutes for. The Jordan form is to the structure of matrices what the periodic table is to chemistry: rarely the tool you reach for in the lab, always the map of why everything is where it is.

36.9 Summary: the honest completion of the eigenvalue story

We began with a confession from Chapter 25 — not every matrix can be diagonalized — and turned it into a theorem. The defective matrices, those with too few eigenvectors (geometric multiplicity below algebraic, the gap Chapter 24 first measured), are not broken; they are shearing, and the shear is the leftover that no eigenvector can hold. Generalized eigenvectors name that leftover, solving $(A - \lambda I)\mathbf{w} = \mathbf{v}$ to step one rung above an eigenvector. Jordan chains organize the generalized eigenvectors into independent strings, and each chain becomes a Jordan block — an eigenvalue on the diagonal with $1$'s on the superdiagonal, the almost-diagonal record of a matrix that does as much as it can while still refusing to diagonalize. Stacking the blocks gives the Jordan canonical form $A = PJP^{-1}$, which every matrix over $\mathbb{C}$ possesses, unique up to the order of the blocks, with diagonalizable matrices the lucky case where every block has size $1$.

The form matters because it computes what diagonalization cannot: powers $A^k = PJ^kP^{-1}$ and the matrix exponential $e^{At} = Pe^{Jt}P^{-1}$ of a defective matrix, both carrying the tell-tale polynomial-times-exponential terms ($k\lambda^{k-1}$, $t\,e^{\lambda t}$) that a clean diagonal could never produce. That is the door into Chapter 37, where $e^{At}$ solves every linear system of ODEs and the eigenvalues — now with their Jordan structure — decide a physical system's fate. And we closed honestly: the Jordan form is a theoretical triumph and a numerical impossibility, infinitely fragile under rounding, computed in exact arithmetic or not at all, with the stable Schur and singular-value decompositions standing in for it on real machines — a tension Chapter 38 will make precise. The eigenvalue story that began in Chapter 23 with "the vectors a matrix doesn't rotate" is now complete: when there are enough such vectors, you diagonalize; when there are not, you reach for Jordan, and the matrix tells you exactly how much of itself the eigenvectors could not capture.