The Characteristic Polynomial and How to Find Eigenvalues

DataField.Dev



Prerequisites

Chapter 23: Eigenvalues and Eigenvectors
Chapter 11: The Determinant

Learning Objectives

Explain WHY det(A − λI) = 0 finds eigenvalues, by connecting it to singularity and the null space.
Form and expand the characteristic polynomial of a 2×2 and a 3×3 matrix.
Find all eigenvalues as roots of the characteristic polynomial, then find each eigenvector by solving (A − λI)v = 0.
Distinguish algebraic from geometric multiplicity and recognize a defective matrix.
Use trace = sum of eigenvalues and determinant = product of eigenvalues as a fast check, and justify both.
Explain why by-hand root-finding does not scale and what iterative solvers do instead.

In This Chapter

24.1 Why does det(A − λI) = 0 find eigenvalues?
24.2 What is the characteristic polynomial?
24.3 How do you find the eigenvalues and eigenvectors of a 2×2 matrix?
24.4 How do you find the eigenvalues and eigenvectors of a 3×3 matrix?
24.5 What is an eigenspace, and why is it a subspace?
24.6 When does algebraic multiplicity differ from geometric multiplicity?
24.7 How are the two multiplicities related?
24.8 Why is the trace the sum of the eigenvalues and the determinant their product?
24.9 What if a real matrix has no real eigenvalues?
24.10 Why doesn't the characteristic polynomial scale to large matrices?
24.11 What do eigenvalues of a stochastic matrix tell us?
24.12 Putting the method together



Learning paths. Math majors — read everything, especially the proof that the characteristic polynomial detects singularity and the multiplicity inequality in §24.7. CS / Data Science — focus on the Geometric Intuition, the worked examples with their numpy checks, and the trace/determinant shortcuts; the Math-Major Sidebars are optional. Physics / Engineering — focus on the geometry of "which numbers collapse the matrix," the 2×2 and 3×3 recipes you will use constantly, and the vibration application in the case studies. In this chapter we turn the idea of an eigenvalue into a method for finding one.

In Chapter 23 we met eigenvectors as the invariant directions of a transformation — the rare arrows that a matrix only stretches or shrinks, never knocks off their own line — and eigenvalues as the stretch factors. We could see them in the visualizer. But seeing is not solving. Stare at a general $3 \times 3$ matrix and the invariant directions are nowhere obvious. We need a procedure: given $A$, compute its eigenvalues and eigenvectors. This chapter delivers that procedure, and — true to the spirit of the book — it does not arrive as a magic recipe. It falls straight out of an idea you already own: the determinant detects when a matrix collapses space.

The engine is a single polynomial attached to every square matrix, the characteristic polynomial. Finding eigenvalues comes down to finding the roots of this one polynomial. We will build it from scratch, expand it for a $2 \times 2$ and a $3 \times 3$ matrix by hand, read the eigenvalues off as its roots, recover each eigenvector by solving a homogeneous system, and confront the subtle and important gap between how many times a number is a root (algebraic multiplicity) and how many independent eigenvectors it actually has (geometric multiplicity). We will close with two fast sanity checks, the trace and the determinant, and an honest warning about why the hand method you are about to learn is not what your computer actually does.

24.1 Why does det(A − λI) = 0 find eigenvalues?

Start, as always, with the picture. An eigenvector $\mathbf{v}$ of $A$ is a nonzero vector that does not change direction under $A$: applying $A$ leaves it pointing along the same line, merely scaled by some number $\lambda$. That is the eigen-equation from Chapter 23,

$$ A\mathbf{v} = \lambda \mathbf{v}, \qquad \mathbf{v} \neq \mathbf{0}. $$

The whole difficulty is that two unknowns are tangled together: the eigenvalue $\lambda$ (a scalar) and the eigenvector $\mathbf{v}$ (a direction). If we knew $\lambda$ we could solve for $\mathbf{v}$; if we knew $\mathbf{v}$ we could read off $\lambda$. The trick is to untangle them, and the way to do it is to move everything to one side.

Subtract $\lambda \mathbf{v}$ from both sides. Since $\lambda \mathbf{v} = \lambda I \mathbf{v}$ (multiplying by the identity changes nothing), we can factor out $\mathbf{v}$:

$$ A\mathbf{v} - \lambda \mathbf{v} = \mathbf{0} \quad\Longrightarrow\quad A\mathbf{v} - \lambda I \mathbf{v} = \mathbf{0} \quad\Longrightarrow\quad (A - \lambda I)\,\mathbf{v} = \mathbf{0}. $$

Pause on what just happened. We have rewritten "$\mathbf{v}$ is an eigenvector with eigenvalue $\lambda$" as "$\mathbf{v}$ is a nonzero solution of the homogeneous system $(A - \lambda I)\mathbf{v} = \mathbf{0}$." The matrix $A - \lambda I$ is just $A$ with $\lambda$ subtracted from every diagonal entry. Everything now hinges on one question about that matrix: can the equation $(A - \lambda I)\mathbf{v} = \mathbf{0}$ have a solution other than $\mathbf{v} = \mathbf{0}$?

The Key Insight — A number $\lambda$ is an eigenvalue of $A$ exactly when the matrix $A - \lambda I$ is singular. Eigenvalues are not found by guessing directions; they are found by hunting for the values of $\lambda$ that break $A - \lambda I$.

Here is where Chapters 11 and 13 pay off. Recall the chain of equivalences we built for any square matrix $M$:

$$ M\mathbf{v} = \mathbf{0} \text{ has a nonzero solution} \;\Longleftrightarrow\; N(M) \neq \{\mathbf{0}\} \;\Longleftrightarrow\; M \text{ is singular} \;\Longleftrightarrow\; \det(M) = 0. $$

The first link is the definition of the null space $N(M)$ from Chapter 13: a nontrivial null space means a nonzero solution exists. The last link is the headline theorem of Chapter 11: a matrix is singular — non-invertible, rank-deficient, collapsing space flat — precisely when its determinant is zero. Strung together, with $M = A - \lambda I$, they say:

$$ \boxed{\;\lambda \text{ is an eigenvalue of } A \;\Longleftrightarrow\; \det(A - \lambda I) = 0.\;} $$

This is not a definition pulled from a hat. It is the eigen-equation, rewritten as a singularity condition, cashed out through the determinant. We will call $\det(A - \lambda I) = 0$ the characteristic equation of $A$.

Geometric Intuition — Think of $\lambda$ as a dial you can turn. For most settings of the dial, $A - \lambda I$ is a healthy invertible matrix: the only vector it sends to $\mathbf{0}$ is $\mathbf{0}$ itself, so there is no eigenvector. But at a few special settings the matrix $A - \lambda I$ collapses — its image drops to a lower dimension, its determinant passes through zero, and a whole line (or plane) of vectors suddenly gets crushed to the origin. Those crushed directions are the eigenvectors, and the dial settings that crush them are the eigenvalues. The determinant $\det(A - \lambda I)$, plotted against $\lambda$, is the graph of "how much volume $A - \lambda I$ scales by"; its zero-crossings are the eigenvalues.
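The dial picture can be checked numerically. The following is a minimal sketch, assuming the $2 \times 2$ matrix that §24.3 works through by hand; the helper name `char_det` is introduced here only for illustration and is not part of numpy.

```python
import numpy as np

# Assumption: A is the 2x2 matrix worked through by hand in section 24.3.
A = np.array([[4.0, 1.0], [2.0, 3.0]])
I = np.eye(2)

def char_det(lam):
    """Return det(A - lam * I) for a scalar dial setting lam."""
    return np.linalg.det(A - lam * I)

# Turn the dial: most settings leave A - lam*I invertible,
# and the determinant vanishes only at the eigenvalues.
for lam in [0.0, 1.0, 2.0, 3.0, 5.0, 6.0]:
    d = char_det(lam)
    status = "singular" if np.isclose(d, 0.0) else "invertible"
    print(f"lambda = {lam:3.1f}   det(A - lambda I) = {d:6.2f}   {status}")
```

Only the settings $\lambda = 2$ and $\lambda = 5$ should be reported as singular; every other setting in the list leaves a nonzero determinant, hence no eigenvector.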

Why we cannot just set det(A) = 0

A common first reaction is: "If singular matrices are the interesting ones, why not just compute $\det(A)$?" Because $\det(A) = 0$ asks whether zero is an eigenvalue of $A$ — whether $A$ itself crushes some direction. That is a single yes/no question. We want all the eigenvalues, so we must let $\lambda$ vary and ask when $A - \lambda I$ is singular for each possible $\lambda$. Turning $\lambda$ into a variable is the whole move; it converts one determinant into a function of $\lambda$, and that function is a polynomial.

Common Pitfall — Students sometimes write $\det(A - \lambda I) = \det(A) - \det(\lambda I)$. The determinant is not additive: $\det(X + Y) \neq \det(X) + \det(Y)$ in general (we saw this fail back in Chapter 11). You must form the matrix $A - \lambda I$ first — subtract $\lambda$ from each diagonal entry — and only then take the determinant of the result. There is no shortcut around building the matrix.
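A quick numerical counterexample makes the pitfall concrete. This sketch assumes the $2 \times 2$ matrix of §24.3 and an arbitrarily chosen dial setting $\lambda = 1$; the variable names are illustrative only.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])   # assumed example matrix (section 24.3)
lam = 1.0                                 # an arbitrary dial setting
I = np.eye(2)

correct = np.linalg.det(A - lam * I)                # build the matrix first, then take det
wrong = np.linalg.det(A) - np.linalg.det(lam * I)   # the tempting but invalid shortcut

print("det(A - lam I)      =", round(float(correct), 6))
print("det(A) - det(lam I) =", round(float(wrong), 6))
print("equal?", bool(np.isclose(correct, wrong)))
```

By hand, the first quantity is $(3)(2) - (1)(2) = 4$ while the shortcut gives $10 - 1 = 9$, so the two disagree.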

There is a second, equally important way to read the singularity condition, in the language of the four fundamental subspaces from Part III. Saying $A - \lambda I$ is singular says its null space is nontrivial, that is, $\dim N(A - \lambda I) \ge 1$, which by the Rank–Nullity theorem of Chapter 14 says its rank has dropped: $\operatorname{rank}(A - \lambda I) < n$. So an eigenvalue is exactly a value of $\lambda$ that makes $A - \lambda I$ rank-deficient, and the eigenspace is the resulting nonzero null space. We will return to this in §24.5; for now, hold onto the four-way equivalence eigenvalue $\Leftrightarrow$ singular $\Leftrightarrow$ rank-deficient $\Leftrightarrow$ nontrivial null space. Everything in this chapter is one of these four faces of the same fact.
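The rank reading can be watched directly with `np.linalg.matrix_rank`. This is a sketch under the assumption that $A$ is the $2 \times 2$ matrix of §24.3; the helper `rank_and_nullity` is a hypothetical name used only here.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])   # assumed example matrix (section 24.3)
n = A.shape[0]

def rank_and_nullity(lam):
    """Rank of A - lam*I and the dimension of its null space (Rank-Nullity)."""
    M = A - lam * np.eye(n)
    r = int(np.linalg.matrix_rank(M))
    return r, n - r

for lam in [2.0, 3.0, 5.0]:
    r, k = rank_and_nullity(lam)
    label = "eigenvalue" if k > 0 else "not an eigenvalue"
    print(f"lambda = {lam}: rank = {r}, nullity = {k}  ({label})")
```

At $\lambda = 2$ and $\lambda = 5$ the rank drops to $1$ and a one-dimensional null space opens up; at $\lambda = 3$ the matrix keeps full rank and the null space is trivial.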

24.2 What is the characteristic polynomial?

Now let $\lambda$ vary and watch what $\det(A - \lambda I)$ becomes. The matrix $A - \lambda I$ has $\lambda$ sitting on its diagonal, so when you expand its determinant by the cofactor method of Chapter 11, every term is a product of $n$ entries, and each diagonal entry that appears contributes a factor of the form $(a_{ii} - \lambda)$. Exactly one term, the product of all $n$ diagonal entries $(a_{11} - \lambda)\cdots(a_{nn} - \lambda)$, reaches degree $n$; every other term skips at least two diagonal entries and so has degree at most $n - 2$. Multiplying out therefore produces a polynomial in $\lambda$ of degree exactly $n$.

Definition (characteristic polynomial). For an $n \times n$ matrix $A$, the characteristic polynomial is $$ p_A(\lambda) = \det(A - \lambda I). $$ It is a polynomial of degree $n$ in $\lambda$. The characteristic equation is $p_A(\lambda) = 0$, and its roots are exactly the eigenvalues of $A$. The full list of eigenvalues (with repetition) is called the spectrum of $A$.

A few structural facts about $p_A(\lambda)$ that we will lean on throughout the chapter — each one provable directly from the cofactor expansion, and each one verified in our worked examples:

The degree is $n$, so an $n \times n$ matrix has at most $n$ distinct eigenvalues, and exactly $n$ eigenvalues counted with multiplicity over the complex numbers (this is the Fundamental Theorem of Algebra; we lean on it in Chapter 26 when real roots run out).
The leading term is $(-\lambda)^n = (-1)^n \lambda^n$. Some authors instead define the characteristic polynomial as $\det(\lambda I - A) = (-1)^n \det(A - \lambda I)$, which makes the leading coefficient $+1$. The roots are identical; the two conventions differ only by the factor $(-1)^n$, so they coincide when $n$ is even and differ by an overall sign when $n$ is odd. We use $\det(A - \lambda I)$ throughout, matching the singularity story above.
The constant term is $p_A(0) = \det(A - 0\cdot I) = \det(A)$. So the determinant of $A$ is just the characteristic polynomial evaluated at zero — a fact that will hand us the "$\det = $ product of eigenvalues" identity in §24.8.
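These structural facts can be spot-checked with `np.poly`, which, given a square array, returns the coefficients of the monic convention $\det(\lambda I - A)$. The sketch below converts to this chapter's convention by multiplying by $(-1)^n$; the helper name `char_poly_coeffs` is hypothetical, and the two test matrices are assumed to be the $2 \times 2$ and $3 \times 3$ examples worked in §24.3 and §24.4.

```python
import numpy as np

# Assumption: these are the 2x2 and 3x3 matrices worked by hand in this chapter.
matrices = {
    "2x2": np.array([[4.0, 1.0], [2.0, 3.0]]),
    "3x3": np.array([[4.0, 0.0, 1.0], [-2.0, 1.0, 0.0], [-2.0, 0.0, 1.0]]),
}

def char_poly_coeffs(A):
    """Coefficients of det(A - lambda I), highest power first."""
    n = A.shape[0]
    monic = np.poly(A)          # np.poly gives the monic polynomial det(lambda I - A)
    return (-1) ** n * monic    # the two conventions differ by the factor (-1)^n

for name, A in matrices.items():
    c = char_poly_coeffs(A)
    n = A.shape[0]
    print(name, "degree:", len(c) - 1,
          "| leading coeff:", round(float(c[0])),
          "| constant term:", round(float(c[-1]), 6),
          "| det(A):", round(float(np.linalg.det(A)), 6))
```

For each matrix the degree should equal $n$, the leading coefficient should be $(-1)^n$, and the constant term should agree with $\det(A)$ up to rounding error.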

Historical Note: Augustin-Louis Cauchy studied the polynomial $\det(A - \lambda I)$ in the 1820s–1840s in connection with the principal axes of quadric surfaces and the secular (slow, long-term) perturbations of planetary orbits, which is why physicists still sometimes call it the secular equation. The word "eigenvalue" itself is a half-translation of David Hilbert's German Eigenwert ("characteristic/own value"), which entered English in the early twentieth century. The name "characteristic" predates the German term and reflects that this polynomial captures the matrix's coordinate-free character.

What does "the characteristic polynomial of a matrix" buy us?

It converts a geometry problem into an algebra problem we already know how to attack. Finding invariant directions sounds hopeless for a big matrix; finding the roots of a polynomial is a problem with a long, well-developed theory. The characteristic polynomial is the bridge from "what does this matrix do?" to "solve this equation." That is the entire content of how to find eigenvalues by hand: build $p_A(\lambda)$, find its roots.

Math-Major Sidebar (Cayley–Hamilton). The characteristic polynomial has a startling property worth previewing: a matrix satisfies its own characteristic equation. If $p_A(\lambda) = \lambda^2 - 7\lambda + 10$, as for the matrix $A = \begin{bmatrix} 4 & 1 \\ 2 & 3\end{bmatrix}$ of our first worked example (§24.3), then substituting the matrix $A$ for the scalar $\lambda$ (and reading the constant term as a multiple of $I$) gives the zero matrix: $$ p_A(A) = A^2 - 7A + 10I = 0. $$ You can check this directly: $A^2 = \begin{bmatrix} 18 & 7 \\ 14 & 11\end{bmatrix}$, and $\begin{bmatrix} 18 & 7 \\ 14 & 11\end{bmatrix} - 7\begin{bmatrix} 4 & 1 \\ 2 & 3\end{bmatrix} + 10\begin{bmatrix} 1 & 0 \\ 0 & 1\end{bmatrix}$ is the zero matrix. This is the Cayley–Hamilton theorem, true for every square matrix. (A tempting "proof", plugging $A$ into $p_A(\lambda) = \det(A - \lambda I)$ to get $\det(A - A) = \det(0) = 0$, is wrong, because it conflates the scalar $\lambda$ with the matrix $A$ inside the determinant; the real proof is more careful. Arthur Cayley stated it for $2\times2$ and $3\times3$ in 1858; the general case came later.) One immediate payoff: $A^2 = 7A - 10I$ lets you write every power of $A$ as a combination of $A$ and $I$, an idea that powers the matrix-function machinery of Chapter 37.
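The sidebar's arithmetic is easy to confirm in numpy. The sketch below, which assumes the same $2 \times 2$ matrix, checks that $p_A(A)$ is the zero matrix and then uses $A^2 = 7A - 10I$ once more to rewrite $A^3$ as $39A - 70I$ (since $A^3 = 7A^2 - 10A = 7(7A - 10I) - 10A$).

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])   # assumed example matrix (section 24.3)
I = np.eye(2)

# Substitute the matrix A for the scalar lambda in lambda^2 - 7*lambda + 10.
residual = A @ A - 7 * A + 10 * I
print("p_A(A) =\n", residual)

# Payoff: every power of A collapses to a combination of A and I.
A_cubed = np.linalg.matrix_power(A, 3)
print("A^3 equals 39A - 70I?", bool(np.allclose(A_cubed, 39 * A - 70 * I)))
```

Note that the check multiplies matrices with `@`; an elementwise `A * A` would compute something different and the identity would fail.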

Check Your Understanding — Without computing anything, how many eigenvalues (counted with multiplicity, allowing complex ones) does a $5 \times 5$ matrix have? And what is the constant term of its characteristic polynomial?

Answer
Exactly 5 eigenvalues counted with multiplicity over $\mathbb{C}$, because $p_A(\lambda)$ has degree 5 and the Fundamental Theorem of Algebra gives a degree-5 polynomial exactly 5 complex roots with multiplicity. The constant term is $p_A(0) = \det(A)$, the determinant of the matrix. (Some of the 5 roots may coincide, and some may be complex even if $A$ is real — see Chapter 26.)

Seeing the characteristic polynomial: the determinant as a dial

The "dial" metaphor of §24.1 is not just a figure of speech — we can plot it. Using the visualizer's parent idea from Chapter 1 (compute $\det$ of a $2 \times 2$ for many values of a parameter), let us turn the dial $\lambda$ across a range and graph $\det(A - \lambda I)$ for our first matrix $A = \begin{bmatrix} 4 & 1 \\ 2 & 3\end{bmatrix}$. The places where the curve crosses zero are the eigenvalues — the dial settings that make the matrix singular.

# Plot the characteristic polynomial of A; its zero-crossings are the eigenvalues.
import numpy as np
import matplotlib.pyplot as plt
A = np.array([[4, 1], [2, 3]], dtype=float)
lams = np.linspace(0, 7, 400)
p = [np.linalg.det(A - L * np.eye(2)) for L in lams]   # det(A - lambda I) vs lambda
plt.plot(lams, p, "C1-", lw=2)
plt.axhline(0, color="gray", lw=0.8)
plt.scatter([2, 5], [0, 0], color="C3", zorder=3)       # the eigenvalues, where p = 0
plt.xlabel(r"$\lambda$"); plt.ylabel(r"$\det(A-\lambda I)$"); plt.grid(alpha=0.3)
plt.title("Characteristic polynomial: zeros at the eigenvalues 2 and 5")
plt.show()

Figure 24.1. The graph of $p_A(\lambda) = \det(A - \lambda I) = \lambda^2 - 7\lambda + 10$ is an upward parabola that dips below the axis between its two roots. Alt-text: a parabola crossing the horizontal axis at $\lambda = 2$ and $\lambda = 5$, with those two crossing points marked in red as the eigenvalues; between them the curve is negative, meaning $A - \lambda I$ has flipped orientation there. The picture makes the abstract claim tangible: for $\lambda$ between $2$ and $5$ the determinant is negative (the matrix $A - \lambda I$ reverses orientation), outside that interval it is positive, and exactly at $2$ and $5$ it is zero — the two moments the transformation collapses a direction to nothing. Finding eigenvalues is finding where this curve touches the axis.

Geometric Intuition — Read the parabola as a story about volume. The value $\det(A - \lambda I)$ is the signed area-scaling factor of the matrix $A - \lambda I$ (Chapter 11). As $\lambda$ climbs from $0$, that scaling factor shrinks, hits zero at $\lambda = 2$ (first collapse — the matrix flattens the plane onto the line $E_2$), goes negative (orientation flipped), returns to zero at $\lambda = 5$ (second collapse, onto $E_5$), then grows positive again. The eigenvalues are precisely the two instants of collapse. For a $3 \times 3$ the curve is a cubic with up to three crossings; for an $n \times n$, a degree-$n$ curve with up to $n$ real crossings — and fewer when some roots are complex (Chapter 26), which is the graph failing to reach the axis.

24.3 How do you find the eigenvalues and eigenvectors of a 2×2 matrix?

Let us make all of this concrete with the smallest interesting case. Take

$$ A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}. $$

Before any algebra, ask the geometric question: this matrix stretches the plane, and we are looking for the two lines it stretches without rotating. The procedure has three steps, and you will repeat it for every matrix you ever diagonalize: (1) form the characteristic polynomial, (2) find its roots — the eigenvalues, (3) for each eigenvalue, solve $(A - \lambda I)\mathbf{v} = \mathbf{0}$ for the eigenvector.

Step 1 — Form the characteristic polynomial

Subtract $\lambda$ from each diagonal entry to build $A - \lambda I$, then take its determinant. For a $2 \times 2$ matrix the determinant is the familiar "main diagonal minus off-diagonal" from Chapter 11:

$$ A - \lambda I = \begin{bmatrix} 4 - \lambda & 1 \\ 2 & 3 - \lambda \end{bmatrix}, \qquad p_A(\lambda) = \det(A - \lambda I) = (4 - \lambda)(3 - \lambda) - (1)(2). $$

Multiply it out carefully:

$$ (4 - \lambda)(3 - \lambda) - 2 = (12 - 4\lambda - 3\lambda + \lambda^2) - 2 = \lambda^2 - 7\lambda + 10. $$

So $p_A(\lambda) = \lambda^2 - 7\lambda + 10$. Notice the coefficients are not random: the $\lambda^1$ coefficient is $-7 = -(4+3) = -\operatorname{tr}(A)$, and the constant term is $10 = (4)(3) - (1)(2) = \det(A)$. That is no accident — it is the general $2 \times 2$ pattern.

The Key Insight — For any $2 \times 2$ matrix, the characteristic polynomial is always $$ p_A(\lambda) = \lambda^2 - \operatorname{tr}(A)\,\lambda + \det(A). $$ The trace and the determinant — two numbers you can read off in seconds — completely determine the eigenvalues of a $2 \times 2$. Memorize this one; you will use it constantly.

Step 2 — Find the roots (the eigenvalues)

Set the polynomial to zero and factor (or use the quadratic formula):

$$ \lambda^2 - 7\lambda + 10 = 0 \quad\Longrightarrow\quad (\lambda - 5)(\lambda - 2) = 0 \quad\Longrightarrow\quad \lambda_1 = 5, \;\; \lambda_2 = 2. $$

The two eigenvalues are $5$ and $2$. Both are real and distinct, which (as we will confirm) means $A$ has two independent eigen-directions and behaves as nicely as a $2 \times 2$ can. As a lightning check: $5 + 2 = 7 = \operatorname{tr}(A)$ and $5 \times 2 = 10 = \det(A)$. Both identities hold — we will prove in §24.8 that they always must.
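Steps 1 and 2 for a $2 \times 2$ fit in a few lines of code. This is a minimal sketch of the trace and determinant shortcut, not a replacement for a library solver; the function name `eigenvalues_2x2` is hypothetical, and `cmath.sqrt` is used so that a negative discriminant yields a complex pair instead of an error.

```python
import cmath
import numpy as np

def eigenvalues_2x2(A):
    """Roots of lambda^2 - tr(A)*lambda + det(A) via the quadratic formula."""
    t = float(A[0, 0] + A[1, 1])                         # trace
    d = float(A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0])     # determinant
    disc = cmath.sqrt(t * t - 4 * d)                     # complex sqrt handles disc < 0
    return (t + disc) / 2, (t - disc) / 2

A = np.array([[4.0, 1.0], [2.0, 3.0]])
lam1, lam2 = eigenvalues_2x2(A)
print("eigenvalues:", lam1, lam2)
print("sum  matches trace:", cmath.isclose(lam1 + lam2, np.trace(A)))
print("prod matches det  :", cmath.isclose(lam1 * lam2, np.linalg.det(A)))
```

For the worked matrix the discriminant is $7^2 - 4 \cdot 10 = 9$, so the two roots are $(7 \pm 3)/2$, agreeing with the factoring above.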

Geometric Intuition — These two numbers already tell you the whole qualitative story of what $A$ does, before we find a single eigenvector. There are two special lines through the origin; along one, $A$ stretches by a factor of $5$, along the other by a factor of $2$. Both factors are positive, so neither line is flipped — the transformation is a pure (if unequal) stretch in two oblique directions. The product of the stretch factors, $5 \times 2 = 10$, is the determinant: $A$ multiplies areas by $10$, exactly as the visualizer of Chapter 11 would show by drawing the unit square ballooning to ten times its size. Eigenvalues are the per-direction stretch factors; their product is the total volume change.

Step 3 — Find the eigenvectors

Each eigenvalue makes $A - \lambda I$ singular; the eigenvectors for that $\lambda$ are exactly the nonzero vectors in its null space $N(A - \lambda I)$, computed by the elimination of Chapter 4.

Eigenvalue $\lambda_1 = 5$. Substitute $\lambda = 5$ into $A - \lambda I$:

$$ A - 5I = \begin{bmatrix} 4 - 5 & 1 \\ 2 & 3 - 5 \end{bmatrix} = \begin{bmatrix} -1 & 1 \\ 2 & -2 \end{bmatrix}. $$

This matrix is singular (its determinant is $(-1)(-2) - (1)(2) = 0$, exactly as designed). The two rows are dependent — the second is $-2$ times the first — so we keep just one equation. Reading off the top row, $(A - 5I)\mathbf{v} = \mathbf{0}$ means

$$ -v_1 + v_2 = 0 \quad\Longrightarrow\quad v_2 = v_1. $$

Any vector with equal components works; setting $v_1 = 1$ gives the eigenvector

$$ \mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad (\text{for } \lambda_1 = 5). $$

Sanity check the eigen-equation directly: $A\mathbf{v}_1 = \begin{bmatrix} 4+1 \\ 2+3 \end{bmatrix} = \begin{bmatrix} 5 \\ 5 \end{bmatrix} = 5\mathbf{v}_1$. The matrix stretched $(1,1)$ by exactly $5$ without turning it.

Eigenvalue $\lambda_2 = 2$. Substitute $\lambda = 2$:

$$ A - 2I = \begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix}. $$

Again singular, again dependent rows (here they are identical). The top row gives $2v_1 + v_2 = 0$, so $v_2 = -2v_1$. Setting $v_1 = 1$:

$$ \mathbf{v}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix} \quad (\text{for } \lambda_2 = 2). $$

Check: $A\mathbf{v}_2 = \begin{bmatrix} 4 - 2 \\ 2 - 6 \end{bmatrix} = \begin{bmatrix} 2 \\ -4 \end{bmatrix} = 2\mathbf{v}_2$. Stretched by $2$, no turning.

Common Pitfall — An eigenvector is never unique — only its direction is. Every nonzero scalar multiple of $\mathbf{v}_1$, such as $(2,2)$ or $(-1,-1)$, is also an eigenvector for $\lambda_1 = 5$, because if $A\mathbf{v} = \lambda \mathbf{v}$ then $A(c\mathbf{v}) = \lambda(c\mathbf{v})$. So "the eigenvector" is shorthand for "any nonzero vector on the eigen-line." What we are really computing is the eigenspace $E_\lambda = N(A - \lambda I)$, the whole line (or subspace) of solutions. Reporting a single representative is fine, but never claim it is the only one — and never report $\mathbf{0}$, which lives in every null space but is barred from being an eigenvector by definition.

numpy verification

Let us confirm the hand result. A reminder that bites here for the first time in this chapter: mathematics numbers the eigenvalues $\lambda_1, \lambda_2$; numpy returns them in an array indexed from $0$, and in no guaranteed order.

# Eigenvalues and eigenvectors of A = [[4, 1], [2, 3]] -- verify the hand result.
import numpy as np
A = np.array([[4, 1], [2, 3]], dtype=float)
vals, vecs = np.linalg.eig(A)        # vals: 1-D array; vecs: columns are eigenvectors
print("eigenvalues :", vals)          # -> [5. 2.]
print("eigenvector columns:\n", vecs) # each column matches one eigenvalue, same index
print("trace, det  :", np.trace(A), round(np.linalg.det(A)))   # -> 7.0  10

Output:

eigenvalues : [5. 2.]
eigenvector columns:
 [[ 0.70710678 -0.4472136 ]
 [ 0.70710678  0.89442719]]
trace, det  : 7.0 10

The eigenvalues $5$ and $2$ match. The eigenvectors look different from our $(1,1)$ and $(1,-2)$ only because numpy normalizes every eigenvector to unit length: the first column $(0.7071, 0.7071)$ is exactly $(1,1)/\sqrt{2}$, and the second column $(-0.4472, 0.8944)$ is $(1,-2)/\sqrt{5}$ scaled by $-1$ (recall direction is all that matters). Same eigen-lines, different representatives.

Computational Note — np.linalg.eig returns eigenvalues in no guaranteed order and scales each eigenvector to norm $1$ (and, for a complex pair, with an arbitrary phase). Two consequences for verifying hand work: (1) don't expect vals[0] to be "your" $\lambda_1$ — sort if you need a canonical order, e.g. np.sort(vals); (2) compare eigenvectors by direction, not entry-by-entry. A robust check is np.allclose(A @ v, lam * v) for each pair, which is true regardless of scaling. For a symmetric matrix, reach for np.linalg.eigh, which is faster and returns real eigenvalues in ascending order — we use it in the case studies.
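The two recommendations in the note can be combined into one check. The sketch below tests each numpy pair against the eigen-equation and then confirms that each returned column lies on the same line as the hand-computed vector; the helper `parallel_2d` and the dictionary `hand` are illustrative names, not numpy features.

```python
import numpy as np

A = np.array([[4.0, 1.0], [2.0, 3.0]])
vals, vecs = np.linalg.eig(A)

# Hand-computed eigenvectors from this section, keyed by their eigenvalue.
hand = {5: np.array([1.0, 1.0]), 2: np.array([1.0, -2.0])}

def parallel_2d(u, v, tol=1e-9):
    """True when two plane vectors lie on one line through the origin."""
    return bool(abs(u[0] * v[1] - u[1] * v[0]) < tol)

checks = []
for lam, v in zip(vals, vecs.T):           # column i of vecs pairs with vals[i]
    key = int(round(float(lam.real)))      # look up the hand vector by eigenvalue, not by index
    satisfies = bool(np.allclose(A @ v, lam * v))
    same_line = parallel_2d(hand[key], v)
    checks.append((key, satisfies, same_line))
    print(f"lambda = {key}: A v = lambda v? {satisfies}; same line as hand vector? {same_line}")
```

Looking up the hand vector by eigenvalue instead of by position is what makes the comparison independent of the order in which numpy happens to return the pairs.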

24.4 How do you find the eigenvalues and eigenvectors of a 3×3 matrix?

The $2 \times 2$ case is small enough that the polynomial almost writes itself. The $3 \times 3$ case is where the real method earns its keep, because there is no "look at it and guess" available. The three steps are identical; only the determinant in Step 1 grows. Take

$$ A = \begin{bmatrix} 4 & 0 & 1 \\ -2 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}. $$

Step 1 — Form and expand the characteristic polynomial

Subtract $\lambda$ from the diagonal:

$$ A - \lambda I = \begin{bmatrix} 4 - \lambda & 0 & 1 \\ -2 & 1 - \lambda & 0 \\ -2 & 0 & 1 - \lambda \end{bmatrix}. $$

Now expand the determinant by cofactors (Chapter 11). The middle column has two zeros, so expanding along the second column is by far the least work — a reminder from Chapter 11 that you should always expand along the row or column with the most zeros. Only the $(2,2)$ entry $1 - \lambda$ survives, and its cofactor sign is $(-1)^{2+2} = +1$:

$$ \det(A - \lambda I) = (1 - \lambda)\,\det\!\begin{bmatrix} 4 - \lambda & 1 \\ -2 & 1 - \lambda \end{bmatrix}. $$

The inner $2 \times 2$ determinant is

$$ (4 - \lambda)(1 - \lambda) - (1)(-2) = (4 - 5\lambda + \lambda^2) + 2 = \lambda^2 - 5\lambda + 6. $$

So

$$ p_A(\lambda) = (1 - \lambda)(\lambda^2 - 5\lambda + 6). $$

We could stop here and read roots off the factored form, but let us also multiply it out so you can see the degree-3 polynomial in full and check the coefficient pattern:

$$ (1 - \lambda)(\lambda^2 - 5\lambda + 6) = -\lambda^3 + 6\lambda^2 - 11\lambda + 6. $$

Two coefficients you can predict and verify: the constant term is $6 = \det(A)$ (the determinant — expand $A$ itself and you get $6$), and the $\lambda^2$ coefficient is $+6 = \operatorname{tr}(A) = 4 + 1 + 1$ (the trace appears with a $+$ sign here because the leading term is $-\lambda^3$). We will explain both patterns in §24.8.

Step 2 — Find the roots (the eigenvalues)

The factored form makes the roots immediate. From $(1 - \lambda) = 0$ we get $\lambda = 1$, and from $\lambda^2 - 5\lambda + 6 = (\lambda - 2)(\lambda - 3) = 0$ we get $\lambda = 2$ and $\lambda = 3$:

$$ \lambda_1 = 1, \qquad \lambda_2 = 2, \qquad \lambda_3 = 3. $$

Three distinct real eigenvalues. Quick check: $1 + 2 + 3 = 6 = \operatorname{tr}(A)$ and $1 \cdot 2 \cdot 3 = 6 = \det(A)$. Both hold.
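The roots can also be recovered numerically from the expanded coefficients with `np.roots`. This is only a cross-check of the hand factoring, sketched under the assumption that the coefficients are entered highest power first.

```python
import numpy as np

# Coefficients of p_A(lambda) = -lambda^3 + 6*lambda^2 - 11*lambda + 6, highest power first.
coeffs = [-1.0, 6.0, -11.0, 6.0]

roots = np.sort(np.roots(coeffs).real)   # the roots are real here, so .real just tidies the dtype
print("roots      :", np.round(roots, 6))
print("sum  (tr)  :", round(float(roots.sum()), 6))
print("prod (det) :", round(float(roots.prod()), 6))
```

It is worth knowing that `np.roots` itself works by computing the eigenvalues of a companion matrix, so in practice the computer turns polynomial root-finding into an eigenvalue problem and not the other way around.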

The Key Insight: Factoring beats expanding. When a cofactor expansion hands you a product of factors, as the zero-rich middle column did here, resist multiplying it out. The factored form $(1-\lambda)(\lambda^2 - 5\lambda + 6)$ displays one root for free and reduces the rest to a quadratic. For a general $3 \times 3$ with no convenient zeros you will get a genuine cubic and must hunt for a rational root (test divisors of the constant term) before factoring, which is exactly where the hand method starts to hurt (§24.10).

Step 3 — Find the three eigenvectors

For each $\lambda$, solve $(A - \lambda I)\mathbf{v} = \mathbf{0}$ by row reduction.

Eigenvalue $\lambda_1 = 1$. Form $A - I$:

$$ A - I = \begin{bmatrix} 3 & 0 & 1 \\ -2 & 0 & 0 \\ -2 & 0 & 0 \end{bmatrix}. $$

The second and third rows both say $-2v_1 = 0$, so $v_1 = 0$. The first row then says $3v_1 + v_3 = 0 \Rightarrow v_3 = 0$. Nothing constrains $v_2$, so it is the free variable. Setting $v_2 = 1$:

$$ \mathbf{v}_1 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \quad (\lambda_1 = 1). $$

Check: $A\mathbf{v}_1 = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = 1 \cdot \mathbf{v}_1$. The standard basis vector $\mathbf{e}_2$ is fixed — unsurprising, since the middle column of $A$ is exactly $\mathbf{e}_2$.

Eigenvalue $\lambda_2 = 2$. Form $A - 2I$:

$$ A - 2I = \begin{bmatrix} 2 & 0 & 1 \\ -2 & -1 & 0 \\ -2 & 0 & -1 \end{bmatrix}. $$

Row-reduce. Add row 1 to row 3: $(-2,0,-1) + (2,0,1) = (0,0,0)$ — the third row vanishes, confirming singularity. We are left with two equations:

$$ 2v_1 + v_3 = 0, \qquad -2v_1 - v_2 = 0. $$

From the first, $v_3 = -2v_1$; from the second, $v_2 = -2v_1$. With $v_1 = 1$ this gives $(1, -2, -2)$, but to keep the components tidy let us instead pick $v_1 = -1$, yielding

$$ \mathbf{v}_2 = \begin{bmatrix} -1 \\ 2 \\ 2 \end{bmatrix} \quad (\lambda_2 = 2). $$

Check: $A\mathbf{v}_2 = \begin{bmatrix} -4 + 0 + 2 \\ 2 + 2 + 0 \\ 2 + 0 + 2 \end{bmatrix} = \begin{bmatrix} -2 \\ 4 \\ 4 \end{bmatrix} = 2\mathbf{v}_2$.

Eigenvalue $\lambda_3 = 3$. Form $A - 3I$:

$$ A - 3I = \begin{bmatrix} 1 & 0 & 1 \\ -2 & -2 & 0 \\ -2 & 0 & -2 \end{bmatrix}. $$

Add $2\times$ row 1 to row 3: $(-2,0,-2) + (2,0,2) = (0,0,0)$ — third row gone. The remaining equations are

$$ v_1 + v_3 = 0, \qquad -2v_1 - 2v_2 = 0. $$

So $v_3 = -v_1$ and $v_2 = -v_1$. With $v_1 = -1$ (again chosen for tidy signs):

$$ \mathbf{v}_3 = \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix} \quad (\lambda_3 = 3). $$

Check: $A\mathbf{v}_3 = \begin{bmatrix} -4 + 0 + 1 \\ 2 + 1 + 0 \\ 2 + 0 + 1 \end{bmatrix} = \begin{bmatrix} -3 \\ 3 \\ 3 \end{bmatrix} = 3\mathbf{v}_3$.

We now have the complete eigen-structure of $A$: three eigenvalues $1, 2, 3$ and a representative eigenvector for each. Three distinct eigenvalues forced three independent eigenvectors (we prove that independence in Chapter 25), which means $A$ has a full set of eigen-directions — a fact that will let us diagonalize it next chapter.
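Step 3 is a null-space computation, and it can be automated. The sketch below is one possible approach, not the method numpy uses internally: it reads a basis for $N(A - \lambda I)$ off the singular value decomposition, keeping the right singular vectors whose singular values are numerically zero. The function name `eigenspace_basis` and the tolerance `1e-10` are assumptions made for this illustration.

```python
import numpy as np

def eigenspace_basis(A, lam, tol=1e-10):
    """Orthonormal basis (as columns) for N(A - lam*I), read off from the SVD."""
    M = A - lam * np.eye(A.shape[0])
    _, s, vh = np.linalg.svd(M)
    # Rows of vh paired with (numerically) zero singular values span the null space.
    return vh[s < tol].T

A = np.array([[4.0, 0.0, 1.0], [-2.0, 1.0, 0.0], [-2.0, 0.0, 1.0]])

for lam in (1.0, 2.0, 3.0):
    basis = eigenspace_basis(A, lam)
    ok = bool(np.allclose(A @ basis, lam * basis))
    print(f"lambda = {lam}: eigenspace dimension = {basis.shape[1]}, A v = lambda v? {ok}")
```

Each eigenvalue of this matrix should produce a single basis column, a unit vector on the same line as the hand-computed eigenvector (possibly with the opposite sign).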

What if the cubic doesn't factor nicely?

Our $3 \times 3$ was kind: a zero-rich column handed us one root for free and a clean quadratic for the other two. A general $3 \times 3$ gives a genuine cubic $p_A(\lambda) = -\lambda^3 + c_2\lambda^2 + c_1\lambda + c_0$ with no factoring shortcut, and finding its roots by hand means hunting for a rational root first. The rational root theorem says any rational root, in lowest terms $p/q$, must have $p$ dividing the constant term and $q$ dividing the leading coefficient. When $A$ has integer entries, $p_A$ has integer coefficients and leading coefficient $\pm 1$, so any rational root must be an integer dividing $\det(A)$ (the constant term). If $\det(A) = 0$, then $\lambda = 0$ is itself a root and can be factored out first. The practical recipe: list the divisors of $\det(A)$, test each by plugging into $p_A$, and once you find one root $\lambda_1$, divide it out (polynomial long division) to reduce the cubic to a quadratic you can finish with the formula.
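The recipe above can be sketched in code. The version below assumes integer coefficients, a leading coefficient of $\pm 1$, and a nonzero constant term; the helper name `integer_root_candidates` is hypothetical, and the coefficients are those of the worked $3 \times 3$ example.

```python
import numpy as np

def integer_root_candidates(c0):
    """All positive and negative divisors of a nonzero integer constant term."""
    c0 = abs(int(c0))
    divisors = [d for d in range(1, c0 + 1) if c0 % d == 0]
    return sorted({sign * d for d in divisors for sign in (1, -1)})

# p_A(lambda) = -lambda^3 + 6*lambda^2 - 11*lambda + 6, highest power first.
coeffs = [-1, 6, -11, 6]

candidates = integer_root_candidates(coeffs[-1])
first_root = next(r for r in candidates if np.polyval(coeffs, r) == 0)

# Divide out (lambda - first_root) to leave a quadratic.
quotient, remainder = np.polydiv(coeffs, [1, -first_root])
other_roots = np.sort(np.roots(quotient).real)

print("candidates :", candidates)
print("first root :", first_root)
print("quotient   :", quotient, " remainder:", remainder)
print("other roots:", np.round(other_roots, 6))
```

If no candidate is a root, `next` raises `StopIteration`, which is the code's way of reporting that the polynomial has no integer root and the hand recipe has nothing to offer.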

This works, but feel how fragile it is. It relies on the matrix having a rational eigenvalue at all, which most real matrices do not. Perturb a single entry of our tidy $A$ by $0.01$ and the eigenvalues typically become irrational decimals with no divisor to find; the rational-root trick collapses and you are left rooting a cubic numerically. This is the first concrete sign of the wall we hit in §24.10: the by-hand method is a method for carefully chosen small matrices, not for the matrices that actually arise in data and physics. Textbook problems are rigged to factor; reality is not.
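To see the fragility, nudge one entry and recompute. In this sketch the choice of which entry to perturb, the $(3,2)$ entry, is an arbitrary assumption made for illustration.

```python
import numpy as np

A = np.array([[4.0, 0.0, 1.0], [-2.0, 1.0, 0.0], [-2.0, 0.0, 1.0]])

B = A.copy()
B[2, 1] += 0.01   # hypothetical nudge to the (3,2) entry; numpy indexes from 0

eig_A = np.sort(np.linalg.eigvals(A).real)
eig_B = np.sort(np.linalg.eigvals(B).real)
shift = np.abs(eig_B - eig_A)

print("eigenvalues of A:", np.round(eig_A, 6))
print("eigenvalues of B:", np.round(eig_B, 6))
print("shift per eigenvalue:", np.round(shift, 6))
```

With this nudge the characteristic polynomial becomes $-(\lambda - 1)(\lambda - 2)(\lambda - 3) - 0.02$, so all three eigenvalues move off the integers. Nudging the $(1,3)$ entry instead would leave $\lambda = 1$ exactly in place, because the middle column of $A - \lambda I$ still contributes the factor $(1 - \lambda)$; only the other two eigenvalues move, to $(5 \pm \sqrt{0.92})/2$.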

Common Pitfall — When you do find an integer root of the cubic, don't forget to divide it out and solve the remaining quadratic — the other two eigenvalues live there, and they may be irrational or complex even when the first root is a clean integer. A frequent error is to find one nice root, declare victory, and miss that a $3 \times 3$ has three eigenvalues. Always account for all $n$ of them (with multiplicity), checking your list against $\operatorname{tr}(A)$ and $\det(A)$.

Check Your Understanding — A $3 \times 3$ matrix has characteristic polynomial $p_A(\lambda) = -\lambda^3 + 6\lambda^2 - 11\lambda + 6$. You spot that $\lambda = 1$ is a root. Use it to find the other two eigenvalues, and confirm the trace and determinant.

Answer
Since $\lambda = 1$ is a root, $(\lambda - 1)$ divides $p_A$. Dividing (or matching coefficients) gives $p_A(\lambda) = -(\lambda - 1)(\lambda^2 - 5\lambda + 6) = -(\lambda - 1)(\lambda - 2)(\lambda - 3)$, so the eigenvalues are $1, 2, 3$. Trace $= 1 + 2 + 3 = 6$, matching the $\lambda^2$ coefficient $+6$; determinant $= 1 \cdot 2 \cdot 3 = 6$, matching the constant term. (This is exactly the polynomial of the worked $3 \times 3$ above — now you can recover it from a single spotted root.)

numpy verification

# Eigenvalues/eigenvectors of the 3x3 A -- verify the hand result.
import numpy as np
A = np.array([[4, 0, 1], [-2, 1, 0], [-2, 0, 1]], dtype=float)
vals, vecs = np.linalg.eig(A)
order = np.argsort(vals.real)              # sort ascending so output is stable
print("eigenvalues :", np.round(vals[order].real, 6))   # -> [1. 2. 3.]
for lam, v in zip(vals[order], vecs[:, order].T):
    print(f"lambda={lam.real:.0f}  unit eigenvector={np.round(v, 4)}")
print("trace, det  :", np.trace(A), round(np.linalg.det(A)))  # -> 6.0  6

Output:

eigenvalues : [1. 2. 3.]
lambda=1  unit eigenvector=[0. 1. 0.]
lambda=2  unit eigenvector=[-0.3333  0.6667  0.6667]
lambda=3  unit eigenvector=[ 0.5774 -0.5774 -0.5774]
trace, det  : 6.0 6

Read these against our hand vectors. For $\lambda = 1$, numpy gives exactly $(0,1,0)$. For $\lambda = 2$, $(-0.3333, 0.6667, 0.6667)$ is our $(-1, 2, 2)$ divided by its length $3$. For $\lambda = 3$, $(0.5774, -0.5774, -0.5774)$ is our $(-1, 1, 1)$ divided by $\sqrt{3} \approx 1.732$ and then flipped in sign — numpy happened to return the opposite representative of the same eigen-line, a vivid reminder of the previous Computational Note that sign and scale are arbitrary. Every eigen-line matches; numpy has merely scaled each to unit length (and, for $\lambda=3$, negated it). Trace and determinant both equal $6$, as predicted.

24.5 What is an eigenspace, and why is it a subspace?

We have been computing "the eigenvector," but Step 3 always solved a homogeneous system $(A - \lambda I)\mathbf{v} = \mathbf{0}$, and the solution set of a homogeneous system is never a single vector — it is a whole subspace, the null space. That subspace deserves a name.

Definition (eigenspace). For an eigenvalue $\lambda$ of $A$, the eigenspace $E_\lambda$ is the null space of $A - \lambda I$: $$ E_\lambda = N(A - \lambda I) = \{\mathbf{v} : A\mathbf{v} = \lambda\mathbf{v}\}. $$ It consists of all eigenvectors for $\lambda$ together with the zero vector. Because it is a null space, it is automatically a subspace (Chapter 13): it is closed under addition and scalar multiplication and contains $\mathbf{0}$.

This is the cleanest place to see Part III paying dividends inside Part V. Every fact we proved about null spaces transfers verbatim to eigenspaces. The eigenspace is a genuine subspace, so it has a dimension; that dimension is what we will call the geometric multiplicity. And the machinery for finding a basis of a null space — row-reduce $A - \lambda I$, read off the free variables, write down the special solutions (Chapter 13) — is exactly the machinery for finding all the eigenvectors of $\lambda$ at once. There is nothing new to learn about computing eigenvectors; it is null-space computation applied to $A - \lambda I$.
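To make "eigenvectors are a null-space computation" concrete, here is a small numerical sketch. It is not the row-reduction you would do by hand: it reads the null space of $A - \lambda I$ off the singular value decomposition instead, which is a common floating-point substitute. The function name `eigenspace_basis` and the tolerance are assumptions made for this illustration.

```python
import numpy as np

def eigenspace_basis(A, lam, tol=1e-10):
    """Orthonormal basis (as columns) for N(A - lam*I), read off from the SVD."""
    n = A.shape[0]
    _, s, vh = np.linalg.svd(A - lam * np.eye(n))
    # Right singular vectors whose singular value is numerically zero span the null space.
    null_mask = s <= tol * max(1.0, s.max())
    return vh[null_mask].T

A = np.array([[4, 0, 1], [-2, 1, 0], [-2, 0, 1]], dtype=float)
for lam in (1.0, 2.0, 3.0):
    basis = eigenspace_basis(A, lam)
    # Number of columns = dimension of the eigenspace.
    print(lam, basis.shape[1], np.round(basis.ravel(), 4))
```

Each eigenspace of this matrix should come back as a single column, a unit vector along the same line as the hand-computed eigenvector (possibly with the opposite sign).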

Geometric Intuition — Picture each eigenspace as a flat through the origin that the transformation leaves in place as a set. For our $3 \times 3$ matrix, $E_1, E_2, E_3$ are three distinct lines through the origin; $A$ slides points along $E_1$ by a factor of $1$ (i.e., not at all), along $E_2$ by $2$, and along $E_3$ by $3$, never lifting a point off its line. A vector that starts inside an eigenspace stays inside it forever, no matter how many times you apply $A$ — which is precisely why eigenspaces are the natural coordinate axes for understanding the transformation.

Real-World Application — In quantum mechanics, the eigenspace of an operator is where measurement lands. Each observable (a physical quantity such as energy, spin, or position) is represented by a Hermitian matrix; each value it can take is an eigenvalue, the number you read on the instrument, and the states with that definite value form the corresponding eigenspace. An electron measured to have a definite energy occupies that energy's eigenspace; the dimension of the eigenspace is the degeneracy of that energy level. The whole structure of atomic energy levels is a statement about the eigenspaces of a single matrix, the Hamiltonian. We will see in Chapter 27 why Hermitian matrices are guaranteed enough eigenspaces to describe any state.

24.6 When does algebraic multiplicity differ from geometric multiplicity?

So far every eigenvalue has been a simple root, and every eigenspace has been a single line. That tidiness is not guaranteed. The richest — and most error-prone — part of eigenvalue theory is what happens when an eigenvalue repeats, because then two different "counts" attached to that eigenvalue can disagree.

Definition (the two multiplicities). Let $\lambda$ be an eigenvalue of $A$, and write the characteristic polynomial in a variable $t$ as $p_A(t) = \det(A - tI)$.
- Its algebraic multiplicity $m_a(\lambda)$ is the number of times the factor $(t - \lambda)$ divides $p_A(t)$, i.e., how many times $\lambda$ appears as a root.
- Its geometric multiplicity $m_g(\lambda)$ is $\dim E_\lambda = \dim N(A - \lambda I)$: the number of independent eigenvectors for $\lambda$, equivalently the number of free variables when you row-reduce $A - \lambda I$.

For all five eigenvalues we have met so far ($5, 2$ in the $2 \times 2$; $1, 2, 3$ in the $3 \times 3$), each was a simple root ($m_a = 1$) with a one-dimensional eigenspace ($m_g = 1$), so the two counts agreed trivially. Now watch them part ways.

A defective matrix: the shear

Consider the humble shear matrix, which we first met in the visualizer back in Chapter 7:

$$ A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}. $$

Eigenvalues. It is upper-triangular, so its determinant is the product of the diagonal and its characteristic polynomial reads straight off the diagonal (a fact worth remembering: the eigenvalues of a triangular matrix are its diagonal entries):

$$ p_A(\lambda) = \det\!\begin{bmatrix} 2 - \lambda & 1 \\ 0 & 2 - \lambda \end{bmatrix} = (2 - \lambda)^2. $$

The only root is $\lambda = 2$, and it is a double root: $(2 - \lambda)^2$. So the algebraic multiplicity is $m_a(2) = 2$. By the degree count, $\lambda = 2$ has "used up" both of the matrix's eigenvalue slots.

Eigenvectors. Form $A - 2I$ and find its null space:

$$ A - 2I = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}. $$

The single nontrivial equation is $v_2 = 0$, with $v_1$ free. That is one free variable, so the eigenspace is one-dimensional: $E_2 = \operatorname{span}\{(1,0)\}$. The geometric multiplicity is $m_g(2) = 1$.

Here is the mismatch, in black and white:

$$ m_a(2) = 2 \qquad\text{but}\qquad m_g(2) = 1. $$

The polynomial promised "two" of the eigenvalue $2$, but the matrix delivers only one independent eigen-direction. There is no second eigenvector hiding anywhere; we have found the entire eigenspace. A square matrix that is short on eigenvectors this way — whose geometric multiplicity falls strictly below its algebraic multiplicity for some eigenvalue — is called defective.

The Key Insight — Algebraic multiplicity counts roots of a polynomial; geometric multiplicity counts dimensions of a subspace. They are different questions, and a defective matrix is one where the answers disagree. The shear $\begin{bmatrix} 2 & 1 \\ 0 & 2\end{bmatrix}$ has only a single eigen-line even though $\lambda = 2$ is a double root — the missing direction is exactly why it cannot be diagonalized (Chapter 25).

Geometric Intuition — Why does the shear come up short? Look at it in the visualizer (Chapter 1). The horizontal axis, spanned by $(1,0)$, is mapped onto itself: it is the lone eigen-line. Every other vector gets tilted toward the horizontal: it is partly stretched and partly slid sideways. A pure stretch with a double eigenvalue $2$, the matrix $2I$, would scale every vector without turning it, so the entire plane would be a two-dimensional eigenspace. The shear is "$2I$ plus a slide," and the slide destroys the second eigen-direction without changing the eigenvalues. That leftover sliding motion is precisely what Jordan normal form will quarantine into an off-diagonal $1$.

Warning — Repeated eigenvalues do not automatically mean defective. The identity-like matrix $2I = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ also has the double eigenvalue $2$, but $A - 2I$ is the zero matrix, whose null space is all of $\mathbb{R}^2$ — so $m_g(2) = 2 = m_a(2)$ and it is not defective. The lesson: when an eigenvalue repeats, you must compute the dimension of its eigenspace to learn whether the matrix is defective. Never assume; always row-reduce $A - \lambda I$ and count free variables. Simple (multiplicity-1) eigenvalues are always safe — they have $m_a = m_g = 1$ — so only repeated roots require this extra check.
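The "count free variables" check can also be done numerically through rank–nullity: the number of free variables of $A - \lambda I$ is $n$ minus its rank. The sketch below assumes the eigenvalue is known exactly; with a rounded eigenvalue, `np.linalg.matrix_rank` may judge the shifted matrix to be full rank. The helper name `geometric_multiplicity` is invented here.

```python
import numpy as np

def geometric_multiplicity(A, lam):
    """dim N(A - lam*I) = n - rank(A - lam*I), by rank-nullity."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n))

shear = np.array([[2.0, 1.0], [0.0, 2.0]])
scaled_identity = 2.0 * np.eye(2)

# Same double eigenvalue 2, different eigenspace dimensions.
print(geometric_multiplicity(shear, 2.0))            # 1 -> defective
print(geometric_multiplicity(scaled_identity, 2.0))  # 2 -> not defective
```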

numpy and the defective case

# A defective matrix: double eigenvalue 2, but only ONE independent eigenvector.
import numpy as np
A = np.array([[2, 1], [0, 2]], dtype=float)
vals, vecs = np.linalg.eig(A)
print("eigenvalues  :", vals)                 # -> [2. 2.]   (the double root)
print("eigenvectors :\n", vecs)               # two nearly-parallel columns!
print("rank of eigenvector matrix:", np.linalg.matrix_rank(vecs))  # -> 1

Output:

eigenvalues  : [2. 2.]
eigenvectors :
 [[ 1.0000000e+00 -1.0000000e+00]
 [ 0.0000000e+00  4.4408921e-16]]
rank of eigenvector matrix: 1

numpy dutifully reports the eigenvalue $2$ twice, but look at the eigenvector columns: $(1, 0)$ and $(-1, 4.4 \times 10^{-16})$ — the second is just $-1$ times the first, up to floating-point dust (that $10^{-16}$ is a numerical zero). The rank of the eigenvector matrix is $1$, not $2$: there is genuinely only one independent eigen-direction. numpy cannot conjure a second eigenvector that does not exist; the defect is real, not a bug. We will need the generalized eigenvectors of Jordan normal form (Chapter 36) to supply a sensible "second direction" for such a matrix.

Defectiveness hides — a non-triangular example

The shear was triangular, so its repeated eigenvalue was visible on the diagonal and you might suspect that only triangular matrices misbehave. Not so. Consider

$$ A = \begin{bmatrix} 3 & 1 \\ -1 & 1 \end{bmatrix}, $$

which is not triangular and gives no hint of its eigenvalues at a glance. Run the procedure. The characteristic polynomial is

$$ p_A(\lambda) = (3 - \lambda)(1 - \lambda) - (1)(-1) = (3 - 4\lambda + \lambda^2) + 1 = \lambda^2 - 4\lambda + 4 = (\lambda - 2)^2. $$

(Check: $\operatorname{tr}(A) = 3 + 1 = 4$ and $\det(A) = 3 - (-1) = 4$, so $p_A(\lambda) = \lambda^2 - 4\lambda + 4$, agreeing.) Again a double root $\lambda = 2$, so $m_a(2) = 2$. Now the eigenspace:

$$ A - 2I = \begin{bmatrix} 1 & 1 \\ -1 & -1 \end{bmatrix}. $$

The rows are dependent (the second is $-1$ times the first), leaving the single equation $v_1 + v_2 = 0$, so $v_2 = -v_1$ and there is one free variable. The eigenspace is the single line $E_2 = \operatorname{span}\{(1, -1)\}$, so $m_g(2) = 1 < 2 = m_a(2)$. This matrix is defective too — but you could only discover that by doing the computation, not by inspecting the entries.

# A non-triangular defective matrix: double eigenvalue 2, one eigen-direction.
import numpy as np
A = np.array([[3, 1], [-1, 1]], dtype=float)
vals, vecs = np.linalg.eig(A)
print("eigenvalues :", np.round(vals, 6))  # -> [2. 2.]  (raw: 2.00000002, 1.99999998)
print("eigenvectors:\n", np.round(vecs, 4))

Output:

eigenvalues : [2. 2.]
eigenvectors:
 [[ 0.7071 -0.7071]
 [-0.7071  0.7071]]

The two unit eigenvectors numpy returns, $(0.7071, -0.7071)$ and $(-0.7071, 0.7071)$, are exact negatives of one another — both represent the one eigen-line $\operatorname{span}\{(1,-1)\}$. There is no genuine second direction, exactly as the hand computation found.

Real-World Application — Defective matrices are not pathological curiosities; they sit at the boundary between qualitatively different behaviors and so appear right where engineers care most. A mechanical or electrical system modeled by $\mathbf{x}' = A\mathbf{x}$ is critically damped — the fastest return to rest with no oscillation — exactly when its matrix has a repeated eigenvalue with too few eigenvectors, i.e. is defective (Chapter 37). A car's suspension is tuned toward this critical case: stiffer and it oscillates (complex eigenvalues, Chapter 26), softer and it returns sluggishly. The defective matrix is the knife-edge an engineer designs toward. This is also why the matrix exponential of Chapter 37 needs the Jordan form of Chapter 36 to handle the defective case.

24.7 How are the two multiplicities related?

The defective shear and the well-behaved $2I$ show the two multiplicities can either disagree or agree. That raises the natural question: which combinations are possible? Could geometric multiplicity ever exceed algebraic? The answer is a clean, always-true inequality, and it is worth proving because the proof reveals why a missing eigenvector is the obstruction to everything nice in Chapter 25.

Theorem (multiplicity bounds). For every eigenvalue $\lambda$ of a square matrix $A$, $$ 1 \le m_g(\lambda) \le m_a(\lambda). $$ The geometric multiplicity is at least $1$ (an eigenvalue has at least one eigenvector, by definition) and never exceeds the algebraic multiplicity.

Why we care. This single inequality controls the entire theory of diagonalization. A matrix is diagonalizable exactly when the inequality is an equality for every eigenvalue — when no eigen-direction goes missing. The whole drama of the next chapter is whether the eigenvectors add up to a full basis, and this theorem is the bookkeeping that decides it.

Key idea. The lower bound is immediate. For the upper bound: if $\lambda$ has $k$ independent eigenvectors, build a basis starting with those $k$ vectors. In that basis $A$ looks block-triangular with a $k \times k$ block of pure $\lambda$'s, which forces $(\lambda - t)^k$ to divide the characteristic polynomial $p_A(t) = \det(A - tI)$. So $\lambda$ is a root at least $k$ times, i.e. $m_a \ge k = m_g$.

Math-Major Sidebar (proof of the upper bound). Let $m_g(\lambda) = k$, and pick a basis $\mathbf{v}_1, \dots, \mathbf{v}_k$ of the eigenspace $E_\lambda$. Extend it to a full basis $\mathbf{v}_1, \dots, \mathbf{v}_k, \mathbf{w}_{k+1}, \dots, \mathbf{w}_n$ of $\mathbb{R}^n$ (possible by the basis-extension theorem of Chapter 15). Let $P$ be the invertible matrix with these basis vectors as columns, and let $B = P^{-1}AP$ be the matrix of the same transformation in the new basis (Chapter 16). Because $P\mathbf{e}_i = \mathbf{v}_i$ and $A\mathbf{v}_i = \lambda \mathbf{v}_i$ for $i \le k$, we get $B\mathbf{e}_i = P^{-1}A\mathbf{v}_i = \lambda P^{-1}\mathbf{v}_i = \lambda \mathbf{e}_i$. So the first $k$ columns of $B$ are $\lambda \mathbf{e}_i$, and $$ B = \begin{bmatrix} \lambda I_k & C \\ 0 & D \end{bmatrix}, $$ a block upper-triangular matrix with the $k \times k$ scalar block $\lambda I_k$ in the upper-left corner ($C$ and $D$ are whatever is left over). Similar matrices share a characteristic polynomial (we prove this in Chapter 25; geometrically, $A$ and $B$ are the same transformation in different coordinates, so they have the same eigenvalues with the same multiplicities). The determinant of a block-triangular matrix is the product of the blocks' determinants (Chapter 11), so $$ p_A(t) = p_B(t) = \det(\lambda I_k - tI_k)\cdot \det(D - tI_{n-k}) = (\lambda - t)^k \cdot \det(D - tI_{n-k}). $$ Thus $(\lambda - t)^k$ divides $p_A(t)$, which means $\lambda$ is a root of multiplicity at least $k$. Therefore $m_a(\lambda) \ge k = m_g(\lambda)$. $\blacksquare$

What this means. Geometric multiplicity is the "honest" count — the number of eigenvectors you can actually produce — and it can only ever fall short of the algebraic count, never overshoot it. When they match for every eigenvalue, you have a full basis of eigenvectors and the matrix diagonalizes; when even one eigenvalue is short, the matrix is defective and diagonalization is impossible. The inequality is the precise diagnostic.
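The block-triangular form at the heart of the proof is easy to see numerically. The sketch below uses the chapter's $3 \times 3$ matrix with $\lambda = 2$ (so $k = 1$), and extends the hand-computed eigenvector $(-1, 2, 2)$ to a basis using two standard basis vectors. That particular extension is an arbitrary choice made for this illustration; any extension that keeps $P$ invertible would do.

```python
import numpy as np

A = np.array([[4, 0, 1], [-2, 1, 0], [-2, 0, 1]], dtype=float)
lam = 2.0
v = np.array([-1.0, 2.0, 2.0])           # eigenvector for lam = 2, found by hand

# Extend {v} to a basis of R^3 (one valid choice among many).
P = np.column_stack([v, [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
B = np.linalg.solve(P, A @ P)            # B = P^{-1} A P without forming the inverse

print(np.round(B, 6))                    # first column should be lam * e_1
D = B[1:, 1:]                            # the leftover block from the proof
print(np.round(np.sort(np.linalg.eigvals(D).real), 6))   # the remaining eigenvalues
```

The first column of `B` comes out as $(2, 0, 0)$, which is the $\lambda I_k$ block with $k = 1$, and the leftover block $D$ carries the other two eigenvalues of $A$.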

Check Your Understanding — A $3 \times 3$ matrix has characteristic polynomial $-(\lambda - 4)^2(\lambda - 7)$. You compute that the eigenspace for $\lambda = 4$ is a single line. Is the matrix defective? How many independent eigenvectors does it have in total?

Answer
Yes, it is defective. For $\lambda = 4$: $m_a = 2$ (double root) but $m_g = 1$ (a line), so $m_g < m_a$ — that is the definition of defective. For $\lambda = 7$: $m_a = 1$, and a simple eigenvalue always has $m_g = 1$. Total independent eigenvectors $= 1 + 1 = 2$, one short of the $3$ needed for a basis of $\mathbb{R}^3$. The matrix cannot be diagonalized (Chapter 25).

24.8 Why is the trace the sum of the eigenvalues and the determinant their product?

We have used two checks repeatedly — $\sum \lambda_i = \operatorname{tr}(A)$ and $\prod \lambda_i = \det(A)$ — and they have held every time. They are not coincidences or rules of thumb; they are forced by the structure of the characteristic polynomial, and they are among the most useful facts in all of applied linear algebra. Let us state them precisely and see why they are true.

Theorem (trace and determinant from the spectrum). Let $A$ be an $n \times n$ matrix with eigenvalues $\lambda_1, \dots, \lambda_n$ listed with algebraic multiplicity (and including any complex ones). Then $$ \operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i \qquad\text{and}\qquad \det(A) = \prod_{i=1}^{n} \lambda_i. $$

Why we care. These give you two free, instant checks on any eigenvalue computation, by hand or by machine: add your eigenvalues and you should get the trace; multiply them and you should get the determinant. Beyond checking, they carry meaning — the determinant is the product of eigenvalues, so a matrix is singular ($\det = 0$) exactly when zero is one of its eigenvalues, tying this chapter straight back to §24.1. And the trace, the sum of eigenvalues, turns out to be the total "stretch rate" of the associated flow, which is why it governs stability in Chapter 37.

Key idea. Both facts come from a single move: write the characteristic polynomial two ways — once as a product over its roots, once by expanding the determinant — and compare coefficients. This is exactly the Vieta's-formulas idea you may have seen for quadratics, applied to a degree-$n$ polynomial.

The product → determinant half. Because $\lambda_1, \dots, \lambda_n$ are precisely the roots of $p_A(\lambda) = \det(A - \lambda I)$, and the leading coefficient is $(-1)^n$, the polynomial factors as

$$ p_A(\lambda) = (-1)^n (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n). $$

Now evaluate both descriptions of $p_A$ at $\lambda = 0$. On one side, $p_A(0) = \det(A - 0\cdot I) = \det(A)$ (the constant-term fact from §24.2). On the other,

$$ p_A(0) = (-1)^n (0 - \lambda_1)\cdots(0 - \lambda_n) = (-1)^n (-1)^n \lambda_1 \cdots \lambda_n = \lambda_1 \lambda_2 \cdots \lambda_n. $$

The two $(-1)^n$ factors cancel, leaving $\det(A) = \prod_i \lambda_i$.

The sum → trace half. Compare the coefficient of $\lambda^{n-1}$ in the two forms. Expanding the product $(-1)^n\prod_i(\lambda - \lambda_i)$, the $\lambda^{n-1}$ term arises by choosing $-\lambda_i$ from exactly one factor and $\lambda$ from the other $n-1$; summing these gives a coefficient of $(-1)^n \cdot (-1)\sum_i \lambda_i = (-1)^{n-1}\sum_i\lambda_i$. Expanding the determinant $\det(A - \lambda I)$ instead, the only way to reach degree $n-1$ in $\lambda$ is to take $n-1$ of the diagonal factors $(a_{ii} - \lambda)$ and the lone constant from the remaining one; the highest cross-terms come entirely from the product of diagonal entries $\prod_i(a_{ii} - \lambda)$, whose $\lambda^{n-1}$ coefficient is $(-1)^{n-1}\sum_i a_{ii} = (-1)^{n-1}\operatorname{tr}(A)$. (Any off-diagonal contribution to the determinant is missing at least two diagonal factors and so cannot reach degree $n-1$.) Matching the two coefficients of $\lambda^{n-1}$ gives $\sum_i \lambda_i = \operatorname{tr}(A)$. $\blacksquare$

What this means. The trace and determinant, two numbers computable without ever finding the eigenvalues, already pin down their sum and product. For a $2 \times 2$ that is everything: sum and product determine a quadratic's two roots completely, which is exactly why the shortcut $p_A(\lambda) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$ from §24.3 works. For larger matrices the trace and determinant are two equations among $n$ unknowns: not enough to solve, but always enough to catch an arithmetic slip.
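The two identities are easy to spot-check on a matrix nobody rigged. The sketch below draws a random $5 \times 5$ matrix (the seed and size are arbitrary choices for illustration) and compares the sum and product of its computed eigenvalues, some of which may be complex, against the trace and determinant.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
eigenvalues = np.linalg.eigvals(A)       # may include complex-conjugate pairs

# Imaginary parts cancel within conjugate pairs, up to rounding error.
sum_check = np.isclose(eigenvalues.sum(), np.trace(A))
prod_check = np.isclose(eigenvalues.prod(), np.linalg.det(A))
print(sum_check, prod_check)
```

A numerical agreement like this is a sanity check, not a proof; the coefficient comparison above is what makes the identities hold for every matrix.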

Real-World Application — In a dynamical or economic model $\mathbf{x}_{k+1} = A\mathbf{x}_k$, the determinant $\det(A) = \prod_i \lambda_i$ is the factor by which the system contracts or expands volume in state space each step (Chapter 11's volume interpretation, now read through the spectrum). A macroeconomic input–output model is stable, meaning perturbations die out, exactly when every $|\lambda_i| < 1$. Since $|\det(A)| = \prod_i |\lambda_i|$, having $|\det(A)| < 1$ is a necessary (though not sufficient) symptom of stability. Economists and control engineers routinely glance at $\operatorname{tr}(A)$ and $\det(A)$ to get a first read on stability before computing a single eigenvalue, precisely because of this theorem. We make the stability story rigorous in Chapter 37.

Check Your Understanding — A $2 \times 2$ matrix has $\operatorname{tr}(A) = 5$ and $\det(A) = 6$. Find its eigenvalues without seeing the matrix.

Answer
The characteristic polynomial is $\lambda^2 - 5\lambda + 6 = (\lambda - 2)(\lambda - 3)$, so the eigenvalues are $\boxed{2 \text{ and } 3}$. Check: $2 + 3 = 5 = \operatorname{tr}$ and $2 \times 3 = 6 = \det$. The trace and determinant alone determined the spectrum — no matrix entries required.

24.9 What if a real matrix has no real eigenvalues?

Sometimes the characteristic polynomial has no real roots at all, and the procedure seems to stall. This is not a failure — it is the algebra telling you something geometric. Consider the $90°$ rotation matrix from Chapter 21,

$$ R = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}, \qquad p_R(\lambda) = \det\!\begin{bmatrix} -\lambda & -1 \\ 1 & -\lambda \end{bmatrix} = \lambda^2 + 1. $$

Set $\lambda^2 + 1 = 0$ and you get $\lambda = \pm i$ — a complex-conjugate pair, no real solutions. Geometrically this is exactly right: a $90°$ rotation turns every nonzero vector to a new direction, so there is no real invariant line, no real eigenvector to find. The visualizer would show every arrow swinging a quarter turn; nothing holds its line. The characteristic polynomial faithfully reports "no real eigenvalues" by refusing to cross the real axis (its graph $\lambda^2 + 1$ sits entirely above it).

Notice the spectral identities survive into the complex world: $\operatorname{tr}(R) = 0 = i + (-i)$ and $\det(R) = 1 = (i)(-i)$, since $i \cdot (-i) = -i^2 = 1$. The trace-sum and determinant-product of §24.8 hold over $\mathbb{C}$, which is one reason we always count eigenvalues over the complex numbers: it keeps the bookkeeping clean, and the Fundamental Theorem of Algebra guarantees exactly $n$ of them (counted with multiplicity). A real matrix can perfectly well have complex eigenvalues, and when it does, they reveal a hidden rotation inside the transformation. The complex number $\lambda = \cos\theta + i\sin\theta$ encodes a rotation by angle $\theta$; for our $R$, $\lambda = i = \cos 90° + i\sin 90°$ encodes the quarter turn exactly. This is the entire subject of Chapter 26, Complex Eigenvalues, where we will see that complex eigenvalues are "rotations in disguise" and learn to read the angle and scaling straight off them. For now, the takeaway is simply: do not be alarmed when the roots come out complex. The method is working, and it is telling you the transformation spins.
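A quick numerical look at the quarter-turn matrix shows the conjugate pair and the angle it encodes. The only calls used are `np.linalg.eigvals`, `np.angle`, and `np.abs`; because the matrix is built from `np.cos` and `np.sin`, expect rounding dust of order $10^{-16}$ in place of exact zeros.

```python
import numpy as np

theta = np.pi / 2                         # the 90-degree rotation from the text
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

eigenvalues = np.linalg.eigvals(R)
print(eigenvalues)                        # a conjugate pair close to +1j and -1j
print(np.degrees(np.angle(eigenvalues)))  # arguments close to +90 and -90 degrees
print(np.abs(eigenvalues))                # moduli close to 1: rotation, no stretching
```

Changing `theta` changes the argument of the eigenvalues accordingly, which is the "read the angle off the eigenvalue" idea in miniature.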

Common Pitfall — "No real eigenvalues" is not the same as "no eigenvalues" or "the matrix is broken." Over $\mathbb{C}$ every $n \times n$ matrix has exactly $n$ eigenvalues counted with multiplicity. A real matrix simply may have some (or all) of them complex, always arriving in conjugate pairs $a \pm bi$ because the polynomial has real coefficients. Reporting "the rotation matrix has no eigenvalues" is wrong; it has the two eigenvalues $\pm i$, which happen to be complex.

24.10 Why doesn't the characteristic polynomial scale to large matrices?

You now have a complete, correct method: build $p_A(\lambda)$, find its roots, solve a null space for each. It is the right way to understand eigenvalues, and it is the right tool for the $2 \times 2$ and $3 \times 3$ problems you will meet on paper. But I owe you an honest warning, because this method is emphatically not how your computer finds eigenvalues — and the reason cuts to a deep limitation of algebra itself.

The trouble is two-fold. First, there is no formula. Finding eigenvalues means finding polynomial roots, and the Abel–Ruffini theorem (Paolo Ruffini, 1799; Niels Henrik Abel, 1824; later explained structurally by Évariste Galois) proves that the roots of a general polynomial of degree $5$ or higher cannot be written in terms of the coefficients using only $+, -, \times, \div$, and radicals. So for a $5 \times 5$ matrix or larger there is, in principle, no "quadratic formula" to reach for. Degrees $3$ and $4$ do have closed-form formulas (Cardano's and Ferrari's), but they are so unwieldy and numerically treacherous that no serious software uses them.

Second, and more practically, computing eigenvalues by way of the characteristic polynomial is numerically disastrous. The roots of a polynomial can be wildly sensitive to tiny changes in its coefficients. Wilkinson made the phenomenon famous with the degree-20 polynomial $\prod_{k=1}^{20}(x - k)$, whose roots moved by enormous amounts (some even becoming complex) when a single coefficient was nudged by $2^{-23} \approx 1.2 \times 10^{-7}$. Forming $p_A(\lambda)$ from $A$ and then rooting it amplifies rounding error catastrophically. The polynomial, so illuminating on paper, is one of the worst possible numerical paths to an eigenvalue. This is a preview of the central lesson of Chapter 38 (Numerical Linear Algebra): a method can be mathematically exact and computationally hopeless.
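You can watch this sensitivity happen. The sketch below takes the numbers $1, \dots, 20$ as target eigenvalues and recovers them two ways: by expanding the polynomial into coefficients and rooting it, and by asking a symmetric-matrix eigenvalue routine directly. The random orthogonal change of basis and the seed are assumptions made only to avoid a trivially diagonal test matrix.

```python
import numpy as np

true_roots = np.arange(1, 21, dtype=float)

# Path 1: expand prod (x - k) into coefficients, then root the polynomial.
coeffs = np.poly(true_roots)             # coefficients, highest degree first
poly_roots = np.sort_complex(np.roots(coeffs))
poly_error = np.max(np.abs(poly_roots - true_roots))

# Path 2: hide the same eigenvalues inside a symmetric matrix and compute them directly.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))   # a random orthogonal matrix
S = Q @ np.diag(true_roots) @ Q.T
S = (S + S.T) / 2                        # remove rounding asymmetry
matrix_error = np.max(np.abs(np.linalg.eigvalsh(S) - true_roots))

print(poly_error, matrix_error)
```

The damage on the polynomial path is done the moment the coefficients are rounded to floating point: some of them are too large to store exactly, and the roots are sensitive enough to notice. The root-finder itself is not the culprit.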

Warning — Do not implement an eigenvalue solver by forming the characteristic polynomial and calling a root-finder. It is unstable for matrices larger than about $4 \times 4$ and impossible to express in closed form beyond that. The characteristic polynomial is a tool for proof and understanding, not for production computation. Real libraries invert the logic entirely: famously, MATLAB's roots command finds polynomial roots by building a companion matrix and calling the eigenvalue routine on it, the reverse of the textbook recipe.

So what does your computer do? It iterates. Instead of solving for eigenvalues exactly, it generates a sequence of better and better approximations that converge to them, trading the false promise of an exact formula for the reliable reality of fast convergence. We already glimpsed the simplest such method in Chapter 23: power iteration, where repeatedly multiplying any starting vector by $A$ drives it toward the dominant eigenvector (this is the engine of PageRank in Chapter 29). The workhorse behind np.linalg.eig and LAPACK is a far more sophisticated relative, the QR algorithm (John Francis and Vera Kublanovskaya, independently, around 1961). It repeatedly factors the matrix as $A = QR$ using the Gram–Schmidt / QR decomposition of Chapter 20, then reverses the factors to form $RQ$, and astonishingly this similarity-preserving shuffle makes the matrix march toward upper-triangular form, at which point the eigenvalues sit on the diagonal. We sketch why it works in Chapter 38.
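The factor-and-reverse loop is short enough to try directly. What follows is the bare, unshifted textbook iteration, shown only to make the idea tangible. Production routines add a preliminary reduction and shifts that this sketch omits, and the step count of 200 is an arbitrary assumption that happens to be ample for this small matrix.

```python
import numpy as np

def unshifted_qr_iteration(A, steps=200):
    """Repeat: factor A_k = Q R, then form A_{k+1} = R Q."""
    Ak = np.array(A, dtype=float)
    for _ in range(steps):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q        # R Q = Q^T (Q R) Q, so each step is a similarity transform
    return Ak

A = np.array([[4, 0, 1], [-2, 1, 0], [-2, 0, 1]], dtype=float)
T = unshifted_qr_iteration(A)

print(np.round(np.diag(T), 6))        # the eigenvalues 1, 2, 3 in some order
print(np.round(np.tril(T, -1), 6))    # entries below the diagonal have decayed to ~0
```

Because every step is a similarity transform, `T` has the same eigenvalues as `A`; once the part below the diagonal has died away, they can be read straight off the diagonal without any polynomial in sight.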

The Key Insight — There is a clean division of labor. The characteristic polynomial gives you the theory: it proves eigenvalues exist, counts them, and explains multiplicity. Iterative algorithms (power iteration, the QR algorithm) give you the practice: they compute eigenvalues of large matrices fast and stably, without ever forming the polynomial. A literate user of linear algebra knows the polynomial for understanding and trusts the iteration for computation.

Build Your Toolkit — Extend toolkit/eigen.py with the direct $2 \times 2$ solver, the one case where a formula is both available and well-behaved. Implement eig_2x2(A) that returns the two eigenvalues by solving the characteristic quadratic $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$ with the quadratic formula — inspect the discriminant to handle the real-distinct, repeated, and complex-pair cases (the last returns Python complex numbers; we explain them fully in Chapter 26). Then implement trace_det_check(A, eigenvalues) that verifies $\sum \lambda_i = \operatorname{tr}(A)$ and $\prod \lambda_i = \det(A)$ to a tolerance, exercising the theorem of §24.8 as a self-test. Pure Python only — no numpy in the implementation. Verify against np.linalg.eigvals on a handful of matrices, including the shear (a repeated root) and a rotation (a complex pair). This is the seed of the toolkit's eigen-module; in Chapter 29 you will add power_iteration for the large matrices that eig_2x2 cannot touch.
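If you want something to compare your own attempt against, here is one possible starting point. It is a sketch under simplifying assumptions, not the reference solution: the matrix is a nested list of real numbers, `trace_det_check` is written for the $2 \times 2$ case only, and the tolerance is an arbitrary choice.

```python
import cmath
import math

def eig_2x2(A):
    """Eigenvalues of a real 2x2 matrix [[a, b], [c, d]] via the quadratic formula."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    # Non-negative discriminant: real roots (equal when disc == 0).
    # Negative discriminant: a complex-conjugate pair.
    root = math.sqrt(disc) if disc >= 0 else cmath.sqrt(disc)
    return ((trace + root) / 2, (trace - root) / 2)

def trace_det_check(A, eigenvalues, tol=1e-9):
    """Check sum = trace and product = determinant (2x2 only in this sketch)."""
    (a, b), (c, d) = A
    total = sum(eigenvalues)
    product = 1
    for lam in eigenvalues:
        product *= lam
    return abs(total - (a + d)) <= tol and abs(product - (a * d - b * c)) <= tol

shear = [[2, 1], [0, 2]]
rotation = [[0, -1], [1, 0]]
print(eig_2x2(shear), trace_det_check(shear, eig_2x2(shear)))
print(eig_2x2(rotation), trace_det_check(rotation, eig_2x2(rotation)))
```

One design question worth thinking about in your own version: for a discriminant that is tiny but nonzero because of rounding, should the function report a repeated root or two nearly equal ones?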

24.11 What do eigenvalues of a stochastic matrix tell us?

Let us close the conceptual arc by connecting this chapter's machinery to the anchor we have been promising since Chapter 3: PageRank. A stochastic matrix (or Markov matrix) is a square matrix of non-negative entries whose columns each sum to $1$ — it encodes transition probabilities, where column $j$ says where the "stuff" at state $j$ goes next. Watch how the singularity story of §24.1 instantly predicts one of its eigenvalues.

Take the small example

$$ M = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}, $$

a two-state model — say, the fraction of customers loyal to brand A versus brand B from one month to the next. Both columns sum to $1$. Claim: $\lambda = 1$ is always an eigenvalue of such a matrix. Here is the argument, straight from §24.1. A number $\lambda$ is an eigenvalue iff $M - \lambda I$ is singular. Set $\lambda = 1$ and look at the columns of $M - I$: in each column we subtracted $1$ from the diagonal entry, and since the original column summed to $1$, every column of $M - I$ now sums to $0$. But a matrix whose columns all sum to zero is singular — the row vector $(1, 1, \dots, 1)$ times $M - I$ gives the zero row, so the rows are dependent and $\det(M - I) = 0$ (Chapter 11). Therefore $1$ is an eigenvalue, guaranteed, for every stochastic matrix. No computation of the polynomial required.
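The column-sum argument does not depend on the particular numbers in $M$, and a randomly generated stochastic matrix makes that visible. In the sketch below the $4 \times 4$ size and the seed are arbitrary; the only requirement is that each column is rescaled to sum to $1$.

```python
import numpy as np

rng = np.random.default_rng(1)
raw = rng.random((4, 4))                  # non-negative entries
M = raw / raw.sum(axis=0)                 # rescale so every column sums to 1

ones = np.ones(4)
print(np.allclose(ones @ (M - np.eye(4)), 0))        # every column of M - I sums to zero
print(np.isclose(np.linalg.det(M - np.eye(4)), 0))   # so M - I is singular
print(np.isclose(np.linalg.eigvals(M), 1).any())     # and 1 appears among the eigenvalues
```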

Confirm with the full procedure. The characteristic polynomial is

$$ p_M(\lambda) = (0.8 - \lambda)(0.7 - \lambda) - (0.3)(0.2) = \lambda^2 - 1.5\lambda + 0.5 = (\lambda - 1)(\lambda - 0.5), $$

so the eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 0.5$ — and indeed $1$ is among them (trace check: $1 + 0.5 = 1.5 = 0.8 + 0.7$; det check: $1 \times 0.5 = 0.5 = 0.56 - 0.06$). The eigenvector for $\lambda = 1$ solves $(M - I)\mathbf{v} = \mathbf{0}$: from $\begin{bmatrix} -0.2 & 0.3 \\ 0.2 & -0.3\end{bmatrix}$ we get $0.2 v_1 = 0.3 v_2$, i.e. $\mathbf{v} = (3, 2)$, which normalized to sum $1$ is the stationary distribution $(0.6, 0.4)$ — the long-run $60\%$/$40\%$ market split the system settles into.

# Eigenvalues of a stochastic (Markov) matrix; lambda=1 is guaranteed.
import numpy as np
M = np.array([[0.8, 0.3], [0.2, 0.7]], dtype=float)
vals, vecs = np.linalg.eig(M)
print("eigenvalues:", vals)                       # -> [1.  0.5]
i = int(np.argmin(np.abs(vals - 1)))               # find the lambda=1 eigenvector
stationary = vecs[:, i] / vecs[:, i].sum()         # normalize to a probability vector
print("stationary distribution:", np.round(stationary, 4))   # -> [0.6 0.4]

Output:

eigenvalues: [1.  0.5]
stationary distribution: [0.6 0.4]

The eigenvalue $\lambda = 1$ marks the steady state: its eigenvector, scaled to sum to $1$, is the distribution the chain converges to. The second eigenvalue $0.5$ controls how fast it gets there (each step shrinks the deviation from steady state by a factor of $0.5$, an idea we make precise with diagonalization in Chapter 25 and exploit in power iteration in Chapter 29). This is PageRank in miniature: Google's ranking is the stationary distribution of a giant stochastic matrix over web pages, that is, the eigenvector for $\lambda = 1$. We seeded that idea in Chapter 3; now you can see why the eigenvalue is exactly $1$ and what its eigenvector means. The same eigenvalue-of-a-data-matrix idea drives PCA, where the eigenvalues of a covariance matrix rank the directions of greatest variance in a dataset, the subject of Chapter 32.
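The role of the second eigenvalue as a convergence rate can be checked by simply stepping the chain. The starting vector below (every customer on brand A) is a hypothetical choice for illustration; any probability vector other than the stationary one would show the same shrinkage.

```python
import numpy as np

M = np.array([[0.8, 0.3], [0.2, 0.7]])
stationary = np.array([0.6, 0.4])
x = np.array([1.0, 0.0])                  # hypothetical start: everyone on brand A

deviations = []
for _ in range(8):
    deviations.append(np.linalg.norm(x - stationary))
    x = M @ x                             # one month of transitions

# Ratio of successive deviations from the steady state.
ratios = [later / earlier for earlier, later in zip(deviations, deviations[1:])]
print(np.round(ratios, 6))                # each ratio is close to 0.5
```

For a two-state chain the deviation from the stationary distribution always lies along the eigenvector for $0.5$, which is why the ratio settles at $0.5$ from the very first step.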

Real-World Application — Beyond web search, the dominant eigenvector of a stochastic matrix is how analysts find long-run equilibria everywhere: the steady-state customer split between competing brands, the equilibrium distribution of a population across geographic regions under migration rates, the stationary genotype frequencies in a population-genetics model, and the limiting distribution of a randomly surfing user. In every case the question "where does this system settle?" is answered by the eigenvector for $\lambda = 1$ — a question this chapter has taught you to pose as a null-space computation on $M - I$.

24.12 Putting the method together

Step back and see the shape of what you have learned, because it is a complete and self-contained procedure — the answer to how to find eigenvalues of any matrix small enough to handle by hand:

1. Form $A - \lambda I$ by subtracting $\lambda$ from each diagonal entry.
2. Compute the characteristic polynomial $p_A(\lambda) = \det(A - \lambda I)$ (expand along the row or column with the most zeros).
3. Solve $p_A(\lambda) = 0$ for the eigenvalues, which are the roots. Use the trace/determinant identities of §24.8 as an instant check.
4. For each eigenvalue $\lambda$, find the eigenvectors by computing the null space of $A - \lambda I$ (row-reduce, read off free variables). The dimension of that null space is the geometric multiplicity.
5. Compare geometric multiplicity to algebraic multiplicity for any repeated eigenvalue, to learn whether the matrix is defective.

Every step rests on machinery you already had: the determinant and singularity from Chapter 11, the null space from Chapter 13, row reduction from Chapter 4. The characteristic polynomial did not introduce a new computational primitive; it organized the ones you owned into a procedure for extracting the invariant directions of Chapter 23.
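The five steps can be mirrored in numpy for the chapter's $3 \times 3$ example. This is a rough numerical sketch, not the by-hand method: in place of cofactor expansion it samples $\det(A - \lambda I)$ at four arbitrarily chosen points and fits the cubic through them, and in place of row reduction it reads the null space from an SVD. The sample points and the `1e-8` tolerance are assumptions made for illustration.

```python
# The five-step recipe, imitated numerically for the 3x3 example.
import numpy as np

A = np.array([[4, 0, 1], [-2, 1, 0], [-2, 0, 1]], dtype=float)
n = A.shape[0]
I = np.eye(n)

# Steps 1-2: p_A(lambda) = det(A - lambda*I). Sample it at n + 1 distinct
# points and fit the unique degree-n polynomial through those values.
samples = np.array([-1.0, 0.0, 0.5, 4.0])          # arbitrary distinct points
values = [np.linalg.det(A - lam * I) for lam in samples]
coeffs = np.polyfit(samples, values, n)            # highest power first

# Step 3: eigenvalues are the roots; compare with the trace and determinant.
eigenvalues = np.sort(np.roots(coeffs).real)
print("coefficients:", np.round(coeffs, 6))
print("eigenvalues: ", np.round(eigenvalues, 6))
print("sum vs trace:", round(eigenvalues.sum(), 6), np.trace(A))
print("prod vs det: ", round(eigenvalues.prod(), 6), round(np.linalg.det(A), 6))

# Steps 4-5: the null space of A - lambda*I; its dimension is the geometric
# multiplicity. The tolerance is loose because the roots are only approximate.
for lam in eigenvalues:
    _, s, Vt = np.linalg.svd(A - lam * I)
    geo = int(np.sum(s < 1e-8))                    # count near-zero singular values
    v = Vt[-1]                                     # unit vector spanning the null space
    print(f"lambda = {lam:.4f}: geometric multiplicity {geo}, eigenvector ~ {np.round(v, 4)}")
```

The eigenvectors printed here are unit-length and may differ in sign from the hand-computed ones; they span the same lines.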

This chapter is also where the book's defining theme, that geometry and algebra are two views of one object, comes most sharply into focus. Look at how many faces a single eigenvalue wears. It is a root of a polynomial (the algebra of §24.2). It is a stretch factor along an invariant line (the geometry of the visualizer, §24.3). It is a value of $\lambda$ that makes $A - \lambda I$ singular, rank-deficient, and equipped with a nontrivial null space (the four-subspaces view of §24.1). It is a contribution to the trace and determinant (the coefficient identities of §24.8). And, for a stochastic matrix, the dominant one marks a long-run equilibrium (§24.11). These are not five different things that happen to coincide; they are five descriptions of the same number, and the characteristic polynomial is the hinge that lets you swing between them. A fluent linear algebraist holds all five in view at once: proving with the polynomial, picturing with the stretch, computing with the null space, checking with the trace.

That fluency is exactly what the rest of Part V will demand and reward.

What have we actually accomplished? For our $3 \times 3$ matrix $A = \begin{bmatrix} 4 & 0 & 1 \\ -2 & 1 & 0 \\ -2 & 0 & 1\end{bmatrix}$, we found that in the eigenvector coordinate system — using $(0,1,0)$, $(-1,2,2)$, $(-1,1,1)$ as the new axes — the transformation is just "scale by $1$, scale by $2$, scale by $3$ along the three axes." All the apparent complexity of the original nine numbers collapses into three independent stretches. That is the promise we made in Chapter 23 about what it means to understand "what a matrix really does," and it is the doorway to the next chapter. When a matrix has a full set of eigenvectors — when no eigen-direction goes missing to a defect — those eigenvectors become the columns of a matrix $P$, the eigenvalues become a diagonal matrix $D$, and $A$ factors as $A = PDP^{-1}$. That factorization, diagonalization, is the subject of Chapter 25, and it turns the hard problem of computing $A^{100}$ — the engine of Markov chains, population models, and the long-run behavior of every linear dynamical system — into the easy problem of computing $D^{100}$. The eigenvalues we learned to find in this chapter are the diagonal of $D$; the eigenvectors are the columns of $P$. We have assembled all the parts. Next we put them together.
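As a preview check, the factorization can be confirmed numerically with the eigenvectors and eigenvalues found in this chapter. The sketch below uses the power $k = 10$ rather than $100$, an arbitrary choice to keep the numbers small; the full treatment belongs to Chapter 25.

```python
# Check A = P D P^{-1} and A^k = P D^k P^{-1} for the chapter's 3x3 example.
import numpy as np

A = np.array([[4, 0, 1], [-2, 1, 0], [-2, 0, 1]], dtype=float)
P = np.array([[0, -1, -1],
              [1,  2,  1],
              [0,  2,  1]], dtype=float)    # columns: (0,1,0), (-1,2,2), (-1,1,1)
D = np.diag([1.0, 2.0, 3.0])                # eigenvalues in the matching order
P_inv = np.linalg.inv(P)

in_eigen_coords = P_inv @ A @ P             # A expressed in eigenvector coordinates
print("P^-1 A P is diagonal:", np.allclose(in_eigen_coords, D))
print("A == P D P^-1:       ", np.allclose(A, P @ D @ P_inv))

k = 10                                      # illustrative power
A_k = np.linalg.matrix_power(A, k)          # repeated matrix multiplication
via_D = P @ np.diag(np.diag(D) ** k) @ P_inv   # only the diagonal entries are powered
print("A^k == P D^k P^-1:   ", np.allclose(A_k, via_D))
```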
