Prerequisites
- chapter-07-matrices-as-functions
- chapter-16-change-of-basis
Learning Objectives
- State the linear transformation definition between abstract vector spaces V and W (additivity and homogeneity) and verify it for concrete maps such as differentiation, the shift, and evaluation.
- Build the matrix of a linear map relative to a chosen basis of the domain and codomain, and explain why that matrix changes with the bases while the transformation does not.
- Define the kernel and image of an abstract linear map and recognize them as the coordinate-free versions of the null space and column space of Chapter 13.
- State and prove the abstract Rank-Nullity theorem, dim ker T + dim im T = dim V, and use it to reason about injectivity, surjectivity, and invertibility.
- Define an isomorphism and explain why every n-dimensional vector space is isomorphic to R^n, which is precisely why coordinates work.
- Represent differentiation on polynomials of degree at most n as a matrix in the monomial basis, verify it against hand differentiation, and show it is nilpotent.
In This Chapter
- 35.1 What is a linear transformation between abstract spaces?
- 35.2 The definition, grounded in differentiation
- 35.3 A field guide to abstract linear maps
- 35.4 What is the matrix of a linear map relative to chosen bases?
- 35.5 Differentiation as a matrix: the anchor worked in full
- 35.6 Kernel and image: the null space and column space, freed from coordinates
- 35.7 Why is the matrix of $T$ basis-dependent? A motivated proof
- 35.8 Rank–Nullity, abstractly — and why every $n$-dimensional space is $\mathbb{R}^n$
- 35.9 Putting it together: operators, and where this goes
- 35.10 Summary and the road ahead
Linear Transformations and Abstract Vector Spaces: The Full Generalization
Learning paths. Math majors — read everything, especially the linear-map axioms in §35.2, the matrix-of-a-map construction in §35.4, and the proof of abstract Rank–Nullity and the isomorphism theorem in §35.8; this chapter completes the arc begun in Chapter 5 and is the abstract payoff of Part II. CS / Data Science — focus on the Geometric Intuition callouts, the differentiation-as-a-matrix worked example in §35.5, the kernel/image computations, and the encoding-map application; the deepest sidebars are optional. Physics / Engineering — focus on the operator viewpoint (differentiation, the shift, evaluation), the matrix of an operator and how it changes with basis, and the connection to operators in quantum mechanics in §35.9. This chapter assumes the matrix-as-function idea of Chapter 7, the change-of-basis machinery of Chapter 16, and the null space / column space of Chapter 13 — all of which now return, freed from $\mathbb{R}^n$.
The previous chapter freed the dot product from $\mathbb{R}^n$: geometry, we found, was never about arrows — it was about an operation obeying three rules, and so it travels to functions, sequences, and quantum states. This chapter performs the same liberation on the central object of the whole book. In Chapter 7 we made the foundational move that organizes everything: a matrix is a function that transforms space. A matrix $A$ acts on a vector $\mathbf{x}$ by $\mathbf{x}\mapsto A\mathbf{x}$; its columns are the images of the standard basis vectors; matrix multiplication is composition of these functions. That picture has carried us through eigenvalues, the SVD, and PCA. But it was stated for maps $T:\mathbb{R}^n\to\mathbb{R}^m$, and Chapter 5 already showed us vector spaces with no obvious coordinates at all — polynomials, functions, matrices. What does a "linear function" mean between those?
That is the question of this chapter, and answering it completes the book's first and deepest theme. We will define a linear transformation $T:V\to W$ between any two abstract vector spaces, with no coordinates assumed. We will discover that the moment we choose a basis for $V$ and a basis for $W$, the abstract map collapses into an ordinary matrix — the matrix of $T$ relative to those bases — and that this matrix is exactly the Chapter 16 story: change the bases and the matrix changes, but the transformation underneath is the same object the whole time. The kernel and image of $T$ will turn out to be precisely the null space and column space of Chapter 13, now living in abstract spaces; the rank–nullity theorem you proved for matrices in Chapter 14 will reappear, proved abstractly, as $\dim\ker T+\dim\operatorname{im}T=\dim V$; and we will see why every $n$-dimensional vector space is, secretly, just $\mathbb{R}^n$ in disguise — the theorem that finally explains why coordinates are allowed to work at all.
True to the warning that opened Part VII, and obeying the book's standing rule never to introduce an abstraction in a vacuum, we will not state a single abstract definition without first grounding it in a concrete map you can compute by hand. Our anchor — returned to in nearly every section — is differentiation as a linear transformation on polynomials. Taking a derivative is a linear operation: the derivative of a sum is the sum of the derivatives, and constants pull out. So $\frac{d}{dx}$ is a linear map on the space of polynomials, and once we pick the monomial basis $1,x,x^2,\dots$ it becomes an honest matrix $D$ that we can multiply, whose kernel is the constants, whose image is the lower-degree polynomials, and which — delightfully — is nilpotent. The calculus you already know turns out to be linear algebra wearing different clothes. By the end you will see the derivative as an operator and operators in quantum mechanics as the same idea this chapter makes precise: a linear transformation on an abstract vector space, represented — only when convenient — by a matrix.
35.1 What is a linear transformation between abstract spaces?
Begin, as always, with the picture before the symbols. In Chapter 7 the picture was vivid: a $2\times 2$ matrix grabs the plane and stretches, rotates, shears, or flattens it, sending the unit square to a parallelogram, dragging every vector to a new home — but doing so coherently, so that the grid stays a grid. Lines stay lines, the origin stays put, evenly spaced points stay evenly spaced. That coherence — "the transformation respects the linear structure" — is the whole content of linearity, and it is exactly the part that does not depend on having coordinates. A space of polynomials has no visible grid, but it still has addition and scalar multiplication, and a map can still respect them.
Geometric Intuition — A linear transformation is a map between vector spaces that preserves the structure of the space: it sends grids to grids. In $\mathbb{R}^2$ you watched the visualizer (Chapter 1) do this literally — straight evenly-spaced lines mapped to straight evenly-spaced lines, the origin fixed. In an abstract space the "grid" is invisible, but the principle is identical: a linear map must send the sum of two vectors to the sum of their images, and a scaled vector to the scaled image. If you know what $T$ does to a few building-block vectors (a basis), linearity forces what it does to everything — the map is rigidly determined by its action on a basis, just as a matrix is determined by its columns.
Here is the motivating question that organizes the chapter: which property of a matrix actually made it a "linear function," and does that property need coordinates? Go back to Chapter 7. The defining facts about $\mathbf{x}\mapsto A\mathbf{x}$ were two, and only two: $A(\mathbf{x}+\mathbf{y})=A\mathbf{x}+A\mathbf{y}$ and $A(c\mathbf{x})=cA\mathbf{x}$. Every other property of matrices — the column picture, composition, the four subspaces — was derived from those two. And neither one mentions components. Each speaks only of adding vectors and scaling them, operations that every vector space has by the Chapter 5 axioms. So the definition of "linear" lifts, untouched, to abstract spaces. We are about to give it a name and turn it loose on differentiation.
The Key Insight — A matrix was linear because it satisfied two rules — it commutes with vector addition and with scalar multiplication — and those rules never mentioned coordinates. So "linear transformation" is meaningful between any two vector spaces, with no matrix in sight. The matrix was one way to represent a linear map after choosing coordinates; the linearity was always the real thing. This is the same move Chapter 34 made for the inner product, and Chapter 5 for the vector-space axioms: keep the rules, drop the components, and the idea travels everywhere.
This is recurring theme #1 of the book, finally stated in full. Linear algebra is the study of linear transformations; matrices are merely how we represent them in a coordinate system. Until now we could only half-say it, because our transformations lived in $\mathbb{R}^n$ where coordinates come for free and it is tempting to confuse the matrix with the map. Strip away the coordinates and the confusion becomes impossible: a linear transformation between abstract spaces simply has no matrix until you choose bases, the way a physical rotation has no $3\times 3$ matrix until you nail down axes. The map is the noun; the matrix is one of its many photographs.
It is worth pausing on why this reframing is more than philosophical tidiness. When the map and its matrix are conflated, every property you discover risks being an accident of the coordinate system rather than a fact about the transformation — and you have no way to tell which is which. Separating them gives you a litmus test: a quantity is intrinsic to the map exactly when it survives every change of basis. Determinant, trace, rank, and the eigenvalues pass that test (we prove it in §35.7); a particular matrix entry, or whether the matrix happens to be upper-triangular, does not. The whole of Parts V and VI can be re-read as the search for the basis in which a fixed operator's matrix is simplest — diagonal, if we are lucky — and that search only makes sense once you have firmly distinguished the operator you are studying from the matrix you are choosing to write it in. This chapter draws that distinction once and for all.
FAQ: Why bother with abstract spaces when every example reduces to $\mathbb{R}^n$ anyway?
Because the reduction is exactly the theorem we are after, and seeing why it holds is more valuable than taking it for granted — and because the abstract objects arise on their own terms before any coordinates do. A derivative is a linear operation that exists whether or not you have written polynomials as coefficient lists; a quantum observable is a linear operator on a state space defined by physics, not by a chosen basis; a rotation of physical space is a linear map before you pick axes. Treating these as abstract linear maps, and only then choosing a basis to compute, keeps the geometry honest: you never mistake an artifact of your coordinate choice for a property of the map. And the punchline of §35.8 — that every $n$-dimensional space is isomorphic to $\mathbb{R}^n$ — is not a reason to skip the abstraction; it is the deepest payoff of taking it seriously, the precise statement of why coordinates are allowed to work.
35.2 The definition, grounded in differentiation
Now the rule, grounded first in the anchor exactly as promised. Let $V$ and $W$ be vector spaces over the same field of scalars (think $\mathbb{R}$ throughout, $\mathbb{C}$ where noted). A linear transformation (or linear map) $T:V\to W$ is a function from $V$ to $W$ satisfying two axioms:
Axiom 1 — Additivity. For all $\mathbf{u},\mathbf{v}\in V$, $$ T(\mathbf{u}+\mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}). $$
Axiom 2 — Homogeneity. For all $\mathbf{v}\in V$ and every scalar $c$, $$ T(c\,\mathbf{v}) = c\,T(\mathbf{v}). $$
That is the entire definition — the two facts from Chapter 7, with the matrix removed. The two axioms are often compressed into one: $T$ is linear iff $T(c\mathbf{u}+d\mathbf{v})=cT(\mathbf{u})+dT(\mathbf{v})$ for all vectors $\mathbf{u},\mathbf{v}$ and scalars $c,d$. Notice the addition on the left of Axiom 1 happens in $V$ while the addition on the right happens in $W$ — the map transports the structure from one space to the other. When $W=V$ we call $T$ a linear operator on $V$; differentiation, which sends polynomials to polynomials, is the prototype.
Before generalizing further, the anchor. Let $\mathbb{P}$ be the vector space of all polynomials (Chapter 5 established that this is a vector space), and define $T=\frac{d}{dx}$, the differentiation map $T(p)=p'$. Is it linear? The two rules you learned in your first calculus course are exactly the two axioms: the derivative of a sum is the sum of the derivatives, $(p+q)'=p'+q'$ (Axiom 1), and a constant pulls out, $(c\,p)'=c\,p'$ (Axiom 2). You have been using the linearity of differentiation since before you met a matrix. The "sum rule" and "constant-multiple rule" of calculus are not two unrelated facts to memorize; together they are the statement that $\frac{d}{dx}$ is a linear transformation.
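As a quick numerical companion to the paragraph above, here is a minimal sketch that spot-checks both axioms for differentiation on coefficient vectors. It assumes polynomials are stored lowest degree first and uses `numpy.polynomial.polynomial.polyder`; the particular polynomials `p`, `q` and the scalar `c` are arbitrary illustrative choices, not part of the text.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Polynomials as coefficient arrays, lowest degree first: [a0, a1, a2, a3].
p = np.array([2.0, 3.0, 5.0, -1.0])   # 2 + 3x + 5x^2 - x^3
q = np.array([1.0, 0.0, -4.0, 2.0])   # 1 - 4x^2 + 2x^3  (arbitrary example)
c = 7.0                               # arbitrary scalar

# Axiom 1 (additivity): (p + q)' == p' + q'
additive = np.allclose(P.polyder(p + q), P.polyder(p) + P.polyder(q))

# Axiom 2 (homogeneity): (c p)' == c p'
homogeneous = np.allclose(P.polyder(c * p), c * P.polyder(p))

print(additive, homogeneous)          # True True
```

One pair of polynomials proves nothing in general, of course; the proof is the sum rule and constant-multiple rule themselves. The check is only a sanity test that the coefficient-vector bookkeeping agrees with them.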
The Key Insight — The two rules that make differentiation easy — the derivative of a sum is the sum of derivatives, and constants pull out — are precisely the two axioms of a linear transformation. Calculus's $\frac{d}{dx}$ is a linear operator on the space of differentiable functions, and recognizing that is the doorway to representing it as a matrix and analyzing it with every tool from Parts II–III. The product rule and chain rule, by contrast, are not linearity statements (they concern products and compositions of functions, which are not vector-space operations), which is exactly why differentiation-as-an-operator captures the linear part of calculus and leaves the nonlinear part aside.
Two consequences fall straight out of the axioms and hold for every linear map, abstract or not. First, $T$ sends the zero vector to the zero vector: $T(\mathbf{0}) = T(0\cdot\mathbf{0}) = 0\cdot T(\mathbf{0}) = \mathbf{0}$ by homogeneity. (This is why the visualizer always pins the origin in place — a linear map cannot move it.) Second, $T$ preserves linear combinations of any length: $T(c_1\mathbf{v}_1+\cdots+c_k\mathbf{v}_k)=c_1T(\mathbf{v}_1)+\cdots+c_kT(\mathbf{v}_k)$, by induction on the two axioms. This second fact is the one that makes matrices possible at all, and we exploit it in §35.4: a linear map is completely determined by what it does to a basis.
Common Pitfall — Many students think any "formula-defined" function is linear, but most are not, and the test is the two axioms — not whether the formula "looks linear." The map $T(x)=x+3$ on $\mathbb{R}$ fails additivity: $T(1+1)=5$ but $T(1)+T(1)=8$ (it also moves $0$ to $3$, an instant disqualification). The map $T(p)=p^2$ on polynomials fails both axioms, since $(p+q)^2\neq p^2+q^2$. Even $T(\mathbf{v})=\lVert\mathbf{v}\rVert$ is not linear (the norm of a sum is not the sum of the norms — that was the triangle inequality of Chapter 18). A function is linear only if it commutes with addition and scaling; "affine" maps like $\mathbf{x}\mapsto A\mathbf{x}+\mathbf{b}$ with $\mathbf{b}\neq\mathbf{0}$ are the classic near-miss. Always check $T(\mathbf{0})=\mathbf{0}$ first; if it fails, the map is not linear and you are done.
FAQ: How do I check whether a given map is linear?
Run the two axioms, and as a fast pre-screen, check that $T(\mathbf{0})=\mathbf{0}$. If $T(\mathbf{0})\neq\mathbf{0}$, stop — the map is not linear. Otherwise, verify additivity and homogeneity directly from the formula: pick general $\mathbf{u},\mathbf{v}$ and a general scalar $c$, compute $T(\mathbf{u}+\mathbf{v})$ and $T(c\mathbf{u})$, and confirm they equal $T(\mathbf{u})+T(\mathbf{v})$ and $cT(\mathbf{u})$. For differentiation this is the sum and constant-multiple rules; for the integral $\int_a^b$, the analogous linearity rules of integration; for the evaluation map $p\mapsto p(0)$, the fact that $(p+q)(0)=p(0)+q(0)$ and $(cp)(0)=c\,p(0)$. The cleanest disproof is a single numerical counterexample to either axiom — one violation suffices, as with $T(x)=x+3$ above.
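The checklist in this answer can be sketched as a small randomized tester. This is a hedged illustration, not a proof technique: the helper name `passes_linearity_spot_check` is hypothetical, a `True` result only means no violation was found among the random samples, and a `False` result is a genuine counterexample.

```python
import numpy as np

def passes_linearity_spot_check(T, dim, trials=100, seed=0):
    """Return False as soon as a sample violates T(0)=0, additivity, or homogeneity.

    True means "no violation found", which is evidence, not a proof.
    """
    rng = np.random.default_rng(seed)
    # Fast pre-screen: a linear map must send the zero vector to zero.
    if not np.allclose(T(np.zeros(dim)), 0.0):
        return False
    for _ in range(trials):
        u, v = rng.normal(size=dim), rng.normal(size=dim)
        c = rng.normal()
        if not np.allclose(T(u + v), T(u) + T(v)):   # additivity
            return False
        if not np.allclose(T(c * u), c * T(u)):      # homogeneity
            return False
    return True

affine = lambda x: x + 3.0               # T(x) = x + 3: fails the pre-screen
norm = lambda v: np.linalg.norm(v)       # passes the pre-screen, fails additivity
ev0 = lambda coeffs: coeffs[0]           # p -> p(0) on coefficient vectors [a0, a1, ...]

print(passes_linearity_spot_check(affine, dim=1))   # False
print(passes_linearity_spot_check(norm, dim=3))     # False
print(passes_linearity_spot_check(ev0, dim=4))      # True
```

The norm example is worth noting: it survives the $T(\mathbf{0})=\mathbf{0}$ pre-screen, so the pre-screen can only rule maps out, never in.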
35.3 A field guide to abstract linear maps
A definition lands only when you have several concrete instances to attach it to, so before building any matrices let us collect a small zoo of abstract linear maps, examples you can return to as the chapter grows more abstract. Each is a genuine linear transformation between spaces that may look nothing like $\mathbb{R}^n$, and each will reappear later as a matrix. The point of meeting them first, in their native habitat, is that the abstraction is never empty: here is the furniture it describes.
The differentiation operator $D:\mathbb{P}_n\to\mathbb{P}_{n-1}$. On the space $\mathbb{P}_n$ of polynomials of degree at most $n$, $D(p)=p'$ lowers degree by one, so it maps into $\mathbb{P}_{n-1}$. Linear, as we just established. This is our anchor, and §35.5 builds its matrix.
The integration (antiderivative) operator $S:\mathbb{P}_{n}\to\mathbb{P}_{n+1}$, $S(p)(x)=\int_0^x p(t)\,dt$. Integration from a fixed lower limit is linear (the integral of a sum is the sum of the integrals; constants pull out), and it raises degree by one. It is, in a sense we make precise in §35.6, the near-inverse of $D$ — the Fundamental Theorem of Calculus as a statement about two linear maps.
The shift operator $L$ on sequences (or on $\mathbb{R}^n$), $L(a_0,a_1,a_2,\dots)=(a_1,a_2,a_3,\dots)$, which drops the first entry and shifts everything left. Linear, and ubiquitous in signal processing, time-series analysis, and the theory of difference equations. We will see in §35.5 that, in the right basis, differentiation and the shift are the very same matrix — a small marvel.
The evaluation map $\operatorname{ev}_a:\mathbb{P}_n\to\mathbb{R}$, $\operatorname{ev}_a(p)=p(a)$, which plugs in a fixed number $a$ and returns the value. This maps an $(n+1)$-dimensional space to the $1$-dimensional space $\mathbb{R}$; it is linear because $(p+q)(a)=p(a)+q(a)$ and $(cp)(a)=c\,p(a)$. More generally, evaluating at several points at once, $p\mapsto(p(a_1),\dots,p(a_m))$, is a linear map $\mathbb{P}_n\to\mathbb{R}^m$ — the backbone of polynomial interpolation and of the encoding map in Case Study 2.
The transpose $T(A)=A^{\mathsf{T}}$ on the space $M_{n\times n}$ of square matrices. The space of matrices is itself a vector space (Chapter 5), and transposition is a linear operator on it: $(A+B)^{\mathsf{T}}=A^{\mathsf{T}}+B^{\mathsf{T}}$ and $(cA)^{\mathsf{T}}=cA^{\mathsf{T}}$. Here both the "vectors" and the operator are made of matrices — a useful jolt to anyone still equating "vector" with "arrow."
Geometric Intuition — Even without coordinates, each of these maps has a shape you can feel. Differentiation deflates: it pushes every polynomial down a degree, and it utterly destroys constants (their derivative is zero). Integration inflates: it lifts every polynomial up a degree and is one-to-one — no information is lost going up. The shift truncates: it discards the leading datum and slides the rest. Evaluation projects: it crushes a whole space of polynomials down onto a single number, keeping only "the value at $a$" and forgetting everything else. These verbs — deflate, inflate, truncate, project — are the abstract analogues of stretch, shear, and flatten from the visualizer. The map has a character; the matrix will merely record it in numbers.
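The "deflate" and "inflate" verbs can be made tangible with a small sketch. It assumes the monomial basis and coefficient vectors stored lowest degree first, and it takes $D:\mathbb{P}_3\to\mathbb{P}_2$ and $S:\mathbb{P}_2\to\mathbb{P}_3$ as one illustrative choice of sizes. Differentiating after integrating returns every polynomial unchanged, while integrating after differentiating loses the constant term.

```python
import numpy as np

n = 3
D = np.zeros((n, n + 1))          # D: P_3 -> P_2, a 3 x 4 matrix
S = np.zeros((n + 1, n))          # S: P_2 -> P_3, a 4 x 3 matrix
for j in range(1, n + 1):
    D[j - 1, j] = j               # d/dx x^j = j x^(j-1)
for j in range(n):
    S[j + 1, j] = 1.0 / (j + 1)   # integral from 0 to x of t^j = x^(j+1) / (j+1)

DS = D @ S                        # differentiate after integrating
SD = S @ D                        # integrate after differentiating

print(np.allclose(DS, np.eye(n)))               # True: D S is the identity on P_2
print(np.allclose(SD, np.diag([0, 1, 1, 1])))   # True: S D erases the constant term
```

So $S$ is a right inverse of $D$ but not a left inverse: the information destroyed by $D$ (the constant) cannot be recovered by integrating from $0$.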
Real-World Application — discrete-time systems and the shift operator (signals / control engineering). The shift operator is not a toy. A digital filter, a discrete control system, and an autoregressive time-series model are all built from the shift: the output at time $n$ is a linear combination of shifted copies of the input, $y_n=\sum_k b_k\,x_{n-k}$. Each term $x_{n-k}$ is the input delayed by $k$ steps, that is, the shift run in the opposite direction to $L$ and applied $k$ times; writing $R$ for that one-step delay, a linear filter is the polynomial $\sum_k b_k R^k$ in the operator $R$ — exactly the algebra of polynomials, now with a shift in the role of $x$. The reason the $z$-transform turns convolution into multiplication, and the reason filter stability reduces to where certain roots lie, is that this whole apparatus is linear-operator theory on a sequence space. We will see the same operator-as-polynomial idea power the matrix exponential in Chapter 37.
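The filter formula $y_n=\sum_k b_k\,x_{n-k}$ can be sketched on finite signals as a polynomial in a shift matrix. Two assumptions are made here and are not from the text: signals are truncated to length `N` with samples before time $0$ taken as zero, and the taps `b` are made-up numbers. Because $x_{n-k}$ is a delayed sample, the matrix used is the one-step delay `R`, whose transpose is the truncated left shift $L$ of §35.3.

```python
import numpy as np

N = 6
R = np.eye(N, k=-1)               # one-step delay: (R x)_n = x_{n-1}, with x_{-1} = 0
b = np.array([0.5, 0.3, 0.2])     # hypothetical filter taps b_0, b_1, b_2

# The filter as a polynomial in the delay operator: H = b_0 I + b_1 R + b_2 R^2.
H = sum(bk * np.linalg.matrix_power(R, k) for k, bk in enumerate(b))

x = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 1.0])   # arbitrary input signal
y = H @ x                                       # y_n = sum_k b_k x_{n-k}

# The same output from direct convolution, truncated to the first N samples.
print(np.allclose(y, np.convolve(x, b)[:N]))    # True
# The delay and the left shift L are transposes of one another.
print(np.array_equal(R.T, np.eye(N, k=1)))      # True
```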
To make one of these abstract maps fully concrete before we generalize, take the transpose operator $T(A)=A^{\mathsf{T}}$ on the four-dimensional space $M_{2\times 2}$ of $2\times 2$ matrices, and watch it become an ordinary matrix. Use the basis $E_{11},E_{12},E_{21},E_{22}$ (the matrices with a single $1$ in each position), so that a matrix $\left[\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right]$ has coordinate vector $(a,b,c,d)$. Transposition swaps the off-diagonal entries, $b\leftrightarrow c$, and fixes $a$ and $d$, so in coordinates it sends $(a,b,c,d)\mapsto(a,c,b,d)$ — and its matrix is the permutation that swaps the middle two coordinates, $$ [T]=\begin{bmatrix}1&0&0&0\\0&0&1&0\\0&1&0&0\\0&0&0&1\end{bmatrix}. $$ This little matrix tells the whole story of transposition. Its kernel is $\{\mathbf{0}\}$ (the only matrix equal to the zero matrix after transposing is the zero matrix), so transposition is injective; its image is all of $M_{2\times 2}$ (every matrix is the transpose of its own transpose), so it is surjective — transposition is an isomorphism of $M_{2\times 2}$ with itself, in fact one that squares to the identity ($T^2=I$, since $(A^{\mathsf{T}})^{\mathsf{T}}=A$). Its eigenvalues are $+1$ (the symmetric matrices, fixed by transposition) and $-1$ (the antisymmetric matrices, negated) — the symmetric/antisymmetric splitting of Chapter 8, now read off an abstract operator's matrix. A four-dimensional space of matrices, an operator built from matrices, collapsed to a $4\times 4$ permutation: this is §35.4's construction in miniature, and it should make the upcoming general recipe feel inevitable.
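The construction just described can be replayed mechanically. The sketch below assumes the coordinate convention of the paragraph, $(a,b,c,d)$ for $\left[\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right]$, which coincides with row-major flattening in numpy; the test matrix `A` is an arbitrary example.

```python
import numpy as np

# Basis E11, E12, E21, E22: each row of the 4x4 identity, reshaped to 2x2.
basis = [row.reshape(2, 2) for row in np.eye(4)]

# Column j of [T] is the coordinate vector of the transpose of the j-th basis matrix.
T = np.column_stack([E.T.reshape(4) for E in basis])
print(T.astype(int))
# [[1 0 0 0]
#  [0 0 1 0]
#  [0 1 0 0]
#  [0 0 0 1]]

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])                               # arbitrary test matrix
print(np.array_equal(T @ A.reshape(4), A.T.reshape(4)))  # True: [T] acts as transposition
print(np.allclose(T @ T, np.eye(4)))                     # True: T^2 = I
print(np.linalg.eigvalsh(T))                             # eigenvalues -1, 1, 1, 1
```

The multiplicities match the dimensions of the two pieces: a three-dimensional space of symmetric $2\times 2$ matrices (eigenvalue $+1$) and a one-dimensional space of antisymmetric ones (eigenvalue $-1$).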
FAQ: Do the domain and codomain of a linear map have to be the same space?
No — and most of the interesting maps go between different spaces. A linear transformation $T:V\to W$ allows $V$ and $W$ to be entirely different vector spaces, possibly of different dimensions: differentiation lowers degree ($\mathbb{P}_n\to\mathbb{P}_{n-1}$), integration raises it ($\mathbb{P}_n\to\mathbb{P}_{n+1}$), and evaluation crushes a polynomial space down to $\mathbb{R}^m$. When $W=V$ we give the map a special name, linear operator, because then we can iterate it ($T^2$, $T^3$) and ask about eigenvalues and invariant subspaces — questions that only make sense when outputs can be fed back in as inputs. The matrix of a general map $T:V\to W$ is $m\times n$ (rectangular, with $m=\dim W$, $n=\dim V$); the matrix of an operator is square. Keeping the two spaces distinct is exactly what keeps the kernel (in $V$) and the image (in $W$) from being confused, as the next sections insist.
35.4 What is the matrix of a linear map relative to chosen bases?
Now the central construction of the chapter, and the precise sense in which "a matrix is a linear map in coordinates." Suppose $V$ is $n$-dimensional with a chosen ordered basis $B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\}$, and $W$ is $m$-dimensional with a chosen ordered basis $C=\{\mathbf{c}_1,\dots,\mathbf{c}_m\}$. We will manufacture, from any linear map $T:V\to W$, an $m\times n$ matrix that does the same thing once vectors are written in coordinates. The recipe is a direct generalization of Chapter 7's "columns are the images of the basis vectors."
The idea rests entirely on the fact from §35.2 that a linear map is determined by its action on a basis. Take any $\mathbf{v}\in V$ and write it in the basis $B$: $\mathbf{v}=x_1\mathbf{b}_1+\cdots+x_n\mathbf{b}_n$. The list $[\mathbf{v}]_B=(x_1,\dots,x_n)$ is its coordinate vector — the Chapter 15 idea. By linearity, $$ T(\mathbf{v}) = T\!\big(x_1\mathbf{b}_1+\cdots+x_n\mathbf{b}_n\big) = x_1\,T(\mathbf{b}_1)+\cdots+x_n\,T(\mathbf{b}_n). $$ So to know $T(\mathbf{v})$ for every $\mathbf{v}$, it suffices to know the $n$ images $T(\mathbf{b}_1),\dots,T(\mathbf{b}_n)$. Each image lives in $W$, so write it in the basis $C$: let $[T(\mathbf{b}_j)]_C$ be the coordinate column of the $j$-th image. Stack these $n$ columns side by side. That $m\times n$ array is the matrix of $T$ relative to the bases $B$ and $C$, written $[T]_{C\leftarrow B}$ (read: "the matrix of $T$ from $B$-coordinates to $C$-coordinates").
The Key Insight — The matrix of a linear map is built exactly the way Chapter 7 built it: its $j$-th column is the image of the $j$-th basis vector, written in the codomain's basis. Once you have it, applying the abstract map becomes ordinary matrix–vector multiplication on coordinate vectors: $[T(\mathbf{v})]_C = [T]_{C\leftarrow B}\,[\mathbf{v}]_B$. The abstract map upstairs is mirrored, perfectly, by a matrix downstairs in coordinates. Choosing bases is choosing the coordinate systems in which to photograph the map.
The defining equation deserves to be stated cleanly, because it is the hinge of the entire chapter. For every $\mathbf{v}\in V$, $$ \boxed{\;[T(\mathbf{v})]_C \;=\; [T]_{C\leftarrow B}\;[\mathbf{v}]_B\;} $$ The left side: take $\mathbf{v}$, apply the abstract map $T$, then read off coordinates in $C$. The right side: take the coordinates of $\mathbf{v}$ in $B$, then multiply by a fixed matrix. They agree for every input. This is the rigorous content of "a matrix represents a linear transformation," and it is why everything we proved about matrices in Parts II and III will transfer: in coordinates, an abstract linear map is a matrix.
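Here is one way the recipe and the boxed equation might be sketched in code, using the two-point evaluation map $p\mapsto(p(a_1),p(a_2))$ from §35.3 as the test case. The helper name `matrix_of_map`, the evaluation points, and the test polynomial are all illustrative assumptions; polynomials are coefficient arrays stored lowest degree first.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def matrix_of_map(T, basis_B, coords_C):
    """Column j is the C-coordinate vector of T(b_j)."""
    return np.column_stack([coords_C(T(b)) for b in basis_B])

# Domain P_2 with the monomial basis {1, x, x^2}, as coefficient arrays.
basis_B = [np.array([1.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0]),
           np.array([0.0, 0.0, 1.0])]

points = np.array([-1.0, 2.0])         # hypothetical evaluation points a_1, a_2
T = lambda p: P.polyval(points, p)     # p -> (p(-1), p(2)), a map P_2 -> R^2
coords_C = lambda w: w                 # codomain R^2 with its standard basis

M = matrix_of_map(T, basis_B, coords_C)
print(M)
# [[ 1. -1.  1.]
#  [ 1.  2.  4.]]

v = np.array([2.0, 3.0, 5.0])          # 2 + 3x + 5x^2
print(T(v), M @ v)                     # both sides of the boxed equation: [ 4. 28.] twice
```

The matrix that falls out is a Vandermonde matrix (rows $1, a, a^2$), which is why multi-point evaluation underlies polynomial interpolation.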
Common Pitfall — The matrix of $T$ is not a property of $T$ alone — it depends on both chosen bases, and changing either one changes the matrix. Students often write "the matrix of differentiation is $D$" as if $D$ were intrinsic; it is the matrix of differentiation in the monomial basis, and §35.5 shows a different basis gives a different (cleaner!) matrix for the very same operator. Always ask "in which bases?" before writing down a matrix, the way you would ask "in which units?" before writing down a measurement. The map is the invariant; the matrix is basis-dependent.
How the matrix changes when you change the basis
This is the Chapter 16 story, now told for abstract maps, and it is so important we will prove it in §35.7. Suppose we keep the same operator $T:V\to V$ but switch from basis $B$ to a new basis $\tilde{B}$. Coordinates change by an invertible change-of-basis matrix $P$ (whose columns express the new basis vectors in the old coordinates, exactly as in Chapter 16): $[\mathbf{v}]_B = P[\mathbf{v}]_{\tilde B}$. Pushing this through the boxed equation shows that the matrix of $T$ transforms by $$ [T]_{\tilde B} = P^{-1}\,[T]_B\,P, $$ the similarity transformation you first met in Chapter 16 and used for diagonalization in Chapter 25. So the many matrices of a single operator — one for each basis — are all similar to one another. They share a determinant, a trace, a rank, a characteristic polynomial, and a set of eigenvalues, because those are properties of the operator, not of the coordinate system. The matrix changes; the invariants beneath it do not. That is recurring theme #1 made quantitative.
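A small numerical sketch of the similarity formula follows. The operator is an illustrative choice that does not appear in the text, $T(p)=x\,p'(x)$ on $\mathbb{P}_2$, picked because its monomial-basis matrix is simply $\operatorname{diag}(0,1,2)$; the new basis $\{1,\;1+x,\;1+x+x^2\}$ is likewise hypothetical.

```python
import numpy as np

# [T]_B for T(p) = x p'(x) in the monomial basis {1, x, x^2}: T(x^k) = k x^k.
A = np.diag([0.0, 1.0, 2.0])

# Columns of P express the new basis {1, 1+x, 1+x+x^2} in the old coordinates.
P = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

A_new = np.linalg.solve(P, A @ P)      # P^{-1} A P without forming the inverse
print(A_new)                           # a different-looking matrix for the same operator

def invariants(M):
    """Trace, determinant, rank, and sorted eigenvalues of a square matrix."""
    return (np.trace(M), np.linalg.det(M),
            np.linalg.matrix_rank(M), np.sort(np.linalg.eigvals(M).real))

for old, new in zip(invariants(A), invariants(A_new)):
    print(np.allclose(old, new))       # True four times

# Consistency with [v]_B = P [v]_B~: apply T in either coordinate system.
y = np.array([1.0, 2.0, 3.0])          # coordinates of some v in the new basis
print(np.allclose(P @ (A_new @ y), A @ (P @ y)))   # True
```

The entries of `A_new` differ from those of `A`, yet the trace, determinant, rank, and eigenvalues agree, which is the point of the paragraph above.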
Check Your Understanding — Let $T:\mathbb{P}_2\to\mathbb{P}_2$ be the operator $T(p)(x)=p(x+1)$ (shift the input by one). Using the monomial basis $\{1,x,x^2\}$, find the matrix $[T]_B$ by computing $T$ on each basis vector.
Answer
Compute the images: $T(1)=1$, $T(x)=x+1$, and $T(x^2)=(x+1)^2=x^2+2x+1$. In coordinates over $\{1,x,x^2\}$ these are $(1,0,0)$, $(1,1,0)$, $(1,2,1)$, so $[T]_B=\left[\begin{smallmatrix}1&1&1\\0&1&2\\0&0&1\end{smallmatrix}\right]$ (the columns are the images of $1,x,x^2$). Notice this matrix is upper-triangular with $1$s on the diagonal — its only eigenvalue is $1$, and indeed the shift fixes the constant polynomial $1$. You have just built the matrix of an operator straight from the recipe of §35.4, and its columns are literally Pascal's-triangle binomial coefficients: $(x+1)^k$ expanded. This is the same map as "evaluate the Taylor series of $p$ at the point shifted by $1$," foreshadowing how $e^{D}$ becomes the unit shift in Chapter 37.
Real-World Application — coordinate choice as a modeling lever (data science / numerical computing). The freedom to choose the basis is not academic bookkeeping; it is one of the most powerful levers in applied linear algebra. Diagonalization (Chapter 25) chooses the eigenbasis, in which an operator becomes a diagonal matrix and its repeated application is trivial — the engine behind PageRank's power iteration and behind solving systems of ODEs in Chapter 37. PCA (Chapter 32) chooses the basis of principal directions, in which a covariance operator is diagonal and the data's structure is laid bare. In every case the operator is fixed and we are hunting for the basis that makes its matrix simplest. The whole game of matrix decompositions is "find good coordinates," and this chapter is why that game is even legal: the same map wears many matrices, and we get to pick.
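The claim that repeated application becomes trivial in the eigenbasis can be sketched in a few lines. The symmetric matrix `A` below is a made-up example; symmetry is assumed only so that `numpy.linalg.eigh` returns an orthonormal eigenbasis and the inverse of the change-of-basis matrix is its transpose.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])             # hypothetical symmetric operator on R^2

evals, Q = np.linalg.eigh(A)           # columns of Q: an orthonormal eigenbasis
k = 10

# In the eigenbasis the operator is diag(evals), so its k-th power is entrywise.
A_k_eigen = Q @ np.diag(evals ** k) @ Q.T

# The same power computed by repeated multiplication in the original coordinates.
A_k_direct = np.linalg.matrix_power(A, k)

print(evals)                                   # eigenvalues 1 and 3
print(np.allclose(A_k_eigen, A_k_direct))      # True
```

Nothing about the operator changed between the two computations; only the coordinates in which the tenth power was taken did.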
35.5 Differentiation as a matrix: the anchor worked in full
Time to cash in the anchor and watch the abstraction become a concrete, multipliable matrix. Work in $\mathbb{P}_3$, the polynomials of degree at most $3$ — a four-dimensional vector space — with the monomial basis $B=\{1,\,x,\,x^2,\,x^3\}$. The differentiation operator $D=\frac{d}{dx}$ is linear (§35.2) and lowers degree, so $D:\mathbb{P}_3\to\mathbb{P}_3$ (its image sits inside $\mathbb{P}_2$). Build its matrix by the recipe of §35.4: differentiate each basis vector and read off coordinates in $B$.
$$ D(1)=0,\qquad D(x)=1,\qquad D(x^2)=2x,\qquad D(x^3)=3x^2. $$ In coordinates over $B=\{1,x,x^2,x^3\}$ these images are $(0,0,0,0)$, $(1,0,0,0)$, $(0,2,0,0)$, $(0,0,3,0)$. Stack them as columns: $$ [D]_B \;=\; \begin{bmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 3\\ 0 & 0 & 0 & 0 \end{bmatrix}. $$ Read the matrix and you can see differentiation in it: the entries $1,2,3$ marching up the superdiagonal are precisely the exponents that come down when you differentiate $x^1,x^2,x^3$, and the all-zero first column is the fact that $D(1)=0$. The matrix is a faithful portrait of the operator.
Now test the boxed equation on a real polynomial. Take $p(x)=2+3x+5x^2-x^3$, whose coordinate vector is $[\mathbf p]_B=(2,3,5,-1)$. By hand, $p'(x)=3+10x-3x^2$, with coordinates $(3,10,-3,0)$. By matrix, $$ [D]_B\,[\mathbf p]_B= \begin{bmatrix}0&1&0&0\\0&0&2&0\\0&0&0&3\\0&0&0&0\end{bmatrix} \begin{bmatrix}2\\3\\5\\-1\end{bmatrix} =\begin{bmatrix}3\\10\\-3\\0\end{bmatrix}. $$ The matrix product reproduces the hand derivative exactly: $3+10x-3x^2$. Differentiation, the operation at the heart of calculus, is literally this matrix multiplication once polynomials are written as coefficient vectors. (As always, mathematics indexes from $1$ — the entry $a_{12}=1$ — while numpy will index from $0$; we flag it where it bites.)
numpy verification: build $D$, differentiate, and show it is nilpotent
# Differentiation as a matrix on P_3 (degree <= 3), monomial basis {1, x, x^2, x^3}.
import numpy as np

n = 3
D = np.zeros((n + 1, n + 1))
for j in range(1, n + 1):
    D[j - 1, j] = j               # d/dx(x^j) = j x^(j-1): put exponent j in row j-1, col j
print(D.astype(int))              # the differentiation matrix
# [[0 1 0 0]
#  [0 0 2 0]
#  [0 0 0 3]
#  [0 0 0 0]]

p = np.array([2., 3., 5., -1.])   # p = 2 + 3x + 5x^2 - x^3
print((D @ p).astype(int))        # [ 3 10 -3  0]  ->  3 + 10x - 3x^2, the hand derivative

# D is NILPOTENT: differentiate a degree-3 polynomial 4 times and you get 0.
print(np.allclose(np.linalg.matrix_power(D, 4), 0))   # True  ->  D^4 = 0
print((D @ D).astype(int))        # D^2: differentiate twice
# [[0 0 2 0]
#  [0 0 0 6]
#  [0 0 0 0]
#  [0 0 0 0]]
The outputs match the hand work exactly: the matrix is the superdiagonal $1,2,3$; applying it to $(2,3,5,-1)$ yields $(3,10,-3,0)$; and $D^4=0$. That last fact is worth dwelling on. A matrix $A$ is nilpotent if some power $A^k$ is the zero matrix. Here $D^{4}=0$ on $\mathbb{P}_3$, and the reason is pure calculus: differentiate a cubic four times and nothing is left. More generally on $\mathbb{P}_n$, $D^{n+1}=0$ — the $(n+1)$-th derivative annihilates every polynomial of degree at most $n$. Nilpotence is the algebraic shadow of "differentiation eventually exhausts a polynomial," and it is the reason $D$ has no nonzero eigenvalues (a fact we revisit when Jordan form appears in Chapter 36 — every nilpotent operator is the purest example of the defective matrices that chapter studies).
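The general statement $D^{n+1}=0$ on $\mathbb{P}_n$ can be spot-checked for several $n$ at once. This is a hedged sketch: the helper name `diff_matrix` is hypothetical, the range of $n$ is arbitrary, and the check also confirms that $D^{n}$ is not yet zero, so $n+1$ is the first power that vanishes.

```python
import numpy as np

def diff_matrix(n):
    """Matrix of d/dx on P_n in the monomial basis {1, x, ..., x^n}."""
    return np.diag(np.arange(1.0, n + 1), k=1)   # 1, 2, ..., n on the superdiagonal

results = []
for n in range(1, 7):
    D = diff_matrix(n)
    survives = bool(np.any(np.linalg.matrix_power(D, n)))          # D^n != 0
    vanishes = not bool(np.any(np.linalg.matrix_power(D, n + 1)))  # D^(n+1) == 0
    results.append((n, survives, vanishes))

for row in results:
    print(row)        # (n, True, True) for each n from 1 to 6
```

The surviving entry of $D^n$ is $n!$ in the top-right corner: the $n$-th derivative of $x^n$ is the constant $n!$, and one more derivative kills it.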
Geometric Intuition — Picture $\mathbb{P}_3$ as a four-rung ladder: constants on the bottom rung, then $x$, then $x^2$, then $x^3$ on top. Differentiation $D$ knocks every polynomial down one rung — and whatever was on the bottom rung (the constants) falls off the ladder entirely, becoming zero. Apply $D$ four times and even a top-rung cubic has fallen off the bottom: that is $D^4=0$, nilpotence, drawn as a picture. Integration $S$ is the same ladder run upward, lifting each rung up one and never losing anything. The superdiagonal of $[D]_B$ is exactly "shift down one rung, scaled by the exponent."
The shift in disguise: choosing a smarter basis
Here the freedom of §35.4 pays a concrete dividend, and the change-of-basis idea of Chapter 16 becomes vivid. Keep the same operator $D$ but change the basis from monomials to the factorial basis $C=\big\{1,\;x,\;\tfrac{x^2}{2},\;\tfrac{x^3}{6}\big\}=\{1,x,\tfrac{x^2}{2!},\tfrac{x^3}{3!}\}$. Why this basis? Because $D$ acts beautifully on it: $\frac{d}{dx}\!\big(\tfrac{x^k}{k!}\big)=\tfrac{x^{k-1}}{(k-1)!}$ for $k\ge 1$, so differentiation sends each basis vector to the previous one, with no scaling factor at all. Build the matrix: the images are $D(1)=0$, $D(x)=1$, $D(\tfrac{x^2}{2})=x$, $D(\tfrac{x^3}{6})=\tfrac{x^2}{2}$, whose coordinates in $C$ are $(0,0,0,0)$, $(1,0,0,0)$, $(0,1,0,0)$, $(0,0,1,0)$. So $$ [D]_C= \begin{bmatrix} 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1\\ 0&0&0&0 \end{bmatrix} $$ with pure $1$s on the superdiagonal. This is exactly the shift operator $L$ of §35.3. The same differentiation operator that carried the superdiagonal $(1,2,3)$ in the monomial basis is the clean leftward shift in the factorial basis. The operator never changed; we changed the coordinates in which we photographed it, and a better basis produced a cleaner matrix. That is the entire content of similarity, made concrete on the anchor.
# The SAME operator D in the factorial basis {1, x, x^2/2, x^3/6} via similarity P^{-1} D P.
import numpy as np
D_B = np.array([[0,1,0,0],[0,0,2,0],[0,0,0,3],[0,0,0,0]], float)
P = np.array([[1,0,0,0],[0,1,0,0],[0,0,0.5,0],[0,0,0,1/6]]) # cols: C-basis in monomial coords
D_C = np.linalg.inv(P) @ D_B @ P
print(np.round(D_C).astype(int)) # the clean shift
# [[0 1 0 0]
# [0 0 1 0]
# [0 0 0 1]
# [0 0 0 0]]
The output is the superdiagonal of all $1$s: $[D]_C$ is the shift. Two matrices, $[D]_B$ and $[D]_C$, similar via $P$, representing one operator. (You may recognize the factorial basis as the Taylor-series basis — and that $D$ acting as a clean shift on it is the reason the matrix exponential $e^{Dt}$ will turn out to be the Taylor shift, the Taylor expansion itself, when we meet $e^{At}$ in Chapter 37.)
FAQ: Why does the same operator have different matrices?
Because a matrix encodes the operator plus a choice of coordinates, and there are infinitely many coordinate systems. The operator $D$ is a single, fixed rule — "take the derivative" — but to turn it into a grid of numbers you must say which basis you are measuring inputs and outputs against. The monomial basis records the exponents that fall when you differentiate ($1,2,3$ on the superdiagonal); the factorial basis pre-divides by those exponents, so they cancel and you are left with a bare shift. Both matrices apply the identical operation to the identical polynomials; they merely write it in different coordinates, and they are related by the similarity $P^{-1}[D]_BP$ of Chapter 16. This is exactly why we hunt for good bases (the eigenbasis, the principal-component basis): the operator is fixed, but the matrix can be made simple or messy, and simple matrices reveal what the operator really does.
35.6 Kernel and image: the null space and column space, freed from coordinates
Every matrix carried two fundamental subspaces from Chapter 13 — the null space $N(A)$ (vectors sent to $\mathbf{0}$) and the column space $C(A)$ (vectors that can be reached). Abstract linear maps carry the exact same two subspaces, under classical names. Ground them, first, in differentiation.
What does $D$ send to zero? A polynomial whose derivative is identically zero — that is, a constant. So the "things $D$ kills" are exactly the constant polynomials, a one-dimensional space spanned by $1$. What can $D$ reach? Differentiating a polynomial of degree $\le 3$ produces a polynomial of degree $\le 2$, and every degree-$\le 2$ polynomial is reachable (it is the derivative of its own antiderivative). So the "things $D$ outputs" are exactly $\mathbb{P}_2$, a three-dimensional space. These two subspaces — the constants, and $\mathbb{P}_2$ — are the kernel and image of $D$, and we now give the general definitions they instantiate.
The kernel of a linear map $T:V\to W$ is the set of vectors in the domain that $T$ sends to the zero vector of $W$: $$ \ker T = \{\,\mathbf{v}\in V : T(\mathbf{v})=\mathbf{0}\,\}. $$ The image (or range) of $T$ is the set of vectors in the codomain that $T$ actually hits: $$ \operatorname{im} T = \{\,T(\mathbf{v}) : \mathbf{v}\in V\,\}\subseteq W. $$ Both are subspaces — the kernel is a subspace of $V$, the image a subspace of $W$ — and the proofs are the same closure arguments you ran in Chapter 13, now using only additivity and homogeneity. (Kernel: if $T(\mathbf{u})=T(\mathbf{v})=\mathbf{0}$ then $T(\mathbf{u}+c\mathbf{v})=T(\mathbf{u})+cT(\mathbf{v})=\mathbf{0}$, so the kernel is closed under linear combinations. Image: if $\mathbf{w}_1=T(\mathbf{u})$ and $\mathbf{w}_2=T(\mathbf{v})$ are in the image, so is $\mathbf{w}_1+c\mathbf{w}_2=T(\mathbf{u}+c\mathbf{v})$.)
The Key Insight — Kernel and image are the null space and column space, freed from coordinates. Pick bases and represent $T$ by its matrix $[T]_{C\leftarrow B}$, and then $\ker T$ is exactly the null space $N([T]_{C\leftarrow B})$ and $\operatorname{im}T$ is exactly the column space $C([T]_{C\leftarrow B})$ — the same subspaces of Chapter 13, read in coordinates. Everything you learned about null and column spaces (how to find them by row reduction, what their dimensions mean, how rank counts pivots) applies verbatim to abstract maps, once you choose a basis. The four-fundamental-subspaces framework (recurring theme #5) was never about matrices specifically; it was about linear maps, and matrices were the coordinate version.
For our anchor, this dictionary reads: $\ker D=\operatorname{span}\{1\}$ (the constants), and $\operatorname{im}D=\mathbb{P}_2$. And these match the matrix $[D]_B$ from §35.5 perfectly. The first column of $[D]_B$ is all zeros, so the coordinate vector $(1,0,0,0)$ — the polynomial $1$ — is in the null space, confirming $\ker D=\operatorname{span}\{1\}$. The columns of $[D]_B$ span the first three coordinate directions, so the column space is the degree-$\le 2$ polynomials, confirming $\operatorname{im}D=\mathbb{P}_2$. The abstract kernel and image and the matrix's null and column spaces are the same subspaces, exactly as the boxed equation promised.
There is a beautiful relationship between differentiation's kernel and the integration operator $S$ that deserves to be stated as a matrix identity, because it is the Fundamental Theorem of Calculus in linear-algebra clothing. Differentiating an antiderivative gives back the original function: $D\big(S(p)\big)=p$ for every $p$, since $\frac{d}{dx}\int_0^x p\,dt=p(x)$. In coordinates this says $[D]\,[S]=I$ on $\mathbb{P}_2$: the differentiation matrix times the integration matrix is the identity. (A direct numpy computation, in the spirit of §35.3, confirms this: the product $DS$ comes out as the identity on the lower-degree block.) But the other order fails: $S\big(D(p)\big)\neq p$ in general, because integrating a derivative recovers the polynomial only up to its constant term: $S(D(p))=p-p(0)$. The discrepancy $p(0)$ is precisely the part of $p$ living in $\ker D$, the constants. So "$+C$," the integration constant that every calculus student adds, is exactly the kernel of $D$ refusing to be recovered by integration. $S$ is a right inverse of $D$ but not a left inverse, and the obstruction is the nonzero kernel, a fact rank–nullity will quantify in §35.8. The Fundamental Theorem of Calculus, read this way, is a statement about two linear maps and the kernel that separates "right inverse" from "true inverse."
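A small numerical sketch of the two orders of composition follows. It makes two assumptions that are ours rather than the chapter's: $D$ is written as the $3\times 4$ matrix of $D:\mathbb{P}_3\to\mathbb{P}_2$ (the $4\times 4$ matrix of §35.5 with its zero last row dropped), and $S:\mathbb{P}_2\to\mathbb{P}_3$ is written as a $4\times 3$ matrix in monomial bases.

```python
import numpy as np

# D : P_3 -> P_2 (3x4) and S : P_2 -> P_3 (4x3), both in monomial bases.
D = np.array([[0, 1, 0, 0],
              [0, 0, 2, 0],
              [0, 0, 0, 3]], float)
S = np.array([[0,   0,   0],      # antiderivatives taken from 0 have zero constant term
              [1,   0,   0],      # S(1)   = x
              [0, 1/2,   0],      # S(x)   = x^2/2
              [0,   0, 1/3]])     # S(x^2) = x^3/3

print(np.allclose(D @ S, np.eye(3)))   # D S = I on P_2: S is a right inverse of D
print(np.round(S @ D, 12))             # S D: the identity except in the constant slot

p = np.array([2., 3., 5., -1.])        # p = 2 + 3x + 5x^2 - x^3
print(S @ D @ p)                       # coordinates of p - p(0): the constant 2 is lost
```

The product $SD$ is the projection that zeroes out the constant coordinate, which is exactly the statement $S(D(p))=p-p(0)$.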
Common Pitfall: The kernel is a subspace of the domain $V$; the image is a subspace of the codomain $W$. When $T:V\to W$ maps between different spaces these live in different places, and confusing them is a frequent error: the kernel can never contain output vectors, and the image can never contain input vectors. For the evaluation map $\operatorname{ev}_0:\mathbb{P}_3\to\mathbb{R}$, the kernel is the polynomials with $p(0)=0$ (a subspace of $\mathbb{P}_3$) while the image is a subspace of $\mathbb{R}$. Keep the two spaces straight, and the dimensions in rank–nullity (§35.8) will never confuse you.
Injective and surjective, read off the kernel and image
Two of the most important questions about any map — is it one-to-one? is it onto? — are answered directly by these subspaces, and the criteria are clean. A linear map $T$ is injective (one-to-one) if and only if $\ker T=\{\mathbf{0}\}$. The reason is linearity: $T(\mathbf{u})=T(\mathbf{v})$ iff $T(\mathbf{u}-\mathbf{v})=\mathbf{0}$ iff $\mathbf{u}-\mathbf{v}\in\ker T$, so distinct inputs can collide only through a nonzero kernel vector. A trivial kernel means no collisions. A linear map is surjective (onto) if and only if $\operatorname{im}T=W$ — onto is just "the image fills the whole codomain," by definition. So differentiation $D:\mathbb{P}_3\to\mathbb{P}_3$ is neither injective (its kernel, the constants, is nonzero — many polynomials share a derivative, differing by a constant of integration) nor surjective onto $\mathbb{P}_3$ (its image is only $\mathbb{P}_2$; you can never differentiate your way to a genuine cubic). The "$+C$" of indefinite integration is precisely the nontrivial kernel of $D$ stated in calculus language.
Check Your Understanding — Consider the integration operator $S:\mathbb{P}_2\to\mathbb{P}_3$, $S(p)(x)=\int_0^x p(t)\,dt$. What is its kernel, and is it injective? Is it surjective onto $\mathbb{P}_3$?
Answer
The kernel is $\{\mathbf{0}\}$: if $\int_0^x p(t)\,dt$ is the zero polynomial, then differentiating gives $p=0$, so only the zero polynomial integrates to zero. Hence $S$ is injective (one-to-one): integration loses no information. But $S$ is not surjective onto $\mathbb{P}_3$: every output $\int_0^x p\,dt$ has zero constant term (it vanishes at $x=0$), so a polynomial like $1+x$ with nonzero constant term is unreachable. The image is the three-dimensional subspace of $\mathbb{P}_3$ of polynomials with $p(0)=0$. Notice that $S$ maps into a larger space but cannot fill the extra dimension, exactly what rank–nullity will quantify in §35.8.
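Once a map is written as a matrix, the two criteria reduce to rank counts: a trivial kernel means the rank equals the number of columns, and a full image means the rank equals the number of rows. The following is a minimal sketch under that reading; the helper `injective_surjective` is a name we made up, and the matrices assume monomial bases.

```python
import numpy as np

def injective_surjective(M):
    """Return (is_injective, is_surjective) for x -> M x (hypothetical helper)."""
    rank = np.linalg.matrix_rank(M)
    rows, cols = M.shape
    # rank == cols  <=>  trivial kernel;  rank == rows  <=>  image fills the codomain
    return bool(rank == cols), bool(rank == rows)

D = np.array([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]], float)  # P_3 -> P_3
S = np.array([[0, 0, 0], [1, 0, 0], [0, 1/2, 0], [0, 0, 1/3]])                 # P_2 -> P_3

print(injective_surjective(D))   # (False, False)
print(injective_surjective(S))   # (True, False)
```

Numerical rank is computed with a tolerance, so for matrices with noisy entries this test is only as reliable as that tolerance. For the exact small matrices here it agrees with the hand argument.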
A second concrete case: the kernel and image of the evaluation map
To see kernel and image when the domain and codomain are genuinely different spaces, work the double-evaluation map $\operatorname{ev}:\mathbb{P}_3\to\mathbb{R}^2$, $\operatorname{ev}(p)=(p(0),\,p(1))$: "report the value at $0$ and the value at $1$." It is linear (each coordinate is an evaluation, and evaluations are linear), and it maps a four-dimensional space to a two-dimensional one. Build its matrix in the monomial basis: the images of $1,x,x^2,x^3$ are $(1,1)$, $(0,1)$, $(0,1)$, $(0,1)$, since $1$ has value $1$ at both points while $x,x^2,x^3$ all vanish at $0$ and equal $1$ at $1$. Stacking columns, $$ [\operatorname{ev}]=\begin{bmatrix}1&0&0&0\\1&1&1&1\end{bmatrix}. $$ The image is all of $\mathbb{R}^2$: the first two columns $(1,1)$ and $(0,1)$ are already independent, so the columns span $\mathbb{R}^2$. Every pair of target values $(p(0),p(1))$ is achievable, and $\operatorname{ev}$ is surjective. The kernel is the polynomials that vanish at both $0$ and $1$, i.e. $p(0)=p(1)=0$; from the matrix, the null space is the solution set of $a_0=0$ and $a_0+a_1+a_2+a_3=0$, which leaves two free parameters. That gives a two-dimensional kernel, spanned for instance by $x^2-x$ (coordinates $(0,-1,1,0)$) and $x^3-x$ (coordinates $(0,-1,0,1)$), both of which vanish at $0$ and $1$. So here $\dim\ker=2$ and $\dim\operatorname{im}=2$, a different split from differentiation's $1$ and $3$, but, as the Rank–Nullity theorem of §35.8 demands, again summing to $\dim\mathbb{P}_3=4$. The kernel lives in $\mathbb{P}_3$ (it is made of polynomials), the image lives in $\mathbb{R}^2$ (it is made of value-pairs): two subspaces in two different spaces, exactly as the pitfall warned. This is the polynomial-interpolation kernel made visible: two evaluation conditions kill two dimensions' worth of polynomials, which is why a cubic is not pinned down by its values at only two points.
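The evaluation matrix and its kernel can be checked in a few lines. This is a sketch rather than a general null-space routine: it builds the matrix from the evaluation points and then confirms the two kernel vectors named above, instead of computing a kernel basis from scratch.

```python
import numpy as np

points = [0.0, 1.0]
# Column j holds the values of x^j at the evaluation points; rows are p(0) and p(1).
ev = np.array([[t**j for j in range(4)] for t in points])
print(ev)

rank = np.linalg.matrix_rank(ev)
print(rank, ev.shape[1] - rank)          # rank and nullity

# The two kernel vectors from the text: x^2 - x and x^3 - x (as columns).
K = np.array([[0, -1, 1, 0],
              [0, -1, 0, 1]], float).T
print(np.allclose(ev @ K, 0))            # both are sent to (0, 0)
print(np.linalg.matrix_rank(K))          # and they are linearly independent
```

Since the nullity is $2$ and the two columns of `K` are independent kernel vectors, they form a basis of the kernel.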
FAQ: How are kernel and image different from null space and column space?
They are the same objects under more general names. "Null space" and "column space" are the names we use when the map is presented as a matrix acting on $\mathbb{R}^n$; "kernel" and "image" are the names we use for an abstract linear map between any vector spaces. The null space is the kernel of $\mathbf{x}\mapsto A\mathbf{x}$; the column space is its image (the span of the columns is exactly the set of reachable $A\mathbf{x}$). When you represent an abstract map by a matrix in chosen bases, its kernel becomes that matrix's null space and its image becomes that matrix's column space — so every computational technique from Chapter 13, row reduction included, transfers directly. The two vocabularies describe one idea at two levels of abstraction.
35.7 Why is the matrix of $T$ basis-dependent? A motivated proof
We have asserted repeatedly that the matrix of a linear map depends on the chosen bases, and that changing the basis transforms the matrix by similarity. This claim sits at the very center of the book's first theme (the transformation is the real object; the matrix is its basis-dependent shadow), so it deserves a proof, not just an assertion.
Why we care. This theorem is the precise, provable statement of recurring theme #1: a single linear operator has many matrices, one per basis, all similar to one another. It is why diagonalization is even possible (Chapter 25 picks the basis that makes the matrix diagonal), why similar matrices share eigenvalues, determinant, and trace (those are operator invariants, blind to coordinates), and why "find better coordinates" is the unifying strategy behind every decomposition in Part VI. Without this result, we could not separate what belongs to the map from what is merely an artifact of how we chose to write it down.
Key idea. Inserting a basis change is like inserting a translator on each side. To apply the operator's matrix in the new basis, first translate new-basis coordinates into old-basis coordinates (multiply by $P$), then apply the operator in the old basis (multiply by $[T]_B$), then translate the result back into the new basis (multiply by $P^{-1}$). Composed, those three steps are the matrix $P^{-1}[T]_BP$ — and composition of these coordinate translations is, once again, matrix multiplication.
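The three translation steps can be traced numerically on the anchor. In this sketch the new basis $\{1,\,1+x,\,(1+x)^2,\,(1+x)^3\}$ is our own choice for illustration (the chapter does not use it), and $P$ holds those polynomials in monomial coordinates. Because $\frac{d}{dx}(1+x)^k=k(1+x)^{k-1}$, the matrix of $D$ in this basis should again be the superdiagonal $1,2,3$, which gives an independent check.

```python
import numpy as np
from math import comb

D_B = np.array([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]], float)

# Column k of P: monomial coordinates of (1 + x)^k, i.e. binomial coefficients.
P = np.array([[comb(k, i) for k in range(4)] for i in range(4)], float)

D_new = np.linalg.inv(P) @ D_B @ P       # the composed translation P^{-1} [D]_B P

# Route one coordinate vector through the three steps of the key idea.
v_new = np.array([1., -2., 0., 4.])      # coordinates in the shifted basis
step1 = P @ v_new                        # translate into monomial coordinates
step2 = D_B @ step1                      # differentiate there
step3 = np.linalg.solve(P, step2)        # translate the result back
print(np.allclose(step3, D_new @ v_new))

# Hand check: in the shifted basis D is again superdiagonal 1, 2, 3.
print(np.allclose(D_new, D_B))
```

Using `np.linalg.solve` for the last step avoids forming $P^{-1}$ explicitly; the explicit inverse appears only to mirror the formula.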
Theorem (change of basis for the matrix of an operator). Let $T:V\to V$ be a linear operator on a finite-dimensional vector space $V$, let $B$ and $\tilde B$ be two ordered bases of $V$, and let $P$ be the change-of-basis matrix whose columns are the $\tilde B$-basis vectors expressed in $B$-coordinates, so that $[\mathbf{v}]_B=P\,[\mathbf{v}]_{\tilde B}$ for every $\mathbf{v}\in V$ (this $P$ is invertible — Chapter 16). Then the matrices of $T$ in the two bases are related by $$ [T]_{\tilde B} \;=\; P^{-1}\,[T]_{B}\,P. $$ In particular, any two matrices representing the same operator are similar, and therefore share their determinant, trace, rank, characteristic polynomial, and eigenvalues.
Proof. Fix an arbitrary vector $\mathbf{v}\in V$; we track its coordinates through both descriptions and show the two matrices must agree. By the defining property of the matrix of a map (the boxed equation of §35.4), applied in the basis $\tilde B$, $$ [T(\mathbf{v})]_{\tilde B} = [T]_{\tilde B}\,[\mathbf{v}]_{\tilde B}. \tag{1} $$ Now compute the same quantity $[T(\mathbf{v})]_{\tilde B}$ by routing through the basis $B$. First, the change-of-basis relation says coordinates convert by $P$ and $P^{-1}$: $$ [\mathbf{v}]_B = P\,[\mathbf{v}]_{\tilde B}, \qquad\text{and for any vector } \mathbf{w},\quad [\mathbf{w}]_{\tilde B} = P^{-1}\,[\mathbf{w}]_B. \tag{2} $$ Apply the second relation in (2) to the vector $\mathbf{w}=T(\mathbf{v})$, then use the boxed equation in basis $B$ to rewrite $[T(\mathbf{v})]_B=[T]_B[\mathbf{v}]_B$, and finally the first relation in (2) to rewrite $[\mathbf{v}]_B$: $$ [T(\mathbf{v})]_{\tilde B} = P^{-1}\,[T(\mathbf{v})]_B = P^{-1}\,[T]_B\,[\mathbf{v}]_B = P^{-1}\,[T]_B\,P\,[\mathbf{v}]_{\tilde B}. \tag{3} $$ Lines (1) and (3) compute the same vector $[T(\mathbf{v})]_{\tilde B}$ two ways, so for every $\mathbf{v}\in V$, $$ [T]_{\tilde B}\,[\mathbf{v}]_{\tilde B} = \big(P^{-1}\,[T]_B\,P\big)\,[\mathbf{v}]_{\tilde B}. $$ As $\mathbf{v}$ ranges over $V$, its coordinate vector $[\mathbf{v}]_{\tilde B}$ ranges over all of $\mathbb{R}^n$. Two matrices that agree on every input vector are equal (apply both to each standard basis vector $\mathbf{e}_i$ to read off matching columns). Therefore $[T]_{\tilde B}=P^{-1}\,[T]_B\,P$, which is what we set out to prove. The shared-invariants claim then follows from Chapter 16: similar matrices have equal determinant ($\det(P^{-1}AP)=\det A$), equal trace, equal rank, and the same characteristic polynomial, hence the same eigenvalues. $\blacksquare$
What this means. The collection of all matrices of one operator is a similarity class — a family of matrices, all linked by $P^{-1}(\cdot)P$, that are different photographs of one invariant object. Geometry-and-algebra (recurring theme #2) reaches its sharpest form here: the operator is the geometric reality; its matrices are algebraic descriptions in chosen coordinates; and the quantities that survive every basis change (determinant, trace, eigenvalues) are exactly the intrinsic features of the transformation. We confirmed this concretely in §35.5: $[D]_B$ (superdiagonal $1,2,3$) and $[D]_C$ (the clean shift) are similar, and indeed both have determinant $0$, trace $0$, rank $3$, and the single eigenvalue $0$ — the invariants of the operator $D$, identical across both photographs.
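The shared invariants of the two photographs can be confirmed directly. The sketch below checks determinant, trace, and rank, and uses nilpotence ($M^4=0$) as a stand-in for the eigenvalue claim, since a matrix whose fourth power vanishes can only have the eigenvalue $0$. That substitution is our choice; it sidesteps a floating-point eigenvalue solve.

```python
import numpy as np

D_B = np.array([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]], float)
D_C = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]], float)

for name, M in [("[D]_B", D_B), ("[D]_C", D_C)]:
    det_is_zero = bool(np.isclose(np.linalg.det(M), 0.0))
    trace = float(np.trace(M))
    rank = int(np.linalg.matrix_rank(M))
    # M^4 = 0 forces every eigenvalue to be 0, so the characteristic polynomial is t^4.
    nilpotent = bool(np.allclose(np.linalg.matrix_power(M, 4), 0))
    print(name, det_is_zero, trace, rank, nilpotent)
```

Both rows of output should agree entry for entry, as the theorem requires of similar matrices.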
35.8 Rank–Nullity, abstractly — and why every $n$-dimensional space is $\mathbb{R}^n$
We come to the structural heart of the chapter: the abstract Rank–Nullity theorem, and the isomorphism theorem that finally explains why coordinates work. Both are pure linear-map facts, true in any abstract space, and we will see both immediately on the anchor.
The rank of a linear map $T$ is $\operatorname{rank}T=\dim(\operatorname{im}T)$, the dimension of its image; the nullity is $\operatorname{null}T=\dim(\ker T)$, the dimension of its kernel. These generalize the matrix rank and nullity of Chapter 14 exactly. The theorem ties them to the dimension of the domain.
Theorem (Rank–Nullity, abstract form). Let $T:V\to W$ be a linear map with $V$ finite-dimensional. Then $$ \dim(\ker T) + \dim(\operatorname{im}T) = \dim V. $$
Why we care. Rank–Nullity is a conservation law for dimension: every dimension of the domain is accounted for, either collapsed into the kernel (sent to zero) or surviving into the image (faithfully reproduced). It instantly settles questions of injectivity and surjectivity, it forces the rank to obey hard bounds, and — combined with the next theorem — it is what makes "every $n$-dimensional space is $\mathbb{R}^n$" true. It is the same theorem you proved for matrices in Chapter 14, now liberated to all linear maps.
Key idea. Start with a basis of the kernel — the directions that get crushed. Extend it to a basis of the whole domain $V$; the extra basis vectors are the directions not crushed. The map sends those extra directions to a basis of the image, one survivor for each. So $\dim V$ splits cleanly into "crushed directions" (the kernel) plus "surviving directions" (the image), and the survivors are in one-to-one correspondence with a basis of the image.
Proof. Let $\dim V=n$ and $\dim(\ker T)=k$. Choose a basis $\{\mathbf{u}_1,\dots,\mathbf{u}_k\}$ of $\ker T$. Since these are linearly independent in $V$, extend them (Chapter 15's basis-extension theorem) to a full basis of $V$: $$ \{\mathbf{u}_1,\dots,\mathbf{u}_k,\ \mathbf{w}_1,\dots,\mathbf{w}_{n-k}\}. $$ We claim the images $\{T(\mathbf{w}_1),\dots,T(\mathbf{w}_{n-k})\}$ form a basis of $\operatorname{im}T$; if so, then $\dim(\operatorname{im}T)=n-k$ and the theorem reads $k+(n-k)=n$, as required. Two things to verify.
They span the image. Take any $\mathbf{y}\in\operatorname{im}T$, say $\mathbf{y}=T(\mathbf{v})$. Write $\mathbf{v}$ in the full basis: $\mathbf{v}=\sum_i a_i\mathbf{u}_i+\sum_j b_j\mathbf{w}_j$. Apply $T$ and use linearity; since each $\mathbf{u}_i\in\ker T$, every $T(\mathbf{u}_i)=\mathbf{0}$, leaving $$ \mathbf{y}=T(\mathbf{v})=\sum_j b_j\,T(\mathbf{w}_j). $$ So every image vector is a combination of the $T(\mathbf{w}_j)$ — they span $\operatorname{im}T$.
They are linearly independent. Suppose $\sum_j c_j\,T(\mathbf{w}_j)=\mathbf{0}$. By linearity $T\!\big(\sum_j c_j\mathbf{w}_j\big)=\mathbf{0}$, so the vector $\mathbf{z}=\sum_j c_j\mathbf{w}_j$ lies in $\ker T$. But the kernel is spanned by the $\mathbf{u}_i$, so $\mathbf{z}=\sum_i d_i\mathbf{u}_i$ for some scalars $d_i$. Then $$ \sum_j c_j\mathbf{w}_j - \sum_i d_i\mathbf{u}_i = \mathbf{0}, $$ a linear dependence among the vectors $\{\mathbf{u}_1,\dots,\mathbf{u}_k,\mathbf{w}_1,\dots,\mathbf{w}_{n-k}\}$. But those are a basis of $V$, hence linearly independent, so every coefficient is zero — in particular every $c_j=0$. Thus the $T(\mathbf{w}_j)$ are linearly independent. Spanning plus independent makes them a basis of $\operatorname{im}T$, with $\dim(\operatorname{im}T)=n-k$, and $\dim(\ker T)+\dim(\operatorname{im}T)=k+(n-k)=n=\dim V$. $\blacksquare$
What this means. Dimension is conserved: the domain's $n$ dimensions are partitioned into the kernel's $k$ (annihilated) and the image's $n-k$ (faithfully transmitted). On the anchor $D:\mathbb{P}_3\to\mathbb{P}_3$ the bookkeeping is immediate — $\dim\ker D=1$ (the constants), $\dim\operatorname{im}D=3$ (which is $\mathbb{P}_2$), and $1+3=4=\dim\mathbb{P}_3$. The one dimension of constants is exactly the dimension differentiation destroys; the three surviving dimensions are exactly $\mathbb{P}_2$. Rank–Nullity is the precise statement that "what differentiation kills plus what it keeps equals everything it started with."
# Abstract Rank-Nullity on the differentiation operator D : P_3 -> P_3.
import numpy as np
D = np.array([[0,1,0,0],[0,0,2,0],[0,0,0,3],[0,0,0,0]], float)
rank = np.linalg.matrix_rank(D) # dim(image)
nullity = D.shape[1] - rank # dim(kernel) = n - rank
print(rank, nullity, rank + nullity) # 3 1 4 -> dim im + dim ker = dim P_3 = 4
print(D @ np.array([1., 0., 0., 0.]))  # [0. 0. 0. 0.] -> the constant 1 is in the kernel
The output 3 1 4 is rank–nullity for differentiation: image dimension $3$, kernel dimension $1$, summing to $\dim\mathbb{P}_3=4$. And $D$ applied to the coordinate vector of the constant $1$ returns zero, confirming the kernel. (As an evaluation-map example with a different dimension split, the map $\operatorname{ev}:\mathbb{P}_3\to\mathbb{R}^2$, $p\mapsto(p(0),p(1))$, has matrix $\left[\begin{smallmatrix}1&0&0&0\\1&1&1&1\end{smallmatrix}\right]$ with rank $2$ and nullity $2$, again summing to $4$ — Case Study 2 builds it out.)
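The construction inside the proof (kernel basis, extension, images of the extension) can also be traced on the evaluation map. In this sketch the extension vectors $1$ and $x$ are our own choice; any two vectors completing the kernel basis to a basis of $\mathbb{P}_3$ would serve.

```python
import numpy as np

T = np.array([[1, 0, 0, 0],
              [1, 1, 1, 1]], float)          # ev : P_3 -> R^2, p -> (p(0), p(1))

# Basis of ker T (k = 2): x^2 - x and x^3 - x in monomial coordinates, as columns.
U = np.array([[0, -1, 1, 0],
              [0, -1, 0, 1]], float).T
# Extension vectors (n - k = 2): the polynomials 1 and x (an assumed choice).
W = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], float).T

full = np.hstack([U, W])
print(np.linalg.matrix_rank(full))           # 4: kernel basis plus extension is a basis of P_3
print(np.allclose(T @ U, 0))                 # the kernel vectors are crushed
print(np.linalg.matrix_rank(T @ W))          # 2: images of the extension are independent
print(np.linalg.matrix_rank(T @ W) == np.linalg.matrix_rank(T))  # so they span im T
```

The images $T(\mathbf{w}_1)=(1,1)$ and $T(\mathbf{w}_2)=(0,1)$ are exactly the basis of the image that the proof promises, giving $2+2=4$.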
Isomorphisms: why coordinates work at all
We now reach the theorem that has been humming under the entire book. An isomorphism is a linear map $T:V\to W$ that is bijective — both injective ($\ker T=\{\mathbf{0}\}$) and surjective ($\operatorname{im}T=W$). When an isomorphism exists, $V$ and $W$ are isomorphic, written $V\cong W$, and they are "the same vector space wearing different labels": every linear-algebraic statement true in one is true in the other, transported by $T$. An isomorphism has an inverse $T^{-1}$ that is also linear (a short exercise from the axioms), so the relabeling goes both ways losslessly.
The central fact is breathtakingly clean.
Theorem (classification of finite-dimensional vector spaces). Two finite-dimensional vector spaces over the same field are isomorphic if and only if they have the same dimension. In particular, every $n$-dimensional real vector space $V$ is isomorphic to $\mathbb{R}^n$.
The isomorphism is one you have used since Chapter 15 without naming it: the coordinate map. Choose a basis $B=\{\mathbf{b}_1,\dots,\mathbf{b}_n\}$ of $V$ and send each vector to its coordinate vector, $$ \Phi_B:V\to\mathbb{R}^n,\qquad \Phi_B(\mathbf{v})=[\mathbf{v}]_B=(x_1,\dots,x_n)\ \text{where}\ \mathbf{v}=\textstyle\sum_i x_i\mathbf{b}_i. $$ This map is linear (coordinates of a sum are the sum of coordinates; coordinates scale with the vector — a direct consequence of the basis being a basis), it is injective (the only vector with all-zero coordinates is $\mathbf{0}$, by independence of the basis), and it is surjective (every coordinate list $(x_1,\dots,x_n)$ names the vector $\sum x_i\mathbf{b}_i$, by spanning). So $\Phi_B$ is an isomorphism, and $V\cong\mathbb{R}^n$.
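The three properties of the coordinate map can be spot-checked for $\mathbb{P}_3$ with the factorial basis of §35.5. One caveat on this sketch: a program has to store a polynomial somehow, so polynomials are held here as monomial coefficient arrays, and `phi` (our name) converts those into factorial-basis coordinates.

```python
import numpy as np

# Columns: the factorial basis {1, x, x^2/2, x^3/6} written in monomial coordinates.
P = np.diag([1, 1, 1/2, 1/6])

def phi(coeffs):
    """Coordinate map: monomial coefficients of p -> coordinates in the factorial basis."""
    return np.linalg.solve(P, coeffs)

def phi_inv(coords):
    """Inverse map: factorial-basis coordinates -> monomial coefficients."""
    return P @ coords

p = np.array([2., 3., 5., -1.])    # 2 + 3x + 5x^2 - x^3
q = np.array([1., 0., -4., 2.])    # 1 - 4x^2 + 2x^3

print(phi(p))    # coordinates (2, 3, 10, -6): p = 2 + 3x + 10(x^2/2) - 6(x^3/6)
print(np.allclose(phi(p + 3 * q), phi(p) + 3 * phi(q)))    # linear
print(np.allclose(phi_inv(phi(p)), p))                     # invertible: nothing is lost
```

A numerical check on two sample vectors is evidence, not proof; the proof is the independence-and-spanning argument above.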
The Key Insight — Every $n$-dimensional vector space is $\mathbb{R}^n$, once you pick a basis — that is the precise reason coordinates are allowed to work. The space of polynomials $\mathbb{P}_3$, the space of $2\times 2$ matrices, the solution space of a linear differential equation: each is just $\mathbb{R}^4$ (or $\mathbb{R}^n$) in disguise, with the coordinate map providing the disguise and its inverse removing it. This is why we may compute with coordinate vectors and matrices while thinking about abstract maps — the two are isomorphic, and the isomorphism is faithful. Dimension is the complete invariant: it alone determines a finite-dimensional space up to isomorphism. Nothing else about the space matters to its linear-algebraic structure.
This single theorem retroactively justifies the entire computational machinery of the book. When we represented an abstract map by a matrix in §35.4, we were secretly composing three maps, two coordinate isomorphisms wrapped around one matrix multiplication: take $\mathbf{v}\in V$, coordinate-map it to $[\mathbf{v}]_B\in\mathbb{R}^n$, multiply by the matrix to land in $\mathbb{R}^m$, then inverse-coordinate-map back to $T(\mathbf{v})\in W$. In symbols, $T=\Phi_C^{-1}\circ[T]_{C\leftarrow B}\circ\Phi_B$. The matrix is the abstract map, viewed through the coordinate isomorphisms on each side. Everything we did in $\mathbb{R}^n$ for thirty-four chapters applies to every finite-dimensional real vector space, because every such space is $\mathbb{R}^n$ once a basis is chosen.
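The two routes, applying the operator abstractly or passing through coordinates, can be compared on the anchor. As a sketch, the "abstract" side is played by numpy's `Polynomial` class and its `deriv` method, and the helpers `to_coords` and `from_coords` (hypothetical names) stand in for the coordinate map and its inverse in the monomial basis.

```python
import numpy as np
from numpy.polynomial import Polynomial

D = np.array([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]], float)

def to_coords(poly, n=3):
    """Coordinate map P_n -> R^(n+1) in the monomial basis (pads with zeros)."""
    c = np.zeros(n + 1)
    c[:len(poly.coef)] = poly.coef
    return c

def from_coords(c):
    """Inverse coordinate map R^(n+1) -> P_n."""
    return Polynomial(c)

p = Polynomial([2, 3, 5, -1])                 # 2 + 3x + 5x^2 - x^3

abstract = p.deriv()                          # differentiate the polynomial object itself
via_matrix = from_coords(D @ to_coords(p))    # coordinates -> matrix -> back to a polynomial

x = np.linspace(-2, 2, 9)
print(np.allclose(abstract(x), via_matrix(x)))   # the same function either way
```

Agreement at nine sample points already pins the two cubics together, since polynomials of degree at most $3$ that agree at four or more points are identical.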
Math-Major Sidebar — what isomorphism does and does not preserve, and the infinite-dimensional caveat. An isomorphism preserves all purely linear structure: dimension, subspace lattices, rank, kernel/image dimensions, linear independence, the entire similarity theory. It does not automatically preserve extra structure the spaces might carry. In particular, a bare vector-space isomorphism need not preserve an inner product (lengths and angles) — that requires an isometry (a unitary or orthogonal map, Chapter 21), a stronger notion; the coordinate map $\Phi_B$ is an isometry exactly when $B$ is orthonormal. Two warnings on scope. First, the classification theorem is a finite-dimensional result: in infinite dimensions, dimension (cardinality of a basis) still classifies vector spaces algebraically, but the more useful classification adds topology — e.g., all separable infinite-dimensional Hilbert spaces are isomorphic as Hilbert spaces to $\ell^2$ (Chapter 34), a far deeper statement than bare algebra. Second, the isomorphism $V\cong\mathbb{R}^n$ is not canonical: it depends on the arbitrary choice of basis $B$, and a different basis gives a genuinely different isomorphism. This is why coordinate-free arguments are prized in pure mathematics — they never smuggle in a basis-dependent artifact, and they make manifest which facts belong to the map rather than to our bookkeeping. The dual space $V^*$ of linear functionals $V\to\mathbb{R}$ is the classic cautionary tale: $V\cong V^*$ in finite dimensions (same dimension), but there is no natural isomorphism without choosing a basis — whereas $V\cong V^{**}$ is canonical. Dimension is the complete invariant; canonicity is a separate, subtler question.
FAQ: If every $n$-dimensional space is just $\mathbb{R}^n$, why not always work in $\mathbb{R}^n$?
Because the isomorphism, while faithful, throws away the meaning that the original space carried, and meaning is what guides modeling. The coordinate vector $(2,3,5,-1)\in\mathbb{R}^4$ is the polynomial $2+3x+5x^2-x^3$, but only the polynomial form tells you that differentiating it is natural, that evaluating it at a point is natural, that its degree matters. Stripped to a bare $\mathbb{R}^4$ tuple, those structures vanish from view. Working abstractly keeps the relevant operations in sight and stops you from inventing meaningless ones; descending to $\mathbb{R}^n$ via coordinates is the right move only when you want to compute, after the abstract structure has told you what to compute. The art of applied linear algebra is choosing the basis (hence the isomorphism) that makes the meaningful operations simplest — which is exactly the "find good coordinates" theme of §35.4.
35.9 Putting it together: operators, and where this goes
Step back and survey what we have built, because the view from here reaches across mathematics. We started from Chapter 7's slogan — a matrix is a function that transforms space — and removed the coordinates from it, arriving at the linear transformation $T:V\to W$ between abstract spaces. We found that choosing bases collapses any such map into a matrix $[T]_{C\leftarrow B}$ whose columns are the images of the basis vectors; that changing the bases transforms the matrix by similarity, leaving the operator's invariants untouched; that the kernel and image are the null and column spaces freed from coordinates; that rank–nullity conserves dimension between them; and that every finite-dimensional space is, up to a choice of basis, just $\mathbb{R}^n$. Throughout, differentiation served as the anchor: a linear operator whose matrix in the monomial basis displays the falling exponents, whose kernel is the constants, whose image is the lower-degree polynomials, and which is nilpotent because derivatives eventually exhaust a polynomial.
FAQ: Where do abstract linear transformations actually show up?
Everywhere the most important objects refuse to be finite lists of numbers. The clearest abstract vector space examples are operators on function spaces: differentiation and integration on polynomials or smooth functions (Case Study 1 turns a differential equation into an operator equation), the Fourier and Laplace transforms (linear maps between function spaces, Chapter 22), and the shift operator that underlies every digital filter and time-series model. In computer science, the encoding map of a linear error-correcting code is a linear transformation whose image is the codebook and whose companion parity-check map has that codebook as its kernel (Case Study 2). In physics, observables are linear operators on a state space, with measurable quantities appearing as eigenvalues. In data science, a neural-network layer's linear part, an embedding map, and a change of feature coordinates are all linear transformations. In each case the object is born abstract — defined by what it does, not by a chosen matrix — and we descend to a matrix only when we choose a basis to compute.
The deepest reason to care about this generalization is that the most important objects in physics, signal processing, and differential equations are linear operators on infinite-dimensional spaces, and they are studied with exactly the apparatus of this chapter. The derivative as an operator is the gateway: once $\frac{d}{dx}$ is a linear map, a linear differential equation $a_n y^{(n)}+\cdots+a_1 y'+a_0 y=f$ becomes an operator equation $L(y)=f$, where $L$ is a polynomial in the differentiation operator — and solving it is finding the preimage of $f$ under a linear map, with the kernel of $L$ giving the homogeneous solutions (the "$+C$" generalized). This is the viewpoint that makes Chapter 37's matrix exponential natural, and it is the entryway to the functional analysis Chapter 40 surveys.
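A toy instance of the operator-equation viewpoint fits inside $\mathbb{P}_3$. The sketch below takes $L=D^2$ (so the equation is $y''=f$) with $f(x)=2+6x$; both choices are ours, made so that every solution of interest is a polynomial of degree at most $3$. On a genuine function space the kernel of a general $L$ would contain exponentials, which this finite model cannot see.

```python
import numpy as np

D = np.array([[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]], float)
L = D @ D                                   # the operator y -> y'' on P_3

f = np.array([2., 6., 0., 0.])              # f(x) = 2 + 6x
# One particular solution; lstsq returns the minimum-norm one for a rank-deficient L.
y_part, *_ = np.linalg.lstsq(L, f, rcond=None)
print(np.round(y_part, 10))                 # coordinates of x^2 + x^3
print(np.allclose(L @ y_part, f))

# ker L is spanned by 1 and x: adding any c0 + c1*x leaves L(y) unchanged.
c0, c1 = 7.0, -4.0
y_other = y_part + np.array([c0, c1, 0., 0.])
print(np.allclose(L @ y_other, f))
print(L.shape[1] - np.linalg.matrix_rank(L))     # nullity 2: two free constants
```

The two free constants are the second-order analogue of "$+C$": the general solution is a particular solution plus an arbitrary element of $\ker L$.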
Real-World Application — operators in quantum mechanics (physics). In quantum mechanics, every physical observable — position, momentum, energy — is a linear operator on the state space, exactly the abstract linear maps of this chapter, and the connection runs deep. Momentum is (up to a constant) the differentiation operator $\frac{d}{dx}$ acting on wavefunctions; energy is the Hamiltonian operator; and measurement outcomes are the eigenvalues of these operators — the invariant scalars that, as §35.7 showed, survive every change of basis and so are genuinely properties of the operator, not of a coordinate choice. Choosing a basis of the state space turns an operator into a matrix (Chapter 25's diagonalization is the physicist's "going to the energy eigenbasis"), and the operators in quantum mechanics are represented by Hermitian matrices (Chapter 27) precisely so that those measurable eigenvalues come out real. The differentiation operator you built as a matrix in §35.5 is, almost literally, the momentum operator of a particle — calculus, linear algebra, and physics are one subject here.
Historical Note. The abstract, coordinate-free conception of a vector space and the linear maps between them crystallized in the early twentieth century — Hermann Weyl's Space, Time, Matter (1918) and Stefan Banach's 1922 thesis on what became Banach spaces are landmarks — though the matrix-of-a-transformation calculus traces to Arthur Cayley's 1858 Memoir on the Theory of Matrices, which already treated a matrix as a single algebraic object representing a linear substitution. The recognition that differentiation and integration are linear operators, to be studied like matrices, was central to the operational calculus of Oliver Heaviside in the 1890s and was made rigorous in the functional analysis of the 1900s–1930s. [verify] (The broad arc — Cayley's matrices, Heaviside's operational calculus, the early-twentieth-century abstraction by Weyl, Banach, and others — is well attested; precise priority and dates vary across historical sources.)
Build the matrix of differentiation — your toolkit contribution
The chapter's whole thesis, that an abstract linear operator becomes an ordinary matrix once you choose a basis, is best cemented by building that matrix yourself and confirming it does exactly what hand differentiation does.
Build Your Toolkit. Add the matrix of a linear operator to your toolkit: pure Python in the implementation, numpy only to verify.
- In `toolkit/linear_maps.py`, implement `diff_matrix(n)` returning the $(n+1)\times(n+1)$ matrix $D$ of the differentiation operator on $\mathbb{P}_n$ in the monomial basis $\{1,x,\dots,x^n\}$: it is all zeros except $D[j-1][j]=j$ for $j=1,\dots,n$ (with rows and columns indexed from $0$ as in Python, the column for $x^j$ is column $j$, and its single nonzero entry $j$ sits in row $j-1$ because $\frac{d}{dx}x^j=jx^{j-1}$). Build it from scratch with nested lists, no numpy.
- Add `apply_matrix(M, coeffs)` that multiplies $M$ by a coefficient vector using your Chapter 7/8 `matmul`, so that `apply_matrix(diff_matrix(3), [2,3,5,-1])` returns `[3,10,-3,0]`, the coordinate vector of $\frac{d}{dx}(2+3x+5x^2-x^3)=3+10x-3x^2$.
- Verify three things against numpy. (1) `diff_matrix(3)` matches the hand matrix of §35.5. (2) For a handful of random integer coefficient vectors, `apply_matrix(diff_matrix(n), c)` matches the coefficient vector of the symbolic derivative (differentiate by hand or with `numpy.polynomial.polynomial.polyder`). (3) Nilpotence: the $(n+1)$-th matrix power $D^{\,n+1}$ is the zero matrix; compute it with repeated `matmul` and confirm with `numpy.linalg.matrix_power(D, n+1)`.
The point you are proving in code is the point of the chapter: an operation from calculus, made into a matrix in a chosen basis, multiplies coordinate vectors exactly as the operator acts, and its nilpotence is "differentiate a polynomial of degree at most $n$ a total of $n+1$ times and nothing is left," written in linear algebra.
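One possible shape for the toolkit functions is sketched below, in pure Python. The `matmul` here is only a stand-in for the one you wrote in Chapters 7 and 8 (its signature is assumed), and `matrix_power` is a hypothetical helper added for the nilpotence check; the numpy comparison asked for above is left to you.

```python
def matmul(A, B):
    """Stand-in for your Chapter 7/8 matmul (assumed signature): A times B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def diff_matrix(n):
    """Matrix of d/dx on P_n in the monomial basis {1, x, ..., x^n}."""
    D = [[0] * (n + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        D[j - 1][j] = j          # d/dx x^j = j x^(j-1)
    return D


def apply_matrix(M, coeffs):
    """Multiply M by a coefficient vector, treated as a single column."""
    column = [[c] for c in coeffs]
    return [row[0] for row in matmul(M, column)]


def matrix_power(M, k):
    """Hypothetical helper: M multiplied by itself k times, starting from I."""
    size = len(M)
    result = [[1 if i == j else 0 for j in range(size)] for i in range(size)]
    for _ in range(k):
        result = matmul(result, M)
    return result


D = diff_matrix(3)
print(apply_matrix(D, [2, 3, 5, -1]))   # coordinates of 3 + 10x - 3x^2
print(matrix_power(D, 3))               # not yet zero: x^3 becomes the constant 6
print(matrix_power(D, 4))               # the zero matrix
```

Printing the third power alongside the fourth shows that $n+1$ is the exact exponent at which $D$ vanishes on $\mathbb{P}_3$, not merely an upper bound.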
35.10 Summary and the road ahead
This chapter completed the arc the book opened with. A linear transformation $T:V\to W$ between abstract vector spaces is a map satisfying just two rules, additivity $T(\mathbf{u}+\mathbf{v})=T(\mathbf{u})+T(\mathbf{v})$ and homogeneity $T(c\mathbf{v})=cT(\mathbf{v})$: the very rules that made a matrix linear in Chapter 7, now stated without coordinates. We grounded every step in the anchor, differentiation, whose two famous calculus rules are those two axioms. Choosing a basis $B$ of the domain and $C$ of the codomain produces the matrix of $T$, $[T]_{C\leftarrow B}$, whose $j$-th column is the image of the $j$-th basis vector in $C$-coordinates, satisfying $[T(\mathbf{v})]_C=[T]_{C\leftarrow B}[\mathbf{v}]_B$. That matrix is basis-dependent. For an operator $T:V\to V$ written in a single basis $B$ as $[T]_B$, changing the basis transforms the matrix by the similarity $P^{-1}[T]_BP$ of Chapter 16, which we proved, so a single operator has many matrices, all sharing determinant, trace, rank, and eigenvalues. The kernel and image of $T$ are the null space and column space of Chapter 13 freed from coordinates, governing injectivity ($\ker T=\{\mathbf{0}\}$) and surjectivity ($\operatorname{im}T=W$). The abstract Rank–Nullity theorem, $\dim\ker T+\dim\operatorname{im}T=\dim V$, conserves dimension (crushed plus surviving equals the whole domain), which on the anchor, differentiation on $\mathbb{P}_3$, reads $1+3=4$. And the classification theorem says every $n$-dimensional space is isomorphic to $\mathbb{R}^n$ via the coordinate map, which is exactly why coordinates work: we think abstractly and compute in $\mathbb{R}^n$, faithfully, because the two are the same space relabeled.
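The count $1+3=4$ for differentiation on $\mathbb{P}_3$ can be tallied mechanically. The sketch below row-reduces the matrix $D$ with exact fractions and counts pivot columns (the dimension of the image) and free columns (the dimension of the kernel). The helper `pivot_columns` is hypothetical, and counting columns this way is the coordinate shadow of the theorem rather than an independent proof of it.

```python
from fractions import Fraction


def pivot_columns(M):
    """Reduce a copy of M to echelon form exactly; return its pivot column indices."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots, r = [], 0
    for col in range(n_cols):
        if r == n_rows:
            break
        # Look for a nonzero entry in this column at or below row r.
        hit = next((i for i in range(r, n_rows) if rows[i][col] != 0), None)
        if hit is None:
            continue                      # no pivot here: a free column
        rows[r], rows[hit] = rows[hit], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    return pivots


# Differentiation on P_3 in the monomial basis {1, x, x^2, x^3}.
D = [[0, 1, 0, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 3],
     [0, 0, 0, 0]]

dim_V = len(D[0])
pivots = pivot_columns(D)
free = [c for c in range(dim_V) if c not in pivots]

dim_im, dim_ker = len(pivots), len(free)
print(dim_ker, dim_im, dim_V)
print(free)
```

The single free column is column $0$, the column of the constant polynomial $1$, which matches the kernel of differentiation being the constants.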
So what is the single thing to remember from this chapter? The matrix was never the transformation. A linear map $T:V\to W$ is the real object; choosing bases merely photographs it as a matrix, and a different basis gives a different photograph of the same map. Kernel and image are the null and column spaces; rank–nullity conserves dimension; and every finite-dimensional space is $\mathbb{R}^n$ in disguise. Hold onto the map, treat its matrix as one coordinate-dependent view, and the entire book reorganizes around the transformation as the thing that is real.
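The "two photographs of one map" idea can be checked on a small case. The sketch below takes differentiation on $\mathbb{P}_2$, written first in the monomial basis and then in the basis $\{1,\;1+x,\;1+x+x^2\}$. That second basis, the hard-coded inverse `P_inv`, and the helper names are all illustrative choices made for this example.

```python
def matmul(A, B):
    """Plain nested-list matrix product (a stand-in for your own matmul)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def trace(M):
    """Sum of the diagonal entries."""
    return sum(M[i][i] for i in range(len(M)))


# Differentiation on P_2 in the monomial basis {1, x, x^2}.
D = [[0, 1, 0],
     [0, 0, 2],
     [0, 0, 0]]

# Columns of P are the new basis vectors 1, 1+x, 1+x+x^2 in monomial coordinates.
P = [[1, 1, 1],
     [0, 1, 1],
     [0, 0, 1]]
P_inv = [[1, -1, 0],
         [0, 1, -1],
         [0, 0, 1]]

identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(P, P_inv) == identity)       # confirms the hand-computed inverse

# The same operator photographed in the new basis.
M = matmul(P_inv, matmul(D, P))
print(M)

zero = [[0] * 3 for _ in range(3)]
print(trace(D) == trace(M))               # trace is shared
print(matmul(M, matmul(M, M)) == zero)    # nilpotence is shared
```

The entries of `M` differ from those of `D`, yet its third column says $\frac{d}{dx}(1+x+x^2)=1+2x=-1\cdot 1+2\,(1+x)$, which is the same derivative read in the new coordinates.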
Where this goes. We have now freed both halves of coordinate-free linear algebra — the inner product in Chapter 34, the linear map here — and the rest of Part VII presses the operator viewpoint forward. Chapter 36 confronts the operators whose matrices cannot be made diagonal no matter which basis you choose: the defective case, repaired with generalized eigenvectors into the Jordan normal form, the closest-to-diagonal matrix an operator will allow — and the nilpotent operator $D$ of this chapter is the purest seed of that theory. Chapter 37 treats differentiation's grandest application, the matrix exponential $e^{At}$ that solves the operator equation $\mathbf{x}'=A\mathbf{x}$, where the differentiation operator and the matrix viewpoint of this chapter fuse into the engine of differential equations. The thread is unbroken: the transformation is the real object, the matrix is its shadow, and from here on we study the operator itself.
The Key Insight — Linear algebra is the study of linear transformations, and a matrix is merely how we represent one in a chosen pair of bases. Strip the coordinates and the truth stands clear: $T:V\to W$ is the invariant; $[T]_{C\leftarrow B}$ is a basis-dependent photograph; kernel and image are the null and column spaces; rank–nullity conserves dimension; and every $n$-dimensional space is $\mathbb{R}^n$ in disguise. The transformation was always the noun, and the matrix was always its shadow — which is the first theme of this book, finally stated in full.