Prerequisites
- chapter-15-dimension-basis-coordinates
Learning Objectives
- Build the change-of-basis matrix P whose columns are the new basis vectors written in the old coordinates, and explain why those columns are the natural choice.
- Convert a coordinate vector from one basis to another with [v]_new = P^{-1} [v]_old, and run the round trip back to confirm it recovers the original coordinates.
- Derive and apply the similarity formula B = P^{-1} A P that rewrites the matrix of a transformation in a new basis, and explain why the underlying transformation is unchanged.
- Diagnose the direction of P versus P^{-1} and avoid the most common change-of-basis sign-and-direction error.
- Show, with the recurring 2D visualizer, that the same transformation looks different on the standard grid and on a skewed basis grid, and that trace and determinant are basis-independent.
- Implement change_basis_matrix(old_basis, new_basis) and a coordinate converter from scratch, and verify a round trip returns the original coordinates.
In This Chapter
- 16.1 Why should the same vector have different coordinates?
- 16.2 What is the change-of-basis matrix?
- 16.3 How do you convert a coordinate vector from one basis to another?
- 16.4 What does a change of basis look like in the visualizer?
- 16.5 How does the matrix of a transformation change under a change of basis?
- 16.6 How does change of basis work in three dimensions?
- 16.7 Why is this the same transformation? (the deep idea, made rigorous)
- 16.8 Where does change of basis show up in the real world?
- 16.9 How do we build a change of basis from scratch?
- 16.10 What should you carry forward from this chapter?
Change of Basis: Same Vector, Different Coordinate Systems
Learning paths. Math majors — read everything, especially the similarity derivation in §16.5 and the Math-Major Sidebar on conjugation and equivalence relations in §16.7; the change-of-basis matrix and the relation $B = P^{-1}AP$ are the precise statements that the rest of Part V (eigenvalues, diagonalization) rests upon. CS / Data Science — focus on the Geometric Intuition callouts, the coordinate-conversion recipe in §16.3, the visualizer re-gridding in §16.4, the numpy verifications, and the two case studies (a better basis for data; a rotated frame in robotics); the idea that "the right basis makes the problem easy" is the entire premise of PCA and embeddings. Physics / Engineering — focus on the geometry of "same vector, different coordinates," the invariance of trace and determinant, and the rotated-frame application in Case Study 2. This chapter assumes the dimension, basis, and coordinate ideas of Chapter 15 and the matrix-as-function viewpoint of Chapter 7.
16.1 Why should the same vector have different coordinates?
Hold up two fingers in front of you and call the tip of one of them a point in space. That point does not change when you decide to describe it differently — and yet every number you would use to describe it depends on a choice you make. Stand facing north and you might say the point is "two steps forward, one step left." Turn forty-five degrees to your left and the very same point becomes "roughly two steps forward, about 0.7 of a step to your right." The point did not move. Your coordinate system moved, and so every coordinate moved with it. Change of basis is the precise mathematics of this everyday fact, and it is the technical heart of the most important idea in this whole book: a list of numbers is never the vector, only a description of it relative to a chosen basis.
We laid the groundwork for this in Chapter 15. There you learned that a basis is a minimal set of building-block directions for a space, and that the coordinate vector $[\mathbf{v}]_{\mathcal{B}}$ records the unique recipe — the weights $c_1, c_2, \dots, c_n$ — for building $\mathbf{v}$ out of the basis vectors $\mathbf{b}_1, \dots, \mathbf{b}_n$: $$\mathbf{v} = c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + \dots + c_n\mathbf{b}_n, \qquad [\mathbf{v}]_{\mathcal{B}} = (c_1, c_2, \dots, c_n).$$ The same arrow $\mathbf{v}$ in the plane gets one coordinate vector relative to the standard basis $\{\mathbf{e}_1, \mathbf{e}_2\}$ and a completely different coordinate vector relative to some skewed basis — even though the geometric arrow is identical. This chapter answers the obvious follow-up question: given the coordinates in one basis, how do we compute the coordinates in another? And then it asks the same question, one level up, of transformations: if a matrix $A$ describes a transformation in one basis, what matrix describes the same transformation in another basis?
The Key Insight — Numbers are not vectors; numbers are descriptions of vectors in a chosen coordinate system. The arrow $\mathbf{v}$, the transformation "rotate by 90°," the act of projecting onto a plane — these are coordinate-free objects. A coordinate vector and a matrix are the shadows those objects cast once you pick a basis to look through. Change the basis and every shadow changes shape, but the object casting it does not budge. This single distinction — object versus representation — is the threshold concept that reorganizes everything that follows, and it is recurring theme #1 of this book made operational.
This is not abstraction for its own sake. The entire reason advanced linear algebra works is that you are free to choose the coordinate system, and a clever choice can turn a hopeless-looking problem into a trivial one. A tangled transformation, written in the wrong basis, looks like a dense matrix of unrelated numbers; written in the right basis — its eigenbasis, which Chapter 23 will find for us — the very same transformation becomes a diagonal matrix that just stretches each axis independently. Principal Component Analysis (Chapter 32) is nothing but a change of basis chosen so that data looks as simple as possible. Quantum mechanics computes the same physical prediction in a position basis or a momentum basis depending on which is easier. In every one of these, the transformation never changes — only the coordinate system we describe it in does. Learn to change basis fluently and you hold the master key to the second half of this book.
There is a reason this chapter sits where it does, at the close of Part III. You have spent three parts learning to read a matrix — what its columns mean (Chapter 7), what it reaches and destroys (Chapters 13–14), how many independent directions it carries (Chapter 15). Change of basis is the moment those threads converge into a single liberating realization: the matrix was never the point. A matrix is one of infinitely many descriptions of a transformation, no more privileged than the standard basis that happened to produce it. Once you internalize that the entries of a matrix are negotiable — that you may re-grid the space and watch them rearrange — you stop treating "the matrix of $T$" as a fixed fact and start treating it as a choice you can optimize. Every major decomposition in Parts V and VI (diagonalization, the spectral theorem, the SVD) is, at heart, a particularly clever exercise of that choice. This chapter teaches the choice itself, stripped to its essentials.
A small warning about language, because it trips up careful readers. Throughout this chapter we speak of an "old basis" and a "new basis," and we usually take the old basis to be the standard one so that "old coordinates" just means "the obvious coordinates." But there is nothing special about the standard basis except familiarity — it is simply the basis in which $\mathbf{e}_1 = (1,0,\dots)$ and so on, the basis our numbers are born in. The whole machinery works between any two bases (we handle that general case in §16.3.1). When you see "$P$ converts new to old," read it as "$P$ converts coordinates in the basis you are switching to into coordinates in the basis you are switching from." The words "old" and "new" are bookkeeping labels for the direction of translation, not claims that one basis is more real than the other. Both are equally legitimate coordinate systems for the same underlying space.
Let's begin, as always, with the picture: the same arrow, seen through two grids.
16.1.1 The same arrow on two grids
Picture the plane with the ordinary square grid drawn on it — the grid of the standard basis $\mathbf{e}_1 = (1,0)$ and $\mathbf{e}_2 = (0,1)$. An arrow reaching to the point $(4, 2)$ is read off by counting grid lines: four to the right, two up. The coordinates $(4, 2)$ are literally a count of grid steps, and the grid is built from the basis vectors.
Now erase the square grid and draw a new one, built from two different basis vectors. Take $\mathbf{b}_1 = (1, 1)$ (a diagonal arrow pointing up-and-right) and $\mathbf{b}_2 = (-1, 1)$ (a diagonal arrow pointing up-and-left). These two arrows are not parallel, so they form a basis — a perfectly good, if tilted, coordinate system. Their grid is a lattice of diamonds rather than squares. The same physical arrow reaching to $(4, 2)$ now needs a new recipe: how many $\mathbf{b}_1$'s and how many $\mathbf{b}_2$'s add up to it? We will verify in §16.2, and compute directly in §16.3, that the answer is three of $\mathbf{b}_1$ and negative one of $\mathbf{b}_2$, so in the new basis the very same arrow has coordinate vector $(3, -1)$.
Geometric Intuition — The arrow to $(4,2)$ is a fixed object pinned to the plane. The two coordinate vectors $(4, 2)$ and $(3, -1)$ are not two different arrows — they are two different addresses for the one arrow, written in two different addressing schemes. The standard grid says "4 right, 2 up." The skewed grid says "3 of $\mathbf{b}_1$, minus 1 of $\mathbf{b}_2$." Both addresses lead to exactly the same point. Changing basis is changing the addressing scheme; the city does not move when you switch from street numbers to GPS coordinates.
Hold that image — one arrow, two grids, two addresses — because the rest of the chapter is a careful unpacking of it. First we will find the matrix that translates one address into the other (the change-of-basis matrix, §16.2–16.3). Then we will discover that transformations obey an analogous, slightly subtler translation law (similarity, §16.5). And throughout, the recurring 2D visualizer will let us literally re-grid a transformation and watch its matrix change while its action stays put.
16.2 What is the change-of-basis matrix?
To translate addresses between two bases, we need a machine: a matrix that eats coordinates in one basis and spits out coordinates in another. Building that machine is wonderfully direct, and the construction reveals exactly why its columns are what they are.
Let us set up the two bases cleanly. We have an old basis — for now, the standard basis $\{\mathbf{e}_1, \mathbf{e}_2\}$, the one whose coordinates are "obvious" — and a new basis $\{\mathbf{b}_1, \mathbf{b}_2\}$ that we want to switch into. The fundamental relationship we exploit is the one from Chapter 15: a coordinate vector is just a recipe for combining basis vectors. So if a vector $\mathbf{v}$ has new coordinates $[\mathbf{v}]_{\text{new}} = (c_1, c_2)$, that means $$\mathbf{v} = c_1\mathbf{b}_1 + c_2\mathbf{b}_2.$$ Now read the right-hand side in the old (standard) coordinates. The standard coordinates of $\mathbf{v}$ are just $\mathbf{v}$ itself, and the standard coordinates of $\mathbf{b}_1, \mathbf{b}_2$ are simply the entries of those vectors. Writing the linear combination as a matrix–vector product — exactly the columns-times-weights view from Chapter 7 — gives $$\underbrace{\mathbf{v}}_{[\mathbf{v}]_{\text{old}}} = c_1\mathbf{b}_1 + c_2\mathbf{b}_2 = \underbrace{\big[\,\mathbf{b}_1 \mid \mathbf{b}_2\,\big]}_{P} \begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = P\,[\mathbf{v}]_{\text{new}}.$$ There it is. The matrix $P$ whose columns are the new basis vectors $\mathbf{b}_1, \mathbf{b}_2$ written in the old coordinates converts new coordinates into old ones.
The change-of-basis matrix $P$ from the new basis to the old basis is the matrix whose columns are the new basis vectors expressed in the old coordinates: $$P = \big[\, [\mathbf{b}_1]_{\text{old}} \mid [\mathbf{b}_2]_{\text{old}} \mid \cdots \mid [\mathbf{b}_n]_{\text{old}} \,\big], \qquad [\mathbf{v}]_{\text{old}} = P\,[\mathbf{v}]_{\text{new}}.$$ When the old basis is the standard basis, $[\mathbf{b}_i]_{\text{old}} = \mathbf{b}_i$, so $P$ is simply the matrix whose columns are the new basis vectors.
Stare at the direction of that equation for a moment, because it is the source of nearly every error in this material. The matrix $P$ takes coordinates in the new basis and returns coordinates in the old basis — it converts new $\to$ old, even though we built it to "change to the new basis." That feels backwards, and there is a clean reason it is not, which we will make precise in the Common Pitfall below. For now, anchor on the construction: $P$'s columns are the new basis vectors in old coordinates, and $P$ maps new coordinates to old.
Geometric Intuition — Why are the columns the new basis vectors? Recall from Chapter 7 that the columns of any matrix are the images of the standard basis vectors. The new coordinate $(1, 0)$ means "one step along $\mathbf{b}_1$, zero along $\mathbf{b}_2$" — i.e. the vector $\mathbf{b}_1$ itself. So $P$ must send the new-coordinate vector $(1,0)$ to the old-coordinate description of $\mathbf{b}_1$. A matrix that sends $(1,0)$ to $\mathbf{b}_1$ and $(0,1)$ to $\mathbf{b}_2$ is exactly the matrix with columns $\mathbf{b}_1, \mathbf{b}_2$. The change-of-basis matrix is just the matrix that "interprets" new-basis instructions back in old-basis language.
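The column reading can be checked numerically. The following is a minimal sketch using the running pair $\mathbf{b}_1 = (1,1)$, $\mathbf{b}_2 = (-1,1)$; the variable names are illustrative choices, not part of the chapter's toolkit. It confirms that $P$ sends the new-coordinate instructions $(1,0)$ and $(0,1)$ to $\mathbf{b}_1$ and $\mathbf{b}_2$, and that a recipe $c_1\mathbf{b}_1 + c_2\mathbf{b}_2$ is the same thing as the product $P\,[\mathbf{v}]_{\text{new}}$.

```python
import numpy as np

b1 = np.array([1.0, 1.0])    # new basis vector 1, in standard coordinates
b2 = np.array([-1.0, 1.0])   # new basis vector 2, in standard coordinates
P = np.column_stack([b1, b2])

# The new-coordinate instructions (1,0) and (0,1) name b1 and b2 themselves.
image_of_10 = P @ np.array([1.0, 0.0])
image_of_01 = P @ np.array([0.0, 1.0])
print(image_of_10, image_of_01)

# A general recipe c1*b1 + c2*b2 is the same thing as the product P @ c.
c = np.array([3.0, -1.0])
as_combination = c[0] * b1 + c[1] * b2
as_product = P @ c
print(as_combination, as_product)
```

Both printed pairs should agree entry by entry, which is the columns-times-weights view of Chapter 7 applied to a basis.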
Let's build it for our running example. The new basis is $\mathbf{b}_1 = (1, 1)$ and $\mathbf{b}_2 = (-1, 1)$, given in standard coordinates. Stacking them as columns: $$P = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}.$$ This single matrix now converts any new-basis coordinate vector into its standard-basis equivalent. As a sanity check, feed it the new coordinates $(3, -1)$ that we claimed describe the arrow to $(4, 2)$: $$P\begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ -1 \end{bmatrix} = \begin{bmatrix} 3\cdot 1 + (-1)(-1) \\ 3\cdot 1 + (-1)(1) \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \end{bmatrix}. \checkmark$$ The standard coordinates come back as $(4, 2)$, confirming that $(3, -1)$ is indeed the address of our arrow in the new basis. We verified the claim; in the next section we learn to compute it directly with $P^{-1}$.
Computational Note — Building $P$ in numpy is one line: P = np.column_stack([b1, b2]). The function np.column_stack stacks 1-D arrays as the columns of a matrix, which is exactly what the definition asks for. Beware the lazy alternative np.array([b1, b2]), which stacks them as rows — giving you $P^{\mathsf{T}}$, a different (and usually wrong) matrix. The columns-are-new-basis-vectors rule is a literal instruction for the code.
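To see the row-versus-column difference concretely, here is a small sketch (the names P_cols and P_rows are illustrative). It builds the matrix both ways for the running basis and feeds each the new coordinates $(3, -1)$.

```python
import numpy as np

b1 = np.array([1.0, 1.0])
b2 = np.array([-1.0, 1.0])

P_cols = np.column_stack([b1, b2])   # basis vectors as columns: the intended P
P_rows = np.array([b1, b2])          # basis vectors as rows: this is P transposed

print(np.array_equal(P_rows, P_cols.T))

c = np.array([3.0, -1.0])            # new-basis coordinates of the arrow to (4, 2)
print(P_cols @ c)                    # standard coordinates of the arrow
print(P_rows @ c)                    # a different vector: row stacking is wrong here
```

Only the column-stacked matrix returns the arrow $(4, 2)$. The row-stacked one silently produces a valid-looking but unrelated vector, which is why this bug is easy to miss.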
16.3 How do you convert a coordinate vector from one basis to another?
The matrix $P$ converts new coordinates to old. But the question we usually face is the reverse: we have the old (standard) coordinates of a vector and we want its new coordinates. We want to go old $\to$ new. Since $P$ does new $\to$ old, the conversion we want is precisely its inverse.
Take the defining equation $[\mathbf{v}]_{\text{old}} = P\,[\mathbf{v}]_{\text{new}}$ and solve for the new coordinates. Because the columns of $P$ are a basis, they are linearly independent, so $P$ is invertible (Chapter 9 — a square matrix with independent columns has an inverse). Multiply both sides on the left by $P^{-1}$: $$P^{-1}[\mathbf{v}]_{\text{old}} = P^{-1}P\,[\mathbf{v}]_{\text{new}} = [\mathbf{v}]_{\text{new}}.$$ This is the central formula of the chapter.
The Key Insight — To convert a coordinate vector from the old basis to the new basis, multiply by the inverse of the change-of-basis matrix: $$\boxed{\;[\mathbf{v}]_{\text{new}} = P^{-1}\,[\mathbf{v}]_{\text{old}}\;}$$ where $P$ has the new basis vectors as its columns (in old coordinates). The matrix $P$ goes new $\to$ old; its inverse $P^{-1}$ goes old $\to$ new. The fact that $P$ is always invertible is exactly the fact that a basis is linearly independent — there is no change of basis without an inverse.
Let's run our example all the way through by hand. Our change-of-basis matrix and its inverse are
$$P = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \qquad P^{-1} = \frac{1}{\det P}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \\ -0.5 & 0.5 \end{bmatrix},$$
using the $2\times 2$ inverse formula from Chapter 9 (swap the diagonal, negate the off-diagonal, divide by the determinant $\det P = (1)(1) - (-1)(1) = 2$). Now convert the standard coordinates $(4, 2)$ to the new basis:
$$[\mathbf{v}]_{\text{new}} = P^{-1}\begin{bmatrix} 4 \\ 2 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \\ -0.5 & 0.5 \end{bmatrix}\begin{bmatrix} 4 \\ 2 \end{bmatrix} = \begin{bmatrix} 0.5\cdot 4 + 0.5\cdot 2 \\ -0.5\cdot 4 + 0.5\cdot 2 \end{bmatrix} = \begin{bmatrix} 3 \\ -1 \end{bmatrix}.$$
The new coordinates are $(3, -1)$ — exactly the address we promised back in §16.1. The same arrow, $(4,2)$ on the square grid, is $(3, -1)$ on the diamond grid. Now confirm with numpy, and explicitly run the round trip old $\to$ new $\to$ old to check that we land back where we started:
# Change of basis: convert standard coords (4,2) to the new basis {(1,1),(-1,1)}.
import numpy as np
b1, b2 = np.array([1., 1.]), np.array([-1., 1.]) # new basis, in OLD (standard) coords
P = np.column_stack([b1, b2]) # columns ARE the new basis vectors
print("P =\n", P) # [[ 1. -1.] [ 1. 1.]]
print("P^-1 =\n", np.linalg.inv(P)) # [[ 0.5 0.5] [-0.5 0.5]]
v_old = np.array([4., 2.]) # the arrow, in old (standard) coords
v_new = np.linalg.inv(P) @ v_old # OLD -> NEW
print("[v]_new = P^-1 [v]_old =", v_new) # [ 3. -1.]
v_back = P @ v_new # NEW -> OLD (round trip)
print("round trip P [v]_new =", v_back, # [4. 2.] -- recovered!
" ok =", np.allclose(v_back, v_old)) # True
The output reads P = [[1. -1.] [1. 1.]], P^-1 = [[0.5 0.5] [-0.5 0.5]], then [v]_new = P^-1 [v]_old = [3. -1.], and the round trip P [v]_new = [4. 2.] with ok = True. The round trip is not a formality — it is the definition of a correct change of basis: converting to the new basis and back must be the identity, because $P P^{-1} = I$. Whenever you write your own coordinate converter (see the Build Your Toolkit callout in §16.9), the round-trip check is the single most valuable test you can run.
Common Pitfall — "To change INTO the new basis I multiply by $P$." This is the error that derails everyone, and it is worth slowing down for. The matrix $P$ — whose columns are the new basis vectors — converts new coordinates to old, not old to new. To go into the new basis (old $\to$ new), you need $P^{-1}$. The mnemonic that fixes it permanently: $P$ holds the new basis vectors, so $P$ knows how to "speak old" given a "new" instruction — feed it new, it answers in old. To reverse the translation, invert it. If your converted coordinates look wrong, the overwhelmingly likely cause is that you used $P$ where you needed $P^{-1}$ (or stacked the basis as rows instead of columns). Test the direction on a basis vector: $\mathbf{b}_1$ should have new coordinates $(1, 0)$ exactly — if your formula does not give that, the direction is flipped.
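The basis-vector test at the end of the pitfall is easy to automate. The sketch below is one possible way to do it; the helper name check_direction is hypothetical and not part of the chapter's toolkit. It asks whether a candidate converter sends each new basis vector to the matching standard unit vector, which is what an old $\to$ new converter must do.

```python
import numpy as np

def check_direction(P, convert):
    """Return True if `convert` sends column i of P to the i-th unit vector."""
    n = P.shape[1]
    identity = np.eye(n)
    return all(np.allclose(convert(P[:, i]), identity[i]) for i in range(n))

P = np.column_stack([np.array([1.0, 1.0]), np.array([-1.0, 1.0])])

def old_to_new(v_old):
    # Solving P c = v_old is equivalent to computing P^-1 @ v_old.
    return np.linalg.solve(P, v_old)

def pitfall(v_old):
    # The common mistake: multiplying by P, which actually goes new -> old.
    return P @ v_old

print(check_direction(P, old_to_new))
print(check_direction(P, pitfall))
```

A converter that passes this check has the direction right for every vector, because a linear map is determined by what it does to a basis.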
Check Your Understanding — Using the same $P$ above, what are the new-basis coordinates of the standard vector $\mathbf{v} = (2, 0)$? Compute $P^{-1}\mathbf{v}$ by hand.
Answer
$[\mathbf{v}]_{\text{new}} = P^{-1}\begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \\ -0.5 & 0.5 \end{bmatrix}\begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$. So the arrow $(2,0)$ is described in the skewed basis as $1\cdot\mathbf{b}_1 + (-1)\cdot\mathbf{b}_2 = (1,1) - (-1,1) = (2, 0)$ — check. The arrow pointing two steps east is, in the diamond grid, "one of $\mathbf{b}_1$ minus one of $\mathbf{b}_2$." Same arrow, different address.
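The hand computation can be confirmed in numpy. As a side note, numerical code often solves the system $P\,\mathbf{c} = \mathbf{v}$ with np.linalg.solve instead of forming $P^{-1}$ explicitly; the two routes agree here, and the sketch below shows both under that assumption.

```python
import numpy as np

P = np.column_stack([np.array([1.0, 1.0]), np.array([-1.0, 1.0])])
v_old = np.array([2.0, 0.0])

via_inverse = np.linalg.inv(P) @ v_old    # the formula [v]_new = P^-1 [v]_old
via_solve = np.linalg.solve(P, v_old)     # solves P c = v_old for c directly

print(via_inverse, via_solve)
print(np.allclose(P @ via_solve, v_old))  # round trip back to old coordinates
```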
16.3.1 Converting between two non-standard bases
So far the old basis was the friendly standard one. What if neither basis is standard — you have coordinates in some basis $\mathcal{U}$ and want them in another basis $\mathcal{W}$? The same idea chains together. Let $P_{\mathcal{U}}$ be the matrix whose columns are the $\mathcal{U}$-basis vectors in standard coordinates, and $P_{\mathcal{W}}$ likewise for $\mathcal{W}$. Then $P_{\mathcal{U}}$ converts $\mathcal{U}$-coordinates $\to$ standard, and $P_{\mathcal{W}}^{-1}$ converts standard $\to$ $\mathcal{W}$-coordinates. Compose them: $$[\mathbf{v}]_{\mathcal{W}} = P_{\mathcal{W}}^{-1}\,P_{\mathcal{U}}\,[\mathbf{v}]_{\mathcal{U}}.$$ Read right to left, as always with composition (Chapter 8): first $P_{\mathcal{U}}$ lifts the $\mathcal{U}$-coordinates up to the common standard language, then $P_{\mathcal{W}}^{-1}$ brings them down into $\mathcal{W}$-coordinates. The standard basis is the hub through which the translation routes. The single matrix $M = P_{\mathcal{W}}^{-1}P_{\mathcal{U}}$ is the change-of-basis matrix directly from $\mathcal{U}$ to $\mathcal{W}$.
# Convert between two NON-standard bases via the standard-basis hub.
import numpy as np
u1, u2 = np.array([2., 0.]), np.array([0., 1.]) # old basis U (in standard coords)
w1, w2 = np.array([1., 1.]), np.array([-1., 1.]) # new basis W (in standard coords)
Pu, Pw = np.column_stack([u1, u2]), np.column_stack([w1, w2])
M = np.linalg.inv(Pw) @ Pu # U-coords -> W-coords directly
print("M = Pw^-1 Pu =\n", M) # [[ 1. 0.5] [-1. 0.5]]
v_U = np.array([1., 1.]) # coords of v in basis U
v_W = M @ v_U # same vector, coords in basis W
print("[v]_W =", v_W) # [ 1.5 -0.5]
# Sanity: the actual standard vector is Pu @ v_U = (2,1); and Pw^-1 (2,1) = (1.5,-0.5).
print("standard vector =", Pu @ v_U) # [2. 1.]
The output is M = [[1. 0.5] [-1. 0.5]] and [v]_W = [1.5 -0.5]. The vector that is $(1,1)$ in basis $\mathcal{U}$ is the physical arrow $(2, 1)$ in standard coordinates, and that same arrow is $(1.5, -0.5)$ in basis $\mathcal{W}$. Three addresses, one arrow. Notice the special case: when the old basis is standard, $P_{\mathcal{U}} = I$ and the formula collapses to $[\mathbf{v}]_{\mathcal{W}} = P_{\mathcal{W}}^{-1}[\mathbf{v}]_{\text{std}}$ — exactly the boxed formula of §16.3. Everything is one idea seen at different angles.
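Two further checks on $M$ are worth sketching, with the same bases $\mathcal{U}$ and $\mathcal{W}$ as above (the name M_back is illustrative). First, the columns of $M$ are the $\mathcal{U}$ basis vectors written in $\mathcal{W}$-coordinates, mirroring the columns rule of §16.2. Second, the reverse conversion $\mathcal{W} \to \mathcal{U}$ is $P_{\mathcal{U}}^{-1}P_{\mathcal{W}}$, and it undoes $M$.

```python
import numpy as np

Pu = np.column_stack([np.array([2.0, 0.0]), np.array([0.0, 1.0])])    # basis U
Pw = np.column_stack([np.array([1.0, 1.0]), np.array([-1.0, 1.0])])   # basis W

M = np.linalg.solve(Pw, Pu)        # same matrix as inv(Pw) @ Pu: U-coords -> W-coords
M_back = np.linalg.solve(Pu, Pw)   # the reverse direction: W-coords -> U-coords

# Column 0 of M is the first U basis vector written in W-coordinates.
u1_in_W = np.linalg.solve(Pw, Pu[:, 0])
print(np.allclose(M[:, 0], u1_in_W))

# Going U -> W -> U must be the identity.
print(np.allclose(M_back @ M, np.eye(2)))

v_U = np.array([1.0, 1.0])
v_W = M @ v_U
print(v_W, M_back @ v_W)
```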
16.3.2 A second worked conversion: a genuinely skewed basis
Our diamond basis $\{(1,1),(-1,1)\}$ was perpendicular — its two vectors meet at a right angle, which made the picture especially tidy and (we will see) made $P^{-1}$ look almost like $P^{\mathsf{T}}$. But a basis need not be perpendicular; it only needs to be independent. To make sure the recipe does not secretly rely on right angles, let's convert in a deliberately skewed (oblique) basis whose vectors are neither perpendicular nor equal-length: $$\mathbf{b}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \qquad \mathbf{b}_2 = \begin{bmatrix} 1 \\ 3 \end{bmatrix}.$$ These are independent (neither is a multiple of the other), so they form a basis. Their grid is a lattice of slanted parallelograms — not diamonds, not squares. Stack them as columns and invert, using $\det P = (2)(3) - (1)(1) = 5$ and the $2\times 2$ inverse formula: $$P = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}, \qquad P^{-1} = \frac{1}{5}\begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 0.6 & -0.2 \\ -0.2 & 0.4 \end{bmatrix}.$$ Convert the standard vector $\mathbf{v} = (5, 5)$ to this skewed basis: $$[\mathbf{v}]_{\text{new}} = P^{-1}\begin{bmatrix} 5 \\ 5 \end{bmatrix} = \begin{bmatrix} 0.6 & -0.2 \\ -0.2 & 0.4 \end{bmatrix}\begin{bmatrix} 5 \\ 5 \end{bmatrix} = \begin{bmatrix} 3 - 1 \\ -1 + 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$ So in the skewed basis, the arrow to $(5,5)$ is described as $2\mathbf{b}_1 + 1\mathbf{b}_2$. Check it: $2(2,1) + 1(1,3) = (4+1,\ 2+3) = (5,5)$. ✓
# A skewed (non-perpendicular) basis: the recipe does not need right angles.
import numpy as np
b1, b2 = np.array([2., 1.]), np.array([1., 3.]) # oblique basis (not perpendicular)
P = np.column_stack([b1, b2])
print("det P =", round(float(np.linalg.det(P)))) # 5
v_old = np.array([5., 5.])
v_new = np.linalg.inv(P) @ v_old
print("[v]_new =", v_new) # [2. 1.]
print("round trip =", P @ v_new, # [5. 5.]
" ok =", np.allclose(P @ v_new, v_old)) # True
The output is det P = 5, [v]_new = [2. 1.], and the round trip recovers [5. 5.]. Nothing about the construction assumed perpendicularity: $P$'s columns are the new basis vectors, $P^{-1}$ converts old to new, and the round trip closes — exactly as before. Orthogonality is a convenience, not a requirement, for change of basis. (When the basis is orthonormal, a beautiful simplification appears — $P^{-1} = P^{\mathsf{T}}$, so the inversion becomes a free transpose — but that is the special subject of orthogonal matrices in Chapter 21, and it is a bonus, not a prerequisite. For a general basis you genuinely invert $P$.)
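The transpose shortcut mentioned in the parenthesis can be previewed with a short sketch (variable names are illustrative). It compares three bases: the diamond basis rescaled to unit length, the original diamond basis, and the oblique basis of this section.

```python
import numpy as np

b1 = np.array([1.0, 1.0])
b2 = np.array([-1.0, 1.0])

# Rescale the perpendicular diamond basis so each vector has length 1.
Q = np.column_stack([b1 / np.linalg.norm(b1), b2 / np.linalg.norm(b2)])
print(np.allclose(np.linalg.inv(Q), Q.T))      # orthonormal columns: inverse equals transpose

# The original diamond basis is perpendicular but not unit length.
P = np.column_stack([b1, b2])
print(np.allclose(np.linalg.inv(P), P.T))      # not equal
print(np.allclose(np.linalg.inv(P), P.T / 2))  # equal up to the squared length 2

# The oblique basis has no transpose shortcut at all.
S = np.column_stack([np.array([2.0, 1.0]), np.array([1.0, 3.0])])
print(np.allclose(np.linalg.inv(S), S.T))
```

This makes precise the earlier remark that $P^{-1}$ "looks almost like" $P^{\mathsf{T}}$ for the diamond basis: the two differ only by the factor $\tfrac{1}{2}$ that comes from the basis vectors having squared length $2$.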
16.4 What does a change of basis look like in the visualizer?
This is the chapter's promised return of the recurring 2D visualizer from Chapter 1, and it is the most important picture in the chapter. The visualizer draws the unit square and its image under a $2 \times 2$ matrix, along with the images of the basis vectors. Its standard grid is the standard basis. We are going to use it to make the abstract claim of §16.1 visible: the very same transformation, drawn against two different basis grids, has two different matrices.
First, recall the frozen tool exactly as introduced in Chapter 1 — we reuse it verbatim, changing only the matrix and the narration, so that every figure in the book looks identical:
# toolkit/visualizer.py — the recurring 2D transformation visualizer.
# Shows what a 2x2 matrix A does to the unit square and the basis vectors.
import numpy as np
import matplotlib.pyplot as plt
def visualize_2d(A, title="", ax=None):
    """Plot the action of 2x2 matrix A on the unit square and i-hat, j-hat."""
    A = np.asarray(A, dtype=float)
    square = np.array([[0, 1, 1, 0, 0],
                       [0, 0, 1, 1, 0]])  # unit-square corners (closed)
    out = A @ square  # transformed square
    e1, e2 = A @ np.array([1, 0]), A @ np.array([0, 1])  # images of basis vectors
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))
    ax.plot(square[0], square[1], "b--", lw=1, label="input (unit square)")
    ax.fill(out[0], out[1], alpha=0.25, color="C1")
    ax.plot(out[0], out[1], "C1-", lw=2, label="A . (unit square)")
    ax.arrow(0, 0, *e1, color="C3", width=0.02, length_includes_head=True)  # A e1
    ax.arrow(0, 0, *e2, color="C2", width=0.02, length_includes_head=True)  # A e2
    ax.axhline(0, color="gray", lw=0.5)
    ax.axvline(0, color="gray", lw=0.5)
    ax.set_aspect("equal")
    ax.grid(True, alpha=0.3)
    ax.set_title(title or f"det = {np.linalg.det(A):.2f}")
    ax.legend(loc="best", fontsize=8)
    return ax
# Example: a horizontal shear
# visualize_2d([[1, 1], [0, 1]], title="Shear")
# plt.show()
Now choose a concrete transformation to look at: the one represented in the standard basis by $$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.$$ Geometrically (Chapter 7) this stretches space — it pulls the unit square into a tilted parallelogram, pushing outward most strongly along the diagonal direction $(1,1)$. That outward-along-the-diagonal behavior is a clue we will cash in shortly. Drawn against the standard square grid with the visualizer, it produces Figure 16.1.
# Using the visualizer from Chapter 1: the transformation A on the STANDARD grid.
import numpy as np
from toolkit.visualizer import visualize_2d # the FROZEN 2D visualizer (Ch.1)
A = np.array([[2, 1],
[1, 2]], dtype=float)
print("det(A) =", round(float(np.linalg.det(A)), 2)) # 3.0
ax = visualize_2d(A, title="A = [[2,1],[1,2]] on the standard grid")
# plt.show()
Figure 16.1 — The transformation on the standard grid. The blue dashed unit square is stretched by $A$ into the orange parallelogram; the red and green arrows are the images of $\mathbf{e}_1 = (1,0)$ and $\mathbf{e}_2 = (0,1)$, landing at the columns $(2,1)$ and $(1,2)$. The determinant is $3$, so areas triple. Alt-text: a unit square mapped to a parallelogram stretched along the up-right diagonal, with the two transformed basis arrows pointing to (2,1) and (1,2).
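The numbers in the caption can be reproduced without plotting. The sketch below computes the corners of the parallelogram and checks that its area matches the determinant; the helper shoelace_area is an illustrative addition, not part of the frozen visualizer.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Corners of the unit square as columns, listed counterclockwise.
square = np.array([[0.0, 1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])
image = A @ square
print(image.T)   # corners of the parallelogram, one per row

def shoelace_area(pts):
    """Area of a polygon whose vertices are the columns of pts, in order."""
    x, y = pts
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

print(shoelace_area(square), shoelace_area(image), np.linalg.det(A))
```

The corners at $(2,1)$ and $(1,2)$ are the columns of $A$, and the far corner is their sum.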
Here is the move that makes the chapter. Choose a new basis aligned to what the transformation actually does — the diagonal directions $\mathbf{b}_1 = (1,1)$ and $\mathbf{b}_2 = (-1,1)$, our running pair. Against this grid, the transformation will turn out to have a much simpler matrix. We compute that matrix properly in §16.5 (it is the similarity $B = P^{-1}AP$); here we just look at it. Re-grid the same transformation by feeding the visualizer the new matrix $B$:
# Re-gridding: the SAME transformation A, viewed in the new basis {(1,1),(-1,1)}.
import numpy as np
from toolkit.visualizer import visualize_2d # SAME frozen visualizer, verbatim
A = np.array([[2, 1], [1, 2]], dtype=float)
P = np.column_stack([np.array([1., 1.]), np.array([-1., 1.])]) # new basis as columns
B = np.linalg.inv(P) @ A @ P # A re-expressed in the new basis
print("B = P^-1 A P =\n", B) # [[3. 0.] [0. 1.]] -- DIAGONAL!
ax = visualize_2d(B, title="Same transformation, new-basis coordinates: B")
# plt.show()
The printed matrix is $$B = P^{-1}AP = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$ The dense, off-diagonal-laden $A = \begin{bmatrix}2&1\\1&2\end{bmatrix}$ has become the diagonal matrix $\begin{bmatrix}3&0\\0&1\end{bmatrix}$. In the new coordinate system, the transformation does something almost embarrassingly simple: it stretches the first new-axis by a factor of $3$ and leaves the second new-axis unchanged. That is the whole transformation, laid bare. The complexity in $A$ was never in the transformation — it was an artifact of describing a diagonal-along-the-diagonals stretch using the wrong (standard) grid.
Figure 16.2 — The same transformation, re-gridded. The visualizer applied to $B = \begin{bmatrix}3&0\\0&1\end{bmatrix}$: the unit square stretches to a $3 \times 1$ rectangle along the new axes — a pure axis-aligned scaling. This is the identical transformation as Figure 16.1, merely described in the basis $\{(1,1),(-1,1)\}$. The determinant is still $3$ (areas still triple). Alt-text: a unit square mapped to a 3-by-1 rectangle, the two transformed basis arrows pointing straight along the axes with lengths 3 and 1.
Geometric Intuition — Figures 16.1 and 16.2 are pictures of the same physical stretch of the plane. The transformation pulls space outward along $(1,1)$ by a factor of $3$ and does nothing along $(-1,1)$. On the standard grid (Figure 16.1) this diagonal stretch looks like a lopsided parallelogram with cross-terms in the matrix. On a grid aligned to the stretch directions (Figure 16.2) it looks like what it is: a clean $3\times 1$ rectangle, a diagonal matrix. We did not change the transformation. We changed the graph paper, and the right graph paper revealed the transformation's true, simple nature. This is the entire promise of diagonalization (Chapter 25): find the basis in which the matrix is diagonal, and you have found the basis in which the transformation is obvious.
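One way to convince yourself that the two figures show a single transformation is to act on the same arrow by both routes. This sketch (variable names illustrative) applies $A$ directly in standard coordinates, then takes the detour of converting to the new basis, applying $B$, and converting back. It also rebuilds $A$ from $B$ by undoing the re-gridding.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
P = np.column_stack([np.array([1.0, 1.0]), np.array([-1.0, 1.0])])
B = np.diag([3.0, 1.0])              # the new-basis matrix found by the re-gridding

# Undo the re-gridding: P B P^-1 should give back the standard-basis matrix.
A_rebuilt = P @ B @ np.linalg.inv(P)
print(np.allclose(A_rebuilt, A))

# Apply the transformation to one arrow by two routes.
v_old = np.array([4.0, 2.0])
direct = A @ v_old                   # act in standard coordinates
v_new = np.linalg.solve(P, v_old)    # old -> new
detour = P @ (B @ v_new)             # act in new coordinates, then new -> old
print(direct, detour)
```

Both routes should land on the same arrow, which is the statement that $A$ and $B$ describe one transformation in two coordinate systems.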
It is no accident that the magic basis $\{(1,1),(-1,1)\}$ made the matrix diagonal. Those two directions are precisely the ones the transformation stretches without rotating — it scales $(1,1)$ by $3$ and $(-1,1)$ by $1$, never tilting them. Directions a matrix scales-without-rotating are called eigenvectors, and the scaling factors are eigenvalues (Chapter 23). The diagonal entries $3$ and $1$ of $B$ are exactly those eigenvalues. So this whole "re-grid to simplify" picture is a preview of the deepest machinery in the book: the eigenbasis is the coordinate system in which a transformation is diagonal. We will not need that vocabulary again until Part V — but you have now seen the phenomenon, and seen it through the same visualizer that has shown you shears, rotations, and collapses.
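The scale-without-rotating claim takes two lines to check. The sketch below also calls np.linalg.eigh, numpy's routine for symmetric matrices, as a preview; it returns eigenvalues in ascending order and unit-length eigenvectors as columns, each determined only up to sign.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b1 = np.array([1.0, 1.0])
b2 = np.array([-1.0, 1.0])

print(A @ b1)   # compare with 3 * b1
print(A @ b2)   # compare with 1 * b2

# A is symmetric, so eigh applies.
eigenvalues, eigenvectors = np.linalg.eigh(A)
print(eigenvalues)
print(eigenvectors)   # columns are parallel to b2 and b1, rescaled to length 1
```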
16.4.1 A cautionary re-grid: not every basis simplifies
The previous example might leave the impression that changing basis always simplifies a matrix. It does not. A change of basis is an honest re-description, and a bad choice of basis makes the matrix uglier, not cleaner. The simplification in Figure 16.2 happened only because we picked the one special basis aligned to the transformation. To inoculate yourself against the wrong lesson, watch a different transformation — the orthogonal projection onto the $x$-axis — re-gridded into the same skewed basis $\{(1,1),(-1,1)\}$.
In the standard basis, projection onto the $x$-axis has the transparent matrix $A = \begin{bmatrix}1&0\\0&0\end{bmatrix}$: it keeps the $x$-coordinate and zeroes the $y$-coordinate (Chapter 7). It is already diagonal in the standard basis — the standard basis is its eigenbasis. Now re-grid it through our diamond basis, which is the wrong basis for this transformation:
# Re-gridding a projection into the WRONG basis makes its matrix less obvious.
import numpy as np
from toolkit.visualizer import visualize_2d # SAME frozen visualizer, verbatim
A = np.array([[1., 0.], [0., 0.]]) # projection onto the x-axis
P = np.column_stack([np.array([1., 1.]), np.array([-1., 1.])])
B = np.linalg.inv(P) @ A @ P
print("B = P^-1 A P =\n", B) # [[ 0.5 -0.5] [-0.5 0.5]]
print("still a projection? B@B == B :", np.allclose(B @ B, B)) # True
ax = visualize_2d(B, title="A projection, re-gridded into a skewed basis")
# plt.show()
The clean diagonal $\begin{bmatrix}1&0\\0&0\end{bmatrix}$ becomes the cluttered $B = \begin{bmatrix}0.5&-0.5\\-0.5&0.5\end{bmatrix}$. Same transformation — it still squashes the plane onto a line, and the numpy check confirms it is still a projection ($B^2 = B$, the algebraic signature of "projecting twice is the same as projecting once"). But in the diamond coordinates the matrix has acquired off-diagonal cross-terms and looks nothing like a projection at a glance. The trace is still $1$ and the determinant still $0$ (a projection collapses area, so $\det = 0$ in every basis), as the invariants of §16.5.2 guarantee.
Common Pitfall — "Changing basis is a way to simplify a matrix, so the new matrix should always be cleaner." The projection example refutes this directly: re-gridding made the matrix worse. Only the right basis (the eigenbasis) simplifies a given transformation; a generic basis just produces a different, often messier, matrix that happens to share the invariants. The skill of Part V is not "change basis" — that is mechanical — but "change to the correct basis," which is the entire content of finding eigenvectors. A basis that diagonalizes $\begin{bmatrix}2&1\\1&2\end{bmatrix}$ is exactly the wrong basis for $\begin{bmatrix}1&0\\0&0\end{bmatrix}$, and vice versa: each transformation has its own preferred coordinate system.
16.5 How does the matrix of a transformation change under a change of basis?
We have changed the coordinates of vectors. Now we change the coordinates of transformations — and this is where the chapter earns its place in Part III, because the result, similarity, is the doorway to the entire eigenvalue theory ahead. We will derive the formula $B = P^{-1}AP$ slowly and geometrically, the way the book always does, so that it never feels like a memorized incantation.
Set the stage precisely. A transformation $T$ acts on the plane. In the old (standard) basis, $T$ is represented by the matrix $A$: feeding $T$ a vector with old coordinates $[\mathbf{x}]_{\text{old}}$ produces an output with old coordinates $$[T\mathbf{x}]_{\text{old}} = A\,[\mathbf{x}]_{\text{old}}.$$ (This is the defining property of "$A$ represents $T$ in the old basis," straight from Chapter 7.) We want the matrix $B$ that represents the same $T$ in the new basis — the matrix for which $$[T\mathbf{x}]_{\text{new}} = B\,[\mathbf{x}]_{\text{new}}$$ holds for every $\mathbf{x}$. The question is: what is $B$ in terms of $A$ and $P$?
The derivation is a three-step journey, and the geometry of each step is the punch line. We start with a vector's new coordinates and want its image's new coordinates. We have only one tool that knows how $T$ acts — the matrix $A$ — and that tool speaks old coordinates. So we must translate to old, act, and translate back:
Step 1 — Translate the input from new coordinates to old. We have $[\mathbf{x}]_{\text{new}}$ and need $[\mathbf{x}]_{\text{old}}$ so that $A$ can act on it. By the change-of-basis formula of §16.2, $[\mathbf{x}]_{\text{old}} = P\,[\mathbf{x}]_{\text{new}}$.
Step 2 — Apply the transformation in old coordinates. Now that the input is in old coordinates, $A$ does its job: $[T\mathbf{x}]_{\text{old}} = A\,[\mathbf{x}]_{\text{old}} = A\,P\,[\mathbf{x}]_{\text{new}}$.
Step 3 — Translate the output from old coordinates back to new. We have the output in old coordinates but want it in new coordinates. By §16.3, converting old $\to$ new means multiplying by $P^{-1}$: $[T\mathbf{x}]_{\text{new}} = P^{-1}\,[T\mathbf{x}]_{\text{old}} = P^{-1}AP\,[\mathbf{x}]_{\text{new}}$.
Chaining the three steps, $$[T\mathbf{x}]_{\text{new}} = \underbrace{P^{-1}}_{\text{back to new}}\;\underbrace{A}_{\text{act}}\;\underbrace{P}_{\text{to old}}\;[\mathbf{x}]_{\text{new}}.$$ Comparing with the definition $[T\mathbf{x}]_{\text{new}} = B\,[\mathbf{x}]_{\text{new}}$, we read off the matrix of $T$ in the new basis.
The matrix of a transformation under a change of basis (similarity). If $A$ represents a transformation $T$ in the old basis and $P$ is the change-of-basis matrix (new basis vectors as columns, in old coordinates), then the matrix of the same $T$ in the new basis is $$\boxed{\;B = P^{-1}AP\;}$$ Two matrices $A$ and $B$ related this way by an invertible $P$ are called similar, and they represent the same transformation in two different bases. The operation $A \mapsto P^{-1}AP$ is called conjugation by $P$.
The Key Insight — Read $B = P^{-1}AP$ as a recipe in three verbs: go to old coordinates ($P$), act ($A$), come back to new coordinates ($P^{-1}$). That is why the conjugation has $P$ on the right and $P^{-1}$ on the left — you read the product right-to-left, the order in which the steps happen (Chapter 8: matrix products compose right-to-left). The middle matrix $A$ is the real transformation; the outer $P$ and $P^{-1}$ are just the round-trip translation into and out of the coordinate system you want to view it in. Similar matrices are the same actor in different costumes.
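The three verbs can be run one at a time and compared against the single matrix $B$. This is a minimal sketch, assuming a randomly generated $A$ (with a fixed seed) and the diamond basis as $P$; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, so the run is repeatable

A = rng.normal(size=(2, 2))          # some transformation, in old coordinates
P = np.array([[1., -1.], [1., 1.]])  # new basis vectors as columns (the diamond basis)
P_inv = np.linalg.inv(P)
B = P_inv @ A @ P                    # candidate matrix of T in the new basis

for _ in range(5):
    x_new = rng.normal(size=2)
    x_old = P @ x_new                # Step 1: new -> old
    Tx_old = A @ x_old               # Step 2: act in old coordinates
    Tx_new = P_inv @ Tx_old          # Step 3: old -> new
    assert np.allclose(Tx_new, B @ x_new)

# Swapping the roles of P and P^-1 gives a different matrix for this A.
wrong = P @ A @ P_inv
print(np.allclose(wrong, B))
```

The last line is a small guard against the direction error: $PAP^{-1}$ is also a conjugation, but it describes a change of basis in the opposite direction and in general does not equal $B$.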
16.5.1 Worked example: a 2×2 similarity, step by step
Let's run the derivation on the transformation from §16.4 and confirm it produces the diagonal $B$ we already saw. We have, in the standard basis,
$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad P = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}, \qquad P^{-1} = \begin{bmatrix} 0.5 & 0.5 \\ -0.5 & 0.5 \end{bmatrix}.$$
Compute $B = P^{-1}AP$ in two matrix multiplications. First $AP$ (apply $A$ to each column of $P$, which is $A$ applied to each new basis vector — a meaningful intermediate):
$$AP = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2{\cdot}1 + 1{\cdot}1 & 2(-1) + 1{\cdot}1 \\ 1{\cdot}1 + 2{\cdot}1 & 1(-1) + 2{\cdot}1 \end{bmatrix} = \begin{bmatrix} 3 & -1 \\ 3 & 1 \end{bmatrix}.$$
Pause on those columns. The first column $(3, 3) = 3(1,1) = 3\mathbf{b}_1$: the transformation sends $\mathbf{b}_1$ to three times itself. The second column $(-1, 1) = 1\cdot(-1,1) = 1\mathbf{b}_2$: it sends $\mathbf{b}_2$ to itself. The new basis vectors are merely scaled, not rotated — the eigenvector behavior promised in §16.4. Now finish with $P^{-1}(AP)$:
$$B = P^{-1}AP = \begin{bmatrix} 0.5 & 0.5 \\ -0.5 & 0.5 \end{bmatrix}\begin{bmatrix} 3 & -1 \\ 3 & 1 \end{bmatrix} = \begin{bmatrix} 0.5{\cdot}3 + 0.5{\cdot}3 & 0.5(-1) + 0.5{\cdot}1 \\ -0.5{\cdot}3 + 0.5{\cdot}3 & -0.5(-1) + 0.5{\cdot}1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$
The diagonal matrix $\begin{bmatrix}3&0\\0&1\end{bmatrix}$, exactly as Figure 16.2 showed. The hand computation and the picture agree. Confirm with numpy:
# Similarity B = P^-1 A P by hand vs numpy; check trace and determinant are preserved.
import numpy as np
A = np.array([[2., 1.], [1., 2.]])
P = np.array([[1., -1.], [1., 1.]])
B = np.linalg.inv(P) @ A @ P
print("B = P^-1 A P =\n", B) # [[3. 0.] [0. 1.]]
print("trace: A =", np.trace(A), " B =", np.trace(B)) # 4.0 4.0
print("det: A =", round(np.linalg.det(A), 6),
" B =", round(np.linalg.det(B), 6)) # 3.0 3.0
The output is B = [[3. 0.] [0. 1.]], with trace A = 4.0, B = 4.0 and det A = 3.0, B = 3.0. Two facts to register. The diagonal entries $3$ and $1$ sum to the trace $4$ and multiply to the determinant $3$ — and both quantities are identical for $A$ and $B$. That is not a coincidence; it is the next big idea.
16.5.2 What is preserved? Trace, determinant, and the meaning of "invariant"
If $A$ and $B$ are the same transformation in different costumes, then any genuine, coordinate-free property of the transformation must come out the same whether you compute it from $A$ or from $B$. Such properties are called invariants of the matrix under similarity. The two you already know are the determinant and the trace.
The determinant is invariant because $\det$ is multiplicative (Chapter 11): for similar matrices, $$\det(B) = \det(P^{-1}AP) = \det(P^{-1})\det(A)\det(P) = \frac{1}{\det P}\det(A)\,\det(P) = \det(A).$$ The $\det(P)$ and $\det(P^{-1}) = 1/\det(P)$ cancel exactly. Geometrically this is obvious in hindsight: the determinant is the area-scaling factor of the transformation (Chapter 11), and how much a transformation scales area cannot depend on the grid you draw — area is area. Figures 16.1 and 16.2 both triple area, and indeed both matrices have determinant $3$.
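The cancellation is worth watching numerically with a $P$ whose determinant is not $1$. The sketch below assumes an arbitrary $3\times 3$ matrix $P$ chosen by hand so that $\det P = 7$, and a random $A$; neither comes from the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))              # any transformation will do
P = np.array([[2., 1., 0.],
              [0., 1., 1.],
              [1., 0., 3.]])             # det P = 7, so P is invertible
P_inv = np.linalg.inv(P)
B = P_inv @ A @ P

det_P = np.linalg.det(P)
det_P_inv = np.linalg.det(P_inv)
print(det_P * det_P_inv)                     # the two outer factors cancel (approximately 1)
print(np.linalg.det(A), np.linalg.det(B))    # equal up to rounding
```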
The trace is invariant too. This rests on a small but useful fact, the cyclic property of the trace: $\operatorname{tr}(XY) = \operatorname{tr}(YX)$ for any conformable $X, Y$ (a one-line index computation; see the exercises). Grouping $B = (P^{-1})(AP)$ and cycling, $$\operatorname{tr}(B) = \operatorname{tr}(P^{-1}AP) = \operatorname{tr}\big((AP)P^{-1}\big) = \operatorname{tr}(A\,PP^{-1}) = \operatorname{tr}(A\,I) = \operatorname{tr}(A).$$ So the trace, like the determinant, is a property of the transformation, not of any one matrix that represents it.
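A quick numerical check of the cyclic property, and of its use above with $X = P^{-1}$ and $Y = AP$. This is a sketch under assumptions: the rectangular $X$ and $Y$ are random, and $P$ is an arbitrary invertible matrix picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# The cyclic property holds even for rectangular factors, as long as both products exist.
X = rng.normal(size=(2, 3))
Y = rng.normal(size=(3, 2))
print(np.trace(X @ Y), np.trace(Y @ X))   # a 2x2 trace and a 3x3 trace, equal

# Applied to similarity, with X = P^-1 and Y = A P.
A = np.array([[2., 1.], [1., 2.]])
P = np.array([[2., 1.], [1., 0.]])        # an arbitrary invertible matrix
P_inv = np.linalg.inv(P)
print(np.trace(P_inv @ (A @ P)))          # tr(P^-1 (A P))
print(np.trace((A @ P) @ P_inv))          # tr((A P) P^-1) = tr(A)
```

Note that the property only lets you cycle factors around, not reorder them arbitrarily: $\operatorname{tr}(XYZ) = \operatorname{tr}(ZXY)$, but $\operatorname{tr}(XYZ)$ and $\operatorname{tr}(XZY)$ can differ.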
The Key Insight — Similar matrices share their determinant, trace, rank, characteristic polynomial, and eigenvalues — because all of these are properties of the underlying transformation, not of the coordinate system. When you meet eigenvalues in Chapter 23, you will see they are the deepest similarity invariant of all: they are the transformation's intrinsic stretch factors, visible in every basis, and they read straight off the diagonal once you find the basis (the eigenbasis) that diagonalizes the matrix. Trace and determinant are the easy, early invariants — the sum and product of those eigenvalues. Anything you can compute that comes out basis-independent is telling you something true about the transformation itself.
Common Pitfall — "$B = P^{-1}AP$ means $B$ and $A$ are equal, just rearranged." No — similar is much weaker than equal. Similar matrices are generally different matrices with different entries (here $\begin{bmatrix}2&1\\1&2\end{bmatrix}$ versus $\begin{bmatrix}3&0\\0&1\end{bmatrix}$); they merely agree on every coordinate-free invariant. Equal matrices are similar (take $P = I$), but the converse fails badly. Conversely, two matrices with the same trace and determinant are not automatically similar — those two invariants are necessary, not sufficient (the exercises give a counterexample). Similarity is an honest equivalence relation that is finer than "share trace and det."
16.5.3 When the new basis is NOT special: similarity without diagonalization
The diagonal $B$ in our example was a happy accident of choosing the eigenbasis. To drive home that similarity is a general operation — and that $B$ is usually not diagonal — let's conjugate the same $A$ by a different, unremarkable basis $\mathcal{Q}$ with vectors $(2,1)$ and $(1,0)$:
# Conjugate the same A by a NON-eigen basis: B is similar to A but not diagonal.
import numpy as np
A = np.array([[2., 1.], [1., 2.]])
Q = np.column_stack([np.array([2., 1.]), np.array([1., 0.])]) # an arbitrary basis
B = np.linalg.inv(Q) @ A @ Q
print("B = Q^-1 A Q =\n", B) # [[ 4. 1.] [-3. 0.]]
print("trace =", np.trace(B), " det =", round(np.linalg.det(B), 6)) # 4.0 3.0
This prints $B = \begin{bmatrix}4&1\\-3&0\end{bmatrix}$ — emphatically not diagonal, and not symmetric, even though $A$ was symmetric. Yet its trace is still $4$ and its determinant still $3$: it is genuinely similar to $A$, representing the same transformation in the basis $\mathcal{Q}$.
Check Your Understanding — Suppose $A$ is a $2\times 2$ matrix with $\det(A) = 6$ and $\operatorname{tr}(A) = 5$, and $B = P^{-1}AP$ for some invertible $P$ you are not told. Without any further computation, what are $\det(B)$ and $\operatorname{tr}(B)$?
Answer
$\det(B) = 6$ and $\operatorname{tr}(B) = 5$ — identical to $A$, no matter what $P$ is. Determinant and trace are similarity invariants (§16.5.2), so they are unchanged by any change of basis. You do not need to know $P$, or even compute $B$, to know these two numbers: they are properties of the underlying transformation, fixed across every coordinate system. (As a bonus, the eigenvalues are the roots of $\lambda^2 - 5\lambda + 6 = (\lambda-2)(\lambda-3)$, namely $2$ and $3$ — also invariant, as Chapter 23 will confirm.)
The lesson is that every invertible $P$ gives a similar matrix, but only special choices of $P$ (the eigenbasis) give the simplest, diagonal form. Most coordinate changes just trade one complicated matrix for another equally complicated one that happens to share the invariants. The art of Part V is choosing the basis well.
At the opposite extreme from "messier" sits a striking case: some transformations look the same in a whole family of bases. Take the $90°$ rotation $R = \begin{bmatrix}0&-1\\1&0\end{bmatrix}$ and conjugate it by our diamond basis:
# Some transformations are unchanged by certain basis changes.
import numpy as np
R = np.array([[0., -1.], [1., 0.]]) # 90-degree rotation
P = np.column_stack([np.array([1., 1.]), np.array([-1., 1.])])
print("R in the skewed basis =\n", np.linalg.inv(P) @ R @ P) # [[0. -1.] [1. 0.]]
The output is the same matrix $\begin{bmatrix}0&-1\\1&0\end{bmatrix}$. The skewed basis $\{(1,1),(-1,1)\}$ is itself a rotated-and-scaled version of the standard basis, and $R$ commutes with any such rotation-and-scaling ($RP = PR$), so $P^{-1}RP = R$ and $R$ is unmoved. (Geometrically: a $90°$ rotation looks like a $90°$ rotation no matter how you tilt your square grid, as long as the grid stays "square enough" — here the two basis vectors are perpendicular and equal-length, a mere rotation-and-scaling of the standard frame.) A rotation through any angle other than a multiple of $180°$ has no real eigenbasis that diagonalizes it — it turns every real direction, so no real direction is merely scaled — which is the geometric reason its eigenvalues turn out complex (Chapter 26). A real change of basis cannot diagonalize what has no real eigenvectors to align to.
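Saying that conjugation leaves $R$ unchanged is the same as saying $R$ commutes with $P$, and that is a one-line test. The sketch below also conjugates $R$ by the lopsided basis $\{(2,1),(1,0)\}$ from earlier in this section to show that the effect depends on the basis; the variable names are illustrative.

```python
import numpy as np

R = np.array([[0., -1.], [1., 0.]])        # 90-degree rotation
P = np.array([[1., -1.], [1., 1.]])        # diamond basis: perpendicular, equal-length columns
Q = np.array([[2., 1.], [1., 0.]])         # a lopsided basis for comparison

# P^-1 R P = R is the same statement as R P = P R.
print(np.allclose(R @ P, P @ R))           # the diamond basis commutes with R
print(np.allclose(R @ Q, Q @ R))           # the lopsided basis does not

R_in_Q = np.linalg.inv(Q) @ R @ Q
print(R_in_Q)                              # entries change, invariants do not
print(np.trace(R_in_Q), np.linalg.det(R_in_Q))

# No real direction is merely scaled: the eigenvalues are the complex pair +i and -i.
eigs = np.linalg.eigvals(R)
print(eigs)
```

In the basis $\mathcal{Q}$ the rotation no longer looks like a rotation at a glance, but its trace is still $0$ and its determinant still $1$.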
16.5.4 What stays the same, and what does not, under similarity?
It is worth collecting, in one place, the answer to a question students keep asking: given a transformation, which numbers depend on the basis and which do not? The answer is the cleanest possible test of whether you have internalized recurring theme #1. A quantity is basis-independent (an invariant) precisely when it is a property of the transformation; it is basis-dependent when it is an artifact of the coordinate description.
Basis-independent (same for all similar matrices): the trace (§16.5.2), the determinant (§16.5.2), the rank (the dimension of the image — see Chapters 13–14 — which is a property of the map, not the grid), the characteristic polynomial (Chapter 24), and the eigenvalues (Chapter 23). Each of these answers a question about what the transformation does — how it scales area, how many dimensions survive, which directions it merely stretches — and those answers cannot depend on the graph paper you measured them on.
Basis-dependent (can differ between similar matrices): the individual entries $b_{ij}$ (obviously — that is the whole point of this chapter), whether the matrix is diagonal, whether it is symmetric (our symmetric $A = \begin{bmatrix}2&1\\1&2\end{bmatrix}$ became the non-symmetric $\begin{bmatrix}4&1\\-3&0\end{bmatrix}$ in §16.5.3), whether it is upper-triangular, and the eigenvectors as coordinate vectors (the eigen-directions are basis-independent geometric objects, but their coordinate descriptions change with the basis, exactly like any other vector). The deep payoff: the search for a good basis is the search to push as much structure as possible from the "basis-dependent" column into a recognizable form (diagonal, in the best case), while the "basis-independent" column tells you the unchangeable truth about the transformation that no choice of basis can hide or alter.
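The two columns can be tabulated for the running example. This is a minimal sketch: the helper `invariants` is hypothetical, and it relies on NumPy's `np.poly`, which returns the characteristic-polynomial coefficients of a square matrix.

```python
import numpy as np

A = np.array([[2., 1.], [1., 2.]])
Q = np.array([[2., 1.], [1., 0.]])           # the basis of section 16.5.3
B = np.linalg.inv(Q) @ A @ Q                 # [[4, 1], [-3, 0]]

def invariants(M):
    """Hypothetical helper: collect the basis-independent quantities of M."""
    return {
        "trace": np.trace(M),
        "det": np.linalg.det(M),
        "rank": np.linalg.matrix_rank(M),
        "char_poly": np.poly(M),                       # coefficients, highest power first
        "eigenvalues": np.sort(np.linalg.eigvals(M)),
    }

props_A, props_B = invariants(A), invariants(B)
for name in props_A:
    print(name, np.allclose(props_A[name], props_B[name]))   # True for every entry

# Basis-dependent properties, by contrast, need not survive.
print("A symmetric:", np.allclose(A, A.T), " B symmetric:", np.allclose(B, B.T))
```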
Warning — Equal trace and determinant are necessary for two matrices to be similar, but not sufficient — do not use them as a similarity test. The identity $I = \begin{bmatrix}1&0\\0&1\end{bmatrix}$ and the shear $J = \begin{bmatrix}1&1\\0&1\end{bmatrix}$ both have trace $2$ and determinant $1$, yet they are not similar: for any invertible $P$, $P^{-1}IP = P^{-1}P = I \neq J$, so the only matrix similar to $I$ is $I$ itself. The shear genuinely shears space while the identity does nothing — different transformations, forced apart despite identical trace and determinant. A complete invariant that does determine similarity exists — the Jordan normal form (Chapter 36) — but trace and determinant alone are only a partial fingerprint. Two matrices that disagree on either one are certainly not similar; agreeing on both is merely a hint.
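The counterexample can be checked in a few lines. The sketch also uses one finer invariant that is not developed in this chapter and is offered here only as an illustration: the rank of $M - I$ is preserved by similarity, because $P^{-1}(M - I)P = P^{-1}MP - I$.

```python
import numpy as np

I = np.eye(2)
J = np.array([[1., 1.], [0., 1.]])            # the shear

# Trace and determinant cannot tell them apart.
print(np.trace(I), np.trace(J))
print(np.linalg.det(I), np.linalg.det(J))

# Conjugating the identity never moves it, whatever P is.
P = np.array([[2., 1.], [1., 0.]])            # an arbitrary invertible matrix
print(np.allclose(np.linalg.inv(P) @ I @ P, I))

# A finer invariant separates them: rank(M - I) is 0 for the identity, 1 for the shear.
rank_I = np.linalg.matrix_rank(I - np.eye(2))
rank_J = np.linalg.matrix_rank(J - np.eye(2))
print(rank_I, rank_J)
```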
16.6 How does change of basis work in three dimensions?
Two dimensions make pictures easy, but the formulas are dimension-blind. Let's confirm everything in $\mathbb{R}^3$ with a clean coordinate conversion. Take the new basis
$$\mathbf{b}_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \quad \mathbf{b}_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf{b}_3 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},$$
a "staircase" basis (each vector adds one more $1$). These are independent — the matrix with them as columns is upper-triangular with nonzero diagonal, so its determinant is $1 \neq 0$ (Chapter 11) — hence a genuine basis of $\mathbb{R}^3$. Stack them as columns to get the change-of-basis matrix and invert it:
$$P = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}, \qquad P^{-1} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}.$$
You can verify $P^{-1}$ by hand with the Gauss–Jordan method of Chapter 9, or just check $PP^{-1} = I$ by inspection (the inverse of a "cumulative-sum" matrix is the "difference" matrix — a fact worth remembering). Now convert the standard vector $\mathbf{v} = (3, 5, 2)$ to the new basis:
$$[\mathbf{v}]_{\text{new}} = P^{-1}\begin{bmatrix} 3 \\ 5 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 3 \\ 5 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 - 5 \\ 5 - 2 \\ 2 \end{bmatrix} = \begin{bmatrix} -2 \\ 3 \\ 2 \end{bmatrix}.$$
So $\mathbf{v} = -2\mathbf{b}_1 + 3\mathbf{b}_2 + 2\mathbf{b}_3$, which you can check directly: $-2(1,0,0) + 3(1,1,0) + 2(1,1,1) = (-2+3+2,\ 3+2,\ 2) = (3, 5, 2)$. Confirm with numpy, round trip included:
# A 3x3 change of basis with the staircase basis; verify the round trip.
import numpy as np
B = np.array([[1., 1., 1.],
[0., 1., 1.],
[0., 0., 1.]]) # columns are b1, b2, b3 (already stacked)
P = B
v_old = np.array([3., 5., 2.])
v_new = np.linalg.inv(P) @ v_old # OLD -> NEW
print("[v]_new =", v_new) # [-2. 3. 2.]
print("round trip =", P @ v_new, # [3. 5. 2.]
" ok =", np.allclose(P @ v_new, v_old)) # True
The output is [v]_new = [-2. 3. 2.] and the round trip returns [3. 5. 2.] with ok = True. Everything that worked in $\mathbb{R}^2$ works verbatim in $\mathbb{R}^3$ — and in $\mathbb{R}^n$. The change-of-basis matrix is $n \times n$, its columns are the $n$ new basis vectors in old coordinates, and $P^{-1}$ converts old coordinates to new. Dimension changes the size of the matrices, never the ideas.
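To see the "dimension changes the size, never the ideas" claim in action, here is the staircase basis in $\mathbb{R}^5$, together with the cumulative-sum and difference remark made above. This is a sketch under assumptions: the size $n = 5$ and the test vector are arbitrary choices for illustration.

```python
import numpy as np

n = 5
P = np.triu(np.ones((n, n)))              # staircase basis in R^n: column j has j leading ones
D = np.eye(n) - np.eye(n, k=1)            # 1 on the diagonal, -1 just above it

print(np.allclose(P @ D, np.eye(n)))      # D is the inverse of P
print(np.allclose(np.linalg.inv(P), D))

v_old = np.array([3., 5., 2., 7., 1.])
v_new = D @ v_old                         # each entry minus the next one; the last is unchanged
print(v_new)                              # [-2.  3. -5.  6.  1.]
print(P @ v_new)                          # round trip back to v_old
```

Multiplying by $P$ sums the entries from each position to the end, and multiplying by $P^{-1}$ takes successive differences, so the two undo each other.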
Check Your Understanding — For the staircase basis above, what are the new coordinates of the standard basis vector $\mathbf{e}_2 = (0,1,0)$?
Answer
$[\mathbf{e}_2]_{\text{new}} = P^{-1}(0,1,0) = (-1, 1, 0)$ (the second column of $P^{-1}$). Check: $-1\mathbf{b}_1 + 1\mathbf{b}_2 + 0\mathbf{b}_3 = -(1,0,0) + (1,1,0) = (0,1,0) = \mathbf{e}_2$. The columns of $P^{-1}$ are exactly the new-basis coordinates of the old basis vectors — a tidy way to read off a whole change of basis at once.
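The closing observation, that the columns of $P^{-1}$ hold the new-basis coordinates of the old basis vectors, can be confirmed for all three columns at once. A minimal sketch, with illustrative variable names:

```python
import numpy as np

P = np.array([[1., 1., 1.],
              [0., 1., 1.],
              [0., 0., 1.]])             # staircase basis as columns
P_inv = np.linalg.inv(P)

for j in range(3):
    e_j = np.eye(3)[:, j]                # j-th standard (old) basis vector
    coords = P_inv[:, j]                 # claimed new-basis coordinates of e_j
    rebuilt = P @ coords                 # combine b1, b2, b3 with those weights
    print(j + 1, coords, np.allclose(rebuilt, e_j))
```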
16.7 Why is this the same transformation? (the deep idea, made rigorous)
We keep asserting that $A$ and $B = P^{-1}AP$ "are the same transformation." This section makes that claim precise, because it is the conceptual payoff of the chapter and the foundation of everything in Part V. The claim has a clean formal statement and a clean proof.
1. Why we care. Recurring theme #1 of this book is that linear algebra is the study of linear transformations, and a matrix is merely how we represent a transformation once we fix a basis. This section is where that slogan becomes a theorem. If we can prove that $A$ and $B$ act identically on the underlying transformation — that they are two faithful descriptions of one map — then we have license to choose the basis that makes the matrix simplest, knowing the transformation is untouched. That license is the engine of diagonalization, the SVD, and most of applied linear algebra.
2. Key idea. A transformation $T$ is defined by what it does to vectors, independent of coordinates. Two matrices represent the same $T$ exactly when, after you account for the coordinate translation $P$ between the bases, they agree on every input. The diagram below commutes, and "commutes" is the whole content.
3. Proof. We claim: $B = P^{-1}AP$ if and only if $A$ and $B$ represent the same linear transformation $T$ in the old and new bases respectively. Suppose $A$ represents $T$ in the old basis, so $[T\mathbf{x}]_{\text{old}} = A[\mathbf{x}]_{\text{old}}$ for all $\mathbf{x}$, and let $B$ be defined by $B = P^{-1}AP$. We show $B$ represents $T$ in the new basis, i.e. $[T\mathbf{x}]_{\text{new}} = B[\mathbf{x}]_{\text{new}}$ for all $\mathbf{x}$. Start from the right side and use the three translation facts (§16.2–16.3): $$B[\mathbf{x}]_{\text{new}} = P^{-1}AP\,[\mathbf{x}]_{\text{new}} = P^{-1}A\,[\mathbf{x}]_{\text{old}} = P^{-1}\,[T\mathbf{x}]_{\text{old}} = [T\mathbf{x}]_{\text{new}}.$$ Each equality is one earned step: $P[\mathbf{x}]_{\text{new}} = [\mathbf{x}]_{\text{old}}$ (new$\to$old), then $A[\mathbf{x}]_{\text{old}} = [T\mathbf{x}]_{\text{old}}$ ($A$ represents $T$), then $P^{-1}[T\mathbf{x}]_{\text{old}} = [T\mathbf{x}]_{\text{new}}$ (old$\to$new). So $B$ does represent $T$ in the new basis. The converse runs the same equalities backwards: if both $A$ and $B$ represent $T$ in their respective bases, then $B[\mathbf{x}]_{\text{new}} = [T\mathbf{x}]_{\text{new}} = P^{-1}[T\mathbf{x}]_{\text{old}} = P^{-1}A[\mathbf{x}]_{\text{old}} = P^{-1}AP[\mathbf{x}]_{\text{new}}$ for all $[\mathbf{x}]_{\text{new}}$, forcing $B = P^{-1}AP$. $\quad\blacksquare$
4. What this means. The formula $B = P^{-1}AP$ is not a definition pulled from a hat; it is forced by the requirement that $A$ and $B$ describe the same transformation. Geometrically, the proof says: to find what the transformation does in new coordinates, you may always detour through old coordinates (apply $P$), do the work there with $A$, and translate the answer back ($P^{-1}$). The detour is invisible from outside — the transformation $T$ never knew which basis you used to compute with it. This is the precise sense in which the matrix is the shadow and the transformation is the object.
Geometric Intuition — Think of $B = P^{-1}AP$ as a commuting diagram you can walk two ways. From the new-coordinate input, you can go straight across (apply $B$) to the new-coordinate output. Or you can go down ($P$, into old coordinates), across ($A$, the transformation), and up ($P^{-1}$, back to new coordinates). The theorem says both routes land on the same point — always. That is what it means for $B$ to be "$A$ seen from the new basis": the two paths around the square of arrows agree.
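The square of arrows can be drawn out explicitly. The layout below is one conventional way to typeset it (the arrangement is a choice made for this illustration; the arrows are exactly the maps of §16.5):

$$\begin{array}{ccc}
{[\mathbf{x}]_{\text{new}}} & \xrightarrow{\quad B \quad} & {[T\mathbf{x}]_{\text{new}}} \\[4pt]
\Big\downarrow {\scriptstyle P} & & \Big\uparrow {\scriptstyle P^{-1}} \\[4pt]
{[\mathbf{x}]_{\text{old}}} & \xrightarrow{\quad A \quad} & {[T\mathbf{x}]_{\text{old}}}
\end{array}$$

Saying the square commutes is saying that the top arrow equals the down-across-up path: $B = P^{-1}AP$.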
Let's see both routes land together on a concrete input. Take the vector whose new-basis coordinates are $[\mathbf{x}]_{\text{new}} = (1, 2)$, and use the diagonal $B = \begin{bmatrix}3&0\\0&1\end{bmatrix}$ and $A = \begin{bmatrix}2&1\\1&2\end{bmatrix}$ from before:
# The commuting diagram: route 'B directly' vs route 'P, then A, then P^-1' agree.
import numpy as np
A = np.array([[2., 1.], [1., 2.]])
P = np.column_stack([np.array([1., 1.]), np.array([-1., 1.])])
B = np.linalg.inv(P) @ A @ P # [[3. 0.] [0. 1.]]
x_new = np.array([1., 2.])
direct = B @ x_new # straight across
detour = np.linalg.inv(P) @ (A @ (P @ x_new)) # down, across, up
print("direct B @ x_new =", direct) # [3. 2.]
print("detour P^-1 A P x_new =", detour) # [3. 2.]
print("agree:", np.allclose(direct, detour)) # True
Both print [3. 2.]. Tracing the detour by hand makes the geometry vivid: the input $(1,2)$ in new coordinates is the physical arrow $P(1,2) = (-1, 3)$ in old coordinates; the transformation $A$ sends $(-1,3)$ to $(1, 5)$; and converting $(1,5)$ back to new coordinates via $P^{-1}$ gives $(3, 2)$ — the same answer $B$ produced in one step. The diagram commutes, exactly as the proof promised, and you have now watched a single vector make the round trip through the underlying transformation and come out matching the new-basis matrix. This is what "$B$ is $A$ in the new basis" means, made arithmetic.
Math-Major Sidebar — Similarity is an equivalence relation on the set of $n\times n$ matrices: it is reflexive ($A = I^{-1}AI$), symmetric (if $B = P^{-1}AP$ then $A = (P^{-1})^{-1}B(P^{-1})$, so the roles swap), and transitive (conjugating by $P$ then $Q$ is conjugating by the product $PQ$, since $Q^{-1}(P^{-1}AP)Q = (PQ)^{-1}A(PQ)$). The equivalence classes are exactly the coordinate-free linear operators: each class is one transformation, and its members are all the matrices that represent it in some basis. A central project of the theory is to find a canonical representative of each class — the simplest matrix in it. For matrices with $n$ independent eigenvectors that representative is diagonal (Chapter 25); when no such eigenbasis exists, the cleanest possible representative is the Jordan normal form (Chapter 36), the "almost diagonal" matrix with eigenvalues on the diagonal and a few $1$'s just above it. The entire arc from here to Chapter 36 is the search for canonical forms under similarity. The closely related notion where one allows two different bases (one for the input space, one for the output) and the relation becomes $B = Q^{-1}AP$ is matrix equivalence; its canonical form is the rank-revealing $\begin{bmatrix} I_r & 0 \\ 0 & 0\end{bmatrix}$, and the SVD of Chapter 30 is its orthonormal refinement.
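The three equivalence-relation properties can each be spot-checked on the chapter's matrices. This is a sketch, not a proof: the helper `conjugate` is hypothetical, and $P$ and $Q$ are the diamond basis and the lopsided basis of §16.5.3.

```python
import numpy as np

def conjugate(M, S):
    """Hypothetical helper: return S^-1 M S."""
    return np.linalg.inv(S) @ M @ S

A = np.array([[2., 1.], [1., 2.]])
P = np.array([[1., -1.], [1., 1.]])
Q = np.array([[2., 1.], [1., 0.]])

# Reflexive: conjugating by the identity changes nothing.
print(np.allclose(conjugate(A, np.eye(2)), A))

# Symmetric: if B = P^-1 A P, then conjugating B by P^-1 recovers A.
B = conjugate(A, P)
print(np.allclose(conjugate(B, np.linalg.inv(P)), A))

# Transitive: conjugating by P and then by Q equals conjugating once by PQ.
C = conjugate(B, Q)
print(np.allclose(C, conjugate(A, P @ Q)))

# The order matters: QP is a different change of basis here.
print(np.allclose(C, conjugate(A, Q @ P)))
```

The last line prints `False`: composing basis changes multiplies the matrices in the order $PQ$, and reversing the product generally lands on a different member of the same similarity class.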
16.8 Where does change of basis show up in the real world?
Change of basis is not a bookkeeping nicety — it is the conceptual core of an enormous amount of applied mathematics, precisely because the right basis makes a hard problem easy. A representative tour, deliberately spanning fields beyond physics:
Real-World Application — In data science, Principal Component Analysis is exactly a change of basis. Your data lives in a standard coordinate system (say, height-and-weight, or thousands of pixel intensities) where the coordinates are correlated and redundant. PCA finds a new orthogonal basis — the principal components — in which the covariance matrix becomes diagonal, just like our $B = \begin{bmatrix}3&0\\0&1\end{bmatrix}$. In that basis the coordinates are uncorrelated and ordered by how much variance they capture, so you can throw away the small ones and compress the data with minimal loss. The change-of-basis matrix $P$ is built from the eigenvectors of the covariance matrix; Chapter 32 makes this precise, and Case Study 1 of this chapter previews it concretely. The slogan "find the basis that diagonalizes the problem" is the whole of PCA.
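Here is a small preview of that diagonalization on synthetic data. It is a sketch under stated assumptions, not the full PCA procedure of Chapter 32: the data set is invented for illustration, and `np.linalg.eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, deliberately correlated 2D data (an assumption for illustration).
n = 500
x = rng.normal(size=n)
y = 0.8 * x + 0.3 * rng.normal(size=n)
X = np.column_stack([x, y])
X = X - X.mean(axis=0)                      # center the data

C_old = np.cov(X, rowvar=False)             # covariance in the standard coordinates
eigenvalues, P = np.linalg.eigh(C_old)      # columns of P: orthonormal eigenvectors of C_old

# Each data point is a row, so [v]_new = P^-1 [v]_old = P^T [v]_old becomes X @ P.
X_new = X @ P
C_new = np.cov(X_new, rowvar=False)

print(np.round(C_old, 3))                   # off-diagonal entries are clearly nonzero
print(np.round(C_new, 3))                   # off-diagonal entries are approximately 0
print(np.allclose(C_new, P.T @ C_old @ P))  # covariance transforms as P^-1 C P, with P^-1 = P^T
```

Because the eigenvectors are orthonormal, $P^{-1} = P^{\mathsf T}$, so no explicit inverse is needed. Note that `eigh` returns eigenvalues in ascending order, whereas PCA conventionally lists components from largest variance to smallest.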
The same idea recurs across the applied landscape. In computer graphics and robotics, every object carries its own local coordinate frame, and rendering or controlling a scene is a relentless sequence of basis changes — converting a point from a robot's gripper frame to its base frame to the world frame, each step a change-of-basis matrix (Case Study 2 works a rotated frame in detail). In signal processing, the Fourier transform is a change of basis from the standard "one coordinate per time sample" basis to a basis of pure sinusoids (Chapter 22); a signal that looks like noise in the time basis is often a handful of sharp spikes in the frequency basis, which is why audio compression and noise removal are done after the change of basis. In differential equations, changing to the eigenbasis decouples a system $\mathbf{x}' = A\mathbf{x}$ into independent scalar equations (Chapter 37). And in quantum mechanics, the position representation and the momentum representation are two coordinate systems for the same quantum state, related by a change of basis (a Fourier transform again); a physicist computes in whichever is easier and translates back, exactly as our similarity derivation translated through old coordinates.
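The Fourier case fits the chapter's recipe literally: put the new basis vectors in the columns of $P$ and solve for $[\mathbf{v}]_{\text{new}} = P^{-1}[\mathbf{v}]_{\text{old}}$. The sketch below is an illustration under assumptions: the signal is invented, and the $1/N$ scaling of the basis vectors is chosen to match the convention of NumPy's `np.fft.fft`.

```python
import numpy as np

N = 64
n = np.arange(N)

# A signal built from two pure tones (3 and 10 cycles per window).
signal = 2.0 * np.cos(2 * np.pi * 3 * n / N) + 0.5 * np.sin(2 * np.pi * 10 * n / N)

# Columns of P: complex sinusoid basis vectors, written in time-sample coordinates.
P = np.exp(2j * np.pi * np.outer(n, n) / N) / N
coeffs = np.linalg.solve(P, signal)              # [v]_new = P^-1 [v]_old

nonzero_bins = np.flatnonzero(np.abs(coeffs) > 1e-9)
print(np.allclose(coeffs, np.fft.fft(signal)))   # agrees with NumPy's FFT
print(nonzero_bins)                              # [ 3 10 54 61]
print(np.allclose(P @ coeffs, signal))           # round trip back to the time basis
```

Sixty-four time-domain coordinates collapse to four nonzero frequency coordinates (each real tone occupies a bin and its mirror image). In practice one calls the FFT rather than solving a linear system; the explicit $P$ is here only to make the change of basis visible.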
There is even a sense in which change of basis underlies machine learning embeddings. When a model maps words, images, or users into a vector space (Chapter 33), it is learning a basis — choosing the coordinate directions in which the data's structure becomes linearly accessible. A recommender system that factors a user–item matrix is finding a small set of latent coordinate axes ("taste dimensions") and re-expressing every user and item in that learned basis; the entire trick is that similarity, which was tangled in the raw coordinates, becomes a simple dot product in the learned one. The recurring lesson is the same in every field: the data or the operator carries an intrinsic structure, and the work is finding the coordinate system that exposes it.
Geometric Intuition — The unifying picture behind all of these is the one from §16.4: a transformation or a dataset that looks complicated is often simple in disguise, and the disguise is the wrong basis. PCA, Fourier analysis, normal modes of a vibrating system, principal axes of an ellipse, decoupling a system of ODEs, learned embeddings — every one of them is the act of rotating your coordinate axes until the cross-terms vanish and the underlying simplicity shows through. Change of basis is the mathematics of choosing a good point of view, and a startling amount of applied work is, at bottom, exactly that choice. The standard basis is rarely the one nature or data prefers; the standard basis is just where the numbers first arrived.
16.9 How do we build a change of basis from scratch?
By hand, a change of basis costs one matrix inversion ($P^{-1}$) and some matrix multiplications — operations the toolkit already knows from Chapters 9 (inverse) and 8 (matmul). It is time to package the change-of-basis machinery into reusable code, in pure Python, and verify it against numpy exactly as we verified every hand computation above. The defining test of correctness is the round trip: convert old coordinates to new and back, and you must recover the original.
Build Your Toolkit — Add two functions to toolkit/change_of_basis.py, both built on the toolkit's existing inverse (Chapter 9) and matmul/apply (Chapters 7–8):
- change_basis_matrix(old_basis, new_basis): given the old and new bases, each as a list of column vectors in standard coordinates, return the matrix $P$ that converts new coordinates to old. When old_basis is the standard basis, $P$ is just the matrix whose columns are new_basis; in general, build $P_{\text{old}}$ and $P_{\text{new}}$ (each with its basis vectors as columns) and return $P_{\text{old}}^{-1} \cdot P_{\text{new}}$. (Hint: the columns of the result are each new basis vector expressed in old coordinates, as in §16.2.)
- to_new_coords(P, v_old): given the change-of-basis matrix P and a coordinate vector v_old in the old basis, return the coordinates in the new basis as $P^{-1}\,\mathbf{v}_{\text{old}}$ (apply the from-scratch inverse, then apply). Provide the partner to_old_coords(P, v_new) returning $P\,\mathbf{v}_{\text{new}}$.
Verify (numpy only for checking, never inside the implementation): build $P$ for the new basis $\{(1,1),(-1,1)\}$ over the standard old basis and confirm np.allclose(P, [[1,-1],[1,1]]); check that to_new_coords(P, [4,2]) returns [3,-1], matching §16.3; and, as the essential test, confirm that the round trip to_old_coords(P, to_new_coords(P, v)) returns the original v for several random v (np.allclose(round_trip, v)). A correct change of basis is precisely one whose round trip is the identity, because $PP^{-1} = I$.
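The general formula $P = P_{\text{old}}^{-1} P_{\text{new}}$ is easiest to trust after one small case where the old basis is not the standard one. The bases below are illustrative choices made for this sketch (old basis $\{(1,0),(1,1)\}$, new basis $\{(1,1),(-1,1)\}$), not values taken from earlier sections.

```latex
\[
P_{\text{old}} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix},
\qquad
P_{\text{new}} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix},
\qquad
P_{\text{old}}^{-1} = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}
\]
\[
P = P_{\text{old}}^{-1} P_{\text{new}}
  = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}
    \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
  = \begin{bmatrix} 0 & -2 \\ 1 & 1 \end{bmatrix}
\]
```

Reading the columns confirms the hint: the first column $(0,1)$ says $0\cdot(1,0) + 1\cdot(1,1) = (1,1)$, and the second column $(-2,1)$ says $-2\cdot(1,0) + 1\cdot(1,1) = (-1,1)$, so each column is a new basis vector written in old coordinates.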
A sketch of the converter, to show how cleanly it reads (the full module is assembled separately; implement it yourself before peeking):
# Sketch: coordinate converter. Reuses toolkit inverse (Ch.9) and apply (Ch.7).
def to_new_coords(P, v_old):
    P_inv = inverse(P)           # from-scratch Gauss-Jordan inverse (Chapter 9)
    return apply(P_inv, v_old)   # [v]_new = P^-1 [v]_old (the boxed formula, §16.3)

def to_old_coords(P, v_new):
    return apply(P, v_new)       # [v]_old = P [v]_new (P maps new -> old, §16.2)

# Round-trip check (the definition of correctness):
# to_old_coords(P, to_new_coords(P, v)) == v for every v, since P @ P^-1 = I.
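For readers who have already attempted the exercise, the following is one possible self-contained version of the whole module, offered as a sketch rather than as the book's reference implementation. The helpers columns_to_matrix, apply, matmul, and inverse are hypothetical stand-ins written here so the block runs on its own; the toolkit's Chapter 7–9 functions may differ in name and signature. Exact Fraction arithmetic is an assumption of this sketch, chosen so the round trip can be compared with == and no tolerance.

```python
from fractions import Fraction

# Hypothetical stand-ins for the toolkit's Chapter 7-9 helpers.

def columns_to_matrix(basis):
    """Place each basis vector as a column of a square matrix (list of rows)."""
    n = len(basis)
    return [[Fraction(basis[j][i]) for j in range(n)] for i in range(n)]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(A, B):
    """Matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse(M):
    """Gauss-Jordan inverse via the augmented matrix [M | I]."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular: the vectors do not form a basis")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]          # scale pivot row to 1
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]                   # right half is M^-1

def change_basis_matrix(old_basis, new_basis):
    """Return P = P_old^-1 P_new, which converts new coordinates to old."""
    return matmul(inverse(columns_to_matrix(old_basis)),
                  columns_to_matrix(new_basis))

def to_new_coords(P, v_old):
    return apply(inverse(P), v_old)   # [v]_new = P^-1 [v]_old

def to_old_coords(P, v_new):
    return apply(P, v_new)            # [v]_old = P [v]_new

# The chapter's running example: standard old basis, new basis {(1,1), (-1,1)}.
standard = [[1, 0], [0, 1]]
new_basis = [[1, 1], [-1, 1]]
P = change_basis_matrix(standard, new_basis)
v = [4, 2]
v_new = to_new_coords(P, v)
round_trip = to_old_coords(P, v_new)
assert round_trip == v                # the round trip recovers the original
```

Here P comes out equal to [[1, -1], [1, 1]] and v_new equal to [3, -1], matching §16.3. Each basis is passed as a list of vectors, and columns_to_matrix does the transposition into columns; that convention is a design choice of this sketch.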
The implementation never imports numpy; we use numpy only in the tests to confirm the round trip and the worked numbers. That is the toolkit's contract throughout the book (Chapter 4): build it from scratch, check it against the library. This chapter's contribution, the change-of-basis matrix and the coordinate converter, will be reused implicitly every time a later chapter re-expresses a transformation in a better basis — most heavily when the eigen-machinery of Chapter 23 hands us the basis that diagonalizes.
Historical Note — The idea that geometric objects have an existence independent of coordinates (that a transformation is more fundamental than any matrix representing it) crystallized in the late 19th and early 20th centuries. Felix Klein's Erlangen Program (1872) reframed geometry around the transformations and invariants preserved under a group of coordinate changes, and the resulting "coordinate-free" viewpoint became the modern standard. The notion of similar matrices and the search for canonical forms under similarity grew from the work of Camille Jordan (the Jordan normal form, 1870) and Karl Weierstrass on the classification of bilinear and quadratic forms. The word "similar" for matrices related by $B = P^{-1}AP$ reflects exactly the intuition of this chapter: they are not equal, but they are the same shape, the same transformation seen from different angles.
16.10 What should you carry forward from this chapter?
We learned to switch coordinate systems for both vectors and transformations, and in doing so we made recurring theme #1 of this book fully operational. For vectors: the change-of-basis matrix $P$ has the new basis vectors as its columns (in old coordinates), it converts new coordinates to old via $[\mathbf{v}]_{\text{old}} = P[\mathbf{v}]_{\text{new}}$, and its inverse converts the other way, $[\mathbf{v}]_{\text{new}} = P^{-1}[\mathbf{v}]_{\text{old}}$ — the central formula, confirmed every time by the round trip returning the original. For transformations: the matrix of a transformation in a new basis is the similarity $B = P^{-1}AP$ — "go to old coordinates, act, come back" — and similar matrices share every coordinate-free invariant: trace, determinant, rank, and (soon) eigenvalues. The single image to keep is the re-gridded visualizer of §16.4: the same stretch of the plane, dense matrix $\begin{bmatrix}2&1\\1&2\end{bmatrix}$ on the standard grid, clean diagonal $\begin{bmatrix}3&0\\0&1\end{bmatrix}$ on the aligned grid. The transformation never moved; only its representation did.
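The summary's central example can be checked in a few lines. The sketch below computes $B = P^{-1}AP$ for the chapter's $A$ and $P$ and compares trace and determinant. The 2×2 helpers matmul2, det2, inverse2, and trace2 are hypothetical names introduced only for this check, not the toolkit's functions, and the closed-form 2×2 inverse is used here in place of Gauss-Jordan purely for brevity.

```python
from fractions import Fraction

def matmul2(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inverse2(M):
    """Closed-form inverse of a 2x2 integer matrix, in exact fractions."""
    d = det2(M)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(M[1][1], d), Fraction(-M[0][1], d)],
            [Fraction(-M[1][0], d), Fraction(M[0][0], d)]]

def trace2(M):
    """Trace of a 2x2 matrix."""
    return M[0][0] + M[1][1]

A = [[2, 1], [1, 2]]        # the transformation on the standard grid
P = [[1, -1], [1, 1]]       # columns are the new basis vectors (1,1) and (-1,1)

# "Go to old coordinates, act, come back": B = P^-1 (A P).
B = matmul2(inverse2(P), matmul2(A, P))

assert B == [[3, 0], [0, 1]]            # diagonal on the aligned grid
assert trace2(A) == trace2(B) == 4      # trace is basis-independent
assert det2(A) == det2(B) == 3          # so is the determinant
```

The two matrices differ entry by entry, yet the invariants agree, which is the numerical face of "the transformation never moved; only its representation did."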
That last picture is a direct preview of the most powerful idea ahead. Chapter 23 will give a name — eigenvectors — to the special directions $(1,1)$ and $(-1,1)$ that the transformation merely scaled, and eigenvalues $3$ and $1$ to the scaling factors that appeared on the diagonal of $B$. Chapter 25 will then make the whole maneuver systematic: diagonalization is precisely the act of choosing the eigenbasis as your new basis so that $B = P^{-1}AP$ comes out diagonal, $A = PDP^{-1}$, which makes powers, exponentials, and the long-run behavior of a transformation trivial to compute. Everything in this chapter — $P$ with basis vectors as columns, the conjugation $P^{-1}AP$, the invariance of trace and determinant — is the grammar you will speak fluently throughout Part V. When the spectral theorem (Chapter 27) and the SVD (Chapter 30) arrive, they will be statements about which change of basis is available and how clean it can be made.
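As a small preview of why a diagonal $B$ makes powers easy, here is the calculation for this chapter's example, using $D = \begin{bmatrix}3&0\\0&1\end{bmatrix}$ and $P = \begin{bmatrix}1&-1\\1&1\end{bmatrix}$. It is a sketch of the idea only; the systematic treatment belongs to Chapter 25.

```latex
\[
A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1},
\qquad\text{and in general}\qquad
A^k = PD^kP^{-1}.
\]
\[
A^k
= \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}
  \begin{bmatrix} 3^k & 0 \\ 0 & 1 \end{bmatrix}
  \cdot \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
= \frac{1}{2}\begin{bmatrix} 3^k + 1 & 3^k - 1 \\ 3^k - 1 & 3^k + 1 \end{bmatrix}.
\]
```

Setting $k = 1$ returns $\begin{bmatrix}2&1\\1&2\end{bmatrix}$, as it must: the interior factors $P^{-1}P$ cancel, and only the diagonal entries are ever raised to a power.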
This chapter advanced two of the book's recurring themes in a way worth naming. Linear algebra is the study of linear transformations, and a matrix is just a representation in a chosen coordinate system — that slogan stopped being a slogan in §16.7, where we proved that $A$ and $P^{-1}AP$ are the same transformation, and so earned the right to choose whatever basis makes the matrix simplest. And geometry and algebra are two views of one object: the algebraic operation $A \mapsto P^{-1}AP$ is the geometric act of re-drawing the grid, and the invariance of the determinant is just the geometric truth that area-scaling cannot depend on your choice of graph paper. Keep the diamond grid and the re-gridded square in your mind's eye, change a few bases by hand until $P^{-1}AP$ is automatic, and you will be ready to meet the eigenbasis — the best coordinate system of all — in Part V.