Prerequisites

  • chapter-02-vectors
  • chapter-06-subspaces-span-independence

Learning Objectives

  • Explain why a matrix IS a linear transformation, and state the two-rule definition of linearity precisely.
  • Read the columns of a matrix as the images of the standard basis vectors, and reconstruct the matrix of a transformation by asking where e1 and e2 go.
  • Compute a matrix-vector product as the weighted sum of the columns, and justify that formula from linearity rather than memorizing a row-times-column rule.
  • Build the matrix of a rotation, scaling, shear, projection, and reflection from scratch, and derive the standard 2x2 rotation matrix.
  • Use the recurring 2D visualizer to predict and confirm what a given 2x2 matrix does to the unit square and the basis vectors.
  • Implement apply(A, v) and transpose(A) from scratch in toolkit/matrices.py and verify them against numpy.

Matrices as Functions: What a Matrix DOES to Space

Learning paths. Math majors — read everything, especially the proof that linearity is equivalent to being representable by a matrix (§7.8) and the Math-Major Sidebars on bases and on transformations between spaces of different dimension. CS / Data Science — focus on the Geometric Intuition callouts, the visualizer experiments, the numpy snippets, and the applications; the proofs build intuition but the sidebars are optional. Physics / Engineering — focus on the geometry of each transformation, the derivation of the rotation matrix, and the field-rotation application; keep the picture of the moving unit square in your head. This is the chapter where Chapter 1's slogan — a matrix is a function that transforms space — finally becomes something you can compute with.

A note on this chapter (§9). Everything in Part II grows from one reframing: a matrix is a verb, not a noun. Chapter 1 previewed it as a slogan and Chapter 6 gave us the language of span and independence; this chapter makes the slogan rigorous and computational. We will not present the matrix-vector product as a row-times-column rule to memorize — that framing belongs in Chapter 8, where it falls out of composition. Here the product is the weighted sum of the columns, derived honestly from linearity, because that is the version that shows you what the matrix is doing.

7.1 What does a matrix actually do to a vector?

Open almost any first linear-algebra course and you will meet a matrix as a rectangle of numbers, accompanied by a rule for multiplying it against a column of numbers: "march along the row, down the column, multiply pairwise, add them up." You memorize the rule, you pass the quiz, and you never find out why. The rule seems to fall from the sky. Worse, it makes a matrix feel like a passive container — a spreadsheet — when in fact a matrix is one of the most active objects in mathematics.

So let us ask the question this whole chapter answers, the one so many people type into a search bar in frustration: what does a matrix do to a vector? The honest one-sentence answer is the thesis of this book: a matrix transforms the vector. You hand the matrix a vector — an arrow in space — and the matrix hands you back a different arrow, the original one moved: rotated, stretched, skewed, flattened, or flipped. A matrix is a machine that eats vectors and outputs transformed vectors. The numbers inside the matrix are simply the settings of that machine.

That reframing — a matrix as a linear transformation rather than a grid of numbers — is the single most valuable idea you will take from Part II, and probably from the first half of this book. Everything downstream depends on it. The determinant (Chapter 11) will measure how much the transformation stretches area. Eigenvectors (Chapter 23) will be the special arrows the transformation does not knock off their own line. The SVD (Chapter 30) will reveal that every matrix, no matter how complicated, is secretly just a rotation, then a stretch, then another rotation. None of that makes sense if a matrix is a spreadsheet. All of it is obvious if a matrix is a motion of space.

The Key Insight — A matrix is a linear transformation written down in coordinates. Multiplying a vector by a matrix applies that transformation to the vector. The matrix is the noun we use to record a verb; the verb — "rotate," "scale," "project" — is the real thing.

Here is our plan. First we pin down exactly which transformations matrices can represent (the linear ones, defined precisely). Then we discover the most important fact in the chapter: a linear transformation is completely determined by where it sends the basis vectors, and a matrix is nothing but the record of those landing spots, stored as its columns. From there the matrix-vector product writes itself — it has to be the weighted sum of the columns, with no other choice possible. Finally we run the recurring 2D visualizer on the whole zoo of transformations — identity, scaling, rotation, shear, reflection, projection — building each matrix from the same question every time: where do $\mathbf{e}_1$ and $\mathbf{e}_2$ go? Along the way we derive the famous rotation matrix from scratch, and you'll never have to memorize it again.

Let's begin, as always, with a picture.

FAQ: Is a matrix the same thing as a transformation?

Not quite, and the distinction is worth getting straight early, because it is one of the recurring themes of this book. A linear transformation is the geometric act — rotate the plane by 30°, say. A matrix is a specific way of writing that act down once you have chosen a coordinate system (a basis). Choose a different coordinate system and the same rotation gets recorded by a different matrix, even though nothing about the geometric motion changed. So a matrix is a representation of a transformation, the way a decimal numeral is a representation of a number. For now we work in the standard coordinate system, where the distinction is invisible and "matrix" and "transformation" can be used interchangeably; Chapter 16 (change of basis) is where the difference becomes the whole point.

7.2 Which transformations can a matrix represent? (Linearity, precisely)

Picture the flat plane as an infinite sheet of graph paper, with its grid of perfectly square cells. In Chapter 1 we described the transformations linear algebra cares about as exactly those you can do to the sheet while keeping the grid lines straight, keeping them evenly spaced, and keeping the origin pinned in place. Stretching, rotating, and skewing all pass. Crumpling, tearing, bending into a wave, or sliding the whole sheet sideways (which moves the origin) all fail.

That visual test has a precise algebraic twin. Recall from Chapter 1 the definition: a transformation $T$ — a function that takes a vector $\mathbf{v}$ and returns a transformed vector $T(\mathbf{v})$ — is linear when it obeys two rules.

Rule 1 (additivity). For all vectors $\mathbf{u}$ and $\mathbf{v}$, $$T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v}).$$

Rule 2 (homogeneity). For every vector $\mathbf{v}$ and every scalar $c$, $$T(c\,\mathbf{v}) = c\,T(\mathbf{v}).$$

Bundled together, the two rules become the superposition principle: for all scalars $c, d$ and vectors $\mathbf{u}, \mathbf{v}$, $$T(c\,\mathbf{u} + d\,\mathbf{v}) = c\,T(\mathbf{u}) + d\,T(\mathbf{v}).$$ Break the input into scaled pieces, transform each piece, scale and add the results — you get the right answer. This is the property that makes linear systems the ones we can actually solve, and it is the engine of everything in this chapter.

Geometric Intuition — Linearity is the algebra of "grid lines stay straight, parallel, and evenly spaced, and the origin stays put." Additivity is the statement that the diagonal of a parallelogram transforms to the diagonal of the transformed parallelogram. Homogeneity is the statement that a point twice as far out along a line lands twice as far out along the transformed line. Together they say: the transformation respects the grid.

Why does this matter for matrices? Because of a clean two-way street that we will spend the chapter unpacking and then prove in §7.8:

The Key Insight — A transformation $T:\mathbb{R}^n \to \mathbb{R}^m$ can be represented by a matrix if and only if it is linear. Matrices and linear transformations are the same objects seen two ways. Every matrix gives a linear transformation (multiply by it); every linear transformation, once you fix coordinates, is given by exactly one matrix.

That biconditional is the reason this chapter exists. Half of it — "every matrix is linear" — is a short computation we do in a moment. The other half — "every linear transformation is a matrix" — is the deep direction, and it forces the columns-as-images picture upon us.

Before we get there, one quick sanity check that pays for itself constantly. Set $c = 0$ in Rule 2: $T(\mathbf{0}) = T(0\cdot\mathbf{v}) = 0\cdot T(\mathbf{v}) = \mathbf{0}$. A linear transformation must fix the origin. So the instant a candidate transformation moves the origin, it cannot be linear, and no matrix can represent it on its own.

Common Pitfall — "Translation is linear; it's the simplest motion there is." Sliding the plane by a fixed vector, $T(\mathbf{x}) = \mathbf{x} + \mathbf{b}$ with $\mathbf{b}\neq\mathbf{0}$, is not linear, because it moves the origin: $T(\mathbf{0}) = \mathbf{b} \neq \mathbf{0}$. It is affine (linear-plus-a-shift). No $2\times 2$ matrix can translate the plane. This is exactly why computer graphics adds a dimension — homogeneous coordinates, Chapter 12 — to smuggle translation back into the matrix framework. For all of Part II, "matrix transformation" means the origin stays put.

Check Your Understanding — Is the transformation $T(x, y) = (2x, x + y)$ linear? Quick test: does it fix the origin, and does it preserve sums?

Answer

Yes. It fixes the origin: $T(0,0) = (0,0)$. And it preserves sums and scalings: each output coordinate is a sum of scalar multiples of the inputs, with no constants and no products of inputs. Concretely, $T(x_1{+}x_2,\, y_1{+}y_2) = (2(x_1{+}x_2),\, (x_1{+}x_2)+(y_1{+}y_2)) = (2x_1, x_1{+}y_1) + (2x_2, x_2{+}y_2) = T(x_1,y_1)+T(x_2,y_2)$. By the end of this chapter you will read off its matrix in one glance: $\begin{bmatrix} 2 & 0 \\ 1 & 1\end{bmatrix}$. Contrast $T(x,y) = (x^2, y)$ or $T(x,y)=(x+1, y)$, which fail (a square, and a shifted origin, respectively).
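The three maps in this answer can also be spot-checked numerically. The sketch below is an illustration, not a proof: the helper name `looks_linear`, the number of trials, and the tolerance are assumptions made for this example. A `False` result is a genuine counterexample to superposition; a `True` result only means no counterexample turned up among the random samples.

```python
import random

def looks_linear(T, trials=200, tol=1e-9, seed=0):
    """Spot-check T(c*u + d*v) == c*T(u) + d*T(v) on random 2D inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        ux, uy, vx, vy, c, d = (rng.uniform(-5, 5) for _ in range(6))
        left = T((c * ux + d * vx, c * uy + d * vy))      # transform the combination
        Tu, Tv = T((ux, uy)), T((vx, vy))                 # transform the pieces
        right = (c * Tu[0] + d * Tv[0], c * Tu[1] + d * Tv[1])
        if any(abs(l - r) > tol for l, r in zip(left, right)):
            return False                                  # superposition violated
    return True

def T_linear(p):          # T(x, y) = (2x, x + y)
    x, y = p
    return (2 * x, x + y)

def T_square(p):          # T(x, y) = (x^2, y): squares an input
    x, y = p
    return (x ** 2, y)

def T_shift(p):           # T(x, y) = (x + 1, y): moves the origin
    x, y = p
    return (x + 1, y)

print(looks_linear(T_linear))   # True
print(looks_linear(T_square))   # False
print(looks_linear(T_shift))    # False
```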

7.3 Why do the columns of a matrix tell you where the basis vectors go?

This is the heart of the chapter, so we slow down. We are going to show that a linear transformation of the plane is completely determined by what it does to just two arrows — and that a matrix is precisely the device that records those two arrows.

Recall the standard basis vectors from Chapter 2: in $\mathbb{R}^2$, $$\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \quad\text{(one unit east)}, \qquad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad\text{(one unit north)}.$$ The reason these two arrows are special is the fact we established in Chapter 6: every vector in $\mathbb{R}^2$ is a unique combination of them. The vector $\mathbf{v} = \begin{bmatrix} x \\ y \end{bmatrix}$ is exactly $$\mathbf{v} = x\,\mathbf{e}_1 + y\,\mathbf{e}_2,$$ which just says "to reach the point $(x,y)$, go $x$ units east and $y$ units north." The basis vectors are the building blocks; the coordinates $x, y$ are the recipe.

Now apply a linear transformation $T$ and watch superposition do all the work: $$T(\mathbf{v}) = T(x\,\mathbf{e}_1 + y\,\mathbf{e}_2) = x\,T(\mathbf{e}_1) + y\,T(\mathbf{e}_2).$$ Stare at that. The output $T(\mathbf{v})$ for any vector $\mathbf{v}$ is built from the same two arrows every time — $T(\mathbf{e}_1)$ and $T(\mathbf{e}_2)$ — combined with the same recipe $x, y$ that built $\mathbf{v}$ in the first place. We never had to know what $T$ does to all the infinitely many vectors of the plane. We only had to know where it sends east and where it sends north. Everything else is reconstruction by superposition.

Geometric Intuition — A linear transformation drags the whole grid along with its two basis arrows. Decide where $\mathbf{e}_1$ and $\mathbf{e}_2$ land, and you have decided where every point lands, because every point is just "$x$ steps along the new $\mathbf{e}_1$ plus $y$ steps along the new $\mathbf{e}_2$." Watching $\mathbf{e}_1$ and $\mathbf{e}_2$ move is watching the entire transformation.

So the two landing arrows $T(\mathbf{e}_1)$ and $T(\mathbf{e}_2)$ carry all the information about $T$. We need a place to store them. That place is a matrix:

The Key Insight — The matrix of a linear transformation $T$ is the table whose columns are the images of the standard basis vectors: $$A = \begin{bmatrix} \big| & \big| \\ T(\mathbf{e}_1) & T(\mathbf{e}_2) \\ \big| & \big| \end{bmatrix}.$$ The first column is where east goes; the second column is where north goes. To build the matrix of any transformation you know, you ask one question: where do $\mathbf{e}_1$ and $\mathbf{e}_2$ go? — and you write the answers down as columns. That single move generates every matrix in this chapter.

Let's make it concrete. Suppose $T$ stretches everything horizontally by a factor of $3$ and leaves the vertical direction alone. Where does east go? $\mathbf{e}_1 = (1,0)$ stretches to $(3, 0)$. Where does north go? $\mathbf{e}_2 = (0,1)$ is left alone, landing at $(0,1)$. Stack those as columns: $$A = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$ You just built a matrix without multiplying anything — you only asked where the basis vectors go. This is the workflow for the entire chapter, and it is far more powerful than memorizing matrix forms, because it works for any transformation you can picture.
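The "ask where the basis vectors go" workflow is mechanical enough to write as a few lines of code. This is a minimal sketch under stated assumptions: the name `matrix_from_map` is hypothetical, matrices are stored as plain lists of rows, and the function trusts that the map you pass in really is linear.

```python
def matrix_from_map(T, n):
    """Build the matrix of a linear map T on R^n, returned as a list of rows.

    Column j of the result is T(e_j), the image of the j-th standard
    basis vector. Linearity of T is assumed, not checked.
    """
    images = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]   # j-th standard basis vector
        images.append(list(T(e_j)))                    # where e_j lands
    m = len(images[0])                                 # dimension of the output space
    # Each image must become a column, so transpose the list of images.
    return [[images[j][i] for j in range(n)] for i in range(m)]

def stretch_x(v):
    """Stretch horizontally by 3, leave the vertical direction alone."""
    x, y = v
    return (3 * x, y)

print(matrix_from_map(stretch_x, 2))   # [[3, 0], [0, 1]]
```

The transpose step at the end is the code version of "write the answers down as columns": each image is computed as a flat list, then placed vertically.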

Common Pitfall — Confusing rows and columns. The images of the basis vectors go in the columns, not the rows. A frequent error is to write the image of $\mathbf{e}_1$ as the first row. If you do, you will get the transpose of the matrix you wanted — which, as §7.9 will show, is usually a different transformation. When in doubt, recompute the first column as $A\mathbf{e}_1$ and check it equals $T(\mathbf{e}_1)$. (And note for later: in code, the first column of a 2D numpy array A is A[:, 0], because numpy indexes from 0 while our math indexes from 1 — the first place that mismatch bites in this chapter.)

The same logic works in any dimension. A linear map $T:\mathbb{R}^n \to \mathbb{R}^m$ is determined by where it sends the $n$ standard basis vectors $\mathbf{e}_1, \dots, \mathbf{e}_n$ of $\mathbb{R}^n$; each image $T(\mathbf{e}_j)$ is a vector in $\mathbb{R}^m$, and stacking those $n$ images as columns produces an $m \times n$ matrix. The shape tells the story: $n$ columns (one per input direction), $m$ rows (the dimension of the output space). A transformation from 3D to 2D, for instance, is a $2\times 3$ matrix — three columns, each a point in the plane. We will mostly stay in 2D so we can see everything, but nothing about the columns-as-images principle is special to two dimensions.
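To see the shape rule on a concrete case, here is a small sketch of a map from 3D to 2D. The particular map (drop the $z$-coordinate) is an example chosen for illustration, not one the chapter has introduced; any three points in the plane would serve as columns.

```python
import numpy as np

# Images of e1, e2, e3 under the example map "drop the z-coordinate".
# Each image is a vector in R^2, and each becomes one column.
P = np.column_stack([(1, 0), (0, 1), (0, 0)])

print(P.shape)            # (2, 3): m = 2 rows, n = 3 columns
v = np.array([4, 5, 6])
print(P @ v)              # [4 5]: 4*(col 1) + 5*(col 2) + 6*(col 3)
```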

Math-Major Sidebar (optional) — The precise statement is: if $\{\mathbf{e}_1,\dots,\mathbf{e}_n\}$ is a basis of $V$ and $T:V\to W$ is linear, then $T$ is uniquely determined by the list of images $T(\mathbf{e}_1),\dots,T(\mathbf{e}_n)$, and moreover those images may be chosen freely — any assignment of basis vectors to target vectors extends to exactly one linear map. Existence: given targets $\mathbf{w}_1,\dots,\mathbf{w}_n \in W$, define $T(\sum c_j \mathbf{e}_j) := \sum c_j \mathbf{w}_j$; the basis property (every vector has exactly one such expansion) makes this well defined, and it is routine to check additivity and homogeneity. Uniqueness: any linear map agreeing with $T$ on a basis agrees with it on every linear combination, hence everywhere. This "freely and uniquely" theorem is why matrices work, and it is special to a basis — it fails for a spanning set that is not independent (the assignment may be inconsistent, so no linear map exists) and for an independent set that does not span (the map is not pinned down outside the span, so uniqueness fails). We return to it for abstract spaces in Chapter 35.

7.4 How do you multiply a matrix by a vector? (The weighted sum of columns)

We now have the matrix $A$ whose columns are $T(\mathbf{e}_1)$ and $T(\mathbf{e}_2)$. We want a notation for "apply $T$ to $\mathbf{v}$," and the natural one is $A\mathbf{v}$. The question is: what should $A\mathbf{v}$ equal? We do not get to choose freely — linearity already settled it. Recall: $$T(\mathbf{v}) = x\,T(\mathbf{e}_1) + y\,T(\mathbf{e}_2),$$ and $T(\mathbf{e}_1), T(\mathbf{e}_2)$ are exactly the columns of $A$. Therefore, with no further assumptions, the matrix-vector product must be the weighted sum of the columns, where the weights are the entries of the vector:

$$A\mathbf{v} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = x\begin{bmatrix} a \\ c \end{bmatrix} + y\begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}.$$

The Key Insight — Matrix times vector = weighted sum of the matrix's columns, with the vector's entries as the weights. This is not a rule handed down from above; it is forced by linearity the moment you decide the columns hold the basis images. Read $A\mathbf{v}$ as: "take $x$ copies of the first column, $y$ copies of the second, and add."

This is worth contrasting sharply with the rule you may have seen elsewhere. The "row dotted with the vector" recipe gives the same numbers — the first output entry $ax + by$ is indeed the first row $(a,b)$ dotted with $(x,y)$ — but it hides the meaning. The row picture tells you how to compute; the column picture tells you what is happening: you are reaching a point that is a combination of the destination arrows. We will rehabilitate the row picture in Chapter 8, where it emerges naturally from viewing matrix-matrix multiplication as composition. For now, resist it. The columns are where the geometry lives.

Common Pitfall — Reaching for the row-times-column rule. If your instinct is to compute $A\mathbf{v}$ by sliding along rows, you will get the right number and the wrong understanding. Worse, the row recipe makes it mysterious why $A\mathbf{v}$ lands in the column space of $A$ — the span of the columns (Chapter 6). With the weighted-sum view it's obvious: $A\mathbf{v}$ is by construction a combination of the columns, so it can only ever land in their span. That single observation is the seed of Chapter 13's column space. Train yourself to see columns first.

Let's do one entirely by hand. Take $$A = \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix}, \qquad \mathbf{v} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}.$$ The weighted-sum recipe says: 4 copies of the first column plus 5 copies of the second. $$A\mathbf{v} = 4\begin{bmatrix} 2 \\ 0 \end{bmatrix} + 5\begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 8 \\ 0 \end{bmatrix} + \begin{bmatrix} 5 \\ 15 \end{bmatrix} = \begin{bmatrix} 13 \\ 15 \end{bmatrix}.$$ So $A$ sends the point $(4,5)$ to the point $(13,15)$. Geometrically, the matrix took 4 steps along its first destination-arrow $(2,0)$ and 5 along its second $(1,3)$, and that is where we ended up.

Now confirm with numpy. In numpy, the @ operator is matrix-vector (and matrix-matrix) multiplication; np.array([[...],[...]]) builds the matrix row by row, so the rows of the literal are the rows of $A$.

# Matrix-vector product: numpy confirms our hand computation.
import numpy as np
A = np.array([[2, 1],
              [0, 3]])
v = np.array([4, 5])
print(A @ v)                       # the transformed point
# To SEE the weighted-sum-of-columns picture explicitly:
print(4 * A[:, 0] + 5 * A[:, 1])   # 4*(first column) + 5*(second column)
[13 15]
[13 15]

Both lines print [13 15], matching our hand result. The second line is the whole chapter in one expression: A @ v is literally v[0]*A[:,0] + v[1]*A[:,1], the weighted sum of the columns. (Note the index shift: our first column $T(\mathbf{e}_1)$ is A[:, 0] in numpy.)

Check Your Understanding — Using only the weighted-sum-of-columns idea, what is $A\mathbf{e}_1$ and $A\mathbf{e}_2$ for a $2\times 2$ matrix $A$? Why does this make the columns-as-images picture self-consistent?

Answer

$A\mathbf{e}_1 = 1\cdot(\text{col }1) + 0\cdot(\text{col }2) = $ the first column. Likewise $A\mathbf{e}_2 = $ the second column. So multiplying $A$ by a basis vector simply extracts the corresponding column. That is exactly the columns-as-images statement: the first column is $A\mathbf{e}_1 = T(\mathbf{e}_1)$, the image of east. The picture and the product agree because they are the same fact stated twice.
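One of this chapter's objectives is to implement apply(A, v) from scratch. The book's own toolkit/matrices.py is not reproduced here, so what follows is only one possible sketch, assuming a matrix is stored as a list of rows and a vector as a flat list. It computes the product column by column, and the last two calls confirm that multiplying by a basis vector extracts a column.

```python
def apply(A, v):
    """Return A v as the weighted sum of the columns of A.

    A is a list of m rows, each of length n; v is a list of n weights.
    """
    m, n = len(A), len(A[0])
    if len(v) != n:
        raise ValueError(f"shape mismatch: A has {n} columns, v has {len(v)} entries")
    result = [0] * m
    for j in range(n):                          # visit one column at a time
        column_j = [A[i][j] for i in range(m)]
        for i in range(m):
            result[i] += v[j] * column_j[i]     # add v[j] copies of column j
    return result

A = [[2, 1],
     [0, 3]]
print(apply(A, [4, 5]))   # [13, 15]
print(apply(A, [1, 0]))   # [2, 0], the first column
print(apply(A, [0, 1]))   # [1, 3], the second column
```

The loop order is a deliberate choice: iterating over columns on the outside mirrors the weighted-sum reading, even though a row-by-row loop would produce the same numbers.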

7.5 What do the standard transformations look like, and how do you build their matrices?

Now we cash in. For each classic transformation we will (1) picture it, (2) ask where do $\mathbf{e}_1$ and $\mathbf{e}_2$ go? to build its matrix, and (3) run the recurring 2D visualizer to watch it act on the unit square. The visualizer was introduced in Chapter 1 and lives in toolkit/visualizer.py; we reuse it verbatim, changing only the matrix and the narration, so that all of the book's transformation figures look identical. Here it is, exactly as frozen in the style bible:

# toolkit/visualizer.py — the recurring 2D transformation visualizer.
# Shows what a 2x2 matrix A does to the unit square and the basis vectors.
import numpy as np
import matplotlib.pyplot as plt

def visualize_2d(A, title="", ax=None):
    """Plot the action of 2x2 matrix A on the unit square and i-hat, j-hat."""
    A = np.asarray(A, dtype=float)
    square = np.array([[0, 1, 1, 0, 0],
                       [0, 0, 1, 1, 0]])          # unit-square corners (closed)
    out = A @ square                               # transformed square
    e1, e2 = A @ np.array([1, 0]), A @ np.array([0, 1])   # images of basis vectors
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))
    ax.plot(square[0], square[1], "b--", lw=1, label="input (unit square)")
    ax.fill(out[0], out[1], alpha=0.25, color="C1")
    ax.plot(out[0], out[1], "C1-", lw=2, label="A · (unit square)")
    ax.arrow(0, 0, *e1, color="C3", width=0.02, length_includes_head=True)  # A e1
    ax.arrow(0, 0, *e2, color="C2", width=0.02, length_includes_head=True)  # A e2
    ax.axhline(0, color="gray", lw=0.5); ax.axvline(0, color="gray", lw=0.5)
    ax.set_aspect("equal"); ax.grid(True, alpha=0.3)
    ax.set_title(title or f"det = {np.linalg.det(A):.2f}")
    ax.legend(loc="best", fontsize=8)
    return ax

# Example: a horizontal shear
# visualize_2d([[1, 1], [0, 1]], title="Shear")
# plt.show()

In every figure that follows: the blue dashed outline is the input unit square (corners at $(0,0),(1,0),(1,1),(0,1)$); the orange filled region is its image under $A$; the red arrow is $A\mathbf{e}_1$ (where east lands, the first column); the green arrow is $A\mathbf{e}_2$ (where north lands, the second column). Watching the two arrows is watching the transformation, since they are the columns of $A$.
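To predict a figure before plotting it, the corner coordinates are enough. The helper below is a hypothetical companion written for this example, not part of toolkit/visualizer.py; it applies the matrix to the four corners of the unit square and returns where each one lands.

```python
import numpy as np

def image_of_unit_square(A):
    """Return the images of the unit-square corners, one (x, y) pair per row."""
    A = np.asarray(A, dtype=float)
    corners = np.array([[0, 1, 1, 0],
                        [0, 0, 1, 1]])   # columns: (0,0), (1,0), (1,1), (0,1)
    return (A @ corners).T               # transpose so each row is one corner

# Horizontal shear: the bottom edge stays put, the top edge slides right.
print(image_of_unit_square([[1, 1], [0, 1]]))
# [[0. 0.]
#  [1. 0.]
#  [2. 1.]
#  [1. 1.]]
```

The second and fourth rows are the images of $(1,0)$ and $(0,1)$, which are the two columns of the matrix.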

7.5.1 The identity: doing nothing

The simplest transformation leaves every vector exactly where it is: $T(\mathbf{v}) = \mathbf{v}$. Where does $\mathbf{e}_1$ go? To $\mathbf{e}_1 = (1,0)$. Where does $\mathbf{e}_2$ go? To $\mathbf{e}_2 = (0,1)$. Stack as columns: $$I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$ This is the identity matrix $I$. Its columns are the basis vectors themselves, which is the algebraic way of saying "east stays east, north stays north." For any vector, $I\mathbf{v} = \mathbf{v}$, as the weighted-sum check confirms: $x\,(1,0) + y\,(0,1) = (x,y)$.

# The identity does nothing; the image equals the input.
from toolkit.visualizer import visualize_2d
import matplotlib.pyplot as plt
I = [[1, 0], [0, 1]]
visualize_2d(I, title="Identity: det = 1.00")
plt.show()

Figure 7.1. The identity transformation. The orange image square sits exactly on top of the blue dashed input square; the red arrow ($A\mathbf{e}_1$) points one unit east and the green arrow ($A\mathbf{e}_2$) one unit north, unchanged. Title reads det = 1.00. Alt-text: A unit square with its transformed image perfectly overlapping it, and two perpendicular unit arrows along the positive x- and y-axes, illustrating that the identity matrix leaves space unchanged.

7.5.2 Scaling: stretching and squashing

A scaling multiplies the horizontal direction by $s_x$ and the vertical by $s_y$. Where does $\mathbf{e}_1$ go? To $(s_x, 0)$. Where does $\mathbf{e}_2$ go? To $(0, s_y)$. So $$A = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}.$$ A diagonal matrix is a pure stretch along the axes; the diagonal entries are the stretch factors. If $s_x = s_y$ it's a uniform zoom; if they differ it's a directional stretch. Take $s_x = 2, s_y = \tfrac{1}{2}$ — stretch east twofold, squash north by half:

# Scaling: stretch x by 2, squash y by 1/2. Area scales by 2 * 1/2 = 1.
from toolkit.visualizer import visualize_2d
import matplotlib.pyplot as plt
A = [[2, 0], [0, 0.5]]
visualize_2d(A, title="Scaling (2, 1/2): det = 1.00")
plt.show()

Figure 7.2. A non-uniform scaling. The orange image is a $2 \times \tfrac{1}{2}$ rectangle: twice as wide, half as tall as the input. The red arrow now reaches $(2,0)$; the green arrow only reaches $(0, 0.5)$. The title reads det = 1.00 because widening by 2 and shortening by $\tfrac{1}{2}$ leaves the area unchanged — a preview of Chapter 11's reading of the determinant as an area-scaling factor. Alt-text: A wide, short orange rectangle replacing the blue unit square, with a long horizontal red arrow and a short vertical green arrow.

Geometric Intuition — Diagonal matrices are the "easy" transformations: each axis is scaled independently and nothing tilts. A great deal of advanced linear algebra is the project of making a matrix diagonal by choosing the right coordinate system (diagonalization, Chapter 25; SVD, Chapter 30), precisely because diagonal transformations are so transparent — you can see at a glance what they do.

7.5.3 Rotation: deriving the matrix everyone memorizes

Now the star of the chapter. We will derive the rotation matrix instead of memorizing it, using only "where do the basis vectors go?" and a little trigonometry.

Rotate the whole plane counterclockwise by angle $\theta$ about the origin. Where does east, $\mathbf{e}_1 = (1,0)$, go? It swings up to the point on the unit circle at angle $\theta$ from the positive $x$-axis, which by the definition of sine and cosine is $$T(\mathbf{e}_1) = (\cos\theta,\ \sin\theta).$$ Where does north, $\mathbf{e}_2 = (0,1)$, go? It starts at angle $90°$ and rotates to angle $90° + \theta$. Using the angle-sum identities (or just noticing that north is east rotated $90°$ counterclockwise, so its image is east's image rotated another $90°$ counterclockwise, i.e. $(\cos(\theta+90°), \sin(\theta+90°))$): $$T(\mathbf{e}_2) = (\cos(\theta + 90°),\ \sin(\theta + 90°)) = (-\sin\theta,\ \cos\theta).$$ (Rotating a vector $(\cos\alpha, \sin\alpha)$ by $90°$ counterclockwise sends it to $(-\sin\alpha, \cos\alpha)$ — the perpendicular, turned left.) Stack the two images as columns:

$$\boxed{\,R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\,}$$

There it is — the standard $2\times 2$ rotation matrix, and you built it yourself from two questions and the unit circle. No memorization required: if you ever forget the signs, just re-derive where east goes ($(\cos\theta,\sin\theta)$, easy) and where north goes (east turned another quarter turn). The minus sign sits on the top-right because rotating north counterclockwise pushes it into negative-$x$ territory.
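For readers who prefer the angle-sum route mentioned in the derivation, here is that step written out, using only the standard identities together with $\cos 90° = 0$ and $\sin 90° = 1$: $$\cos(\theta + 90°) = \cos\theta\cos 90° - \sin\theta\sin 90° = -\sin\theta, \qquad \sin(\theta + 90°) = \sin\theta\cos 90° + \cos\theta\sin 90° = \cos\theta.$$ Both routes give the same second column, $(-\sin\theta,\ \cos\theta)$.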

Let's verify on a clean angle, $\theta = 90°$. Then $\cos 90° = 0$ and $\sin 90° = 1$, so $$R(90°) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$ Check the basis images: $R(90°)\mathbf{e}_1 = (0,1)$ (east becomes north — correct, a quarter turn left) and $R(90°)\mathbf{e}_2 = (-1,0)$ (north becomes west — correct). Now a $30°$ rotation, computed and visualized:

# Rotation by theta (counterclockwise). Build R(theta) from cos/sin.
import numpy as np, matplotlib.pyplot as plt
from toolkit.visualizer import visualize_2d
theta = np.radians(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.round(R, 4))
print("det =", round(float(np.linalg.det(R)), 4))   # rotations preserve area
visualize_2d(R, title="Rotation 30°: det = 1.00")
plt.show()
[[ 0.866 -0.5  ]
 [ 0.5    0.866]]
det = 1.0

The printed matrix matches $\begin{bmatrix}\cos 30° & -\sin 30° \\ \sin 30° & \cos 30°\end{bmatrix} = \begin{bmatrix}0.866 & -0.5 \\ 0.5 & 0.866\end{bmatrix}$, and the determinant is exactly $1$ because a rotation moves the square without stretching it (since $\cos^2\theta + \sin^2\theta = 1$).

Figure 7.3. Rotation by 30° counterclockwise. The orange image square is the input square pivoted 30° about the origin — same size and shape, tilted left. The red arrow ($A\mathbf{e}_1$) points to $(\cos 30°, \sin 30°) \approx (0.87, 0.5)$; the green arrow ($A\mathbf{e}_2$) points to $(-\sin 30°, \cos 30°) \approx (-0.5, 0.87)$, still perpendicular to the red one. Title det = 1.00. Alt-text: A unit square tilted 30 degrees counterclockwise, with two perpendicular unit arrows also rotated 30 degrees, showing a rigid rotation that preserves lengths and angles.

Geometric Intuition — A rotation keeps every length and every angle: the image square is congruent to the input, just turned. The two column-arrows stay unit-length and stay perpendicular to each other. Transformations that preserve lengths and angles like this are called orthogonal, and rotations are the orientation-preserving ones; we devote all of Chapter 21 to them, and they reappear as the $U$ and $V$ of the SVD in Chapter 30.
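These claims about lengths and angles are easy to test numerically. The sketch below assumes a small helper, `rotation`, defined here only for the example; the test vector $(3, 4)$ is an arbitrary choice with a convenient length of 5.

```python
import numpy as np

def rotation(theta_degrees):
    """Return the 2x2 counterclockwise rotation matrix for an angle in degrees."""
    t = np.radians(theta_degrees)
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

R = rotation(30)
col1, col2 = R[:, 0], R[:, 1]            # images of e1 and e2

print(np.isclose(np.linalg.norm(col1), 1.0))    # True: still unit length
print(np.isclose(np.linalg.norm(col2), 1.0))    # True: still unit length
print(np.isclose(col1 @ col2, 0.0))             # True: still perpendicular

v = np.array([3.0, 4.0])
print(np.isclose(np.linalg.norm(R @ v), 5.0))   # True: length of v is preserved
```

The comparisons use np.isclose rather than == because the sine and cosine values are floating-point approximations.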

Common Pitfall — Sign and direction confusion. The matrix above rotates counterclockwise for a positive angle, in the standard math convention with the $y$-axis pointing up. Two traps: (1) screen coordinates in many graphics systems put $y$ pointing down, which visually flips the rotation direction — see Case Study 1. (2) Mixing degrees and radians: numpy's np.cos/np.sin expect radians, so convert with np.radians(30) as above. Feeding np.cos(30) treats 30 as radians and silently gives nonsense.
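The degrees-versus-radians trap is worth seeing once with real numbers. This short check uses only np.cos and np.radians; the variable names are chosen for the example.

```python
import numpy as np

right = np.cos(np.radians(30))   # 30 degrees, converted to radians first
wrong = np.cos(30)               # 30 radians, roughly 4.77 full turns

print(round(float(right), 4))    # 0.866
print(round(float(wrong), 4))    # 0.1543
```

No error or warning is raised in the second case, which is what makes the mistake easy to miss.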

7.5.4 Shear: sliding layers past each other

A shear slides the plane parallel to one axis by an amount proportional to the other coordinate — like pushing the top of a deck of cards sideways while the bottom stays put. A horizontal shear with factor $k$ leaves east fixed and tilts north. Where does $\mathbf{e}_1$ go? It stays at $(1,0)$ (points on the $x$-axis don't move). Where does $\mathbf{e}_2 = (0,1)$ go? It slides right by $k$, to $(k, 1)$. So $$A = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}.$$

# Horizontal shear with k = 1: e1 stays put, e2 slides right by 1.
from toolkit.visualizer import visualize_2d
import matplotlib.pyplot as plt
A = [[1, 1], [0, 1]]
visualize_2d(A, title="Shear (k=1): det = 1.00")
plt.show()

Figure 7.4. A horizontal shear, $k = 1$. The orange image is a parallelogram: the bottom edge stayed put along the $x$-axis, but the top edge slid one unit to the right, so the square leans like italic type. The red arrow ($A\mathbf{e}_1$) is unchanged at $(1,0)$; the green arrow ($A\mathbf{e}_2$) now points to $(1,1)$, the tilted "north." Title det = 1.00. Alt-text: A leaning parallelogram replacing the unit square, with the horizontal red arrow unchanged and the green arrow tilted to point diagonally up and to the right.

Geometric Intuition — A shear is sneaky: it changes shape (squares become slanted parallelograms) and yet preserves area, which is why its determinant is $1$. The base stays the same length and the height stays the same; only the slant changes, and slant doesn't affect area (base × height). Shears are the workhorses of italic fonts, of certain image-warping effects, and — surprisingly — of efficient algorithms, since the core step of Gaussian elimination (Chapter 4), adding a multiple of one row to another, is a shear in disguise. (The other elementary row operations, swapping two rows and rescaling a row, are not shears.)
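The area claim can be checked directly, without determinants, by shearing the unit square and measuring the result. The sketch below uses the shoelace formula for polygon area, which this chapter has not introduced; treat it as a borrowed tool, and the function names and the sample values of $k$ as choices made for this example.

```python
def shoelace_area(points):
    """Signed area of a polygon from its vertices, listed in order."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2

def shear(point, k):
    """Horizontal shear with factor k: (x, y) -> (x + k*y, y)."""
    x, y = point
    return (x + k * y, y)

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]   # counterclockwise corners
for k in (0, 1, 2.5, -3):
    image = [shear(p, k) for p in unit_square]
    print(k, shoelace_area(image))               # the area is 1.0 for every k
```

However hard the square is pushed, left or right, the parallelogram keeps base 1 and height 1, so the area never changes.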

7.5.5 Reflection: flipping the plane over

A reflection flips the plane across a line, like a mirror. Reflect across the $x$-axis: east is on the axis, so it stays; north flips to point south. Where does $\mathbf{e}_1$ go? To $(1, 0)$. Where does $\mathbf{e}_2$ go? To $(0, -1)$. So $$A = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$

# Reflection across the x-axis: y flips sign. Orientation reverses (det < 0).
from toolkit.visualizer import visualize_2d
import matplotlib.pyplot as plt
A = [[1, 0], [0, -1]]
visualize_2d(A, title="Reflect across x-axis: det = -1.00")
plt.show()

Figure 7.5. Reflection across the $x$-axis. The orange image square hangs below the $x$-axis — the input square mirror-flipped downward. The red arrow ($A\mathbf{e}_1$) is unchanged at $(1,0)$; the green arrow ($A\mathbf{e}_2$) now points down to $(0,-1)$. The title reads det = -1.00: the negative determinant is the algebraic fingerprint of a flip. Alt-text: A unit square reflected below the x-axis, with the horizontal red arrow unchanged and the green arrow pointing straight down.

The Key Insight — A negative determinant means orientation reversed — the transformation includes a flip, turning a left hand into a right hand. A rotation (det $= +1$) you could achieve by physically turning the page; a reflection (det $= -1$) you cannot, no matter how you spin the paper, because it swaps clockwise for counterclockwise. We make "determinant = signed area scaling" rigorous in Chapter 11; the sign is the orientation.

Warning — Not every matrix with a $-1$ on the diagonal is a simple reflection, and not every "flip" is across an axis. Reflection across the line $y = x$ is $\begin{bmatrix}0 & 1 \\ 1 & 0\end{bmatrix}$ (it swaps east and north — check the columns!), with determinant $-1$ but no negative entries at all. The reliable test for "does this transformation include a flip?" is the sign of the determinant, not the signs of the entries. Reading entries to guess orientation is a common source of error.
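A short check makes the warning concrete: the sign of the determinant, not the signs of the entries, decides whether a flip is present. The helper names det2 and orientation below are assumptions for this sketch only.

```python
def det2(A):
    """Determinant of a 2x2 matrix stored as a list of rows."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def orientation(A):
    """Classify a 2x2 matrix by the sign of its determinant."""
    d = det2(A)
    if d > 0:
        return "preserved"
    if d < 0:
        return "reversed"
    return "collapsed"

examples = {
    "reflect across y = x":  [[0, 1], [1, 0]],    # no negative entries, yet a flip
    "rotate by 180 degrees": [[-1, 0], [0, -1]],  # two negative entries, no flip
    "reflect across x-axis": [[1, 0], [0, -1]],
}
report = {name: orientation(A) for name, A in examples.items()}
for name, verdict in report.items():
    print(f"{name}: det = {det2(examples[name])}, orientation {verdict}")
```

The half-turn carries two minus signs and still preserves orientation, while the $y = x$ reflection carries none and reverses it.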

7.5.6 Reflection across an arbitrary line: the workflow at full strength

The two reflections we just built — across the $x$-axis and across $y = x$ — were easy because the basis vectors had obvious mirror images. But the columns-as-images workflow does not need a convenient mirror. Let us reflect across a line through the origin tilted at an arbitrary angle $\phi$ above the positive $x$-axis, and watch the same two questions deliver a matrix that no one could write down by guessing entries.

A reflection across a line is a mirror standing on that line: a point on the line stays put, and a point off the line jumps to its mirror image on the far side, the same distance away. So the question "where does $\mathbf{e}_1$ go?" is "what is the mirror image of east across the line at angle $\phi$?" Here trigonometry does the work. The image of a direction sits as far beyond the mirror as the input sits before it, so a direction at angle $\alpha$ reflects to the direction at angle $\phi + (\phi - \alpha) = 2\phi - \alpha$. East has $\alpha = 0$, so it lands on the unit vector at angle $2\phi$: $$T(\mathbf{e}_1) = (\cos 2\phi,\ \sin 2\phi).$$ The same rule sends north: it starts at $\alpha = 90°$, so it reflects to the direction at angle $2\phi - 90°$. Using $\cos(2\phi - 90°) = \sin 2\phi$ and $\sin(2\phi - 90°) = -\cos 2\phi$, $$T(\mathbf{e}_2) = (\sin 2\phi,\ -\cos 2\phi).$$ Stack the two images as columns and you have the general reflection matrix, built — as always — from nothing but where east and north land: $$\boxed{\,F(\phi) = \begin{bmatrix} \cos 2\phi & \sin 2\phi \\ \sin 2\phi & -\cos 2\phi \end{bmatrix}\,}$$ Two sanity checks reassure us this is the right object. Set $\phi = 0°$ (mirror along the $x$-axis): $2\phi = 0$, so $F = \begin{bmatrix}1 & 0 \\ 0 & -1\end{bmatrix}$ — exactly the $x$-axis reflection of §7.5.5. Set $\phi = 45°$ (mirror along $y = x$): $2\phi = 90°$, so $F = \begin{bmatrix}0 & 1 \\ 1 & 0\end{bmatrix}$ — exactly the $y=x$ reflection from the Warning above. Our two earlier reflections were special cases of one formula all along.

Now pick a genuinely tilted mirror, $\phi = 30°$, so $2\phi = 60°$. Then $\cos 60° = \tfrac12$ and $\sin 60° = \tfrac{\sqrt3}{2} \approx 0.866$, giving $$F(30°) = \begin{bmatrix} 0.5 & 0.866 \\ 0.866 & -0.5 \end{bmatrix}.$$

# Reflection across the line through the origin at angle phi = 30 degrees.
import numpy as np
import matplotlib.pyplot as plt
from toolkit.visualizer import visualize_2d
phi = np.radians(30)
F = np.array([[np.cos(2*phi),  np.sin(2*phi)],
              [np.sin(2*phi), -np.cos(2*phi)]])
print(np.round(F, 4))
print("det =", round(float(np.linalg.det(F)), 4))   # a flip: expect -1
u = np.array([np.cos(phi), np.sin(phi)])             # a vector ON the mirror line
print("F @ u =", np.round(F @ u, 4), " (should equal u, unchanged)")
visualize_2d(F, title="Reflect across 30° line: det = -1.00")
plt.show()
[[ 0.5    0.866]
 [ 0.866 -0.5  ]]
det = -1.0
F @ u = [0.866 0.5  ] (should equal u, unchanged)

The determinant is $-1$ — the orientation-reversing fingerprint of every reflection, exactly as in §7.5.5 — and the vector $\mathbf{u} = (\cos 30°, \sin 30°)$ that lies on the mirror comes back unchanged, because the mirror leaves its own line fixed. That fixed direction is your first sighting of an eigenvector with eigenvalue $+1$ (Chapter 23): a reflection has a whole line of vectors it does not move (the mirror) and a perpendicular line it flips to the opposite sign (eigenvalue $-1$). The columns built the matrix; the geometry of the mirror reads it right back.

Figure 7.6. Reflection across the line at $30°$. The orange image square is the input square flipped across the dashed mirror line that rises at $30°$ through the origin — same size and shape, but mirror-reversed, so it has swapped handedness. The red arrow ($A\mathbf{e}_1$) points to $(0.5, 0.866)$, which sits at angle $60° = 2\phi$; the green arrow ($A\mathbf{e}_2$) points to $(0.866, -0.5)$. Title det = -1.00. Alt-text: A unit square reflected across a line tilted thirty degrees from horizontal, with the red arrow swung up to sixty degrees and the green arrow pointing down and to the right, the image being a mirror image of the input.

Geometric Intuition — Watch what the angle-doubling buys you: as the mirror rotates from $\phi$ to $\phi + 90°$, the matrix $F(\phi)$ becomes $F(\phi + 90°)$, which (since $2\phi$ advances by $180°$) is $F(\phi)$ with every entry negated — reflecting across a line and across its perpendicular differ by a half-turn of the plane. Reflections, unlike most rotations, are involutions: do the same reflection twice and you are back where you started, so $F(\phi)^2 = I$ for every $\phi$. You can feel this in the picture — flip across a mirror, flip again across the same mirror, and the square lands exactly home.
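Both facts can be confirmed without any matrix multiplication, just by pushing a vector through the mirror twice and by comparing entries. This is a pure-Python sketch; reflection and act are helper names assumed here, and the test vector is arbitrary.

```python
import math

def reflection(phi):
    """Matrix F(phi) of the reflection across the line at angle phi (radians)."""
    c, s = math.cos(2 * phi), math.sin(2 * phi)
    return [[c, s], [s, -c]]

def act(A, v):
    """Image of the 2D vector v under the 2x2 matrix A."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

phi = math.radians(30)
F = reflection(phi)

v = [3.0, -1.5]                      # an arbitrary test vector
twice = act(F, act(F, v))            # flip, then flip again across the same mirror
print(twice)

G = reflection(phi + math.pi / 2)    # the perpendicular mirror
negated = all(math.isclose(G[i][j], -F[i][j], abs_tol=1e-12)
              for i in range(2) for j in range(2))
print(negated)
```

Up to floating-point rounding, twice returns the original vector, and every entry of the perpendicular mirror's matrix is the negative of the corresponding entry of $F(\phi)$.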

Common Pitfall — Confusing the reflection matrix with the rotation matrix, and mis-handling the angle. The reflection $F(\phi) = \begin{bmatrix}\cos 2\phi & \sin 2\phi \\ \sin 2\phi & -\cos 2\phi\end{bmatrix}$ looks deceptively close to the rotation $R(\theta) = \begin{bmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{bmatrix}$, but three differences separate them and each matters. First, the angle is doubled: the matrix for a mirror at $30°$ uses $\cos 60°$ and $\sin 60°$, not $\cos 30°$, because reflection sends the angle $\alpha$ to $2\phi - \alpha$, doubling $\phi$. Forgetting the factor of two is the single most common error here. Second, the bottom-right entry is $-\cos 2\phi$ (a minus sign), and the off-diagonal entries are both $+\sin 2\phi$ — the rotation's off-diagonals have opposite signs, the reflection's match. Third, and most telling, the determinant: $\det R(\theta) = +1$ (a rigid turn you could perform by spinning the page), while $\det F(\phi) = -\cos^2 2\phi - \sin^2 2\phi = -1$ (a flip you cannot). If you ever blank on whether your matrix is a rotation or a reflection, compute the determinant — its sign tells you instantly, no memory required. And when you call numpy, feed 2*phi into np.cos and np.sin, in radians, exactly as the snippet above does; passing phi instead silently builds the reflection across the half-angle line.

Check Your Understanding — Without multiplying anything, what does $F(90°)$ — reflection across the vertical line through the origin (the $y$-axis) — do to $\mathbf{e}_1$ and $\mathbf{e}_2$, and what matrix results?

Answer

A mirror along the $y$-axis flips left and right and leaves up/down alone. East $\mathbf{e}_1 = (1,0)$ reflects to $(-1, 0)$; north $\mathbf{e}_2 = (0,1)$ is on the mirror, so it stays at $(0,1)$. The matrix is $\begin{bmatrix}-1 & 0 \\ 0 & 1\end{bmatrix}$. Confirm with the formula: $\phi = 90°$ gives $2\phi = 180°$, so $\cos 2\phi = -1$ and $\sin 2\phi = 0$, yielding $\begin{bmatrix}-1 & 0 \\ 0 & 1\end{bmatrix}$. The formula and the picture agree.

7.5.7 Projection: flattening space onto a line

Our last transformation is different in kind, and it previews a major theme. A projection onto the $x$-axis sends every point straight down (or up) to its shadow on the $x$-axis, throwing away the height. Where does $\mathbf{e}_1 = (1,0)$ go? It's already on the axis, so it stays at $(1,0)$. Where does $\mathbf{e}_2 = (0,1)$ go? Its shadow on the $x$-axis is the origin, $(0,0)$. So $$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$

# Projection onto the x-axis: everything collapses onto a line. det = 0.
from toolkit.visualizer import visualize_2d
import matplotlib.pyplot as plt
A = [[1, 0], [0, 0]]
visualize_2d(A, title="Project onto x-axis: det = 0.00")
plt.show()

Figure 7.7. Projection onto the $x$-axis. The orange "square" has collapsed into a flat orange segment lying along the $x$-axis from $(0,0)$ to $(1,0)$ — the whole two-dimensional square squashed onto a one-dimensional line. The red arrow ($A\mathbf{e}_1$) survives at $(1,0)$; the green arrow ($A\mathbf{e}_2$) has vanished to the origin (it's a zero-length arrow). Title det = 0.00. Alt-text: The unit square flattened into a horizontal line segment on the x-axis, with the horizontal red arrow intact and the vertical green arrow collapsed to a point at the origin.

The Key Insight — A zero determinant means space got flattened — the transformation collapses 2D onto a lower-dimensional shadow (here, a line). Such a matrix is singular: it destroys information (the entire vertical direction is annihilated) and therefore cannot be undone. You can't recover a point's height from its shadow. This is the geometric meaning of non-invertibility, the subject of Chapter 9, and it is why $\det(A) = 0$ is the universal alarm bell for "this transformation is irreversible."
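The loss of information is easy to exhibit: two different points share one shadow, so no rule could send that shadow back to both. A minimal pure-Python sketch follows; act is an assumed helper name and the two sample points are made up.

```python
def act(A, v):
    """Image of the 2D vector v under the 2x2 matrix A."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

P = [[1, 0], [0, 0]]        # projection onto the x-axis

a = [2, 5]
b = [2, -7]                 # same x-coordinate, very different height
shadow_a = act(P, a)
shadow_b = act(P, b)
print(shadow_a, shadow_b)   # identical shadows: the heights are gone

again = act(P, shadow_a)    # projecting a shadow changes nothing further
det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
print(again, det)
```

Since both inputs land on the same output, an "undo" would have to return two answers at once, which is the concrete content of $\det(A) = 0$ here.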

Geometric Intuition — Projection is the first transformation we've met that loses dimensions, and it sits at the center of a huge swath of applied linear algebra. Least-squares regression (Chapters 17 and 19) finds the best-fit line by projecting data onto a subspace. PCA (Chapter 32) projects data onto its most important directions. The shadow you just watched the unit square cast is the same operation that powers fitting models to noisy data.

Real-World Application — Whitening a dataset (data science / machine learning). Suppose each data point is a vector of two features measured in clashing units — say a person's height in centimeters (spread of tens) and their reaction time in seconds (spread of fractions). Plotted, the cloud is a stretched ellipse: enormous along the height axis, pencil-thin along the time axis. Many algorithms (gradient descent, $k$-nearest-neighbors, anything using Euclidean distance) misbehave on such a lopsided cloud, because distance is dominated by whichever feature happens to have the larger numbers. Whitening is the preprocessing step that fixes this by rescaling each axis by the reciprocal of its spread (its standard deviation), turning the ellipse back into a round blob. In the simplest case — features already aligned with the axes — whitening is a pure diagonal scaling, the §7.5.2 transformation read in reverse: where east goes is "the height axis, shrunk to unit spread," and where north goes is "the time axis, stretched to unit spread." If height has standard deviation $2$ and reaction time has standard deviation $\tfrac12$, the whitening matrix is $W = \begin{bmatrix} 1/2 & 0 \\ 0 & 2 \end{bmatrix}$, whose columns say exactly that: east lands at $(0.5, 0)$ and north lands at $(0, 2)$. Apply $W$ to a far-flung sample like $(4, 0.5)$ and it moves to $W\mathbf{x} = 4(0.5,0) + 0.5(0,2) = (2,1)$ — both coordinates now order-one. Because $\det W = \tfrac12 \cdot 2 = 1$, this particular whitening preserves area even as it reshapes the cloud (a coincidence of these numbers, not a rule). When the features are correlated — the ellipse tilted off the axes — the same idea needs a rotation first to find the ellipse's natural axes, then a diagonal scaling along them; that rotate-then-scale recipe is the eigen-decomposition (Chapter 25) and, in its most general form, the SVD (Chapter 30).
Whitening is your first glimpse that real data-preprocessing is, underneath, just the unit-square transformations of this chapter applied to a cloud of points. The covariance matrix that drives it has the structure $X^{\mathsf{T}}X$, foreshadowed in §7.9 and central to dimensionality reduction in data science.
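The axis-aligned case can be sketched in a few lines. The data below are synthetic and already centered, chosen so the spreads match the $2$ and $\tfrac12$ of the example; the helper act and the use of the population standard deviation (statistics.pstdev) are assumptions of this sketch, not a prescription.

```python
import statistics

# Synthetic, already-centered measurements (made up for illustration).
heights = [-2.0, 2.0, -2.0, 2.0]    # population standard deviation 2
times = [-0.5, -0.5, 0.5, 0.5]      # population standard deviation 0.5

sd_h = statistics.pstdev(heights)
sd_t = statistics.pstdev(times)

# Diagonal whitening matrix: each axis scaled by the reciprocal of its spread.
W = [[1 / sd_h, 0.0],
     [0.0, 1 / sd_t]]

def act(A, v):
    """Image of the 2D vector v under the 2x2 matrix A."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

whitened = [act(W, [h, t]) for h, t in zip(heights, times)]
new_sd_h = statistics.pstdev([p[0] for p in whitened])
new_sd_t = statistics.pstdev([p[1] for p in whitened])
print(W, new_sd_h, new_sd_t)

sample = act(W, [4.0, 0.5])         # the far-flung sample from the text
print(sample)
```

After whitening, both features have unit spread, so neither dominates a Euclidean distance.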

Real-World Application — Rotating a vector field (physics / engineering / graphics). A vector field assigns an arrow to every point of space — wind velocity over a map, the force on a charged particle, the flow of water around a hull. To re-express such a field in a frame rotated by $\theta$ (say you tilt your sensor, or change from map-north to true-north), you apply one rotation matrix to every arrow in the field at once: $R(\theta)$ if you are turning the arrows themselves, $R(-\theta)$ if you are turning the axes underneath them. Because all the arrows are rotated by the same matrix, the field's structure — where it swirls, where it converges — rotates rigidly with the frame, while magnitudes (lengths) are untouched, since rotations preserve length. This is the everyday workhorse of robotics (transforming a sensor reading into the robot's body frame), of computational fluid dynamics, and of graphics, where the same $R(\theta)$ that you just derived spins thousands of vertices per frame. The fact that one tiny $2\times 2$ matrix rotates an entire field is superposition cashed out: do the basis vectors right, and every vector follows. The same idea recurs in video game design, where each frame applies one transformation matrix to every point of every model.
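A small sketch shows one matrix acting on a whole list of arrows while leaving lengths and mutual angles alone. The arrows and the $40°$ angle are made up for illustration, and rotation, act, and dot are helper names assumed here.

```python
import math

def rotation(theta):
    """Counterclockwise rotation matrix R(theta), theta in radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def act(A, v):
    """Image of the 2D vector v under the 2x2 matrix A."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

# A handful of field arrows (made-up samples).
arrows = [(1.0, 0.0), (0.0, 2.0), (-3.0, 4.0), (0.5, -0.5)]

R = rotation(math.radians(40))
rotated = [act(R, a) for a in arrows]       # the same matrix for every arrow

before = [math.hypot(*a) for a in arrows]
after = [math.hypot(*a) for a in rotated]
print(before)
print(after)

# Dot products, and hence angles between arrows, survive the rotation too.
print(dot(arrows[0], arrows[2]), dot(rotated[0], rotated[2]))
```

The two length lists agree up to rounding, which is the "rotates rigidly" claim in numbers.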

7.5.8 A general transformation: when the columns aren't special

The six matrices above were chosen to be recognizable — pure stretches, turns, and flips. But most matrices you meet in the wild are none of these; they are general $2\times 2$ tables that mix stretching, rotating, and shearing all at once. The beauty of the columns-as-images picture is that it reads them anyway, with no special-case knowledge. Take $$A = \begin{bmatrix} 1 & -1 \\ 1 & 2 \end{bmatrix}.$$ Where does east go? To the first column, $(1,1)$ — up and to the right, at $45°$, length $\sqrt 2$. Where does north go? To the second column, $(-1,2)$ — up and to the left, length $\sqrt 5$. The two destination arrows are neither perpendicular nor the same length, so this is not a rotation, not a uniform scale, not a clean shear — it is a generic linear map that does a bit of everything. You still know exactly what it does to every point, though: the point $(x,y)$ lands at $x(1,1) + y(-1,2)$. That is the entire power of the chapter: you never need a transformation to be "nice" to read it, because the columns always tell you where the basis vectors go.

# A generic 2x2 map: not a rotation/scale/shear, but the columns still tell all.
import numpy as np
import matplotlib.pyplot as plt
from toolkit.visualizer import visualize_2d
A = np.array([[1, -1], [1, 2]])
print("A e1 =", A @ np.array([1, 0]), "  A e2 =", A @ np.array([0, 1]))
print("det  =", int(round(np.linalg.det(A))))   # 1*2 - (-1)*1 = 3
visualize_2d(A, title="A generic map: det = 3.00")
plt.show()
A e1 = [1 1]   A e2 = [-1  2]
det  = 3

Figure 7.8. A generic linear transformation. The orange image is a slanted parallelogram (not a rectangle, not a rotated square): the red arrow ($A\mathbf{e}_1$) points to $(1,1)$ and the green arrow ($A\mathbf{e}_2$) to $(-1,2)$, forming two adjacent edges of the parallelogram that the unit square became. The title reads det = 3.00, so this map triples area. Alt-text: A slanted parallelogram replacing the unit square, with a red arrow pointing up-right to (1,1) and a green arrow pointing up-left to (-1,2), the two arrows being neither perpendicular nor equal in length.

The Key Insight — Every $2\times 2$ matrix turns the unit square into a parallelogram whose two edge-arrows are exactly the columns. "Square $\to$ parallelogram" is the universal picture of a linear map of the plane; the special transformations (rotation, scaling, shear, reflection, projection) are just the parallelograms with extra symmetry. The area of that parallelogram is $|\det A|$ — here $3$ — which is why the determinant is the area-scaling factor (Chapter 11).
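The area claim can be checked independently of the determinant by mapping the four corners of the unit square and measuring the resulting polygon with the shoelace formula. This is a pure-Python sketch; act and shoelace_area are helper names assumed here.

```python
def act(A, v):
    """Image of the 2D vector v under the 2x2 matrix A."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

def shoelace_area(points):
    """Area of a simple polygon from its vertices listed in order."""
    total = 0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

A = [[1, -1], [1, 2]]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]        # corners of the unit square
image = [tuple(act(A, p)) for p in square]        # corners of the parallelogram
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

print(image)
print(shoelace_area(square), shoelace_area(image), abs(det))
```

The second and fourth corners of the image are exactly the columns of $A$, and the measured area matches $|\det A|$.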

7.6 How do you combine two transformations?

Here is a question that practically asks itself: if a matrix is a transformation, what happens when you do two of them, one after another? Scale, then rotate. Reflect, then shear. The answer is the doorway to Chapter 8, but we can already understand it geometrically with the only tool we need — "where do the basis vectors go?" — and that preview will make Chapter 8's algebra feel inevitable rather than arbitrary.

Suppose you first apply a transformation $B$, then apply a transformation $A$ to the result. Start with the basis vectors and just follow them through the pipeline. The vector $\mathbf{e}_1$ first goes to $B\mathbf{e}_1$ (the first column of $B$), and then that vector gets transformed by $A$, landing at $A(B\mathbf{e}_1)$. Likewise $\mathbf{e}_2$ ends at $A(B\mathbf{e}_2)$. But "where the basis vectors finally land" is precisely the recipe for the matrix of the combined transformation — so the combined matrix has columns $A(B\mathbf{e}_1)$ and $A(B\mathbf{e}_2)$. Combining transformations means tracking the basis vectors through both steps and recording the final landing spots as columns. That combined matrix is what Chapter 8 will call the product $AB$, and the chain of reasoning you just followed — apply $B$, then $A$ — is exactly why the product is written with $A$ on the left: it acts last.

Let's see it with a concrete pair. Let $B = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ (scale by 2) and let $A = R(90°) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ (quarter turn). Do $B$ first, then $A$. Track $\mathbf{e}_1$: scaling sends it to $(2,0)$, then the quarter turn sends $(2,0)$ to $(0,2)$. Track $\mathbf{e}_2$: scaling sends it to $(0,2)$, then the quarter turn sends $(0,2)$ to $(-2,0)$. So the combined transformation has columns $(0,2)$ and $(-2,0)$: $$\text{(rotate)}\circ\text{(scale)} \;\longrightarrow\; \begin{bmatrix} 0 & -2 \\ 2 & 0 \end{bmatrix}.$$ That is exactly the "rotate-90°-and-scale-by-2" matrix $\begin{bmatrix}0 & -2 \\ 2 & 0\end{bmatrix}$ that we will read cold in §7.7 — and indeed scaling by 2 then turning is the same as a quarter-turn-with-zoom. We can confirm by tracking the basis vectors in numpy without yet naming matrix multiplication:

# Combine transformations by tracking the basis vectors through both steps.
import numpy as np
B = np.array([[2, 0], [0, 2]])                 # do this first (scale by 2)
A = np.array([[0, -1], [1, 0]])                # then this (rotate 90 deg)
col1 = A @ (B @ np.array([1, 0]))              # where e1 lands after B then A
col2 = A @ (B @ np.array([0, 1]))              # where e2 lands after B then A
print("combined columns:", col1, col2)
combined columns: [0 2] [-2  0]

Geometric Intuition — Composing transformations is composing motions: "scale then rotate" is one combined motion of space, and like every linear motion it has its own matrix, whose columns are — what else — the final destinations of the basis vectors. You don't need new machinery to understand it, only to compute it efficiently, which is Chapter 8's job.

Common Pitfall — Assuming order doesn't matter. It usually does. "Scale-then-rotate" and "rotate-then-scale" happen to agree for a uniform scale (because a uniform scale commutes with everything), but try "shear then rotate" versus "rotate then shear" and you'll get two different matrices — track the basis vectors both ways and see. Chapter 8 states this as the headline fact that matrix multiplication is not commutative: $AB \neq BA$ in general. The order in which you transform space matters, like the order of putting on socks and shoes.
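Tracking the basis vectors both ways takes only a few lines. The sketch below uses the $k = 1$ shear and the quarter turn from this chapter; compose and act are helper names assumed here, and compose deliberately does nothing more than follow $\mathbf{e}_1$ and $\mathbf{e}_2$ through the two steps.

```python
def act(A, v):
    """Image of the 2D vector v under the 2x2 matrix A."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

def compose(second, first):
    """Matrix of 'first, then second', built by tracking e1 and e2."""
    cols = [act(second, act(first, e)) for e in ([1, 0], [0, 1])]
    return [[cols[0][0], cols[1][0]],
            [cols[0][1], cols[1][1]]]

S = [[1, 1], [0, 1]]     # horizontal shear, k = 1
R = [[0, -1], [1, 0]]    # quarter turn counterclockwise
Z = [[2, 0], [0, 2]]     # uniform scale by 2

shear_then_rotate = compose(R, S)
rotate_then_shear = compose(S, R)
print(shear_then_rotate, rotate_then_shear)   # two different matrices

print(compose(R, Z), compose(Z, R))           # the uniform scale does commute
```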

7.7 How do you read a matrix you've never seen before?

You can now go the other direction: given a matrix, describe what it does — the skill of reading a transformation. The recipe is to look at the two columns, because they are $A\mathbf{e}_1$ and $A\mathbf{e}_2$, the destinations of east and north. Three diagnostics get you most of the way:

  1. Look at the columns — they tell you where the basis arrows land, which is the whole transformation. Are they still perpendicular and unit-length (a rotation or reflection)? Stretched along the axes (a scaling)? One unchanged and one tilted (a shear)? One collapsed to zero (a projection)?
  2. Check the determinant (Chapter 11 makes this precise, but the sign is already meaningful): $\det(A) > 0$ preserves orientation, $\det(A) < 0$ flips it, $\det(A) = 0$ flattens space (singular, irreversible). The magnitude is the area-scaling factor.
  3. Run the visualizer — when in doubt, look. One call to visualize_2d(A) turns any mystery matrix into a picture.

Let's read a matrix cold: $$A = \begin{bmatrix} 0 & -2 \\ 2 & 0 \end{bmatrix}.$$ Columns: east $\mathbf{e}_1$ lands at $(0,2)$ and north $\mathbf{e}_2$ lands at $(-2,0)$. Both column-arrows have length $2$ and they are perpendicular (their dot product is $0\cdot(-2) + 2\cdot 0 = 0$). East rotated to point north and grew to length 2; north rotated to point west and grew to length 2. So this is a rotation by $90°$ combined with a uniform scaling by $2$ — a "spiral out." The determinant is $0\cdot 0 - (-2)(2) = 4 > 0$, confirming orientation is preserved and area is scaled by $4$ (which is $2^2$, the square of the linear scale factor — area scales as the square of length).

# Read a mystery matrix: rotate 90° and scale by 2. det should be 4.
import numpy as np
A = np.array([[0, -2], [2, 0]])
print("A e1 =", A @ np.array([1, 0]))   # where east goes
print("A e2 =", A @ np.array([0, 1]))   # where north goes
print("det  =", int(round(np.linalg.det(A))))
A e1 = [0 2]
A e2 = [-2  0]
det  = 4

The output confirms our reading: east goes to $(0,2)$, north to $(-2,0)$, determinant $4$. Reading a matrix is just looking at where it sends the basis vectors — the same skill as building one, run backward.
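The first two diagnostics can be bundled into one small reader. This is a sketch under assumptions: read_matrix is a hypothetical helper (not part of the toolkit), it handles only $2\times 2$ matrices given as lists of rows, and it reports facts rather than naming the transformation.

```python
import math

def read_matrix(A):
    """Report where e1 and e2 land, plus the determinant diagnostics."""
    col1 = (A[0][0], A[1][0])
    col2 = (A[0][1], A[1][1])
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    dot = col1[0] * col2[0] + col1[1] * col2[1]
    if det > 0:
        orientation = "preserved"
    elif det < 0:
        orientation = "reversed"
    else:
        orientation = "flattened"
    return {
        "e1 lands at": col1,
        "e2 lands at": col2,
        "column lengths": (math.hypot(*col1), math.hypot(*col2)),
        "columns perpendicular": abs(dot) < 1e-12,
        "det": det,
        "orientation": orientation,
    }

spiral = read_matrix([[0, -2], [2, 0]])   # the matrix read cold above
shear = read_matrix([[1, 1], [0, 1]])     # the k = 1 shear of 7.5.4
for key in spiral:
    print(f"{key}: {spiral[key]}   |   {shear[key]}")
```

Perpendicular columns of equal length point to a rotation or reflection combined with a uniform scale; the shear fails the perpendicularity test even though its determinant is $1$.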

Check Your Understanding — Describe the transformation $\begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}$ and the transformation $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$ from their columns.

Answer

First matrix: east $\to (3,0)$, north $\to (0,3)$ — both directions stretched by 3 with no tilt or flip. It's a uniform scaling by 3 (a zoom). Determinant $= 9$, area scaled ninefold. Second matrix: east $\to (-1, 0)$ (flipped to point west), north $\to (0,1)$ (unchanged). It's a reflection across the $y$-axis (left-right mirror). Determinant $= -1$: orientation reversed, area preserved.

The columns also tell you something the determinant alone cannot: which directions are special. Look once more at a pure scaling $\begin{bmatrix} 2 & 0 \\ 0 & 3\end{bmatrix}$. East lands on the east axis (just longer) and north lands on the north axis (just longer) — so the $x$- and $y$-axes are directions the matrix stretches without rotating them off their own line. Those un-rotated directions are the seeds of eigenvectors (Chapter 23), the single most revealing thing about a transformation. You are not ready to compute them yet, but you can already spot them in the simplest cases by reading the columns: any basis vector that lands on a scalar multiple of itself is along a special direction. Keep that observation in your pocket; it blossoms into the heart of the book.

Real-World Application — State transitions in economics. In a simple model of a labor market, let the vector $\mathbf{x} = (e, u)$ hold the number of employed and unemployed workers. Each month a fixed fraction of the employed lose their jobs and a fixed fraction of the unemployed find work; the new state is $A\mathbf{x}$ for a fixed $2\times 2$ transition matrix $A$ whose columns encode "where do all the employed go?" and "where do all the unemployed go?" — exactly our where-do-the-basis-vectors-go reading, now applied to populations instead of arrows. Iterating, the state after $n$ months is $A$ applied $n$ times, and the long-run behavior (does unemployment settle to a steady rate?) is governed by the matrix's eigenvalues — the same Chapter 23 machinery, and the same engine behind Google's PageRank in Chapter 29. A matrix transforming a state vector month after month is the discrete-time heartbeat of quantitative economics, ecology, and epidemiology.
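A toy version of this model can be iterated directly. The rates below are hypothetical (5% of the employed lose their jobs each month, 40% of the unemployed find work), as is the starting population; act is an assumed helper name.

```python
def act(A, v):
    """Image of the 2D vector v under the 2x2 matrix A."""
    return [A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1]]

# Column 1: where the employed go (95% stay employed, 5% become unemployed).
# Column 2: where the unemployed go (40% find work, 60% stay unemployed).
A = [[0.95, 0.40],
     [0.05, 0.60]]

state = [900.0, 100.0]          # (employed, unemployed), hypothetical start
for month in range(60):
    state = act(A, state)       # one month of transitions

total = sum(state)
rate = state[1] / total         # long-run unemployment rate
print(state, total, rate)
```

Each column of $A$ sums to $1$, so nobody is created or lost, and the unemployment rate settles where the monthly flows balance: $0.05\,e = 0.40\,u$, a rate of $1/9$.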

Historical Note — The word matrix (Latin for "womb," the thing from which something is generated) was coined by James Joseph Sylvester around 1850, and his friend Arthur Cayley developed the algebra of matrices — including their multiplication as the composition of linear transformations — in his 1858 A Memoir on the Theory of Matrices. Strikingly, the transformation viewpoint we treat as fundamental came first historically: Cayley thought of a matrix as a way of writing down a substitution of variables (a linear transformation), and the rules of matrix arithmetic followed from that. We are, in a sense, simply returning to the original idea.

7.8 Why is "linear" exactly the same as "representable by a matrix"? (The proof)

We've been using the biconditional from §7.2 — a transformation is linear if and only if it equals multiplication by some matrix — and now we prove it. This is the theorem that justifies the entire enterprise of using matrices to study transformations.

Why we care. If matrices captured only some linear transformations, we'd constantly worry whether the transformation in front of us is one of the lucky ones. The theorem removes the worry: matrices capture exactly the linear transformations — no more (every matrix is linear) and no fewer (every linear map is a matrix). It is the bridge that lets us move freely between geometry (transformations) and algebra (matrices) for the rest of the book.

Key idea. One direction is a direct computation; the other is the columns-as-images construction we already discovered, now stated as a proof.

Proof.

Direction 1: every matrix gives a linear transformation. Let $A$ be any $m\times n$ matrix and define $T(\mathbf{x}) = A\mathbf{x}$ using the weighted-sum-of-columns product. Write the columns of $A$ as $\mathbf{a}_1, \dots, \mathbf{a}_n$, so that $A\mathbf{x} = x_1\mathbf{a}_1 + \cdots + x_n\mathbf{a}_n$. We check the two rules.

Additivity. For vectors $\mathbf{x}, \mathbf{y}$ with components $x_j, y_j$, $$A(\mathbf{x} + \mathbf{y}) = \sum_{j} (x_j + y_j)\,\mathbf{a}_j = \sum_j x_j\mathbf{a}_j + \sum_j y_j\mathbf{a}_j = A\mathbf{x} + A\mathbf{y},$$ using only that a sum of scalars distributes over a vector, $(x_j + y_j)\,\mathbf{a}_j = x_j\mathbf{a}_j + y_j\mathbf{a}_j$, and that vector addition can be reordered and regrouped (Chapter 2). Homogeneity. For any scalar $c$, $$A(c\,\mathbf{x}) = \sum_j (c\,x_j)\,\mathbf{a}_j = c\sum_j x_j\mathbf{a}_j = c\,(A\mathbf{x}),$$ since $(c\,x_j)\,\mathbf{a}_j = c\,(x_j\mathbf{a}_j)$ and a scalar distributes over a sum of vectors. Both rules hold, so $T(\mathbf{x}) = A\mathbf{x}$ is linear. $\checkmark$

Direction 2: every linear transformation is a matrix. Let $T:\mathbb{R}^n \to \mathbb{R}^m$ be linear. Define $A$ to be the matrix whose $j$-th column is $T(\mathbf{e}_j)$, the image of the $j$-th standard basis vector. We claim $T(\mathbf{x}) = A\mathbf{x}$ for every $\mathbf{x}$. Write $\mathbf{x} = x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n$ (every vector is this unique combination of basis vectors, Chapter 6). Apply $T$ and use linearity repeatedly: $$T(\mathbf{x}) = T\!\left(\sum_j x_j \mathbf{e}_j\right) = \sum_j x_j\, T(\mathbf{e}_j) = \sum_j x_j\,\mathbf{a}_j = A\mathbf{x},$$ where the last equality is the definition of the matrix-vector product as the weighted sum of columns $\mathbf{a}_j = T(\mathbf{e}_j)$. So $T$ agrees with multiplication by $A$ on every vector. $\blacksquare$

What this means. The two directions together say linear transformations and matrices are the same objects. Direction 2 also delivers a bonus we'll lean on forever: the matrix is built by feeding the basis vectors through $T$ and recording the outputs as columns — the exact workflow of §7.5. And uniqueness comes free: since the columns must equal $T(\mathbf{e}_j)$, there is only one matrix that represents $T$ in the standard basis. (In a different basis the representing matrix changes — that's Chapter 16 — but for a fixed basis it's pinned down.)
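Direction 2 is itself an algorithm: feed each standard basis vector through $T$ and store the outputs as columns. The sketch below assumes $T$ is given as a Python function on lists; matrix_of, times, and the two example maps are hypothetical names, and times is a row-by-row checking shortcut, deliberately not the column-weighted apply that §7.10 asks you to write.

```python
def matrix_of(T, n):
    """Matrix (list of rows) whose j-th column is T(e_j), for T defined on R^n."""
    columns = [T([1 if i == j else 0 for i in range(n)]) for j in range(n)]
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def times(A, x):
    """A times x, computed row by row (used here only to check results)."""
    return [sum(a * x_j for a, x_j in zip(row, x)) for row in A]

def T(x):
    """A made-up linear map from R^3 to R^2."""
    return [x[0] + 2 * x[1], 3 * x[2] - x[1]]

def shift(x):
    """A made-up map that is NOT linear: it moves the origin."""
    return [x[0] + 1, x[1]]

A = matrix_of(T, 3)
x = [4, -1, 2]
print(A, T(x), times(A, x))            # the matrix reproduces T

B = matrix_of(shift, 2)
print(shift([0, 0]), times(B, [0, 0])) # the matrix fails to reproduce shift
```

For the linear map the matrix agrees with $T$; for the shifted map the same construction produces a matrix that disagrees at the origin, which is exactly where the proof's use of linearity would break.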

Math-Major Sidebar (optional) — The construction in Direction 2 secretly establishes an isomorphism between the space of linear maps $\mathcal{L}(\mathbb{R}^n, \mathbb{R}^m)$ and the space of $m\times n$ matrices $\mathbb{R}^{m\times n}$: the assignment $T \mapsto A$ is itself linear (the matrix of $S + T$ is the sum of their matrices; the matrix of $cT$ is $c$ times the matrix) and bijective. So studying matrices is studying linear maps, not merely a tool for them. We will see in Chapter 8 that this correspondence even respects multiplication — the matrix of a composition $S \circ T$ is the product of the matrices — which is the real reason matrix multiplication is defined the way it is.

7.9 What does the transpose do, and why do we need it?

We close the conceptual development with one more operation on a matrix, the transpose, which we'll use immediately in the toolkit and give deeper geometric meaning to in later parts. The transpose of a matrix $A$, written $A^{\mathsf{T}}$ (always with the sans-serif $\mathsf{T}$, never $A^T$), is the matrix you get by flipping $A$ across its main diagonal: rows become columns and columns become rows. Formally, the $(i,j)$ entry of $A^{\mathsf{T}}$ is the $(j,i)$ entry of $A$: $$\big(A^{\mathsf{T}}\big)_{ij} = a_{ji}.$$ For example, $$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \;\Longrightarrow\; A^{\mathsf{T}} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}.$$ A $2\times 3$ matrix becomes a $3\times 2$ matrix; the first row of $A$, $(1,2,3)$, becomes the first column of $A^{\mathsf{T}}$. The diagonal entries ($1$ and $5$ here) stay fixed — they're on the fold line.

# Transpose flips rows and columns; numpy spells it A.T.
import numpy as np
A = np.array([[1, 2, 3],
              [4, 5, 6]])
print(A.T)
print("shape:", A.shape, "->", A.T.shape)
[[1 4]
 [2 5]
 [3 6]]
shape: (2, 3) -> (3, 2)

Common Pitfall — Thinking the transpose is the same transformation. It is usually a different one. For the shear $A = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}$ (slides north rightward), the transpose $A^{\mathsf{T}} = \begin{bmatrix}1 & 0 \\ 1 & 1\end{bmatrix}$ is a vertical shear (slides east upward) — related, but not equal. Transpose is not "undo"; the inverse (Chapter 9) undoes. The transpose's true geometric meaning involves how the map interacts with the dot product, and emerges fully in Part IV; here, treat it as the row/column flip and a building block. There is one neat family where transpose does coincide with something special — symmetric matrices, where $A^{\mathsf{T}} = A$ — which become the stars of the Spectral Theorem (Chapter 27).

Why introduce it now, three parts before its geometry? Two reasons. First, it's a one-line operation that completes our basic matrix vocabulary alongside the matrix-vector product, and it's the natural second function for this chapter's toolkit module. Second, it shows up constantly in code: data matrices get transposed to switch between "rows are samples" and "columns are samples," and the expression $A^{\mathsf{T}}A$ (the transpose of a matrix times the matrix itself) is the engine of least squares (Chapter 19), covariance (Chapter 32), and the SVD (Chapter 30). Getting comfortable flipping a matrix now pays dividends throughout the book.

Real-World Application: Data layout in machine learning (data science). A dataset is usually stored as a matrix $X$ with one row per sample and one column per feature, but many linear-algebra formulas (covariance, Gram matrices, the normal equations) are written assuming one column per sample. The bridge between the two conventions is the transpose: $X^{\mathsf{T}}$ swaps the roles. When you read a derivation that suddenly writes $X^{\mathsf{T}}X$, it is silently assuming a layout, and the transpose is reconciling it with how the data actually sits in memory. The same layout bookkeeping appears in the linear layers of neural networks, where weight matrices transform activation vectors exactly as the matrices in this chapter transform the unit square, millions of times per forward pass.
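A small sketch of the two layouts, using a made-up dataset of four samples and two features. It only shows that the same sample is a row in one layout and a column in the other; the variable names are hypothetical.

```python
# Four samples, two features, stored one row per sample (made-up numbers).
X = [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 30.0],
     [4.0, 40.0]]

Xt = [list(col) for col in zip(*X)]         # one column per sample, one row per feature

print(len(X), "x", len(X[0]))               # 4 x 2
print(len(Xt), "x", len(Xt[0]))             # 2 x 4

sample_2_as_row = X[2]                      # row 2 of X
sample_2_as_col = [row[2] for row in Xt]    # column 2 of X^T
print(sample_2_as_row == sample_2_as_col)   # True: same sample, two layouts

feature_1 = Xt[1]                           # row 1 of X^T is column 1 of X
print(feature_1)                            # [10.0, 20.0, 30.0, 40.0]
```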

7.10 Build Your Toolkit

Time to make this chapter's ideas executable. The recurring toolkit/ is the from-scratch heart of this book: each module implements an operation in pure Python (no numpy in the implementation), using numpy only to check the result. This chapter contributes the foundation of toolkit/matrices.py — the matrix-vector product, written the way this chapter taught it (as a weighted sum of columns), and the transpose.

Build Your Toolkit: Implement two functions in toolkit/matrices.py, in pure Python:

  • apply(A, v): the matrix-vector product $A\mathbf{v}$, computed as the weighted sum of the columns of $A$ (not a row-times-column loop; write it so the code mirrors $\sum_j v_j\,(\text{column }j)$). Represent a matrix as a list of rows, e.g. A = [[2, 1], [0, 3]], and a vector as a list, v = [4, 5].
  • transpose(A): return a new matrix whose rows are the columns of A.

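Then verify against numpy: for several random A and v, confirm apply(A, v) equals (np.array(A) @ np.array(v)).tolist() and transpose(A) equals np.array(A).T.tolist(). (With non-integer entries, compare within a small tolerance rather than with exact equality, since the two computations may round differently.) A reference implementation to compare against once you have written your own: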

```python
# toolkit/matrices.py: matrix-vector product and transpose, from scratch.

def apply(A, v):
    """A @ v as a weighted sum of the columns of A. A: list of rows; v: list."""
    m = len(A)                    # number of rows (output dimension)
    n = len(A[0])                 # number of columns (input dimension) = len(v)
    result = [0.0] * m
    for j in range(n):            # for each column j ...
        col_weight = v[j]         # ... weighted by v[j] ...
        for i in range(m):        # ... add v[j] * (column j) to the result
            result[i] += col_weight * A[i][j]
    return result

def transpose(A):
    """Flip rows and columns: (A^T)[j][i] = A[i][j]."""
    m, n = len(A), len(A[0])
    return [[A[i][j] for i in range(m)] for j in range(n)]
```

Notice the loop order in apply: the outer loop is over columns $j$, and for each we add v[j] times that whole column into the accumulator. That is the weighted-sum-of-columns formula made literal: exactly the geometry of this chapter, not a memorized rule. (Matrix-matrix multiplication, matmul, arrives in Chapter 8 as composition and joins this same module.)
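As a hand trace of that loop order, take the example values used in this section's verification ($A$ with rows $(2,1)$ and $(0,3)$, and $\mathbf{v} = (4,5)$). The accumulator picks up one scaled column per pass of the outer loop:

$$A\mathbf{v} = \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} 4 \\ 5 \end{bmatrix} = 4\begin{bmatrix} 2 \\ 0 \end{bmatrix} + 5\begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 8 \\ 0 \end{bmatrix} + \begin{bmatrix} 5 \\ 15 \end{bmatrix} = \begin{bmatrix} 13 \\ 15 \end{bmatrix}.$$

After the first pass ($j = 0$) the accumulator holds $(8, 0)$; after the second ($j = 1$) it holds $(13, 15)$.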

A quick verification you can run:

# Verify the from-scratch matrices.py against numpy.
import numpy as np
from toolkit.matrices import apply, transpose
A = [[2, 1], [0, 3]]
v = [4, 5]
print(apply(A, v), "vs", (np.array(A) @ np.array(v)).tolist())
print(transpose(A), "vs", np.array(A).T.tolist())
[13.0, 15.0] vs [13, 15]
[[2, 0], [1, 3]] vs [[2, 0], [1, 3]]

The from-scratch results match numpy (up to the integer-versus-float distinction, which is harmless). You now own a working matrix-vector multiplier built on understanding, not magic.
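One fixed example is a thin test, so the check can be extended to many random cases. The sketch below makes two substitutions so that it runs with the standard library alone: it carries local copies of apply and transpose instead of importing toolkit.matrices, and it cross-checks against an independent row-by-row computation (the hypothetical helper row_view) rather than numpy. Integer entries keep the float arithmetic exact, so plain equality is safe here.

```python
import random

def apply(A, v):
    """A @ v as a weighted sum of the columns of A (local copy for this sketch)."""
    result = [0.0] * len(A)
    for j, weight in enumerate(v):
        for i in range(len(A)):
            result[i] += weight * A[i][j]
    return result

def transpose(A):
    """Rows of the result are the columns of A (local copy for this sketch)."""
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def row_view(A, v):
    """Independent cross-check: each output entry is one row of A dotted with v."""
    return [float(sum(a * x for a, x in zip(row, v))) for row in A]

rng = random.Random(0)                       # fixed seed so the run is repeatable
checked = 0
for _ in range(100):
    m, n = rng.randint(1, 5), rng.randint(1, 5)
    A = [[rng.randint(-9, 9) for _ in range(n)] for _ in range(m)]
    u = [rng.randint(-9, 9) for _ in range(n)]
    v = [rng.randint(-9, 9) for _ in range(n)]

    assert apply(A, v) == row_view(A, v)     # column view and row view agree
    assert transpose(transpose(A)) == A      # flipping twice changes nothing
    # Additivity, one half of the chapter's two-rule definition of linearity.
    u_plus_v = [a + b for a, b in zip(u, v)]
    assert apply(A, u_plus_v) == [a + b for a, b in zip(apply(A, u), apply(A, v))]
    checked += 1

print("checked", checked, "random cases")    # checked 100 random cases
```

Swapping row_view for (np.array(A) @ np.array(v)).tolist() turns this back into the numpy check the exercise asks for.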

Computational Note — Real numpy does not loop in Python; A @ v dispatches to optimized, compiled BLAS routines that process the arithmetic in bulk and exploit your CPU's vector units. Our pure-Python apply is for understanding, not speed — it would be hundreds of times slower on large matrices. The lesson of the toolkit is always the same: implement once from scratch to know what the operation is, then use numpy in production to do it fast. Theory and computation, each in its place.

7.11 What did we just learn, and where does it go?

Step back and see what changed. You walked in (perhaps) thinking of a matrix as a grid of numbers with a multiplication rule. You walk out seeing a matrix as a linear transformation of space, with every entry present for a reason: the columns are the destinations of the basis vectors. That single reframing did an enormous amount of work in one chapter.

  • A matrix is a linear map (§7.2, §7.8), and "linear" means exactly "representable by a matrix": we proved both directions of that equivalence.
  • The columns are the images of the basis vectors (§7.3): to build a matrix, ask where do $\mathbf{e}_1$ and $\mathbf{e}_2$ go? and write the answers as columns.
  • Matrix times vector is the weighted sum of the columns (§7.4), derived from superposition — not a row-times-column rule, which we deliberately deferred to Chapter 8's composition story.
  • We built the identity, scaling, rotation, shear, reflection, and projection matrices (§7.5) the same way every time, and derived the rotation matrix $\begin{bmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{bmatrix}$ from the unit circle.
  • We learned to read a matrix back into a transformation (§7.7) using its columns and the sign of its determinant, and met the transpose (§7.9) as the row/column flip.

The recurring themes are all here. Linear algebra is the study of linear transformations — matrices merely record them (theme 1). Geometry and algebra are two views of one object — the unit square's deformation and the column entries are the same fact (theme 2). And computation validates theory — every numeric result matched numpy, and your toolkit now contains a from-scratch apply and transpose (theme 3).

Three big questions open immediately, and they organize the rest of Part II. What if you do two transformations in a row? That's composition, and it forces the definition of matrix-matrix multiplication: Chapter 8, where the deferred row-times-column rule finally earns its place and we discover that order matters (in general, $AB \neq BA$). Can you undo a transformation? Only if it loses no information, that is, only if it never flattens space, which is the same as having a nonzero determinant; the map that does the undoing is the inverse matrix, Chapter 9. And how much does a transformation stretch area? That single number is the determinant, Chapter 11, and you've already seen the visualizer whisper it in every figure title: $1$ for rotations and shears, $-1$ for the reflection, $0$ for the projection that flattened the square. Watch for it.
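Those figure-title values can be reproduced ahead of Chapter 11 with the standard $2\times 2$ formula $ad - bc$, used here as a preview rather than something this chapter has derived. The particular reflection and projection below (both relative to the x-axis) are assumed choices; the chapter's own examples may use different axes.

```python
def det2(M):
    """Determinant of a 2x2 matrix [[a, b], [c, d]], computed as a*d - b*c."""
    (a, b), (c, d) = M
    return a * d - b * c

examples = {
    "quarter-turn rotation": [[0, -1], [1, 0]],   # det  1
    "horizontal shear":      [[1, 1], [0, 1]],    # det  1
    "reflection in x-axis":  [[1, 0], [0, -1]],   # det -1 (assumed choice of reflection)
    "projection to x-axis":  [[1, 0], [0, 0]],    # det  0 (assumed choice of projection)
}

for name, M in examples.items():
    print(name, "-> det =", det2(M))
```

Only the projection, with determinant zero, flattens the square and so cannot be undone.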

FAQ: If a matrix is "just" a transformation, why bother with all the numbers?

Because the numbers are how we compute. The geometric picture — "this rotates by 30°" — tells you what is happening, but to actually transform ten thousand points, or to compose two transformations, or to ask whether a transformation is reversible, you need the numerical handle the matrix provides. The genius of linear algebra is keeping both views alive at once: the picture for understanding and intuition, the numbers for calculation and proof. A great linear algebraist looks at $\begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix}$ and sees a quarter-turn; looks at a quarter-turn and writes down $\begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix}$. That fluency — moving freely between the geometry and the algebra — is the whole game, and you just played your first full round.
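Both directions of that fluency can be sketched in a few lines: build the rotation matrix from the angle, confirm that a quarter-turn gives the matrix above, then use the numbers to move points. The function names rotation and transform are illustrative, and rounding is used only to hide the tiny floating-point residue in $\cos(\pi/2)$.

```python
import math

def rotation(theta):
    """2x2 rotation matrix; its columns are the images of e1 and e2."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def transform(A, points):
    """Send each point (x, y) to x * (column 1 of A) + y * (column 2 of A)."""
    return [[x * A[0][0] + y * A[0][1],
             x * A[1][0] + y * A[1][1]] for x, y in points]

# Geometry -> algebra: a quarter-turn, rounded to clean integers.
Q = [[round(entry) for entry in row] for row in rotation(math.pi / 2)]
print(Q)                                   # [[0, -1], [1, 0]]

# Algebra -> geometry: the corners of the unit square after the quarter-turn.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(transform(Q, square))                # [[0, 0], [0, 1], [-1, 1], [-1, 0]]
```

The same transform call works unchanged on a list of ten thousand points, which is the practical payoff of having the numbers.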