> Learning paths. Math majors — read everything, especially the subspace proof in §13.4 and the Math-Major Sidebar on kernel and image in §13.9; these abstract names ("kernel," "image") are the ones you will meet again in Chapter 35. CS / Data...
Prerequisites
- chapter-06-subspaces-span-independence
- chapter-04-gaussian-elimination
Learning Objectives
- Define the column space C(A) as the span of the columns and explain why Ax = b is solvable exactly when b lies in C(A).
- Define the null space N(A) as the set of all solutions of Ax = 0, and interpret it as the directions the transformation collapses to zero.
- Find a basis for the column space (the pivot columns of the original matrix) and a basis for the null space (the special solutions) directly from RREF.
- Prove that C(A) and N(A) are subspaces by checking closure under linear combination.
- Describe the complete solution set of Ax = b as one particular solution plus the entire null space, and connect this to the four-fundamental-subspaces picture.
- Implement null_space_basis(A) and column_space_basis(A) from scratch and verify their dimensions against numpy and scipy.
In This Chapter
- 13.1 What two questions does every matrix secretly answer?
- 13.2 What is the column space, and why is it everything Ax can reach?
- 13.3 What is the null space, and what does the matrix collapse?
- 13.4 Why are the column space and null space actually subspaces? (a proof)
- 13.5 How do you find a basis for the column space from RREF?
- 13.6 How do you find a basis for the null space — the special solutions?
- 13.7 A full worked example: both bases from one RREF
- 13.8 What does the null space have to do with solving Ax = b?
- 13.9 Why does this give us the four-fundamental-subspaces picture?
- 13.10 What does it cost to compute these bases, and how do we test it?
- 13.11 What should you carry forward from this chapter?
Column Space and Null Space: What Ax = b Can and Cannot Reach
Learning paths. Math majors — read everything, especially the subspace proof in §13.4 and the Math-Major Sidebar on kernel and image in §13.9; these abstract names ("kernel," "image") are the ones you will meet again in Chapter 35. CS / Data Science — focus on the Geometric Intuition callouts, the RREF recipes in §13.5–13.7, the
numpyverifications, and the two case studies; the reachability picture is the one that pays off in control, optimization, and ML. Physics / Engineering — focus on the geometry of "what a matrix reaches and what it destroys," the conservation-law reading of the null space (Case Study 1), and the reachable-states reading of the column space (Case Study 2). This chapter assumes the subspace/span/independence ideas of Chapter 6 and the row reduction of Chapter 4.
13.1 What two questions does every matrix secretly answer?
Take any matrix $A$ and feed it vectors. As an input $\mathbf{x}$ ranges over all of $\mathbb{R}^n$, the output $A\mathbf{x}$ traces out some region of the output space — and that region is almost never all of it. A matrix that flattens 3D space onto a tilted plane can only ever produce vectors on that plane; ask it for a vector pointing off the plane and it simply cannot deliver. So the first question any matrix forces on us is: which outputs can this transformation actually reach?
There is a second, quieter question hiding behind the first. When you flatten 3D space onto a plane, whole lines of input vectors get crushed down to a single output point — they collapse to nothing, indistinguishable after the transformation. Those crushed directions are exactly the information the matrix throws away. So the second question is: which inputs does this transformation destroy — send all the way to zero?
These two questions are not idle curiosities. They are, in disguise, the only two things you ever need to know about the central equation of the subject, $A\mathbf{x} = \mathbf{b}$. Can it be solved at all? — that is the reachability question. And if it can, how many solutions are there? — that turns out to be governed entirely by the destroyed directions. This chapter gives both questions exact answers, in the form of two subspaces attached to every matrix: the column space $C(A)$ (everything reachable) and the null space $N(A)$ (everything collapsed). They are the first two of the four fundamental subspaces that organize all of linear algebra, and by the end of the chapter you will read both of them straight off a matrix's row-reduced form.
Geometric Intuition — Think of a matrix $A$ as a machine that takes the whole input space $\mathbb{R}^n$ and folds it into the output space $\mathbb{R}^m$. Two things happen at once. The image of the fold — the set of places the machine can land — is the column space $C(A)$, a flat through the origin sitting inside $\mathbb{R}^m$. And the directions that get folded flat against zero — the inputs that come out as $\mathbf{0}$ — form the null space $N(A)$, a flat through the origin sitting inside $\mathbb{R}^n$. One lives in the output space and describes reach; the other lives in the input space and describes loss. Every matrix has both.
We met the raw materials for both ideas already. In Chapter 6 you learned that the span of a set of vectors is everything reachable by scaling and adding them, and that the solution set of a homogeneous system $A\mathbf{x} = \mathbf{0}$ is a subspace through the origin. The column space is just the span of the columns, given a name; the null space is just that homogeneous solution set, given a name. What is genuinely new here is the partnership between them, and the fact that both fall out of the single row reduction you already know how to do.
Why give these two old sets new names and a whole chapter? Because naming them turns a procedure into a structure. Up to now, "solve $A\mathbf{x} = \mathbf{b}$" has been a verb — something you do by row-reducing and reading off an answer. From here on it becomes a geometric situation you can reason about before computing anything: $\mathbf{b}$ either sits inside a certain flat or it does not, and the answer is either a single point or a whole flat of points, whose dimension you can predict from the shape of $A$. That shift — from grinding out answers to seeing the shape of the answer set in advance — is what separates someone who can use linear algebra from someone who has merely memorized an algorithm. The four fundamental subspaces are the vocabulary for that seeing, and column space and null space are the first two words.
There is also a practical reason the pair matters so much. In real applications you rarely solve $A\mathbf{x} = \mathbf{b}$ once with nice numbers; you solve it thousands of times with messy data, and the qualitative questions dominate. A control engineer wants to know which states of a spacecraft are reachable at all before tuning any thruster — a column-space question. A statistician wants to know whether two predictors are secretly redundant before trusting a coefficient — a null-space question. A graphics programmer wants to know whether a projection has thrown away depth information it can never recover — again the null space. None of these is answered by a single numerical solution; all of them are answered by the subspaces. This chapter is where linear algebra stops being a calculator and starts being a lens.
Let's begin, as always, with the picture — first the column space, then the null space, then how they meet.
13.2 What is the column space, and why is it everything Ax can reach?
Here is the most important re-reading of matrix–vector multiplication in the whole book, the one from Chapter 3 that we now make load-bearing. When you multiply $A\mathbf{x}$, you are not doing rows-times-columns bookkeeping; you are building a linear combination of the columns of $A$, with the entries of $\mathbf{x}$ as the weights. Write $A$ by its columns $A = [\,\mathbf{a}_1 \mid \mathbf{a}_2 \mid \dots \mid \mathbf{a}_n\,]$ and the product unfolds as $$A\mathbf{x} = x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \dots + x_n\mathbf{a}_n.$$ The vector $\mathbf{x}$ is a recipe: "take $x_1$ of the first column, $x_2$ of the second, and so on, and add." Every output $A\mathbf{x}$ is one such combination of the columns.
Now let $\mathbf{x}$ range over all of $\mathbb{R}^n$. The weights $x_1, \dots, x_n$ become every possible choice of scalars, so the set of all outputs $A\mathbf{x}$ is precisely the set of all linear combinations of the columns — which is exactly what Chapter 6 named the span. That gives the definition.
The column space of an $m \times n$ matrix $A$, written $C(A)$, is the span of its columns: $$C(A) = \operatorname{span}\{\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n\} = \{\, A\mathbf{x} : \mathbf{x} \in \mathbb{R}^n \,\}.$$ It is a subspace of the output space $\mathbb{R}^m$ (because each column is a vector in $\mathbb{R}^m$).
The two descriptions in that definition are the same set seen two ways, and the equality between them is the chapter's first big idea. From the left, $C(A)$ is "the span of the columns" — a static geometric object you can picture. From the right, $C(A)$ is "everything $A$ can output" — the range of the transformation $\mathbf{x} \mapsto A\mathbf{x}$. Reading them as equal tells you something immediately useful about $A\mathbf{x} = \mathbf{b}$.
The Key Insight — $A\mathbf{x} = \mathbf{b}$ has a solution if and only if $\mathbf{b}$ lies in the column space $C(A)$. Why? Because "$\mathbf{x}$ solves $A\mathbf{x} = \mathbf{b}$" means "$\mathbf{b}$ is the combination of columns with weights $\mathbf{x}$," which means "$\mathbf{b}$ is reachable as a combination of the columns," which is the literal definition of $\mathbf{b} \in C(A)$. Solvability is not a computation — it is a membership question about a subspace.
Notice how this reframes everything Chapter 4 taught you. Back there, "consistent" was a verdict you got after row-reducing $[A \mid \mathbf{b}]$ and seeing whether a contradiction appeared. Now you have the geometric reason behind the verdict: the system is consistent exactly when the target $\mathbf{b}$ sits inside the flat that the columns span. Row reduction is how you check membership; the column space is what you are checking membership in.
This is the column picture of $A\mathbf{x} = \mathbf{b}$ that you first met in Chapter 3, now promoted from a viewpoint to a named subspace. Recall the two pictures from that chapter. The row picture reads each equation as a hyperplane and asks where the hyperplanes intersect — a fine picture for two or three equations but hard to see in high dimensions. The column picture asks instead: can I mix the column vectors, in some amounts $\mathbf{x}$, to land exactly on $\mathbf{b}$? The column picture is the one that scales, the one that generalizes, and the one this chapter formalizes: the set of all landing spots is $C(A)$, and the question "is $\mathbf{b}$ reachable?" is the question "is $\mathbf{b}$ in $C(A)$?" Everything you found awkward about visualizing systems in high dimensions dissolves once you stop tracking intersecting hyperplanes and start asking whether one target lies in the span of a handful of column vectors.
A worked membership check makes the column picture concrete. Suppose $$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix},$$ a $3 \times 2$ matrix. Its two columns $(1,0,1)$ and $(1,1,0)$ are not parallel, so $C(A)$ is a plane through the origin in $\mathbb{R}^3$ — and since a plane is only two-dimensional, most targets in $\mathbb{R}^3$ are off it and hence unreachable. Is $\mathbf{b} = (2, 1, 1)$ reachable? Ask whether $x_1(1,0,1) + x_2(1,1,0) = (2,1,1)$ has a solution: the middle entry forces $x_2 = 1$, the bottom forces $x_1 = 1$, and the top then needs $x_1 + x_2 = 2$ — which $1 + 1 = 2$ satisfies. So $\mathbf{b} = (2,1,1) \in C(A)$, reachable with recipe $\mathbf{x} = (1,1)$. Now try $\mathbf{b}' = (1, 1, 1)$: the middle forces $x_2 = 1$, the bottom forces $x_1 = 1$, but the top then demands $x_1 + x_2 = 1$, i.e. $2 = 1$ — impossible. So $(1,1,1)$ lies off the plane and $A\mathbf{x} = (1,1,1)$ has no solution. The two columns reach a whole plane's worth of targets but never $(1,1,1)$; the column picture lets you see exactly which targets are in and which are out.
13.2.1 Why is the column space a line, a plane, or all of space?
Because it is a span, the column space is always one of the flat-through-the-origin shapes you cataloged in Chapter 6. Its shape depends on how many of the columns point in genuinely independent directions. Consider the small matrix $$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}.$$ All three columns — $(1,2)$, $(2,4)$, $(3,6)$ — are multiples of $(1,2)$. Their span is a single line through the origin in $\mathbb{R}^2$. So no matter what $\mathbf{x}$ you choose, $A\mathbf{x}$ always lands on that one line; a target like $\mathbf{b} = (1, 0)$, which is off the line, is unreachable, and $A\mathbf{x} = (1,0)$ has no solution. The matrix has three columns but its outputs are one-dimensional.
# The column space of a rank-1 matrix is a line through the origin in R^2.
import numpy as np
A = np.array([[1, 2, 3],
[2, 4, 6]], dtype=float)
print("rank(A) =", np.linalg.matrix_rank(A)) # 1 -> C(A) is a line
# Every output A x is a multiple of (1,2): check a few random x.
for _ in range(3):
x = np.random.randn(3)
out = A @ x
print("A x =", np.round(out, 3), " ratio out[1]/out[0] =", round(out[1] / out[0], 3))
# all ratios are 2.0 -> every output lies on the line y = 2x
The ratios all print as 2.0: every output obeys $y = 2x$, confirming the column space is the line $\{c\,(1,2)\}$. Contrast this with a $3 \times 3$ matrix whose three columns point in three independent directions — say one with $\det \neq 0$, like the matrix with columns $(2,1,0), (0,1,1), (1,0,3)$, which has determinant $7$ (Chapter 11). Its three columns span all of $\mathbb{R}^3$, so $C(A) = \mathbb{R}^3$ and every $\mathbf{b}$ is reachable — the system $A\mathbf{x} = \mathbf{b}$ is solvable for all $\mathbf{b}$. That is exactly the invertible case from Chapter 9, now restated in subspace language: $A$ (square) is invertible if and only if its column space is the whole output space.
Common Pitfall — "The column space lives in $\mathbb{R}^n$ because there are $n$ columns." No. For an $m \times n$ matrix, each column is a vector with $m$ entries, so the columns — and their span — live in the output space $\mathbb{R}^m$, not the input space $\mathbb{R}^n$. The $3 \times 4$ matrix later in this chapter has four columns but its column space sits in $\mathbb{R}^3$. Keep the spaces straight: $C(A) \subseteq \mathbb{R}^m$ (outputs), and as we are about to see, $N(A) \subseteq \mathbb{R}^n$ (inputs). Mixing them up is the single most common error in this material.
Check Your Understanding — A matrix $A$ is $5 \times 3$. In which space does its column space $C(A)$ live, and what is the largest its dimension could possibly be?
Answer
$C(A)$ lives in $\mathbb{R}^5$ (the output space — each column has 5 entries). Its dimension is the number of independent columns, which can be at most 3, since there are only 3 columns to span with. So $C(A)$ is at most a 3-dimensional flat sitting inside $\mathbb{R}^5$ — it can never fill $\mathbb{R}^5$. With more rows than columns ($m > n$), a "tall" matrix can never be onto: most right-hand sides $\mathbf{b}$ are unreachable, which is precisely why such systems usually have no exact solution and we resort to least squares in Chapter 17.
13.2.2 How do you actually test whether b is in the column space?
The Key Insight gives a beautiful criterion — $A\mathbf{x} = \mathbf{b}$ is solvable iff $\mathbf{b} \in C(A)$ — but "is $\mathbf{b}$ in the span of these columns?" is itself a system to solve. Happily it is one you already know how to attack from Chapter 4: row-reduce the augmented matrix $[A \mid \mathbf{b}]$. If the reduction ever produces a row of the form $[\,0 \ 0 \ \cdots \ 0 \mid c\,]$ with $c \neq 0$ — a row asserting "$0 = c$," a flat contradiction — then $\mathbf{b}$ is not in $C(A)$ and the system is inconsistent. If no such row appears, $\mathbf{b}$ is reachable. The contradiction row is the algebraic fingerprint of a $\mathbf{b}$ that lies off the column-space flat.
Let's watch it fail on purpose. Reuse the rank-1 matrix from above and aim at a target off its line: $$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}.$$ Row-reducing $[A \mid \mathbf{b}]$, the second row becomes $(\text{row 2}) - 2(\text{row 1})$, which zeros out the left side but leaves $0 - 2\cdot 1 = -2$ on the right: the row $[\,0 \ 0 \ 0 \mid -2\,]$, i.e. "$0 = -2$." Contradiction. The target $\mathbf{b} = (1, 0)$ is off the line $\{c\,(1,2)\}$ — its components are not in the ratio $1 : 2$ — so no combination of the columns can ever produce it. Compare $\mathbf{b}' = (3, 6)$, which is on the line: there the same reduction leaves $0 = 0$, and the system is consistent (indeed it has infinitely many solutions, since the columns are dependent).
# Testing column-space membership by reducing the augmented matrix [A | b].
import numpy as np
A = np.array([[1, 2, 3], [2, 4, 6]], dtype=float)
for b in (np.array([1., 0.]), np.array([3., 6.])):
aug = np.column_stack([A, b])
rank_A = np.linalg.matrix_rank(A)
rank_aug = np.linalg.matrix_rank(aug)
print(f"b = {b}: rank(A) = {rank_A}, rank[A|b] = {rank_aug} ->",
"solvable" if rank_A == rank_aug else "NO solution")
# b=(1,0): rank 1 vs 2 -> NO solution | b=(3,6): rank 1 vs 1 -> solvable
There is the clean numerical test hiding behind the contradiction row: $\mathbf{b} \in C(A)$ if and only if $\operatorname{rank}([A \mid \mathbf{b}]) = \operatorname{rank}(A)$ — that is, appending $\mathbf{b}$ adds no new independent direction. If $\mathbf{b}$ were outside $C(A)$, it would contribute a fresh direction and bump the rank by one (exactly what happens for $(1,0)$: rank jumps $1 \to 2$). This rank test is the Rouché–Capelli criterion for consistency [verify], and it is just our subspace insight wearing arithmetic clothing: adding a vector to a spanning set raises the dimension precisely when that vector was not already reachable.
13.3 What is the null space, and what does the matrix collapse?
Now the second subspace, the one that explains how many solutions a system has. Some inputs $\mathbf{x}$ are sent by $A$ all the way to the zero vector: $A\mathbf{x} = \mathbf{0}$. Collect all of them.
The null space of an $m \times n$ matrix $A$, written $N(A)$, is the set of all solutions of the homogeneous equation: $$N(A) = \{\, \mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = \mathbf{0} \,\}.$$ It is a subspace of the input space $\mathbb{R}^n$. (Also called the kernel of the transformation, written $\ker A$ — a name we will lean on in Chapter 35.)
So the natural question — what is the null space of a matrix? — has a one-line answer with two readings. Algebraically, it is the complete solution set of $A\mathbf{x} = \mathbf{0}$. Geometrically, it is the set of all input directions that the transformation crushes to nothing. If $A$ flattens 3D space onto a plane, the null space is the line of vectors perpendicular to that plane's preimage — the directions that get squashed flat. The bigger the null space, the more information $A$ destroys.
Geometric Intuition — Picture the projection that drops every point in 3D straight down onto the floor (the $xy$-plane): $(x, y, z) \mapsto (x, y, 0)$. Two points directly above each other, $(1, 2, 5)$ and $(1, 2, -3)$, land on the same shadow $(1, 2, 0)$ — the transformation cannot tell them apart anymore. What separated them was their $z$-coordinate, and that is exactly what got destroyed. The null space here is the entire $z$-axis: every vector $(0, 0, t)$ projects to $\mathbf{0}$. The null space is the set of differences the matrix can no longer see.
Every null space contains at least the zero vector, since $A\mathbf{0} = \mathbf{0}$ always. The real question is whether it contains anything else. If $N(A) = \{\mathbf{0}\}$ — only the trivial solution — then $A$ destroys nothing; distinct inputs always give distinct outputs, and the transformation is one-to-one. If $N(A)$ is bigger — a line, a plane — then $A$ collapses those directions, distinct inputs can collide, and the transformation is not one-to-one. This connects straight back to Chapter 6: $N(A) = \{\mathbf{0}\}$ if and only if the columns of $A$ are linearly independent, because a nonzero $\mathbf{x}$ with $A\mathbf{x} = \mathbf{0}$ is precisely a nontrivial dependence relation among the columns.
The Key Insight — The null space measures how far from one-to-one the matrix is. $N(A) = \{\mathbf{0}\}$ means no information is lost (independent columns, injective map); a nontrivial null space means the matrix glues together whole families of inputs. The dimension of $N(A)$ is literally the number of independent directions the transformation destroys — a quantity Chapter 14 will name the nullity and tie to the rank.
Here is why the null space controls the count of solutions, the payoff promised in §13.1. Suppose $\mathbf{x}_p$ is one solution of $A\mathbf{x} = \mathbf{b}$ (a "particular" solution), and let $\mathbf{x}_n$ be any vector in the null space, so $A\mathbf{x}_n = \mathbf{0}$. Then $$A(\mathbf{x}_p + \mathbf{x}_n) = A\mathbf{x}_p + A\mathbf{x}_n = \mathbf{b} + \mathbf{0} = \mathbf{b},$$ so $\mathbf{x}_p + \mathbf{x}_n$ is also a solution. Add any null-space vector to a solution and you get another solution. We will make this exact in §13.8, but the moral is already clear: the null space is the set of moves that take you from one solution to another without changing the output. If the null space is just $\{\mathbf{0}\}$, there are no such moves and the solution (when it exists) is unique; if the null space is a line, solutions come in a one-parameter family; if it is a plane, a two-parameter family. The shape of the null space is the shape of the solution set.
13.3.1 Seeing both subspaces at once with the 2D visualizer
The cleanest place to see a column space and a null space together is in two dimensions, where the recurring visualizer from Chapter 1 does the work. Recall that the visualizer draws the unit square (the input) and its image under a $2 \times 2$ matrix $A$ (the output), along with the images of the basis vectors. When $A$ is invertible, the square maps to a tilted parallelogram of nonzero area — full column space ($\mathbb{R}^2$), trivial null space. But watch what happens to a singular matrix, one whose two columns are parallel. Take $$A = \begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix},$$ whose second column $(1,2)$ is exactly half the first column $(2,4)$. Both columns point along the same line $\{c\,(1,2)\}$, so $C(A)$ is that line, and the unit square gets flattened — the whole 2D square collapses onto a one-dimensional segment of zero area. The collapse is the geometric signature of a nontrivial null space.
# Using the visualizer from Chapter 1: a singular matrix flattens the square.
import numpy as np
from toolkit.visualizer import visualize_2d # the FROZEN 2D visualizer (Ch.1)
A = np.array([[2, 1],
[4, 2]], dtype=float)
print("det(A) =", round(float(np.linalg.det(A)), 6)) # 0.0 -> singular
print("rank(A) =", np.linalg.matrix_rank(A)) # 1 -> C(A) is a line
ax = visualize_2d(A, title="Singular: square collapses to a line")
# plt.show()
Figure 13.2 — A singular matrix collapses the unit square. The blue dashed unit square is squashed by $A$ onto the orange line through the origin in the direction $(1, 2)$ — the column space $C(A)$. Its determinant is $0$ (the area-scaling factor of Chapter 11 is zero), and its rank is $1$. Alt-text: a unit square mapped by a 2-by-2 matrix onto a single line segment through the origin, with both transformed basis vectors lying along that line.
Where did the missing dimension go? Into the null space. The matrix sends the direction $(1, -2)$ to zero: $A\,(1, -2) = (2\cdot 1 + 1\cdot(-2),\, 4\cdot 1 + 2\cdot(-2)) = (0, 0)$. So $N(A)$ is the line $\{c\,(1,-2)\}$ — a one-dimensional flat in the input space. Every input gets projected, in effect, along the null-space direction onto the column-space line. The visualizer makes the abstract definitions tactile: the column space is where the square lands; the null space is the direction along which it was crushed. Run it again with an invertible matrix and the square keeps its area — no collapse, no null space.
The Key Insight — For a $2 \times 2$ matrix, $\det(A) = 0$ (Chapter 11), $\operatorname{rank}(A) < 2$, "the columns are parallel," "the unit square collapses to a line," and "the null space is nontrivial" are five names for one phenomenon. The determinant measures area scaling; when it is zero, area is destroyed, which means a whole direction collapsed — and that collapsed direction is the null space. Geometry (collapse) and algebra ($\det = 0$) are, as ever, two views of one object.
13.4 Why are the column space and null space actually subspaces? (a proof)
We have called both objects "subspaces" and drawn them as flats through the origin. That is a claim, not a definition, and it deserves a proof — because the entire toolkit of Part III (bases, dimension, rank–nullity) only applies to genuine subspaces. This is the chapter's central theorem, and it is short.
1. Why we care. If $C(A)$ and $N(A)$ are subspaces, then everything Chapter 6 built — spans, bases, dimension — applies to them, and we are licensed to talk about "a basis for the column space" or "the dimension of the null space." If they were merely arbitrary sets, none of that machinery would attach. The subspace property is the foundation the rest of the part stands on.
2. Key idea. A subspace is exactly a nonempty set closed under linear combinations (Chapter 6). Both proofs are the same one-line argument applied twice, and both come straight from the linearity of $A$: a linear map sends combinations to combinations.
3. Proof.
Claim A: $C(A)$ is a subspace of $\mathbb{R}^m$. Recall $C(A) = \{A\mathbf{x} : \mathbf{x} \in \mathbb{R}^n\}$. It contains the zero vector, since $\mathbf{0} = A\mathbf{0}$. Take any two elements of $C(A)$, say $\mathbf{u} = A\mathbf{x}_1$ and $\mathbf{v} = A\mathbf{x}_2$, and any scalars $c, d$. Then $$c\mathbf{u} + d\mathbf{v} = c\,(A\mathbf{x}_1) + d\,(A\mathbf{x}_2) = A(c\mathbf{x}_1 + d\mathbf{x}_2),$$ using the linearity of matrix multiplication (Chapter 7). The right-hand side is $A$ applied to the vector $c\mathbf{x}_1 + d\mathbf{x}_2 \in \mathbb{R}^n$, so it is again of the form $A(\text{something})$ — hence $c\mathbf{u} + d\mathbf{v} \in C(A)$. The set is closed under linear combinations and contains $\mathbf{0}$, so it is a subspace. (Of course it is also a span, and we proved in Chapter 6 that every span is a subspace; this is the same fact, re-derived directly.) $\quad\blacksquare$
Claim B: $N(A)$ is a subspace of $\mathbb{R}^n$. Recall $N(A) = \{\mathbf{x} : A\mathbf{x} = \mathbf{0}\}$. It contains $\mathbf{0}$, since $A\mathbf{0} = \mathbf{0}$. Take any $\mathbf{x}_1, \mathbf{x}_2 \in N(A)$ — so $A\mathbf{x}_1 = \mathbf{0}$ and $A\mathbf{x}_2 = \mathbf{0}$ — and any scalars $c, d$. Then $$A(c\mathbf{x}_1 + d\mathbf{x}_2) = c\,(A\mathbf{x}_1) + d\,(A\mathbf{x}_2) = c\,\mathbf{0} + d\,\mathbf{0} = \mathbf{0},$$ so $c\mathbf{x}_1 + d\mathbf{x}_2$ is also killed by $A$ and therefore lies in $N(A)$. Closed under linear combinations, contains $\mathbf{0}$ — a subspace. $\quad\blacksquare$
4. What this means. Both proofs ran on the single engine $A(c\mathbf{x}_1 + d\mathbf{x}_2) = cA\mathbf{x}_1 + dA\mathbf{x}_2$ — the linearity that defines a matrix as a linear transformation (Theme 1 of the book). That is why the structure is so clean: because $A$ is linear, the things it reaches and the things it destroys are automatically flat subspaces, never lopsided curves. Geometrically, this is the formal version of the picture in §13.1: reach and loss are both flats through the origin. Note the spaces, one more time: $C(A) \subseteq \mathbb{R}^m$ and $N(A) \subseteq \mathbb{R}^n$. They generally live in different spaces and have different dimensions — and the precise relationship between those dimensions is the rank–nullity theorem of Chapter 14.
13.5 How do you find a basis for the column space from RREF?
Knowing $C(A)$ is a subspace, we want a basis for it — a minimal, independent set of columns whose span is all of $C(A)$. The columns themselves span $C(A)$ by definition, but they may be redundant (the rank-1 example had three columns spanning a one-dimensional line — two were wasted). We need to throw out the redundant ones and keep an independent spanning set. Row reduction tells us exactly which to keep.
Here is the recipe, and then the crucial caveat. Row-reduce $A$ to its RREF $R$ (Chapter 4). The columns of $R$ that contain pivots are the pivot columns; the others are free columns. The rule is:
A basis for $C(A)$ is the set of pivot columns — taken from the original matrix $A$, not from $R$.
The dimension of $C(A)$ is therefore the number of pivots, which is the rank of $A$. Why does this work? Row reduction is a sequence of row operations, and row operations change the column space (they mix the entries of each column), so the columns of $R$ are generally not in $C(A)$ at all. But row operations preserve the dependence relations among the columns: a set of columns of $A$ is independent exactly when the corresponding columns of $R$ are independent, because both express the same null-space relations (this is the Chapter 6 fact that $A$ and $R$ have the same null space). In $R$, the pivot columns are visibly independent (each has a leading 1 in a fresh row) and the free columns are visibly combinations of earlier pivot columns. So the pivot positions identify which original columns to keep — but the vectors you keep must be the original ones, because those are the ones actually living in $C(A)$.
Warning
— This is the trap that catches everyone the first time: the basis for $C(A)$ uses the pivot columns of the original $A$, never the columns of the reduced $R$. Row reduction destroys the column space (it preserves the row space instead — that is Chapter 14). Use $R$ only to locate the pivot positions; then go back to $A$ and read off those columns. If you hand back columns of $R$, you have computed a basis for the wrong subspace. A clean memory hook: rows are for finding, columns are from the original.
Common Pitfall — "Just pick any $r$ columns of $A$, where $r$ is the rank." Not quite — you must pick the pivot columns specifically, because those are guaranteed independent. A random set of $r$ columns can be dependent (imagine choosing the two parallel columns of a rank-1 matrix). The pivot columns are the canonical independent choice; row reduction is precisely the tool that finds them.
13.6 How do you find a basis for the null space — the special solutions?
The null space recipe is just as mechanical, and it is the more beautiful of the two because the basis vectors fall out of the free variables. To find $N(A)$, solve $A\mathbf{x} = \mathbf{0}$ by row-reducing $A$ to $R$ (the right-hand side stays $\mathbf{0}$ through every operation, so we needn't carry it). In $R$, the variables split into two kinds: pivot variables (the columns with pivots) and free variables (the columns without). The free variables are exactly that — free — you may set them to anything, and each choice forces the pivot variables.
The trick that produces a basis is to switch the free variables on one at a time. For each free variable, set it to $1$ and all other free variables to $0$, then solve for the pivot variables from the rows of $R$. The resulting vector is called a special solution. There is one special solution per free variable, and:
The special solutions form a basis for the null space $N(A)$. The number of special solutions equals the number of free variables, which is therefore $\dim N(A)$ (the nullity).
Why are they a basis? They span $N(A)$ because any null-space vector is recovered by taking that vector's free-variable values as the weights and adding up the corresponding special solutions (the linearity of the construction guarantees this). And they are independent because each special solution has a $1$ in a free-variable slot where every other special solution has a $0$ — no special solution can be built from the others, since none of the others can supply that $1$. Spanning plus independent is the definition of a basis (Chapter 6).
Geometric Intuition — Each free variable is a degree of freedom — a direction you can slide along in the input space without changing the output. The special solutions are the "unit slides" along each of these directions. If there are two free variables, the null space is a plane, and the two special solutions are two independent vectors spanning it; you reach any point of the plane by sliding $c_1$ along the first and $c_2$ along the second. Free variables count the dimensions of the collapse.
Check Your Understanding — A $4 \times 7$ matrix row-reduces to an RREF with pivots in columns 1, 2, and 5. How many free variables are there, and what is $\dim N(A)$?
Answer
There are $7$ columns and $3$ pivots, so $7 - 3 = 4$ free variables (columns 3, 4, 6, 7). The null space therefore has dimension $4$: it is a 4-dimensional flat through the origin inside $\mathbb{R}^7$, with one special solution per free variable. Notice the bookkeeping that Chapter 14 will turn into a theorem: $\operatorname{rank} = 3$ pivots, $\dim N(A) = 4$ free, and $3 + 4 = 7 = $ number of columns. Rank plus nullity equals the number of columns — every time.
13.6.1 The null space reads off the dependence relations among the columns
There is a second way to read a null-space vector that ties this chapter tightly back to the linear dependence of Chapter 6, and it is worth pausing on because it explains why the two basis recipes are really one idea. Remember that $A\mathbf{x}$ is a linear combination of the columns. So a vector $\mathbf{x} \in N(A)$ — a solution of $A\mathbf{x} = \mathbf{0}$ — is literally a recipe for combining the columns to get the zero vector: $$x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \dots + x_n\mathbf{a}_n = \mathbf{0}.$$ A nonzero such $\mathbf{x}$ is exactly a nontrivial dependence relation among the columns — the Chapter 6 definition of linear dependence. So the null space does not merely "describe solutions"; it catalogs every way the columns lean on each other. The null space is trivial precisely when the columns are independent, and each special solution spells out one dependency in full.
Read this off the worked matrix of §13.7. Its first special solution was $\mathbf{s}_1 = (-3, 1, 0, 0)$, which says $$-3\,\mathbf{a}_1 + 1\,\mathbf{a}_2 + 0\,\mathbf{a}_3 + 0\,\mathbf{a}_4 = \mathbf{0} \quad\Longrightarrow\quad \mathbf{a}_2 = 3\,\mathbf{a}_1.$$ And indeed column 2 of $A$ is $(3, 6, -3) = 3\cdot(1, 2, -1) = 3\,\mathbf{a}_1$ — the free column 2 is exactly three times the pivot column 1. The special solution told us which columns are redundant and by how much. This is the deep reason the column-space basis is the pivot columns: the free columns are precisely the ones each special solution exhibits as combinations of earlier pivot columns, so they add nothing new to the span. The null space (free columns) and the column-space basis (pivot columns) are two sides of the same RREF coin.
# Each special solution is a dependence relation among the columns.
import numpy as np
A = np.array([[1, 3, 3, 2], [2, 6, 9, 7], [-1, -3, 3, 4]], dtype=float)
s1 = np.array([-3, 1, 0, 0], dtype=float)
combo = sum(s1[j] * A[:, j] for j in range(4)) # -3 a1 + 1 a2 + 0 a3 + 0 a4
print("dependence combo =", combo) # [0. 0. 0.]
print("is a2 == 3*a1 ? ", np.allclose(A[:, 1], 3 * A[:, 0])) # True
The combination prints [0. 0. 0.] and the check confirms $\mathbf{a}_2 = 3\mathbf{a}_1$: the special solution is a genuine recipe for canceling the columns to zero. When you compute a null space, you are simultaneously discovering exactly how the matrix's columns are entangled — which is why a fat matrix (more columns than rows) always has a nontrivial null space: with more columns than the dimension of the output space, the columns cannot possibly be independent, so some combination must reach zero.
Common Pitfall — "A nonzero null space means the matrix is wrong or the data is bad." Not at all. A nontrivial null space is a precise, useful fact about the transformation — it names the redundancy. In a regression it flags collinear features (Chapter 6); in a network it encodes a conservation law (Case Study 1); in control it marks states you cannot influence (Case Study 2). The null space is information, not error. The mistake is to ignore it and let a solver hand you one arbitrary solution out of an infinite family as though it were the answer.
13.6.2 What does the shape of A tell you about its two subspaces?
Before computing anything, the shape of an $m \times n$ matrix already constrains both subspaces — and learning to read shape is one of the fastest ways to anticipate what a system will do. Write $r = \operatorname{rank}(A)$ for the number of pivots. Three regimes are worth committing to memory, because each corresponds to a story you will meet repeatedly.
Wide matrices ($n > m$, more columns than rows). A wide matrix has more unknowns than equations. Since the rank can be at most the number of rows, $r \le m < n$, there are at least $n - m \ge 1$ free variables — so a wide matrix always has a nontrivial null space. Its columns cannot be independent (too many vectors for the output space to hold). Consequently, when $A\mathbf{x} = \mathbf{b}$ is solvable at all, it has infinitely many solutions. This is the underdetermined regime: fewer constraints than freedoms, so freedoms are left over. It is the natural habitat of dimensionality reduction, where you deliberately use a wide matrix to compress many coordinates into few.
Tall matrices ($m > n$, more rows than columns). A tall matrix has more equations than unknowns. Its column space lives in $\mathbb{R}^m$ but is spanned by only $n$ columns, so $\dim C(A) = r \le n < m$ — the column space is a proper flat inside $\mathbb{R}^m$, and most right-hand sides are unreachable. The system is overdetermined: usually no exact solution exists, because $\mathbf{b}$ generically misses the low-dimensional column-space flat. This is exactly the situation of fitting a line to many noisy data points (Chapter 17), and it is why least squares — finding the closest reachable point — was invented. A tall matrix can still have a trivial null space (independent columns), giving a unique solution when one exists, but existence is the hard part here.
Square matrices ($m = n$). Now the two questions become one. A square $A$ is invertible exactly when $r = n$, and then $C(A) = \mathbb{R}^n$ (every $\mathbf{b}$ reachable) and $N(A) = \{\mathbf{0}\}$ (every solution unique) — the perfect case of Chapter 9, where $A\mathbf{x} = \mathbf{b}$ has exactly one solution for every $\mathbf{b}$. If instead $r < n$, the square matrix is singular, and the two defects arrive together: the column space drops below full ($\det = 0$, some $\mathbf{b}$ unreachable) and the null space becomes nontrivial (when solutions exist, they are infinite). For a square matrix, "fails to reach everything" and "destroys some directions" are the same failure — you cannot have one without the other.
Geometric Intuition — Think of $r = \operatorname{rank}(A)$ as the true dimensionality of the transformation — how many independent directions actually survive the trip from input to output. Both subspaces are pinned to this one number: $\dim C(A) = r$ (the surviving output dimensions) and $\dim N(A) = n - r$ (the collapsed input dimensions). Whatever the input space gives ($n$ dimensions), the transformation splits into "$r$ that come through" and "$n - r$ that get crushed." That split is the entire content of the rank–nullity theorem, and you can predict both subspace dimensions the moment you know the rank and the number of columns. Chapter 14 proves the split is exact; you can already use it.
Check Your Understanding — Without computing anything, what can you say about the solutions of $A\mathbf{x} = \mathbf{b}$ if $A$ is $3 \times 5$?
Answer
$A$ is wide ($n = 5 > m = 3$). Its rank is at most $3$, so there are at least $5 - 3 = 2$ free variables and the null space has dimension at least $2$ — always nontrivial. Therefore $A\mathbf{x} = \mathbf{b}$ can never have a unique solution: it has either no solution (if $\mathbf{b} \notin C(A)$) or infinitely many (a flat of dimension at least 2). You know this before seeing a single entry of $A$ — shape alone rules out uniqueness.
13.7 A full worked example: both bases from one RREF
Let's do the whole thing by hand on a single matrix and then confirm every number with numpy. Take the $3 \times 4$ matrix
$$A = \begin{bmatrix} 1 & 3 & 3 & 2 \\ 2 & 6 & 9 & 7 \\ -1 & -3 & 3 & 4 \end{bmatrix}.$$
It has four columns living in $\mathbb{R}^3$, so $C(A) \subseteq \mathbb{R}^3$ and $N(A) \subseteq \mathbb{R}^4$ — already a reminder that the two subspaces live in different spaces. Our goal: a basis for each.
Step 1 — Row-reduce to RREF. Eliminate below the first pivot (subtract $2\times$ row 1 from row 2, add row 1 to row 3), then clear the third column, and tidy up. The reduced form is $$R = \begin{bmatrix} 1 & 3 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$ Read off the structure: pivots sit in columns 1 and 3; columns 2 and 4 are free. So $\operatorname{rank}(A) = 2$, there are $2$ free variables, and the bottom row of zeros tells us the three rows were not all independent.
Step 2 — Basis for the column space. Take the pivot columns of the original $A$ — columns 1 and 3: $$C(A) = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix},\ \begin{bmatrix} 3 \\ 9 \\ 3 \end{bmatrix} \right\}, \qquad \dim C(A) = 2.$$ This is a plane through the origin in $\mathbb{R}^3$ — two independent directions spanning a 2D flat. A right-hand side $\mathbf{b}$ is reachable exactly when it lies on this plane. (Do not use columns 1 and 3 of $R$, which are $(1,0,0)$ and $(0,1,0)$ — those span a different plane and are not even in $C(A)$.)
Step 3 — Basis for the null space (special solutions). From $R$, the equations $A\mathbf{x} = \mathbf{0}$ read $$x_1 + 3x_2 \phantom{{}+ x_3} - x_4 = 0 \quad\Rightarrow\quad x_1 = -3x_2 + x_4,$$ $$x_3 + x_4 = 0 \quad\Rightarrow\quad x_3 = -x_4,$$ with $x_2$ and $x_4$ free. Switch them on one at a time:
- Free set $x_2 = 1, x_4 = 0$: then $x_1 = -3$, $x_3 = 0$, giving the special solution $\mathbf{s}_1 = (-3, 1, 0, 0)$.
- Free set $x_2 = 0, x_4 = 1$: then $x_1 = 1$, $x_3 = -1$, giving the special solution $\mathbf{s}_2 = (1, 0, -1, 1)$.
So $$N(A) = \operatorname{span}\left\{ \begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix},\ \begin{bmatrix} 1 \\ 0 \\ -1 \\ 1 \end{bmatrix} \right\}, \qquad \dim N(A) = 2.$$ This is a plane through the origin in $\mathbb{R}^4$. Note the running tally: $\dim C(A) + \dim N(A) = 2 + 2 = 4 = $ number of columns. (That is rank–nullity, foreshadowed; Chapter 14 proves it.)
Step 4 — numpy verification. Now confirm by machine. We compute the rank and the null-space dimension, and we check the defining property directly: both special solutions should satisfy $A\mathbf{s} = \mathbf{0}$.
# Verify the column-space and null-space bases of the worked example.
import numpy as np
from scipy.linalg import null_space
A = np.array([[ 1, 3, 3, 2],
[ 2, 6, 9, 7],
[-1, -3, 3, 4]], dtype=float)
# Column space: dimension = rank; basis = original columns 0 and 2 (math cols 1,3).
print("rank(A) =", np.linalg.matrix_rank(A)) # 2 -> dim C(A) = 2
print("col-space basis (cols 0,2):\n", A[:, [0, 2]]) # [[1,3],[2,9],[-1,3]]
# Null space: our hand-found special solutions must map to 0.
s1 = np.array([-3, 1, 0, 0], dtype=float)
s2 = np.array([ 1, 0, -1, 1], dtype=float)
print("A s1 =", A @ s1, " A s2 =", A @ s2) # both [0. 0. 0.]
print("dim N(A) =", null_space(A).shape[1]) # 2 -> matches free-var count
Running it prints rank(A) = 2, the column-space basis [[1. 3.] [2. 9.] [-1. 3.]], then A s1 = [0. 0. 0.] A s2 = [0. 0. 0.], and dim N(A) = 2. Every hand result matches: rank 2, two basis columns, two special solutions that genuinely land on zero, and a 2-dimensional null space. The matrix indexes from 1 in our hand work ($x_1, \dots, x_4$ and "columns 1 and 3"); numpy indexes from 0, so the same columns are A[:, [0, 2]] — the off-by-one we flagged in Chapter 4, biting again exactly where you'd expect.
Computational Note —
scipy.linalg.null_space(A)returns an orthonormal basis for $N(A)$ (its columns are unit length and mutually perpendicular), found via the SVD of Chapter 30 — so its columns will look nothing like our tidy integer special solutions $\mathbf{s}_1, \mathbf{s}_2$. That is fine: a subspace has infinitely many bases, and any two bases of the same space have the same number of vectors. We compare the dimension (.shape[1]), not the individual vectors. If you want to confirm the spaces are identical, check that scipy's basis vectors are themselves killed by $A$ (np.allclose(A @ ns, 0)) and that both bases have size 2.
13.8 What does the null space have to do with solving Ax = b?
We can now answer the second question from §13.1 completely: when $A\mathbf{x} = \mathbf{b}$ is solvable, exactly how many solutions are there, and what do they look like? The null space is the whole answer, through one of the most useful structural theorems in the subject.
The complete solution. If $A\mathbf{x} = \mathbf{b}$ has at least one solution $\mathbf{x}_p$ (a particular solution), then its complete solution set is $$\{\, \mathbf{x}_p + \mathbf{x}_n : \mathbf{x}_n \in N(A) \,\} = \mathbf{x}_p + N(A).$$ In words: every solution is one fixed particular solution plus some null-space vector, and every such sum is a solution.
The proof is two short inclusions, both already half-done in §13.3. Every sum is a solution: if $\mathbf{x}_n \in N(A)$ then $A(\mathbf{x}_p + \mathbf{x}_n) = A\mathbf{x}_p + A\mathbf{x}_n = \mathbf{b} + \mathbf{0} = \mathbf{b}$. Every solution is such a sum: if $\mathbf{x}$ is any solution, then $A(\mathbf{x} - \mathbf{x}_p) = A\mathbf{x} - A\mathbf{x}_p = \mathbf{b} - \mathbf{b} = \mathbf{0}$, so the difference $\mathbf{x} - \mathbf{x}_p$ lies in $N(A)$, which means $\mathbf{x} = \mathbf{x}_p + (\text{null-space vector})$. The two inclusions together pin down the set exactly. $\quad\blacksquare$
The Key Insight — A solution set is a shifted copy of the null space: take the flat $N(A)$ through the origin and slide it over so it passes through any one particular solution $\mathbf{x}_p$. This is why solution sets are flats that don't pass through the origin (unless $\mathbf{b} = \mathbf{0}$) — they are the null space, translated. The direction and dimension of the solution set come entirely from $N(A)$; the location comes from $\mathbf{x}_p$. (This is the geometric cousin of $y = mx + b$: the homogeneous part sets the slope, the particular part sets the offset.)
Three cases now fall out cleanly, unifying everything Chapter 4 told you about the number of solutions:
- $\mathbf{b} \notin C(A)$: no particular solution exists, so the system is inconsistent — no solutions.
- $\mathbf{b} \in C(A)$ and $N(A) = \{\mathbf{0}\}$: a particular solution exists and the null space is trivial, so the solution is unique (only $\mathbf{x}_p + \mathbf{0}$).
- $\mathbf{b} \in C(A)$ and $N(A) \neq \{\mathbf{0}\}$: a particular solution exists and the null space is nontrivial, so there are infinitely many solutions — a whole flat of them, one per null-space vector.
That is the entire none/one/infinite trichotomy from Chapter 3, now explained rather than merely observed: reachability ($\mathbf{b} \in C(A)$?) decides existence; the null space decides uniqueness.
13.8.1 Worked example: the complete solution to Ax = b
Reuse the same $A$ from §13.7 and pick $\mathbf{b} = (1, 5, 5)$. Row-reducing the augmented matrix $[A \mid \mathbf{b}]$ (same row operations as before, now carrying the right-hand side) gives the reduced system
$$x_1 + 3x_2 - x_4 = -2, \qquad x_3 + x_4 = 1,$$
with no contradiction row — so $\mathbf{b} \in C(A)$ and the system is consistent. For a particular solution, set the free variables $x_2 = x_4 = 0$: then $x_1 = -2$ and $x_3 = 1$, giving $\mathbf{x}_p = (-2, 0, 1, 0)$. The complete solution is this particular solution plus the null space we already found:
$$\mathbf{x} = \underbrace{\begin{bmatrix} -2 \\ 0 \\ 1 \\ 0 \end{bmatrix}}_{\mathbf{x}_p} + c_1 \underbrace{\begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \end{bmatrix}}_{\mathbf{s}_1} + c_2 \underbrace{\begin{bmatrix} 1 \\ 0 \\ -1 \\ 1 \end{bmatrix}}_{\mathbf{s}_2}, \qquad c_1, c_2 \in \mathbb{R}.$$
A 2-parameter family — a plane of solutions in $\mathbb{R}^4$, riding on the particular solution. Verify a couple of members with numpy:
# The complete solution x_p + c1 s1 + c2 s2 always lands on b = (1,5,5).
import numpy as np
A = np.array([[1, 3, 3, 2], [2, 6, 9, 7], [-1, -3, 3, 4]], dtype=float)
xp = np.array([-2, 0, 1, 0], dtype=float)
s1 = np.array([-3, 1, 0, 0], dtype=float)
s2 = np.array([ 1, 0, -1, 1], dtype=float)
print("A xp =", A @ xp) # [1. 5. 5.] = b
print("A(xp+2s1-1s2) =", A @ (xp + 2*s1 - 1*s2)) # still [1. 5. 5.]
Both print [1. 5. 5.]: the particular solution hits $\mathbf{b}$, and adding any combination of the special solutions leaves the output unchanged — exactly as the theorem promises. By contrast, the target $\mathbf{b}' = (4, 11, 1)$ produces a contradiction row in $[A \mid \mathbf{b}']$ (it is off the column-space plane), so $A\mathbf{x} = \mathbf{b}'$ has no solution — a vector the transformation simply cannot reach.
Common Pitfall — "To get a particular solution I should set the free variables to whatever I like, so I might as well leave them as unknowns." You can set them to anything (each choice gives a different particular solution — they all differ by null-space vectors), but the fastest and least error-prone choice is to set every free variable to $\mathbf{0}$, as we did. Then the pivot variables read straight off the last column of the reduced augmented matrix with no extra algebra. A second pitfall: students often forget the homogeneous part entirely and report the single particular solution as "the" answer. When the null space is nontrivial, that is only one point of an infinite flat — always append $+\,c_1\mathbf{s}_1 + c_2\mathbf{s}_2 + \cdots$ to give the complete solution.
13.8.2 Why "particular plus homogeneous" is the same pattern you have seen before
The structure "complete solution = one particular solution + all homogeneous solutions" is not unique to matrices — it is the universal shape of solution sets for any linear problem, and recognizing it will save you in several later chapters. You have, in fact, met it already. In a first course on differential equations, the general solution of a linear ODE like $y'' + y = \sin t$ is written as $y = y_p + y_h$: one particular solution $y_p$ that handles the right-hand side, plus the general homogeneous solution $y_h$ (here $c_1\cos t + c_2\sin t$) that solves the equation with right-hand side zero. That is exactly this chapter's theorem, because differentiation is a linear operator and "$y_h$" is the null space (kernel) of that operator. The matrix case and the ODE case are the same algebra — linearity — wearing different costumes, and Chapter 37 makes the connection literal when it solves systems of ODEs $\mathbf{x}' = A\mathbf{x}$ using the eigenstructure of $A$.
The lesson generalizes: whenever you solve a linear equation $L(\mathbf{x}) = \mathbf{b}$ — for a matrix, a derivative, an integral operator, a difference equation — the solution set is a single particular solution shifted by the entire kernel of $L$. The kernel sets the shape and dimension of the solution family; the particular solution sets its position. Once you see this pattern, "how many solutions does $L(\mathbf{x}) = \mathbf{b}$ have?" always reduces to two sub-questions — is $\mathbf{b}$ in the image of $L$? (existence) and how big is the kernel of $L$? (uniqueness) — which are precisely the column-space and null-space questions of this chapter, lifted to any linear setting.
Real-World Application — In economics, a Leontief input–output model uses a matrix $A$ whose entry $a_{ij}$ says how much of industry $i$'s output is consumed to make one unit of industry $j$'s output; solving $(I - A)\mathbf{x} = \mathbf{d}$ finds the production levels $\mathbf{x}$ that exactly meet external demand $\mathbf{d}$. Whether a demand vector is achievable is a column-space question, and a nontrivial null space of $(I - A)$ would signal a degenerate economy in which some pattern of production sustains itself with zero net output — a structural redundancy with real economic meaning. The same "particular + homogeneous" decomposition tells a planner that if one production plan meets demand, every plan differing from it by a null-space vector meets it too: the null space is the planner's slack, the set of costless reallocations.
13.9 Why does this give us the four-fundamental-subspaces picture?
Step back and look at what we have. Every $m \times n$ matrix $A$ now comes with two subspaces, sitting in two different spaces, answering two different questions:
- $C(A)$, in the output space $\mathbb{R}^m$: everything $A$ can reach (solvability of $A\mathbf{x} = \mathbf{b}$).
- $N(A)$, in the input space $\mathbb{R}^n$: everything $A$ collapses to zero (uniqueness of solutions).
This is half of a diagram that Gilbert Strang made the centerpiece of how linear algebra is taught — the four fundamental subspaces. The other half comes from asking the very same two questions about the transpose $A^{\mathsf{T}}$: its column space $C(A^{\mathsf{T}})$ is the row space of $A$ (living back in the input space $\mathbb{R}^n$), and its null space $N(A^{\mathsf{T}})$ is the left null space of $A$ (living in the output space $\mathbb{R}^m$). Four subspaces, two in each space, all read from the same row reduction. Chapter 14 builds the full picture and proves the rank–nullity theorem that locks their four dimensions together with a single equation, $\operatorname{rank}(A) + \dim N(A) = n$.
Why is this diagram worth elevating above every other fact in the subject? Because it is the map that tells you where every other theorem lives. Almost every major result ahead is, at bottom, a precise statement about how these four subspaces sit relative to one another. Orthogonality (Part IV) will reveal that the null space is perpendicular to the row space and the left null space is perpendicular to the column space — the four subspaces pair up into two right-angled couples, which is why projection and least squares work. The rank will turn out to be the one number all four dimensions are built from. And the SVD (Chapter 30) will hand you orthonormal bases for all four at once, the cleanest possible view of the whole structure. Learning to draw and read this diagram now — even in its two-subspace half — is learning the coordinate system of the rest of the book.
Geometric Intuition — Here is the half of Strang's picture we own so far. On the left sits the input space $\mathbb{R}^n$, containing the null space $N(A)$ (the directions $A$ crushes). On the right sits the output space $\mathbb{R}^m$, containing the column space $C(A)$ (the flat $A$ reaches). The matrix is the arrow between them: it takes all of $\mathbb{R}^n$, sends the entire null space to the single point $\mathbf{0}$ on the right, and spreads the rest of $\mathbb{R}^n$ across the column-space flat. Everything outside the column space — the rest of $\mathbb{R}^m$ — is unreachable. Picture it as a funnel: the input space gets folded so that one whole flat ($N(A)$) pinches to a point, and the image is a lower-dimensional flat ($C(A)$) inside the output space. Chapter 14 adds the row space and left null space to complete the diagram.
INPUT R^n OUTPUT R^m
┌───────────────────┐ ┌───────────────────┐
│ everything else │ ──── A ────▶ │ C(A) reachable │
│ (maps onto C(A)) │ │ (a flat thru 0) │
│ ‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑ │ │ ‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑‑ │
│ N(A): collapses │ ──── A ────▶ │ 0 │
│ to zero │ │ (unreachable rest │
└───────────────────┘ │ lies outside C(A))│
└───────────────────┘
Figure 13.1 — Half of Strang's four-subspaces picture. The matrix $A$ maps the input space $\mathbb{R}^n$ (left) into the output space $\mathbb{R}^m$ (right). The null space $N(A)$ on the left is sent entirely to $\mathbf{0}$; everything else maps onto the column space $C(A)$ on the right, the only reachable flat. Alt-text: two boxes joined by arrows labeled A; the left box's null-space region maps to a single zero point in the right box, while the rest maps onto the column-space region.
Math-Major Sidebar — In the abstract language of Chapter 35, a matrix is a representation of a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$, and the two subspaces have coordinate-free names: $N(A)$ is the kernel $\ker T = \{\mathbf{x} : T\mathbf{x} = \mathbf{0}\}$, and $C(A)$ is the image (or range) $\operatorname{im} T = \{T\mathbf{x} : \mathbf{x} \in \mathbb{R}^n\}$. The First Isomorphism Theorem for vector spaces then says $\mathbb{R}^n / \ker T \cong \operatorname{im} T$ — the input space, with the kernel "collapsed to a point," is isomorphic to the image. Counting dimensions of both sides gives $n - \dim(\ker T) = \dim(\operatorname{im} T)$, which is rank–nullity again. The same theorem governs linear maps between any vector spaces — polynomials, functions, infinite-dimensional Hilbert spaces — not just $\mathbb{R}^n \to \mathbb{R}^m$. Kernel and image are among the most important constructions in all of algebra; you are meeting them here in their most concrete clothing.
Real-World Application — In machine learning, the columns of a data matrix are features, and $C(A)$ is the feature space: the set of all outputs your linear model can produce. If two features are redundant (one a combination of others, as in Chapter 6's collinearity), they add columns but not dimension to $C(A)$, and the corresponding directions show up in $N(A)$ — the model genuinely cannot tell certain inputs apart. This is why dropping redundant features changes nothing the model could express. The geometry here is the entry point to feature spaces in machine learning, and the same column/null-space structure underlies dimensionality reduction, where the goal is to find a small set of directions (a low-dimensional column space) that captures almost all of the data — the rank-revealing idea that the SVD of Chapter 30 makes precise.
13.10 What does it cost to compute these bases, and how do we test it?
By hand, finding both bases costs exactly one row reduction of $A$ — that is the whole point. After you have the RREF $R$ and the pivot positions, the column-space basis is a lookup (those columns of $A$) and the null-space basis is a back-substitution per free variable. There is no second computation; the same staircase that reveals the pivots reveals the free variables. This is why row reduction, the "tool" we were careful in Chapter 4 not to mistake for the point of the subject, is so central: it is the single engine behind solvability, rank, and both of this chapter's subspaces.
It is time to make the two recipes into reusable code. The toolkit already has row_reduce from Chapter 4; we build the two basis-finders on top of it, in pure Python, and check them against numpy/scipy exactly as the chapter did by hand.
Build Your Toolkit — Add two functions to
toolkit/four_subspaces.py, both built on Chapter 4'srow_reduce(A)(which returns the RREF and the list of pivot columns):
column_space_basis(A)— callrow_reduceto get the pivot columns, then return those columns of the originalA(a list of column vectors). Do not return columns of the RREF — the Warning in §13.5.null_space_basis(A)— callrow_reduce, identify the free columns (every column index that is not a pivot), and build one special solution per free column: set that free variable to $1$ and the other free variables to $0$, then read each pivot variable as $x_{\text{pivot}} = -R[\text{pivot row}][\text{free col}]$. Return the list of special solutions.Verify (numpy only for checking, never inside the implementation):
len(column_space_basis(A))equalsnp.linalg.matrix_rank(A);len(null_space_basis(A))equalsscipy.linalg.null_space(A).shape[1]; and every special solutionssatisfiesnp.allclose(A @ s, 0). On the chapter's worked matrix,column_space_basisreturns the columns $(1,2,-1)$ and $(3,9,3)$, andnull_space_basisreturns $(-3,1,0,0)$ and $(1,0,-1,1)$ — matching §13.7 exactly.
A sketch of the null-space half, to show how cleanly it reads (the full module is assembled separately; implement it yourself before peeking):
# Sketch: special solutions from the RREF. Reuses Chapter 4's row_reduce.
def null_space_basis(A):
R, pivot_cols = row_reduce(A) # RREF + pivot column indices (Ch.4)
n_cols = len(A[0])
free_cols = [j for j in range(n_cols) if j not in pivot_cols]
pivot_row = {col: i for i, col in enumerate(pivot_cols)} # which row pivots col
basis = []
for f in free_cols: # one special solution per free variable
s = [0.0] * n_cols
s[f] = 1.0 # turn this free variable on
for col in pivot_cols: # solve each pivot variable
s[col] = -R[pivot_row[col]][f]
basis.append(s)
return basis
# On the worked matrix this returns [[-3,1,0,0], [1,0,-1,1]] (matches §13.7).
The implementation never imports numpy; we only use numpy/scipy in the tests to confirm the dimensions and the $A\mathbf{s} = \mathbf{0}$ property. That is the toolkit's contract throughout the book (Chapter 4): build it from scratch, check it against the library.
Historical Note — The systematic study of the "solution space" of a homogeneous linear system — what we call the null space — goes back to the 19th-century work on linear systems and determinants by Cauchy, Sylvester, and others; James Joseph Sylvester coined the word "matrix" in 1850 [verify]. The crisp modern framing of four fundamental subspaces and their dimension relations, however, is largely owed to Gilbert Strang, whose textbook and MIT lectures made the four-subspaces diagram the organizing image of an entire generation's linear-algebra education from the 1980s onward [verify]. The terms kernel and image for the abstract versions come from the algebraic tradition (homomorphisms of groups and modules) and were imported into linear algebra as the subject fused with abstract algebra in the 20th century.
13.11 What should you carry forward from this chapter?
We turned the central equation $A\mathbf{x} = \mathbf{b}$ into two clean geometric questions and answered both with subspaces. The column space $C(A)$ — the span of the columns, equivalently the set of all reachable outputs $A\mathbf{x}$ — decides solvability: the system is consistent exactly when $\mathbf{b} \in C(A)$. Its basis is the pivot columns of the original $A$, and its dimension is the rank. The null space $N(A)$ — the solution set of $A\mathbf{x} = \mathbf{0}$, equivalently the directions the matrix collapses — decides uniqueness: its basis is the special solutions (one per free variable), and the complete solution of any consistent system is a single particular solution plus all of $N(A)$. Both are genuine subspaces, both come from one row reduction, and both are halves of Strang's four-fundamental-subspaces picture.
Three threads run straight out of here. Chapter 14 completes the diagram with the row space and left null space and proves the rank–nullity theorem — the conservation law $\operatorname{rank}(A) + \dim N(A) = n$ that we kept seeing in the bookkeeping ($2 + 2 = 4$). Chapter 15 makes "dimension" and "basis" fully rigorous, proving that every basis of a subspace has the same size (so "$\dim C(A)$" is well-defined). And Chapter 17 turns the column space into a workhorse: when $\mathbf{b}$ is not reachable — the usual case for noisy data — least squares finds the point of $C(A)$ closest to $\mathbf{b}$, which is linear regression. The reachable/unreachable distinction you learned here is exactly what makes that problem meaningful.
This chapter advanced two of the book's recurring themes in a way worth naming explicitly. Geometry and algebra are two views of one object: the column space is at once "the span of the columns" (algebra) and "the flat the transformation reaches" (geometry); the null space is at once "the solution set of $A\mathbf{x}=\mathbf{0}$" (algebra) and "the directions crushed to zero" (geometry); and a single quantity, the rank, is simultaneously a count of pivots, the dimension of an image, and a statement about how much information a transformation preserves. The four fundamental subspaces are the organizing framework for all of linear algebra: you have now built the first two, and every theorem ahead — orthogonal projection (Chapter 19), the spectral theorem (Chapter 27), the SVD (Chapter 30) — will locate itself relative to this diagram. When the SVD finally reveals that every matrix is a rotation, a scaling, and another rotation, the scaling factors will turn out to be the dimensions of these very subspaces made quantitative.
Keep the funnel picture in your head: a matrix folds its input space so that one whole flat — the null space — pinches to a point, and spreads everything else across a lower-dimensional flat — the column space — inside the output space. Reach and loss, two subspaces, one transformation. That single image is the organizing idea of all of Part III, and you will see every later topic, from orthogonality to the SVD, hang from it. Find a small matrix, row-reduce it once, and read off both bases the way this chapter did — until the two subspaces are as automatic to you as the row reduction that reveals them. That fluency is the foundation the rest of the part is built on.