> Learning paths. Math majors — read everything, especially the rank-nullity proof in §14.9 and the row-rank-equals-column-rank argument in §14.5, plus the two Math-Major Sidebars (the rank factorization in §14.5 and the coordinate-free proof in...
Prerequisites
- chapter-13-column-space-and-null-space
Learning Objectives
- Define the row space C(A^T) and the left null space N(A^T) and find a basis for each directly from a matrix's row-reduced form.
- State that rank equals the dimension of the column space and the dimension of the row space, and justify why row rank always equals column rank.
- Give the dimensions of all four fundamental subspaces in terms of m, n, and r, and place them in Strang's complete diagram.
- State the rank-nullity theorem with its conditions and reconstruct a motivated proof that counts pivot columns against free columns.
- Apply rank-nullity to predict the size of a solution set and the number of degrees of freedom in a system.
- Recognize the two orthogonality relationships (row space perp null space, column space perp left null space) as a preview of Part IV.
- Compute rank from RREF by counting pivots and verify rank-nullity numerically against np.linalg.matrix_rank.
In This Chapter
- 14.1 What did Chapter 13 leave unfinished?
- 14.2 What is the row space, and where does it live?
- 14.3 Why do row operations leave the row space unchanged?
- 14.4 How do I find a basis for the column space from the RREF?
- 14.5 Why does row rank equal column rank?
- 14.6 What is the left null space, and how do the four subspaces fit together?
- 14.7 What do the four subspaces look like for a tall matrix?
- 14.8 What does the rank-nullity theorem actually say?
- 14.9 PROOF: the rank-nullity theorem
- 14.10 How are the four subspaces related by right angles?
- 14.11 What does rank tell me about solving $A\mathbf{x} = \mathbf{b}$?
- 14.12 Build your own toolkit: rank and the rank-nullity check
- 14.13 Summary: one number, four subspaces, one balance
Row Space, Left Null Space, and the Rank-Nullity Theorem
Learning paths. Math majors — read everything, especially the rank-nullity proof in §14.9 and the row-rank-equals-column-rank argument in §14.5, plus the two Math-Major Sidebars (the rank factorization in §14.5 and the coordinate-free proof in §14.9). CS / Data Science — focus on the Geometric Intuition callouts, the four-subspaces diagram in §14.6, the numpy verifications, and the degrees-of-freedom reading of nullity; the deepest proofs are optional but the dimension count is essential. Physics / Engineering — focus on the geometry of the four subspaces as orthogonal flats, the conservation-law flavor of rank-nullity, and the truss application. This chapter assumes the column space $C(A)$ and null space $N(A)$ from Chapter 13 and the row reduction of Chapter 4.
14.1 What did Chapter 13 leave unfinished?
In Chapter 13 you met the first two of Gilbert Strang's four fundamental subspaces, and they answered the question that secretly drives all of applied linear algebra. Given a matrix $A$ and the equation $A\mathbf{x} = \mathbf{b}$, the column space $C(A)$ told you which right-hand sides are reachable — exactly the $\mathbf{b}$'s for which a solution exists — and the null space $N(A)$ told you what gets destroyed, the inputs $A$ crushes to zero, which is why solutions, when they exist, arrive in infinite families. Two subspaces, two halves of one story: what a transformation reaches, and what it loses.
But the picture was lopsided. The column space lives in the output space $\mathbb{R}^m$ and is built from the columns of $A$; the null space lives in the input space $\mathbb{R}^n$ and is built from the rows of $A$ (each row gives one equation $\text{row} \cdot \mathbf{x} = 0$). We used the columns to make one subspace and the rows to make another — but we never asked what the rows span on their own, nor what the columns send to zero. There are two more subspaces hiding in plain sight, and finding them completes a diagram so powerful that Strang built an entire pedagogy of linear algebra around it.
Geometric Intuition — Every matrix $A$ comes with two spaces it can confuse: its $n$-dimensional input space $\mathbb{R}^n$ and its $m$-dimensional output space $\mathbb{R}^m$. In each of those spaces, $A$ carves out one subspace it cares about and one subspace it ignores. In the output space, $C(A)$ is what it reaches and $N(A^{\mathsf{T}})$ is what it can never reach; in the input space, $C(A^{\mathsf{T}})$ is what it acts on faithfully and $N(A)$ is what it annihilates. Four subspaces, two in each space — and a single number, the rank, controls all four dimensions at once.
This chapter does three things. First, it names the two missing subspaces — the row space $C(A^{\mathsf{T}})$ and the left null space $N(A^{\mathsf{T}})$ — and shows you how to read a basis for each off the same row reduction you already do. Second, it proves the two great counting facts that tie everything together: that row rank equals column rank (so "rank" is a single unambiguous number), and the rank-nullity theorem, which says the dimension of what $A$ reaches plus the dimension of what $A$ destroys always equals $n$, the dimension you started with. Third, it previews the geometry that makes Part IV inevitable: these four subspaces are not scattered at random angles — they meet at perfect right angles, in two orthogonal pairs.
Keep one small matrix in front of you for the whole chapter. We will use $$A = \begin{bmatrix} 1 & 2 & 1 & 3 \\ 2 & 4 & 0 & 4 \\ 3 & 6 & 1 & 7 \end{bmatrix},$$ a $3 \times 4$ matrix (so $m = 3$, $n = 4$). It is small enough to reduce by hand and rich enough that all four subspaces are nontrivial. Watch each subspace appear in its row-reduced form; by §14.6 you will read off all four bases from a single picture.
14.2 What is the row space, and where does it live?
We built the column space in Chapter 13 by spanning the columns of $A$. The most natural next move is the mirror image: span the rows. That is the entire definition.
The row space of an $m \times n$ matrix $A$ is the span of its rows — the set of all linear combinations of the row vectors of $A$. Each row of $A$ has $n$ entries, so the rows are vectors in $\mathbb{R}^n$, and the row space is a subspace of $\mathbb{R}^n$, the input space. We write it $C(A^{\mathsf{T}})$, read "the column space of $A$-transpose," and the notation is not a trick: transposing $A$ turns its rows into columns, so the columns of $A^{\mathsf{T}}$ are the rows of $A$, and the span of those columns is by definition $C(A^{\mathsf{T}})$. The row space of $A$ is literally the column space of $A^{\mathsf{T}}$ — which is why we get it for free from everything we proved about column spaces in Chapter 13, just applied to the transpose.
The Key Insight — The row space $C(A^{\mathsf{T}})$ is the span of the rows of $A$, living in the input space $\mathbb{R}^n$. Naming it $C(A^{\mathsf{T}})$ rather than inventing a new symbol is deliberate: the row space of $A$ is the column space of $A^{\mathsf{T}}$, so every theorem about column spaces transfers instantly. We do not need a separate theory of rows — we already have it.
Why should you care about the span of the rows? Because of a beautiful fact you will prove in spirit throughout this chapter: row operations do not change the row space. When you row reduce $A$, every new row is a linear combination of old rows, and every old row can be recovered from the new ones — so the span never moves. That means the row space of $A$ equals the row space of its reduced form, and the nonzero rows of the RREF form a basis for it. The row space is the one subspace you can read off the RREF with literally no extra work.
Let's see it on our running matrix. We reduce $A$ to reduced row echelon form (the procedure of Chapter 4); doing the elimination by hand, subtract $2\times$ row 1 from row 2 and $3 \times$ row 1 from row 3, then clean up, and you arrive at $$\operatorname{RREF}(A) = \begin{bmatrix} 1 & 2 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$ There are two nonzero rows, $(1, 2, 0, 2)$ and $(0, 0, 1, 1)$. Those two vectors are a basis for the row space $C(A^{\mathsf{T}})$, which therefore has dimension $2$ and lives inside $\mathbb{R}^4$. The third row reduced to all zeros, which is the algebra telling you that the third row of $A$ was redundant — and indeed $(3, 6, 1, 7) = (1, 2, 1, 3) + (2, 4, 0, 4)$, the sum of the first two rows. The row space "is" a 2-dimensional flat (a plane through the origin) sitting in 4-dimensional space.
Common Pitfall — "A basis for the row space is the pivot rows of the original matrix $A$." Be careful: a basis for the row space is the nonzero rows of the RREF (or any echelon form), not the corresponding rows of the original $A$. The RREF rows span the same space (row operations preserve it), and they are cleaner. By contrast — and this is the asymmetry that trips everyone — a basis for the column space must come from the pivot columns of the original $A$, never the columns of the RREF, because column operations are not what we performed and the RREF's columns span a different space. Rows of the RREF for the row space; pivot columns of $A$ for the column space. We unpack the column side in §14.4.
# The row space C(A^T) is the span of the rows of A; its basis is the nonzero RREF rows.
import numpy as np
import sympy as sp
A = np.array([[1, 2, 1, 3],
[2, 4, 0, 4],
[3, 6, 1, 7]], dtype=float)
rref, pivots = sp.Matrix(A).rref() # exact reduced row echelon form
print("RREF =\n", np.array(rref, dtype=float))
print("pivot columns (0-indexed):", pivots) # (0, 2)
print("rank = # nonzero rows =", len(pivots)) # 2
# The nonzero RREF rows are a basis for the row space:
print("row-space basis:\n", np.array(rref, dtype=float)[:len(pivots)])
The output reproduces the RREF above, reports the pivot columns as (0, 2) (numpy/sympy index from $0$, so these are the 1st and 3rd columns in the math convention — the indexing offset we flagged in Chapter 6), and confirms the rank is $2$. The first two RREF rows, [1, 2, 0, 2] and [0, 0, 1, 1], are exactly the row-space basis we found by hand.
14.3 Why do row operations leave the row space unchanged?
This deserves a moment of care, because the claim "the nonzero RREF rows are a basis for the row space" rests entirely on it, and the argument is a small jewel. The three elementary row operations are: swap two rows, scale a row by a nonzero constant, and add a multiple of one row to another. We need each to preserve the span of the rows.
Swapping two rows obviously does nothing to their span — span does not care about order. Scaling row $i$ by $c \neq 0$ replaces $\mathbf{r}_i$ with $c\mathbf{r}_i$; any combination using $\mathbf{r}_i$ can use $\tfrac{1}{c}(c\mathbf{r}_i)$ instead, and vice versa, so the set of reachable combinations is unchanged. The interesting one is adding a multiple: replace $\mathbf{r}_i$ with $\mathbf{r}_i + c\mathbf{r}_j$. The new rows are all combinations of the old ($\mathbf{r}_i + c\mathbf{r}_j$ is plainly a combination), so the new row space sits inside the old one; and the operation is reversible (subtract $c\mathbf{r}_j$ to recover $\mathbf{r}_i$), so by the same reasoning the old row space sits inside the new. Two subspaces each containing the other are equal.
The Key Insight — Each elementary row operation produces new rows that are linear combinations of the old rows, and is reversible. "New rows are old combinations" forces new-span $\subseteq$ old-span; reversibility forces old-span $\subseteq$ new-span. Equal containment both ways means the row space is invariant under row reduction. This is exactly why the RREF — the cleanest representative of $A$'s row-equivalence class — carries the row space unchanged.
This invariance is the deep reason row reduction "works" for so many questions at once. The column space, by contrast, is not preserved by row operations — row reducing genuinely moves the columns to new positions in $\mathbb{R}^m$ — which is the precise source of the asymmetry in the last pitfall. Hold onto that distinction; it is the single most common source of errors when finding bases for the four subspaces.
It is worth seeing the asymmetry concretely, because the abstract argument can feel slippery until the numbers make it undeniable. The row space of $A$ and the row space of its RREF are the same plane in $\mathbb{R}^4$ — every row of $A$ is a combination of the two RREF rows (for instance $A$'s first row $(1,2,1,3) = (1,2,0,2) + (0,0,1,1)$, the sum of the two basis rows). But the column spaces are genuinely different subspaces of $\mathbb{R}^3$. The RREF's pivot columns are $(1,0,0)$ and $(0,1,0)$, which span the $xy$-plane (every vector with third coordinate $0$); yet $A$'s pivot column $(1,2,3)$ has third coordinate $3 \neq 0$, so it does not even lie in the RREF's column space. Row reduction slid the columns out of $A$'s column space entirely.
# Row space is preserved by reduction; column space is NOT.
import numpy as np
import sympy as sp
A = np.array([[1, 2, 1, 3],
[2, 4, 0, 4],
[3, 6, 1, 7]], dtype=float)
R = np.array(sp.Matrix(A).rref()[0], dtype=float)
# Row space preserved: A's first row is a combination of the two nonzero RREF rows.
print("A row1 =", A[0]) # [1. 2. 1. 3.]
print("RREF row1 + row2 =", R[0] + R[1]) # [1. 2. 1. 3.] -> same row space
# Column space changed: A's pivot column has a nonzero 3rd entry; RREF's columns do not.
print("A pivot column =", A[:, 0], " third entry =", A[2, 0]) # third entry 3 (nonzero)
print("RREF pivot cols =", R[:, 0], R[:, 2], "(third entry 0)") # (1,0,0),(0,1,0): xy-plane
The output shows R[0] + R[1] reproducing $A$'s first row exactly (row space unchanged), while $A$'s pivot column $(1,2,3)$ has a third entry of $3$ that no combination of the RREF's columns $(1,0,0)$ and $(0,1,0)$ can ever produce (column space changed). This is precisely why a basis for $C(A)$ must use the original columns of $A$, but a basis for $C(A^{\mathsf{T}})$ may use the RREF rows. Same reduction, opposite rules — engrave it.
14.4 How do I find a basis for the column space from the RREF?
We need the column space too, both to complete the dimension count and because it is half of the rank story. Chapter 13 defined $C(A)$ as the span of the columns of $A$, a subspace of the output space $\mathbb{R}^m$. The practical question is: which columns form a basis? The answer is one of the cleanest in the subject, and it explains why the rank you see in the row space is the same number you see in the column space.
Row reduce $A$ and note which columns contain pivots. The pivot columns of the original matrix $A$ — the columns in the positions where the RREF has its leading 1's — form a basis for the column space $C(A)$. For our matrix, the RREF has pivots in columns 1 and 3, so the first and third columns of $A$ itself, $$\mathbf{a}_1 = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \qquad \mathbf{a}_3 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix},$$ are a basis for $C(A)$. The column space is therefore 2-dimensional, a plane through the origin inside $\mathbb{R}^3$.
Why the original columns and not the RREF's? Because row operations change the column space, but they preserve all linear dependence relations among the columns. Here is the mechanism: a dependence $c_1\mathbf{a}_1 + \dots + c_n\mathbf{a}_n = \mathbf{0}$ is exactly a solution of $A\mathbf{x} = \mathbf{0}$, and row operations never change the solution set of $A\mathbf{x} = \mathbf{0}$ (Chapter 4). So whichever columns are dependent in the RREF are dependent in $A$ in precisely the same way. In the RREF the pivot columns are visibly independent (each has a $1$ where the others have $0$) and the free columns are visibly combinations of them; transporting those exact relations back to $A$ tells you the original pivot columns are independent and the original free columns are redundant.
Common Pitfall — Mixing up the two "which columns" rules. For the column space $C(A)$: take the pivot columns of $A$ (the originals). For the null space $N(A)$: the free (non-pivot) columns are the ones that give you free variables, and you build null-space basis vectors by setting each free variable to $1$ in turn. Same RREF, two different readings: pivot columns name a basis for $C(A)$; free columns count the dimension of $N(A)$. Confusing "use the original columns" (for $C(A)$) with "use the RREF rows" (for $C(A^{\mathsf{T}})$) is the classic four-subspaces mistake.
# A basis for the column space C(A): the PIVOT columns of the ORIGINAL A.
import numpy as np
import sympy as sp
A = np.array([[1, 2, 1, 3],
[2, 4, 0, 4],
[3, 6, 1, 7]], dtype=float)
_, pivots = sp.Matrix(A).rref()
print("pivot columns (0-indexed):", pivots) # (0, 2)
col_basis = A[:, list(pivots)] # original columns 1 and 3
print("column-space basis (cols of A):\n", col_basis)
print("dim C(A) =", len(pivots), " = rank") # 2
print("matrix_rank check:", np.linalg.matrix_rank(A)) # 2 (agrees)
The pivot columns are (0, 2), so columns 1 and 3 of $A$ — [1,2,3] and [1,0,1] — are the column-space basis, and $\dim C(A) = 2$. Notice the punchline already peeking through: the row space had dimension $2$ and the column space has dimension $2$. The same number, two different spaces. That coincidence is no coincidence, and we prove it next.
14.5 Why does row rank equal column rank?
We have defined two numbers that both deserve the name "rank." The column rank of $A$ is $\dim C(A)$, the number of independent columns. The row rank is $\dim C(A^{\mathsf{T}})$, the number of independent rows. For our matrix both came out $2$, and that was not luck. The single most important structural theorem of this chapter is that row rank always equals column rank — so there is just one number, which we call the rank $r$ of $A$ and write $\operatorname{rank}(A)$.
Geometric Intuition — It is genuinely surprising that the number of independent rows must equal the number of independent columns. The rows live in $\mathbb{R}^n$ and the columns live in $\mathbb{R}^m$ — different spaces, often of different sizes (here $\mathbb{R}^4$ and $\mathbb{R}^3$). Why should counting independent directions among the rows give the same answer as counting them among the columns? The deep reason is that both counts equal the number of pivots, and a matrix has exactly one set of pivot positions. The pivots are the meeting point of the two stories.
The pivot-counting argument
The cleanest justification routes both ranks through the pivots of the RREF. Reduce $A$ to its RREF. The row rank equals the number of nonzero rows, because (from §14.3) the row space is unchanged by reduction and the nonzero RREF rows are an obvious basis for it — they are independent (each has a leading $1$ in a column where all other rows are $0$, so no combination of the others can reproduce it) and they span the row space. The number of nonzero rows is exactly the number of pivots.
Now the column rank. From §14.4, the pivot columns of $A$ form a basis for $C(A)$: they are independent (their RREF images each have a leading $1$ in a fresh position, so no dependence relation among them can hold, and dependence relations transfer between $A$ and its RREF), and every non-pivot column is a combination of earlier pivot columns (the RREF displays exactly this, and the relations transfer back). So the column rank also equals the number of pivots. Two quantities each equal to "the number of pivots" are equal to each other.
The Key Insight — Both the row rank and the column rank count the pivots of $A$. A matrix has one RREF, hence one set of pivot positions, hence one pivot count — so row rank $=$ pivot count $=$ column rank. This is why we may speak of the rank $\operatorname{rank}(A) = r$, a single number that is simultaneously the dimension of the column space and the dimension of the row space.
There is a slicker, coordinate-free proof that physicists and math majors often prefer, and it is worth seeing because it makes the symmetry between $A$ and $A^{\mathsf{T}}$ manifest. Suppose the column rank is $r$, so $C(A)$ has a basis $\mathbf{c}_1, \dots, \mathbf{c}_r$ (vectors in $\mathbb{R}^m$). Every column of $A$ is a combination of these, which means we can write $A = CR$, where $C$ is the $m \times r$ matrix whose columns are the $\mathbf{c}_i$ and $R$ is some $r \times n$ matrix of coefficients. But reading $A = CR$ by rows, every row of $A$ is a linear combination of the $r$ rows of $R$ — so the rows of $A$ are spanned by only $r$ vectors, forcing the row rank to be at most $r$. That shows row rank $\le$ column rank. Apply the identical argument to $A^{\mathsf{T}}$ (whose rows are $A$'s columns) to get column rank $\le$ row rank. Both inequalities together force equality.
Math-Major Sidebar (optional) — The factorization $A = CR$ in the second proof is the rank factorization (sometimes the "$CR$ factorization"), and it is more than a proof device: it is the statement that any rank-$r$ matrix is a product of an $m \times r$ and an $r \times n$ matrix, i.e. a sum of $r$ rank-one matrices. That decomposition is the conceptual seed of the singular value decomposition (Chapter 30) and of low-rank approximation (Chapter 31), where compressing an image means replacing $A$ by a nearby matrix of small rank — few independent columns, hence few independent rows, hence little information. Rank, in this light, is a measure of information content: the number of genuinely independent directions a matrix encodes, identical whether you read it across or down.
# Row rank == column rank: both equal the pivot count, for any shape.
import numpy as np
A = np.array([[1, 2, 1, 3],
[2, 4, 0, 4],
[3, 6, 1, 7]], dtype=float)
print("rank(A) =", np.linalg.matrix_rank(A)) # 2 (column rank)
print("rank(A^T) =", np.linalg.matrix_rank(A.T)) # 2 (row rank) -- same number
# A taller, wider, random check that it is never a coincidence:
rng = np.random.default_rng(0)
for shape in [(3, 5), (5, 3), (4, 4), (2, 7)]:
M = rng.integers(-3, 4, size=shape).astype(float)
assert np.linalg.matrix_rank(M) == np.linalg.matrix_rank(M.T)
print("row rank == column rank held for every test shape.")
matrix_rank(A) and matrix_rank(A.T) both return $2$, and the loop confirms the equality across tall, wide, and square random matrices. Internally numpy computes rank from the singular value decomposition rather than from pivots, so this is an independent confirmation: the rank is a genuine invariant of the matrix, not an artifact of how you choose to count.
14.6 What is the left null space, and how do the four subspaces fit together?
One subspace remains. We built the null space $N(A)$ in Chapter 13 as the solutions of $A\mathbf{x} = \mathbf{0}$ — the inputs $A$ sends to zero. The mirror image asks the same question of the transpose: which vectors does $A^{\mathsf{T}}$ send to zero? That set is the left null space $N(A^{\mathsf{T}})$, the solutions of $A^{\mathsf{T}}\mathbf{y} = \mathbf{0}$, a subspace of the output space $\mathbb{R}^m$.
The name "left" comes from transposing the defining equation. The condition $A^{\mathsf{T}}\mathbf{y} = \mathbf{0}$ is equivalent, by taking transposes, to $\mathbf{y}^{\mathsf{T}}A = \mathbf{0}^{\mathsf{T}}$ — a row vector $\mathbf{y}^{\mathsf{T}}$ multiplying $A$ from the left and producing the zero row. So the left null space collects the row vectors that, applied on the left, annihilate $A$. Concretely and memorably: $\mathbf{y}$ is in $N(A^{\mathsf{T}})$ exactly when the combination $\mathbf{y}^{\mathsf{T}}A$ of the rows of $A$ — weighted by the entries of $\mathbf{y}$ — is zero. The left null space records the dependence relations among the rows.
Geometric Intuition — The left null space is the "wasted directions" of the output space, the mirror of how the null space is the wasted directions of the input space. A vector $\mathbf{y}$ in $N(A^{\mathsf{T}})$ tells you that some combination of $A$'s rows collapses to nothing — equivalently, that $\mathbf{y}$ points in a direction of $\mathbb{R}^m$ that the columns of $A$ can never reach. (We make that "never reach" precise as orthogonality in §14.10.) For our $3 \times 4$ matrix, the third row was the sum of the first two; that single redundancy is the entire left null space.
Finding a basis for $N(A^{\mathsf{T}})$ by hand: spot the dependence among the rows. Our RREF had one zero row, signaling one row dependence. We already noted $\mathbf{r}_3 = \mathbf{r}_1 + \mathbf{r}_2$, i.e. $\mathbf{r}_1 + \mathbf{r}_2 - \mathbf{r}_3 = \mathbf{0}$, so the weight vector $\mathbf{y} = (1, 1, -1)$ satisfies $\mathbf{y}^{\mathsf{T}}A = \mathbf{0}$. That single vector is a basis for the left null space, which therefore has dimension $1$, a line through the origin in $\mathbb{R}^3$.
For larger matrices, eyeballing the row dependence is not reliable, so here is the systematic method, and it is a small piece of cleverness worth knowing. Augment $A$ with the $m \times m$ identity, forming $[A \mid I]$, and row reduce the whole augmented array. The row operations that zero out a row of $A$ are recorded, on the right, as exactly the combination of original rows that produced that zero — because the identity started as a perfect ledger of "row $i$ is $1$ times row $i$." When a row of the $A$-side becomes zero, the corresponding row on the $I$-side is a left-null vector $\mathbf{y}^{\mathsf{T}}$. Concretely, reducing $[A \mid I]$ for our matrix turns the third row of the $A$-side to zeros while the third row of the $I$-side becomes $(1, 1, -1)$ — the same $\mathbf{y}$ we found by inspection, now produced mechanically. The bottom $m - r$ rows of the right-hand block, where the left-hand block has gone to zero, are a basis for $N(A^{\mathsf{T}})$. (The plainer alternative — just solve $A^{\mathsf{T}}\mathbf{y} = \mathbf{0}$ by the null-space recipe of Chapter 13 applied to $A^{\mathsf{T}}$ — gives the identical answer; we verify both in code below.)
Now assemble the complete picture. With $m = 3$, $n = 4$, and $r = 2$, all four dimensions are determined:
| Subspace | Symbol | Lives in | Dimension | Our example |
|---|---|---|---|---|
| Column space | $C(A)$ | $\mathbb{R}^m$ (output) | $r$ | $2$ |
| Left null space | $N(A^{\mathsf{T}})$ | $\mathbb{R}^m$ (output) | $m - r$ | $3 - 2 = 1$ |
| Row space | $C(A^{\mathsf{T}})$ | $\mathbb{R}^n$ (input) | $r$ | $2$ |
| Null space | $N(A)$ | $\mathbb{R}^n$ (input) | $n - r$ | $4 - 2 = 2$ |
This is Strang's four-subspaces picture, the organizing diagram of the entire subject. Read it as two stacked pairs. In the input space $\mathbb{R}^n$, the row space (dimension $r$) and the null space (dimension $n - r$) sit together, and their dimensions add to $n$ — that sum is the rank-nullity theorem, coming in §14.8. In the output space $\mathbb{R}^m$, the column space (dimension $r$) and the left null space (dimension $m - r$) sit together, and their dimensions add to $m$. The rank $r$ appears twice, once in each space, because it is simultaneously the row rank and the column rank (§14.5).
Here is the canonical diagram, with $A$ acting as the map $\mathbb{R}^n \to \mathbb{R}^m$:
INPUT SPACE R^n OUTPUT SPACE R^m
(dimension n = 4 in example) (dimension m = 3 in example)
┌───────────────────────────┐ ┌───────────────────────────┐
│ ROW SPACE C(A^T) │ ──A──▶ │ COLUMN SPACE C(A) │
│ dim = r (= 2) │ │ dim = r (= 2) │
│ (rows of A span this) │ │ (columns of A span this) │
├──────────── ⟂ ────────────┤ ├──────────── ⟂ ────────────┤
│ NULL SPACE N(A) │ ──A──▶ │ { 0 } │
│ dim = n − r (= 2) │ │ (everything here is │
│ Ax = 0 (crushed to 0) │ │ crushed to the origin) │
└───────────────────────────┘ │ │
│ LEFT NULL SPACE N(A^T) │
r + (n − r) = n │ dim = m − r (= 1) │
(rank–nullity) │ A^T y = 0 (unreachable) │
└───────────────────────────┘
r + (m − r) = m
Orthogonality (Part IV preview):
ROW SPACE C(A^T) ⟂ NULL SPACE N(A) inside R^n
COLUMN SPACE C(A) ⟂ LEFT NULL SPACE N(A^T) inside R^m
Figure 14.1 — Strang's four fundamental subspaces. The map $A$ sends the input space $\mathbb{R}^n$ to the output space $\mathbb{R}^m$. The row space (dimension $r$) maps onto the column space (also dimension $r$); the null space (dimension $n-r$) maps to the single point $\mathbf{0}$; and the left null space (dimension $m-r$) is the part of the output space the columns of $A$ cannot reach. Within each space the two subspaces are orthogonal complements. Alt-text: two boxes, the left labeled input space R-n split into row space and null space, the right labeled output space R-m split into column space, the origin, and left null space, with arrows showing A mapping row space onto column space and null space to zero, and right-angle marks indicating orthogonality between the paired subspaces.
The Key Insight — The four fundamental subspaces are not four unrelated facts; they are one balanced accounting. The rank $r$ is the dimension of "what $A$ does faithfully" (row space mapping onto column space). In the input space, the rest — $n - r$ dimensions — is destroyed (the null space). In the output space, the rest — $m - r$ dimensions — is never reached (the left null space). Knowing $m$, $n$, and the single number $r$ fixes all four dimensions. Everything in Part III and most of the rest of the book is bookkeeping on this diagram.
# All four fundamental subspaces of A, with their dimensions.
import numpy as np
from scipy.linalg import null_space
A = np.array([[1, 2, 1, 3],
[2, 4, 0, 4],
[3, 6, 1, 7]], dtype=float)
m, n = A.shape
r = np.linalg.matrix_rank(A)
print(f"m = {m}, n = {n}, rank r = {r}")
print("dim C(A) (column, in R^m) =", r) # 2
print("dim N(A^T) (left null, R^m) =", m - r) # 1
print("dim C(A^T) (row, in R^n) =", r) # 2
print("dim N(A) (null, in R^n) =", n - r) # 2
print("left-null-space basis (cols span N(A^T)):\n",
np.round(null_space(A.T), 4)) # a multiple of (1, 1, -1)
The dimensions print as $2, 1, 2, 2$ exactly as the table predicts, and null_space(A.T) returns a single column — a scalar multiple of $(1, 1, -1)$, normalized to unit length as $\approx(-0.577, -0.577, 0.577) = -\tfrac{1}{\sqrt{3}}(1,1,-1)$. The direction is what matters, and it is the row-dependence vector we found by hand. One left-null direction, because there was exactly one redundant row.
Real-World Application — The four subspaces of an electrical network (CS / electrical engineering). Strang's favorite illustration is the incidence matrix of a graph (a circuit): rows are edges, columns are nodes, and each row has a $-1$ at the edge's tail and a $+1$ at its head. For a connected graph with $n$ nodes the rank is $n - 1$, and the four subspaces acquire physical names. The null space $N(A)$ is the constant vector $(1, 1, \dots, 1)$ — adding the same voltage to every node changes no edge voltage, which is why circuits need a chosen ground. The left null space $N(A^{\mathsf{T}})$ is the most striking: its basis vectors are the independent loops of the graph, and $\mathbf{y}^{\mathsf{T}}A = \mathbf{0}$ is exactly Kirchhoff's current law (currents around any loop sum to zero). A triangle of edges $1\!\to\!2$, $2\!\to\!3$, $1\!\to\!3$ gives the loop vector $\mathbf{y} = (1, 1, -1)$ — go around the triangle and you return to your starting potential. The row space and column space, meanwhile, encode the potential differences and the realizable currents. The same rank-nullity that counts solutions of $A\mathbf{x}=\mathbf{b}$ here counts the loops of a network as $(\text{edges}) - (\text{nodes}) + 1$ — Euler's formula in disguise. This is the four-subspaces picture made tangible, and it generalizes to data-science graphs, road networks, and the structural trusses of this chapter's second case study.
Historical Note — The phrase "four fundamental subspaces" and the diagram that organizes them are due to Gilbert Strang, whose Introduction to Linear Algebra and MIT 18.06 lectures made the picture the spine of how a generation learned the subject [verify]. The underlying mathematics — rank, the rank-nullity relation, and the duality between a map and its transpose — is older and was developed through the nineteenth and early twentieth centuries (the rank concept traces to Frobenius and the determinant theory of the 1870s [verify]). Strang's contribution was pedagogical and unifying: insisting that the four spaces together, with their dimensions and orthogonality, are the right first object of study, not an afterthought to solving systems.
14.7 What do the four subspaces look like for a tall matrix?
Our running matrix is wide ($m = 3 < n = 4$), and wide matrices have a fat null space and a thin left null space. To make sure the dimension formulas are not an accident of that one shape, let's run the whole machinery on a tall matrix, where the roles flip. This is not busywork: the contrast is exactly the difference between an underdetermined and an overdetermined system, and the tall case is the one that drives linear regression in Chapter 17.
Take $$B = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix},$$ a $4 \times 2$ matrix, so now $m = 4$ and $n = 2$. (This is the design matrix for fitting a straight line to four data points: the first column is the intercept, the second is the $x$-coordinates $0, 1, 2, 3$. Keep that in your back pocket for §14.10.) Its two columns are obviously independent — neither is a multiple of the other — so the rank is $r = 2$, the full column rank. Row reducing confirms it: $B$ reduces to $\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}$, with pivots in both columns and two zero rows below.
Now read off all four dimensions from $m = 4$, $n = 2$, $r = 2$:
- Column space $C(B)$: dimension $r = 2$, living in $\mathbb{R}^4$. The two columns span a 2D plane inside 4-space — but most of $\mathbb{R}^4$ is not reachable.
- Null space $N(B)$: dimension $n - r = 2 - 2 = 0$. Full column rank means the only solution of $B\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$. There are no free variables, so nothing is destroyed — the map is one-to-one.
- Row space $C(B^{\mathsf{T}})$: dimension $r = 2$, living in $\mathbb{R}^2$. Two independent rows among the four; in fact the row space is all of $\mathbb{R}^2$ (a 2D space cannot hold a 2D subspace except itself).
- Left null space $N(B^{\mathsf{T}})$: dimension $m - r = 4 - 2 = 2$, living in $\mathbb{R}^4$. This is where the action moved. Two independent dependence relations among the four rows mean a 2D space of directions in $\mathbb{R}^4$ that the columns cannot reach.
Notice how the wide and tall cases mirror each other. The wide matrix $A$ had a 2D null space and a 1D left null space; the tall matrix $B$ has a 0D null space and a 2D left null space. In both, the rank is $2$ and appears twice. The shape decides where the "wasted" dimensions pile up: a wide matrix wastes input directions (big null space, underdetermined system), a tall matrix wastes output directions (big left null space, overdetermined system). The single number $r$ and the two shape numbers $m, n$ tell the whole story.
# The four subspaces of a TALL matrix B (4x2, full column rank): the mirror case.
import numpy as np
from scipy.linalg import null_space
B = np.array([[1, 0],
[1, 1],
[1, 2],
[1, 3]], dtype=float)
m, n = B.shape
r = np.linalg.matrix_rank(B)
print(f"m = {m}, n = {n}, rank r = {r}") # 4 2 2
print("dim C(B) (column, R^4) =", r) # 2
print("dim N(B) (null, R^2) =", n - r) # 0 <- full column rank: unique solutions
print("dim C(B^T) (row, R^2) =", r) # 2
print("dim N(B^T) (left null, R^4)=", m - r) # 2 <- big left null space (overdetermined)
print("N(B) trivial?", null_space(B).shape[1] == 0) # True (only x = 0)
print("left-null dimension =", null_space(B.T).shape[1]) # 2
The output is $r = 2$, with $\dim N(B) = 0$ and $\dim N(B^{\mathsf{T}}) = 2$ — the exact reversal of the wide case. A basis for that 2D left null space is, for instance, $(1, -2, 1, 0)$ and $(1, -1, -1, 1)$; you can check both satisfy $\mathbf{y}^{\mathsf{T}}B = \mathbf{0}$ (the first gives $1 - 2 + 1 = 0$ on the ones-column and $-2 + 2 = 0$ on the $x$-column). These two relations are what make the four-point line-fit overdetermined: there are more equations (4 rows) than the line has parameters (2), so generic data cannot be hit exactly, and the left null space measures that surplus. Chapter 17 turns this surplus into the least-squares story.
Check Your Understanding — A matrix $C$ is $5 \times 3$ with rank $3$. Give the dimensions of all four fundamental subspaces, and say which (if any) is the zero subspace.
Answer
With $m = 5$, $n = 3$, $r = 3$: $\dim C(C) = r = 3$ (column space in $\mathbb{R}^5$); $\dim N(C) = n - r = 3 - 3 = 0$ (the null space is the zero subspace — full column rank, so $C\mathbf{x}=\mathbf{0}$ has only the trivial solution); $\dim C(C^{\mathsf{T}}) = r = 3$ (row space in $\mathbb{R}^3$, which is therefore all of $\mathbb{R}^3$); $\dim N(C^{\mathsf{T}}) = m - r = 5 - 3 = 2$ (left null space in $\mathbb{R}^5$). The big left null space ($2$ dimensions of unreachable output) marks this as an overdetermined, tall system — exactly the regression setup.
14.8 What does the rank-nullity theorem actually say?
Look again at the input-space column of the table: $\dim C(A^{\mathsf{T}}) = r$ and $\dim N(A) = n - r$. Add them and you get $r + (n - r) = n$. That is the rank-nullity theorem, and stated baldly like that it looks almost too obvious to name — of course $r + (n - r) = n$. But the content is not the arithmetic; the content is why the two dimensions are $r$ and $n - r$ in the first place. The theorem is the promise that the dimension of the image and the dimension of the kernel are forced to be complementary — that every dimension of the input space is accounted for as either used faithfully or destroyed, with no overlap and nothing left over.
Before the proof, fix the vocabulary. The rank of $A$ is $r = \dim C(A)$, the dimension of the image — what the map reaches. The nullity of $A$ is $\dim N(A)$, the dimension of the kernel — what the map destroys. With those names the theorem reads:
For any $m \times n$ matrix $A$ (entries in $\mathbb{R}$, or any field), $$\operatorname{rank}(A) + \dim N(A) = n,$$ i.e. rank plus nullity equals the number of columns — equivalently, the dimension of the input space.
Notice the condition that matters: the sum is $n$, the number of columns (the input dimension), not $m$, and not the rank of anything. Students who half-remember the theorem often write "rank + nullity = $m$" or attach it to the wrong dimension; the right-hand side is always the dimension of the space the map acts on, which for $A : \mathbb{R}^n \to \mathbb{R}^m$ is $n$. There is a parallel statement in the output space, $\operatorname{rank}(A) + \dim N(A^{\mathsf{T}}) = m$, which is just rank-nullity applied to $A^{\mathsf{T}}$; we will note it at the end.
Check Your Understanding — A linear map $T : \mathbb{R}^7 \to \mathbb{R}^4$ is given by a $4 \times 7$ matrix $A$. You are told the system $A\mathbf{x} = \mathbf{b}$ is solvable for every $\mathbf{b} \in \mathbb{R}^4$. What is $\operatorname{rank}(A)$, and how many free parameters does each solution have?
Answer
"Solvable for every $\mathbf{b}$" means the column space is all of $\mathbb{R}^4$, so the rank equals the output dimension: $\operatorname{rank}(A) = 4$ (full row rank). Here $m = 4$, $n = 7$, $r = 4$. By rank-nullity, $\dim N(A) = n - r = 7 - 4 = 3$, so every solution carries 3 free parameters — the solution set is a 3-dimensional flat in $\mathbb{R}^7$. (Check the right-hand side is $n = 7$, the number of columns, not $m$.) This is the generic wide-and-surjective situation: always solvable, never uniquely.
Geometric Intuition — Picture $A$ acting on $\mathbb{R}^n$. The transformation can do only two things to each independent input direction: send it somewhere nonzero (contributing to the image) or crush it to the origin (contributing to the kernel). It cannot do both, and it must do one. Rank-nullity is a conservation law for dimensions: the $n$ input dimensions split cleanly into $r$ that survive and $n - r$ that vanish. Squash three-space onto a plane and you reach a 2D image while crushing a 1D line; $2 + 1 = 3$. The dimensions are conserved.
Real-World Application — Degrees of freedom in a sensor network (engineering / data science). Suppose $m$ sensors each report one linear measurement of an underlying state with $n$ parameters, stacked as $A\mathbf{x} = \mathbf{b}$. The rank $r$ is the number of independent measurements you actually have; the nullity $n - r$ is the number of state parameters you cannot determine no matter how you process the data — the unobservable directions. Rank-nullity says these add to $n$: every parameter is either pinned down (rank) or free (nullity). Adding a sensor whose measurement is a combination of existing ones raises $m$ but not $r$, so it shrinks neither the unobservable space nor your uncertainty. This is the linear-algebra core of observability in control theory and of degrees of freedom in statistics, where the count of independent constraints determines how much of a model the data can actually identify.
14.9 PROOF: the rank-nullity theorem
Now we earn the theorem. This is the central proof of Part III, and the version below is the "counting" proof that stays closest to the row reduction you already do by hand — every step is something you can see in the RREF. (A more abstract proof, suitable for general linear maps between abstract vector spaces, is sketched in the sidebar; it belongs to Chapter 35.)
Why we care. Rank-nullity is the load-bearing wall of the four-subspaces picture: it is what forces the dimensions to balance, and it converts the qualitative idea "a transformation reaches some things and destroys others" into an exact equation. Practically, it lets you predict the dimension of a solution set, the number of free parameters in an underdetermined system, and the unobservable degrees of freedom in a model — without solving anything, just by knowing $n$ and the rank.
Key idea. Row reduce $A$. Every column is either a pivot column or a free column, and there is no third option. The pivot columns generate the image (their count is the rank); the free columns generate the kernel (each free variable gives one independent null-space vector). Since every one of the $n$ columns is counted exactly once as pivot-or-free, $r + (n - r) = n$ falls out of the bookkeeping.
Theorem (Rank-Nullity). Let $A$ be an $m \times n$ matrix with entries in a field (the reals will do). Let $r = \operatorname{rank}(A)$ be the number of pivots in its reduced row echelon form. Then $$\operatorname{rank}(A) + \dim N(A) = n.$$
Proof. Reduce $A$ to its reduced row echelon form $R$. By Chapter 4, row reduction does not change the solution set of the homogeneous system, so $N(A) = N(R)$; and from §14.4–§14.5, $r = \operatorname{rank}(A)$ equals the number of pivot columns of $R$. We will show that $\dim N(A) = \dim N(R)$ equals the number of free (non-pivot) columns, which is $n - r$; the theorem follows by adding.
Partition the $n$ column indices into the pivot columns (there are $r$ of them, one per pivot) and the free columns (there are $n - r$ of them). In the system $R\mathbf{x} = \mathbf{0}$, the variables attached to pivot columns are the pivot variables and those attached to free columns are the free variables. Because $R$ is in reduced row echelon form, each nonzero equation solves one pivot variable explicitly in terms of the free variables: $$x_{\text{pivot}} = -\sum_{\text{free } j} (\text{entry of } R)\, x_j .$$ So once the free variables are chosen, the pivot variables — and hence the whole solution $\mathbf{x}$ — are completely determined. The free variables are exactly the parameters of the solution set: choose them freely, and a unique solution follows.
Now construct an explicit basis for $N(A)$. For each free column $j$, build a vector $\mathbf{n}_j$ by setting that free variable to $1$, all other free variables to $0$, and solving for the pivot variables via the equations above. This produces $n - r$ vectors, one per free column. Two claims finish the proof.
These $n - r$ vectors are linearly independent. Look at the free-variable coordinates. The vector $\mathbf{n}_j$ has a $1$ in free-coordinate $j$ and $0$ in every other free-coordinate (by construction). So if a combination $\sum_j c_j \mathbf{n}_j = \mathbf{0}$, then reading off free-coordinate $j$ gives $c_j = 0$ for every $j$. The only combination reaching $\mathbf{0}$ is the trivial one — independence. (This is the same "a $1$ where the others have $0$" trick that made the RREF rows independent in §14.5; the standard-basis pattern in the free coordinates forces independence.)
These $n - r$ vectors span $N(A)$. Take any solution $\mathbf{x} \in N(A)$, and let its free-variable values be $t_1, \dots, t_{n-r}$. Form $\mathbf{w} = \sum_j t_j \mathbf{n}_j$. Then $\mathbf{w}$ is a solution (a combination of solutions is a solution, since $N(A)$ is a subspace — Chapter 13), and $\mathbf{w}$ has the same free-variable values $t_1, \dots, t_{n-r}$ as $\mathbf{x}$ (because each $\mathbf{n}_j$ contributes $t_j$ to free-coordinate $j$ and nothing to the others). But a solution of $R\mathbf{x} = \mathbf{0}$ is uniquely determined by its free-variable values — the pivot variables are forced. Two solutions with identical free coordinates are identical, so $\mathbf{x} = \mathbf{w} = \sum_j t_j \mathbf{n}_j$. Every null-space vector is a combination of the $\mathbf{n}_j$; they span.
The $n - r$ vectors $\mathbf{n}_j$ are independent and span $N(A)$, so they are a basis and $\dim N(A) = n - r$. Therefore $$\operatorname{rank}(A) + \dim N(A) = r + (n - r) = n. \qquad \blacksquare$$
What this means. The proof does more than assert the equation — it constructs the kernel basis, one vector per free variable, which is precisely the recipe Chapter 13 gave for finding $N(A)$. So rank-nullity and "set each free variable to $1$ in turn" are the same fact: each free column is one degree of freedom in the solution set, each pivot column is one dimension of genuine output, and the columns split with no overlap. Geometrically, the input space $\mathbb{R}^n$ decomposes — every direction is either faithfully represented in the image or annihilated into the kernel — and the dimensions of the two pieces must sum to $n$ because together they exhaust the columns.
It is worth pausing on why this counts as a real theorem rather than a tautology, because the distinction sharpens your sense of what linear algebra is doing. The equation $r + (n - r) = n$ is trivially true for any number $r$ you might name. The non-trivial content — the part that required a proof — is that the rank (a property of the column space, defined over in the output space $\mathbb{R}^m$) and the nullity (a property of the kernel, defined back in the input space $\mathbb{R}^n$) are the same $r$ and its complement $n - r$. Two quantities defined in different spaces, by different constructions, turn out to be perfectly complementary slices of the input dimension. That is a structural fact about linear maps, and it fails badly for nonlinear maps: a nonlinear function can crush a high-dimensional set to a point in some places and stretch it in others, with no clean "dimensions in equals dimensions out" accounting. Linearity is exactly what makes the dimensions conserve, and rank-nullity is the precise statement of that conservation. It is the linear-algebra analogue of a conservation law in physics — and like those laws, it lets you deduce a hidden quantity (the nullity) from a measured one (the rank) without ever computing it directly.
Let's confirm the construction on the running matrix, where $r = 2$ and we expect $\dim N(A) = 4 - 2 = 2$. From the RREF $\begin{bmatrix} 1 & 2 & 0 & 2 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix}$, the pivot columns are 1 and 3, so the free variables are $x_2$ and $x_4$. The equations are $x_1 = -2x_2 - 2x_4$ and $x_3 = -x_4$. Setting $(x_2, x_4) = (1, 0)$ gives $\mathbf{n}_1 = (-2, 1, 0, 0)$; setting $(x_2, x_4) = (0, 1)$ gives $\mathbf{n}_2 = (-2, 0, -1, 1)$. Two independent null-space vectors, matching $\dim N(A) = 2$.
# Rank-nullity, verified numerically: r + dim N(A) = n.
import numpy as np
from scipy.linalg import null_space
A = np.array([[1, 2, 1, 3],
[2, 4, 0, 4],
[3, 6, 1, 7]], dtype=float)
m, n = A.shape
r = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1] # dimension of N(A) = # null-space basis vectors
print("rank r =", r) # 2
print("nullity =", nullity) # 2
print("r + nullity =", r + nullity, " n =", n) # 4 4
print("rank-nullity holds:", r + nullity == n) # True
# Confirm the hand-built null basis really lands in N(A):
n1 = np.array([-2, 1, 0, 0.]); n2 = np.array([-2, 0, -1, 1.])
print("A @ n1 =", A @ n1, " A @ n2 =", A @ n2) # both [0 0 0]
The code prints rank $2$, nullity $2$, and $2 + 2 = 4 = n$, so rank-nullity holds; and $A\mathbf{n}_1 = A\mathbf{n}_2 = \mathbf{0}$ confirms our hand-built vectors are genuinely in the null space. The numbers match the theorem exactly.
Math-Major Sidebar (optional) — The counting proof leans on row reduction, which is special to matrices over a field. The abstract rank-nullity theorem is a statement about any linear map $T : V \to W$ between vector spaces with $V$ finite-dimensional: $\dim(\operatorname{im} T) + \dim(\ker T) = \dim V$. The slick proof takes a basis $\mathbf{u}_1, \dots, \mathbf{u}_k$ of the kernel, extends it to a basis $\mathbf{u}_1, \dots, \mathbf{u}_k, \mathbf{v}_1, \dots, \mathbf{v}_p$ of all of $V$ (possible by the basis-extension lemma you will prove in Chapter 15), and shows the images $T\mathbf{v}_1, \dots, T\mathbf{v}_p$ form a basis of $\operatorname{im} T$. Then $\dim V = k + p = \dim(\ker T) + \dim(\operatorname{im} T)$. This version needs no coordinates, no matrix, and no field-specific row reduction — it is the form you will meet again in Chapter 35 for abstract linear transformations, where "rank" becomes $\dim(\operatorname{im} T)$ and "nullity" becomes $\dim(\ker T)$.
14.10 How are the four subspaces related by right angles?
We have the dimensions; now the geometry that makes them fit. The four subspaces are not arranged at arbitrary angles within their spaces — they come in two orthogonal pairs, and that orthogonality is the bridge into Part IV. We will only preview it here (the dot product and orthogonality get their proper treatment starting in Chapter 18), but the two relationships are simple to state and to check, and they explain the right-angle marks in Figure 14.1.
Recall the dot product $\mathbf{u} \cdot \mathbf{v} = u_1 v_1 + \dots + u_n v_n$; two vectors are orthogonal (perpendicular) when their dot product is zero. The two relationships are:
- The row space is orthogonal to the null space (both inside the input space $\mathbb{R}^n$): every vector in $C(A^{\mathsf{T}})$ is perpendicular to every vector in $N(A)$.
- The column space is orthogonal to the left null space (both inside the output space $\mathbb{R}^m$): every vector in $C(A)$ is perpendicular to every vector in $N(A^{\mathsf{T}})$.
The Key Insight — In each space, the two fundamental subspaces are orthogonal complements: they meet only at the origin, they are perpendicular, and their dimensions add up to fill the whole space ($r + (n-r) = n$ in the input space; $r + (m-r) = m$ in the output space). The input space $\mathbb{R}^n$ splits into the row space and the null space at a right angle; the output space $\mathbb{R}^m$ splits into the column space and the left null space at a right angle. This is why the dimensions in rank-nullity must add up perfectly — orthogonal complements tile their space with nothing missing and nothing overlapping.
The first relationship is almost immediate from the definition of $N(A)$, and the argument is worth seeing because it reveals matrix multiplication as a list of dot products. A vector $\mathbf{x}$ is in $N(A)$ exactly when $A\mathbf{x} = \mathbf{0}$. But each entry of $A\mathbf{x}$ is the dot product of a row of $A$ with $\mathbf{x}$: $$A\mathbf{x} = \begin{bmatrix} \mathbf{r}_1 \cdot \mathbf{x} \\ \mathbf{r}_2 \cdot \mathbf{x} \\ \vdots \\ \mathbf{r}_m \cdot \mathbf{x} \end{bmatrix} = \mathbf{0} \quad\Longleftrightarrow\quad \mathbf{r}_i \cdot \mathbf{x} = 0 \text{ for every row } \mathbf{r}_i.$$ So $\mathbf{x} \in N(A)$ means $\mathbf{x}$ is orthogonal to every row of $A$. And if $\mathbf{x}$ is orthogonal to each row, it is orthogonal to every linear combination of the rows — that is, to all of $C(A^{\mathsf{T}})$. The null space is precisely the set of vectors perpendicular to the row space. The second relationship is the identical statement applied to $A^{\mathsf{T}}$: $N(A^{\mathsf{T}})$ is the set of vectors orthogonal to every row of $A^{\mathsf{T}}$ — i.e. to every column of $A$ — so it is perpendicular to $C(A)$.
# Orthogonality preview: row space ⟂ null space, column space ⟂ left null space.
import numpy as np
A = np.array([[1, 2, 1, 3],
[2, 4, 0, 4],
[3, 6, 1, 7]], dtype=float)
# Row-space basis (nonzero RREF rows) and null-space basis (from §14.9):
row_basis = [np.array([1, 2, 0, 2.]), np.array([0, 0, 1, 1.])]
null_basis = [np.array([-2, 1, 0, 0.]), np.array([-2, 0, -1, 1.])]
print("row space . null space dot products:")
print([[round(float(rw @ nv), 10) for nv in null_basis] for rw in row_basis]) # all 0
# Column-space basis (pivot cols of A) and left-null basis:
col_basis = [A[:, 0], A[:, 2]]
leftnull = [np.array([1, 1, -1.])]
print("column space . left null dot products:")
print([[round(float(cv @ yv), 10) for yv in leftnull] for cv in col_basis]) # all 0
Every dot product prints as 0.0: the row-space basis vectors are perpendicular to both null-space basis vectors, and the column-space basis vectors are perpendicular to the left-null vector. The four subspaces really do meet at right angles, exactly as Figure 14.1 marks them.
Geometric Intuition — Here is the payoff that Part IV will cash in. Because the row space and null space are orthogonal complements filling $\mathbb{R}^n$, any input vector $\mathbf{x}$ splits uniquely into a row-space part plus a null-space part, $\mathbf{x} = \mathbf{x}_{\text{row}} + \mathbf{x}_{\text{null}}$, at a right angle. When $A$ acts, the null-space part vanishes and only the row-space part survives to produce the output — and the row-space part maps one-to-one onto the column space. The transformation, stripped of the directions it destroys, is a clean invertible map from the row space to the column space. That decomposition is the seed of orthogonal projection (Chapter 19), least squares, and ultimately the singular value decomposition (Chapter 30), where the right angles become the singular vectors.
14.11 What does rank tell me about solving $A\mathbf{x} = \mathbf{b}$?
The four subspaces and rank-nullity are not abstractions for their own sake — they give immediate answers to the questions Chapter 13 opened, often before you compute a single solution. Let $A$ be $m \times n$ with rank $r$. Reading the diagram:
- The system $A\mathbf{x} = \mathbf{b}$ has a solution exactly when $\mathbf{b} \in C(A)$ — when $\mathbf{b}$ lies in the column space (Chapter 13). The column space has dimension $r$ inside $\mathbb{R}^m$; if $r < m$, most right-hand sides $\mathbf{b}$ are unreachable (they have a nonzero component in the left null space). If $r = m$ (full row rank), the columns span all of $\mathbb{R}^m$ and every $\mathbf{b}$ is reachable — the system is always solvable.
- When a solution exists, the solution set has dimension $n - r$ — it is a shifted copy of the null space, $\mathbf{x}_{\text{particular}} + N(A)$ (Chapter 13). So the number of free parameters in the answer is exactly the nullity $n - r$. If $r = n$ (full column rank), the nullity is $0$ and the solution, when it exists, is unique. If $r < n$, solutions come in an $(n-r)$-dimensional family.
- A unique solution for every $\mathbf{b}$ requires both full row rank and full column rank, $r = m = n$ — a square, invertible matrix (Chapter 9). The four-subspaces picture collapses to its simplest case: the two null spaces are just $\{\mathbf{0}\}$, and the row space and column space are all of $\mathbb{R}^n$.
This is rank-nullity earning its keep as a diagnostic. You can classify a linear system — solvable or not, unique or infinitely many — from two numbers, the shape $m \times n$ and the rank $r$, with the solution set's dimension handed to you as $n - r$.
Watch it work on our running matrix $A$ (which has $m = 3$, $n = 4$, $r = 2$). Because $r = 2 < 3 = m$, the columns do not span all of $\mathbb{R}^3$, so only some right-hand sides are reachable — precisely those orthogonal to the left null vector $\mathbf{y} = (1, 1, -1)$. Test two candidates. Take $\mathbf{b}_1 = (3, 4, 7)$, which is $2\mathbf{a}_1 + \mathbf{a}_3$ (twice the first column plus the third), hence built from columns and certainly in $C(A)$; indeed $\mathbf{y} \cdot \mathbf{b}_1 = 3 + 4 - 7 = 0$, so it is reachable. Now take $\mathbf{b}_2 = (1, 0, 0)$: here $\mathbf{y} \cdot \mathbf{b}_2 = 1 \neq 0$, so $\mathbf{b}_2$ has a nonzero left-null component and is unreachable — no $\mathbf{x}$ solves $A\mathbf{x} = \mathbf{b}_2$. And when $\mathbf{b}_1$ is solvable, rank-nullity tells you the answer is not unique: the nullity is $n - r = 2$, so the solutions form a 2-dimensional family $\mathbf{x}_{\text{particular}} + N(A)$, a plane of solutions floating in $\mathbb{R}^4$.
# Solvability of Ax = b is decided by C(A); unreachable b has a left-null component.
import numpy as np
A = np.array([[1, 2, 1, 3],
[2, 4, 0, 4],
[3, 6, 1, 7]], dtype=float)
y = np.array([1, 1, -1.]) # spans the left null space N(A^T)
b1 = 2 * A[:, 0] + A[:, 2] # = (3, 4, 7), built from columns -> in C(A)
b2 = np.array([1, 0, 0.]) # a generic vector -> probably NOT in C(A)
print("y . b1 =", y @ b1, "-> b1 reachable" if np.isclose(y @ b1, 0) else "") # 0.0
print("y . b2 =", y @ b2, "-> b2 UNreachable" if not np.isclose(y @ b2, 0) else "") # 1.0
# The rank test agrees: b is reachable iff rank[A|b] == rank A.
print("rank A =", np.linalg.matrix_rank(A)) # 2
print("rank [A | b1] =", np.linalg.matrix_rank(np.column_stack([A, b1]))) # 2 -> solvable
print("rank [A | b2] =", np.linalg.matrix_rank(np.column_stack([A, b2]))) # 3 -> NO solution
The output confirms the geometry: $\mathbf{y} \cdot \mathbf{b}_1 = 0$ and augmenting with $\mathbf{b}_1$ keeps the rank at $2$, so the system is solvable; $\mathbf{y} \cdot \mathbf{b}_2 = 1$ and augmenting with $\mathbf{b}_2$ raises the rank to $3$, the classic signal of an inconsistent system. The augmented-rank test and the "is $\mathbf{b}$ orthogonal to the left null space?" test are the same fact viewed two ways — solvability lives entirely in the four-subspaces picture. (When $\mathbf{b}$ is not reachable, the best you can do is the closest reachable vector, its projection onto $C(A)$ — which is the least-squares story of Chapters 17 and 19.)
Common Pitfall — "More equations than unknowns means no solution; more unknowns than equations means infinitely many." Shape alone does not decide it — rank does. A tall system ($m > n$) can still be solvable (if $\mathbf{b}$ happens to lie in the column space), and a wide system ($n > m$) is not automatically solvable for every $\mathbf{b}$ unless it has full row rank. What is true: if $n > m$ then $r \le m < n$, so the nullity $n - r \ge n - m > 0$ is positive — a wide system, when solvable, always has infinitely many solutions. But "when solvable" still depends on $\mathbf{b} \in C(A)$. Count rank, not just rows and columns.
Real-World Application — Rank as data compression (data science). A data matrix's rank is its true information content: an $m \times n$ matrix of rank $r$ can be stored as a product of an $m \times r$ and an $r \times n$ matrix (the rank factorization of §14.5), costing $r(m + n)$ numbers instead of $mn$. When $r$ is small, that is enormous savings — the heart of low-rank approximation and the reason a photo, a recommendation table, or a term-document matrix can be compressed with almost no loss when its rows (or columns) are nearly dependent. The same idea drives dimensionality reduction: keep only the few independent directions that carry the variance and discard the rest as redundant. Rank measures how much of a dataset is genuinely there, and the four-subspaces picture is the geometry of that measurement — we develop it fully in Chapters 31 and 32.
14.12 Build your own toolkit: rank and the rank-nullity check
You have now seen rank from two angles — as a count of pivots in the RREF, and as the dimension of the column (or row) space. The cleanest computational handle is the pivot count, which your row-reduction routine from Chapter 4 already produces. This chapter's toolkit contribution turns that routine toward measuring rank and verifying rank-nullity numerically.
Build Your Toolkit — Add
rank(A)totoolkit/linear_systems.py(the module that holds yourrow_reducefrom Chapter 4). It takes a matrix (a list of rows) and returns the number of pivots — implemented from scratch, no numpy in the body. The recipe: (1) run yourrow_reduceto reach reduced row echelon form; (2) count the rows that are not entirely zero (equivalently, count the pivot positions) — that count is the rank. Then write a small checkercheck_rank_nullity(A)that returns the triple $(r,\ \dim N(A),\ n)$ by computing $r$ with yourrank, taking $n$ as the number of columns, and reporting $\dim N(A) = n - r$; assert that $r + (n - r)$ equals $n$. ```pythontoolkit/linear_systems.py (sketch — reuse your Chapter 4 row_reduce)
def rank(A): """Number of pivots = dimension of the column space = dimension of the row space.""" R = row_reduce(A) # your Chapter-4 RREF routine # count rows whose entries are not all (near-)zero -> that is the pivot count ...
def check_rank_nullity(A): """Return (rank, nullity, n) and confirm rank + nullity == n.""" r = rank(A) n = len(A[0]) # number of columns return r, n - r, n # nullity = n - r by rank-nullity
`` **Verify against numpy:** for any matrix,rank(A)should equalnp.linalg.matrix_rank(np.array(A)), andcheck_rank_nullity(A)should satisfyr + nullity == nwhilenullitymatchesnp.array(A).shape[1] - np.linalg.matrix_rank(np.array(A))`. Test on our $3\times 4$ example (expect $r=2$, $\dim N(A)=2$, $n=4$), a full-rank square matrix (nullity $0$), a zero matrix (rank $0$, nullity $n$), and a tall matrix with dependent columns.Computational Note — As in Chapter 6, comparing floats to exactly zero is unsafe: round-off can leave a tiny $10^{-16}$ where a true zero belongs, so a "zero row" may not look exactly zero. Treat any entry with absolute value below a small tolerance (say
1e-9) as zero when deciding whether a row is a pivot row. This mirrorsnp.linalg.matrix_rank, which does not count pivots at all — it computes the singular values (Chapter 30) and counts how many exceed a tolerance scaled by the largest singular value and the matrix size. For exact-integer test matrices the two agree perfectly; for messy floating-point data, prefer numpy's SVD-based rank and treat "rank deficiency" as a matter of degree. Chapter 38 (numerical linear algebra) takes near-rank-deficiency seriously via the condition number.
14.13 Summary: one number, four subspaces, one balance
This chapter completed the framework that organizes all of linear algebra. You met the two remaining fundamental subspaces: the row space $C(A^{\mathsf{T}})$, the span of the rows, living in the input space $\mathbb{R}^n$ with a basis given by the nonzero RREF rows; and the left null space $N(A^{\mathsf{T}})$, the solutions of $A^{\mathsf{T}}\mathbf{y} = \mathbf{0}$, living in the output space $\mathbb{R}^m$ and recording the dependence relations among the rows. With Chapter 13's column space and null space, that is Strang's complete set of four.
You proved the two great counting facts. Row rank equals column rank, because both count the pivots of $A$, so there is a single well-defined number $\operatorname{rank}(A) = r$ that is simultaneously the dimension of the column space and the dimension of the row space. And the rank-nullity theorem, $\operatorname{rank}(A) + \dim N(A) = n$, which forces the input space to split exactly into $r$ faithfully-used dimensions and $n - r$ destroyed dimensions — a conservation law for dimension, proved by counting pivot columns against free columns. The mirror statement $\operatorname{rank}(A) + \dim N(A^{\mathsf{T}}) = m$ governs the output space. Knowing only $m$, $n$, and $r$ fixes all four dimensions: $r$, $n - r$, $r$, $m - r$.
The book's recurring themes ran straight through. The four fundamental subspaces are the organizing framework for all of linear algebra — you now hold the entire diagram, and nearly every later topic attaches to it. Geometry and algebra are two views of one object: rank is at once an arithmetic pivot count, a geometric image-dimension, and a data-science measure of information content; the null space is at once a solution set and a perpendicular complement. And computation validates theory: every dimension we claimed was confirmed by np.linalg.matrix_rank and scipy.linalg.null_space, and you built rank from scratch to check rank-nullity yourself.
The chapter closed on the geometry that makes Part IV inevitable. The four subspaces meet at right angles, in two orthogonal-complement pairs — the row space perpendicular to the null space, the column space perpendicular to the left null space — which is why the dimensions tile their spaces perfectly. We checked the perpendicularity numerically, but we have not yet built the tools that make "perpendicular" rigorous and useful. That is the whole of Part IV: Orthogonality — dot products and angles (Chapter 18), the orthogonal projection that finds the closest point in a subspace (Chapter 19), the Gram-Schmidt process and QR factorization (Chapter 20), and orthogonal matrices and rotations (Chapter 21). The right angles you glimpsed in Figure 14.1 are about to become the most powerful computational tool in the book. Before that, Chapter 15 makes "dimension" and "basis" fully rigorous — proving that every basis of a subspace has the same size, which is the fact we quietly assumed every time we counted dimensions in this chapter. Keep the four-subspaces diagram in view; you will be drawing it for the rest of the book.