Gaussian Elimination and Row Reduction: The Algorithm That Solves Everything

DataField.Dev

45 min read

> Learning paths. Math majors — read everything; pay special attention to why each row operation preserves the solution set (the proof sketch in §4.2) and to the rank conditions in §4.7. CS / Data Science — focus on the algorithm itself, the numpy...

Prerequisites

chapter-03-systems-of-linear-equations

Learning Objectives

Carry out the three elementary row operations (swap, scale, add a multiple) on an augmented matrix and explain why none of them changes the solution set.
Run forward elimination to reduce a system to row echelon form, then back-substitute to find the solution by hand.
Reduce a matrix all the way to reduced row echelon form (RREF) and read the solution directly off it.
Identify pivot columns and free variables, and use them to classify a system as having a unique solution, infinitely many, or none.
Write the complete solution set of a consistent system with free variables in parametric form.
Verify hand computations against numpy (np.linalg.solve) and the from-scratch solvers, minding 1-indexed math vs 0-indexed code.

In This Chapter

4.1 What problem does Gaussian elimination actually solve?
4.2 Why can we "do operations to equations" without changing the answer?
4.3 What does a row operation look like as a transformation?
4.4 How do you actually solve a system by hand? (Forward elimination and back-substitution)
4.5 What is reduced row echelon form, and why go further than echelon?
4.6 What are pivots and free variables, and how do they reveal the solution set?
4.7 How does elimination tell you a system has no solution?
4.8 Why is elimination so efficient, and where does it strain?
4.9 What does the from-scratch algorithm look like in code?
4.10 Putting it together: the algorithm, start to finish

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Gaussian Elimination and Row Reduction: The Algorithm That Solves Everything

Learning paths. Math majors — read everything; pay special attention to why each row operation preserves the solution set (the proof sketch in §4.2) and to the rank conditions in §4.7. CS / Data Science — focus on the algorithm itself, the numpy verifications, and the "Build Your Toolkit" implementation; this is the chapter whose code you will reuse all term. Physics / Engineering — focus on the geometry of each row operation, the worked examples, and the two case studies (chemistry, traffic). In this chapter everyone learns the same core procedure: the workhorse algorithm that turns the pictures of Chapter 3 into reliable answers.

4.1 What problem does Gaussian elimination actually solve?

In Chapter 3 you learned what a system of linear equations means. Two equations in two unknowns are two lines in a plane; three equations in three unknowns are three planes in space. The solution set is wherever they all meet — a single point, a whole line, an empty intersection. You learned to see the answer. What you did not yet learn is how to reliably find it. Staring at three tilted planes in your mind's eye and announcing "they meet at $(1, 2, 3)$" is not an algorithm; it is a magic trick, and it stops working the moment the numbers get ugly or the system grows to ten unknowns.

This chapter fixes that. Gaussian elimination is a finite, mechanical procedure — an algorithm — that takes any system of linear equations and grinds it down to a form where the answer is obvious. It never guesses. It always terminates. It works for $2$ unknowns or $2{,}000$, by hand or on a supercomputer, and it tells you not just the answer but which kind of answer you have: a unique solution, infinitely many, or none at all. It is, without exaggeration, one of the most useful algorithms ever written down, and you are about to own it.

To appreciate why an algorithm matters, picture how you might solve a system without one. With two equations in two unknowns, you can isolate a variable in one equation and substitute it into the other — the "substitution method" from algebra class. It works, but it is fiddly, and it scales horribly: with five equations in five unknowns you would be substituting expressions into expressions into expressions, drowning in algebra and arithmetic slips, with no guarantee you are making progress. Worse, substitution gives you no clean way to recognize when a system has no solution or infinitely many — you just get stuck or get a strange-looking identity and have to puzzle out what it means. What we need is a procedure that is systematic (the same steps every time, in a predictable order), complete (always reaches an answer or a definitive verdict), and scalable (the work grows in a controlled way, not explosively). Gaussian elimination is exactly that procedure, and the substitution method turns out to be a disorganized special case of it.

Before the machinery, a word on what this chapter is and is not. It is tempting — and a classic mistake — to walk away from a chapter like this believing that row reduction is what linear algebra is about. It is not. Row reduction is a tool: a way to solve $A\mathbf{x} = \mathbf{b}$ and, just as importantly, to reveal the structure of a matrix (which columns are independent, what the solution set looks like, what the matrix can and cannot reach). The real subject of this book is linear transformations — what matrices do to space — and Gaussian elimination is one of our sharpest instruments for studying them. Keep that framing. We are learning to use a power tool, not worshipping the tool. By the end of the book, when you compute an inverse, an $LU$ factorization, a determinant, a rank, or a basis for a subspace, you will find this same elimination quietly doing the work underneath — which is why it is worth mastering completely now, once, by hand.

The Key Insight — Gaussian elimination replaces a hard system with a chain of progressively easier systems that all have the exact same solution set, ending in one so simple you can read the answer off by eye. Every step is reversible, so no solutions are gained and none are lost along the way.

Here is the entire plan of the chapter in one breath. We will write a system as a compact grid called an augmented matrix (§4.2). We will define three elementary row operations that simplify that grid without changing what it solves, and we will check each one against the Chapter 3 geometry (§4.2–4.3). We will use those operations to drive the system into row echelon form — a staircase shape — by forward elimination, then solve it by back-substitution (§4.4). We will push further to reduced row echelon form (RREF), the unique simplest form, where the answer is laid bare (§4.5). And finally we will read the three possible outcomes — unique, infinite, inconsistent — directly off the reduced form, using the language of pivots and free variables (§4.6–4.7). By the end you will implement the whole thing from scratch in Python and watch it agree with numpy to the last decimal.

Let's begin, as always, with a picture.

4.2 Why can we "do operations to equations" without changing the answer?

Picture two lines in the plane — the geometry from Chapter 3. Say

$$\begin{aligned} 2x + y &= 5,\\ x - y &= 1. \end{aligned}$$

Each equation is a line; the solution is the single point where they cross, which happens to be $(2, 1)$ (you can check: $2(2)+1 = 5$ and $2 - 1 = 1$). Now here is the crucial observation that makes the entire algorithm legitimate. Suppose I add the two equations together to get a brand-new equation, $3x = 6$. I have not solved anything yet, but notice: the point $(2, 1)$ satisfies this new equation too, because it satisfied both originals. I can replace one of my equations with this combination, and the crossing point doesn't move. I have changed the equations without changing the answer.

That is the whole idea. We are allowed to manipulate a system in any way that keeps its solution set intact, and there turn out to be exactly three such moves we ever need. Before naming them, let's get rid of the clutter. Writing $x$, $y$, and $=$ over and over is wasted ink; all the information lives in the coefficients. So we strip the system down to a grid of numbers called the augmented matrix — the coefficient matrix $A$ with the right-hand side $\mathbf{b}$ tacked on as an extra column, separated by a bar:

$$\left[\begin{array}{cc|c} 2 & 1 & 5 \\ 1 & -1 & 1 \end{array}\right].$$

Each row is one equation; each column (left of the bar) is one variable; the column right of the bar is the constants. Operating on equations becomes operating on rows, and the three legal moves are the elementary row operations.

The three elementary row operations. 1. Swap two rows ($R_i \leftrightarrow R_j$). Reordering the equations cannot change where they all meet. 2. Scale a row by a nonzero constant ($R_i \leftarrow c\,R_i$, $c \neq 0$). Multiplying an equation through by $3$ — turning $x - y = 1$ into $3x - 3y = 3$ — describes the same line. 3. Add a multiple of one row to another ($R_i \leftarrow R_i + c\,R_j$). This is the combination move from above; it is the engine of elimination.

Each operation is reversible, and reversibility is exactly why the solution set is preserved. If you swap two rows, you can swap them back. If you scale a row by $c \neq 0$, you can scale by $1/c$ to undo it (this is why $c=0$ is forbidden — multiplying an equation by zero destroys it irreversibly, turning a real constraint into the vacuous $0 = 0$). If you add $c\,R_j$ to $R_i$, you can subtract it back. Because every move can be undone, any solution of the new system is a solution of the old and vice versa: nothing is created, nothing is lost.

Math-Major Sidebar (optional) — Here is the preserved-solution claim, proved. Let $S$ be the solution set of the original system and $S'$ the solution set after one elementary operation. Claim: $S = S'$. Key idea: each operation produces equations that are linear combinations of the originals, and the originals are recoverable as linear combinations of the new ones. Proof for the "add a multiple" case: suppose we replace equation $E_i$ (written $\mathbf{a}_i\cdot\mathbf{x} = b_i$) with $E_i + cE_j$, i.e. $(\mathbf{a}_i + c\mathbf{a}_j)\cdot\mathbf{x} = b_i + cb_j$. If $\mathbf{x}\in S$, then $\mathbf{a}_i\cdot\mathbf{x}=b_i$ and $\mathbf{a}_j\cdot\mathbf{x}=b_j$, so adding $c$ times the second to the first gives the new equation — hence $\mathbf{x}\in S'$, so $S\subseteq S'$. Conversely the inverse operation $E_i \leftarrow E_i - cE_j$ recovers the original from the new system, giving $S'\subseteq S$. Therefore $S = S'$. The swap and nonzero-scale cases are identical in spirit (and easier). $\blacksquare$ What this means: the algorithm is not an approximation or a heuristic — it is exact, and the proof is the license that lets us trust every step.

Geometric Intuition — Each row operation is a way of re-describing the same intersection. Swapping rows relabels which plane you list first; scaling stretches the equation but keeps the same plane; adding a multiple of one equation to another tilts one plane through the line where it already met another — pivoting it around their shared intersection. The point (or line, or empty set) where everything meets never budges. We are redrawing the planes into a more convenient orientation without moving where they cross.

Let's run the three operations on our $2\times 2$ example to feel them. Start with the augmented matrix above. Add $-\tfrac{1}{2}R_1$ to $R_2$ to knock out the $x$ in the second equation — actually, the cleaner move here is to first swap so the simplest row is on top, but let's instead use $R_2 \leftarrow R_2 - \tfrac12 R_1$... we will systematize this in a moment. For now just trust that a short sequence of these three moves can turn any system into one you can read. The art is choosing which operations, in what order — and that order has a name: forward elimination.

4.3 What does a row operation look like as a transformation?

You met, back in Chapter 1, the idea that a matrix is a function that transforms space, and you saw the shear — the transformation $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ that slides one layer of the plane past another while preserving area. Chapter 1 promised, almost in passing, that "every elimination step is secretly a shear." Now we can cash that in, because it explains why row operations are so well-behaved.

Performing an elementary row operation on a matrix $M$ is the same as multiplying $M$ on the left by a special matrix, called an elementary matrix, which you get by applying that very operation to the identity $I$. Take the operation "$R_2 \leftarrow R_2 - 2R_1$" on a two-row matrix. Apply it to $I = \begin{bmatrix} 1 & 0 \\ 0 & 1\end{bmatrix}$ and you get

$$E = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}.$$

Left-multiplying any two-row matrix by $E$ performs exactly that operation. Watch it act on the first two rows of our main system from §4.4 below:

# A row operation is left-multiplication by an "elementary matrix".
import numpy as np
E = np.array([[1, 0],
              [-2, 1]])          # built from I by R2 <- R2 - 2 R1
M = np.array([[1, 1, 1, 6],
              [2, 3, 7, 29]])    # two rows of an augmented matrix
print(E @ M)
# [[1 1 1 6]
#  [0 1 5 17]]   <- R2 became R2 - 2*R1, exactly
print(round(float(np.linalg.det(E)), 3))   # 1.0

The product $E M$ is the matrix after the operation, and det(E) is $1$. That determinant is the punchline. The "add a multiple of one row to another" operation is an elementary matrix that is a shear, and shears have determinant $1$ — they preserve area (Chapter 1, Figure 1.4). Swaps are reflections (determinant $-1$, they flip orientation); nonzero scalings are stretches (determinant $c \neq 0$). The single fact that every elementary matrix is invertible (none has determinant zero) is the algebraic shadow of "every row operation is reversible," which is the geometric shadow of "the solution set never changes." Geometry, algebra, and the algorithm are three views of one fact.

Geometric Intuition — Gaussian elimination quietly factors your matrix into a product of shears, scalings, and swaps. That is not a metaphor: it is literally what the LU decomposition of Chapter 10 makes precise, where the multipliers you use during elimination get recorded in a lower-triangular matrix $L$. For now, just hold the picture: each elimination step shears the rows of the augmented matrix, never collapsing space, always undoable.

You do not need elementary matrices to run the algorithm — by hand you just do the row operation directly. But knowing that each step is an invertible transformation is the reason the whole procedure is trustworthy, and it is the bridge to the factorizations we build later. We will not multiply by elementary matrices again in this chapter; we will simply do the operations. But keep the picture in your pocket.

4.4 How do you actually solve a system by hand? (Forward elimination and back-substitution)

Time for the main event: a full $3\times 3$ system, solved start to finish, every single row operation shown. This is the worked example to internalize; everything else in the chapter is variations on it.

Suppose a small workshop builds three products and a bookkeeper has three facts relating the daily counts $x$, $y$, $z$ of each. (The story barely matters — what matters is the procedure — but a system always comes from somewhere, and here it is three measurements.) The system is

$$\begin{aligned} x + \phantom{1}y + \phantom{1}z &= 6,\\ 2x + 3y + 7z &= 29,\\ \phantom{1}x + 3y - 2z &= 1. \end{aligned}$$

Write it as an augmented matrix:

$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 2 & 3 & 7 & 29 \\ 1 & 3 & -2 & 1 \end{array}\right].$$

Forward elimination has one goal: create zeros below the diagonal, one column at a time, left to right, producing a "staircase" of leading entries. The number we use to clear a column is called the pivot — it is the leading nonzero entry in the row we are eliminating with. Our first pivot is the $1$ in the top-left corner.

Why work left to right, and why only clear below? The discipline is what makes the method foolproof. Once we have cleared a column below its pivot, every subsequent operation works on rows further down and columns further right, so it never disturbs the zeros we already created — progress is permanent, never undone. Marching left to right also means each new pivot we reach already has zeros to its left (from the columns we finished earlier), so eliminating with it touches only the entries we still care about. The result is a strict downhill march: column 1 done, then column 2, then column 3, each leaving behind a finished, zeroed-out region we never revisit. That monotone progress is precisely why the algorithm is guaranteed to terminate — there are only finitely many columns, and each pass retires one.

Step 1 — clear the first column below the pivot. The pivot is $a_{11} = 1$. We want zeros where the $2$ and the $1$ sit beneath it. The multiplier for row 2 is "what times the pivot gives the entry to kill," i.e. $2/1 = 2$:

$$R_2 \leftarrow R_2 - 2R_1: \qquad \left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 0 & 1 & 5 & 17 \\ 1 & 3 & -2 & 1 \end{array}\right].$$

(Row 2 became $(2,3,7,29) - 2(1,1,1,6) = (0,1,5,17)$.) Now row 3, with multiplier $1/1 = 1$:

$$R_3 \leftarrow R_3 - 1R_1: \qquad \left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 0 & 1 & 5 & 17 \\ 0 & 2 & -3 & -5 \end{array}\right].$$

The first column is clean: a pivot on top, zeros below. We never touch column 1 again.

Step 2 — clear the second column below its pivot. The new pivot is $a_{22} = 1$ (the leading entry of row 2). We need a zero where the $2$ sits beneath it in row 3. Multiplier $2/1 = 2$:

$$R_3 \leftarrow R_3 - 2R_2: \qquad \left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 0 & 1 & 5 & 17 \\ 0 & 0 & -13 & -39 \end{array}\right].$$

(Row 3 became $(0,2,-3,-5) - 2(0,1,5,17) = (0,0,-13,-39)$.) We are done: the matrix is now in row echelon form — every leading entry sits to the right of the one above it, and everything below each leading entry is zero. The staircase is built. Reading the last row back as an equation, it says $-13z = -39$.

The Key Insight — Forward elimination's only job is to manufacture that triangle of zeros below the diagonal. Once a system is triangular, it unravels from the bottom up: the last equation has one unknown, the next-to-last has two (one already known), and so on. The hard part is making the triangle; solving the triangle is easy.

Back-substitution is that unraveling. Start at the bottom row and work up.

Row 3: $-13z = -39 \;\Rightarrow\; z = 3$.
Row 2: $y + 5z = 17 \;\Rightarrow\; y + 5(3) = 17 \;\Rightarrow\; y = 17 - 15 = 2$.
Row 1: $x + y + z = 6 \;\Rightarrow\; x + 2 + 3 = 6 \;\Rightarrow\; x = 1$.

The unique solution is $(x, y, z) = (1, 2, 3)$. Three planes in space, meeting at one point. Let's confirm every digit with numpy.

# Solve A x = b and verify the hand result (1, 2, 3).
import numpy as np
A = np.array([[1, 1,  1],
              [2, 3,  7],
              [1, 3, -2]], dtype=float)
b = np.array([6, 29, 1], dtype=float)
x = np.linalg.solve(A, b)
print(x)            # [1. 2. 3.]
print(A @ x)        # [ 6. 29.  1.]  -> equals b, so the solution checks out

np.linalg.solve returns [1. 2. 3.], and multiplying $A$ by that solution reproduces $\mathbf{b} = (6, 29, 1)$ exactly. The machine confirms the pencil. (Under the hood, numpy does not run textbook Gaussian elimination — it uses an LU factorization with partial pivoting from the LAPACK library, the industrial-strength cousin of what we just did. We meet LU in Chapter 10. The answer is identical; the bookkeeping is optimized.)

Common Pitfall — Forgetting to pivot, or pivoting on a zero. The algorithm above breezed along because each pivot happened to be nonzero. But suppose elimination lands a $0$ in the pivot position — say the top-left entry is $0$ to begin with. You cannot use it as a pivot, because the multiplier is "entry divided by pivot," and dividing by zero is undefined. The fix is to swap in a lower row that has a nonzero entry in that column (elementary operation 1) and continue. Forgetting this — barreling ahead and dividing by a zero pivot — is the single most common way a hand computation (or a naive program) blows up. We will see in §4.6 that a column with no available nonzero pivot is not an error at all; it signals a free variable. But a zero in the pivot slot when a nonzero sits below it always means: swap first.

Computational Note — Real numerical software doesn't only swap to avoid zero pivots; it swaps to avoid small ones. Dividing by a pivot close to zero amplifies rounding error catastrophically. Partial pivoting — at each step, swap up the row whose entry in the current column has the largest absolute value — is the standard defense, and it is what LAPACK (hence numpy and numerical methods) does. By hand we pivot only to dodge exact zeros and to keep arithmetic tidy; on a computer, pivoting for size is essential for stability. Chapter 38 dwells on this; Chapter 10 builds the PLU factorization that records the swaps.

A second worked example: when you must swap

The pitfall above is easier to respect once you've felt it. So here is a system engineered to fail without a swap, worked in full. Watch the very first pivot position:

$$\begin{aligned} \phantom{1}0x + \phantom{1}y - 2z &= -3,\\ \phantom{1}x + 2y + 0z &= 5,\\ 2x + 2y + 3z &= 14. \end{aligned} \qquad\qquad \left[\begin{array}{ccc|c} 0 & 1 & -2 & -3 \\ 1 & 2 & 0 & 5 \\ 2 & 2 & 3 & 14 \end{array}\right].$$

The top-left entry is $0$. We cannot pivot on it — the multiplier "entry over pivot" would divide by zero. The fix is the swap (elementary operation 1): exchange row 1 with a lower row whose first entry is nonzero. Row 2 has a $1$ there, which is ideal, so swap:

$$R_1 \leftrightarrow R_2: \qquad \left[\begin{array}{ccc|c} 1 & 2 & 0 & 5 \\ 0 & 1 & -2 & -3 \\ 2 & 2 & 3 & 14 \end{array}\right].$$

Now the algorithm proceeds exactly as before. The first column already has a $0$ under the pivot in row 2 (a freebie — no operation needed), so only row 3 needs clearing, with multiplier $2/1 = 2$:

$$R_3 \leftarrow R_3 - 2R_1: \qquad \left[\begin{array}{ccc|c} 1 & 2 & 0 & 5 \\ 0 & 1 & -2 & -3 \\ 0 & -2 & 3 & 4 \end{array}\right].$$

The second pivot is the $1$ in row 2. Clear the $-2$ beneath it; the multiplier is $-2/1 = -2$, so we subtract a negative — i.e. add $2R_2$:

$$R_3 \leftarrow R_3 - (-2)R_2: \qquad \left[\begin{array}{ccc|c} 1 & 2 & 0 & 5 \\ 0 & 1 & -2 & -3 \\ 0 & 0 & -1 & -2 \end{array}\right].$$

Echelon form reached. Back-substitute from the bottom:

Row 3: $-z = -2 \Rightarrow z = 2$.
Row 2: $y - 2z = -3 \Rightarrow y - 4 = -3 \Rightarrow y = 1$.
Row 1: $x + 2y = 5 \Rightarrow x + 2 = 5 \Rightarrow x = 3$.

The unique solution is $(x, y, z) = (3, 1, 2)$. The swap cost us nothing — it merely reordered the equations, which (operation 1) never changes the solution. Confirm with numpy:

# A system whose (1,1) entry is zero: the solver swaps internally.
import numpy as np
A = np.array([[0, 1, -2],
              [1, 2,  0],
              [2, 2,  3]], dtype=float)
b = np.array([-3, 5, 14], dtype=float)
x = np.linalg.solve(A, b)
print(x)        # [3. 1. 2.]
print(A @ x)    # [-3.  5. 14.]  -> reproduces b

np.linalg.solve returns [3. 1. 2.] and never complains about the leading zero — because, like your from-scratch solver will, it pivots automatically. The takeaway: a zero in the pivot slot is not the end of the road; it is an instruction to swap. Only when no row has a nonzero entry in that column — so there is nothing to swap in — does the column fail to yield a pivot, and that, as the next section shows, is the signature of a free variable, not an error.

4.5 What is reduced row echelon form, and why go further than echelon?

We stopped at row echelon form (REF) and back-substituted. There is a second, tidier destination worth reaching: reduced row echelon form (RREF), where we keep applying row operations until the matrix is as simple as it can possibly be. RREF has three requirements, building on REF:

It is in row echelon form (staircase of leading entries; zeros below each).
Every leading entry (every pivot) equals $1$ — we call these pivot $1$'s.
Each pivot $1$ is the only nonzero entry in its column — zeros above it as well as below.

Where forward elimination cleared below the pivots, reaching RREF means also clearing above them and scaling each pivot to $1$. This extended process is sometimes called Gauss–Jordan elimination. Let's continue our example from the REF we reached in §4.4:

$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 0 & 1 & 5 & 17 \\ 0 & 0 & -13 & -39 \end{array}\right].$$

Scale row 3 so its pivot becomes $1$: $R_3 \leftarrow -\tfrac{1}{13}R_3$ gives $(0,0,1,3)$:

$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 6 \\ 0 & 1 & 5 & 17 \\ 0 & 0 & 1 & 3 \end{array}\right].$$

Clear the third column above the pivot. Use $R_3$ to zero out the $5$ in row 2 and the $1$ in row 1:

$$R_2 \leftarrow R_2 - 5R_3,\quad R_1 \leftarrow R_1 - 1R_3: \qquad \left[\begin{array}{ccc|c} 1 & 1 & 0 & 3 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{array}\right].$$

Clear the second column above its pivot. Use $R_2$ to zero the $1$ in row 1:

$$R_1 \leftarrow R_1 - 1R_2: \qquad \left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 3 \end{array}\right].$$

Look at what this says. The left side is the identity matrix; the rows now read $x = 1$, $y = 2$, $z = 3$. In RREF, when the system has a unique solution, the answer appears in the last column with no back-substitution needed at all. The reduction is the solving.

# Reduce the augmented matrix all the way to RREF (via sympy, exact arithmetic).
import sympy as sp
aug = sp.Matrix([[1, 1,  1,  6],
                 [2, 3,  7, 29],
                 [1, 3, -2,  1]])
rref, pivots = aug.rref()
print(rref)        # Matrix([[1,0,0,1],[0,1,0,2],[0,0,1,3]])
print(pivots)      # (0, 1, 2)  -> pivot columns are 0,1,2 (math columns 1,2,3)

The RREF is exactly the identity-augmented form we found by hand, and pivots = (0, 1, 2) reports that columns $0,1,2$ (which, remember, are math-columns $1,2,3$ — sympy and numpy index from 0, the math indexes from 1) all contain pivots. We used sympy here rather than numpy because RREF needs exact fractions to be meaningful — floating-point arithmetic smears the clean zeros and ones, so numpy deliberately offers no rref. Your from-scratch row_reduce at the end of this chapter handles RREF with a tolerance for exactly this reason.

Warning

— REF is not unique, but RREF is. Different choices of legal row operations (different pivot orders, different scalings) can land you at different row echelon forms — all correct, all with the same staircase pattern, but with different numbers. Push all the way to reduced row echelon form, however, and the result is uniquely determined by the original matrix — there is one and only one RREF for a given matrix, no matter how you got there. This uniqueness is what makes RREF a canonical fingerprint of a matrix, and it is why later chapters (the rank, the four subspaces in Part III) lean on RREF rather than plain REF.

So why ever stop at REF? Efficiency. For solving a single system by hand or machine, forward elimination to REF plus back-substitution does less arithmetic than going all the way to RREF — back-substitution is cheap, and clearing the entries above each pivot is extra work that back-substitution accomplishes implicitly. That is why production solvers (and our gaussian_elimination) stop at the triangle and back-substitute. We push on to RREF when we want the structure laid bare — to see pivots and free variables at a glance, to compute a matrix's rank, or to find the full solution set of an underdetermined system, which is exactly where we head next.

A useful way to hold the distinction: REF is the form you want for solving; RREF is the form you want for understanding. When all you need is the numbers $(x, y, z)$, the triangle plus back-substitution is the shortest path. When you want to see a matrix's anatomy — how many independent equations it really contains, which variables are free, what its solution set looks like as a geometric object — the fully reduced form shows everything at once, with no further work to interpret. Both are reached by the same three elementary operations; they differ only in how far you choose to push. Most of this book's later uses of elimination (rank, the four subspaces, the inverse) want the understanding, so they go all the way to RREF.

Check Your Understanding — A $3 \times 3$ system reduces to the REF $\left[\begin{smallmatrix} 2 & 4 & 2 & | & 8 \\ 0 & 3 & 9 & | & 12 \\ 0 & 0 & 5 & | & 10 \end{smallmatrix}\right]$. Without finding the solution, (a) is this REF or RREF, and (b) what single kind of operation turns it toward RREF first?

Answer

(a) It is REF but not RREF: the staircase is right and zeros sit below each leading entry, but the pivots are $2, 3, 5$ (not all $1$) and there are nonzero entries above the pivots. (b) Scale each row by the reciprocal of its pivot — $R_1 \leftarrow \tfrac12 R_1$, $R_2 \leftarrow \tfrac13 R_2$, $R_3 \leftarrow \tfrac15 R_3$ — to make every pivot a $1$, then clear above them by adding multiples. (Back-substitution from this REF, incidentally, gives $z = 2$, then $3y + 18 = 12 \Rightarrow y = -2$, then $2x - 8 + 4 = 8 \Rightarrow x = 6$, so the solution is $(6, -2, 2)$.)

4.6 What are pivots and free variables, and how do they reveal the solution set?

Our main example had a unique solution, and that is the friendliest case. But Chapter 3 warned that a linear system can have no solution or infinitely many. Gaussian elimination doesn't just survive those cases — it diagnoses them, cleanly, through two pieces of vocabulary: pivots and free variables.

After you reduce a matrix to echelon form, every column either contains a pivot (a leading entry of some row) or it does not. A pivot column corresponds to a basic variable (also called a pivot variable) — a variable whose value gets pinned down. A column without a pivot corresponds to a free variable — a variable you are free to set to any value, with the basic variables then determined in terms of it. The presence of even one free variable means infinitely many solutions; the absence of any free variable means at most one.

Let's see infinitely many in action. Consider the system

$$\begin{aligned} \phantom{1}x_1 + 2x_2 + \phantom{1}x_3 &= 1,\\ \phantom{1}x_1 + 2x_2 + 3x_3 &= 5,\\ 2x_1 + 4x_2 + 4x_3 &= 6. \end{aligned}$$

Notice the third equation is the sum of the first two — a redundant constraint, secretly only two independent facts dressed as three. Watch elimination expose that. Augmented matrix and forward elimination:

$$\left[\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 1 & 2 & 3 & 5 \\ 2 & 4 & 4 & 6 \end{array}\right] \xrightarrow[\;R_3 \leftarrow R_3 - 2R_1\;]{R_2 \leftarrow R_2 - R_1} \left[\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 0 & 0 & 2 & 4 \\ 0 & 0 & 2 & 4 \end{array}\right] \xrightarrow{\;R_3 \leftarrow R_3 - R_2\;} \left[\begin{array}{ccc|c} 1 & 2 & 1 & 1 \\ 0 & 0 & 2 & 4 \\ 0 & 0 & 0 & 0 \end{array}\right].$$

The bottom row vanished entirely — $0 = 0$, a true but empty statement, the algebraic fingerprint of the redundant equation. Now reduce to RREF: scale $R_2$ by $\tfrac12$ to get $(0,0,1,2)$, then $R_1 \leftarrow R_1 - R_2$:

$$\left[\begin{array}{ccc|c} 1 & 2 & 0 & -1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right].$$

Read off the pivots: column 1 has a pivot (in row 1), column 3 has a pivot (in row 2), but column 2 has no pivot. So $x_1$ and $x_3$ are basic; $x_2$ is free. The nonzero rows say:

$$x_1 + 2x_2 = -1 \quad\Rightarrow\quad x_1 = -1 - 2x_2, \qquad\qquad x_3 = 2.$$

Set $x_2 = t$, any real number. The complete solution set, in parametric form, is

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix} + t\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \qquad t \in \mathbb{R}.$$

That is a line in space — a particular point $(-1, 0, 2)$ plus all multiples of the direction $(-2, 1, 0)$. Geometrically (Chapter 3), the three planes did not meet at a single point; because one was redundant, they share an entire line. Every point on that line solves the system, so there are infinitely many solutions, one for each value of the free parameter $t$. Let's confirm with numpy that two different choices of $t$ both satisfy the original system.

# Verify the parametric solution: a particular point + t*(direction).
import numpy as np
A = np.array([[1, 2, 1],
              [1, 2, 3],
              [2, 4, 4]], dtype=float)
b = np.array([1, 5, 6], dtype=float)
particular = np.array([-1, 0, 2], dtype=float)     # t = 0
direction  = np.array([-2, 1, 0], dtype=float)     # the free-variable direction
for t in (0.0, 1.0, -3.5):                          # any t works
    x = particular + t * direction
    print(t, A @ x)        # always [1. 5. 6.] == b
# 0.0  [1. 5. 6.]
# 1.0  [1. 5. 6.]
# -3.5 [1. 5. 6.]
print(np.linalg.matrix_rank(A))                    # 2  (not 3 -> a free variable)

For every value of $t$, $A\mathbf{x}$ reproduces $\mathbf{b} = (1, 5, 6)$. And np.linalg.matrix_rank(A) returns 2, not 3 — the matrix has only two independent rows, leaving $3 - 2 = 1$ free variable, exactly the one-parameter line we found. (The rank is the number of pivots; we develop it fully in Chapter 14, but you can already see it counting independent constraints.)

Geometric Intuition — The number of free variables is the dimension of the solution set. Zero free variables → a single point (dimension $0$). One free variable → a line (dimension $1$). Two free variables → a plane (dimension $2$), and so on. Each free variable is one direction you can slide along while staying on every plane at once. This is your first glimpse of a profound organizing idea — solution sets are flat objects (points, lines, planes, hyperplanes) whose dimension you can read straight off the pivots. Chapter 13 makes this the theory of the null space, and the count "$\#\text{variables} - \#\text{pivots} = \#\text{free variables}$" becomes the Rank–Nullity Theorem of Chapter 14.

Common Pitfall — Confusing "a row of zeros" with "no solution." When the bottom row reduces to $0\ 0\ 0\ |\ 0$ (all zeros, including the augmented entry), that is a true statement — a redundant equation — and it signals a free variable / infinitely many solutions, not a contradiction. The disaster row is the other one: $0\ 0\ 0\ |\ \text{(nonzero)}$, which reads $0 = \text{nonzero}$, an impossibility. Read the augmented column carefully: a zero there is harmless; a nonzero there is fatal. We meet that fatal case next.

More than one free variable: a solution plane

One free variable gave a line. What happens with two? This is common whenever a system has more unknowns than independent equations — an underdetermined system, the daily reality of data fitting, computer graphics constraints, and optimization. Take two equations in four unknowns:

$$\begin{aligned} \phantom{1}x_1 + 2x_2 + \phantom{1}0x_3 + \phantom{1}x_4 &= 3,\\ 2x_1 + 4x_2 + \phantom{1}x_3 + 3x_4 &= 8. \end{aligned} \qquad \left[\begin{array}{cccc|c} 1 & 2 & 0 & 1 & 3 \\ 2 & 4 & 1 & 3 & 8 \end{array}\right].$$

One elimination step, $R_2 \leftarrow R_2 - 2R_1$, gives $(0,0,1,1\mid 2)$, and the matrix is already in RREF:

$$\left[\begin{array}{cccc|c} 1 & 2 & 0 & 1 & 3 \\ 0 & 0 & 1 & 1 & 2 \end{array}\right].$$

Pivots sit in columns 1 and 3, so $x_1$ and $x_3$ are basic, while $x_2$ and $x_4$ are free — two free variables. The rows say $x_1 = 3 - 2x_2 - x_4$ and $x_3 = 2 - x_4$. Setting $x_2 = s$ and $x_4 = t$ (any reals), the complete solution set is

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \\ 2 \\ 0 \end{bmatrix} + s\begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t\begin{bmatrix} -1 \\ 0 \\ -1 \\ 1 \end{bmatrix}, \qquad s, t \in \mathbb{R}.$$

A particular point plus two independent directions: that is a plane living inside four-dimensional space. Two free variables, two directions to slide along, a two-dimensional solution set. The pattern is now unmistakable — the count of free variables is the dimension of the solution set, and each free variable contributes one direction vector to the parametric form. Notice, too, the bookkeeping recipe for writing those direction vectors: to get the direction for a free variable, set that free variable to $1$, the other free variables to $0$, and solve for the basic variables. It falls right out of the RREF.

# Two free variables -> a solution plane: a point + s*dir1 + t*dir2.
import numpy as np
A = np.array([[1, 2, 0, 1],
              [2, 4, 1, 3]], dtype=float)
b = np.array([3, 8], dtype=float)
point = np.array([3, 0, 2, 0], dtype=float)     # s = t = 0
dir1  = np.array([-2, 1, 0, 0], dtype=float)     # x2 direction
dir2  = np.array([-1, 0, -1, 1], dtype=float)    # x4 direction
for s, t in [(0, 0), (1, 0), (0, 1), (2, -1)]:
    x = point + s * dir1 + t * dir2
    print((s, t), A @ x)        # always [3. 8.]
print(np.linalg.matrix_rank(A))  # 2 -> 4 - 2 = 2 free variables

Every $(s, t)$ lands on $\mathbf{b} = (3, 8)$, and the rank is $2$, leaving $4 - 2 = 2$ free variables — the plane we found. Hold this two-direction picture: when Chapter 13 builds the null space of a matrix, those two direction vectors $(-2,1,0,0)$ and $(-1,0,-1,1)$ turn out to be a basis for it, and the count "free variables = nullity" becomes a theorem.

4.7 How does elimination tell you a system has no solution?

The third possibility from Chapter 3 is the inconsistent system — equations that contradict each other, like three planes positioned so that no single point lies on all of them. Gaussian elimination catches this with an unmistakable signal. Consider

$$\begin{aligned} \phantom{1}x + 2y - \phantom{1}z &= 1,\\ 2x + 3y + \phantom{1}z &= 4,\\ 3x + 5y + 0z &= 6. \end{aligned}$$

The third equation's left side is exactly the sum of the first two left sides $(1+2,\,2+3,\,-1+1) = (3,5,0)$ — but its right side, $6$, is not the sum of the first two right sides, $1 + 4 = 5$. The equations are quietly at war. Forward elimination drags the contradiction into the open:

$$\left[\begin{array}{ccc|c} 1 & 2 & -1 & 1 \\ 2 & 3 & 1 & 4 \\ 3 & 5 & 0 & 6 \end{array}\right] \xrightarrow[\;R_3 \leftarrow R_3 - 3R_1\;]{R_2 \leftarrow R_2 - 2R_1} \left[\begin{array}{ccc|c} 1 & 2 & -1 & 1 \\ 0 & -1 & 3 & 2 \\ 0 & -1 & 3 & 3 \end{array}\right] \xrightarrow{\;R_3 \leftarrow R_3 - R_2\;} \left[\begin{array}{ccc|c} 1 & 2 & -1 & 1 \\ 0 & -1 & 3 & 2 \\ 0 & 0 & 0 & 1 \end{array}\right].$$

Read the bottom row as an equation: $0x + 0y + 0z = 1$, that is, $0 = 1$. No values of $x, y, z$ can make zero equal one. The system has no solution — it is inconsistent — and the algorithm announced it the instant that impossible row appeared. You can stop immediately; there is nothing to back-substitute.

# Detect inconsistency: rank of A vs rank of the augmented [A | b].
import numpy as np
A   = np.array([[1, 2, -1],
                [2, 3,  1],
                [3, 5,  0]], dtype=float)
b   = np.array([1, 4, 6], dtype=float)
aug = np.column_stack([A, b])
print(np.linalg.matrix_rank(A))     # 2
print(np.linalg.matrix_rank(aug))   # 3  -> ranks differ => inconsistent

The two ranks disagree: $\operatorname{rank}(A) = 2$ but $\operatorname{rank}([A\mid\mathbf{b}]) = 3$. Appending $\mathbf{b}$ added a genuinely new pivot — that is precisely the impossible row. This gives the clean, completely general test that ends all suspense about a linear system.

The Key Insight — the existence/uniqueness test. Reduce $[A \mid \mathbf{b}]$ to echelon form and compare ranks (pivot counts): - No solution (inconsistent) ⟺ a pivot appears in the augmented column, i.e. a row $[\,0\ \cdots\ 0 \mid \text{nonzero}\,]$ — equivalently $\operatorname{rank}([A\mid\mathbf{b}]) > \operatorname{rank}(A)$. - Exactly one solution ⟺ consistent and every variable is basic (no free variables), i.e. $\operatorname{rank}(A) = \operatorname{rank}([A\mid\mathbf{b}]) = \#\text{variables}$. - Infinitely many solutions ⟺ consistent and at least one free variable, i.e. $\operatorname{rank}(A) = \operatorname{rank}([A\mid\mathbf{b}]) < \#\text{variables}$.

These three lines settle every linear system, of any size, completely. There are no other cases.

It is worth pausing on how decisively this resolves the central question of Chapter 3 — how many solutions does a system have? — that we could previously only answer by visualizing intersecting planes. Now it is mechanical: reduce, count pivots, compare. The geometry (one point / one line / parallel-and-missing planes) and the algebra (zero / one / many pivots in the right places) are the same story told twice, and Gaussian elimination is the translator between them.

It also dissolves a question that may have nagged you in Chapter 3: with three planes in space, couldn't there be all sorts of weird intersection patterns? There cannot. The pivot count proves that a consistent system's solution set is always a single flat object — a point, a line, a plane, or a higher-dimensional flat — never a curve, never two disconnected pieces, never anything exotic. This flatness is the deep reason linear systems are so tractable, and it is a preview of the central fact of Chapter 6: the solutions of $A\mathbf{x}=\mathbf{0}$ form a subspace, and the solutions of $A\mathbf{x}=\mathbf{b}$ form that subspace shifted to pass through one particular solution. The three "shapes" you can get are not a coincidence; they are forced by linearity.

Check Your Understanding — Each augmented matrix below is already in echelon form. For each, decide whether the system is inconsistent, has a unique solution, or has infinitely many — and if infinitely many, how many free variables. (i) $\left[\begin{smallmatrix} 1 & 2 & | & 3 \\ 0 & 1 & | & 4 \end{smallmatrix}\right]$; (ii) $\left[\begin{smallmatrix} 1 & 2 & 5 & | & 1 \\ 0 & 0 & 3 & | & 2 \\ 0 & 0 & 0 & | & 0 \end{smallmatrix}\right]$; (iii) $\left[\begin{smallmatrix} 1 & 1 & | & 2 \\ 0 & 0 & | & 7 \end{smallmatrix}\right]$.

Answer

(i) Unique. Two variables, two pivots (columns 1 and 2), no zero rows, no free variables — back-substitution gives $y = 4$, then $x = 3 - 8 = -5$. (ii) Infinitely many, one free variable. Three variables, pivots in columns 1 and 3, so column 2 ($x_2$) is free; the all-zero bottom row is harmless. The solution set is a line. (iii) Inconsistent. The second row reads $0 = 7$, impossible — no solution. The rank of the coefficient part is $1$ but the rank of the augmented matrix is $2$, the tell-tale mismatch.

Real-World Application — Circuit analysis (electrical engineering / signals). When you analyze an electrical circuit with Kirchhoff's laws, you get one linear equation per node (current in = current out) and one per loop (voltages sum to zero around a loop). For a circuit with many components this is a large linear system $A\mathbf{x} = \mathbf{b}$ in the unknown branch currents and node voltages, and every circuit simulator — SPICE, the tool behind essentially all chip design — solves it by a pivoted Gaussian elimination at its core (millions of times, as it steps through time). An inconsistent system there would mean a physically impossible circuit (e.g., two ideal voltage sources fighting across a wire); a system with free variables signals an under-constrained circuit with floating nodes. The same three outcomes you just learned to read off the pivots are how a simulator tells a sound design from a broken one. We will see another network application — traffic flow — in this chapter's case study, and circuits return in the context of orthogonality and least squares later in the book.

4.8 Why is elimination so efficient, and where does it strain?

You might wonder whether there is a faster way to solve $A\mathbf{x} = \mathbf{b}$ than this column-by-column grind. For a single dense system, essentially no — Gaussian elimination is the workhorse precisely because it is cheap and complete. Counting the arithmetic, forward elimination on an $n \times n$ system costs on the order of $\tfrac{2}{3}n^3$ floating-point operations, and back-substitution adds only about $n^2$ — negligible by comparison. That cubic cost is why doubling the size of a system makes it roughly eight times slower, and why solving a million-equation system is a serious computation even for a supercomputer.

Two contrasts sharpen the point. First, you might be tempted to solve $A\mathbf{x} = \mathbf{b}$ by computing the inverse matrix $A^{-1}$ (Chapter 9) and forming $A^{-1}\mathbf{b}$. Don't. Computing $A^{-1}$ is itself a Gaussian elimination — and a more expensive one — followed by an extra matrix–vector multiply, so it does strictly more work and, worse, is numerically less stable. The professional's rule, which you will hear repeated in every numerical computing course, is "never invert a matrix to solve a system; eliminate." numpy's np.linalg.solve(A, b) honors this: it does not form the inverse; it runs a pivoted LU elimination directly.

Second, when the same matrix $A$ must be solved against many different right-hand sides $\mathbf{b}_1, \mathbf{b}_2, \dots$, repeating full elimination each time is wasteful, because the expensive part (eliminating $A$) is identical every time. The fix is to do the elimination once and save its result as a reusable factorization — the LU decomposition of Chapter 10, $A = LU$ — after which each new right-hand side costs only a cheap forward-and-back substitution ($\sim n^2$ instead of $\sim n^3$). This is exactly the kind of "do the hard work once, reuse it" thinking that recurs throughout computational linear algebra.

Real-World Application — Input–output economics. In Chapter 1 we met Wassily Leontief's input–output model [verify], where industries consume each other's output: making a dollar of steel needs some coal and electricity, making electricity needs some steel, and so on. If $A$ is the consumption matrix ($a_{ij}$ = dollars of industry $i$'s output needed per dollar of industry $j$'s output) and $\mathbf{d}$ is consumer demand, then total production $\mathbf{x}$ must satisfy $\mathbf{x} = A\mathbf{x} + \mathbf{d}$, which rearranges into a linear system $(I - A)\mathbf{x} = \mathbf{d}$ — solved by exactly the elimination of this chapter. For a tiny two-sector economy with $A = \left[\begin{smallmatrix} 0.2 & 0.3 \\ 0.4 & 0.1\end{smallmatrix}\right]$ and demand $\mathbf{d} = (20, 10)$, solving $(I-A)\mathbf{x}=\mathbf{d}$ gives $\mathbf{x} \approx (35, 26.67)$ — the gross output each sector must produce to meet demand and feed the other sector. A government economist re-solves this for dozens of demand scenarios against the same $(I - A)$, which is precisely why the factor-once-reuse-many idea above is not academic: it is how national-economy models with hundreds of sectors stay fast. (Real input–output tables, like the U.S. Bureau of Economic Analysis accounts, run to hundreds of sectors.) This is linear algebra steering policy, far from any physics lab.

# Leontief input-output: (I - A) x = d, same elimination as this chapter.
import numpy as np
A = np.array([[0.2, 0.3],
              [0.4, 0.1]])              # consumption matrix
d = np.array([20.0, 10.0])             # consumer demand
x = np.linalg.solve(np.eye(2) - A, d)  # solve (I - A) x = d
print(np.round(x, 4))                  # [35.     26.6667]
print(np.round(x - A @ x, 4))          # [20. 10.] -> equals d, checks out

Geometric Intuition — Where does elimination strain? When a pivot is tiny but not zero, dividing by it magnifies every rounding error, and the answer can be wildly wrong even though the algorithm "succeeded." Geometrically this happens when two planes are nearly parallel: they do intersect in a point, but the point's location is hypersensitive to the exact coefficients, so finite-precision arithmetic struggles to pin it down. The matrix is then called ill-conditioned, and partial pivoting (§4.4's Computational Note) is the first line of defense. Chapter 38 quantifies this with the condition number. For now, just register that "the algorithm finished" and "the answer is accurate" are not the same promise when floating point is involved.

These efficiency themes — the cubic cost, never inverting, factor-once-reuse-many, and conditioning — connect directly to the broader study of numerical methods, where Gaussian elimination is the foundational example. They also explain why the algorithm you are about to build by hand in Python is, structurally, the same algorithm running inside every scientific application on Earth. The implementation details (pivoting strategy, memory layout, blocking for cache efficiency) differ; the core idea — systematically eliminate, then back-substitute — does not.

4.9 What does the from-scratch algorithm look like in code?

You have now done Gaussian elimination by hand in three flavors — unique, infinite, inconsistent — and watched numpy confirm each. The deepest way to own an algorithm is to implement it yourself, so this chapter's toolkit contribution is exactly that: the solver, written from scratch in pure Python, using numpy only to check the result. Let's sketch the logic before you build it (the full module is assembled separately; the callout names what you write).

The three pieces mirror the three things you did by hand. back_substitute(U, b) takes an upper-triangular $U$ and unravels from the bottom row up, exactly as in §4.4: for each row $i$ from last to first, subtract off the already-known terms and divide by the diagonal pivot. gaussian_elimination(A, b) does forward elimination with partial pivoting (swap up the largest-magnitude entry in each column to dodge zero/small pivots), reaching an upper-triangular system, then calls back_substitute. row_reduce(A) goes all the way to RREF — scaling each pivot to $1$ and clearing both below and above — which handles the general case (including free variables) and is what you would use to read off rank and the full solution set. Here is the heart of forward elimination, the loop you will flesh out:

# Sketch of forward elimination with partial pivoting (pure Python, no numpy).
# A is a list of rows; b is the right-hand side. Math is 1-indexed; code is 0-indexed.
def forward_eliminate(A, b):
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]  # augmented
    for col in range(n):                          # column we are clearing
        # partial pivoting: bring the largest |entry| in this column to the top
        best = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[best] = M[best], M[col]         # swap rows (elementary op 1)
        for r in range(col + 1, n):               # clear entries below the pivot
            factor = M[r][col] / M[col][col]      # the multiplier from section 4.4
            M[r] = [M[r][k] - factor * M[col][k] for k in range(n + 1)]
    return M                                      # now upper-triangular (+ aug col)

That is the same arithmetic you did with pencil — factor = M[r][col] / M[col][col] is literally "$2/1$" and "$1/1$" and "$2/1$" from the worked example — wrapped in two for loops over rows and columns. If the loop structure feels familiar, it should: it is the canonical nested-loop pattern, and the broader story of building algorithms this way is told in loops and algorithms in Python. Eliminating a column is an inner loop; marching across columns is the outer loop; the whole thing is $O(n^3)$ exactly because of those two nested sweeps over $n$ rows and $n$ columns, with $n$ arithmetic operations each.

Build Your Toolkit — Add toolkit/linear_systems.py with three functions, all in pure Python (no numpy in the implementation): - back_substitute(U, b) — solve an upper-triangular system $U\mathbf{x}=\mathbf{b}$ by the bottom-up unraveling of §4.4. For row $i$ (counting down), compute $x_i = \big(b_i - \sum_{j>i} U_{ij}x_j\big)/U_{ii}$; raise an error if a diagonal pivot $U_{ii}$ is (near) zero. - gaussian_elimination(A, b) — forward-eliminate with partial pivoting (swap up the largest-magnitude entry in each column), then call back_substitute. Raise a clear error if a column has no usable pivot (singular → no unique solution). - row_reduce(A) — reduce all the way to RREF: scale each pivot to $1$, clear above and below it, track the pivot columns, and use a small tolerance (e.g. 1e-12) to decide when an entry counts as zero. Return the reduced matrix and the list of pivot columns.

Verify, don't trust: for random systems compare gaussian_elimination(A, b) against np.linalg.solve(A, b) with np.allclose, check back_substitute against scipy.linalg.solve_triangular, and confirm row_reduce reproduces the RREF of your hand examples. Watch the indexing: the math says "the pivot in row $1$, column $1$"; your code says M[0][0]. A reference implementation and tests live in toolkit/tests/. This module is the backbone of the toolkit — Chapter 9's inverse, Chapter 10's LU, and Chapter 11's determinant all build on the elimination you write here.

When your gaussian_elimination returns [1.0, 2.0, 3.0] for the §4.4 system and np.allclose flashes True against numpy across a hundred random matrices, you will have proven to yourself that the algorithm is not a black box. You wrote the box. That is the feeling this whole toolkit is built to give you — understanding you can run — and it starts with the most important algorithm in computational linear algebra.

4.10 Putting it together: the algorithm, start to finish

Let's consolidate. Given any system $A\mathbf{x} = \mathbf{b}$, here is the complete procedure you now command, and the meaning of every step:

Write the augmented matrix $[A \mid \mathbf{b}]$ — equations become rows, variables become columns.
Forward-eliminate to row echelon form using the three elementary row operations, pivoting (swapping) whenever a pivot position holds a zero (or, on a computer, a dangerously small number). Each operation is reversible, so the solution set never changes — that is the guarantee that makes the whole thing legal.
Inspect the echelon form. A row $[\,0\cdots 0 \mid \text{nonzero}\,]$ means inconsistent — stop, no solution. Otherwise, count pivots.
Identify free variables — every non-pivot column among the variables. None → unique solution. One or more → infinitely many, with one parameter per free variable.
Solve. Either back-substitute from the echelon form, or push on to RREF and read the answer (and the parametric form, if there are free variables) directly off the result.

Five steps, and they handle every linear system that will ever cross your desk — a $2\times 2$ on a napkin or a sparse $10^6 \times 10^6$ system inside a weather model. The arithmetic scales; the logic does not change.

Step back and notice what you have gained, because it is more than a computation. In Chapter 3 you learned that a linear system's solution set is a point, a line, a plane, or empty — geometry you could see but not yet command. Gaussian elimination is the bridge from seeing to finding: it turns the geometric question "where do these surfaces meet?" into the mechanical question "what are the pivots?", and the two answers always agree. This is the recurring theme of the whole book made concrete — geometry and algebra are two views of one object — and row reduction is one of the cleanest places to watch the two views snap together.

And remember the framing we opened with. Row reduction is not the point of linear algebra; it is a tool, and a magnificent one. It solves systems, yes, but it also reveals structure — the pivots tell you which directions a matrix genuinely uses (its rank, Chapter 14), which combinations of inputs it can produce (its column space, Chapter 13), and which inputs it secretly ignores (its null space, the free-variable directions). When we build the four fundamental subspaces in Part III, RREF will be the microscope we use to see them. The algorithm you learned to solve equations turns out, in the chapters ahead, to be the algorithm that x-rays a matrix.

Historical Note — The method is named for Carl Friedrich Gauss, who used systematic elimination around 1810 to compute the orbit of the asteroid Pallas from astronomical observations, popularizing the technique in the West [verify]. But Gauss neither invented it nor claimed to: essentially the same procedure appears almost two thousand years earlier in the Chinese mathematical classic Jiuzhang Suanshu (The Nine Chapters on the Mathematical Art), whose "Fangcheng" chapter solves systems of linear equations by operations on a counting-board array that are exactly our elementary row operations [verify]. The "Gauss–Jordan" refinement to RREF is associated with Wilhelm Jordan, who described it in an 1888 geodesy handbook (and, independently, B.-I. Clasen) — though, as with so much of mathematics, the attribution is tidier than the history [verify]. The lesson echoes Chapter 1: the idea — simplify a system without changing its solutions — long predates the names and notation we hang on it.

In the next chapter the ground tilts upward. We stop asking "how do I solve this particular system?" and start asking "what is a space of solutions, really?" — letting go of arrows-in-the-plane to discover that polynomials, functions, and matrices are vectors too. The free-variable directions you found in §4.6 are your first concrete example of the deeper structure Chapter 5 unveils: a subspace, an entire flat world of solutions threading through space. The algorithm is behind you; the architecture is ahead.