Systems of Linear Equations: The Problem That Started It All

DataField.Dev

47 min read

> Learning paths. Math majors — read everything, especially the precise conditions for consistency and the Math-Major Sidebar on rank and the Rouché–Capelli theorem. CS / Data Science — focus on the Geometric Intuition callouts, the column picture...

Prerequisites

chapter-02-vectors

Learning Objectives

Define a system of linear equations and identify whether an equation is linear, distinguishing coefficients, unknowns, and the constant on the right-hand side.
Read a 2x2 or 3x3 system in TWO geometric ways: the row picture (intersecting lines or planes) and the column picture (is the right-hand side a linear combination of the columns?).
Classify a system's solution set as none, exactly one, or infinitely many, and explain each case geometrically as parallel/intersecting/coincident lines or planes.
State the precise conditions that distinguish consistent from inconsistent systems and unique from infinitely many solutions, using rank and the augmented matrix.
Use np.linalg.solve for a unique system and predict what numpy does for a singular (inconsistent or dependent) system.
Recognize real problems — economic input-output, network flows, web ranking — as linear systems waiting to be solved.

In This Chapter

3.1 What is a system of linear equations, and why did it start everything?
3.2 What does a linear system look like? The row picture
3.3 What is the column picture, and why is it the deeper view?
3.4 How do you solve a system, and what does numpy actually return?
3.5 When does a system have no solution? Inconsistency, geometrically
3.6 When does a system have infinitely many solutions? Dependence and free variables
3.7 What conditions decide the solution count? Rank, stated precisely (A track)
3.8 How big can these systems get? From two unknowns to the whole web
3.9 What is the augmented matrix, and why bother with it?
3.10 Why are linear systems "linear"? A look back at superposition
3.11 Putting it together: a worked classification
3.12 How do you turn a word problem into a system of linear equations?
3.13 What you should carry forward

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Systems of Linear Equations: The Problem That Started It All

Learning paths. Math majors — read everything, especially the precise conditions for consistency and the Math-Major Sidebar on rank and the Rouché–Capelli theorem. CS / Data Science — focus on the Geometric Intuition callouts, the column picture (it seeds the column-space idea you will live in from Chapter 13 on), the numpy snippets, and the applications. Physics / Engineering — focus on the geometry of intersecting planes and the network/circuit applications, and keep the two pictures — rows as surfaces, columns as combinations — side by side in your head. This chapter assumes only Chapter 2: you know what a vector is, and you know what a linear combination is. We deliberately do not teach the solving algorithm here — that is Chapter 4. This chapter is about what a system means and what its solutions look like.

3.1 What is a system of linear equations, and why did it start everything?

Two thousand years ago, in a Chinese text called the Nine Chapters on the Mathematical Art, someone posed a problem about bundles of grain. Three bundles of good grain, two of mediocre, and one of poor yield a known amount; a different mix yields a different amount; a third mix, a third amount. How much grain is in each kind of bundle? The text then lays out a procedure — arrange the numbers in a grid, subtract multiples of one row from another, and read off the answer — that is, essentially, the algorithm we now call Gaussian elimination, roughly eighteen centuries before Gauss [verify]. The problem of several simultaneous linear conditions is one of the oldest in mathematics, and it is no exaggeration to say it is the problem that started all of linear algebra. Everything in this book — matrices, determinants, eigenvalues, the four subspaces — grew, directly or indirectly, out of the need to understand systems like that one.

So let us say precisely what such a system is. A linear equation in the unknowns $x_1, x_2, \dots, x_n$ is an equation of the form

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b,$$

where the numbers $a_1, \dots, a_n$ are the coefficients and $b$ is the constant on the right-hand side. The defining feature is that each unknown appears to the first power only, multiplied by a constant, and the terms are added. There is no $x_1^2$, no $x_1 x_2$, no $\sin(x_1)$, no $1/x_1$. The equation $3x + 2y = 7$ is linear; $3x^2 + 2y = 7$ is not, and neither is $3xy = 7$ nor $3\sqrt{x} = 7$. A system of linear equations is simply a collection of such equations that must all hold at once — the same unknowns, several constraints, all true simultaneously.

The Key Insight — The "system of equations meaning" is this: a system is a list of conditions that the unknowns must all satisfy together. Each equation, on its own, allows many possibilities. The system asks for the possibilities that survive every constraint at once — the overlap. Solving the system is finding that overlap.

Here is a small system in two unknowns that we will return to again and again throughout the chapter:

$$ \begin{aligned} 2x + y &= 5,\\ x - y &= 1. \end{aligned} $$

A solution is an assignment of numbers to the unknowns that makes every equation true at once. The pair $x = 2,\ y = 1$ is a solution here: $2(2) + 1 = 5$ ✓ and $2 - 1 = 1$ ✓. The solution set is the set of all such assignments. For this little system the solution set happens to contain exactly one point, but that is not guaranteed — and the central drama of this chapter is that a system of linear equations can have no solutions, exactly one solution, or infinitely many solutions, and never any other number. You will never find a linear system with exactly two solutions, or exactly seventeen. Understanding why only those three outcomes are possible is the heart of what follows, and the explanation is geometric.

Why does this matter beyond grain bundles? Because once you train your eye, you see linear systems everywhere. The currents in an electrical circuit are the unknowns in a linear system (Kirchhoff's laws). The prices that balance supply and demand across an economy form a linear system — we will spend a whole case study on the input-output models in economics that won Wassily Leontief a Nobel Prize. The flow of traffic through a road network, the concentrations in a chemical mixture, the weights that best fit a line to data — all linear systems. And, most spectacularly, the importance of every page on the web can be computed as the solution to a single, gigantic linear system with billions of unknowns. We will meet that system, PageRank, at the end of this chapter as a tantalizing preview, and we will finally solve it in Chapter 29.

3.2 What does a linear system look like? The row picture

Let us obey the first law of this book and draw the picture before we manipulate symbols. Take our running system:

$$ \begin{aligned} 2x + y &= 5,\\ x - y &= 1. \end{aligned} $$

Consider the first equation, $2x + y = 5$, all by itself. In the $xy$-plane, the set of all points $(x, y)$ that satisfy it is a straight line — you learned this in high school as $y = -2x + 5$, a line of slope $-2$ crossing the $y$-axis at $5$. Every point on that line satisfies the first equation; no point off it does. Now the second equation, $x - y = 1$, is also a line, namely $y = x - 1$. A solution to the system must lie on both lines at once. Geometrically, then, solving the system means finding where the two lines intersect.

Geometric Intuition — In the row picture, each equation in two unknowns is a line in the plane, and the solution set is the intersection of those lines. Each equation in three unknowns is a plane in space, and the solution set is the intersection of those planes. The word "row" is literal: each row of the system is one geometric surface, and we are hunting for the points common to all of them.

Two lines in a plane can be arranged in exactly three ways, and these are precisely the three solution-count cases:

They cross at a single point. This is the generic case — pick two lines at random and they almost certainly cross once. The system has exactly one solution. Our running system is like this: the lines $y = -2x + 5$ and $y = x - 1$ cross at $(2, 1)$.
They are parallel and distinct — they never meet. Then no point lies on both lines, so the system has no solution. We call such a system inconsistent. Example: $x + y = 2$ and $x + y = 5$ are parallel lines (same slope, different intercepts); there is no $(x, y)$ with $x + y$ equal to both $2$ and $5$ at once.
They are the same line. Then every point on that line satisfies both equations, so the system has infinitely many solutions. Example: $x + y = 2$ and $2x + 2y = 4$ describe the identical line — the second equation is just twice the first, carrying no new information.

That is the whole story in 2D, and it already explains the "none / one / infinitely many" rule. Two lines cannot cross at exactly two points (two distinct points determine a unique line, so if they shared two points they would be the same line and share all points). The three cases are exhaustive and mutually exclusive. Figure 3.1 shows all three.

# Figure 3.1 — the three row-picture cases for two lines in the plane.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 4, 200)
fig, axes = plt.subplots(1, 3, figsize=(13, 4))

# (1) one solution: 2x + y = 5  and  x - y = 1  meet at (2, 1)
axes[0].plot(x, 5 - 2*x, label="2x + y = 5")
axes[0].plot(x, x - 1,   label="x - y = 1")
axes[0].plot(2, 1, "ko"); axes[0].set_title("Exactly one solution")

# (2) no solution: x + y = 2 and x + y = 5 are parallel
axes[1].plot(x, 2 - x, label="x + y = 2")
axes[1].plot(x, 5 - x, label="x + y = 5")
axes[1].set_title("No solution (parallel)")

# (3) infinitely many: x + y = 2 and 2x + 2y = 4 are the SAME line
axes[2].plot(x, 2 - x, lw=4, alpha=0.4, label="x + y = 2")
axes[2].plot(x, 2 - x, "--", label="2x + 2y = 4")
axes[2].set_title("Infinitely many (same line)")

for ax in axes:
    ax.set_aspect("equal"); ax.grid(True, alpha=0.3); ax.legend(fontsize=8)
# plt.show()

Figure 3.1: The three possible solution sets for a system of two equations in two unknowns, drawn as lines. Left: lines crossing once (one solution). Middle: parallel distinct lines (no solution). Right: coincident lines (infinitely many solutions). The alt-text: three side-by-side plots showing two crossing lines, two parallel lines, and two overlapping lines.

Common Pitfall — Students often think a system with "more equations than unknowns" must have no solution, or that "fewer equations than unknowns" must have infinitely many. Neither rule is reliable. Three lines through a single common point give a perfectly consistent over-determined system with one solution; and two identical equations in two unknowns (an apparently square system) give infinitely many. What controls the solution count is the geometry — how the surfaces actually sit relative to one another — not the head-count of equations versus unknowns. We make the precise version (rank) explicit in §3.7.

Stepping up to three dimensions

In three unknowns $(x, y, z)$, a single linear equation $a_1 x + a_2 y + a_3 z = b$ describes a plane in space (a flat, infinite sheet). A system of three such equations asks for the points lying on all three planes at once. The arrangements get richer, but the same three outcomes — and only those three — appear:

Three planes meeting at a single point → one solution. Think of two walls and the floor of a room meeting at a corner.
Three planes with no common point → no solution. This can happen even when no two planes are parallel: imagine three planes arranged like the three sides of a triangular prism — each pair of planes meets in a line, but the three lines are parallel and never share a point. There is also the simpler case where two of the planes are parallel.
Three planes sharing a whole common line (or a common plane) → infinitely many solutions. Picture the pages of an open book all hinged along the same spine: every plane contains the spine, so every point of the spine solves the system.

Geometric Intuition — The anchor image to carry through this chapter is the intersection of planes. Hold up three flat sheets of cardboard. Slide them so all three pass through one pencil tip — that is the unique-solution case. Now tilt one so the common point disappears but the sheets still tilt every which way — inconsistent, no solution. Finally, fan them like book pages around a shared pencil held as the spine — every point on the pencil works, infinitely many solutions. Three pieces of cardboard and a pencil teach the entire solution-count theorem in three dimensions.

We can confirm the "single point" case numerically. Consider three planes that meet at one corner:

$$ \begin{aligned} x + y + z &= 6,\\ 2y + 5z &= -4,\\ 2x + 5y - z &= 27. \end{aligned} $$

# Three planes meeting at a single point — expect a unique (x, y, z).
import numpy as np
A = np.array([[1.0, 1.0,  1.0],
              [0.0, 2.0,  5.0],
              [2.0, 5.0, -1.0]])
b = np.array([6.0, -4.0, 27.0])
print(np.linalg.solve(A, b))   # -> [ 5.  3. -2.]

The solver returns $(x, y, z) = (5, 3, -2)$, the single point common to all three planes. (We will see in Chapter 4 how np.linalg.solve finds it; here we only care that the geometry — three planes, one shared point — matches the algebra.)

Check Your Understanding — In three dimensions, can a system of two linear equations ever have exactly one solution?
Answer
No. Two planes in space either are parallel and distinct (no solution), or coincide (a whole plane of solutions), or cross in a line (infinitely many solutions). Two planes can never meet in exactly one point — you need a third, suitably tilted plane to pin down a unique point. So a two-equation system in three unknowns has either zero or infinitely many solutions, never exactly one. This is the row picture telling you something the head-count of equations alone could not.

It is worth cataloguing the 3D cases a little more carefully, because three planes can fail to meet at a point in several visually different ways, and recognizing the configuration at a glance is a genuinely useful skill. With three planes you can encounter: (a) one common point (the room-corner — unique); (b) two parallel distinct planes plus any third (no solution — the parallel pair already has no common point); (c) all three parallel and distinct (no solution); (d) three planes forming a triangular prism, no two parallel, yet no common point (no solution — the most surprising case); (e) three planes all sharing one common line, like book pages on a spine (a line of solutions — infinitely many); and (f) all three planes coincident (a whole plane of solutions — infinitely many). Cases (b), (c), and (d) are all "no solution," and cases (e) and (f) are both "infinitely many" — confirming once more that only the three outcomes ever occur, however the planes are arranged. The configuration that produces "no solution" without any parallel planes (case d) is the one that trips people up, so it is the one worth fixing in memory: dependent normals, inconsistent constants.

We can confirm the "common line" case (e) numerically. Take three planes engineered to share a line — the third equation is the sum of the first two on both sides, so it is genuinely redundant rather than contradictory:

# Three planes sharing a common LINE (case e) — expect infinitely many solutions.
import numpy as np
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 2.0, 1.0]])   # row3 = row1 + row2  (dependent)
b = np.array([1.0, 1.0, 2.0])     # and 1 + 1 = 2, so CONSISTENT (no contradiction)
print("rank(A)     =", np.linalg.matrix_rank(A))                        # -> 2
print("rank([A|b]) =", np.linalg.matrix_rank(np.column_stack([A, b])))  # -> 2

Both ranks are $2$, and $2 < 3$ unknowns, so the planes meet in a line ($3 - 2 = 1$ free variable) — the open-book picture. Change the last constant from $2$ to $5$, breaking consistency, and you fall into the triangular-prism case (d) instead: the rank of the augmented matrix jumps to $3$ while $\operatorname{rank}(A)$ stays at $2$, and the planes share nothing. The same coefficient matrix gives infinitely many or no solutions depending only on whether $\mathbf{b}$ is consistent with the dependence among the rows — a preview of how delicately the right-hand side controls the outcome once the coefficient matrix is singular.

3.3 What is the column picture, and why is it the deeper view?

The row picture is the one most people meet first, and it is genuinely useful. But there is a second geometric reading of the very same system, and for the rest of linear algebra it is the more important of the two. It does not look at the rows of the system; it looks at the columns. This is the picture that will grow, in Chapter 13, into the single most important concept of the subject — the column space of a matrix — and it connects directly to the matrix–vector product you met in Chapter 1.

Start, as always, with our running system, and this time stack the coefficients into columns. The system

$$ \begin{aligned} 2x + y &= 5,\\ x - y &= 1 \end{aligned} $$

can be rewritten by pulling out the column of $x$-coefficients, the column of $y$-coefficients, and the column of constants:

$$ x \begin{bmatrix} 2 \\ 1 \end{bmatrix} + y \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \end{bmatrix}. $$

Look hard at what just happened. We have exactly the same equations — multiply out the left side and you recover $2x + y = 5$ in the top entry and $x - y = 1$ in the bottom. But the question the equation poses now sounds completely different. It is no longer "where do two lines cross?" It is now: what amounts $x$ and $y$ do I need so that $x$ copies of the first column plus $y$ copies of the second column land exactly on the target vector $\mathbf{b} = (5, 1)$? Naming the columns $\mathbf{a}_1 = (2, 1)$ and $\mathbf{a}_2 = (1, -1)$, the question is: for which scalars $x, y$ is

$$x\,\mathbf{a}_1 + y\,\mathbf{a}_2 = \mathbf{b}?$$

That phrase — "$x$ times this vector plus $y$ times that vector" — is exactly the linear combination from Chapter 2. So the column picture recasts the whole problem in the language you already learned:

The Key Insight — The column picture asks: is the right-hand side $\mathbf{b}$ a linear combination of the columns of the system — and if so, with what weights? The weights are the solution. Solving a linear system is identical to expressing the target vector as a combination of the column vectors.

Geometrically, the two columns $\mathbf{a}_1 = (2,1)$ and $\mathbf{a}_2 = (1,-1)$ are arrows in the plane. Scaling $\mathbf{a}_1$ by $x$ stretches it along its own direction; scaling $\mathbf{a}_2$ by $y$ stretches it along its direction; adding the two (tip to tail, exactly as in Chapter 2) lands you somewhere. The question is whether some choice of stretches lands you precisely on the tip of $\mathbf{b}$. For our system the answer is $x = 2,\ y = 1$, and we can watch the two scaled arrows add up to $\mathbf{b}$:

# Column picture: is b a linear combination of the columns a1, a2?
import numpy as np
a1 = np.array([2.0, 1.0])     # first column (x-coefficients)
a2 = np.array([1.0, -1.0])    # second column (y-coefficients)
b  = np.array([5.0, 1.0])     # target on the right-hand side
print(2*a1 + 1*a2)            # -> [5. 1.]  ==  b, so (x, y) = (2, 1)

The output [5. 1.] equals $\mathbf{b}$, confirming that two of the first arrow plus one of the second reaches the target. Figure 3.2 draws this tip-to-tail combination.

# Figure 3.2 — the column picture: 2*a1 + 1*a2 reaches b.
import numpy as np
import matplotlib.pyplot as plt
a1 = np.array([2.0, 1.0]); a2 = np.array([1.0, -1.0]); b = np.array([5.0, 1.0])
fig, ax = plt.subplots(figsize=(5.5, 5.5))
# draw 2*a1, then a2 added tip-to-tail
ax.arrow(0, 0, *(2*a1), color="C3", width=0.04, length_includes_head=True)   # 2 a1
ax.arrow(*(2*a1), *a2, color="C2", width=0.04, length_includes_head=True)    # + a2
ax.arrow(0, 0, *b, color="k", ls="--", width=0.02, length_includes_head=True) # b
ax.text(*(a1), " 2·a1", color="C3"); ax.text(*(2*a1 + 0.5*a2), " a2", color="C2")
ax.text(*b, "  b", color="k")
ax.set_xlim(-1, 6); ax.set_ylim(-2, 4); ax.set_aspect("equal"); ax.grid(True, alpha=0.3)
# plt.show()

Figure 3.2: The column picture for $2x+y=5,\ x-y=1$. The red arrow is $2\mathbf{a}_1$; the green arrow $\mathbf{a}_2$ is added to its tip; the dashed black arrow is the target $\mathbf{b}=(5,1)$, reached exactly. Alt-text: two scaled column arrows added tip-to-tail landing on the target vector b.

Why the column picture is the one that scales

Here is the payoff that makes the column picture worth the extra effort. Recall from Chapter 1 that a matrix times a vector, $A\mathbf{x}$, is a linear combination of the columns of $A$, weighted by the entries of $\mathbf{x}$. That means the entire system can be written in one compact, profound line:

$$A\mathbf{x} = \mathbf{b}, \qquad A = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 5 \\ 1 \end{bmatrix}.$$

The matrix $A$ holds the coefficients column by column (its $j$-th column is the coefficients of the $j$-th unknown); we call it the coefficient matrix. The vector $\mathbf{x}$ holds the unknowns and $\mathbf{b}$ the constants. Every linear system, no matter how large, collapses to the four-symbol statement $A\mathbf{x} = \mathbf{b}$. And in the column reading, this statement says: $\mathbf{b}$ must be reachable as $A$ acting on some input $\mathbf{x}$ — equivalently, $\mathbf{b}$ must lie among the linear combinations of $A$'s columns. The set of all vectors reachable as combinations of the columns is so important it gets a name and a symbol in Chapter 13: the column space $C(A)$. The single question "does $A\mathbf{x} = \mathbf{b}$ have a solution?" becomes "is $\mathbf{b}$ in $C(A)$?" — a question about geometry, not arithmetic.

Geometric Intuition — Row picture: the solution is where surfaces (lines, planes) intersect. Column picture: the solution is the recipe of weights that builds $\mathbf{b}$ out of the column vectors. Same equations, same answer, two complementary stories. When a system is inconsistent, the row picture says "the surfaces share no common point" and the column picture says "$\mathbf{b}$ lies outside the reach of the columns — no combination of them can build it." Learn to flip between the two views at will; the best linear algebraists hold both in mind at once.

Real-World Application (recommender systems / data science) — In a matrix-factorization recommender, each user's taste is a short vector and each movie is a short vector, and a predicted rating is built as a combination of latent "taste columns." Asking "which blend of latent factors reproduces this user's known ratings?" is exactly the column-picture question $A\mathbf{x}=\mathbf{b}$: express the observed ratings vector as a linear combination of factor columns. We develop the linear-algebra machinery behind such recommenders across Chapters 13, 30, and 33; the seed is right here, in the column reading of a humble 2×2 system. For readers who want to see systems as code first, the way these coefficient grids are stored and indexed is the same array idea covered in representing systems in code.

3.4 How do you solve a system, and what does numpy actually return?

We now know what a solution means in both pictures. How you compute it — the elimination algorithm — is the subject of Chapter 4, and we will honor that boundary: no row reduction mechanics here. But you should see, right now, how to ask the computer for a solution and, just as importantly, how to predict what it will do in each of the three cases. That predictive skill is the real learning goal; the function call is trivial.

For a system with a unique solution — a square coefficient matrix whose rows/columns are not redundant — numpy's workhorse is np.linalg.solve(A, b). It returns the one vector $\mathbf{x}$ satisfying $A\mathbf{x} = \mathbf{b}$:

# Unique solution: 2x + y = 5, x - y = 1.
import numpy as np
A = np.array([[2.0, 1.0],
              [1.0, -1.0]])
b = np.array([5.0, 1.0])
x = np.linalg.solve(A, b)
print(x)                      # -> [2. 1.]
print(A @ x)                  # -> [5. 1.]   (recovers b: a sanity check)

The result [2. 1.] is our point $(x, y) = (2, 1)$, and multiplying $A$ by it recovers $\mathbf{b}$ — always sanity-check a solution by plugging it back in. A 3×3 unique system works the same way; we already saw np.linalg.solve return $(5, 3, -2)$ for three planes meeting at a point in §3.2.

But what happens when the system is not nicely unique — when it is inconsistent (no solution) or dependent (infinitely many)? Both of those correspond to a coefficient matrix that is singular — geometrically, its columns fail to point in genuinely independent directions, so they cannot reach every target, and its rows describe surfaces that fail to pin down a single point. np.linalg.solve is built only for the unique case, so it refuses to guess:

# Inconsistent: x + y = 2 and x + y = 5 (parallel lines, NO solution).
import numpy as np
A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
b = np.array([2.0, 5.0])
try:
    x = np.linalg.solve(A, b)
except np.linalg.LinAlgError as e:
    print("LinAlgError:", e)   # -> LinAlgError: Singular matrix

# Dependent: x + y = 2 and 2x + 2y = 4 (same line, INFINITELY many solutions).
import numpy as np
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
b = np.array([2.0, 4.0])
try:
    x = np.linalg.solve(A, b)
except np.linalg.LinAlgError as e:
    print("LinAlgError:", e)   # -> LinAlgError: Singular matrix

Computational Note — Notice that np.linalg.solve raises the same LinAlgError: Singular matrix for the inconsistent case (no solution) and the dependent case (infinitely many). The function detects only that the coefficient matrix is singular — it does not, and cannot, tell you which of the two non-unique situations you are in. To distinguish "no solution" from "infinitely many," you must look further: compare the rank of the coefficient matrix $A$ with the rank of the augmented matrix $[A \mid \mathbf{b}]$ (we do exactly this in §3.7), or fall back on a least-squares routine like np.linalg.lstsq, which always returns something and reports the rank it found. In exact arithmetic the determinant of a singular matrix is $0$; in floating point you will often see a tiny nonzero value like -2.99…e-16 instead, which is why numpy tests singularity by a tolerance, not by det == 0.

A subtle and practical point: np.linalg.solve insists on a square $A$ (same number of equations as unknowns). For a non-square system — more equations than unknowns, or fewer — you reach for np.linalg.lstsq, which finds the best-fit solution in a least-squares sense (the closest reachable point when $\mathbf{b}$ is out of range, or one particular solution when there are many). It also hands back the matrix rank, which is gold for diagnosing your case:

# lstsq always returns something AND reports the rank it detected.
import numpy as np
A_dep = np.array([[1.0, 1.0], [2.0, 2.0]]); b_dep = np.array([2.0, 4.0])
sol, residual, rank, sv = np.linalg.lstsq(A_dep, b_dep, rcond=None)
print("one solution:", sol, " rank:", rank)   # -> one solution: [1. 1.]  rank: 1

For the dependent system $x+y=2,\ 2x+2y=4$, lstsq returns the particular solution $(1, 1)$ — which indeed satisfies $x+y=2$ — and reports rank: 1, signaling that the matrix has only one independent row, not two. That rank-deficiency is the algebraic fingerprint of "the two equations are really one." We will turn this fingerprint into a precise theorem in §3.7. (You will implement the underlying elimination from scratch in Chapter 4's toolkit module — np.linalg.lstsq is the reference you will check against.)

Common Pitfall — Seeing np.linalg.lstsq return a single clean vector for a dependent system tempts students to report "the solution is $(1,1)$." But the dependent system has infinitely many solutions — every point on the line $x+y=2$, i.e. $(1+t,\ 1-t)$ for all real $t$. lstsq returned just one representative (the minimum-norm one). Always interpret the rank: when rank < number of unknowns and the system is consistent, the true answer is a whole family, not a point. The single vector is a sample, not the solution set.

3.5 When does a system have no solution? Inconsistency, geometrically

A system is inconsistent when its solution set is empty — no assignment of the unknowns satisfies all equations at once. We have seen the two-line version (distinct parallel lines), but it is worth meeting inconsistency in its purest algebraic form, because that form recurs constantly. Subtract the first equation of

$$ \begin{aligned} x + y &= 2,\\ x + y &= 5 \end{aligned} $$

from the second and you obtain $0 = 3$ — a flatly false statement, with the unknowns gone entirely. Whenever honest manipulation of a system produces a contradiction like $0 = 3$ (a row of all-zero coefficients with a nonzero right-hand side), the system is inconsistent and has no solution. The contradiction is the algebra's way of announcing that the surfaces never meet.

In three dimensions, inconsistency need not come from parallel planes. Consider three planes whose normal directions are coplanar (the third equation's coefficients are the sum of the first two's) but whose constants do not match up:

$$ \begin{aligned} x + y &= 1,\\ y + z &= 1,\\ x + 2y + z &= 5. \end{aligned} $$

The left side of the third equation is exactly (first) + (second): $(x+y) + (y+z) = x + 2y + z$. So the left sides force $5 = 1 + 1 = 2$ — impossible. No two of these planes are parallel, yet they share no common point: they bound a triangular prism. numpy confirms the inconsistency by refusing to solve and by the rank test:

# Three planes, no common point (a triangular prism). Inconsistent.
import numpy as np
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 2.0, 1.0]])   # row3 = row1 + row2
b = np.array([1.0, 1.0, 5.0])     # but 1 + 1 = 2 != 5  -> contradiction
print("rank(A)      =", np.linalg.matrix_rank(A))                       # -> 2
print("rank([A|b])  =", np.linalg.matrix_rank(np.column_stack([A, b]))) # -> 3

The coefficient matrix has rank $2$ (only two genuinely independent rows), but appending $\mathbf{b}$ raises the rank to $3$: the constant column $\mathbf{b}$ carries information that the coefficient rows cannot account for. That mismatch — rank([A|b]) > rank(A) — is the exact, general signature of inconsistency, in any dimension. In the column picture it reads beautifully: $\mathbf{b}$ pokes outside the span of the columns, so no combination of them can build it.

Real-World Application (network flow / engineering) — An inconsistent system is not just a math curiosity; in applications it usually means your model is over-constrained or your data is contradictory. If you write conservation equations for current at the nodes of a circuit, or for cars at the intersections of a road network, and the resulting system is inconsistent, it is telling you that no flow can satisfy all your stated constraints simultaneously — perhaps you double-counted a constraint or specified incompatible inflows. The "$0 = 3$" contradiction is the model shouting that your assumptions cannot all be true. We work a concrete traffic-network system in this chapter's case studies.

3.6 When does a system have infinitely many solutions? Dependence and free variables

The opposite failure of uniqueness is redundancy: some equations repeat information already carried by others, leaving the unknowns underdetermined. The cleanest example is two equations describing the same line:

$$ \begin{aligned} x + y &= 2,\\ 2x + 2y &= 4. \end{aligned} $$

The second equation is exactly twice the first; it adds no new constraint. Effectively we have one equation in two unknowns, and a single line's worth of points satisfies it. We describe that infinite family with a free variable: let $y = t$ be free to take any real value, and then the equation $x + y = 2$ forces $x = 2 - t$. The complete solution set is

$$(x, y) = (2 - t,\ t), \qquad t \in \mathbb{R}.$$

Every choice of $t$ gives a genuine solution, and every solution arises from some $t$. The single free parameter is the algebraic echo of the geometric fact that the solution set is a one-dimensional line, not a zero-dimensional point. (How you systematically find which variables are free — pivots versus free columns — is, again, Chapter 4's job; here we only need to see that a free variable produces an infinite family and what that family looks like geometrically.)

The hallmark of this case is consistency together with rank deficiency: the equations are not contradictory (so at least one solution exists), but they are redundant (so the solution is not pinned down). In rank language, which we make precise next: rank([A|b]) = rank(A) (consistent) and rank(A) < number of unknowns (under-determined). We can verify both halves:

# Dependent system: consistent AND rank-deficient => infinitely many solutions.
import numpy as np
A = np.array([[1.0, 1.0],
              [2.0, 2.0]])
b = np.array([2.0, 4.0])
print("rank(A)     =", np.linalg.matrix_rank(A))                        # -> 1
print("rank([A|b]) =", np.linalg.matrix_rank(np.column_stack([A, b])))  # -> 1
print("# unknowns  =", A.shape[1])                                      # -> 2
# rank(A) == rank([A|b]) = 1  (consistent), and 1 < 2 (free variable) -> infinitely many

Both ranks equal $1$ (consistent), and $1$ is less than the $2$ unknowns, so there is $2 - 1 = 1$ free variable — a one-parameter family, exactly the line we found by hand. In three dimensions, three planes hinged along a common line behave the same way: rank $2$, consistent, one free variable, a line of solutions.

Geometric Intuition — The number of free variables equals the dimension of the solution set. Zero free variables → a single point (dimension $0$). One free variable → a line (dimension $1$). Two free variables → a plane (dimension $2$). The free variables are coordinates along the solution set, parameterizing how much freedom remains after all the constraints have had their say. This counting idea matures into the rank–nullity theorem in Chapter 14; here it is already visible to the naked eye.

Check Your Understanding — A system in three unknowns has been reduced (you may take this on faith) to the single non-redundant equation $x + 2y + 3z = 6$, with the other two equations turning out to be multiples of it. How many solutions does the system have, and what is the dimension of the solution set?
Answer
Infinitely many. One genuine equation in three unknowns leaves $3 - 1 = 2$ free variables, so the solution set is a plane (dimension $2$) sitting in 3D space. You could parameterize it by letting $y = s$ and $z = t$ be free, giving $x = 6 - 2s - 3t$, so the solutions are $(6 - 2s - 3t,\ s,\ t)$ for all real $s, t$. Two free parameters, a two-dimensional sheet of solutions — exactly the single plane $x+2y+3z=6$ itself.

The special case: homogeneous systems

A system whose right-hand side is all zeros, $A\mathbf{x} = \mathbf{0}$, is called homogeneous. It is never inconsistent — $\mathbf{x} = \mathbf{0}$ (everything zero) always works, since $A\mathbf{0} = \mathbf{0}$. We call $\mathbf{x} = \mathbf{0}$ the trivial solution. So the only question for a homogeneous system is whether the trivial solution is the only one, or whether there are others. Geometrically, every plane in a homogeneous 3D system passes through the origin (because $\mathbf{0}$ satisfies each equation), so the planes always share at least the origin; they share more — a line or plane through the origin — exactly when the system is rank-deficient. Homogeneous systems are the gateway to the null space $N(A)$ of Chapter 13, the set of all inputs a matrix sends to zero. Keep them in your back pocket; they will become central.

Common Pitfall — "A homogeneous system might have no solution." It cannot. There is always at least the trivial solution $\mathbf{x} = \mathbf{0}$. The real question is uniqueness: does $A\mathbf{x} = \mathbf{0}$ have only the zero solution (which happens exactly when $A$'s columns are independent), or does it have a whole subspace of nonzero solutions? Confusing "homogeneous" with "possibly inconsistent" is a common slip — homogeneous systems trade the existence question for the uniqueness question.

3.7 What conditions decide the solution count? Rank, stated precisely (A track)

We have been circling a precise statement; let us land it. Everything about the number of solutions is governed by comparing two ranks. The rank of a matrix is the number of genuinely independent rows it has — equivalently, the number of independent columns (these two counts are always equal, a fact we prove in Chapter 14). Intuitively, rank measures how much non-redundant information a matrix carries. For our purposes here, take rank as "the count of independent equations after duplicates and combinations are stripped away," and trust numpy's np.linalg.matrix_rank to compute it.

Let $A$ be the $m \times n$ coefficient matrix of a system $A\mathbf{x} = \mathbf{b}$ in $n$ unknowns, and let $[A \mid \mathbf{b}]$ be the augmented matrix formed by tacking the constant column $\mathbf{b}$ onto the right of $A$. Then:

Theorem (Solution-count criterion; the Rouché–Capelli theorem [verify]). Consider $A\mathbf{x} = \mathbf{b}$ with $A$ an $m \times n$ matrix. 1. The system is consistent (has at least one solution) if and only if $\operatorname{rank}(A) = \operatorname{rank}([A \mid \mathbf{b}])$. 2. If it is consistent, the solution is unique when $\operatorname{rank}(A) = n$ (the number of unknowns), and there are infinitely many solutions when $\operatorname{rank}(A) < n$. In the infinite case, the number of free variables is exactly $n - \operatorname{rank}(A)$.

Let us unpack each clause, because every word earns its place — and notice we are stating conditions, never presenting a special case as the general rule.

Why consistency is the rank equality. Appending $\mathbf{b}$ can only keep the rank the same or raise it by one. It raises it precisely when $\mathbf{b}$ contributes a direction the columns of $A$ could not already produce — i.e., when $\mathbf{b}$ lies outside the span of $A$'s columns. In the column picture, "$\mathbf{b}$ outside the column span" is inconsistency. So $\operatorname{rank}([A\mid\mathbf{b}]) > \operatorname{rank}(A)$ ⇔ $\mathbf{b}$ unreachable ⇔ no solution; and equality ⇔ $\mathbf{b}$ reachable ⇔ at least one solution. This is the rank-test we used in §3.5 (inconsistent prism: $2$ vs $3$) and §3.6 (dependent line: $1$ vs $1$).

Why uniqueness needs full column rank. Suppose the system is consistent, so a solution $\mathbf{x}_0$ exists. If the columns of $A$ are independent ($\operatorname{rank}(A) = n$), then the only way to write $\mathbf{0}$ as a combination of them is with all-zero weights — the homogeneous system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution. But if $\mathbf{x}_1$ were a second solution to $A\mathbf{x} = \mathbf{b}$, then $A(\mathbf{x}_1 - \mathbf{x}_0) = \mathbf{b} - \mathbf{b} = \mathbf{0}$, so $\mathbf{x}_1 - \mathbf{x}_0$ would be a nonzero solution of the homogeneous system — a contradiction. Hence the solution is unique. Conversely, if $\operatorname{rank}(A) < n$, the columns are dependent, the homogeneous system has nonzero solutions $\mathbf{h}$, and then $\mathbf{x}_0 + \mathbf{h}$ is a different solution of $A\mathbf{x} = \mathbf{b}$ — infinitely many, one for each homogeneous solution.

Math-Major Sidebar (skippable). The argument above quietly proves a structural gem worth stating on its own: the solution set of a consistent system $A\mathbf{x}=\mathbf{b}$ is a coset (a parallel translate) of the solution set of the homogeneous system $A\mathbf{x}=\mathbf{0}$. Concretely, every solution has the form $\mathbf{x}_0 + \mathbf{h}$, where $\mathbf{x}_0$ is any one particular solution and $\mathbf{h}$ ranges over the null space $N(A) = \{\mathbf{h} : A\mathbf{h} = \mathbf{0}\}$. So the geometry of the solution set is "a point ($\mathbf{x}_0$) plus a subspace ($N(A)$)" — a point, a line, a plane, depending on $\dim N(A) = n - \operatorname{rank}(A)$. This "particular + homogeneous" decomposition is the same structure you meet for linear differential equations in calculus and physics, and it is the reason linear problems are so much friendlier than nonlinear ones. We make $N(A)$ a first-class object in Chapter 13 and prove the dimension count, the rank–nullity theorem, in Chapter 14.

For the most common case in practice — a square system with $n$ equations in $n$ unknowns — the theorem specializes to a clean dichotomy that ties together everything in this book's early chapters:

The Key Insight (square systems). For a square system $A\mathbf{x} = \mathbf{b}$ ($A$ is $n \times n$), exactly one of two things happens: - $A$ is invertible ($\operatorname{rank}(A) = n$, equivalently $\det(A) \neq 0$, equivalently the columns are independent): there is exactly one solution, for every choice of $\mathbf{b}$. This is the case np.linalg.solve is built for. - $A$ is singular ($\operatorname{rank}(A) < n$, equivalently $\det(A) = 0$): depending on $\mathbf{b}$, there are either no solutions or infinitely many — never exactly one. This is the case where np.linalg.solve raises LinAlgError: Singular matrix.

This dichotomy is why the determinant, the inverse, and rank — the subjects of Chapters 9, 11, and 13–14 — are not arbitrary topics but answers to the single question "how many solutions does $A\mathbf{x}=\mathbf{b}$ have?" The whole early architecture of linear algebra is a refined study of this one problem the Nine Chapters posed two millennia ago.

Check Your Understanding — A square $4 \times 4$ system has $\det(A) = 0$ and you are also told it is consistent (a solution exists). How many solutions are there, and at minimum how many free variables?
Answer
Infinitely many. $\det(A) = 0$ means $A$ is singular, so $\operatorname{rank}(A) < 4$ — at most $3$. Being consistent rules out "no solutions," so by the dichotomy the only remaining possibility is infinitely many. The number of free variables is $n - \operatorname{rank}(A) = 4 - \operatorname{rank}(A) \geq 4 - 3 = 1$, so there is at least one free variable (the solution set is at least a line, possibly a plane or larger). Exactly how many depends on the precise rank.

3.8 How big can these systems get? From two unknowns to the whole web

Everything so far lived in two or three unknowns, where we can see the lines and planes. But the reason linear systems run the modern world is that the same structure — $A\mathbf{x} = \mathbf{b}$, with its three solution outcomes governed by rank — holds for systems with thousands, millions, or billions of unknowns, where no picture is possible but the geometry still guides the thinking. This is the third recurring theme of the book in action: computation validates theory, and theory guides computation. The two-dimensional intuition you built with cardboard and a pencil is exactly the intuition that organizes a billion-dimensional system; only the arithmetic gets harder, and that we hand to the computer.

The grandest example, and the anchor we are seeding for later, is ranking the web. Imagine the entire World Wide Web as a graph: each page is a node, each hyperlink an arrow from one page to another. Larry Page and Sergey Brin's insight, around 1998, was that a page's importance should be defined self-referentially: a page is important if important pages link to it. Write $r_i$ for the importance ("rank") of page $i$. Then $r_i$ should be a weighted sum of the ranks of all pages linking to it — and that is a linear equation in the unknown ranks. Stack one such equation per page and you get a colossal linear system in as many unknowns as there are web pages — billions of them.

Real-World Application (the web — seeding PageRank). PageRank turns "which web pages matter most?" into a single enormous linear system. Each equation says a page's score is a blend of the scores of the pages pointing to it; together they form $\mathbf{r} = M\mathbf{r}$ for a giant link matrix $M$, a system with one unknown per web page. Written as $(I - M)\mathbf{r} = \mathbf{0}$ it is a homogeneous system, and its nontrivial solution — the importance vector — is exactly an eigenvector of $M$ with eigenvalue $1$ (you will meet eigenvectors in Chapter 23). Solving it for billions of pages by elimination is hopeless; instead it is solved by iteration, repeatedly multiplying by $M$ until the vector settles. We will set up the full system in Chapter 29 and watch power iteration converge to the answer. For now, hold onto the headline: ranking the entire web is, at bottom, a single linear system / eigenvector problem — the very thing this chapter is about, scaled to the size of the internet.

We can preview the idea in miniature with a three-page web, just to make it concrete (the eigenvector machinery is Chapter 23/29 — here we only peek):

# A 3-page web, link matrix M (column-stochastic). The "rank vector" is the
# steady state r with M r = r. We only PEEK here; we solve it properly in Ch.29.
import numpy as np
M = np.array([[0.0, 0.0, 1.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
vals, vecs = np.linalg.eig(M)
i = np.argmin(np.abs(vals - 1))          # the eigenvalue equal to 1
r = np.real(vecs[:, i]); r = r / r.sum() # normalize to sum to 1
print(np.round(r, 3))                    # -> [0.4 0.2 0.4]

The output $[0.4,\ 0.2,\ 0.4]$ says pages 1 and 3 are equally important and twice as important as page 2 — the steady-state importance that satisfies the self-referential system. Do not worry about how the eigenvector was found; the point is that a question about the web's structure became a linear system, and its solution is a vector of scores. That is the promise of this subject made vivid: learn the small case, inherit the large one.

There is one more lesson hiding in the jump from three pages to three billion, and it shapes a whole later part of this book. The elimination algorithm you will learn in Chapter 4 is wonderful for systems of modest size, but its cost grows roughly with the cube of the number of unknowns — triple the unknowns and the work multiplies by about twenty-seven. For a system the size of the web, that cost is not merely large; it is astronomically, hopelessly beyond any computer that will ever exist. So at genuine scale, nobody solves $A\mathbf{x} = \mathbf{b}$ by elimination. Instead, they exploit the structure of the particular system — the web's link matrix is enormous but mostly zeros, since any one page links to only a handful of others — and they solve it by iteration: start with a guess, multiply by the matrix, and repeat until the vector stops changing. Each multiplication is cheap because of all those zeros, and the answer emerges after surprisingly few rounds. This split between direct methods (elimination, Chapter 4) and iterative methods (Chapter 38, and the power iteration behind PageRank in Chapter 29) is one of the central dramas of numerical linear algebra, and it begins right here, with the simple observation that "how many solutions, and what are they?" and "how do we actually compute them at scale?" are different questions with different answers.

Build Your Toolkit — You will build the solver itself in Chapter 4 (the module toolkit/linear_systems.py, with gaussian_elimination(A, b) and friends, verified against np.linalg.solve). To prime that work, do not solve anything yet — instead, sharpen your prediction. In a scratch file, set up these four systems as numpy arrays A and b, and for each one write down your prediction — none / exactly one / infinitely many — using only the row picture, the column picture, and the rank test from §3.7, before running anything: python import numpy as np systems = { "S1": (np.array([[2.,1.],[1.,-1.]]), np.array([5.,1.])), # ? "S2": (np.array([[1.,1.],[1.,1.]]), np.array([2.,5.])), # ? "S3": (np.array([[1.,1.],[2.,2.]]), np.array([2.,4.])), # ? "S4": (np.array([[1.,1.,0.],[0.,1.,1.],[1.,2.,1.]]), np.array([1.,1.,5.])), # ? } for name, (A, b) in systems.items(): ra = np.linalg.matrix_rank(A) rab = np.linalg.matrix_rank(np.column_stack([A, b])) print(name, "rank(A)=", ra, "rank([A|b])=", rab, "n=", A.shape[1]) Then check your predictions against the ranks: equal ranks with rank = n → one; equal ranks with rank < n → infinitely many; unequal → none. (Answers: S1 one, S2 none, S3 infinitely many, S4 none.) Keep this file; in Chapter 4 your hand-built gaussian_elimination will reproduce these verdicts mechanically, and you will appreciate having predicted them first.

3.9 What is the augmented matrix, and why bother with it?

We have used the augmented matrix $[A \mid \mathbf{b}]$ informally; let us pin it down, because it is the object Chapter 4's algorithm actually operates on. To form it, write the coefficient matrix $A$ and then append $\mathbf{b}$ as one extra column on the right, conventionally separated by a vertical bar to remind you that the last column plays a special role (it is constants, not coefficients). For our running system:

$$ \left[\begin{array}{cc|c} 2 & 1 & 5 \\ 1 & -1 & 1 \end{array}\right]. $$

Each row of this array is one equation, stripped down to its numbers — the unknowns $x, y$ are implicit in the column positions. The reason this bookkeeping is worth the trouble is that the operations you are allowed to perform on a system without changing its solution set — swapping two equations, scaling an equation by a nonzero number, adding a multiple of one equation to another — become simple operations on the rows of this array. That is precisely what Chapter 4 exploits: it manipulates rows of the augmented matrix to peel the system apart. For now, just internalize the encoding and the rank test that rides on it:

# Build the augmented matrix and run the consistency / uniqueness test.
import numpy as np
A = np.array([[2.0, 1.0],
              [1.0, -1.0]])
b = np.array([5.0, 1.0])
aug = np.column_stack([A, b])
print(aug)
# [[ 2.  1.  5.]
#  [ 1. -1.  1.]]
print("rank(A)=", np.linalg.matrix_rank(A),
      "rank(aug)=", np.linalg.matrix_rank(aug),
      "n=", A.shape[1])
# rank(A)= 2 rank(aug)= 2 n= 2  -> consistent and rank == n -> unique solution

Equal ranks ($2 = 2$) say consistent; the rank equals the number of unknowns ($2 = 2$) say unique. The augmented matrix has converted the entire solvability question into two integer comparisons — a foreshadowing of how mechanical Chapter 4 will make the whole business.

Computational Note — np.linalg.matrix_rank decides rank numerically, by counting singular values above a tolerance (it uses the SVD of Chapter 30 under the hood). For clean integer matrices it returns exactly what you expect, but for matrices built from messy floating-point data, a "should-be-zero" singular value can come out as 1e-16 and be correctly ignored, or a genuinely tiny-but-nonzero value can sit right at the tolerance and make the reported rank sensitive to round-off. This is your first brush with the theme of numerical linear algebra (Chapter 38): the clean integer answers of theory and the tolerance-dependent answers of floating point can diverge, and knowing when to trust which is a real skill.

3.10 Why are linear systems "linear"? A look back at superposition

It is worth pausing to connect this chapter to the very first idea of the book — linearity as superposition (Chapter 1). Why do these systems deserve the name linear, and why does that name promise the good behavior we have seen? Because the map $\mathbf{x} \mapsto A\mathbf{x}$ is a linear transformation: it satisfies $A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v}$ and $A(c\mathbf{u}) = cA\mathbf{u}$, the two rules from Chapter 1. The solution-set structure we discovered in §3.7 — "particular solution plus homogeneous solutions" — is a direct consequence of superposition: if $A\mathbf{x}_0 = \mathbf{b}$ and $A\mathbf{h} = \mathbf{0}$, then by linearity $A(\mathbf{x}_0 + \mathbf{h}) = A\mathbf{x}_0 + A\mathbf{h} = \mathbf{b} + \mathbf{0} = \mathbf{b}$. Superposition is why the solution set is a flat object (a point, line, or plane) rather than some curved, complicated set.

To make that abstract structure concrete, return to the dependent system $x + y = 2,\ 2x + 2y = 4$ from §3.6, which we found had solution set $(2 - t,\ t)$. We can now read that family through the superposition lens. A particular solution — any single point that works — is $\mathbf{x}_0 = (2, 0)$ (check: $2 + 0 = 2$ ✓). The associated homogeneous system $x + y = 0,\ 2x + 2y = 0$ has solutions $\mathbf{h} = (-t,\ t) = t\,(-1, 1)$, a line through the origin in the direction $(-1, 1)$. Superposition then guarantees that every solution is $\mathbf{x}_0 + \mathbf{h} = (2, 0) + t(-1, 1) = (2 - t,\ t)$ — precisely the family we found by hand, now seen as "one particular point, plus the entire line of homogeneous solutions slid through it." The solution set is a line parallel to the homogeneous line $N(A)$ but passing through $\mathbf{x}_0$ instead of the origin. This is the geometric meaning of the Math-Major Sidebar's coset: the homogeneous solutions are a subspace anchored at the origin, and the inhomogeneous solutions are that same subspace translated to a particular point. Hold this picture; it is the skeleton of Chapters 13 and 14.

This is also why the three-outcome rule is special to linear systems. A nonlinear system — say $x^2 + y^2 = 1$ and $y = x^2$ — can have two solutions, or four, or any number, because circles and parabolas can cross in many points. The clean trichotomy "none / one / infinitely many" is a privilege of linearity, paid for by the straightness and even spacing of the grid lines that Chapter 1 used to define linear transformations. Every time you exploit that trichotomy, you are cashing in superposition. Keep this connection in view: the systems of this chapter and the transformations of Chapter 1 are two angles on one object, $A$, and the next chapter's algorithm is simply the most efficient way to interrogate it.

Geometric Intuition — Tie the two pictures to the transformation view one last time. The matrix $A$ is a machine that takes the input arrow $\mathbf{x}$ and transforms it into the output arrow $A\mathbf{x}$. Solving $A\mathbf{x} = \mathbf{b}$ asks: which input does the machine turn into the specific output $\mathbf{b}$? If the machine is invertible (it doesn't squash space flat), every output has a unique input — one solution. If the machine collapses space (singular — it flattens 2D onto a line, say), then some outputs are unreachable (no input maps there → no solution) and the reachable ones have a whole line of inputs mapping to them (→ infinitely many). The determinant, in Chapter 11, will measure exactly this collapse. Three chapters, one object, seen from three sides.

3.11 Putting it together: a worked classification

Let us consolidate by classifying one more system by reasoning, not by solving — exercising both pictures and the rank test together. Consider

$$ \begin{aligned} x + 2y + 3z &= 6,\\ 2x + 4y + 6z &= 12,\\ x - y + z &= 2. \end{aligned} $$

Row picture first. Each equation is a plane in space. Look at the first two: the second is exactly $2\times$ the first (left side and right side both double), so they describe the same plane. We really have only two distinct planes — the plane $x+2y+3z=6$ and the plane $x-y+z=2$ — and two non-parallel planes in space meet in a line. So we expect infinitely many solutions, forming a line.

Column picture cross-check. In $A\mathbf{x}=\mathbf{b}$ form, is $\mathbf{b}=(6,12,2)$ reachable as a combination of the three columns? It must be, because we can already see the system is consistent (no contradiction arises — the redundant equation agrees, $12 = 2\times 6$). And because the rows are dependent (rank $< 3$), the columns are too, so the combination is not unique: a one-parameter family of weights reaches $\mathbf{b}$.

Rank test confirms. Let numpy referee:

# Classify x+2y+3z=6, 2x+4y+6z=12, x-y+z=2 by rank — do NOT solve.
import numpy as np
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, -1.0, 1.0]])
b = np.array([6.0, 12.0, 2.0])
print("rank(A)     =", np.linalg.matrix_rank(A))                        # -> 2
print("rank([A|b]) =", np.linalg.matrix_rank(np.column_stack([A, b])))  # -> 2
print("# unknowns  =", A.shape[1])                                      # -> 3

Both ranks are $2$ (consistent), and $2 < 3$, so there are $3 - 2 = 1$ free variables — a line of solutions, exactly as the row picture predicted. Three independent lines of reasoning — rows, columns, and rank — converge on one verdict. That convergence, the habit of cross-checking geometry against algebra against computation, is the working style this book is training into you. (And had we tried np.linalg.solve(A, b) here, it would have raised LinAlgError: Singular matrix, refusing to pick one point from an infinite line — precisely the behavior §3.4 taught you to expect.)

Common Pitfall — When two equations look different but are secretly proportional (like the first two above, $1{:}2{:}3{:}6$ versus $2{:}4{:}6{:}12$), students count them as two constraints and wrongly expect a unique solution from "three equations, three unknowns." Always check for hidden dependence: is one equation a scalar multiple of another, or a sum of others? The rank, not the row count, is the true number of constraints. A $3\times 3$ system with rank $2$ behaves like a $2$-equation system, with all the under-determination that implies.

3.12 How do you turn a word problem into a system of linear equations?

So far our systems arrived pre-packaged as equations. In practice — and in nearly every application in this book — the system is hidden inside a paragraph of words, and the real skill in learning how to solve a system of linear equations is the translation step that comes before any solving. There is a reliable four-move recipe, and it leans on exactly the two pictures you now own.

First, name the unknowns. Decide what quantities you are solving for and give each a letter. The number of unknowns is the dimension of the space you are working in — two unknowns live in the plane, three in space, and the geometry of the row and column pictures applies directly. In the mixture problem of the exercises, the unknowns are the millilitres of two stock solutions; in the circuit case study, they are the three branch currents.

Second, find the constraints, one equation per independent condition. Every sentence that says "this combination of the unknowns equals that number" is a candidate linear equation. Conservation statements ("what goes in equals what comes out"), totals ("the parts sum to the whole"), and balance conditions ("the value here equals the value there") are the usual sources, and they are linear whenever the unknowns appear only to the first power, scaled and added. Watch for the trap that gives a dependent system: a condition that is secretly a consequence of others (the redundant loop equation, the total that is already implied) adds a row but not a constraint, and the rank — not the sentence count — is what truly matters.

Third, write it as $A\mathbf{x} = \mathbf{b}$, and pause to predict the solution count before solving. Assemble the coefficient matrix $A$ column by column (each column is the coefficients of one unknown), stack the constants into $\mathbf{b}$, and ask the rank question from Section 3.7. Is the coefficient matrix square and likely invertible (expect a unique answer), or do you have more conditions than unknowns (watch for inconsistency), or fewer (expect a family of solutions)? This prediction is not busywork: it tells you in advance whether to reach for solve (unique) or lstsq (over- or under-determined), and it catches modeling errors early, because an unexpectedly singular system usually means a redundant or contradictory assumption slipped in.

Fourth, solve, then interpret in the original words. Only now do you compute — by hand in small cases, or with numpy at scale — and then translate the numbers back into the language of the problem: these millilitres, those currents, that production level. The interpretation step is where the column picture pays off again, because the solution is literally the recipe of weights that combines the columns to meet the demand, and that recipe usually has a concrete real-world reading. Throughout the rest of this book, almost every application — regression, graphics, recommenders, PageRank — follows this same arc from words to $A\mathbf{x} = \mathbf{b}$ to interpretation, so the habit you build here will repay itself many times over.

Common Pitfall — When translating words to equations, students often write fewer genuinely independent equations than they think, because two of their sentences say the same thing in different words. The result is an accidental dependent system with infinitely many solutions where they expected one — and the temptation is to "fix" it by inventing an extra equation that is not actually justified by the problem, which silently changes the model. The discipline is: write only the constraints the problem truly gives you, count the rank, and if the answer is under-determined, accept that the problem genuinely has a family of solutions (or go find the missing real-world condition, like an extra measurement).

3.13 What you should carry forward

You arrived at this chapter able to add and scale vectors; you leave it able to interrogate a whole system of constraints before lifting a pencil to solve it. The single most valuable habit you have practiced is reading every system in two pictures at once — the row picture (where do the surfaces meet?) and the column picture (is $\mathbf{b}$ a combination of the columns?) — and recognizing that they describe the same thing. That double vision is not a stylistic flourish; it is the organizing principle of the next thirty-seven chapters. The row picture leads into Chapter 4's elimination algorithm and the geometry of solution sets; the column picture leads straight into Chapters 13–14's column space, null space, and rank — the four fundamental subspaces that are the backbone of the whole book.

You also now know the whole truth about how many solutions a linear system can have — none, exactly one, or infinitely many, and nothing else — and why: it is superposition, the linearity from Chapter 1, expressed as the flatness of the solution set. You can state the conditions precisely (the rank comparison of $A$ and $[A\mid\mathbf{b}]$), you can predict numpy's behavior (np.linalg.solve for the unique case, a Singular matrix error otherwise, np.linalg.lstsq and the rank as your diagnostic), and you have seen the idea scale all the way up to ranking the entire web. What you cannot yet do — systematically solve a large system by hand — is exactly the gap Chapter 4 fills, with row reduction. Turn the page knowing what the answer means before you learn how to compute it; that ordering is the whole pedagogy of this book.

Historical Note — The elimination method for solving linear systems appears in the Chinese Nine Chapters on the Mathematical Art (c. 200 BCE–200 CE) [verify], and the technique now bears the name of Carl Friedrich Gauss (1777–1855), who used it in computing the orbit of the asteroid Ceres around 1809 [verify]. The systematic study of when systems are solvable — the rank criterion of §3.7 — is credited to Eugène Rouché and Alfredo Capelli in the late nineteenth century [verify]. The matrix notation $A\mathbf{x} = \mathbf{b}$ that lets us write a billion equations in four symbols came later still, with Arthur Cayley's matrix algebra (1858) [verify]. Two millennia separate the grain-bundle problem from PageRank, but the question — for which unknowns do all these linear conditions hold at once? — is unchanged.