The Inverse Matrix: Undoing a Transformation

DataField.Dev

51 min read

> Learning paths. Math majors — read everything, especially the proof that a one-sided inverse of a square matrix is automatically two-sided (§9.7) and the Math-Major Sidebars on the invertible matrix theorem and on left/right inverses of non-square...

Prerequisites

chapter-08-matrix-operations

Learning Objectives

Explain the inverse matrix geometrically as the transformation that exactly undoes A, reversing its stretch, rotation, shear, or reflection.
State the equivalent conditions for invertibility precisely (bijective map, nonzero determinant, full rank, a pivot in every row and column) and use any one to decide whether a matrix is invertible.
Recognize singular (non-invertible) matrices as the transformations that collapse space onto a lower dimension and explain why collapse destroys the information needed to undo it.
Compute the inverse of a matrix by Gauss-Jordan elimination on the augmented array [A | I], by hand for 2x2 and 3x3 and in numpy for any size.
Apply and prove the algebraic properties of inverses, including the order-reversing rule (AB)^{-1} = B^{-1}A^{-1} and (A^T)^{-1} = (A^{-1})^T.
Explain why solving Ax = b directly (elimination / LU) is preferred over forming A^{-1}b, and implement inverse(A) from scratch via Gauss-Jordan in toolkit/inverse.py, verified against numpy.

In This Chapter

9.1 What does it mean to undo a transformation?
9.2 When can a transformation be undone? (Bijectivity)
9.3 What is a singular matrix, and why can't you reverse it?
9.4 Undoing a transformation in the visualizer
9.5 How do you find the inverse of a matrix? (Gauss-Jordan on [A | I])
9.6 When is a matrix invertible? (The conditions, precisely)
9.7 Why does a one-sided inverse of a square matrix have to be two-sided? (A proof)
9.8 Why do you rarely actually invert a matrix to solve Ax = b?
9.9 What rules do inverses obey? (Properties)
9.10 Build Your Toolkit
9.11 What did we just learn, and where does it go?

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

The Inverse Matrix: Undoing a Transformation

Learning paths. Math majors — read everything, especially the proof that a one-sided inverse of a square matrix is automatically two-sided (§9.7) and the Math-Major Sidebars on the invertible matrix theorem and on left/right inverses of non-square matrices. CS / Data Science — focus on the Geometric Intuition callouts, the Gauss-Jordan algorithm in §9.5, the numpy snippets, and especially the §9.8 warning about not inverting to solve; the proofs build intuition but the sidebars are optional. Physics / Engineering — focus on the geometry of "undo," the invertibility conditions, and the circuit/economics applications; keep the picture of the unit square springing back to its original shape in your head. This is the chapter where the verb of Chapter 7 — a matrix transforms space — acquires a past tense: can we put space back?

A note on this chapter. Chapter 7 made a matrix a verb; Chapter 8 taught us to chain verbs together by composition — $AB$ means "do $B$, then do $A$." The natural next question is the one a child asks after knocking over a tower: can we undo it? For transformations the answer is sometimes yes and sometimes a flat, irreversible no, and the whole chapter is about telling the two apart and, when the answer is yes, computing the reversal. We will not present the inverse as a formula handed down from above; we will earn it geometrically (the undo), characterize exactly when it exists (the invertible matrix theorem), and only then compute it (Gauss-Jordan on $[A\,|\,I]$). And we will end with a warning the textbooks too often whisper: in real computation you almost never form the inverse at all.

9.1 What does it mean to undo a transformation?

In Chapter 7 you learned to read a matrix as a motion of space: $\begin{bmatrix}0 & -1 \\ 1 & 0\end{bmatrix}$ is a quarter-turn, $\begin{bmatrix}2 & 0 \\ 0 & 2\end{bmatrix}$ is a zoom, $\begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}$ is a shear that slides the top of the plane rightward. Every one of these takes the flat sheet of graph paper and moves it. Now picture pressing a rewind button. The shear slid the plane right; unsliding it means sliding the plane back left by the same amount. The quarter-turn spun the plane counterclockwise; unturning it means spinning clockwise by the same quarter. The zoom doubled every distance; unzooming means halving every distance. Each of these reversals is itself a transformation — a motion of space — and therefore each is itself a matrix. The inverse of $A$, written $A^{-1}$, is the matrix that performs the reverse motion.

That is the entire idea of this chapter, and it is worth saying before any algebra: $A^{-1}$ is the transformation that undoes what $A$ did. If you apply $A$ and then apply $A^{-1}$, you are back exactly where you started — every point of the plane returned to its original position. The combined motion "do $A$, then undo it" is the motion that does nothing at all, which Chapter 7 named the identity $I$. In the language of composition from Chapter 8, where stacking transformations is matrix multiplication, this reads

$$A^{-1}A = I \qquad\text{and}\qquad AA^{-1} = I.$$

The first equation says "do $A$, then $A^{-1}$, and you've done nothing"; the second says the same with the order flipped — undoing first and then doing also lands you home. (For square matrices these two conditions turn out to be equivalent, a small miracle we prove in §9.7.)

The Key Insight — The inverse matrix $A^{-1}$ is the transformation that exactly reverses $A$. Geometrically: $A$ moves space, $A^{-1}$ moves it back. Algebraically: $A^{-1}A = AA^{-1} = I$, the do-nothing transformation. Searching "how to find the inverse of a matrix" really means searching for "how to write down the motion that retraces $A$ step by step," and that reframing makes every formula in this chapter inevitable rather than arbitrary.

Let's make the picture concrete with the simplest reversible motion. The shear $A = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}$ leaves east alone and slides north one unit to the right, sending $\mathbf{e}_2 = (0,1)$ to $(1,1)$. To undo it, leave east alone again and slide the tilted north back one unit to the left, returning $(1,1)$ to $(0,1)$ — that is the shear $\begin{bmatrix}1 & -1 \\ 0 & 1\end{bmatrix}$. So we expect $A^{-1} = \begin{bmatrix}1 & -1 \\ 0 & 1\end{bmatrix}$, and we can confirm it the geometric way (does $A^{-1}A$ leave the basis vectors fixed?) before we ever learn a computational recipe:

# The shear and its undo: A^{-1} A should be the identity (does nothing).
import numpy as np
A    = np.array([[1, 1], [0, 1]])
Ainv = np.array([[1, -1], [0, 1]])    # slide north back to the left
print(Ainv @ A)     # do A, then undo it
print(A @ Ainv)     # undo first, then do

[[1 0]
 [0 1]]
[[1 0]
 [0 1]]

Both products are the identity, exactly as the geometry promised: shearing right and then shearing left by the same amount returns every point home. We guessed this inverse from the picture; §9.5 gives an algorithm so you never have to guess.

What is the inverse of a matrix, intuitively?

It is the rewind button. If $A$ is a recipe for moving every point of space to a new location, then $A^{-1}$ is the recipe for moving every point back to where it came from. The reason a matrix can have an inverse at all is that some transformations are reversible in this clean way — a rotation can be un-rotated, a stretch un-stretched — and the reason a matrix can fail to have one is that some transformations destroy information that no later motion can recover. The next two sections develop exactly that distinction, because it is the heart of the chapter: invertibility is the geometry of "did we lose anything?"

Geometric Intuition — Think of the unit square from the Chapter 1 visualizer. $A$ deforms it into some parallelogram. $A^{-1}$ is the unique transformation that takes that parallelogram back to the unit square — and, because it's linear and agrees with the reverse motion on the basis vectors, it takes everything else back too. If $A$ rotates and stretches, $A^{-1}$ stretches by the reciprocal and rotates the opposite way. We will watch the unit square deform and then snap back in §9.4.

9.2 When can a transformation be undone? (Bijectivity)

Not every motion can be reversed, and the obstruction is beautifully simple to state. To undo $A$ you need a rule that, given an output point $\mathbf{b}$, tells you the unique input point $\mathbf{x}$ that $A$ sent there. Two things can go wrong with that rule, and they are the only two things that can go wrong.

First, $A$ might send two different inputs to the same output. If both $\mathbf{x}_1$ and $\mathbf{x}_2$ land on the same $\mathbf{b}$, then when you see $\mathbf{b}$ and try to rewind, you cannot tell which one you came from — the information about which input you started with was destroyed. A transformation that never does this is called injective (one-to-one): distinct inputs always go to distinct outputs. Second, $A$ might fail to reach some outputs at all. If no input maps to a particular point $\mathbf{b}$, then "undo $\mathbf{b}$" has no answer — there's nothing to rewind to. A transformation that reaches every point of the target space is called surjective (onto).

A transformation that is both injective and surjective — every output comes from exactly one input — is called bijective, and bijectivity is precisely the condition under which a clean two-sided undo exists.

The Key Insight — A transformation can be undone if and only if it is bijective: every output point comes from exactly one input point. Injective ("no two inputs collide") guarantees the undo is unambiguous; surjective ("every point is reached") guarantees the undo is defined everywhere. Lose either property and you cannot build a genuine inverse. For the square matrices that occupy this chapter, a small theorem (§9.6) shows injective and surjective are the same condition — so for square matrices you only ever have to check one.

Here is the link back to the geometry that makes this vivid. A linear transformation of the plane that fails to be injective must be squashing the plane — collapsing some direction to nothing, so that a whole line's worth of points pile up on a single image point. And a linear map of the plane that fails to be surjective must miss part of the plane — its outputs all lie on some line or at the origin, never filling out the whole sheet. For a linear map of $\mathbb{R}^n$ to itself, these are two sides of one coin: if it collapses a direction on the way in (not injective), it must leave a direction unreached on the way out (not surjective), because it cannot cram an $n$-dimensional space into a lower-dimensional image without losing a dimension somewhere. That coin is the single most important object in the chapter, and it has a name.

There is a quantitative way to feel the same fact, which the visualizer's title number has been hinting at since Chapter 7. A bijective linear map keeps the unit cube genuinely $n$-dimensional — it may stretch, slant, and flip it, but the image still has nonzero volume, so its volume-scaling factor $\det(A)$ is nonzero. A non-bijective map flattens the cube onto something lower-dimensional with zero volume, forcing $\det(A) = 0$. So "reversible," "bijective," "no dimension lost," and "nonzero determinant" are already converging on a single idea — the convergence we'll formalize as the Invertible Matrix Theorem in §9.6 and prove rigorously for the determinant in Chapter 11. Keep your eye on the determinant in every figure; it is quietly reporting reversibility.

Common Pitfall — "If $AB = I$ then surely $A$ and $B$ undo each other, so $A$ is invertible — done." For square matrices, yes (we'll prove a one-sided inverse forces the other, §9.7). But the reasoning "they multiply to $I$, so it's reversible" hides the real content, which is bijectivity. For non-square matrices the trap is genuine: a tall $3\times 2$ matrix can have a left inverse $B$ with $BA = I_2$ (it's injective, you can recover its input) while having no right inverse and being un-invertible in the two-sided sense (it can't be surjective onto $\mathbb{R}^3$). One-sided inverses are a real and useful thing — they reappear as the pseudoinverse in Chapter 19 — but "inverse" with no qualifier means the two-sided inverse of a square matrix.

Check Your Understanding — The transformation $A = \begin{bmatrix}1 & 0 \\ 0 & 0\end{bmatrix}$ projects the plane onto the $x$-axis. Find two different input points that $A$ sends to the same output, and name a point in the plane that $A$ never reaches. Is $A$ invertible?

Answer

Both $(3, 5)$ and $(3, -2)$ map to $(3, 0)$ — in fact the entire vertical line $x = 3$ collapses onto $(3,0)$, so $A$ is not injective. And $A$ never reaches any point off the $x$-axis, e.g. $(0, 1)$ is unreachable, so $A$ is not surjective either. It fails bijectivity twice over: it is not invertible (singular). When you see the projected point $(3,0)$ you cannot recover the height you started with — the information is gone. This is the prototype of every singular matrix.

9.3 What is a singular matrix, and why can't you reverse it?

The matrices that cannot be undone deserve their own name and their own picture, because recognizing them on sight is half the skill of this chapter. A square matrix that has no inverse is called singular (equivalently, non-invertible); a square matrix that does have an inverse is invertible or nonsingular. The word "singular" is apt: these are the special, exceptional, measure-zero matrices where something degenerate happens.

What happens, geometrically, is collapse. A singular matrix takes the full $n$-dimensional space and flattens it onto a lower-dimensional shadow — in the plane, it squashes the whole 2D sheet down onto a line (or, in the extreme case of the zero matrix, onto the single point at the origin). The projection $\begin{bmatrix}1 & 0 \\ 0 & 0\end{bmatrix}$ you just met crushes the plane onto the $x$-axis. The matrix $\begin{bmatrix}1 & 2 \\ 2 & 4\end{bmatrix}$ does the same kind of damage in a tilted direction: its second column $(2,4)$ is exactly twice its first column $(1,2)$, so both basis vectors land on the same line through the origin (the line in the direction $(1,2)$), and the entire plane is mashed onto that one line.

The Key Insight — A singular matrix collapses space onto a lower dimension — it loses a dimension. That lost dimension is exactly the information you would need to reverse the motion, and it is gone for good: infinitely many input points were crushed onto each surviving output point, so there is no rule that could send a survivor back to "the" point it came from. Collapse is irreversible. Singular = collapse = un-undoable. This is the geometric content behind every algebraic test for non-invertibility you'll meet.

Why exactly does losing a dimension make reversal impossible? Because reversal demands that the journey home be unique, and collapse destroys uniqueness. When $\begin{bmatrix}1 & 2 \\ 2 & 4\end{bmatrix}$ sends both $(2,-1)$ and $(0,0)$ to the origin — check: $1\cdot 2 + 2\cdot(-1) = 0$ and $2\cdot 2 + 4\cdot(-1) = 0$, so $(2,-1) \mapsto (0,0)$, and obviously $(0,0)\mapsto(0,0)$ — any proposed inverse would have to send the origin back to both $(2,-1)$ and $(0,0)$ at once, which no function can do. The existence of even a single nonzero vector that $A$ kills (sends to $\mathbf{0}$) is fatal to invertibility. That set of killed vectors is the null space of $A$, which gets a whole chapter (Chapter 13); for now, just hold onto the slogan: a nonzero null space means singular.

# A singular matrix collapses the plane: its columns are linearly dependent.
import numpy as np
S = np.array([[1, 2],
              [2, 4]])             # column 2 = 2 * column 1
print("rank =", np.linalg.matrix_rank(S))   # 1, not 2: a dimension was lost
print("det  =", np.linalg.det(S))           # 0.0: the area-scaling factor is zero
# Two distinct inputs with the same output (collapse made explicit):
print(S @ np.array([2, -1]), "=", S @ np.array([0, 0]))   # both land on the origin

rank = 1
det  = 0.0
[0 0] = [0 0]

The rank is $1$ (the image is one-dimensional — a line), the determinant is $0$ (zero area, because a flattened square has no area), and the vector $(2,-1)$ is mapped to the origin just like $(0,0)$ is. Three different alarm bells, all ringing the same fact: this matrix collapses space and cannot be inverted. Asking numpy to invert it raises LinAlgError: Singular matrix, which is the library's way of refusing to do the impossible.

Warning

— Determinant $= 0$ means singular — but "small determinant" does NOT mean "almost singular" in any naive sense, and a nonzero determinant is not a safe certificate that inversion will be accurate. The determinant scales with the size of the entries (multiply a $10\times 10$ matrix by $\tfrac{1}{2}$ and its determinant shrinks by $2^{10}$ even though invertibility is unchanged), so its raw magnitude is a terrible numerical health check. The right measure of "how close to collapse" is the condition number (Chapter 38), not the determinant. For now: $\det(A) = 0$ exactly $\iff$ $A$ is singular, a clean theoretical fact we make rigorous in Chapter 11. Reading too much into the size of a nonzero determinant is a classic error.

How do you recognize a singular matrix at a glance?

Look at the columns (the destinations of the basis vectors, from Chapter 7). If one column is a scalar multiple of another — or, in larger matrices, if any column is a combination of the others — the basis vectors don't land on independent directions, the image is squashed into a lower-dimensional space, and the matrix is singular. The cleanest tells are: a column (or row) of all zeros; two equal columns; one column a multiple of another. Each means the transformation has wasted a dimension, mapping it onto territory already covered by the other columns. When the columns do point in genuinely independent directions — spanning all of $\mathbb{R}^n$ — the matrix is invertible. That phrase "independent directions spanning the whole space" is the geometric soul of the next section's checklist.

9.4 Undoing a transformation in the visualizer

Time to see all of this. We reuse the recurring 2D visualizer introduced in Chapter 1 — verbatim, changing only the matrix and the narration, so that this chapter's figures look identical to every other transformation figure in the book. Here it is exactly as frozen in the style bible, living in toolkit/visualizer.py:

# toolkit/visualizer.py — the recurring 2D transformation visualizer.
# Shows what a 2x2 matrix A does to the unit square and the basis vectors.
import numpy as np
import matplotlib.pyplot as plt

def visualize_2d(A, title="", ax=None):
    """Plot the action of 2x2 matrix A on the unit square and i-hat, j-hat."""
    A = np.asarray(A, dtype=float)
    square = np.array([[0, 1, 1, 0, 0],
                       [0, 0, 1, 1, 0]])          # unit-square corners (closed)
    out = A @ square                               # transformed square
    e1, e2 = A @ np.array([1, 0]), A @ np.array([0, 1])   # images of basis vectors
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 5))
    ax.plot(square[0], square[1], "b--", lw=1, label="input (unit square)")
    ax.fill(out[0], out[1], alpha=0.25, color="C1")
    ax.plot(out[0], out[1], "C1-", lw=2, label="A · (unit square)")
    ax.arrow(0, 0, *e1, color="C3", width=0.02, length_includes_head=True)  # A e1
    ax.arrow(0, 0, *e2, color="C2", width=0.02, length_includes_head=True)  # A e2
    ax.axhline(0, color="gray", lw=0.5); ax.axvline(0, color="gray", lw=0.5)
    ax.set_aspect("equal"); ax.grid(True, alpha=0.3)
    ax.set_title(title or f"det = {np.linalg.det(A):.2f}")
    ax.legend(loc="best", fontsize=8)
    return ax

# Example: a horizontal shear
# visualize_2d([[1, 1], [0, 1]], title="Shear")
# plt.show()

As always: the blue dashed outline is the input unit square; the orange filled region is its image; the red arrow is $A\mathbf{e}_1$ (where east lands, the first column); the green arrow is $A\mathbf{e}_2$ (where north lands, the second column). Let's run a three-panel experiment: a matrix $A$, its inverse $A^{-1}$, and the product $A^{-1}A$ that returns the unit square — and then, alongside, a singular matrix that squashes the square onto a line and can't be brought back.

Take the invertible $A = \begin{bmatrix}2 & 1 \\ 1 & 3\end{bmatrix}$. Its determinant is $2\cdot 3 - 1\cdot 1 = 5$, so it stretches area fivefold and is comfortably invertible. We'll compute its inverse properly in §9.5; numpy tells us it is $A^{-1} = \begin{bmatrix}0.6 & -0.2 \\ -0.2 & 0.4\end{bmatrix}$. Watch the two transformations and their composition:

# Undoing a transformation in the visualizer: A, then A^{-1}, then A^{-1}A = I.
import numpy as np, matplotlib.pyplot as plt
from toolkit.visualizer import visualize_2d

A    = np.array([[2, 1], [1, 3]])
Ainv = np.linalg.inv(A)
print("A^{-1} =", np.round(Ainv, 3).tolist())
print("det A =", round(float(np.linalg.det(A)), 3),
      "  det A^{-1} =", round(float(np.linalg.det(Ainv)), 3))   # reciprocals

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
visualize_2d(A,         title="A: det = 5.00",        ax=axes[0])
visualize_2d(Ainv,      title="A⁻¹: det = 0.20",       ax=axes[1])
visualize_2d(Ainv @ A,  title="A⁻¹A = I: det = 1.00",  ax=axes[2])
plt.show()

A^{-1} = [[0.6, -0.2], [-0.2, 0.4]]
det A = 5.0   det A^{-1} = 0.2

Figure 9.1. Undoing a transformation. Left panel: $A$ turns the blue unit square into a stretched, slanted orange parallelogram of area $5$; the red arrow ($A\mathbf{e}_1$) reaches $(2,1)$ and the green arrow ($A\mathbf{e}_2$) reaches $(1,3)$. Middle panel: $A^{-1}$ is the opposite motion, an area-$\tfrac15$ transformation that shrinks and un-slants; its parallelogram is small and tilted the other way. Right panel: composing them, $A^{-1}A$, returns the orange image exactly onto the blue unit square — red arrow back at $(1,0)$, green arrow back at $(0,1)$, determinant back to $1$. Alt-text: Three side-by-side transformation plots; the first stretches the unit square into a large parallelogram, the second is a small reverse transformation, and the third shows the unit square perfectly restored.

Notice the determinants in the printout: $\det(A) = 5$ and $\det(A^{-1}) = 0.2 = \tfrac{1}{5}$. The inverse undoes the area scaling too — if $A$ multiplies area by $5$, then $A^{-1}$ must multiply it by $\tfrac{1}{5}$ to get back to where we started, so $\det(A^{-1}) = 1/\det(A)$ always. (We prove this properly in Chapter 11, but you can already see why it must hold: undo the stretch, undo the area change.) This is your first quantitative payoff from the visualizer's title number, and it foreshadows that $\det(A) = 0$ has no reciprocal — which is exactly why a zero determinant signals an inverse that cannot exist.

Now the singular case. Run the projection $P = \begin{bmatrix}1 & 0 \\ 0 & 0\end{bmatrix}$ through the same visualizer:

# A singular matrix squashes the square onto a line — there is no way back.
import numpy as np, matplotlib.pyplot as plt
from toolkit.visualizer import visualize_2d
P = np.array([[1, 0], [0, 0]])
print("det P =", np.linalg.det(P), "  rank P =", np.linalg.matrix_rank(P))
visualize_2d(P, title="Singular: det = 0.00 (collapsed to a line)")
plt.show()

det P = 0.0   rank P = 1

Figure 9.2. A singular transformation collapsing space. The orange image is no longer a parallelogram with area — it is a flat orange segment lying along the $x$-axis from $(0,0)$ to $(1,0)$. The two-dimensional square has been crushed onto a one-dimensional line. The red arrow ($P\mathbf{e}_1$) survives at $(1,0)$; the green arrow ($P\mathbf{e}_2$) has collapsed to a zero-length arrow at the origin. Title det = 0.00. There is no transformation that can re-inflate this segment back into a square, because the height information was annihilated — which is precisely what "singular, non-invertible" means. Alt-text: A transformation plot showing the unit square flattened into a horizontal line segment on the x-axis, with one basis-vector arrow surviving and the other collapsed to the origin, illustrating an irreversible singular matrix.

Geometric Intuition — Put the two figures side by side and you have the entire chapter in pictures. An invertible matrix deforms the unit square into a full parallelogram (nonzero area) that another transformation can deform back. A singular matrix flattens the square into a segment (zero area) that no transformation can re-inflate, because re-inflating would require inventing the lost dimension. Invertible = full-dimensional image you can reverse; singular = collapsed image you cannot. The visualizer's det annotation is the tell: nonzero area $\Rightarrow$ invertible, zero area $\Rightarrow$ singular.

9.5 How do you find the inverse of a matrix? (Gauss-Jordan on [A | I])

We've been guessing inverses from geometry and reading them off numpy. Now we earn a systematic algorithm that works for any size and never requires cleverness — the answer to the search query "how to find the inverse of a matrix." The method reuses, almost unchanged, the row-reduction machinery from Chapter 4. Here is the idea, and it is genuinely elegant.

We want a matrix $X$ with $AX = I$. Read that column by column. If the columns of $X$ are $\mathbf{x}_1, \dots, \mathbf{x}_n$ and the columns of $I$ are the standard basis vectors $\mathbf{e}_1, \dots, \mathbf{e}_n$, then $AX = I$ is the same as $n$ separate systems: $$A\mathbf{x}_1 = \mathbf{e}_1, \quad A\mathbf{x}_2 = \mathbf{e}_2, \quad \dots, \quad A\mathbf{x}_n = \mathbf{e}_n.$$ Each system asks "which input does $A$ send to a given basis vector?" — exactly the undo question, one basis vector at a time. We could solve all $n$ systems separately by Gaussian elimination (Chapter 4), but they all have the same coefficient matrix $A$, so we solve them simultaneously by reducing one big augmented matrix with all the right-hand sides stacked together: $[A\,|\,I]$. The same row operations that turn $A$ into $I$ on the left will turn $I$ into the answers on the right.

The Key Insight — To invert $A$, row-reduce the augmented array $[A\,|\,I]$ until the left block becomes $I$; the right block is then $A^{-1}$: $$[\,A \mid I\,] \;\xrightarrow{\text{Gauss-Jordan}}\; [\,I \mid A^{-1}\,].$$ This is the Gauss-Jordan method. It works because the row operations that reduce $A$ to $I$ collectively are the transformation $A^{-1}$ (each elementary row operation is multiplication by an elementary matrix, and their product is $A^{-1}$); applying that same sequence to $I$ produces $A^{-1}I = A^{-1}$. If the left block cannot be reduced to $I$ — a row of zeros appears, i.e. a missing pivot — then $A$ is singular and has no inverse, and the algorithm tells you so.

Let's grind through a clean $3\times 3$ by hand. Take the symmetric matrix $$A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}.$$ Form $[A\,|\,I]$ and reduce. (We follow the Chapter 4 recipe: get a $1$ in each pivot, then clear the rest of that column, working left to right. To keep the arithmetic readable by hand we'll use exact fractions rather than partial pivoting; the answer is identical to what numpy's pivoted reduction returns.)

$$\left[\begin{array}{ccc|ccc} 2 & 1 & 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 & 1 & 0 \\ 1 & 1 & 2 & 0 & 0 & 1 \end{array}\right]$$

Step 1. Scale row 1 by $\tfrac12$ to make the first pivot a $1$: $$\left[\begin{array}{ccc|ccc} 1 & \tfrac12 & \tfrac12 & \tfrac12 & 0 & 0 \\ 1 & 2 & 1 & 0 & 1 & 0 \\ 1 & 1 & 2 & 0 & 0 & 1 \end{array}\right]$$

Step 2. Clear the first column below the pivot: $R_2 \leftarrow R_2 - R_1$ and $R_3 \leftarrow R_3 - R_1$: $$\left[\begin{array}{ccc|ccc} 1 & \tfrac12 & \tfrac12 & \tfrac12 & 0 & 0 \\ 0 & \tfrac32 & \tfrac12 & -\tfrac12 & 1 & 0 \\ 0 & \tfrac12 & \tfrac32 & -\tfrac12 & 0 & 1 \end{array}\right]$$

Step 3. Scale row 2 by $\tfrac23$ to make the second pivot a $1$, then clear the rest of column 2 ($R_1 \leftarrow R_1 - \tfrac12 R_2$, $R_3 \leftarrow R_3 - \tfrac12 R_2$): $$\left[\begin{array}{ccc|ccc} 1 & 0 & \tfrac13 & \tfrac23 & -\tfrac13 & 0 \\ 0 & 1 & \tfrac13 & -\tfrac13 & \tfrac23 & 0 \\ 0 & 0 & \tfrac43 & -\tfrac13 & -\tfrac13 & 1 \end{array}\right]$$

Step 4. Scale row 3 by $\tfrac34$ to make the third pivot a $1$, then clear the rest of column 3 ($R_1 \leftarrow R_1 - \tfrac13 R_3$, $R_2 \leftarrow R_2 - \tfrac13 R_3$): $$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & \tfrac34 & -\tfrac14 & -\tfrac14 \\ 0 & 1 & 0 & -\tfrac14 & \tfrac34 & -\tfrac14 \\ 0 & 0 & 1 & -\tfrac14 & -\tfrac14 & \tfrac34 \end{array}\right]$$

The left block is now $I$, so the right block is the inverse: $$A^{-1} = \frac14\begin{bmatrix} 3 & -1 & -1 \\ -1 & 3 & -1 \\ -1 & -1 & 3 \end{bmatrix} = \begin{bmatrix} 0.75 & -0.25 & -0.25 \\ -0.25 & 0.75 & -0.25 \\ -0.25 & -0.25 & 0.75 \end{bmatrix}.$$

That symmetric, almost-magical pattern (the inverse of a symmetric matrix is symmetric — a fact we'll explain via §9.6's transpose rule) is a good sign we didn't err. Confirm with numpy, and check by multiplying $A\,A^{-1}$ to recover $I$:

# Gauss-Jordan result, confirmed and checked against the definition A A^{-1} = I.
import numpy as np
A = np.array([[2, 1, 1],
              [1, 2, 1],
              [1, 1, 2]], dtype=float)
Ainv = np.linalg.inv(A)
print(np.round(Ainv, 2))
print(np.round(A @ Ainv, 10))     # should be the identity

[[ 0.75 -0.25 -0.25]
 [-0.25  0.75 -0.25]
 [-0.25 -0.25  0.75]]
[[ 1.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  1.]]

numpy's inverse matches our hand computation entry for entry, and $A\,A^{-1}$ comes back as the identity (to ten decimal places — the tiny floating-point dust is swept to zero by the rounding). You now have a method that never requires inspiration: write $[A\,|\,I]$, reduce, read off the right half.

Common Pitfall — Forgetting to apply each row operation to the augmented (right) half too. Gauss-Jordan only works if every operation you do to the left block is mirrored on the right block — that's how $I$ accumulates into $A^{-1}$. A second frequent slip is stopping at row echelon form (zeros below the pivots) instead of reduced row echelon form (zeros above them as well, with pivots equal to $1$); the inverse appears only when the left block is the full identity, not merely triangular. If a zero row appears in the left block partway through, stop: $A$ is singular and has no inverse.

Let's do one more, smaller and even more carefully, to lock in the mechanics on a $2\times 2$. Invert $A = \begin{bmatrix}2 & 1 \\ 1 & 3\end{bmatrix}$ (the Figure 9.1 matrix) by Gauss-Jordan, so you can compare the full reduction against the shortcut we're about to meet. Start with $[A\,|\,I]$: $$\left[\begin{array}{cc|cc} 2 & 1 & 1 & 0 \\ 1 & 3 & 0 & 1 \end{array}\right] \xrightarrow{R_1 \leftarrow \frac12 R_1} \left[\begin{array}{cc|cc} 1 & \tfrac12 & \tfrac12 & 0 \\ 1 & 3 & 0 & 1 \end{array}\right] \xrightarrow{R_2 \leftarrow R_2 - R_1} \left[\begin{array}{cc|cc} 1 & \tfrac12 & \tfrac12 & 0 \\ 0 & \tfrac52 & -\tfrac12 & 1 \end{array}\right].$$ Now make the second pivot a $1$ and clear above it: $$\xrightarrow{R_2 \leftarrow \frac25 R_2} \left[\begin{array}{cc|cc} 1 & \tfrac12 & \tfrac12 & 0 \\ 0 & 1 & -\tfrac15 & \tfrac25 \end{array}\right] \xrightarrow{R_1 \leftarrow R_1 - \frac12 R_2} \left[\begin{array}{cc|cc} 1 & 0 & \tfrac35 & -\tfrac15 \\ 0 & 1 & -\tfrac15 & \tfrac25 \end{array}\right].$$ The left block is $I$, so $A^{-1} = \begin{bmatrix}3/5 & -1/5 \\ -1/5 & 2/5\end{bmatrix} = \begin{bmatrix}0.6 & -0.2 \\ -0.2 & 0.4\end{bmatrix}$ — the very matrix the visualizer used in Figure 9.1, now derived from scratch by elimination rather than read off numpy. Every fraction had a reason; nothing was guessed.

Check Your Understanding — Use Gauss-Jordan on $[A\,|\,I]$ to invert $A = \begin{bmatrix}1 & 2 \\ 0 & 1\end{bmatrix}$ (a shear). You should need only one row operation. What is $A^{-1}$, and what motion does it perform?

Answer

Start with $\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{array}\right]$. The only off-diagonal entry to clear is the $2$ in row 1, column 2, so $R_1 \leftarrow R_1 - 2R_2$ gives $\left[\begin{array}{cc|cc} 1 & 0 & 1 & -2 \\ 0 & 1 & 0 & 1 \end{array}\right]$. Hence $A^{-1} = \begin{bmatrix}1 & -2 \\ 0 & 1\end{bmatrix}$. It's a shear in the opposite direction — $A$ slid north two units right (sending $\mathbf{e}_2$ to $(2,1)$), and $A^{-1}$ slides it two units back left. As expected from the §9.5 zoo discussion, the inverse of a shear is a shear with $k$ negated.

The 2×2 shortcut (and why you should still understand the algorithm)

For the $2\times 2$ case there is a famous closed-form shortcut, and it's worth memorizing because you understand where it comes from. For $$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}, \qquad A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \quad\text{provided } ad - bc \neq 0.$$ Swap the diagonal entries, negate the off-diagonal entries, and divide by the determinant $ad - bc$. The condition $ad - bc \neq 0$ is exactly "$\det(A) \neq 0$": when the determinant is zero you'd be dividing by zero, the algebraic echo of "you can't invert a collapse." Let's apply it to the §9.4 matrix $A = \begin{bmatrix}2 & 1 \\ 1 & 3\end{bmatrix}$: the determinant is $2\cdot3 - 1\cdot1 = 5$, so $$A^{-1} = \frac{1}{5}\begin{bmatrix} 3 & -1 \\ -1 & 2 \end{bmatrix} = \begin{bmatrix} 0.6 & -0.2 \\ -0.2 & 0.4 \end{bmatrix},$$ matching the value the visualizer used in Figure 9.1.

# The 2x2 shortcut, confirmed against numpy.
import numpy as np
a, b, c, d = 2, 1, 1, 3
det = a*d - b*c                       # = 5
shortcut = (1/det) * np.array([[d, -b], [-c, a]])
print(np.round(shortcut, 3))
print(np.round(np.linalg.inv(np.array([[a, b], [c, d]], float)), 3))

[[ 0.6 -0.2]
 [-0.2  0.4]]
[[ 0.6 -0.2]
 [-0.2  0.4]]

Both agree. The shortcut is a special case of the general cofactor formula $A^{-1} = \frac{1}{\det(A)}\operatorname{adj}(A)$, which we postpone to Chapter 11 (it's beautiful but computationally hopeless beyond $3\times 3$ — the work explodes as $n!$, whereas Gauss-Jordan costs only about $n^3$). For computation at any real size, Gauss-Jordan wins; the cofactor formula is for theory and tiny matrices.

Real-World Application — Undoing a color transform (image processing / data science). A digital camera or display stores color as a 3-vector (red, green, blue), and converting between color spaces — RGB to YUV for video compression, or applying a "white balance" correction — is multiplication by a fixed $3\times 3$ matrix $M$ applied to every pixel's color vector. To invert the correction — to recover the original colors from the transformed image, or to display YUV data back as RGB — you apply $M^{-1}$ to every pixel. Because $M$ is the same for all pixels, one matrix inverse undoes the transform for the entire image at once, exactly as $A^{-1}$ in Figure 9.1 sends every point of the plane back home. (Case Study 2 develops this in full: color management is linear algebra, and the round trip RGB $\to$ YUV $\to$ RGB is $M^{-1}M = I$.) When a color transform is not invertible — say it maps all three channels to a single grayscale value, collapsing 3D color onto a 1D line — the original color is gone forever, the same irreversible collapse as Figure 9.2.

How do you undo the standard transformations from Chapter 7?

The Gauss-Jordan algorithm always works, but for the transformation zoo of Chapter 7 you can write down inverses by pure geometry, with no computation at all — and doing so deepens the "undo" picture beautifully. Each standard motion has an obvious reversal.

A scaling $\begin{bmatrix}s_x & 0 \\ 0 & s_y\end{bmatrix}$ stretches the axes by $s_x$ and $s_y$; to undo it, stretch by the reciprocals, giving the inverse $\begin{bmatrix}1/s_x & 0 \\ 0 & 1/s_y\end{bmatrix}$. This is reversible exactly when neither factor is zero — and a zero scaling factor is precisely a collapsed axis (a column of zeros, hence singular), matching §9.3. More generally, the inverse of a diagonal matrix is diagonal with each entry reciprocated, provided no diagonal entry is zero; this is why diagonal matrices are so pleasant, and why so much of the book is about making a matrix diagonal (Chapter 25).

A rotation $R(\theta) = \begin{bmatrix}\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{bmatrix}$ turns the plane counterclockwise by $\theta$; to undo it, turn clockwise by $\theta$, i.e. rotate by $-\theta$. Since $\cos(-\theta) = \cos\theta$ and $\sin(-\theta) = -\sin\theta$, that gives $$R(\theta)^{-1} = R(-\theta) = \begin{bmatrix}\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta\end{bmatrix}.$$ Look closely: this is exactly the transpose of $R(\theta)$ — the off-diagonal signs swapped. So for a rotation, $R^{-1} = R^{\mathsf{T}}$, a remarkable coincidence we flagged in the closing FAQ and will explain fully when we study orthogonal matrices in Chapter 21. A reflection, like $\begin{bmatrix}1 & 0 \\ 0 & -1\end{bmatrix}$ across the $x$-axis, is its own inverse: flip twice and you're back, so $A^{-1} = A$ (and indeed it equals its transpose too). And a shear $\begin{bmatrix}1 & k \\ 0 & 1\end{bmatrix}$ inverts to $\begin{bmatrix}1 & -k \\ 0 & 1\end{bmatrix}$ — slide back by the same amount — as we found by hand in §9.1.

# Geometric inverses of the standard transformations, confirmed by numpy.
import numpy as np
theta = np.radians(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print("R^{-1} == R(-theta) == R^T ?",
      np.allclose(np.linalg.inv(R), R.T))                 # rotation: inverse is transpose
S = np.array([[2.0, 0.0], [0.0, 0.5]])
print("scaling inverse:", np.linalg.inv(S).tolist())      # reciprocals on the diagonal
F = np.array([[1.0, 0.0], [0.0, -1.0]])
print("reflection is its own inverse?", np.allclose(np.linalg.inv(F), F))

R^{-1} == R(-theta) == R^T ? True
scaling inverse: [[0.5, 0.0], [0.0, 2.0]]
reflection is its own inverse? True

Every result confirms the geometry: rotations undo by transposing, scalings undo by reciprocating, reflections undo themselves. The only standard transformation with no inverse is the projection, which collapses a dimension — exactly the singular case of Figure 9.2. Spotting which zoo animal you're facing often gives you the inverse faster than any algorithm.

This zoo of invertible motions is exactly the toolbox of real-time graphics. When you move and rotate the camera in a 3D game, the engine applies a transformation to every vertex of the world; to figure out where a screen click lands back in the world, it applies the inverse transformation. Because the camera's motion is built from rotations, scalings, and translations — all individually invertible — the whole pipeline is invertible, and the engine can move freely between "world space" and "view space." The same rotation-inverse-by-transpose trick we just saw is why graphics code can cheaply undo a camera rotation: transposing a $3\times 3$ rotation is far faster than a general inverse. This is the inverse side of the transformations in video game design story, where forward transformations place objects on screen and their inverses turn screen coordinates back into world positions.

Geometric Intuition — Notice the pattern across the zoo: the inverse always belongs to the same family as the original. The inverse of a rotation is a rotation; of a scaling, a scaling; of a shear, a shear. This is no accident — the invertible $2\times 2$ matrices form a group (the general linear group $GL_2$), closed under inversion, and the rotations and the scalings and the shears are each subgroups closed under inversion on their own. You don't need the group-theory vocabulary yet, but the intuition is durable: undoing a motion of a given type produces a motion of the same type, never a wilder one.

9.6 When is a matrix invertible? (The conditions, precisely)

We've collected several signs of invertibility almost in passing — bijective, nonzero determinant, full rank, a pivot in every column. The remarkable fact, and one of the most useful theorems in all of linear algebra, is that these are not separate conditions: they are all the same condition, wearing different clothes. For a square matrix, any one of them holding forces all the others to hold. This bundle is traditionally called the Invertible Matrix Theorem (sometimes the "fundamental theorem of invertible matrices"), and learning to move freely among its faces is a core skill.

The Key Insight (the Invertible Matrix Theorem) — For a square $n\times n$ matrix $A$, the following statements are all equivalent — each one is true exactly when every other one is: 1. $A$ is invertible (there exists $A^{-1}$ with $AA^{-1} = A^{-1}A = I$). 2. The transformation $\mathbf{x}\mapsto A\mathbf{x}$ is bijective (one-to-one and onto). 3. $A$ has full rank: $\operatorname{rank}(A) = n$ (the columns span all of $\mathbb{R}^n$). 4. $A$ has a pivot in every row and every column when row-reduced ($n$ pivots). 5. The columns of $A$ are linearly independent (and so are the rows). 6. $\det(A) \neq 0$. 7. The system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution $\mathbf{x} = \mathbf{0}$ (the null space is just $\{\mathbf{0}\}$). 8. $A\mathbf{x} = \mathbf{b}$ has a unique solution for every $\mathbf{b}$.

A square matrix is invertible $\iff$ any of these holds. Each is a different lens on the single geometric fact: the transformation loses no dimension.

Why are they the same? Walk the geometry. "Loses no dimension" means the image fills all of $\mathbb{R}^n$ — that's full rank (3) and surjectivity (half of 2). It means no nonzero vector gets crushed to zero — that's the trivial null space (7) and injectivity (the other half of 2). It means the $n$ destination-arrows (columns) point in independent directions — that's (5), which is the same as having a genuine pivot in each column (4). It means the unit cube isn't flattened — so it still has volume, $\det(A) \neq 0$ (6), which Chapter 11 will pin down. And bijectivity (2) is exactly the condition we proved in §9.2 lets you build a two-sided inverse (1) and solve every system uniquely (8). They are one idea told eight ways. The practical upshot is enormous: to check invertibility you pick whichever face is cheapest. By hand on a $2\times 2$? Use the determinant. On a $4\times 4$ with a zero column staring at you? Use "columns dependent." Writing code? Row-reduce and count pivots, or call np.linalg.matrix_rank.

# Three faces of the SAME condition agree on whether a matrix is invertible.
import numpy as np
def report(A, name):
    A = np.array(A, float)
    n = A.shape[0]
    print(f"{name}: rank={np.linalg.matrix_rank(A)} (need {n}),",
          f"det={round(float(np.linalg.det(A)), 3)},",
          "INVERTIBLE" if np.linalg.matrix_rank(A) == n else "SINGULAR")
report([[2, 1], [1, 3]], "A (shear+stretch)")   # rank 2, det 5  -> invertible
report([[1, 2], [2, 4]], "S (collapsed)")        # rank 1, det 0  -> singular
report([[1, 0, 0], [0, 1, 0], [0, 0, 0]], "Z")   # rank 2 of 3    -> singular

A (shear+stretch): rank=2 (need 2), det=5.0, INVERTIBLE
S (collapsed): rank=1 (need 2), det=0.0, SINGULAR
Z: rank=2 (need 3), det=0.0, SINGULAR

Rank and determinant agree on every case, because they are reporting the same underlying fact. The third matrix has rank $2$ but lives in $\mathbb{R}^3$ (it needs rank $3$), so its third dimension was lost — singular, even though it isn't obviously degenerate at a glance.

Check Your Understanding — Without computing a determinant, decide whether $A = \begin{bmatrix} 1 & 3 & 2 \\ 0 & 0 & 5 \\ 2 & 6 & 1 \end{bmatrix}$ is invertible. (Hint: look hard at the columns, or the rows.)

Answer

Look at the columns: column 2 is $(3, 0, 6) = 3 \times (1, 0, 2) = 3 \times$ column 1. The columns are linearly dependent, so by face (5) of the theorem the matrix is singular — no inverse. Geometrically, two of the three destination-arrows point along the same line, so the transformation collapses at least one dimension; its image can't fill all of $\mathbb{R}^3$. You didn't need the determinant (it's $0$, of course) — spotting one dependent column was faster.

What does the Gauss-Jordan algorithm do when you hand it a singular matrix? It can't reduce the left block to $I$, and the failure is visible as a row of zeros. Watch it try to invert the singular $S = \begin{bmatrix}1 & 2 \\ 2 & 4\end{bmatrix}$ (column 2 is twice column 1): $$\left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 2 & 4 & 0 & 1 \end{array}\right] \xrightarrow{R_2 \leftarrow R_2 - 2R_1} \left[\begin{array}{cc|cc} 1 & 2 & 1 & 0 \\ 0 & 0 & -2 & 1 \end{array}\right].$$ The bottom-left is now all zeros: there is no pivot in the second column, so the left block can never become $I$, and the algorithm halts and reports "singular." The right block $\begin{bmatrix}-2 & 1\end{bmatrix}$ in that dead row is meaningless as an inverse — it's the algorithm's way of saying $0 = -2x_1 + x_2$ has no solution that recovers a unique input. A missing pivot is the computational fingerprint of collapse, the exact same fact as "rank $< n$" (face 4 $\equiv$ face 3) and as the geometric flattening of §9.3. The eight faces of the theorem really are one.

Real-World Application — Equilibrium in a production economy (economics). Wassily Leontief won the 1973 Nobel Prize in Economics for modeling how the sectors of an economy feed one another. Let $C$ be the consumption matrix whose entry $C_{ij}$ is the amount of sector $i$'s output needed to make one unit of sector $j$'s output. To meet an external final demand $\mathbf{d}$, total output $\mathbf{x}$ must satisfy "output minus what's consumed internally equals demand," i.e. $\mathbf{x} - C\mathbf{x} = \mathbf{d}$, or $(I - C)\mathbf{x} = \mathbf{d}$. The economy can meet any demand with a unique production plan exactly when $I - C$ is invertible, and the plan is $\mathbf{x} = (I - C)^{-1}\mathbf{d}$ — the so-called Leontief inverse, whose entries reveal the total (direct plus indirect) requirements rippling through every supply chain. If $I - C$ were singular, some demands would be unmeetable and others would have no unique plan — the invertibility conditions of this section deciding whether an entire economy can function. Case Study 1 works a two-sector example end to end, and it is a perfect illustration of §9.8's lesson: economists interpret the inverse $(I-C)^{-1}$ as a meaningful object (the multiplier matrix), one of the genuine cases where you want the inverse itself, not just a single solution.

Math-Major Sidebar (optional) — Why "square" matters, and the left/right inverse story. The theorem's equivalences are special to square matrices. For a general $m\times n$ matrix, injectivity (trivial null space, full column rank $n$) gives a left inverse $B$ with $BA = I_n$; surjectivity (full row rank $m$) gives a right inverse $C$ with $AC = I_m$. A matrix has a two-sided inverse iff it has both, which forces $m = n$ and full rank. For a tall matrix ($m > n$) with independent columns, the canonical left inverse is $(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$ — the pseudoinverse that solves least-squares problems (Chapter 19). So "non-square matrices aren't invertible" is true for the two-sided notion but hides a rich one-sided theory that powers regression and overdetermined systems. We return to it with the four fundamental subspaces in Part III.

9.7 Why does a one-sided inverse of a square matrix have to be two-sided? (A proof)

Our definition demanded both $A^{-1}A = I$ and $AA^{-1} = I$, which seems like two separate things to verify. For square matrices it turns out you only ever need to check one — a clean theorem that also explains why the Gauss-Jordan method, which builds $X$ with $AX = I$, automatically delivers a genuine two-sided inverse.

Why we care. Checking one matrix equation is half the work of checking two, and the result reassures us that the Gauss-Jordan output (which solves $AX = I$, a right inverse) is the full inverse, not just half of one. It also patches a subtle gap: without this theorem, "invertible" would be ambiguous about sidedness.

Key idea. A square matrix with a one-sided inverse must be full-rank (it can't have collapsed a dimension), and a full-rank square matrix's one-sided inverse is forced to work on the other side too, because the underlying transformation is bijective.

Theorem. Let $A$ be a square $n\times n$ matrix. If there exists a matrix $B$ with $BA = I$, then also $AB = I$, so $B = A^{-1}$. (And symmetrically, $AB = I$ alone forces $BA = I$.)

Proof. Suppose $BA = I$. We first show $A$ has trivial null space. If $A\mathbf{x} = \mathbf{0}$, then multiplying on the left by $B$ gives $\mathbf{x} = I\mathbf{x} = (BA)\mathbf{x} = B(A\mathbf{x}) = B\mathbf{0} = \mathbf{0}$. So the only solution of $A\mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$ — face (7) of the Invertible Matrix Theorem. For a square matrix, a trivial null space forces full rank $n$ (the $n$ columns are independent, hence span $\mathbb{R}^n$; this is the square case of the Rank–Nullity theorem, Chapter 14, but here it's immediate: $n$ independent vectors in $\mathbb{R}^n$ are a basis). Full rank means the columns span $\mathbb{R}^n$, so $A\mathbf{x} = \mathbf{b}$ is solvable for every $\mathbf{b}$ — in particular for each standard basis vector $\mathbf{e}_j$, pick $\mathbf{c}_j$ with $A\mathbf{c}_j = \mathbf{e}_j$, and let $C$ be the matrix with columns $\mathbf{c}_j$, so $AC = I$ (a right inverse exists). Now we have both $BA = I$ and $AC = I$. Associativity (Chapter 8) finishes it: $$B = B I = B(AC) = (BA)C = I C = C.$$ So $B = C$, and therefore $AB = AC = I$ as well. Hence $B$ is a genuine two-sided inverse, $B = A^{-1}$. $\blacksquare$

What this means. For square matrices, "left inverse," "right inverse," and "inverse" all coincide, and they're unique (the proof's $B = C$ also shows any two inverses are equal). Concretely: when Gauss-Jordan produces an $X$ with $AX = I$, you get $XA = I$ for free — no need to re-reduce. The theorem fails the moment $A$ isn't square, which is exactly why the Math-Major Sidebar's one-sided inverses are a genuinely different (and useful) animal.

Geometric Intuition — The proof is really geometric. $BA = I$ says "$A$ first, then $B$, does nothing," so $A$ can't have destroyed any information (if it had, $B$ couldn't reconstruct it) — meaning $A$ is injective, hence (being square) bijective, hence reversible on both sides. You can't half-undo a reversible motion: if a motion can be perfectly retraced, retracing it forward then backward also lands you home. Square-ness is what guarantees "injective" upgrades to "bijective."

9.8 Why do you rarely actually invert a matrix to solve Ax = b?

Here is the most practically important section of the chapter, and the one most likely to save you from writing slow, inaccurate code. The inverse suggests an obvious way to solve a linear system: from $A\mathbf{x} = \mathbf{b}$, multiply both sides on the left by $A^{-1}$ to get $$\mathbf{x} = A^{-1}\mathbf{b}.$$ This is correct mathematics and a terrible computational plan. In real numerical work — in numpy, MATLAB, LAPACK, every serious library — you almost never form $A^{-1}$ to solve a system. You solve directly, using elimination or, equivalently, the LU factorization of Chapter 10.

Common Pitfall — Coding x = np.linalg.inv(A) @ b to solve $A\mathbf{x} = \mathbf{b}$. Use x = np.linalg.solve(A, b) instead. The solve route factors $A$ once (LU, Chapter 10) and runs forward/back substitution — it is faster (computing the full inverse does extra work you then throw away) and more accurate (forming and then multiplying by $A^{-1}$ accumulates more floating-point error than solving in place). The numerical-analysis maxim is blunt and worth memorizing: "never invert a matrix to solve a system." Reserve inv for the rare cases where you genuinely need the inverse as an object (e.g. a covariance matrix's inverse interpreted as a precision matrix, or a one-time color-transform inverse reused on millions of pixels).

Two reasons, one about speed and one about accuracy. Speed: solving $A\mathbf{x} = \mathbf{b}$ by elimination costs about $\tfrac13 n^3$ operations. Computing $A^{-1}$ costs about $n^3$ (it's like solving $n$ systems, one per column of $I$), and then you still pay $n^2$ more to multiply $A^{-1}\mathbf{b}$ — so inverting does roughly three times the work to get the same answer, and that's before counting the cost if you only ever needed one solution. Accuracy: every floating-point operation introduces a little rounding error, and the two-step "build $A^{-1}$, then multiply" pipeline compounds more error than solving in one pass; for an ill-conditioned $A$ (Chapter 38), the difference can be the gap between a usable answer and garbage. numpy's solve is engineered to take the good path.

# Solve Ax = b: the right way (solve) and the wrong-but-correct way (inverse).
import numpy as np
A = np.array([[2, 1, 1],
              [1, 2, 1],
              [1, 1, 2]], dtype=float)
b = np.array([4, 5, 6], dtype=float)
x_solve = np.linalg.solve(A, b)            # preferred: factor + substitute
x_inv   = np.linalg.inv(A) @ b             # discouraged: build inverse, then multiply
print("solve:", np.round(x_solve, 4))
print("inv  :", np.round(x_inv,   4))
print("same? ", np.allclose(x_solve, x_inv))

solve: [0.25 1.25 2.25]
inv  : [0.25 1.25 2.25]
same?  True

Both give $\mathbf{x} = (0.25, 1.25, 2.25)$ here — on a small, well-behaved system the answers coincide, so this isn't about getting a different number on a toy problem. It's about cost and robustness at scale: when $A$ is $10{,}000\times 10{,}000$, or merely poorly conditioned, solve is the one you want, every time. The inverse is a concept to understand (it tells you the transformation is reversible and how) far more often than an object to compute.

The operation counts make the speed argument concrete. For an $n\times n$ system, the work scales like $n^3$, and the constants matter at scale:

Task	Method	Approx. operations
Solve $A\mathbf{x} = \mathbf{b}$ once	LU factor + substitute (`solve`)	$\tfrac13 n^3$
Form $A^{-1}$	Gauss-Jordan / $n$ solves	$\approx n^3$
Then compute $A^{-1}\mathbf{b}$	matrix-vector multiply	$n^2$

So the inverse route does roughly three times the arithmetic of a direct solve and still owes a matrix-vector multiply at the end. If you have many right-hand sides $\mathbf{b}_1, \dots, \mathbf{b}_k$ sharing the same $A$, you still don't invert: you factor $A$ once (LU, Chapter 10) and reuse the factors for each substitution — cheaper and more accurate than forming $A^{-1}$ and multiplying $k$ times. The accuracy gap is just as real. Forming $A^{-1}$ and then multiplying chains two rounds of floating-point error, and for an ill-conditioned matrix (one near collapse — large condition number, Chapter 38) that extra round can corrupt digits that a direct solve would have preserved. This is not pedantry; it is why every numerical library routes "solve a system" away from inv.

# The inverse route and the direct solve agree here, but inv does more work.
import numpy as np
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
b = rng.standard_normal(6)
x_solve = np.linalg.solve(A, b)
x_inv   = np.linalg.inv(A) @ b
print("max difference between the two routes:", float(np.max(np.abs(x_solve - x_inv))))
print("both satisfy A x = b ?",
      np.allclose(A @ x_solve, b), np.allclose(A @ x_inv, b))

max difference between the two routes: 8.881784197001252e-16
both satisfy A x = b ? True True

On a small well-conditioned random matrix the two answers differ only in the last bit (about $10^{-16}$, the floating-point noise floor) — so again, this is about cost and robustness, not about getting a visibly wrong number on easy problems. Scale $A$ up, or push it toward singularity, and the direct solve is the one that holds together.

The Key Insight — $A^{-1}$ is primarily a theoretical and conceptual tool — it certifies reversibility, undoes color/coordinate transforms, and makes derivations clean ($\mathbf{x} = A^{-1}\mathbf{b}$, the normal equations, $A = PDP^{-1}$ in Chapter 25). For the single most common task, solving a system, prefer elimination/solve. "Write $A^{-1}\mathbf{b}$ on paper; type solve(A, b) in code" is a habit that will serve you for the rest of your computational life. This is one of the recurring tensions of the book — theory and computation each have their place (theme 3) — made completely concrete. The same preference scales up to the most demanding settings, including the large sparse and iterative solvers surveyed in numerical methods for large systems, where forming an explicit inverse is not merely wasteful but often impossible.

FAQ: If I shouldn't compute the inverse, why learn it at all?

Because understanding that a transformation is invertible, and why, is indispensable even when you never write its inverse down. Invertibility is the gatekeeper for whether a system has a unique solution (face 8 of the theorem), whether a change of coordinates is reversible (Chapter 16), whether a matrix can be diagonalized as $A = PDP^{-1}$ (Chapter 25 — note the $P^{-1}$ in the formula, which we manipulate symbolically without ever inverting a number). The inverse is one of the load-bearing concepts of linear algebra and one of the rarely-computed objects — and keeping those two facts in mind at once is exactly the maturity this chapter is building. You learn to invert by hand (Gauss-Jordan) to understand the inverse; you call solve to use linear systems.

9.9 What rules do inverses obey? (Properties)

Inverses interact with the other matrix operations from Chapter 8 in clean, memorable ways — with one famous twist that trips up nearly everyone. We collect the essential properties, each with a one-line geometric reason, because rules you understand are rules you don't forget.

1. The inverse of the inverse is the original. $(A^{-1})^{-1} = A$. Undoing the undo is doing — obviously. If $A^{-1}$ retraces $A$ backward, then retracing that backward is just $A$ again.

2. The inverse of a product reverses the order. This is the famous one: $$\boxed{\,(AB)^{-1} = B^{-1}A^{-1}\,}$$ Note the order flip — it is $B^{-1}A^{-1}$, not $A^{-1}B^{-1}$. The reason is pure common sense once you remember $AB$ means "do $B$, then $A$" (composition, Chapter 8). To undo "socks then shoes," you must take off the shoes first, then the socks: you reverse the most recent action first. So to undo $AB$ — which did $B$ and then $A$ — you first undo $A$ (apply $A^{-1}$), then undo $B$ (apply $B^{-1}$), giving the combined undo $B^{-1}A^{-1}$. We can verify the rule (and the failure of the wrong order) in numpy:

# (AB)^{-1} = B^{-1} A^{-1}  — order reversed. The wrong order generally fails.
import numpy as np
A = np.array([[2, 0], [0, 1]], float)     # scale x by 2
B = np.array([[1, 1], [0, 1]], float)     # shear
AB = A @ B
print("(AB)^{-1}   =", np.round(np.linalg.inv(AB), 3).tolist())
print("B^{-1}A^{-1} =", np.round(np.linalg.inv(B) @ np.linalg.inv(A), 3).tolist())
print("A^{-1}B^{-1} =", np.round(np.linalg.inv(A) @ np.linalg.inv(B), 3).tolist(), "<- WRONG order")

(AB)^{-1}   = [[0.5, -1.0], [0.0, 1.0]]
B^{-1}A^{-1} = [[0.5, -1.0], [0.0, 1.0]]
A^{-1}B^{-1} = [[0.5, -0.5], [0.0, 1.0]] <- WRONG order

The reversed order $B^{-1}A^{-1}$ nails $(AB)^{-1}$ exactly; the un-reversed $A^{-1}B^{-1}$ gives a different matrix (look at the top-right entry: $-1.0$ versus $-0.5$). The proof is one line of associativity: $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$, and likewise on the other side, so $B^{-1}A^{-1}$ is the inverse of $AB$ by definition.

Common Pitfall — Writing $(AB)^{-1} = A^{-1}B^{-1}$ (the un-reversed order). This is the single most common inverse error, and it is wrong precisely because matrix multiplication does not commute (Chapter 8): $A^{-1}B^{-1} \neq B^{-1}A^{-1}$ in general. Always reverse the order: $(AB)^{-1} = B^{-1}A^{-1}$. The mnemonic is dressing: you put on socks ($B$) then shoes ($A$); to undo, you remove shoes ($A^{-1}$) then socks ($B^{-1}$) — last on, first off. The same order-reversal governs the transpose, $(AB)^{\mathsf{T}} = B^{\mathsf{T}}A^{\mathsf{T}}$ (Chapter 8), which is no coincidence.

3. The inverse of a transpose is the transpose of the inverse. $(A^{\mathsf{T}})^{-1} = (A^{-1})^{\mathsf{T}}$. Because these two operations commute, the notation $A^{-\mathsf{T}}$ is unambiguous and used freely. The proof is again one line: transpose both sides of $A^{-1}A = I$ using $(XY)^{\mathsf{T}} = Y^{\mathsf{T}}X^{\mathsf{T}}$ to get $A^{\mathsf{T}}(A^{-1})^{\mathsf{T}} = I^{\mathsf{T}} = I$, which says $(A^{-1})^{\mathsf{T}}$ is the inverse of $A^{\mathsf{T}}$.

# (A^T)^{-1} = (A^{-1})^T : inverse and transpose commute.
import numpy as np
A = np.array([[2, 1], [1, 3]], float)
print("(A^T)^{-1} =", np.round(np.linalg.inv(A.T), 3).tolist())
print("(A^{-1})^T =", np.round(np.linalg.inv(A).T, 3).tolist())

(A^T)^{-1} = [[0.6, -0.2], [-0.2, 0.4]]
(A^{-1})^T = [[0.6, -0.2], [-0.2, 0.4]]

They match. (This rule, by the way, is why the inverse of a symmetric matrix is symmetric — as we saw in §9.5's tidy $A^{-1}$. If $A^{\mathsf{T}} = A$, then $(A^{-1})^{\mathsf{T}} = (A^{\mathsf{T}})^{-1} = A^{-1}$, so $A^{-1}$ is symmetric too.)

4. Scalars and determinants. For a nonzero scalar $c$, $(cA)^{-1} = \tfrac1c A^{-1}$ (scale up by $c$, undo by scaling down by $c$). And, as §9.4 previewed, $\det(A^{-1}) = 1/\det(A)$ — the inverse undoes the area/volume scaling. Both are exactly what "undo" demands.

5. Powers invert by negating the exponent. If you apply $A$ three times, you undo it by applying $A^{-1}$ three times: $(A^3)^{-1} = (A^{-1})^3$, which we write $A^{-3}$. More generally $(A^k)^{-1} = (A^{-1})^k = A^{-k}$ for any positive integer $k$, and with $A^0 = I$ this extends the exponent laws $A^m A^n = A^{m+n}$ to all integers — exactly as for ordinary numbers. The order-reversal of property 2 doesn't bite here because $A$ commutes with itself: undoing "$A$ then $A$ then $A$" peels off one $A^{-1}$ at a time, but every factor is the same, so the reversed order looks identical. This is the rule behind solving systems like $A^{10}\mathbf{x} = \mathbf{b}$ symbolically ($\mathbf{x} = A^{-10}\mathbf{b}$), and it foreshadows the powers-of-a-matrix machinery that diagonalization makes effortless in Chapter 25, where $A^k = PD^kP^{-1}$ reduces a matrix power to a scalar power.

The Key Insight — The inverse plays beautifully with the operations of Chapter 8, but it reverses order on products (and on transposes): $(AB)^{-1} = B^{-1}A^{-1}$. Geometry explains every rule: undo means retrace, and retracing a sequence means peeling off the last action first. If you remember nothing else from this section, remember to flip the order.

Historical Note — The systematic elimination procedure at the heart of $[A\,|\,I]$ reduction is named for Carl Friedrich Gauss (early 1800s), though essentially the same method appears in the ancient Chinese text The Nine Chapters on the Mathematical Art, roughly two millennia earlier [verify]. The "Jordan" of Gauss–Jordan is usually attributed to the geodesist Wilhelm Jordan, who popularized the full reduction to $I$ in an 1888 handbook — not the mathematician Camille Jordan of Jordan normal form (Chapter 36), a frequent and understandable confusion [verify]. The notion of a matrix inverse as the algebraic embodiment of "undoing a substitution" goes back to Arthur Cayley's matrix algebra of 1858, the same memoir that gave us matrix multiplication.

9.10 Build Your Toolkit

This chapter's toolkit contribution turns the Gauss-Jordan algorithm of §9.5 into running code — and, satisfyingly, it reuses the row reduction you already wrote in Chapter 4 (toolkit/linear_systems.py). Building the inverse from scratch cements the insight that inverting is just solving $[A\,|\,I]$, and that detecting singularity is just detecting a missing pivot.

Build Your Toolkit — Implement inverse(A) in toolkit/inverse.py, in pure Python (no numpy in the implementation): - Form the augmented matrix $[A\,|\,I]$ — concatenate each row of A with the corresponding row of the identity. - Run Gauss-Jordan on it. The cleanest route is to reuse row_reduce from Chapter 4's toolkit/linear_systems.py: call R, pivots = row_reduce(augmented), which already does partial-pivoting RREF. - Detect singular: if the left $n\times n$ block of R is not the identity (equivalently, there are fewer than $n$ pivots, or some pivot column lies in the right half), raise ValueError("matrix is singular"). Otherwise return the right $n\times n$ block — that's $A^{-1}$. - Verify against numpy: for several random invertible A, confirm inverse(A) matches np.linalg.inv(A) (use np.allclose), and confirm inverse raises on a singular matrix like [[1, 2], [2, 4]].

A reference skeleton (you fill in the body):

```python

toolkit/inverse.py — matrix inverse via Gauss-Jordan on [A | I] (Chapter 9).

from toolkit.linear_systems import row_reduce # reuse Chapter 4's RREF

def inverse(A): """Return A^{-1} via Gauss-Jordan on [A | I]. Raise ValueError if singular.""" n = len(A) if any(len(row) != n for row in A): raise ValueError("inverse expects a square matrix") # build [A | I] aug = [[float(A[i][j]) for j in range(n)] + [1.0 if i == k else 0.0 for k in range(n)] for i in range(n)] R, pivots = row_reduce(aug) # reduced row echelon form # invertible <=> pivots are exactly the n left columns 0..n-1 if pivots != list(range(n)): raise ValueError("matrix is singular: no inverse exists") return [row[n:] for row in R] # right half is A^{-1} ``` The whole function is six lines of logic on top of Chapter 4. That is the toolkit philosophy in miniature: each chapter's hard-won routine becomes the next chapter's building block. Chapter 10 will reuse this same elimination yet again to build the LU factorization — the efficient way to solve systems we championed in §9.8.

A quick verification you can run:

# Verify the from-scratch inverse against numpy, and check singular detection.
import numpy as np
from toolkit.inverse import inverse
A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
print(np.allclose(inverse(A), np.linalg.inv(np.array(A, float))))   # True
try:
    inverse([[1, 2], [2, 4]])         # singular: should raise
except ValueError as e:
    print("correctly refused:", e)

True
correctly refused: matrix is singular: no inverse exists

Your from-scratch inverse agrees with numpy on the invertible case and correctly refuses the singular one — the same two outcomes the geometry demanded in §9.3 and §9.4.

Computational Note — numpy's np.linalg.inv does not literally run Gauss-Jordan; it calls LAPACK, which computes an LU factorization (Chapter 10) and then solves $AX = I$ column by column with partial pivoting for stability. The mathematical result is the same as our $[A\,|\,I]$ reduction, but the engineering is tuned for speed and floating-point robustness. And recall §9.8: even inv should usually yield to np.linalg.solve when your real goal is solving a system — inv is for when you truly need the inverse matrix itself.

9.11 What did we just learn, and where does it go?

Step back and see the arc. Chapter 7 made a matrix a verb; Chapter 8 chained verbs by composition; this chapter gave that algebra a past tense. You can now look at a transformation and ask, with full precision, can I undo this? — and you have both the geometric instinct and the computational tools to answer.

The inverse $A^{-1}$ is the transformation that undoes $A$ (§9.1): $A^{-1}A = AA^{-1} = I$. Geometry first — reverse the stretch, reverse the rotation, reverse the shear — algebra second.
A transformation is reversible exactly when it's bijective (§9.2): no two inputs collide, every output is reached. For square matrices, injective and surjective coincide (§9.7).
Singular matrices collapse space (§9.3, §9.4): they flatten $\mathbb{R}^n$ onto a lower-dimensional shadow, destroying the information needed to reverse them. We watched the unit square spring back under $A^{-1}A$ and watched it crushed flat by a singular projection.
Gauss-Jordan on $[A\,|\,I]$ computes the inverse (§9.5), reusing Chapter 4's row reduction; reduce the left block to $I$ and read $A^{-1}$ off the right, or discover singularity when a pivot goes missing.
The invertibility conditions are all one condition (§9.6): invertible $\iff$ bijective $\iff$ full rank $\iff$ a pivot in every column $\iff$ independent columns $\iff$ $\det \neq 0$ $\iff$ trivial null space $\iff$ unique solutions. Pick the cheapest face to check.
Inverses obey clean rules (§9.9), with the headline order-reversal $(AB)^{-1} = B^{-1}A^{-1}$ and the commuting $(A^{\mathsf{T}})^{-1} = (A^{-1})^{\mathsf{T}}$.
You rarely invert to solve (§9.8): write $\mathbf{x} = A^{-1}\mathbf{b}$ on paper, type solve(A, b) in code.

The recurring themes are all here. Linear algebra is the study of transformations — and now also of reversing them (theme 1). Geometry and algebra are one object — "loses no dimension," "$\det \neq 0$," and "full rank" are the same fact in three languages, which is the whole point of the Invertible Matrix Theorem (theme 2). And computation validates theory while theory guides computation — every inverse matched numpy, your toolkit gained a from-scratch inverse, and §9.8's lesson is precisely about which computation to trust (theme 3).

Three doors open onto the rest of Part II and beyond. How do real libraries solve systems efficiently, if not by inverting? By factoring — Chapter 10's LU and PLU decomposition, the efficient backbone we promised in §9.8, built from this very elimination. What is this determinant we keep invoking as the invertibility tell? Chapter 11 makes "$\det(A) = 0 \iff$ singular" rigorous and reveals the determinant as the signed volume-scaling factor the visualizer has been whispering in every title. And the deepest thread: the null space we met in §9.3 (the vectors a singular matrix kills) and the column space we met in Chapter 7 (where outputs live) are two of the four fundamental subspaces that organize all of linear algebra — the subject of Part III, where "invertible" finally takes its place inside the grand structural picture of what a matrix can reach and what it destroys.

FAQ: Is the inverse the same as the transpose, or as "dividing by a matrix"?

No to both, and the confusions are worth clearing. The transpose $A^{\mathsf{T}}$ flips rows and columns; it is generally a different transformation from $A^{-1}$ and usually doesn't undo $A$ at all (a horizontal shear's transpose is a vertical shear, not its inverse — Chapter 7's pitfall). The one elegant exception is orthogonal matrices (rotations and reflections, Chapter 21), where it happens that $A^{-1} = A^{\mathsf{T}}$ — undoing a rotation really is just transposing it, a fact that makes orthogonal matrices computationally golden. As for "dividing by a matrix": there is no division operator for matrices, because multiplication doesn't commute, so "$\mathbf{b}/A$" is ambiguous (left or right?). Multiplying by $A^{-1}$ is the closest thing, and you must be careful which side: $A\mathbf{x} = \mathbf{b}$ gives $\mathbf{x} = A^{-1}\mathbf{b}$ (left-multiply), never $\mathbf{b}A^{-1}$. The inverse is "the undo," full stop — not a transpose, not a division. Hold that picture and the algebra will never mislead you.