Appendix E — Proof Techniques

A handful of chapters in this book prove things, and all of the Math-Major Sidebars do. Appendix A reminded you what a proof is — a derivation in which every step follows from a definition or an earlier result, where "it looks true" is never a step. This appendix is the toolbox: the small number of proof shapes that, between them, cover essentially every argument in the book. Each one comes with a worked linear-algebra example, written in the book's standard four-part style — why we care, the key idea in one sentence, the proof, and what it means — so you see the technique doing real work, not on a toy from outside the subject.

The encouraging fact is that linear algebra uses few proof methods, and the same ones recur. Learn these six patterns and you can read — and write — the proofs in this book. The recurring first move, true across all of them, is the one from Appendix A: unfold the definitions. Replace the word ("subspace," "independent," "injective") with the literal statement it abbreviates, and the path usually appears.

E.1 Direct proof

A direct proof assumes the hypotheses and reaches the conclusion by a chain of valid steps. Most proofs in this book are direct; it is the default, and you reach for the others only when the direct road is blocked.

Worked example — the subspace test. Claim: the null space $N(A) = {\mathbf{x} : A\mathbf{x}=\mathbf{0}}$ of an $m\times n$ matrix $A$ is a subspace of $\mathbb{R}^n$.

Why we care. Calling $N(A)$ a "space" (Ch. 13) is only licensed if it really satisfies the subspace axioms; this is the proof that the four fundamental subspaces deserve the name.

Key idea. A nonempty set is a subspace exactly when it is closed under addition and scalar multiplication (the subspace test, Ch. 6) — so just check those two closures directly, using the linearity of $A$.

Proof. First, $N(A)$ is nonempty: $A\mathbf{0}=\mathbf{0}$, so $\mathbf{0}\in N(A)$. Now take any $\mathbf{u},\mathbf{v}\in N(A)$ and any scalar $c$; by definition $A\mathbf{u}=\mathbf{0}$ and $A\mathbf{v}=\mathbf{0}$. Using linearity of the map $\mathbf{x}\mapsto A\mathbf{x}$ (Ch. 7), $$ A(c\mathbf{u}+\mathbf{v}) = c\,A\mathbf{u} + A\mathbf{v} = c\,\mathbf{0} + \mathbf{0} = \mathbf{0}, $$ so $c\mathbf{u}+\mathbf{v}\in N(A)$. The set is closed under linear combinations, hence it is a subspace. $\blacksquare$

What it means. The proof used only the linearity of $A$ — which is why the same argument shows the kernel of any linear map is a subspace (Ch. 35). The single combined check $c\mathbf{u}+\mathbf{v}\in N(A)$ folds both closures into one line; that is the standard efficient form of the subspace test.

E.2 Proof by contrapositive

The statement "if $P$ then $Q$" is logically identical to its contrapositive, "if not $Q$ then not $P$." Proving the contrapositive proves the original. Reach for it when the negation of the conclusion is more concrete to work with than the conclusion itself.

Worked example — a linear map is injective iff its null space is trivial. Claim: the linear map $T(\mathbf{x})=A\mathbf{x}$ is injective (one-to-one) if and only if $N(A)=\{\mathbf{0}\}$. We prove the forward direction's contrapositive and the reverse directly.

Why we care. This is the bridge between an algebraic fact you can compute (is the null space just the zero vector?) and a structural property you care about (does the map lose information?). It is used constantly from Chapter 13 onward.

Key idea. Injectivity says different inputs give different outputs; its failure gives two inputs with the same output, and their difference lands in the null space.

Proof. ($\Rightarrow$, by contrapositive.) Suppose $N(A)\ne\{\mathbf{0}\}$ — that is, not trivial. Then there is a nonzero $\mathbf{z}$ with $A\mathbf{z}=\mathbf{0}$. But also $A\mathbf{0}=\mathbf{0}$, so the two distinct inputs $\mathbf{z}$ and $\mathbf{0}$ share the output $\mathbf{0}$; $T$ is not injective. We have shown "not trivial null space $\Rightarrow$ not injective," the contrapositive of "injective $\Rightarrow$ trivial null space."

($\Leftarrow$, directly.) Suppose $N(A)=\{\mathbf{0}\}$ and $A\mathbf{u}=A\mathbf{v}$. Then $A(\mathbf{u}-\mathbf{v})=\mathbf{0}$ by linearity, so $\mathbf{u}-\mathbf{v}\in N(A)={\mathbf{0}}$, forcing $\mathbf{u}-\mathbf{v}=\mathbf{0}$, i.e. $\mathbf{u}=\mathbf{v}$. Hence $T$ is injective. $\blacksquare$

What it means. A linear map collapses information only through its null space — if nothing but $\mathbf{0}$ is sent to $\mathbf{0}$, nothing is conflated. This is why "trivial null space" appears all over the Invertible Matrix Theorem (Appendix B.5): for a square matrix it is equivalent to invertibility.

Common Pitfall — The contrapositive of "if $P$ then $Q$" is "if not $Q$ then not $P$" — you negate and swap both parts. The converse ("if $Q$ then $P$") and the inverse ("if not $P$ then not $Q$") are different statements that need not be true. Confusing the contrapositive (always valid) with the converse (a separate claim) is the classic logical slip.

E.3 Proof by contradiction

To prove a statement, assume it is false and derive an impossibility (a contradiction). The false assumption must therefore have been wrong, so the statement is true. This is the natural tool for impossibility and uniqueness claims, where there is no constructive object to build directly.

Worked example — a set containing $\mathbf{0}$ is dependent. Claim: any set of vectors $\{\mathbf{v}_1,\dots,\mathbf{v}_k\}$ that includes the zero vector is linearly dependent.

Why we care. It is a clean, frequently used sanity check (Ch. 6): the moment you spot $\mathbf{0}$ in a candidate basis, you can stop — it cannot be independent.

Key idea. Independence forbids any nontrivial combination equal to $\mathbf{0}$, but the zero vector hands you one for free.

Proof. Suppose, for contradiction, that the set is linearly independent even though (say) $\mathbf{v}_1=\mathbf{0}$. Consider the combination $1\cdot\mathbf{v}_1 + 0\cdot\mathbf{v}_2 + \dots + 0\cdot\mathbf{v}_k = \mathbf{0}$, which holds because $\mathbf{v}_1=\mathbf{0}$. This is a combination equal to the zero vector whose coefficients are not all zero (the first is $1$). But linear independence requires that the only such combination has all-zero coefficients — a direct contradiction. Hence the set cannot be independent; it is dependent. $\blacksquare$

What it means. Independence is a strong condition: it rules out every nontrivial relation, so a single witnessed relation destroys it. Notice the shape — we assumed the opposite of the claim and produced an object the assumption forbids.

E.4 Proof by induction

When a statement is indexed by a natural number $n$ — "for every $n$, …" — mathematical induction proves it in two moves: the base case (it holds for the smallest $n$) and the inductive step (if it holds for $n$, it holds for $n+1$). Together these knock down the whole infinite line of dominoes.

Worked example — powers of a diagonalizable matrix. Claim: if $A=PDP^{-1}$ (Ch. 25), then $A^n = PD^nP^{-1}$ for every integer $n\ge 1$.

Why we care. This is the formula that makes diagonalization useful: it turns the expensive computation $A^n$ into raising a diagonal matrix to the $n$th power, which is just raising each diagonal entry to the $n$th power. It powers Markov chains (Ch. 29) and discrete dynamical systems (Ch. 37).

Key idea. Each extra factor of $A$ lets an interior $P^{-1}P$ cancel to the identity, advancing the exponent by one.

Proof. Base case ($n=1$): $A^1 = PD^1P^{-1}$ is the hypothesis itself. Inductive step: assume $A^n = PD^nP^{-1}$ for some $n\ge 1$. Then $$ A^{n+1} = A\cdot A^n = (PDP^{-1})(PD^nP^{-1}) = PD\,(P^{-1}P)\,D^nP^{-1} = PD\,I\,D^nP^{-1} = PD^{n+1}P^{-1}, $$ using associativity and $P^{-1}P=I$. So the formula holds for $n+1$. By induction it holds for all $n\ge 1$. $\blacksquare$

What it means. The interior cancellation $P^{-1}P=I$ is the engine, and induction just turns "it advances by one" into "it holds forever." Geometrically: $A^n$ changes into the eigenbasis once, stretches $n$ times along the eigen-axes, and changes back once — the coordinate changes at the ends never accumulate.

Math-Major Sidebar — Two common variants. Strong induction assumes the claim for all values up to $n$ (not just $n$) when proving $n+1$ — useful when the step reaches back several rows, as in some determinant cofactor-expansion arguments (Ch. 11). And induction need not start at $1$: a claim about $n\times n$ matrices might have its base case at $n=2$. Always name where the base case sits.

E.5 Set and subspace equality by double inclusion

To prove two sets are equal, $S = T$, prove both inclusions: $S\subseteq T$ (every element of $S$ lies in $T$) and $T\subseteq S$. This "double inclusion" is the standard way to prove two subspaces coincide, and it appears throughout Parts III and IV when two descriptions of the same space must be reconciled.

Worked example — the null space of $A^{\mathsf{T}}A$ equals the null space of $A$. Claim: for any real $m\times n$ matrix $A$, $N(A^{\mathsf{T}}A) = N(A)$. (This is the fact that makes the normal equations of least squares solvable, Ch. 19.)

Why we care. Least squares (Ch. 17, 19) leans on $A^{\mathsf{T}}A$ being invertible when $A$ has independent columns. This equality is exactly the link: $A^{\mathsf{T}}A$ collapses nothing that $A$ does not already collapse, so independent columns of $A$ make $A^{\mathsf{T}}A$ invertible.

Key idea. One inclusion is immediate; the other extracts a norm by the trick $A^{\mathsf{T}}A\mathbf{x}=\mathbf{0}\Rightarrow \mathbf{x}^{\mathsf{T}}A^{\mathsf{T}}A\mathbf{x}=0\Rightarrow\lVert A\mathbf{x}\rVert^2=0$.

Proof. ($N(A)\subseteq N(A^{\mathsf{T}}A)$.) If $A\mathbf{x}=\mathbf{0}$, then $A^{\mathsf{T}}A\mathbf{x}=A^{\mathsf{T}}\mathbf{0}=\mathbf{0}$, so $\mathbf{x}\in N(A^{\mathsf{T}}A)$. This inclusion is direct.

($N(A^{\mathsf{T}}A)\subseteq N(A)$.) Suppose $A^{\mathsf{T}}A\mathbf{x}=\mathbf{0}$. Multiply on the left by $\mathbf{x}^{\mathsf{T}}$: $$ \mathbf{x}^{\mathsf{T}}A^{\mathsf{T}}A\mathbf{x} = (A\mathbf{x})^{\mathsf{T}}(A\mathbf{x}) = \lVert A\mathbf{x}\rVert^2 = 0. $$ A norm is zero only for the zero vector (a defining property of the norm, Ch. 18), so $A\mathbf{x}=\mathbf{0}$, i.e. $\mathbf{x}\in N(A)$.

Both inclusions hold, so $N(A^{\mathsf{T}}A)=N(A)$. $\blacksquare$

What it means. Forming $A^{\mathsf{T}}A$ never enlarges the null space — it is the algebraic reason least squares has a unique solution exactly when $A$'s columns are independent. The second inclusion is the workhorse half; the move "$\mathbf{x}^{\mathsf{T}}(\cdot)\mathbf{x}$ turns a matrix equation into a statement about a length" recurs whenever symmetry and positivity are in play (Ch. 28).

E.6 Uniqueness arguments

Many theorems assert that an object is unique — the solution, the inverse, the representation in a basis. The standard pattern: assume two such objects exist, then show they must be equal. (Existence — that there is at least one — is a separate question, usually proved by constructing one.)

Worked example — coordinates in a basis are unique. Claim: if $\{\mathbf{v}_1,\dots,\mathbf{v}_n\}$ is a basis of a vector space $V$, then every $\mathbf{w}\in V$ has exactly one representation $\mathbf{w} = c_1\mathbf{v}_1 + \dots + c_n\mathbf{v}_n$.

Why we care. This uniqueness is what makes "coordinates" meaningful (Ch. 15): a basis assigns to each vector one unambiguous list of numbers, which is the whole foundation of change of basis (Ch. 16) and of representing a vector in numpy at all.

Key idea. Two representations of the same vector subtract to a combination equal to $\mathbf{0}$; independence forces every coefficient of that difference to vanish.

Proof. Existence is the spanning property of a basis, so suppose $\mathbf{w}$ has two representations, $$ \mathbf{w} = c_1\mathbf{v}_1 + \dots + c_n\mathbf{v}_n = d_1\mathbf{v}_1 + \dots + d_n\mathbf{v}_n. $$ Subtracting, $(c_1-d_1)\mathbf{v}_1 + \dots + (c_n-d_n)\mathbf{v}_n = \mathbf{0}$. Since the basis vectors are linearly independent, the only such combination has all coefficients zero: $c_i - d_i = 0$ for every $i$, that is $c_i = d_i$. The two representations are identical, so the representation is unique. $\blacksquare$

What it means. Independence is precisely "no vector has two names." Spanning guarantees at least one set of coordinates; independence guarantees at most one; a basis, being both, guarantees exactly one. That is the deep reason the definition of a basis pairs those two words (Appendix B.2).

E.7 Putting it together — "do these vectors form a basis?"

A single concrete question often calls on several techniques at once. Suppose you must decide whether $\{(1,1),\,(1,-1)\}$ is a basis of $\mathbb{R}^2$ — and prove your answer, not just assert it.

Strategy. "Basis" unfolds (definition!) into two requirements: independent and spanning. In $\mathbb{R}^n$ there is a shortcut worth knowing — any $n$ independent vectors in an $n$-dimensional space automatically span it (a dimension-counting theorem from Ch. 15) — so for two vectors in $\mathbb{R}^2$ it suffices to prove independence.

Proof. Suppose $c_1(1,1) + c_2(1,-1) = (0,0)$. Reading components gives the system $c_1 + c_2 = 0$ and $c_1 - c_2 = 0$. Adding the equations: $2c_1 = 0$, so $c_1 = 0$; back-substituting, $c_2 = 0$. The only combination equal to $\mathbf{0}$ is the trivial one, so the two vectors are independent. Being two independent vectors in the two-dimensional space $\mathbb{R}^2$, they form a basis. $\blacksquare$

Verification (the book's third habit). Independence of these columns is equivalent to $\det\begin{bmatrix}1&1\\1&-1\end{bmatrix} = (1)(-1)-(1)(1) = -2 \ne 0$ — nonzero determinant, so the matrix is invertible, so its columns are a basis (the Invertible Matrix Theorem, Appendix B.5). In numpy, np.linalg.det(np.array([[1.,1.],[1.,-1.]])) returns -2.0 and np.linalg.matrix_rank(...) returns 2, confirming the proof — but remember the distinction Appendix A drew: the computation is evidence; the argument above is the proof. Code builds confidence and catches slips; only the derivation delivers certainty.

The Key Insight — Nearly every proof in this book is one of six moves — direct, contrapositive, contradiction, induction, double inclusion, uniqueness — and nearly every one opens by unfolding a definition into the literal statement it abbreviates. When a proof stalls, the cause is almost always an unused hypothesis or a word left folded. Name what you may assume, name what you must reach, replace each term by its definition, and the road usually appears.