Appendix B — Formula and Theorem Reference

This appendix is the card you keep next to you while you work. It collects the book's core definitions, its major theorems with their conditions stated precisely, and the standard formulas you will reach for again and again. It is organized by topic, and every entry points back to the chapter where the idea is motivated, derived, and proved — because a formula you can look up is useful, but a formula you understand is yours. When an entry has a condition, the condition is not decoration: the theorem is simply false without it. The first rule of using this card is to read the hypotheses before you use the conclusion.

A note on notation, which is frozen throughout (see Appendix F): vectors are bold lowercase $\mathbf{v}$, matrices are italic capitals $A$, the transpose is $A^{\mathsf{T}}$, the four fundamental subspaces are $C(A), N(A), C(A^{\mathsf{T}}), N(A^{\mathsf{T}})$, the norm uses double bars $\lVert\cdot\rVert$, and an abstract inner product is $\langle\cdot,\cdot\rangle$.

B.1 Vectors, dot products, and norms (Ch. 2, 18)

Quantity	Formula	Notes
Linear combination	$c_1\mathbf{v}_1 + \dots + c_k\mathbf{v}_k$	the single most important operation in the subject (Ch. 2)
Dot product	$\mathbf{u}\cdot\mathbf{v} = \sum_{i=1}^{n} u_i v_i = \mathbf{u}^{\mathsf{T}}\mathbf{v}$	a scalar; $=\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert\cos\theta$ (Ch. 18)
Euclidean norm	$\lVert\mathbf{v}\rVert = \sqrt{\mathbf{v}\cdot\mathbf{v}} = \sqrt{v_1^2 + \dots + v_n^2}$	length; always $\ge 0$ (Ch. 18)
Unit vector	$\hat{\mathbf{v}} = \mathbf{v}/\lVert\mathbf{v}\rVert$	defined only for $\mathbf{v}\ne\mathbf{0}$
Angle between vectors	$\cos\theta = \dfrac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert}$	cosine similarity in data science (Ch. 18)
Orthogonality	$\mathbf{u}\perp\mathbf{v} \iff \mathbf{u}\cdot\mathbf{v}=0$	the zero vector is orthogonal to everything

Cauchy–Schwarz inequality (Ch. 18). For all $\mathbf{u},\mathbf{v}$ in an inner product space, $$ |\langle\mathbf{u},\mathbf{v}\rangle| \le \lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert, $$ with equality if and only if $\mathbf{u}$ and $\mathbf{v}$ are linearly dependent (one is a scalar multiple of the other). This is what makes $\cos\theta\in[-1,1]$ well-defined in every dimension, so "angle" makes sense in $\mathbb{R}^{1000}$.

Triangle inequality (Ch. 18). $\lVert\mathbf{u}+\mathbf{v}\rVert \le \lVert\mathbf{u}\rVert + \lVert\mathbf{v}\rVert$, a direct corollary of Cauchy–Schwarz. The shortest path between two points is a straight line.

B.2 Vector spaces, subspaces, and independence (Ch. 5, 6, 15)

A vector space is a set closed under addition and scalar multiplication satisfying the eight axioms of Chapter 5. A subspace $W$ of $V$ is a nonempty subset that is itself a vector space under the same operations.

The subspace test (Ch. 6). A nonempty subset $W\subseteq V$ is a subspace if and only if it is closed under addition and scalar multiplication — equivalently, if $c\mathbf{u}+\mathbf{v}\in W$ for all $\mathbf{u},\mathbf{v}\in W$ and all scalars $c$. (It follows that $\mathbf{0}\in W$; if $\mathbf{0}\notin W$, stop — $W$ is not a subspace.)

Concept	Definition (Ch. 6, 15)
Span	$\operatorname{span}\{\mathbf{v}_1,\dots,\mathbf{v}_k\}$ = the set of all linear combinations
Linear independence	the only solution of $c_1\mathbf{v}_1+\dots+c_k\mathbf{v}_k=\mathbf{0}$ is $c_1=\dots=c_k=0$
Basis	a linearly independent set that spans the space
Dimension	$\dim(V)$ = the number of vectors in any basis (all bases have the same size)

Warning

— A basis must be both independent and spanning; dropping either word breaks it. A spanning set that is dependent has redundancy; an independent set that does not span misses directions. In $\mathbb{R}^n$, any $n$ independent vectors automatically form a basis — but that shortcut needs the dimension to be known first.

B.3 The four fundamental subspaces and Rank–Nullity (Ch. 13, 14)

For an $m\times n$ matrix $A$ (a map $\mathbb{R}^n\to\mathbb{R}^m$):

Subspace	Lives in	Dimension
Column space $C(A)$	$\mathbb{R}^m$	$\operatorname{rank}(A)=r$
Null space $N(A)$	$\mathbb{R}^n$	$n-r$ (the nullity)
Row space $C(A^{\mathsf{T}})$	$\mathbb{R}^n$	$r$
Left null space $N(A^{\mathsf{T}})$	$\mathbb{R}^m$	$m-r$

The row space and the column space always have the same dimension $r$ — that common number is the rank. The row space and null space are orthogonal complements in $\mathbb{R}^n$; the column space and left null space are orthogonal complements in $\mathbb{R}^m$.

Rank–Nullity Theorem (Ch. 14). For any $m\times n$ matrix $A$ (no condition beyond having a fixed shape; equivalently, for any linear map $T:\mathbb{R}^n\to \mathbb{R}^m$), $$ \operatorname{rank}(A) + \dim N(A) = n \qquad(\text{the number of columns}). $$ In words: dimensions in = dimensions that survive + dimensions that collapse to zero. Conservation of dimension. (The abstract form, $\dim\operatorname{im}(T) + \dim\ker(T) = \dim(\text{domain})$, is Ch. 35.)

B.4 Matrix operations and the small standard formulas (Ch. 7–9, 11)

Operation	Formula	Condition / note
Matrix–vector product	$(A\mathbf{x})_i = \sum_{j} a_{ij}x_j$	$A\mathbf{x}$ is a combination of $A$'s columns (Ch. 7)
Matrix product (composition)	$(AB)_{ij} = \sum_{k} a_{ik}b_{kj}$	$A$ is $m\times p$, $B$ is $p\times n$ (Ch. 8)
Transpose of a product	$(AB)^{\mathsf{T}} = B^{\mathsf{T}}A^{\mathsf{T}}$	order reverses (Ch. 8)
Inverse of a product	$(AB)^{-1} = B^{-1}A^{-1}$	both invertible; order reverses (Ch. 9)

The $2\times 2$ determinant and inverse (Ch. 9, 11). For $A=\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, $$ \det(A) = ad - bc, \qquad A^{-1} = \frac{1}{ad-bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}, $$ where the formula for $A^{-1}$ is valid if and only if $\det(A)=ad-bc\ne 0$ (swap the diagonal, negate the off-diagonal, divide by the determinant). Geometrically $|\det(A)|$ is the area-scaling factor of the map; $\det(A)=0$ means the map collapses the plane to a line or a point, which is exactly why it has no inverse.

Properties of the determinant (Ch. 11). $\det(AB)=\det(A)\det(B)$; $\det(A^{\mathsf{T}})=\det(A)$; $\det(A^{-1})=1/\det(A)$ (when $A$ is invertible); and for an $n\times n$ matrix $\det(cA)=c^n\det(A)$. For a triangular matrix, $\det(A)$ is the product of the diagonal entries.

The rotation matrix (Ch. 21). Counterclockwise rotation of the plane by angle $\theta$: $$ R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad \det(R_\theta)=1, \quad R_\theta^{-1}=R_\theta^{\mathsf{T}}=R_{-\theta}. $$ It is the prototype of an orthogonal matrix: its columns are orthonormal, it preserves every length and angle, and a real rotation (for $\theta$ not a multiple of $180^\circ$) has the complex eigenvalues $\cos\theta\pm i\sin\theta$ — no real direction is left fixed (Ch. 26).

B.5 The Invertible Matrix Theorem (Ch. 9, 11, 14)

This is the book's grand unification of Part II. Let $A$ be a square $n\times n$ matrix. The following statements are all equivalent — each is true exactly when every other is true (if any one fails, all fail):

$A$ is invertible (there is a matrix $A^{-1}$ with $AA^{-1}=A^{-1}A=I$).
$A\mathbf{x}=\mathbf{b}$ has a unique solution for every $\mathbf{b}\in\mathbb{R}^n$.
$A\mathbf{x}=\mathbf{0}$ has only the trivial solution $\mathbf{x}=\mathbf{0}$.
$\det(A)\ne 0$.
$\operatorname{rank}(A)=n$ (full rank).
The columns of $A$ are linearly independent and form a basis of $\mathbb{R}^n$.
The RREF of $A$ is the identity $I_n$ (it has $n$ pivots).
$N(A)=\{\mathbf{0}\}$ and $C(A)=\mathbb{R}^n$.
$0$ is not an eigenvalue of $A$.
$A^{\mathsf{T}}$ is invertible.

Warning

— Every clause assumes $A$ is square. A non-square matrix is never "invertible" in this sense; the closest analogue is the pseudoinverse built from the SVD (Ch. 30). Do not apply this theorem to a $3\times 2$ matrix.

B.6 Projection and least squares (Ch. 17, 19)

The projection matrix (Ch. 19). Let $A$ be $m\times n$ with linearly independent columns (equivalently, $A^{\mathsf{T}}A$ is invertible). The matrix that orthogonally projects any vector onto the column space $C(A)$ is $$ P = A\,(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}. $$ It is symmetric ($P^{\mathsf{T}}=P$) and idempotent ($P^2=P$): projecting a second time changes nothing, because the point is already on the subspace.

The normal equations (Ch. 17, 19). The least-squares solution $\hat{\mathbf{x}}$ — the one minimizing $\lVert A\mathbf{x}-\mathbf{b}\rVert$ — satisfies $$ A^{\mathsf{T}}A\,\hat{\mathbf{x}} = A^{\mathsf{T}}\mathbf{b}. $$ When $A$ has independent columns this has the unique solution $\hat{\mathbf{x}} = (A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}\mathbf{b}$, and the fitted vector is $A\hat{\mathbf{x}} = P\mathbf{b}$ — the projection of $\mathbf{b}$ onto $C(A)$. The geometry is the whole proof: the error $\mathbf{b}-A\hat{\mathbf{x}}$ must be orthogonal to every column of $A$, which is precisely $A^{\mathsf{T}}(\mathbf{b}-A\hat{\mathbf{x}})=\mathbf{0}$. (For a numerically stabler route that avoids forming $A^{\mathsf{T}}A$, solve via QR; see Ch. 20 and Appendix D.)

B.7 Eigenvalues, diagonalization, and the Spectral Theorem (Ch. 23–27)

Definitions (Ch. 23, 24). A scalar $\lambda$ and a nonzero vector $\mathbf{v}$ form an eigenpair of a square $A$ when $A\mathbf{v}=\lambda\mathbf{v}$. The eigenvalues are the roots of the characteristic polynomial $\det(A-\lambda I)=0$. For a $2\times2$ matrix this is the handy $$ \lambda^2 - \operatorname{tr}(A)\,\lambda + \det(A) = 0, $$ so the eigenvalues sum to the trace and multiply to the determinant. The algebraic multiplicity of $\lambda$ is its multiplicity as a root; the geometric multiplicity is $\dim N(A-\lambda I)$, and geometric $\le$ algebraic always.

Diagonalization (Ch. 25). A square $n\times n$ matrix $A$ is diagonalizable if and only if it has $n$ linearly independent eigenvectors (equivalently, the geometric multiplicity equals the algebraic multiplicity for every eigenvalue). Then $$ A = PDP^{-1}, $$ where $D$ is diagonal with the eigenvalues on its diagonal and the columns of $P$ are the corresponding eigenvectors. Powers are then cheap: $A^k = PD^kP^{-1}$. A matrix that lacks a full set of independent eigenvectors is called defective and is not diagonalizable (its canonical form is the Jordan form of Ch. 36).

Warning

— Not every matrix is diagonalizable. The shear $\begin{bmatrix}1&1\\0&1\end{bmatrix}$ has the single eigenvalue $1$ with only a one-dimensional eigenspace — it is defective. "Has $n$ distinct eigenvalues" is a sufficient condition for diagonalizability, but it is not necessary (the identity has one repeated eigenvalue and is already diagonal).

The Spectral Theorem (Ch. 27). This is the theorem whose conditions matter most, so state them exactly:

Real symmetric case. If $A$ is a real symmetric matrix ($A^{\mathsf{T}}=A$), then $A$ has real eigenvalues and an orthonormal basis of eigenvectors. Equivalently $A = QDQ^{\mathsf{T}}$ with $Q$ orthogonal ($Q^{\mathsf{T}}Q=I$) and $D$ real diagonal.
Complex Hermitian case. If $A$ is Hermitian ($A^{*}=A$, where $A^{*}$ is the conjugate transpose), the same holds over $\mathbb{C}$: real eigenvalues and a unitary diagonalization $A=UDU^{*}$.

Warning

— The Spectral Theorem applies to symmetric (real) or Hermitian (complex) matrices — never "all matrices." A general real matrix can have complex eigenvalues (a rotation) or fail to be diagonalizable at all (a shear). The guarantee of orthonormal eigenvectors and real eigenvalues is precisely what symmetry buys you, and nothing weaker will do. When a result says "let $A$ be symmetric," that word is load-bearing.

B.8 The SVD, low-rank approximation, and PageRank (Ch. 29, 30, 31)

Singular Value Decomposition (Ch. 30). Every $m\times n$ matrix $A$ — rectangular or square, full-rank or rank-deficient, real or complex, with no condition whatsoever — factors as $$ A = U\Sigma V^{\mathsf{T}}, $$ where $U$ ($m\times m$) and $V$ ($n\times n$) are orthogonal, and $\Sigma$ is $m\times n$ "diagonal" with the **singular values** $\sigma_1\ge\sigma_2\ge\dots\ge 0$ on its diagonal. The number of nonzero singular values equals $\operatorname{rank}(A)$. Geometrically the SVD reads (right to left) as rotate, stretch, rotate: $V^{\mathsf{T}}$ reorients, $\Sigma$ scales the axes by the $\sigma_i$, and $U$ reorients again. The singular values are the square roots of the eigenvalues of $A^{\mathsf{T}}A$, which is why they are real and nonnegative — $A^{\mathsf{T}}A$ is symmetric positive semidefinite, so the Spectral Theorem of §B.7 applies to it.

Eckart–Young theorem (Ch. 31). Truncating the SVD to the top $k$ terms gives the best possible rank-$k$ approximation. Precisely, among all matrices $B$ of rank at most $k$, the truncated SVD $$ A_k = \sum_{i=1}^{k} \sigma_i\,\mathbf{u}_i\mathbf{v}_i^{\mathsf{T}} $$ minimizes both $\lVert A-B\rVert_2$ (spectral norm) and $\lVert A-B\rVert_F$ (Frobenius norm), with the errors $\lVert A-A_k\rVert_2=\sigma_{k+1}$ and $\lVert A-A_k\rVert_F=\sqrt{\sigma_{k+1}^2+\dots+\sigma_r^2}$. This single theorem underwrites image compression, PCA, denoising, and latent-semantic analysis — keep the big singular values, discard the small ones. (The Eckart–Young–Mirsky attribution: the Frobenius-norm case is Eckart and Young, 1936; the extension to all unitarily invariant norms is due to Mirsky [verify].)

Positive definiteness (Ch. 28). A symmetric matrix $A$ is positive definite if $\mathbf{x}^{\mathsf{T}}A\mathbf{x}>0$ for all $\mathbf{x}\ne\mathbf{0}$, equivalently all eigenvalues are positive, equivalently it has a Cholesky factorization $A=LL^{\mathsf{T}}$ with $L$ lower-triangular and positive diagonal. Replace "$>0$" with "$\ge 0$" and "positive" with "nonnegative" for positive semidefinite.

Perron–Frobenius theorem (Ch. 29). For PageRank we need to know the steady state exists and is unique. The relevant statement: a square matrix with strictly positive entries has a real, positive, simple dominant eigenvalue $\lambda_1$ (strictly larger in magnitude than every other eigenvalue) whose eigenvector can be chosen to have all-positive entries. For a column-stochastic matrix (nonnegative columns summing to $1$) the dominant eigenvalue is exactly $\lambda_1=1$; the corresponding positive eigenvector, normalized, is the stationary distribution. PageRank's damping (the random-teleport term, typically $0.85$) is what forces the strictly-positive hypothesis to hold, guaranteeing a unique ranking — and power iteration (repeatedly multiplying by the matrix) converges to that dominant eigenvector at a rate set by the ratio $|\lambda_2|/|\lambda_1|$.

B.9 One-line theorem index

Theorem	Condition	Conclusion	Ch.
Rank–Nullity	any $m\times n$ matrix	$\operatorname{rank}+\text{nullity}=n$	14
Cauchy–Schwarz	any inner product space	$\lvert\langle\mathbf{u},\mathbf{v}\rangle\rvert\le\lVert\mathbf{u}\rVert\lVert\mathbf{v}\rVert$	18
Invertible Matrix Thm	square $A$	10 equivalent conditions for invertibility	9
Diagonalization	$n$ independent eigenvectors	$A=PDP^{-1}$	25
Spectral Theorem	symmetric (real) / Hermitian (complex)	real eigenvalues, orthonormal eigenvectors	27
SVD existence	every matrix (no condition)	$A=U\Sigma V^{\mathsf{T}}$	30
Eckart–Young	best rank-$k$ approximation	truncated SVD minimizes the error	31
Perron–Frobenius	positive (or stochastic) matrix	unique positive dominant eigenvector	29

If you remember nothing else from this card, remember the conditions in bold. A theorem applied outside its hypotheses is not a shortcut — it is a wrong answer waiting to happen.