Glossary
Every important term in this book, defined once and for good. Each entry gives a crisp definition in the book's geometry-first voice and the chapter where the idea is introduced (look there for the full picture, the proof, and the numpy). Notation follows the locked conventions: vectors are bold lowercase $\mathbf{v}$, matrices italic capital $A$, the transpose is $A^{\mathsf{T}}$, and the four fundamental subspaces are $C(A)$, $N(A)$, $C(A^{\mathsf{T}})$, $N(A^{\mathsf{T}})$.
How to read an entry. A term in bold inside a definition has its own entry elsewhere in the glossary. "(Ch. N)" is where the term is introduced; many terms deepen across several later chapters.
A
Additive inverse. For a vector $\mathbf{v}$, the unique vector $-\mathbf{v}$ with $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$; one of the vector space axioms. (Ch. 5)
Adjacency / link matrix. A matrix encoding the edges of a graph; for PageRank, a column-stochastic version that splits each page's vote equally among its out-links. (Ch. 29)
Affine transformation. A linear transformation followed by a translation, $\mathbf{x} \mapsto A\mathbf{x} + \mathbf{b}$; not linear (it moves the origin) but made linear in one higher dimension via homogeneous coordinates. (Ch. 12)
Algebraic multiplicity. The number of times an eigenvalue appears as a root of the characteristic polynomial. Always at least the geometric multiplicity. (Ch. 24)
Augmented matrix. The coefficient matrix of a linear system with the right-hand side appended as an extra column, written $[A \mid \mathbf{b}]$; the object you row-reduce. (Ch. 3)
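The homogeneous-coordinates trick in the Affine transformation entry can be checked numerically. This is a minimal sketch with made-up values for $A$, $\mathbf{b}$, and $\mathbf{x}$ (a quarter-turn followed by a shift); the names `M`, `direct`, and `homogeneous` are illustrative, not the book's.

```python
import numpy as np

# Hypothetical example values: a 90-degree rotation followed by a shift.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
b = np.array([2.0, 3.0])
x = np.array([1.0, 0.0])

# Direct affine map: x -> A x + b.
direct = A @ x + b

# The same map as one 3x3 matrix acting on homogeneous coordinates (x, 1).
M = np.eye(3)
M[:2, :2] = A      # linear part in the top-left block
M[:2, 2] = b       # translation in the last column
homogeneous = (M @ np.append(x, 1.0))[:2]
```

Both routes land on the same point, which is the sense in which the affine map becomes linear one dimension up.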
B
Back-substitution. Solving a triangular system from the bottom row upward, once Gaussian elimination has produced an upper-triangular form. (Ch. 4)
Basis. A linearly independent spanning set of a vector space: a smallest possible set of vectors whose linear combinations reach everything. Its size is the dimension. (Ch. 15)
Bilinearity. The property of the dot product (and any real inner product) of being linear in each argument separately. (Ch. 18)
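Back-substitution, as defined above, is short enough to write out. The following is a sketch only, assuming an upper-triangular matrix with nonzero diagonal entries; the function name `back_substitute` and the sample system are hypothetical.

```python
import numpy as np

def back_substitute(U, c):
    """Solve U x = c for upper-triangular U with nonzero diagonal."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):           # bottom row upward
        known = U[i, i + 1:] @ x[i + 1:]      # contribution of unknowns already solved
        x[i] = (c[i] - known) / U[i, i]
    return x

U = np.array([[2.0, 1.0, -1.0],
              [0.0, 3.0,  2.0],
              [0.0, 0.0,  4.0]])
c = np.array([3.0, 12.0, 12.0])
x = back_substitute(U, c)
```

Each row has exactly one unknown left by the time the loop reaches it, which is why a triangular system costs so little to solve.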
C
Cauchy–Schwarz inequality. $|\mathbf{u}\cdot\mathbf{v}| \le \lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert$, with equality exactly when the vectors are parallel; it keeps $\cos\theta$ between $-1$ and $1$, which is what guarantees the angle formula returns a real angle. (Ch. 18)
Change of basis. Re-expressing the same vector or transformation in a different coordinate system; the matrix changes by similarity, the transformation does not. (Ch. 16)
Characteristic polynomial. $\det(A - \lambda I)$, a degree-$n$ polynomial in $\lambda$ whose roots are the eigenvalues of $A$. (Ch. 24)
Cholesky factorization. The factorization $A = LL^{\mathsf{T}}$ of a positive definite matrix into a lower-triangular factor times its transpose. (Ch. 28)
Cofactor expansion. A recursive formula for the determinant that expands along a row or column using signed minors; also called Laplace expansion. (Ch. 11)
Column space $C(A)$. The span of the columns of $A$ — exactly the set of right-hand sides $\mathbf{b}$ for which $A\mathbf{x} = \mathbf{b}$ has a solution. One of the four fundamental subspaces. (Ch. 13)
Column-stochastic matrix. A nonnegative matrix whose columns each sum to $1$; it moves probability around without creating or destroying it, and always has $\lambda = 1$ as an eigenvalue. (Ch. 29)
Complex eigenvalue. An eigenvalue that is not real; for a real matrix these come in complex conjugate pairs and signal a hidden rotation-and-scaling in an invariant plane. (Ch. 26)
Condition number $\kappa(A)$. The ratio $\sigma_1/\sigma_n$ of largest to smallest singular value; it measures how much a matrix amplifies input error, and so how trustworthy a numerical solution to $A\mathbf{x}=\mathbf{b}$ is. (Ch. 30, 38)
Conjugate (Hermitian) transpose $A^{*}$. The transpose combined with complex conjugation of every entry; the natural generalization of $A^{\mathsf{T}}$ over $\mathbb{C}$. (Ch. 21, 27)
Consistent system. A linear system with at least one solution (one or infinitely many); an inconsistent system has none. (Ch. 3)
Covariance matrix. A symmetric positive semidefinite matrix whose entries are the pairwise covariances of data features; its eigenvectors are the principal components. (Ch. 28, 32)
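To make the Condition number entry concrete, here is a small sketch that computes $\sigma_1/\sigma_n$ directly from the singular values. The nearly singular matrix is an invented example, and `kappa` is just a local name.

```python
import numpy as np

# Hypothetical nearly singular matrix, chosen for illustration.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])

sigma = np.linalg.svd(A, compute_uv=False)   # singular values, largest first
kappa = sigma[0] / sigma[-1]                 # ratio of largest to smallest
```

The result should agree with `np.linalg.cond(A)`, which uses the same 2-norm definition by default. A large `kappa` is the warning that small errors in $\mathbf{b}$ can be greatly amplified in the computed $\mathbf{x}$.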
D
Defective matrix. A square matrix with fewer independent eigenvectors than its size — too few to form an eigenbasis, so it is not diagonalizable. (Ch. 24, 25)
Determinant $\det(A)$. The signed factor by which a transformation scales area (2D) or volume (nD); $\det(A) = 0$ exactly when $A$ collapses space and is singular. (Ch. 11)
Diagonalization. Writing $A = PDP^{-1}$ with $D$ diagonal and the columns of $P$ an eigenbasis; it exposes the transformation as pure scaling along the eigen-directions. (Ch. 25)
Dimension $\dim(V)$. The number of vectors in any basis of $V$ — the count of independent directions, or degrees of freedom, the space has. (Ch. 15)
Dot product. $\mathbf{u}\cdot\mathbf{v} = \sum_i u_i v_i = \lVert\mathbf{u}\rVert\lVert\mathbf{v}\rVert\cos\theta$; a single number that encodes both length and the angle between two vectors. (Ch. 18)
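The Diagonalization entry can be verified in a few lines. This sketch assumes a small diagonalizable matrix picked for the example (its eigenvalues are 2 and 5); `reconstructed` and `A5` are illustrative names.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])        # trace 7, determinant 10: eigenvalues 2 and 5

eigvals, P = np.linalg.eig(A)     # columns of P are eigenvectors
D = np.diag(eigvals)
reconstructed = P @ D @ np.linalg.inv(P)

# Powers become cheap in the eigenbasis: A^5 = P D^5 P^{-1}.
A5 = P @ np.diag(eigvals ** 5) @ np.linalg.inv(P)
```

Raising $D$ to a power only touches its diagonal, which is the practical payoff of seeing the transformation as pure scaling along the eigen-directions.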
E
Eckart–Young theorem. The statement that the best rank-$k$ approximation of a matrix (in Frobenius or operator norm) is its truncated SVD — keep the $k$ largest singular values, discard the rest. (Ch. 31)
Eigenbasis. A basis consisting entirely of eigenvectors of a matrix; it exists exactly when the matrix is diagonalizable. (Ch. 25)
Eigenspace $E_\lambda$. The set of all eigenvectors for a given eigenvalue $\lambda$, together with the zero vector — a subspace, namely $N(A - \lambda I)$. (Ch. 23)
Eigenvalue. The scalar $\lambda$ in $A\mathbf{v} = \lambda\mathbf{v}$: the factor by which the matrix stretches its eigenvector. May be zero, negative, or complex. (Ch. 23)
Eigenvector. A nonzero vector $\mathbf{v}$ that $A$ keeps on its own line through the origin, so that $A\mathbf{v} = \lambda\mathbf{v}$ (it may be stretched, flipped, or sent to $\mathbf{0}$, but never turned); the invariant direction that reveals what the matrix really does. (Ch. 23)
Elementary row operation. One of three reversible operations (swap two rows, scale a row by a nonzero number, add a multiple of one row to another) used to row-reduce a matrix. (Ch. 4)
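The Eckart–Young entry makes a claim that is easy to test on a random matrix. The sketch below is an illustration under assumed values (a seeded 6-by-4 Gaussian matrix and $k = 2$), not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))   # arbitrary test matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # keep the k largest singular values

frob_error = np.linalg.norm(A - A_k, "fro")   # equals sqrt of the sum of discarded sigma^2
op_error = np.linalg.norm(A - A_k, 2)         # equals the first discarded singular value
```

The two error values are exactly what the discarded singular values predict, so the singular values tell you the cost of truncation before you perform it.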
F
Field. The system of scalars (usually $\mathbb{R}$ or $\mathbb{C}$) over which a vector space is defined; you can add, subtract, and multiply its elements, and divide by any nonzero one. (Ch. 5)
Four fundamental subspaces. The column space $C(A)$, null space $N(A)$, row space $C(A^{\mathsf{T}})$, and left null space $N(A^{\mathsf{T}})$ — Strang's organizing picture of every matrix. (Ch. 13, 14)
Fourier series. An expansion of a periodic function in the orthogonal basis of sines and cosines; each coefficient is a projection of the function onto a basis frequency. (Ch. 22)
Free variable. A variable in a linear system corresponding to a non-pivot column; it can take any value, and each one adds a dimension to the solution set. (Ch. 4)
Frobenius norm $\lVert A\rVert_F$. The square root of the sum of squares of all entries, equivalently $\sqrt{\sum_i \sigma_i^2}$; the "Euclidean length" of a matrix. (Ch. 30)
G
Gaussian elimination. The systematic use of elementary row operations to reduce a system to triangular form, then solve by back-substitution. (Ch. 4)
Geometric multiplicity. The dimension of the eigenspace $E_\lambda$ — the number of independent eigenvectors for $\lambda$. Never exceeds the algebraic multiplicity. (Ch. 24)
Gram–Schmidt process. A procedure that turns any basis into an orthonormal one by subtracting off projections; its bookkeeping is exactly the QR decomposition. (Ch. 20)
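The remark that Gram–Schmidt bookkeeping "is exactly the QR decomposition" can be seen by recording each projection coefficient as you go. This is a sketch of the classical variant, assuming independent columns; the function name `gram_schmidt_qr` is hypothetical, and production code would normally call `np.linalg.qr` instead.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt on the columns of A (assumed independent)."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]   # component along an earlier direction
            v -= R[i, j] * Q[:, i]        # subtract that projection
        R[j, j] = np.linalg.norm(v)       # length of what is left
        Q[:, j] = v / R[j, j]             # normalize to a unit vector
    return Q, R

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q, R = gram_schmidt_qr(A)
```

The coefficients that were subtracted off fill the upper triangle of `R`, and `Q @ R` rebuilds `A`.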
H
Hermitian matrix. A complex matrix equal to its own conjugate transpose, $A = A^{*}$; the complex analogue of a symmetric matrix, with real eigenvalues (the spectral theorem). (Ch. 27)
Hessian matrix. The symmetric matrix of second partial derivatives of a function; its definiteness classifies a critical point as a minimum, maximum, or saddle. (Ch. 28)
Homogeneous coordinates. Representing a point in $\mathbb{R}^n$ by an $(n+1)$-vector so that translations become matrix multiplications; the trick behind the graphics pipeline. (Ch. 12)
Homogeneous system. A linear system $A\mathbf{x} = \mathbf{0}$; its solution set is exactly the null space $N(A)$ and always contains $\mathbf{0}$. (Ch. 3, 13)
I
Identity matrix $I$. The matrix that leaves every vector unchanged, $I\mathbf{x} = \mathbf{x}$; ones on the diagonal, zeros elsewhere. (Ch. 7, 8)
Inconsistent system. A linear system with no solution — geometrically, planes (or the column picture) with no common point. (Ch. 3)
Inner product $\langle\mathbf{u},\mathbf{v}\rangle$. A generalization of the dot product to abstract spaces (including function spaces): a positive-definite, symmetric, bilinear form that defines length and angle. Over $\mathbb{C}$, symmetry becomes conjugate symmetry and linearity holds fully in only one argument. (Ch. 34)
Inverse matrix $A^{-1}$. The transformation that undoes $A$, satisfying $AA^{-1} = A^{-1}A = I$; it exists exactly when $A$ is invertible (nonsingular). (Ch. 9)
Invertible matrix. A square matrix with an inverse — equivalently, full rank, nonzero determinant, trivial null space, and a bijective transformation (the Invertible Matrix Theorem). (Ch. 9)
Isometry. A distance-preserving map; in linear algebra, exactly the orthogonal (real) or unitary (complex) transformations. (Ch. 21)
J
Jordan block. An eigenvalue $\lambda$ on the diagonal with $1$'s on the superdiagonal: a scaling with an unavoidable shear, the signature of a defective eigenvalue. (Ch. 36)
Jordan normal form. The closest a defective matrix can come to being diagonal: a block-diagonal form of Jordan blocks built from generalized eigenvectors. (Ch. 36)
K
Kernel. Another name for the null space of a linear map: the set of inputs sent to $\mathbf{0}$. (Ch. 13, 35)
Krylov subspace. The span of $\mathbf{b}, A\mathbf{b}, A^2\mathbf{b}, \dots$; the playground of modern iterative methods like conjugate gradient. (Ch. 38)
L
Least squares. The method of finding the $\hat{\mathbf{x}}$ minimizing $\lVert A\mathbf{x} - \mathbf{b}\rVert$ when $A\mathbf{x} = \mathbf{b}$ has no exact solution; geometrically, $A\hat{\mathbf{x}}$ is the orthogonal projection of $\mathbf{b}$ onto $C(A)$. (Ch. 17, 19)
Left null space $N(A^{\mathsf{T}})$. The null space of the transpose — the vectors $\mathbf{y}$ with $A^{\mathsf{T}}\mathbf{y} = \mathbf{0}$; the orthogonal complement of the column space. (Ch. 14)
Linear combination. A weighted sum $c_1\mathbf{v}_1 + \cdots + c_k\mathbf{v}_k$; the single most important operation in the subject — everything is built from it. (Ch. 2)
Linear independence. A set of vectors none of which is a linear combination of the others; equivalently, only the trivial combination gives $\mathbf{0}$. (Ch. 6)
Linear transformation. A map $T$ with $T(\mathbf{u}+\mathbf{v}) = T(\mathbf{u})+T(\mathbf{v})$ and $T(c\mathbf{v}) = cT(\mathbf{v})$ — one that respects superposition and fixes the origin. The central object of the book; matrices are how we represent it. (Ch. 1, 7, 35)
LU decomposition. The factorization $A = LU$ into lower- and upper-triangular factors recording Gaussian elimination; with row swaps it becomes $PA = LU$ (PLU). (Ch. 10)
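The Least squares and Normal equations entries fit together in one short computation. The sketch below fits a line through three invented data points; the variable names are illustrative, and for ill-conditioned problems a QR- or SVD-based solver such as `np.linalg.lstsq` is usually the safer choice.

```python
import numpy as np

# Hypothetical data: fit a line y = c0 + c1 * t through three points.
t = np.array([0.0, 1.0, 2.0])
b = np.array([1.0, 2.0, 2.0])
A = np.column_stack([np.ones_like(t), t])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)   # normal equations
projection = A @ x_hat                       # closest point to b in C(A)
residual = b - projection                    # error, orthogonal to C(A)
```

The residual is orthogonal to every column of $A$, so `A.T @ residual` vanishes up to rounding; that orthogonality is the normal equations restated.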
M
Markov chain. A process whose state evolves by repeated multiplication by a column-stochastic matrix; its long-run behavior is the dominant eigenvector. (Ch. 25, 29)
Matrix. A rectangular array of numbers that represents a linear transformation in chosen coordinates; its columns are the images of the standard basis vectors. (Ch. 1, 7)
Matrix exponential $e^{At}$. The matrix series $\sum_{k=0}^{\infty} \frac{(At)^k}{k!}$; it solves the linear system of ODEs $\mathbf{x}' = A\mathbf{x}$ exactly, via $\mathbf{x}(t) = e^{At}\mathbf{x}(0)$. (Ch. 37)
Matrix multiplication. The operation that represents composition of transformations — do one map, then another; this is why it is associative but generally not commutative. (Ch. 8)
Minor. The determinant of the submatrix left after deleting one row and one column; the building block of cofactor expansion. (Ch. 11)
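The Markov chain entry says the long-run behavior is the dominant eigenvector. A rough way to watch that happen is to multiply repeatedly and see the state settle. The 3-state matrix below is invented for illustration, and 200 steps is an arbitrary but ample choice for this example.

```python
import numpy as np

# Hypothetical 3-state chain; each column sums to 1.
P = np.array([[0.9, 0.2, 0.1],
              [0.1, 0.7, 0.3],
              [0.0, 0.1, 0.6]])

state = np.array([1.0, 0.0, 0.0])   # start with all probability in state 0
for _ in range(200):
    state = P @ state                # one step of the chain
```

After enough steps `state` no longer changes under `P`, which is to say it is an eigenvector for $\lambda = 1$, and its entries still sum to 1.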
N
Norm $\lVert\mathbf{v}\rVert$. The length of a vector, $\sqrt{\mathbf{v}\cdot\mathbf{v}}$ for the Euclidean ($\ell^2$) norm; more generally any function satisfying positivity, homogeneity, and the triangle inequality. (Ch. 18)
Normal equations. $A^{\mathsf{T}}A\,\hat{\mathbf{x}} = A^{\mathsf{T}}\mathbf{b}$ — the linear system whose solution is the least squares answer. (Ch. 17, 19)
Null space $N(A)$. The set of all $\mathbf{x}$ with $A\mathbf{x} = \mathbf{0}$; the directions the transformation collapses, and one of the four fundamental subspaces. (Ch. 13)
Nullity. The dimension of the null space; by the rank–nullity theorem, $\operatorname{rank}(A) + \text{nullity}(A) = n$, where $n$ is the number of columns of $A$. (Ch. 14)
O
Operator (spectral) norm $\lVert A\rVert_2$. The largest factor by which $A$ stretches any unit vector, equal to the top singular value $\sigma_1$. (Ch. 30)
Orthogonal complement. Given a subspace $W$, the set of all vectors orthogonal to every vector in $W$; the row space and null space are complements, as are the column and left-null spaces. (Ch. 14, 19)
Orthogonal matrix. A real square matrix with orthonormal columns, $Q^{\mathsf{T}}Q = I$; it preserves lengths and angles (an isometry) — a rotation or reflection. (Ch. 21)
Orthogonal projection. The closest point in a subspace to a given vector, found by dropping a perpendicular; the error is orthogonal to the subspace. (Ch. 19)
Orthonormal. A set of vectors that are mutually orthogonal and each of unit length — the most convenient kind of basis. (Ch. 18, 20)
P
PageRank. Google's original ranking algorithm: the stationary distribution (dominant eigenvector) of a column-stochastic "random surfer" matrix with a damping factor. (Ch. 29)
Parametric solution. A description of an infinite solution set as a particular solution plus free combinations of null space vectors. (Ch. 4, 13)
Permutation matrix. An identity matrix with rows reordered; it records the row swaps in PLU decomposition and is orthogonal. (Ch. 10)
Perron–Frobenius theorem. The guarantee that a positive (or suitably connected) matrix has a unique largest real eigenvalue with a positive eigenvector — the mathematical foundation of PageRank. (Ch. 29)
Pivot. The first nonzero entry in a row of an echelon form; its column is a pivot column and corresponds to a leading (non-free) variable. (Ch. 4)
Polar decomposition. The factorization $A = QP$ of a square matrix into an orthogonal $Q$ (rotation) times a symmetric positive semidefinite $P$ (stretch) — the matrix analogue of $re^{i\theta}$. (Ch. 30)
Positive definite. A symmetric matrix $A$ with $\mathbf{x}^{\mathsf{T}}A\mathbf{x} > 0$ for all nonzero $\mathbf{x}$ — equivalently, all eigenvalues positive; it defines a true "energy" or distance. (Ch. 28)
Principal component analysis (PCA). Finding the orthogonal directions of greatest variance in data by taking the eigenvectors of the covariance matrix (or the right singular vectors of the centered data). (Ch. 32)
Projection matrix. The matrix $P$ that performs an orthogonal projection; it is symmetric and idempotent ($P^2 = P$). (Ch. 19)
Pseudoinverse $A^{+}$. The SVD-built generalization of the inverse, $A^{+} = V\Sigma^{+}U^{\mathsf{T}}$, that gives the minimum-norm least squares solution $A^{+}\mathbf{b}$ even for non-square or rank-deficient $A$. (Ch. 30)
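The formula in the Pseudoinverse entry can be assembled by hand from the SVD: invert the nonzero singular values and leave the zeros alone. This is a sketch on an invented rank-1 matrix; the cutoff `tol` is an illustrative choice, and in practice `np.linalg.pinv` handles the thresholding for you.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])           # rank 1, so A has no ordinary inverse

U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = 1e-12 * s.max()                # illustrative cutoff for "zero" singular values
s_plus = np.array([1.0 / x if x > tol else 0.0 for x in s])
A_plus = Vt.T @ np.diag(s_plus) @ U.T
```

For this rank-1 example the only nonzero singular value is 5, so `A_plus` works out to `A.T / 25`.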
Q
QR decomposition. The factorization $A = QR$ into a $Q$ with orthonormal columns (an orthogonal matrix when $A$ is square) and an upper-triangular $R$, produced by Gram–Schmidt; the workhorse of stable least squares and the QR algorithm for eigenvalues. (Ch. 20)
Quadratic form. A function $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ with $A$ symmetric; its shape (bowl, dome, or saddle) is read off the definiteness of $A$. (Ch. 28)
Qubit. A quantum bit, modeled as a unit vector in $\mathbb{C}^2$; unitary matrices are its gates, and Hermitian matrices its measurements. (Ch. 1, 21, 27, 34)
R
Rank. The number of pivots of a matrix — equivalently the dimension of its column space (and row space); it counts the genuinely independent directions in the transformation. (Ch. 4, 14)
Rank–nullity theorem. For an $m\times n$ matrix, $\operatorname{rank}(A) + \text{nullity}(A) = n$ — the directions either survive (rank) or collapse (nullity). (Ch. 14)
Reduced row echelon form (RREF). The unique fully simplified echelon form: leading $1$'s, with zeros above and below each pivot. (Ch. 4)
Row space $C(A^{\mathsf{T}})$. The span of the rows of $A$; it equals the column space of the transpose and is the orthogonal complement of the null space. (Ch. 14)
S
Scalar. A single number (an element of the field) used to stretch vectors via scalar multiplication. (Ch. 2, 5)
Similarity. The relation $B = P^{-1}AP$ between matrices that represent the same transformation in different bases; similar matrices share eigenvalues, determinant, and trace. (Ch. 16)
Singular matrix. A square matrix that is not invertible: $\det(A) = 0$, it collapses space, and $A\mathbf{x} = \mathbf{b}$ fails to have a unique solution. (Ch. 9, 11)
Singular value. A nonnegative number $\sigma_i$ on the diagonal of $\Sigma$ in the SVD; the amount of stretch along the $i$-th principal axis, equal to $\sqrt{\lambda_i(A^{\mathsf{T}}A)}$. (Ch. 30)
Singular value decomposition (SVD). The factorization $A = U\Sigma V^{\mathsf{T}}$ that every matrix has: rotate ($V^{\mathsf{T}}$), stretch along axes ($\Sigma$), rotate ($U$). The most useful factorization in applied mathematics. (Ch. 30)
Span. The set of all linear combinations of a collection of vectors — the smallest subspace containing them. (Ch. 6)
Spectral theorem. The result that a real symmetric matrix is orthogonally diagonalizable: $A = Q\Lambda Q^{\mathsf{T}}$ with real eigenvalues and an orthonormal eigenbasis. A complex Hermitian matrix is unitarily diagonalizable in the same way, $A = U\Lambda U^{*}$. (Ch. 27)
Stationary distribution. The steady-state probability vector of a Markov chain, the eigenvector for $\lambda = 1$ of a column-stochastic matrix. (Ch. 29)
Subspace. A subset of a vector space that is itself a vector space — closed under addition and scalar multiplication, and containing $\mathbf{0}$. (Ch. 6)
Superposition. The principle that a linear system's response to a sum of inputs is the sum of the responses; the defining feature of linearity. (Ch. 1)
Symmetric matrix. A matrix equal to its own transpose, $A = A^{\mathsf{T}}$; the best-behaved matrices, governed by the spectral theorem. (Ch. 8, 27)
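The Spectral theorem entry is easy to confirm on a small symmetric matrix. This sketch uses an invented 2-by-2 example with eigenvalues 1 and 3, and relies on `np.linalg.eigh`, NumPy's eigen-solver for symmetric and Hermitian input.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # symmetric; eigenvalues 1 and 3

eigvals, Q = np.linalg.eigh(A)        # real eigenvalues in ascending order
rebuilt = Q @ np.diag(eigvals) @ Q.T  # Q Lambda Q^T
```

The columns of `Q` are orthonormal, so its inverse is just its transpose, and `rebuilt` matches `A`.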
T
Trace $\operatorname{tr}(A)$. The sum of the diagonal entries, equal to the sum of the eigenvalues; invariant under similarity. (Ch. 16, 23)
Transpose $A^{\mathsf{T}}$. The matrix obtained by flipping rows and columns, $a_{ij} \mapsto a_{ji}$; it swaps the column and row spaces. (Ch. 7, 8)
Triangle inequality. $\lVert\mathbf{u}+\mathbf{v}\rVert \le \lVert\mathbf{u}\rVert + \lVert\mathbf{v}\rVert$; a straight path is never longer than a detour, and a defining property of any norm. (Ch. 18)
U
Unit vector. A vector of length $1$; dividing a nonzero vector by its norm (normalization) produces one pointing the same way. (Ch. 18)
Unitary matrix. The complex analogue of an orthogonal matrix, $U^{*}U = I$; it preserves the complex inner product and is the form of a quantum gate. (Ch. 21)
V
Vector. Both an arrow with length and direction and a list of numbers (its components); the bridge between the two pictures is the whole game. (Ch. 2)
Vector space. A set closed under vector addition and scalar multiplication satisfying eight axioms; $\mathbb{R}^n$, polynomials, matrices, and functions are all examples. (Ch. 5)
Z
Zero vector $\mathbf{0}$. The unique vector that is the additive identity, $\mathbf{v} + \mathbf{0} = \mathbf{v}$; it lives in every subspace and is the one vector that is never an eigenvector. (Ch. 2, 5)