Appendix A — Prerequisite Math Review
Who this is for. Linear algebra does not assume much, but it assumes that what it does assume is fluent. This appendix is the short list of things this book quietly takes for granted, with just enough review to knock off the rust and a pointer to the chapter where each one first earns its keep. If you can read these pages nodding along, you are ready. If a section makes you pause, spend twenty minutes here now — it will save you hours later, because a shaky prerequisite does not announce itself; it just makes a later chapter feel mysteriously hard.
You do not need calculus to start this book. (A handful of optional Math-Major Sidebars and the matrix-exponential chapter touch derivatives, and we flag them.) You do not need to have seen vectors or matrices before — that is what Chapters 1 and 2 are for. What you need is comfort with high-school algebra, a picture of the coordinate plane, the bare idea of a complex number, and a willingness to read an argument and ask "is each step true?" We take those one at a time.
A.1 Basic algebra: the moves you will make a thousand times
Linear algebra is, at bottom, the algebra of linear expressions — sums of variables each multiplied by a constant, like $3x_1 + 2x_2 - x_3$. You will rearrange these constantly, so the elementary moves must be automatic.
- Distributing and collecting. $a(x + y) = ax + ay$, and $ax + bx = (a+b)x$. When we factor a common matrix out of a sum, $A\mathbf{u} + A\mathbf{v} = A(\mathbf{u}+\mathbf{v})$, it is exactly this rule with a matrix in the role of $a$. That single line is the definition of linearity (Chapter 1).
- Solving a linear equation in one unknown. From $ax + b = c$ you isolate $x = (c - b)/a$, valid precisely when $a \ne 0$. The whole machinery of Chapter 4 (Gaussian elimination) is this move, performed in a disciplined order on many equations at once. The caveat "valid when $a \ne 0$" is not pedantry: the case $a = 0$ is exactly where a system has no solution or infinitely many, the trichotomy of Chapter 3.
- The quadratic formula. The roots of $ax^2 + bx + c = 0$ are $$ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}, \qquad a \ne 0. $$ You will use this verbatim in Chapter 24 to find the two eigenvalues of a $2\times 2$ matrix from its characteristic polynomial $\lambda^2 - \operatorname{tr}(A)\,\lambda + \det(A) = 0$. The quantity under the root, the discriminant $b^2 - 4ac$, decides everything: positive gives two real roots, zero gives a repeated root, negative gives a complex-conjugate pair. In Chapter 26 those three cases become two real eigendirections, a defective repeated direction, and a rotation in disguise — same arithmetic, geometric payoff.
- Functions and function notation. A function $f$ takes an input and returns an output; we write $f(x)$. The book's central metaphor — a matrix is a function that transforms space, written $T(\mathbf{x}) = A\mathbf{x}$ — leans entirely on your being comfortable that $f(\text{input}) = \text{output}$, and that composing functions ("do $f$, then $g$") is itself a function, $g(f(x))$. Matrix multiplication is composition (Chapter 8); if function composition feels natural, matrix multiplication will too.
Common Pitfall — $\sqrt{a^2} = |a|$, not $a$. The square root symbol returns the nonnegative root. This bites the moment lengths appear: a norm $\lVert \mathbf{v}\rVert = \sqrt{v_1^2 + \cdots + v_n^2}$ is always $\ge 0$ by construction (Chapter 18). A negative "length" is always an arithmetic slip.
A.2 Coordinate and analytic geometry
This is a geometry-first book, so the coordinate plane is home base. You should be able to flip between an equation and the picture it draws without friction.
Points and the plane. A point in the plane is an ordered pair $(x, y)$; in space, a triple $(x, y, z)$. In Chapter 2 the very same pair becomes a vector $\begin{bmatrix} x \\ y \end{bmatrix}$ — the arrow from the origin to that point. Holding "point" and "arrow" as two readings of the same two numbers is the threshold idea of Chapter 2, so it pays to be at ease with coordinates first.
Lines. The line $y = mx + b$ has slope $m$ and intercept $b$. But linear algebra prefers the symmetric form $a_1 x + a_2 y = c$, because that is what one row of a linear system looks like. A system of two such equations is two lines, and Chapter 3 reads its solution geometrically: the lines cross at one point (one solution), are parallel (no solution), or are the same line (infinitely many). Three equations in three unknowns become three planes, and the same three outcomes return as planes meeting in a point, in a line, or not at all.
The Pythagorean theorem and distance. In the right triangle with legs $a, b$ and hypotenuse $c$, $a^2 + b^2 = c^2$. From it, the distance between $(x_1, y_1)$ and $(x_2, y_2)$ is $$ d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}. $$ This is the seed of the norm and of all of Part IV (Orthogonality). The length of a vector (Chapter 2), the distance between data points, the "closest point" that least squares finds (Chapters 17 and 19) — all are Pythagoras, generalized to $n$ dimensions, where it reads $\lVert \mathbf{v}\rVert = \sqrt{v_1^2 + \cdots + v_n^2}$.
Geometric Intuition — Almost every formula in this book is the Pythagorean theorem or the distributive law wearing a costume. When a formula looks unfamiliar, ask "is this measuring a length (Pythagoras) or rearranging a linear combination (distribution)?" Usually it is one of the two.
A.3 Trigonometry: just enough for rotations
You need a little trig, and almost all of it is for one purpose: describing rotations (Chapters 21 and 26) and angles between vectors (Chapter 18).
- Right-triangle ratios. For an angle $\theta$ in a right triangle, $\cos\theta = \text{adjacent}/\text{hypotenuse}$ and $\sin\theta = \text{opposite}/\text{hypotenuse}$. On the unit circle, the point at angle $\theta$ from the positive $x$-axis is exactly $(\cos\theta, \sin\theta)$.
- The values worth memorizing. $\cos 0 = 1,\ \sin 0 = 0$; $\cos 90^\circ = 0,\ \sin 90^\circ = 1$; and the $45^\circ$ pair $\cos 45^\circ = \sin 45^\circ = \tfrac{\sqrt 2}{2} \approx 0.707$. These appear constantly in worked rotation examples.
- The Pythagorean identity. $\cos^2\theta + \sin^2\theta = 1$. Geometrically: the point $(\cos\theta, \sin\theta)$ sits on the unit circle, distance $1$ from the origin. This is why the rotation matrix preserves length, which is why it is an orthogonal matrix (Chapter 21).
- The rotation matrix. The map that rotates the plane counterclockwise by $\theta$ is
$$
R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.
$$
You will see this derived from scratch in Chapter 21 (it is just "where do the basis arrows $\mathbf{e}_1$ and $\mathbf{e}_2$ land?"), so you do not need to memorize it now — but recognizing the sines and cosines when they appear will let you read the geometry at a glance. Radians versus degrees matters in code: numpy's
np.cos,np.sinexpect radians ($180^\circ = \pi$), a gotcha we revisit in Appendix C.
Real-World Application — Every time a game engine turns your character, a phone rotates a photo to portrait, or a robot arm swings to a new heading, a rotation matrix like $R_\theta$ is doing the work (Chapter 12, Computer Graphics). The trig is small; its reach is enormous.
A.4 Complex numbers
Real eigenvalues are the friendly case. But a perfectly innocent real matrix — a rotation, say — can have complex eigenvalues, and Chapter 26 is built around them. You do not need deep complex analysis; you need the four basics below.
- What they are. A complex number is $z = a + bi$, where $a$ is the real part, $b$ is the imaginary part, and $i$ is defined by $i^2 = -1$. The reals sit inside the complex numbers as the case $b = 0$.
- Arithmetic. Add and subtract part-by-part: $(a+bi) + (c+di) = (a+c) + (b+d)i$. Multiply by distributing and using $i^2 = -1$: $(a+bi)(c+di) = (ac - bd) + (ad + bc)i$.
- Conjugate and modulus. The conjugate of $z = a + bi$ is $\bar z = a - bi$ (flip the sign of the imaginary part). Its modulus (size) is $|z| = \sqrt{a^2 + b^2}$, and a clean identity ties them together: $z\bar z = a^2 + b^2 = |z|^2$. The conjugate is why the complex inner product needs a conjugate-transpose $A^{*}$ rather than a plain transpose $A^{\mathsf{T}}$ (Chapters 27 and 34): conjugating is exactly what keeps a "length squared" real and nonnegative when the entries are complex.
- Why a real matrix can have complex eigenvalues. Eigenvalues solve the characteristic polynomial (Chapter 24). A real quadratic with negative discriminant has no real roots but always has a complex-conjugate pair — and a rotation matrix $R_\theta$ is the standard example, with eigenvalues $\cos\theta \pm i\sin\theta$. Geometrically there is no real direction the rotation leaves fixed (it turns everything), so the invariant structure can only live in the complex numbers. Chapter 26 turns this into a clean picture: complex eigenvalues are rotation-and-scaling.
The Key Insight — Complex numbers are not a detour into abstraction; they are the language in which "a real matrix that rotates" finally tells you its eigenvalues. When the discriminant goes negative, the algebra is reporting a rotation. Chapter 26 makes that translation precise.
A note for later: the symbols $\mathbb{R}^n$ (real $n$-space) and $\mathbb{C}^n$ (complex $n$-space) name the two settings. Most of the book lives in $\mathbb{R}^n$; complex space appears for eigenvalues (Chapter 26), the spectral theorem's Hermitian case (Chapter 27), and the quantum-mechanics applications (the qubit lives in $\mathbb{C}^2$).
A.5 Summation notation
Linear algebra is full of sums, and writing them with dots ($v_1 + v_2 + \cdots + v_n$) gets unwieldy fast. The compact form uses the Greek capital sigma: $$ \sum_{i=1}^{n} a_i \;=\; a_1 + a_2 + \cdots + a_n. $$ Read it aloud as "the sum, as $i$ runs from $1$ to $n$, of $a_i$." The letter $i$ is a dummy index — it is internal bookkeeping, and renaming it to $j$ or $k$ changes nothing.
Three patterns recur so often they are worth recognizing on sight:
- Dot product (Chapter 18): $\mathbf{u}\cdot\mathbf{v} = \sum_{i=1}^{n} u_i v_i$. Multiply matching components, add them up.
- Matrix–vector product, row $i$: $(A\mathbf{x})_i = \sum_{j=1}^{n} a_{ij}\, x_j$. Each output component is one dot product (a row of $A$ with $\mathbf{x}$).
- Matrix product, entry $(i,j)$ (Chapter 8): $(AB)_{ij} = \sum_{k=1}^{n} a_{ik}\, b_{kj}$. Row $i$ of $A$ dotted with column $j$ of $B$.
Common Pitfall — Sums commute and distribute, but you must keep the index straight. $\sum_i (a_i + b_i) = \sum_i a_i + \sum_i b_i$ is fine; pulling a constant out is fine, $\sum_i c\, a_i = c\sum_i a_i$; but you cannot pull out something that depends on the index $i$. When in doubt, expand the first two and last terms with dots and check.
A double subscript like $a_{ij}$ always means "row $i$, column $j$" in this book (the LOCKED convention; see the style bible's notation table). Mathematics indexes from 1, so the top-left entry is $a_{11}$; numpy indexes from 0, so that same entry is A[0, 0]. We flag this clash every time it bites, and Appendix C makes peace with it once and for all.
A.6 What "proof" means
A handful of chapters — and all the Math-Major Sidebars — prove things. If your last exposure to proof was geometry class, here is the short orientation; Appendix E is the full treatment.
A definition says precisely what a word means (e.g., a set of vectors is linearly independent if the only way to combine them to the zero vector is with all-zero coefficients). A theorem is a claim that is true because it can be derived from definitions and earlier theorems by valid steps. A proof is that derivation, written so a careful reader can check each step. The contract of a proof is simple and strict: every step must follow from something already established, and "it looks true" is never a step.
Three habits will carry you through every proof in this book:
- Know exactly what you may assume (the hypotheses) and exactly what you must reach (the conclusion). Most stuck proofs are really "I forgot to use one of the hypotheses." Theorems in this book always state their conditions — "for an invertible $A$", "for a symmetric real $A$" — and those conditions are the fuel.
- Unfold the definitions. Nine times out of ten, the first real move is to replace a word ("orthogonal", "subspace", "basis") with the literal statement it abbreviates, and then just compute.
- Read each step asking "why is this allowed?" If you can answer — "by the distributive law", "because $A$ is invertible so $A^{-1}$ exists", "by the definition of span" — the step is earned.
Warning
— A worked numerical example is evidence, not a proof. Verifying that two specific matrices satisfy $(AB)^{\mathsf{T}} = B^{\mathsf{T}}A^{\mathsf{T}}$ checks one case; the proof must show it for all conforming $A$ and $B$. This book uses numpy to demonstrate and verify results, never to substitute for the argument that they always hold (style bible §12). Keep the two roles distinct: code builds confidence and catches mistakes; proof delivers certainty.
A.7 A two-minute self-check
If you can do these without notes, start Chapter 1 with confidence. If one trips you, the section above is your warm-up.
- Solve $5x - 3 = 12$ for $x$. (A.1)
- Find the roots of $x^2 - 5x + 6 = 0$. What is the discriminant, and what does its sign tell you? (A.1)
- Write the equation of the line through the points $(0, 1)$ and $(2, 5)$, then rewrite it in the form $a_1 x + a_2 y = c$. (A.2)
- Compute the distance between $(1, 2)$ and $(4, 6)$. (A.2)
- Evaluate $\cos 45^\circ$ and $\sin 90^\circ$, and write the matrix that rotates the plane by $90^\circ$. (A.3)
- Multiply $(2 + 3i)(1 - i)$ and give its real and imaginary parts. Then compute $|2 + 3i|$. (A.4)
- Expand $\sum_{i=1}^{3} (2i - 1)$ and find its value. (A.5)
- In one sentence, explain why checking a formula on a single $2\times 2$ example is not a proof. (A.6)
Answers
1. $x = 3$. 2. $x = 2$ or $x = 3$; discriminant $25 - 24 = 1 > 0$, so two distinct real roots. 3. Slope $= (5-1)/(2-0) = 2$, so $y = 2x + 1$, i.e. $-2x + y = 1$ (equivalently $2x - y = -1$). 4. $\sqrt{(4-1)^2 + (6-2)^2} = \sqrt{9 + 16} = \sqrt{25} = 5$. 5. $\cos 45^\circ = \tfrac{\sqrt2}{2}\approx 0.707$; $\sin 90^\circ = 1$; the $90^\circ$ rotation is $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. 6. $(2+3i)(1-i) = 2 - 2i + 3i - 3i^2 = 2 + i + 3 = 5 + i$, so real part $5$, imaginary part $1$; $|2+3i| = \sqrt{4+9} = \sqrt{13}$. 7. $(2\cdot1 - 1) + (2\cdot2 - 1) + (2\cdot3 - 1) = 1 + 3 + 5 = 9$. 8. One example confirms the formula in that single case; a theorem claims it for *every* matrix of the right shape, which only a general argument can establish.Where each prerequisite first pays off
| Prerequisite | First needed | Why |
|---|---|---|
| Distributive law, linear expressions | Ch. 1, 4 | Definition of linearity; Gaussian elimination |
| Quadratic formula, discriminant | Ch. 24, 26 | Eigenvalues of $2\times2$; real vs. complex spectrum |
| Function notation, composition | Ch. 7, 8 | "Matrix is a function"; multiplication as composition |
| Coordinate geometry, lines/planes | Ch. 2, 3 | Vectors as arrows; geometry of solution sets |
| Pythagoras / distance | Ch. 2, 18, 19 | Norm; angles; least-squares "closest point" |
| Trigonometry (sin, cos, identity) | Ch. 21, 26 | Rotation and orthogonal matrices |
| Complex numbers, conjugate, modulus | Ch. 26, 27, 34 | Complex eigenvalues; Hermitian case; the qubit |
| Summation notation | Ch. 8, 18 | Matrix products and dot products in compact form |
| Reading and writing proofs | Ch. 5 onward; all A-track sidebars | Definitions $\to$ theorems $\to$ proofs (see Appendix E) |
You are equipped. Turn to Chapter 1, and let the pictures lead.