45 min read

> Learning paths. Math majors — read everything, but treat this chapter as a chance to see how the proofs you worked through become guarantees in running code; the validation harnesses are where theory earns its keep. CS / Data Science — this is...

Prerequisites

  • chapter-30-singular-value-decomposition
  • chapter-32-principal-component-analysis

Learning Objectives

  • Assemble the from-scratch toolkit/ as one coherent library and trace how each module depends on the ones before it.
  • Choose one of five applied linear algebra project tracks and design it end-to-end at the level of modules, data flow, and validation.
  • Identify exactly which toolkit modules and which theorems each capstone track relies on.
  • Validate a from-scratch result against numpy and interpret any discrepancy through conditioning and stability (Chapter 38).
  • Present a linear algebra project so that the geometry, the algebra, and the computation are all visible and defensible.
  • Connect a finished project back to the book's six recurring themes.

Capstone: The Linear Algebra of Your Chosen Application — Build, Analyze, Present

Learning paths. Math majors — read everything, but treat this chapter as a chance to see how the proofs you worked through become guarantees in running code; the validation harnesses are where theory earns its keep. CS / Data Science — this is your chapter. Pick a track, follow the design, wire your toolkit modules together, and ship something that runs. Physics / Engineering — the 3D rendering and PCA tracks are closest to home, but read all five: the same factorization keeps reappearing, which is the whole point.

Thirty-eight chapters ago you met a single arrow in the plane. Then you watched a matrix bend that arrow, learned to read the four subspaces of any map, dropped a perpendicular to find the closest point, hunted down the directions a matrix refuses to rotate, and finally factored every matrix into a rotation, a stretch, and another rotation. Along the way you did something quieter and, in the end, more important: you built the mathematics yourself. Every chapter left a pure-Python function in toolkit/ — no numpy in the implementation, numpy only to check your answer — and now those functions are going to do real work.

This is the capstone. It has two halves. First we assemble the toolkit: we walk the whole toolkit/ directory in dependency order and watch the pieces lock together, so you can see that the library you built is not a pile of exercises but a coherent stack where each layer rests on the one below it. Then we open five project tracks — SVD image compression, a matrix-factorization recommender, a PageRank simulation, PCA on a real dataset, and a 3D rendering pipeline — and design each one end-to-end: the linear algebra it uses, the toolkit modules it calls, how to validate the result, and how to present what you found. You will choose one track (or, if you are ambitious, more) and carry it from blank file to a runnable demo with a results section you would be proud to show.

If you want a bank of linear algebra project ideas to choose from, this chapter is exactly that — five complete, vetted designs, each a genuine applied linear algebra project rather than a toy. But the deeper goal is the threshold idea of the whole book, stated one last time: every applied system here is a short composition of operations you already implemented by hand. The application is just a choice of which matrix, and what the rows and columns mean. Once you internalize that, linear algebra stops being a course and becomes a lens.

39.0.1 FAQ: How does the capstone close the arcs the book has been building?

Four threads have run through this book from the first chapter, and the capstone is where each one pays off in code you wrote. The 2D transformation visualizer that opened Chapter 1 — the unit square bending under a 2×2 matrix — is the literal ancestor of Track E's rendering pipeline, where the same idea (a matrix moves a shape) scales up to a cube in 3D and a perspective projection onto your screen. PageRank, seeded informally back in Chapter 3 and crowned in Chapter 29, becomes a thing you run in Track C; the dominant eigenvector you proved exists is now a ranking you compute. SVD image compression, teased in Chapter 30 and delivered in Chapter 31, is Track A, the book's signature "wow." And the quantum qubit you have followed since Chapters 1 and 5 lives in the same orthogonal-matrix, inner-product-space machinery that Track E's rotations and Track B's dot products use — a reminder that the lens you ground here points at physics too. The capstone is not new material; it is the moment all four arcs land at once.

That convergence is also the bridge from this book to the next. The five tracks here are scoped to be finished in a sitting, but each one opens onto a whole field. If a track lights you up, the natural next step is a portfolio of larger efforts — the kind catalogued in these data science projects, where the same SVD, PCA, and matrix-factorization ideas drive real pipelines at scale. And if you took Track B (the recommender) or Track D (PCA) and found yourself wondering how these matrices end up inside a neural network, the short answer — every layer is a matrix multiply followed by a nonlinearity — is unpacked in an accessible treatment of how models work. Linear algebra is the floor those buildings stand on; the capstone is where you first feel the floor hold your weight.

39.1 What does it mean to "assemble" the toolkit?

Up to now you have used your toolkit modules one at a time, usually in the chapter that introduced them. Assembling the toolkit means treating toolkit/ as a single library and importing across modules freely — projection.py already calls matmul from matrices.py and gaussian_elimination from linear_systems.py, and your capstone demo will reach across all of them.

Geometric Intuition — Picture the toolkit as a tower. At the foundation is a vector: a list of numbers you can add and scale. One floor up, a matrix is a function that moves vectors around — and matrix multiplication is just composing those functions. Above that sit the things you do to a matrix: solve Ax = b, invert it, factor it, measure its volume, find its invariant directions, factor it into rotate–stretch–rotate. The capstone is the roof: a single application that draws on every floor at once. You cannot build the roof without the floors, which is why we built them first.

The dependency order is not arbitrary; it is the order the book introduced them, and it is the order in which each one needs the ones below.

The Key Insight — A linear algebra library is a dependency graph, not a list. svd_from_scratch needs an eigensolver; the eigensolver needs matrix multiplication; matrix multiplication is composition of the vector operations at the very bottom. Read the graph from the bottom up and the whole subject reassembles itself in front of you.

39.1.1 FAQ: In what order do the toolkit modules depend on each other?

Here is the assembly tour, bottom to top. For each module we name what it gives you and which chapter built it; the arrow reads "is used by."

toolkit/vectors.py (Chapter 2, extended in Chapter 18). The floor. add, scale, dot, norm, angle, cosine_similarity. Two operations — addition and scalar multiplication — define a vector space; everything above is built from them. The dot product is the quiet hero: length is sqrt(dot(v, v)), the angle between vectors comes from the dot product, and cosine similarity is the workhorse of every recommender and search engine.

toolkit/matrices.py (Chapters 7–8). apply (matrix times vector, as a weighted sum of the columns), transpose, add, scale, identity, and matmul (matrix times matrix, as the composition "do B, then A"). This is the single most-reused module: matmul and transpose appear in projection, in QR, in PCA, everywhere. Notice the design choice you made back in Chapter 7 — apply loops over columns first, so the geometry (a result built from the columns of A) is literally what the code does.

toolkit/linear_systems.py (Chapter 4). row_reduce (to RREF), gaussian_elimination, back_substitute. The backbone. Every "solve" downstream routes through here — including least squares and the projection matrix.

toolkit/inverse.py (Chapter 9). inverse via Gauss–Jordan on [A | I]. Built directly on row reduction. (You learned early — Chapter 10 — to avoid explicit inverses in production code; you build one anyway because understanding the inverse is non-negotiable, and because the Gauss–Jordan construction is the proof that an inverse exists exactly when there is a pivot in every column.)

toolkit/lu.py (Chapter 10). lu_decompose, plu, solve_lu. Factorization A = LU (or PA = LU with pivoting) so you can solve Ax = b for many right-hand sides without re-eliminating each time. This is your first encounter with the recurring idea that the right factorization is the whole game.

toolkit/determinant.py (Chapter 11). det_cofactor (the recursive definition) and det_via_lu (the efficient route: the determinant is the product of the pivots, up to the sign of the permutation). The determinant is the signed volume-scaling factor of the transformation — and it is exactly zero when the matrix collapses space, i.e. when it is singular.

toolkit/projection.py (Chapters 17, 19). normal_equations, least_squares, project_onto, projection_matrix. This module reaches down — it imports matmul/transpose and gaussian_elimination — and that is the first place you really feel the stack working as one thing. Least squares solves Aᵀ A x = Aᵀ b; projection drops the perpendicular onto the column space C(A).

toolkit/gram_schmidt.py (Chapter 20). gram_schmidt, qr_decompose. Turns any independent set into an orthonormal one, and factors A = QR. This is the numerically stable cure for the near-dependent columns that make the normal equations of projection.py fragile (Chapter 38).

toolkit/eigen.py (Chapters 23, 24, 29). trace, det_2x2, eig_2x2 (the 2×2 characteristic quadratic), power_iteration (the dominant eigenpair by repeated multiplication), and qr_algorithm (the full spectrum by iteration). Crucial honesty, baked into the module's own docstring: there is no general formula for eigenvalues of a 5×5 or larger matrix (Abel–Ruffini), so from-scratch eigenfinding must be iterative beyond 2×2. This is not a limitation of your code; it is a theorem about polynomials.

toolkit/svd.py (Chapter 30). svd_from_scratch, computed via the eigendecomposition of Aᵀ A: the eigenvalues of Aᵀ A are the squares of the singular values, and its eigenvectors are the right singular vectors V. So the SVD sits directly on top of eigen.py, which sits on matrices.py, which sits on vectors.py. The tower is complete.

toolkit/pca.py (Chapter 32). pca: center the data, form the covariance matrix, and eigendecompose it (or, equivalently and more stably, take the SVD of the centered data). PCA is the SVD wearing a statistician's hat.

toolkit/capstone/ (this chapter). One runnable demo per track: image_compression, recommender, pagerank, pca_demo, render3d. The capstone demos are the only modules allowed to use numpy directly — their job is to integrate at a useful scale, and in the spirit of the project you can swap in your from-scratch svd_from_scratch or pca and watch the same numbers come out.

39.1.2 FAQ: Can I see the stack actually compose, end to end?

Yes — and watching one computation fall through every floor is the fastest way to feel that the toolkit is one object. Take the projection example you have met since Chapter 19: project b = (6, 0, 0) onto the column space of the tall matrix A with columns (1,1,1) and (0,1,2). When you call project_onto(b, A), here is the cascade of toolkit calls it triggers, top to bottom:

# One high-level call -> a cascade down the whole tower.  (Chapters 7, 8, 4, 17, 19)
from toolkit.projection import project_onto
print(project_onto([6, 0, 0], [[1, 0], [1, 1], [1, 2]]))   # -> [5.0, 2.0, -1.0]
#   project_onto      (Ch 19) calls
#     least_squares   (Ch 17) calls
#       normal_equations -> matmul(transpose(A), A)  and  matmul(transpose(A), b-as-column)
#         transpose, matmul  (Ch 7-8) call apply, which is the weighted sum of columns (Ch 7)
#       gaussian_elimination (Ch 4) -> forward elim + back_substitute on the 2x2 normal system
#     matmul(A, x_hat)  rebuilds p = A x-hat back in R^3

That single line reaches through five chapters. least_squares forms the normal equations Aᵀ A x = Aᵀ b (Chapter 17), which needs transpose and matmul (Chapters 7–8), which are built from apply, the weighted-sum-of-columns matrix–vector product (Chapter 7). The 2×2 system is then solved by gaussian_elimination (Chapter 4). The answer, x̂ = [5, −3], is fed back through matmul(A, x̂) to land the projection p = (5, 2, −1) in ℝ³. Numpy confirms it: A @ np.linalg.inv(A.T @ A) @ A.T @ b is [5., 2., -1.]. Five floors, one call, one verified answer. Your capstone demo does the same thing on a bigger problem — it is high-level orchestration resting on the floors you built.

The Key Insight — You never "use linear algebra"; you compose a handful of primitives. The whole subject is add, scale, and dot at the bottom, apply and matmul (composition) one floor up, and everything else — solve, invert, factor, project, diagonalize, decompose — assembled from those. Internalize the cascade above and you can read any numerical library, and write your own.

Common Pitfall — Many readers assume the capstone re-implements everything from scratch a second time. It does not. The demos integrate — they import your toolkit and orchestrate it. Re-teaching is not the job here; composition is. If you find yourself rewriting matmul inside a capstone demo, stop: import it.

Real-World Application — This layered structure is exactly how production numerical libraries are organized. LAPACK (the Fortran library under numpy, MATLAB, and R) is built on BLAS, which provides the vector and matrix-multiply primitives at the bottom; the high-level routines (svd, eig, solve) call down through the same dependency graph you just walked. When you import numpy, you are standing on a tower with the same floor plan as yours — only the basement is hand-tuned assembly and yours is readable Python.

39.1.3 FAQ: How do I know my assembled toolkit is correct before I build on it?

Run every module's __main__ self-check, then run the verification tests. Each toolkit file ends with a if __name__ == "__main__": block that reproduces the chapter's worked example, and toolkit/tests/ checks the from-scratch result against numpy/scipy. A correct foundation is the precondition for trusting anything the capstone reports. Here is the one-liner you run before you start building:

# Smoke-test the whole stack: each module's self-check + the numpy verification.
# (Run from the repository root so "toolkit" is importable.)
import toolkit.vectors, toolkit.matrices, toolkit.linear_systems
import toolkit.projection, toolkit.eigen          # add inverse, lu, determinant, svd, pca as built
# Each module's docstring examples are runnable; the tests/ dir asserts they match numpy.
print("toolkit imports cleanly — foundation ready")

The principle here is one of the book's six themes: computation validates theory. Your hand-built least_squares([[1,0],[1,1],[1,2]], [6,0,0]) returns [5.0, -3.0]; numpy's np.linalg.lstsq returns the same [5., -3.]. When the two agree, you have evidence — not proof, but strong evidence — that both your code and your understanding are right. When they disagree, Chapter 38 tells you where to look: conditioning, the order of operations, catastrophic cancellation. Disagreement is information.

39.2 How should I choose a capstone track?

Five tracks, one common shape. Each one is: pick a matrix, decide what its rows and columns mean, apply one factorization or iteration, and read the answer back out. Choose by what you want to feel at the end.

Track The "wow" Core LA Headline toolkit modules Field
A. SVD image compression a blurry rank-10 image sharpening to indistinguishable at rank 200 low-rank approximation, Eckart–Young svd, matrices data science
B. Recommender (matrix factorization) predicting a rating no one ever entered low-rank factorization, dot products matrices, vectors, (svd) machine learning
C. PageRank a ranking that emerges from a graph with no scores at all dominant eigenvector, power iteration eigen, matrices CS / web
D. PCA on a real dataset 3 correlated features collapsing to 1 meaningful axis covariance eigenvectors, variance pca, svd, eigen statistics / data science
E. 3D rendering pipeline a cube spinning on screen, every frame a matrix product homogeneous coordinates, composition matrices graphics / games

Geometric Intuition — Squint and all five are the same picture: a matrix transforms a space, and you read meaning off the transformed space. SVD and PCA find the directions that matter most (the long axes of the data cloud). PageRank finds the one direction the transformation leaves fixed (its dominant eigenvector). The recommender finds a low-dimensional space in which users and items live close to the ratings they produce. Rendering composes transformations to move a 3D world onto a 2D screen. Same lens, five focal lengths.

Each track below follows the same four-part design: (1) the linear algebra, (2) the toolkit modules and data flow, (3) key code, (4) validation and presentation. All numbers shown match the accompanying numpy. Read the track you chose closely; skim the others, because the family resemblance is the lesson — the fourth recurring theme, that linear algebra is the most applied branch of pure mathematics, made concrete.

39.3 Track A — How do I compress an image with the SVD?

(1) The linear algebra. A grayscale image is a matrix A: each entry is a pixel's brightness. The SVD writes A = U Σ Vᵀ, and truncating it to the top k terms,

$$A_k = \sum_{i=1}^{k} \sigma_i \, \mathbf{u}_i \mathbf{v}_i^{\mathsf{T}},$$

gives — by the Eckart–Young theorem (Chapter 31) — the best possible rank-k approximation of A in the Frobenius (and spectral) norm. Each rank-one term σᵢ uᵢ vᵢᵀ is one "layer" of the image, ordered by importance: the first few layers carry the broad shapes, later ones add fine detail. Because real photographs have rapidly decaying singular values, a handful of layers reconstructs most of what your eye cares about.

The compression is exact in its bookkeeping: storing A_k costs k(m + n + 1) numbers (the k columns of U, the k rows of Vᵀ, and the k singular values) instead of mn. And the error is not a mystery — it is exactly the energy in the singular values you threw away:

$$\frac{\lVert A - A_k\rVert_F}{\lVert A\rVert_F} = \frac{\sqrt{\sum_{i>k}\sigma_i^2}}{\sqrt{\sum_{i}\sigma_i^2}}.$$

The Key Insight — Compression with the SVD is honest: every bit of fidelity you lose is accounted for by a singular value you chose to discard. There is no hand-waving. You can compute the error before you even reconstruct the image, just from the spectrum.

(2) Toolkit modules and data flow. The image is loaded as a matrix → svd_from_scratch (your toolkit/svd.py) or numpy's np.linalg.svd factors it → rank_k_approximation rebuilds A_k for each chosen k → you report storage, ratio, error, and energy. The reconstruction A_k = (U_k Σ_k) V_kᵀ is two matrix multiplications — matmul country.

(3) Key code. The heart of toolkit/capstone/image_compression.py (which you already have) is two short functions. Reconstruction:

# Rebuild A_k = U_k Σ_k V_k^T = sum_{i=1}^{k} σ_i u_i v_i^T  (Chapter 31).
def rank_k_approximation(U, s, Vt, k):
    return (U[:, :k] * s[:k]) @ Vt[:k]    # broadcasting scales the k kept columns by σ_i

And the per-rank report, which computes the storage and the two Eckart–Young invariants it then asserts:

# For one rank k: storage, compression ratio, relative error, energy captured.
m, n = A.shape
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # or your svd_from_scratch
A_k       = rank_k_approximation(U, s, Vt, k)
rel_error = np.linalg.norm(A - A_k, "fro") / np.linalg.norm(A, "fro")
ey_tail   = np.sqrt(np.sum(s[k:]**2)) / np.linalg.norm(A, "fro")  # discarded energy
assert np.isclose(rel_error, ey_tail)              # the two MUST agree, exactly
storage   = k * (m + n + 1)

(4) Validation and presentation. Run the demo. On the built-in 512×512 synthetic image (a deterministic power-law spectrum, so the numbers are reproducible) you get exactly:

rank k |   storage |    ratio |  % orig |  rel err |  energy
     5 |      5125 |   51.15x |    2.0% |   0.3304 |  0.8908
    10 |     10250 |   25.58x |    3.9% |   0.2382 |  0.9433
    50 |     51250 |    5.12x |   19.6% |   0.1042 |  0.9891
   200 |    205000 |    1.28x |   78.2% |   0.0430 |  0.9982

Two things to validate, both already asserted inside the demo: (i) the relative error decreases monotonically as k grows (more layers, closer image), and (ii) at every k the measured rel_error equals the Eckart–Young tail sqrt(Σ_{i>k} σᵢ²)/‖A‖_F to floating-point precision. If either assertion fails, your reconstruction is wrong — not your theory. To present: show the original and the rank-5, rank-50, rank-200 reconstructions side by side (the visual "wow"), plot the singular-value spectrum on a log scale to motivate why truncation works, and plot relative error versus k to make the accuracy-vs-storage trade-off explicit. The story writes itself: at rank 10 you store 3.9% of the data and capture 94.3% of the energy.

Check Your Understanding — At rank 50 the table reports storage 51,250 numbers against 262,144 for the full image, a compression ratio of about 5.12×, yet the energy captured is 0.9891. Why does the ratio fall so much faster than the fidelity as k grows — and what does that tell you about where the useful compression lives?

Answer Storage grows linearly in k (it is k(m+n+1)), so doubling k roughly doubles storage and halves the ratio. But for a heavy-tailed spectrum the captured energy grows fast at first and then flattens — the first few singular values are huge, later ones tiny — so going from k=5 to k=50 adds storage 10× but only nudges energy from 0.891 to 0.989. The useful compression therefore lives at small k, on the steep part of the energy curve; past the knee you are paying linearly for diminishing returns. That knee is exactly what the scree-style plot of relative error versus k makes visible, and choosing k near it is the whole art of low-rank approximation.

Computational Note — If you swap in your from-scratch svd_from_scratch (eigendecomposition of Aᵀ A), expect the singular values to match numpy to many digits, but individual singular vectors may differ in sign — and within a repeated singular value, in the choice of basis for that subspace. This does not change A_k at all (the signs cancel in σᵢ uᵢ vᵢᵀ), which is itself a small proof that the reconstruction, not the factorization, is what is uniquely determined. Chapter 30 flagged exactly this sign ambiguity.

39.3.1 FAQ: Why is image compression the canonical first capstone?

Because it makes an abstract theorem visible. Eckart–Young is a statement about norms; here it becomes a picture that sharpens before your eyes, with a number — the discarded energy — that you can read off the spectrum in advance. It is the book's signature application precisely because the geometry, the algebra, and the computation all show up in one image. If you have never felt the SVD "click," build this first.

39.4 Track B — How do I build a recommender with matrix factorization?

(1) The linear algebra. A recommender starts with a user–item rating matrix R that is mostly empty — most users have rated only a few items. The bet of collaborative filtering is that R is approximately low rank: a small number k of hidden "factors" (genre, tone, era…) explains most of the variation. So we approximate

$$R \;\approx\; \mu + U V^{\mathsf{T}},$$

where U is m×k (one taste vector uᵢ per user), V is n×k (one profile vector vⱼ per item), and μ is the global mean rating. The predicted rating of user i for item j is a dot product — straight out of Chapter 18:

$$\hat R_{ij} = \mu + \mathbf{u}_i \cdot \mathbf{v}_j.$$

Geometric Intuition — Each user and each item becomes a point in the same k-dimensional "taste space." A user rates an item highly when their vectors point the same way — when their dot product is large. The model never knew what "action movie" means; it discovered a small set of axes that make the observed ratings line up, then used those axes to predict the ratings it never saw. Forcing the rank down to a small k is exactly what stops it from memorizing and forces it to generalize.

(2) Toolkit modules and data flow. Ratings matrix R plus a mask (1 where a rating is known, 0 where not) → fit U, V by gradient descent on the observed entries only → predict fills the whole matrix μ + U Vᵀrecommend ranks a user's unrated items → rmse_on_known validates the fit. Under the hood every prediction is a dot product (vectors.dot) and the gradient steps are matrix products (matrices.matmul). There is also an SVD route (svd_complete) that makes the Chapter 30–31 connection explicit.

(3) Key code. The fitting loop from toolkit/capstone/recommender.py — note how the mask zeroes the error on unknown entries so blanks exert no pull:

# Factor R ≈ μ + U V^T using ONLY the known entries (mask == 1).  (Chapters 8, 33)
mu = R[mask > 0].mean()                  # baseline: mean of observed ratings
U  = rng.normal(0, 0.1, (m, k))          # latent user factors (taste)
V  = rng.normal(0, 0.1, (n, k))          # latent item factors (genre mix)
for _ in range(steps):
    E  = mask * (R - (mu + U @ V.T))     # masked error: blanks contribute zero
    U += lr * (E   @ V - reg * U)        # gradient step on the observed error
    V += lr * (E.T @ U - reg * V)

(4) Validation and presentation. On the Chapter 33 anchor (5 users × 4 movies; movies 0–1 are action, 2–3 are romance) the demo prints exactly:

global mean mu      = 3.111
train RMSE on known = 0.21
predict (user0, movie3) = 1.5   (action fan, romance film -> low)
predict (user3, movie1) = 2.98  (romance fan, action film -> moderate)
  k=1: train RMSE = 0.437
  k=2: train RMSE = 0.210
  k=3: train RMSE = 0.063

The two non-negotiable validations (both honored in the demo): (i) fit only the known entries — multiply the error by the mask, or you are cheating by training on blanks you invented; and (ii) check rmse_on_known before trusting any prediction — a model that cannot reproduce the ratings it was shown has no business predicting the ones it was not. The diagnostic table also shows the right qualitative trend: more factors fit the training data better (RMSE drops 0.437 → 0.210 → 0.063 as k goes 1 → 2 → 3). Notice why the predictions are believable, not just close: with k = 2, the two latent axes the model discovered behave like "action-ness" and "romance-ness," even though no one labeled them. User 0 rated the two action films high and a romance film low, so their taste vector u₀ points toward the action axis; movie 3 is a romance film, so v₃ points the other way; their dot product is small, and the prediction μ + u₀·v₃ lands at 1.5 — the model inferred that an action fan would dislike a romance film. That is collaborative filtering working exactly as the geometry predicts: alignment in taste space is a high rating. To present: show two predicted blanks and argue they are sensible — the action fan is predicted 1.5 on a romance film, the romance fan a moderate 2.98 on an action film — then warn honestly that monotonically decreasing training RMSE is the classic overfitting trap, which is why real systems hold out a test set and tune k (and the regularization reg) against it.

Warning

— Falling training error as k grows is not evidence of a better recommender. With enough factors the model memorizes the known entries and predicts garbage on the rest. The condition that makes matrix factorization generalize is that k stays small relative to the data — exactly the low-rank assumption. State this in your write-up; it is the single most important caveat in the track.

39.4.1 FAQ: Is a matrix-factorization recommender really "just" the SVD?

Almost, with one decisive difference. If R were full (no blanks), the best rank-k factorization would be the truncated SVD by Eckart–Young — and the demo's svd_complete route shows this. But R is mostly blank, and you must not fill the blanks with a guess and then factor (that bakes your guess into the answer). The gradient route fits only the observed entries, which is why production systems (the Netflix Prize winners among them) use it for sparse data. So: same low-rank idea as the SVD, fit differently because the data has holes.

39.5 Track C — How do I rank a network with PageRank?

(1) The linear algebra. Model a small web (or any directed network — citations, social follows, road maps) as a graph. Build the column-stochastic link matrix M: column j holds the outgoing links of page j, each entry 1/(out-degree) so the column sums to 1. The PageRank vector is the dominant eigenvector of the Google matrix

$$G = d\,M + \frac{1-d}{n}\,\mathbf{1}\mathbf{1}^{\mathsf{T}},$$

where d ≈ 0.85 is the damping factor and the second term is the "teleport" that makes the chain irreducible. By the Perron–Frobenius theorem (Chapter 29), a positive column-stochastic matrix has a unique dominant eigenvalue equal to 1 with a positive eigenvector — and power iteration (Chapter 23/29) converges to it: start from a uniform vector and multiply by G repeatedly until it stops changing.

Geometric Intuition — Imagine a web surfer clicking links at random, and occasionally (probability 1 − d) teleporting to a random page. PageRank is the long-run fraction of time the surfer spends on each page — the stationary distribution. Power iteration literally simulates this: each multiplication by G is one step of the random walk averaged over all surfers, and the distribution settles into the one direction G leaves fixed. That fixed direction is the eigenvector with eigenvalue 1.

(2) Toolkit modules and data flow. Adjacency list → build column-stochastic M → form G with damping → power_iteration (your toolkit/eigen.py) returns the dominant eigenpair → normalize to sum 1 → sort to get the ranking. Each iteration is one matrices.apply (matrix × vector). You validate by checking the dominant eigenvalue is 1 and by cross-checking the eigenvector against a direct eigensolver.

(3) Key code. The whole simulation is a dozen lines. Building G and running the iteration:

# Column-stochastic link matrix M, then the Google matrix G, then power iteration.
n = len(links)
M = np.zeros((n, n))
for src, dests in links.items():            # links: {page: [pages it links to]}
    for d_ in dests:
        M[d_, src] = 1.0 / len(dests)       # column j sums to 1
G = 0.85 * M + (0.15 / n) * np.ones((n, n)) # damping d = 0.85
v = np.ones(n) / n                          # start uniform
for _ in range(100):                        # power iteration (Chapter 29)
    v = G @ v
    v = v / v.sum()                         # renormalize to a probability vector

(4) Validation and presentation. On the canonical 4-node graph {0→[1,2], 1→[2], 2→[0], 3→[0,2]}, power iteration converges to exactly:

PageRank (d=0.85):   [0.3797, 0.1989, 0.3839, 0.0375]
dominant eigenvalue: 1.0000   (Perron–Frobenius)
ranking (best first): page 2, page 0, page 1, page 3

Validate three ways: (i) the converged vector should match the dominant eigenvector from a direct eigensolver (np.linalg.eig) — and it does, to four-plus digits; (ii) the dominant eigenvalue must be 1 (it is 1.0000), the Perron–Frobenius guarantee; (iii) the result is a probability vector — nonnegative and summing to 1. To present: draw the graph, then annotate each node with its PageRank, and tell the story the numbers tell — page 2 edges out page 0 not because it has the most incoming links but because it is pointed to by the high-ranking page 0 (and page 3), while page 3 ranks lowest because nothing of importance links to it. That is the recursive heart of PageRank: a link from an important page counts for more. Then show the convergence curve — how fast v stops moving — to make the power iteration visible.

Common Pitfall — Confusing in-degree with PageRank. A page with many incoming links from junk pages can rank below a page with one incoming link from an important page. PageRank is recursive — "important = linked-to by important things" — which is precisely why you need an eigenvector, not a link count. Stating this distinction is the difference between a report that understands the algorithm and one that merely runs it.

39.5.1 FAQ: Why use power iteration instead of solving for the eigenvector directly?

Scale and the structure of the problem. The real web matrix is billions of pages — far too large to form det(G − λI) = 0 or call a dense eigensolver. But M is extremely sparse (most pages link to only a handful of others), so one multiplication G @ v is cheap, and power iteration needs only multiplications. It also converges fast here because the gap between the dominant eigenvalue (1) and the next one is controlled by the damping d. This is the recurring lesson of Chapter 38 and the eigen.py docstring: beyond 2×2 there is no formula (Abel–Ruffini), so large eigenproblems are always solved by iteration.

39.6 Track D — How do I run PCA on a real dataset?

(1) The linear algebra. Principal Component Analysis (Chapter 32) finds the orthogonal directions along which a dataset varies most. Center the data (subtract the mean of each feature), form the covariance matrix C = (1/(N−1)) Xᶜᵀ Xᶜ, and eigendecompose it. The eigenvectors are the principal components — the axes of the data cloud — and each eigenvalue is the variance captured along its axis. Project the centered data onto the top few components and you have reduced the dimension while keeping the structure that matters. Equivalently and more stably, take the SVD of the centered data Xᶜ = U Σ Vᵀ: the right singular vectors V are the principal components, and σᵢ²/(N−1) are the variances — which is why PCA is the SVD wearing a statistician's hat.

Geometric Intuition — A correlated dataset looks like a cigar-shaped (or pancake-shaped) cloud of points. PCA rotates your coordinate axes to line up with the cloud: the first axis runs down the long direction (most variance), the next perpendicular to it (next-most), and so on. If the cloud is nearly a line, the first component captures almost everything and you can describe each point by a single number with little loss — that is dimensionality reduction.

(2) Toolkit modules and data flow. Data matrix X → center → covariance Cpca (your toolkit/pca.py, which eigendecomposes C via eigen.py, or SVDs Xᶜ via svd.py) → eigenvalues give explained variance, eigenvectors give the components → project onto the top k to get scores. Centering uses vectors/matrices; the covariance is matmul(transpose(Xc), Xc) scaled.

(3) Key code. The mathematical core, in a few lines (your toolkit/pca.py does this from scratch; here is the numpy verification the capstone uses):

# PCA two ways, which must agree (Chapter 32).
mu  = X.mean(axis=0)
Xc  = X - mu                                  # center each feature
C   = (Xc.T @ Xc) / (X.shape[0] - 1)          # sample covariance matrix
evals, evecs = np.linalg.eigh(C)              # symmetric -> real eigenpairs
evals, evecs = evals[::-1], evecs[:, ::-1]    # sort descending
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
assert np.allclose(evals, s**2 / (X.shape[0] - 1))   # covariance eig == σ²/(N-1)

(4) Validation and presentation. On a small 5-sample dataset of three correlated features (height, weight, age), the two routes agree exactly and the result is striking:

covariance eigenvalues (variances): [239.32, 1.06, 0.23]
explained variance ratio:           [0.9946, 0.0044, 0.0009]
singular values of centered data:   [30.94, 2.06, 0.95]
check s^2/(N-1):                     [239.32, 1.06, 0.23]   ✓ matches eigenvalues
cumulative variance, first 2 PCs:    0.9990

Validate three things: (i) the eigenvector route and the SVD route give the same explained variances — s²/(N−1) equals the covariance eigenvalues (239.32, 1.06, 0.23); (ii) the explained-variance ratios sum to 1 (you have accounted for all the variance); (iii) the principal components are orthonormal — check Vᵀ V = I, the spectral-theorem guarantee for the symmetric covariance matrix (Chapter 27). To present: report the scree plot (eigenvalues in descending order) to justify how many components to keep, state the headline ("the first principal component explains 99.46% of the variance"), and — if your data has labels — plot the 2D scores and show that classes separate along the principal axes. The whole point lands in one sentence: three correlated measurements really live on essentially one axis. That single sentence — a 3D dataset collapsed to 1D with under half a percent of variance lost — is the entire promise of dimensionality reduction, and you can defend every digit of it from the spectrum you computed.

Warning

— PCA is scale-dependent. If one feature is in millimeters and another in kilometers, the large-scale feature dominates the covariance purely because of its units. The spectral theorem still applies — the covariance matrix is symmetric, so its eigenvectors are orthogonal — but the result is meaningless. Standardize (z-score) the features first unless they are already in comparable units. Note this in your report; graders and reviewers look for it specifically.

39.6.1 FAQ: Should I compute PCA from the covariance matrix or from the SVD?

For a capstone, do both and show they agree — that is a result, because it demonstrates you understand the equivalence. For production, prefer the SVD of the centered data. Forming Xᶜᵀ Xᶜ squares the condition number (Chapter 38), so the small eigenvalues — exactly the ones that tell you a direction carries little variance — lose precision. The SVD works on Xᶜ directly and is numerically gentler. This is the same conditioning lesson that made you prefer QR over the normal equations in Chapter 20.

39.7 Track E — How does a 3D rendering pipeline use linear algebra?

(1) The linear algebra. Rendering a 3D scene onto a 2D screen is a composition of matrix transformations (Chapter 12) — the purest expression of the book's first theme. The trick that makes it all linear is homogeneous coordinates: represent a 3D point (x, y, z) as the 4-vector (x, y, z, 1), so that translation — which is not linear in 3D — becomes a 4×4 matrix multiplication. The pipeline chains four transformations:

$$\mathbf{p}_{\text{clip}} = P \cdot V \cdot M \cdot \mathbf{p}_{\text{world}},$$

the Model matrix M (place the object: rotate, scale, translate), the View matrix V (place the camera), and the Projection matrix P (perspective: far things look smaller). A final perspective divide by the homogeneous w-coordinate maps everything into normalized device coordinates on the screen.

Geometric Intuition — Every frame of every 3D game is a flurry of matrix products. A cube's 8 vertices, each a 4-vector, are pushed through P·V·M; the rotation block of M spins the cube, the translation column moves it, the projection matrix makes the far face smaller than the near face, and the perspective divide flattens it onto your screen. Sixty times a second. The geometry you have studied since Chapter 1 — a matrix is a function that transforms space — is, quite literally, what your GPU does.

(2) Toolkit modules and data flow. Vertices as homogeneous 4-vectors → build M, V, P as 4×4 matrices → compose them with matmul (your toolkit/matrices.py) → apply to each vertex with apply → perspective divide → 2D screen coordinates. This track leans almost entirely on matrices.py: matrix multiplication as composition is the entire idea. The rotation blocks are orthogonal matrices (Chapter 21) — they preserve lengths and angles, which is why rotating a cube does not distort it.

(3) Key code. Composing the model transform and pushing one vertex through (numpy verification matching the from-scratch matmul):

# Model = translate ∘ rotate-about-y; then perspective; then the divide.  (Chapter 12)
th = np.pi/4                                  # 45° rotation about the y-axis
Ry = np.array([[ np.cos(th), 0, np.sin(th), 0],
               [ 0,          1, 0,          0],
               [-np.sin(th), 0, np.cos(th), 0],
               [ 0,          0, 0,          1]])
T  = np.eye(4); T[:3, 3] = [0, 0, -5]         # push the object in front of the camera
Model = T @ Ry                                # COMPOSITION: rotate first, then translate
clip  = P @ (Model @ np.array([1, 1, 1, 1.0]))  # a cube vertex through the pipeline
ndc   = clip[:3] / clip[3]                    # perspective divide by w

(4) Validation and presentation. For the cube vertex (1,1,1) rotated 45° about y, translated to z = −5, and run through a 90°-field-of-view perspective, the pipeline produces exactly:

after Model (rotate+translate):  (1.4142, 1.0, -5.0, 1.0)
clip coordinates:                (1.4142, 1.0, 3.8889, 5.0)
NDC (after divide by w=5.0):     (0.2828, 0.2000, 0.7778)
rotation block R3 @ R3.T = I,  det(R3) = 1.0

Validate the transformations, not just the pixels: (i) the rotation block satisfies R Rᵀ = I and det(R) = 1 — it is a proper rotation, lengths preserved (Chapter 21); (ii) the model transform preserves the shape of the cube — apply it to all 8 vertices and confirm edge lengths are unchanged; (iii) points farther from the camera get a larger w, so the divide shrinks them, which is correct perspective foreshortening. To present: render the cube as a wireframe at several rotation angles (an animation if you can), and overlay one fully worked vertex's journey through the four spaces — world → view → clip → screen — so the reader sees a single point being transformed step by step. Nothing makes "a matrix is a transformation" land harder than watching a corner of a cube travel through the pipeline you built.

Common Pitfall — Matrix multiplication is not commutative (Chapter 8), and order is everything in graphics. T @ Ry (rotate about the object's own center, then translate) is a different transform from Ry @ T (translate away from the origin, then rotate — which swings the object around the origin like a planet). Reversing the order is the single most common rendering bug. Your from-scratch matmul already encodes the correct reading: matmul(A, B) means "do B, then A."

39.7.1 FAQ: Why bother with the 4th homogeneous coordinate?

Because translation is not a linear map on ℝ³ — it does not fix the origin, so no 3×3 matrix can do it. Lifting to ℝ⁴ with a constant 1 in the fourth slot turns translation into a shear in the higher space, which is linear and is a matrix. The same 4×4 framework then unifies rotation, scaling, translation, and perspective into a single composable operation — exactly the kind of abstraction-pays-off move you first saw with vector spaces in Chapter 5. The perspective divide at the end is the one nonlinear step, and the w-coordinate is precisely what carries the depth information that makes it work.

39.8 What do all five tracks have in common?

You have now seen five projects that look nothing alike on the surface — a photograph, a movie-ratings table, a web graph, a cloud of measurements, a spinning cube. The threshold idea of this chapter is that, underneath, they are the same computation with different labels on the rows and columns. Pinning down that shared skeleton is the most valuable thing you can take from the capstone, because it is what lets you walk into a sixth application you have never seen and immediately know what to do.

39.8.1 FAQ: What is the shared skeleton behind every applied track?

Every track is an instance of one five-step recipe. Read it once, then re-read each track above and watch it fit.

  1. Encode the problem as a matrix. Decide what the rows and columns mean. Image: rows and columns are pixel coordinates, entries are brightness. Recommender: rows are users, columns are items, entries are ratings. PageRank: rows and columns are pages, entries are link weights. PCA: rows are samples, columns are features. Rendering: columns are the images of the basis vectors, i.e. the transformation itself. The hardest and most important modeling decision is this first one — what does an entry represent?
  2. Choose the operation that exposes the structure you want. Low-rank structure → SVD (image, recommender, the SVD route of PCA). The single fixed direction → dominant eigenvector via power iteration (PageRank). The directions of maximal variance → eigenvectors of a symmetric covariance matrix (PCA). A composed motion → a product of transformation matrices (rendering).
  3. Compute it with the toolkit, at scale with numpy. This is the part you built. Every operation routes down the dependency tower of §39.1 to apply, matmul, dot, add, scale.
  4. Read meaning back out of the result. Singular values are importance; eigenvalues are variance or stationarity; dot products are similarity; transformed coordinates are screen positions. The numbers are not the answer — their interpretation is.
  5. Validate, then present. Cross-check against an independent computation and against an invariant the theory guarantees; then make the transformation visible.

The Key Insight — Modeling is mostly step 1 and step 4 — deciding what the matrix means and how to read its decomposition. Steps 2 and 3 are the linear algebra you already own. This is why a linear algebra background transfers across so many fields: once you can see a problem as "a matrix whose rows are ___ and columns are ___," the rest is machinery you have built and validated.

Look at where the same factorization does completely different jobs. The SVD compresses an image (Track A), underlies the comparison route of a recommender (Track B), and is PCA when applied to centered data (Track D) — one decomposition, three applications, exactly as the book's fourth theme promises. The eigenvalue problem ranks the web (Track C, the dominant eigenvector) and finds principal directions (Track D, the covariance eigenvectors) — the sixth theme, eigenvalues revealing a matrix's true action, in two disguises. And the humble dot product measures user–item affinity (Track B), surface lighting (Track E's extension), and document similarity all at once. The applications are many; the mathematics is one.

39.8.2 FAQ: How exactly do I validate a from-scratch result against numpy — and what do discrepancies mean?

This is the third recurring theme — computation validates theory — turned into a procedure you run on your code. There are three kinds of agreement to check, in increasing strength.

(a) Agreement on a known answer. Feed a problem whose answer you computed by hand and confirm both your toolkit and numpy reproduce it. Your eig_2x2([[4,1],[2,3]]) returns (5.0, 2.0); sorted(np.linalg.eigvals([[4,1],[2,3]])) returns [2., 5.]. Three independent witnesses — pencil, your code, numpy — agreeing is strong evidence.

(b) Agreement up to a documented ambiguity. Some objects are not unique, and naive equality will "fail" for correct code. Singular and eigen vectors are defined only up to sign; within a repeated singular/eigenvalue, only up to a choice of basis for the subspace. So compare singular values (unique) directly, but compare vectors up to sign — or, better, compare the thing that is unique, like the rank-k reconstruction A_k or the projection matrix P. A from-scratch SVD whose U differs from numpy's only by column signs is correct; A_k will match to many digits.

(c) Agreement with a theoretical invariant. The strongest check verifies a property the theorem guarantees, independent of any reference implementation. The reconstruction error equals the discarded singular-value energy (Eckart–Young). The PageRank eigenvalue is exactly 1 (Perron–Frobenius). The principal components are orthonormal, Vᵀ V = I (spectral theorem). A rotation block satisfies R Rᵀ = I and det R = 1. These are self-checks that need no numpy at all — they check your code against the mathematics.

When agreement fails, the failure is diagnostic, and Chapter 38 is your field guide:

# A from-scratch result that "disagrees" with numpy: read the discrepancy, don't hide it.
import numpy as np
A = np.array([[1e-8, 1.0], [1.0, 1.0]])     # ill-conditioned-ish example
print("condition number:", np.linalg.cond(A))   # large -> small errors get amplified
# If your from-scratch solve diverges here, the cause is conditioning, not a bug.

A mismatch in the last few digits of a well-conditioned problem is ordinary floating-point rounding — expect it, set a tolerance (np.allclose, default ~1e-8), move on. A mismatch in the leading digits means one of three things, in order of likelihood: a genuine bug (check first), an ill-conditioned problem where the answer is legitimately sensitive (compute np.linalg.cond; a condition number of 10^k can cost you about k digits), or catastrophic cancellation in a subtraction of nearly equal numbers (the normal equations Aᵀ A are the classic offender, which is why Chapter 20's QR exists). Never silently widen the tolerance until the test passes — that hides the very information the disagreement is offering you. Diagnose, then decide.

Common Pitfall — Treating any numerical difference as a bug. Two correct programs computing the same quantity will routinely differ in the last digit or two, and two correct programs computing a non-unique quantity (an eigenvector's sign, a basis for a repeated eigenspace) can differ entirely while both are right. Know which kind of object you are comparing before you call something broken. This single habit will save you hours on every capstone.

39.9 What does a good capstone look like? (The rubric)

A finished capstone is judged on more than "it runs." Across all five tracks, the same five qualities separate a strong project from a mediocre one. Use this as a checklist while you build and again before you present.

1. Correct linear algebra, correctly used. The right factorization or iteration for the problem (SVD for low-rank, power iteration for the dominant eigenvector, the covariance eigenproblem for PCA), with the conditions stated. If you used the spectral theorem, say the matrix is symmetric. If you used Eckart–Young, say it is the best rank-k approximation. A project that runs but misuses the theory is a weak project.

2. The toolkit is genuinely integrated. Your demo imports and composes your from-scratch modules — it does not silently re-implement them or lean entirely on numpy where your code would do. The capstone demos may use numpy for scale, but a strong submission demonstrates at least one place where the from-scratch module produces the same answer (the validation step below).

3. Results are validated, not just produced. Every track above ships with assertions and cross-checks: error decreases monotonically, the from-scratch result matches numpy, the eigenvalue is 1, the two PCA routes agree, the rotation block is orthogonal. A number you have not validated is a number you do not believe. This is the third recurring theme — computation validates theory — operationalized as code that checks itself.

4. The presentation makes the geometry and the algebra visible. Figures that show the transformation (the reconstructed image, the data cloud and its principal axes, the annotated graph, the cube's vertex journey), not just a table of numbers. A reader should be able to see what the linear algebra did. This is the G-track contract of the whole book, carried into your own work.

5. Honest limitations. Name what your project does not handle: overfitting as k grows (recommender), scale-dependence (PCA), the sign/basis ambiguity (SVD), conditioning of the normal equations (least squares), the order-sensitivity of composed transforms (rendering). A project that knows its own edges is a mature one.

Geometric Intuition — Notice that all five rubric items are really one item viewed from different angles: did you keep the geometry, the algebra, and the computation in view simultaneously? That is the habit the whole book has been training — the second recurring theme, that geometry and algebra are two views of the same object, plus the third, that computation and theory are partners. A great capstone is just that habit made visible to someone else.

39.9.1 FAQ: How do I present a linear algebra project to a non-specialist?

Lead with the picture and the one-sentence result, then reveal the machinery only as deep as your audience wants. "Here is a photo stored in 4% of the space, and it looks almost identical — that is the SVD keeping only the most important layers." "Here is which page the web considers most important, and why — it is pointed to by other important pages." Open with the transformation made visible, give the headline number, then let the curious follow the dot products, eigenvectors, and singular values down into the math. The geometry is your bridge to every reader; the algebra is the rigor underneath; the computation is the proof it actually works.

39.9.2 FAQ: How should I structure the final write-up?

A capstone report has a predictable, defensible shape, and following it makes your work easy to evaluate. Five short sections:

  1. Problem and encoding. One paragraph: what is the problem, and what does your matrix represent — rows, columns, entries? This is step 1 of the §39.8 recipe, and stating it well signals that you understand the modeling, not just the mechanics.
  2. Method. Name the operation (SVD, power iteration, covariance eigendecomposition, matrix product) and state its conditions (symmetric for the spectral theorem, full column rank for least squares, column-stochastic for PageRank). Say which toolkit modules you call and, in one line, why this operation exposes the structure you want.
  3. Results. The headline number and the key figure first, then the supporting table. "Rank-10 reconstruction captures 94.3% of the energy at 3.9% of the storage," with the side-by-side images. Quote the exact numbers your code prints — never round-trip them through memory.
  4. Validation. Show the cross-checks: the from-scratch-versus-numpy agreement, the invariant the theory guarantees (error equals discarded energy; eigenvalue equals 1; components orthonormal; rotation block orthogonal). This section is what separates a report that claims a result from one that demonstrates it.
  5. Limitations and next steps. The honest edges from rubric item 5, plus the one extension you would build next. A page that names its own boundaries reads as mature; a page that pretends it has none reads as naive.

Common Pitfall — The three failure modes that most often sink an otherwise-good capstone: (i) a wall of numbers with no picture — you computed the transformation but never made it visible, breaking the G-track contract; (ii) an unvalidated headline — a striking result with no cross-check, so the reader cannot tell a correct answer from a plausible-looking bug; and (iii) a misused theorem — running PCA on unscaled features, or calling a non-symmetric matrix's eigenvectors orthogonal, or filling a recommender's blanks before factoring. Each is fatal to trust, and each is avoidable by working through the rubric before you present, not after.

There is one more reason to write it up carefully: the act of explaining your project back to someone else is the final test of whether you understand it. If you can narrate a single data point's journey — a pixel into a rank-k layer, a user into taste space, a page into its PageRank, a sample onto a principal axis, a cube vertex onto the screen — and say which toolkit function and which theorem governs each step, then the subject has become yours. That narration is the capstone.

39.10 The themes, one last time — and the Build-Your-Toolkit capstone

Step back and look at what just happened across five wildly different applications. Transformations were the point every time: an image is a matrix transforming a grid of basis layers, a render is a chain of transforms, PageRank is the fixed direction of a transformation, PCA and the recommender both find the directions that matter under a transformation. Geometry and algebra were one object: the dot product that predicts a rating is the cosine that measures taste; the eigenvector that ranks the web is the fixed point of a random walk. Computation validated theory: every track shipped a check, and where your from-scratch code met numpy, two independent paths confirmed one truth. The four fundamental subspaces organized it all — least squares projects onto a column space, the SVD's U and V are orthonormal bases for the column and row spaces, PageRank's solution is the one-dimensional fixed subspace. Eigenvalues revealed the true action — of the Google matrix, of the covariance matrix, of every symmetric form you diagonalized. And through every one of them ran the fourth theme, the one this chapter exists to prove: linear algebra is the most applied branch of pure mathematics. You learned it once. You just used it five ways.

Historical Note — The threads you followed have real lineages. The SVD's existence was established by Beltrami and Jordan in the 1870s, with later contributions by Sylvester and others; the low-rank optimality you used is the Eckart–Young theorem of 1936. PageRank was introduced by Larry Page and Sergey Brin around 1996–1998 at Stanford, resting on the Perron–Frobenius theorem (Perron 1907, Frobenius 1912) and on power iteration, an old idea in numerical linear algebra. PCA traces to Karl Pearson (1901) and Harold Hotelling (1933). The dates and primary attributions here are the standard account; for any one you intend to assert as fact, confirm the source. [verify]

Build Your Toolkit — the capstone. This is the one. Assemble your complete toolkit/vectors, matrices, linear_systems, inverse, lu, determinant, projection, gram_schmidt, eigen, svd, pca — and confirm the whole stack imports cleanly and every module's self-check passes against numpy. Then choose one track and run its demo in toolkit/capstone/ end-to-end: image_compression, recommender, pagerank, pca_demo, or render3d. Validation checklist (do every box for your chosen track): (1) the demo runs from the repo root and prints its report; (2) the from-scratch result matches numpy to floating-point tolerance — singular values match (SVD), the two PCA routes agree, the dominant eigenvalue is 1 (PageRank), R Rᵀ = I (rendering), rmse_on_known is small (recommender); (3) the headline invariant holds — error decreases monotonically with k (SVD), the PageRank vector is nonnegative and sums to 1, the explained-variance ratios sum to 1 (PCA); (4) you can name the toolkit modules the demo calls and the theorem each step relies on; (5) you produce at least one figure that makes the transformation visible. Where a from-scratch result diverges from numpy, do not patch it away — diagnose it with Chapter 38 (conditioning, cancellation, the order of operations). When all five boxes are checked, you have not just finished a chapter. You have built, validated, and can defend a complete piece of applied linear algebra — which is the entire point of the book.

39.10.1 FAQ: How do I extend my capstone beyond the minimum?

Pick the next honest question your project raises and answer it. Compression: add color (run the SVD on each RGB channel, or on a luminance/chrominance split) and compare to JPEG. Recommender: hold out a test set, tune k and reg against it, and report test RMSE — the number that actually matters. PageRank: scrape a real small network (a Wikipedia category, a citation graph) and time how convergence speed depends on the damping d. PCA: run it on a labeled dataset (a classic is the iris measurements, or a handful of digit images) and show classes separating along the components. Rendering: add a second object, a moving camera, or simple lighting (which is a dot product between a surface normal and a light direction — Chapter 18 again). Each extension is a new chapter's worth of the same lens you have already ground. The exercises.md for this chapter lays out tiered milestones for exactly these.