Chapter 39 — Quiz: Synthesis

Twelve conceptual questions that test whether you can see across the whole toolkit and the five tracks. These are synthesis questions: most have a one-line answer and a one-line reason. No calculator needed.


1. The toolkit is described as "a dependency graph, not a list." What is at the very bottom of the graph, and what is at the top?

Answer At the bottom are the two vector operations — addition and scalar multiplication — plus the dot product (`vectors.py`). At the top is the capstone application in `toolkit/capstone/`. Everything between (matmul, solve, inverse, LU, determinant, projection, QR, eigen, SVD, PCA) is assembled from the bottom up. *Explanation:* each higher operation is literally implemented by calling the ones below it.

2. Why are the toolkit/capstone/ demos allowed to use numpy directly, when the rest of the toolkit forbids numpy in its implementations?

Answer Because the demos' job is to *integrate at a useful scale*, not to teach the primitives. The from-scratch modules exist so you understand how each operation works; the capstone exists so you can build something real. You may still swap in your from-scratch `svd`/`pca` and get the same answers. *Explanation:* understanding (pure Python) and scale (numpy) are different goals, and the book separates them deliberately.

3. Image compression (Track A), PCA (Track D), and the SVD route of the recommender (Track B) all use the same decomposition. Name it, and explain why one decomposition serves three applications.

Answer The singular value decomposition, `A = UΣVᵀ`. It serves all three because each is a *low-rank approximation* problem, and by Eckart–Young the truncated SVD is the best rank-`k` approximation. *Explanation:* this is the book's fourth theme — linear algebra is the most applied branch of pure mathematics — made concrete: learn the SVD once, use it everywhere.

4. PageRank ranks pages by the dominant eigenvector of the Google matrix, not by counting incoming links. Why does counting links give the wrong answer?

Answer Because importance is *recursive*: a page is important if it is linked to *by important pages*. A page with many junk links can rank below a page with one link from an authority. Capturing "important = linked-to by important" requires solving a self-referential equation, whose solution is exactly an eigenvector. *Explanation:* the recursion is what forces an eigenvalue problem rather than a tally.

5. Why does the capstone use power iteration for PageRank instead of solving det(G − λI) = 0 directly?

Answer Scale and sparsity. The real web matrix has billions of pages — far too large for a characteristic polynomial or a dense eigensolver — but each multiplication `G @ v` is cheap because the link matrix is sparse. Power iteration needs only multiplications and converges to the dominant eigenvector. *Explanation:* beyond 2×2 there is no eigenvalue formula (Abel–Ruffini), so large eigenproblems are always solved by iteration.

6. A from-scratch SVD produces a U whose columns have the opposite signs from numpy's. Is your code wrong?

Answer No. Singular vectors are defined only up to sign (and, within a repeated singular value, up to a choice of basis). The signs cancel in the rank-one terms `σᵢ uᵢ vᵢᵀ`, so the *reconstruction* `A_k` is identical. *Explanation:* compare the things that are unique (singular values, `A_k`), not the things that are not (individual vectors).

7. In the recommender, why must you fit only the observed entries (multiply the error by the mask) rather than treating blanks as zeros?

Answer Because a blank means "unknown," not "rated zero." Treating blanks as real ratings trains the model to reproduce zeros it never observed, destroying its predictions. Masking the error makes the blanks exert no pull, so the factorization fits only real data and *generalizes* to the blanks. *Explanation:* the whole point of collaborative filtering is to predict the unknowns, which you cannot do if you fabricate them as zeros.

8. A student reports that their recommender's training RMSE keeps dropping as they increase k. They conclude that larger k is always better. What is wrong?

Answer Falling *training* error as `k` grows is overfitting, not improvement: with enough factors the model memorizes the known entries and predicts garbage on the rest. Matrix factorization generalizes only when `k` is *small* relative to the data — the low-rank assumption. You must measure *test* RMSE on held-out ratings. *Explanation:* fit on training data always improves with capacity; only held-out error reveals genuine quality.

9. PCA can be computed from the eigendecomposition of the covariance matrix or from the SVD of the centered data. Which is numerically preferable, and why?

Answer The SVD of the centered data. Forming `Xᶜᵀ Xᶜ` squares the condition number, so the small eigenvalues — exactly the low-variance directions you care about identifying — lose precision. The SVD works on `Xᶜ` directly and is gentler. *Explanation:* this is the same conditioning lesson (Chapter 38) that makes QR preferable to the normal equations.

10. Why must features be standardized before PCA when they are in different units (e.g. millimeters and kilometers)?

Answer Because PCA finds directions of maximal *variance*, and a feature measured on a large numeric scale dominates the covariance purely because of its units, not its importance. The spectral theorem still applies (the covariance matrix is symmetric), but the result is meaningless. Z-scoring puts features on a comparable footing. *Explanation:* PCA is scale-dependent; standardization removes the unit artifact.

11. In the 3D rendering pipeline, T @ Ry and Ry @ T produce different images. What general property of matrix multiplication is this, and what is the practical consequence?

Answer Non-commutativity (Chapter 8). `T @ Ry` rotates the object about its own center then translates it; `Ry @ T` translates it away from the origin then rotates, swinging it around the origin. The practical consequence: transformation *order* is a real and common source of rendering bugs. *Explanation:* `matmul(A, B)` means "do `B`, then `A`," so the rightmost matrix acts first.

12. Across all five tracks, what single habit does a strong capstone demonstrate, according to the rubric?

Answer Keeping the **geometry, the algebra, and the computation in view simultaneously** — the picture that shows what the transformation did, the theorem (with its conditions) that justifies the method, and the validated code that proves it runs. *Explanation:* the rubric's five items are one item from different angles, and it is the habit the whole book has been training.