Chapter 39 — Quiz: Synthesis
Twelve conceptual questions that test whether you can see across the whole toolkit and the five tracks. These are synthesis questions: most have a one-line answer and a one-line reason. No calculator needed.
1. The toolkit is described as "a dependency graph, not a list." What is at the very bottom of the graph, and what is at the top?
Answer
At the bottom are the two vector operations — addition and scalar multiplication — plus the dot product (`vectors.py`). At the top is the capstone application in `toolkit/capstone/`. Everything between (matmul, solve, inverse, LU, determinant, projection, QR, eigen, SVD, PCA) is assembled from the bottom up. *Explanation:* each higher operation is literally implemented by calling the ones below it.2. Why are the toolkit/capstone/ demos allowed to use numpy directly, when the rest of the toolkit forbids numpy in its implementations?
Answer
Because the demos' job is to *integrate at a useful scale*, not to teach the primitives. The from-scratch modules exist so you understand how each operation works; the capstone exists so you can build something real. You may still swap in your from-scratch `svd`/`pca` and get the same answers. *Explanation:* understanding (pure Python) and scale (numpy) are different goals, and the book separates them deliberately.3. Image compression (Track A), PCA (Track D), and the SVD route of the recommender (Track B) all use the same decomposition. Name it, and explain why one decomposition serves three applications.
Answer
The singular value decomposition, `A = UΣVᵀ`. It serves all three because each is a *low-rank approximation* problem, and by Eckart–Young the truncated SVD is the best rank-`k` approximation. *Explanation:* this is the book's fourth theme — linear algebra is the most applied branch of pure mathematics — made concrete: learn the SVD once, use it everywhere.4. PageRank ranks pages by the dominant eigenvector of the Google matrix, not by counting incoming links. Why does counting links give the wrong answer?
Answer
Because importance is *recursive*: a page is important if it is linked to *by important pages*. A page with many junk links can rank below a page with one link from an authority. Capturing "important = linked-to by important" requires solving a self-referential equation, whose solution is exactly an eigenvector. *Explanation:* the recursion is what forces an eigenvalue problem rather than a tally.5. Why does the capstone use power iteration for PageRank instead of solving det(G − λI) = 0 directly?
Answer
Scale and sparsity. The real web matrix has billions of pages — far too large for a characteristic polynomial or a dense eigensolver — but each multiplication `G @ v` is cheap because the link matrix is sparse. Power iteration needs only multiplications and converges to the dominant eigenvector. *Explanation:* beyond 2×2 there is no eigenvalue formula (Abel–Ruffini), so large eigenproblems are always solved by iteration.6. A from-scratch SVD produces a U whose columns have the opposite signs from numpy's. Is your code wrong?
Answer
No. Singular vectors are defined only up to sign (and, within a repeated singular value, up to a choice of basis). The signs cancel in the rank-one terms `σᵢ uᵢ vᵢᵀ`, so the *reconstruction* `A_k` is identical. *Explanation:* compare the things that are unique (singular values, `A_k`), not the things that are not (individual vectors).7. In the recommender, why must you fit only the observed entries (multiply the error by the mask) rather than treating blanks as zeros?
Answer
Because a blank means "unknown," not "rated zero." Treating blanks as real ratings trains the model to reproduce zeros it never observed, destroying its predictions. Masking the error makes the blanks exert no pull, so the factorization fits only real data and *generalizes* to the blanks. *Explanation:* the whole point of collaborative filtering is to predict the unknowns, which you cannot do if you fabricate them as zeros.8. A student reports that their recommender's training RMSE keeps dropping as they increase k. They conclude that larger k is always better. What is wrong?
Answer
Falling *training* error as `k` grows is overfitting, not improvement: with enough factors the model memorizes the known entries and predicts garbage on the rest. Matrix factorization generalizes only when `k` is *small* relative to the data — the low-rank assumption. You must measure *test* RMSE on held-out ratings. *Explanation:* fit on training data always improves with capacity; only held-out error reveals genuine quality.9. PCA can be computed from the eigendecomposition of the covariance matrix or from the SVD of the centered data. Which is numerically preferable, and why?
Answer
The SVD of the centered data. Forming `Xᶜᵀ Xᶜ` squares the condition number, so the small eigenvalues — exactly the low-variance directions you care about identifying — lose precision. The SVD works on `Xᶜ` directly and is gentler. *Explanation:* this is the same conditioning lesson (Chapter 38) that makes QR preferable to the normal equations.10. Why must features be standardized before PCA when they are in different units (e.g. millimeters and kilometers)?
Answer
Because PCA finds directions of maximal *variance*, and a feature measured on a large numeric scale dominates the covariance purely because of its units, not its importance. The spectral theorem still applies (the covariance matrix is symmetric), but the result is meaningless. Z-scoring puts features on a comparable footing. *Explanation:* PCA is scale-dependent; standardization removes the unit artifact.11. In the 3D rendering pipeline, T @ Ry and Ry @ T produce different images. What general property of matrix multiplication is this, and what is the practical consequence?
Answer
Non-commutativity (Chapter 8). `T @ Ry` rotates the object about its own center then translates it; `Ry @ T` translates it away from the origin then rotates, swinging it around the origin. The practical consequence: transformation *order* is a real and common source of rendering bugs. *Explanation:* `matmul(A, B)` means "do `B`, then `A`," so the rightmost matrix acts first.12. Across all five tracks, what single habit does a strong capstone demonstrate, according to the rubric?