Chapter 31 — Exercises
Problems are graded in four tiers: ⭐ conceptual, ⭐⭐ computation by hand, ⭐⭐⭐ proof or coding, ⭐⭐⭐⭐ application. Coding problems are marked [code] and proofs [proof]. Notation follows the chapter: $A = \sum_i \sigma_i\mathbf{u}_i\mathbf{v}_i^{\mathsf{T}}$, with truncated SVD $A_k = \sum_{i=1}^k\sigma_i\mathbf{u}_i\mathbf{v}_i^{\mathsf{T}}$.
⭐ Conceptual
31.1. In one sentence, state what the truncated SVD $A_k$ is, and why the singular values being in decreasing order makes "keep the top $k$ layers" the natural way to approximate $A$.
31.2. True or false, with a one-line reason: a rank-1 matrix $\sigma\mathbf{u}\mathbf{v}^{\mathsf{T}}$ is a small matrix that takes little memory because it has few entries.
31.3. The Eckart-Young theorem says $A_k$ is the best rank-$k$ approximation in two norms. Name them, and state the formula for the error in each.
31.4. Explain why a photograph of a clear blue sky is more compressible by SVD than a photograph of television static. Use the words rank and linearly dependent.
31.5. Define the "energy" of a matrix in terms of its singular values. Why is the energy captured by the top $k$ layers equal to $\sum_{i\le k}\sigma_i^2 / \sum_i\sigma_i^2$ rather than $\sum_{i\le k}\sigma_i / \sum_i\sigma_i$?
31.6. In denoising, why do we keep the large singular values and discard the small ones, rather than the other way around? What property of random noise makes this work?
31.7. A friend claims "SVD compression beats JPEG, so my phone should use it." Give two reasons this is wrong, and state what SVD low-rank approximation is genuinely good for that JPEG is not.
31.8. Why is choosing the truncation rank $k$ described as "choosing how much energy to keep"? What does the scree plot show, and what is its "elbow"?
⭐⭐ Computation by hand
31.9. A matrix has singular values $\sigma = (6, 3, 2)$. Compute (a) the total energy $\lVert A\rVert_F^2$; (b) the energy captured by the rank-1 approximation, as a fraction; (c) the relative Frobenius error $\lVert A - A_1\rVert_F / \lVert A\rVert_F$ of the rank-1 approximation. Confirm that (b) and (c) satisfy "squared relative error $= 1 -$ energy captured."
31.10. For the matrix $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$: (a) What is its rank? (b) Find its single nonzero singular value (hint: compute $A A^{\mathsf{T}}$). (c) Write the rank-1 approximation $A_1$ — what is it, and what is the relative error?
31.11. A grayscale image is $256 \times 256$ pixels. (a) How many numbers store it naively? (b) How many for a rank-30 approximation, $k(m+n+1)$? (c) What is the compression ratio, and what percentage of the original storage is that? (d) Above what rank does compression stop saving anything?
31.12. A matrix has singular values $\sigma = (5, 4, 3, 2, 1)$. Find the smallest rank $k$ that captures at least (a) $70\%$, (b) $90\%$, (c) $98\%$ of the energy. (Compute the cumulative energy fractions.)
31.13. Using Eckart-Young, give the operator-norm and Frobenius-norm errors of the best rank-2 approximation of a matrix with singular values $(10, 8, 6, 1, 1, 1)$. What fraction of the energy does rank 2 capture?
31.14. Take $A = \begin{bmatrix} 2 & 2 \\ 1 & -1 \end{bmatrix}$ from §31.4.2 (singular values $2\sqrt2, \sqrt2$). Without recomputing the SVD, write down its rank-1 approximation $A_1$ and the exact relative error. Then state $A_2$ and its error.
31.15. A dataset matrix has singular values whose squares are $\sigma_i^2 = (80, 12, 5, 2, 1)$. You want to reduce dimension while keeping at least $95\%$ of the variance. How many components $k$ do you keep? What relative reconstruction error does that imply?
⭐⭐⭐ Proof / Coding
31.16. [proof] Prove that the truncated SVD error in the Frobenius norm is $\lVert A - A_k\rVert_F = \sqrt{\sum_{i>k}\sigma_i^2}$. (Hint: show $A - A_k = \sum_{i>k}\sigma_i\mathbf{u}_i\mathbf{v}_i^{\mathsf{T}}$ is itself a valid SVD, with orthonormal singular vectors, then apply the fact from Chapter 30 that the Frobenius norm is the root-sum-of-squares of the singular values.)
31.17. [proof] Show that for any $m \times n$ matrix, $\lVert A\rVert_F^2 = \sum_i \sigma_i^2$, using $\lVert A\rVert_F^2 = \operatorname{tr}(A^{\mathsf{T}}A)$ and the eigendecomposition of $A^{\mathsf{T}}A$ from Chapter 30. (This is the identity that makes every energy calculation in the chapter work.)
31.18. [proof] Prove the operator-norm lower bound of Eckart-Young: for any $B$ with $\operatorname{rank}(B) \le k$, $\lVert A - B\rVert_2 \ge \sigma_{k+1}$. (Hint: $\ker(B)$ has dimension $\ge n-k$ and so intersects $\operatorname{span}\{\mathbf{v}_1,\dots,\mathbf{v}_{k+1}\}$ nontrivially; evaluate $(A-B)$ on a unit vector in that intersection.)
31.19. [code] Write relative_error(A, k) that returns the relative Frobenius error of the rank-$k$ truncated SVD of A, and a second function eckart_young_tail(A, k) that returns $\sqrt{\sum_{i>k}\sigma_i^2}/\lVert A\rVert_F$ directly from the singular values. Verify on three random matrices that the two agree to machine precision.
31.20. [code] Write energy_curve(A) that returns the cumulative-energy array $\big(\sum_{i\le k}\sigma_i^2 / \sum_i\sigma_i^2\big)_{k=1}^{r}$, and rank_for_energy(A, target) that returns the smallest $k$ meeting a target fraction. Plot the cumulative-energy curve (the scree-derived curve) for a random $50\times50$ matrix and mark the rank for $90\%$.
31.21. [code] Construct a rank-3 matrix clean (sum of three random outer products with weights $40, 25, 15$), add Gaussian noise of standard deviation $0.3$, and compute the SVD of the noisy matrix. Print the top 8 singular values and identify the spectral gap. Reconstruct at the rank you read off, and report the relative error of the noisy matrix vs. clean and the denoised vs. clean. Confirm the denoised error is smaller.
31.22. [code] Verify the storage formula empirically: for an image-shaped random matrix A of size $(m, n)$, confirm that storing $U_k$, $\Sigma_k$, $V_k$ uses exactly $k(m+n+1)$ floats and that reconstructing from them reproduces $A_k$ exactly. Print the compression ratio for $k = 10, 50, 100$.
⭐⭐⭐⭐ Application
31.23. [code] Image compression on a real photograph. Load any grayscale image with Pillow (Image.open(path).convert("L")), convert to a float matrix, and reconstruct it at ranks $k = 5, 20, 50, 200$. For each, display the reconstruction (or save it), and print the compression ratio, relative error, and energy captured. Comment on which rank first looks "good enough" to your eye and how that compares to the energy-captured number. (This is the chapter anchor — see toolkit/capstone/image_compression.py.)
31.24. [code] Latent factors in a ratings matrix. Build a synthetic $200 \times 50$ "user × item" ratings matrix as a rank-2 signal (two taste factors) plus small noise. Compute its SVD, truncate to rank 2, and show that the rank-2 reconstruction predicts held-out entries far better than the raw noisy matrix. Discuss how the choice of $k$ here is a denoising/dimensionality-reduction decision, tying it to §31.6 and Chapter 33.
31.25. [code] Background removal, in miniature. Simulate a short "video" as a matrix whose columns are frames: a constant background vector repeated across all columns, plus a small moving "object" in a few frames, plus noise. Compute the SVD and show that the rank-1 approximation recovers the background, so that noisy - A_1 isolates the moving object and noise. Relate this to the robust-PCA application in §31.7.
31.26. (Written, no code.) A medical-imaging team compresses MRI scans with the truncated SVD and reports "$99\%$ energy retained at rank 40." A radiologist worries that a small but clinically important feature (a tiny tumor) might live in the discarded $1\%$. Write a short paragraph explaining the genuine tension here: why energy-based truncation can discard small-but-important structure, why Eckart-Young's optimality does not protect against this, and what diagnostic (think §31.7.1 and the scree plot) you would recommend before trusting the compression for diagnosis.