Chapter 31 — Quiz

Q: What is the truncated SVD , and what is its rank (assuming )?

, the sum of the top rank-1 layers of the SVD. Its rank is exactly — it is a sum of independent rank-1 matrices. Explanation: keeping the largest singular values and their singular vectors gives the rank- matrix closest to (Eckart-Young).

Q: State the Eckart-Young theorem in one sentence, including its conditions.

For any matrix (no symmetry, squareness, or invertibility required), the truncated SVD is the closest rank- (or lower) matrix to in both the Frobenius and operator norms. Explanation: the minimizer is unique when there is a gap ; the errors are and .

Q: A grayscale image is pixels. How many numbers does a rank- approximation cost to store, and how does that compare to the original?

Rank- storage is numbers (the left vectors, right vectors, and singular values), versus for the original. Explanation: compression pays off whenever , i.e. for well below .

Q: The relative Frobenius error of and the "energy captured" are related by a single equation. What is it?

. Explanation: squared relative error equals one minus the fraction of energy kept — capture of the energy and the relative error is .

Q: Why is a rank-1 matrix "small" even though it has the same shape as ?

It has the same number of entries but tiny information content: every entry is determined by just numbers (the vectors and the scalar ). Explanation: confusing shape with storage cost hides the whole point of compression — looks full-size on screen but is cheap to store and transmit.

DataField.Dev

Chapter 31 — Quiz

Twelve conceptual checks. Try each before opening the answer.

Q1. What is the truncated SVD $A_k$, and what is its rank (assuming $\sigma_k > 0$)?

Answer

$A_k = \sum_{i=1}^{k}\sigma_i\mathbf{u}_i\mathbf{v}_i^{\mathsf{T}}$, the sum of the top $k$ rank-1 layers of the SVD. Its rank is exactly $k$ — it is a sum of $k$ independent rank-1 matrices. *Explanation:* keeping the largest $k$ singular values and their singular vectors gives the rank-$k$ matrix closest to $A$ (Eckart-Young).

Q2. State the Eckart-Young theorem in one sentence, including its conditions.

Answer

For *any* matrix $A$ (no symmetry, squareness, or invertibility required), the truncated SVD $A_k$ is the closest rank-$k$ (or lower) matrix to $A$ in both the Frobenius and operator norms. *Explanation:* the minimizer is unique when there is a gap $\sigma_k > \sigma_{k+1}$; the errors are $\lVert A-A_k\rVert_F = \sqrt{\sum_{i>k}\sigma_i^2}$ and $\lVert A-A_k\rVert_2 = \sigma_{k+1}$.

Q3. A grayscale image is $m \times n$ pixels. How many numbers does a rank-$k$ approximation cost to store, and how does that compare to the original?

Answer

Rank-$k$ storage is $k(m+n+1)$ numbers (the $k$ left vectors, $k$ right vectors, and $k$ singular values), versus $mn$ for the original. *Explanation:* compression pays off whenever $k(m+n+1) < mn$, i.e. for $k$ well below $mn/(m+n+1)$.

Q4. The relative Frobenius error of $A_k$ and the "energy captured" are related by a single equation. What is it?

Answer

$\left(\dfrac{\lVert A - A_k\rVert_F}{\lVert A\rVert_F}\right)^2 = 1 - (\text{energy captured}) = \dfrac{\sum_{i>k}\sigma_i^2}{\sum_i\sigma_i^2}$. *Explanation:* squared relative error equals one minus the fraction of energy kept — capture $99\%$ of the energy and the relative error is $\sqrt{0.01}=10\%$.

Q5. Why is a rank-1 matrix "small" even though it has the same $m \times n$ shape as $A$?

Answer

It has the same number of *entries* but tiny *information content*: every entry is determined by just $m+n+1$ numbers (the vectors $\mathbf{u}, \mathbf{v}$ and the scalar $\sigma$). *Explanation:* confusing shape with storage cost hides the whole point of compression — $A_k$ looks full-size on screen but is cheap to store and transmit.

Q6. In the visual progression of image compression, why does the picture sharpen so quickly at first and so slowly later?

Answer

Because the singular values decrease, so the first few layers carry most of the energy and the long tail carries little. *Explanation:* going from rank 5 to 10 adds a lot (big singular values); going from 50 to 200 adds little (tiny singular values) — diminishing returns built into the spectrum.

Q7. SVD compression versus JPEG: which is better for photographs, and why?

Answer

JPEG is better for photographs. *Explanation:* JPEG exploits human vision (discarding high-frequency detail the eye misses, via the DCT on $8\times8$ blocks plus entropy coding); SVD has no model of vision. SVD's value is as a *general-purpose, provably optimal low-rank approximation* for any matrix — signals, data tables, covariance matrices — where there is no visual system to exploit.

Q8. Why does truncating the SVD remove noise from a low-rank signal?

Answer

Signal is coherent and concentrates in a few large singular values; random noise is incoherent and spreads thinly across all singular values. *Explanation:* truncating below the noise "bulk" keeps the signal layers and discards the noise — and Eckart-Young guarantees the truncation is the optimal rank-$k$ summary.

Q9. What is a scree plot, and how do you use it to choose the rank?

Answer

A plot of the singular values (or their squares) against their index. *Explanation:* for compressible data it drops steeply then flattens into a tail; the "elbow" where it flattens is a natural truncation point, because layers beyond it add almost no energy.

Q10. Eckart-Young guarantees the truncated SVD is optimal. Does that mean low-rank denoising always works well? Explain.

Answer

No. Eckart-Young says $A_k$ is the best *rank-$k$* approximation, but it does not say a low rank is appropriate. *Explanation:* if the clean signal is itself full-rank (no spectral gap), truncating discards real signal along with noise — you blur, not denoise. Always check the scree plot for a clear gap before truncating.

Q11. How is dimensionality reduction the same operation as image compression?

Answer

Both keep the top $k$ singular layers and discard the rest. *Explanation:* stacking data as a matrix, $A_k$ is the best rank-$k$ representation — projecting each point onto the top $k$ right singular vectors. The only difference from compression is interpretation: thin axes are "redundancy/noise" rather than "fine detail."

Q12. What single refinement turns raw SVD dimensionality reduction into PCA (Chapter 32)?

Answer

Centering the data — subtracting the mean of each feature before taking the SVD. *Explanation:* then the right singular vectors are the *principal components*, the squared singular values are the *variances*, and the energy fraction becomes the *fraction of variance explained*. PCA is the SVD of centered data.