Chapter 32 Quiz — Principal Component Analysis
Twelve conceptual questions, plus a bonus. Try each before opening the answer. A one-line explanation follows each answer.
Q1. In one sentence, what is the geometric meaning of the first principal component?
Answer
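The maximization claim in this answer can be checked numerically. The following is a minimal NumPy sketch on an assumed synthetic 2D cloud (the data and variable names are illustrative, not from the chapter): it compares the projected variance along PC1 with the projected variance along many random unit directions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed synthetic data: an elongated 2D cloud, for illustration only.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (len(Xc) - 1)

eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order
w1 = eigvecs[:, -1]                    # PC1: eigenvector of the largest eigenvalue
var_pc1 = w1 @ C @ w1                  # projected variance w^T C w along PC1

# Projected variance along 1000 random unit directions.
angles = rng.uniform(0, np.pi, size=1000)
W = np.column_stack([np.cos(angles), np.sin(angles)])
var_random = np.einsum("ij,jk,ik->i", W, C, W)

print(var_pc1, var_random.max())       # no random direction beats PC1
```

No random direction should exceed `var_pc1`, which itself equals the largest eigenvalue of `C`.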
The first principal component is the direction in which the data spreads out the most: the long axis of the data ellipse, the unit direction $\mathbf{w}$ maximizing the projected variance $\mathbf{w}^{\mathsf{T}}C\mathbf{w}$. *Geometrically, PC1 is the grain of the cloud.*

Q2. Why is centering the data the essential first step of PCA?
Answer
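The effect of skipping the centering step can be seen on a small example. This is a sketch under assumptions: the data are synthetic, with the long axis deliberately placed along $y$ and the mean far out along $x$, and `top_direction` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed synthetic data: spread mostly along y, mean located far along x.
X = rng.normal(size=(1000, 2)) * np.array([0.5, 2.0]) + np.array([50.0, 0.0])

def top_direction(A):
    """Leading right singular vector of A (unit length, sign arbitrary)."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[0]

pc1_raw = top_direction(X)                        # centering skipped
pc1_centered = top_direction(X - X.mean(axis=0))  # centered, as PCA requires

mean_dir = X.mean(axis=0) / np.linalg.norm(X.mean(axis=0))
long_axis = np.array([0.0, 1.0])

print(abs(pc1_raw @ mean_dir))        # near 1: follows the location of the cloud
print(abs(pc1_centered @ long_axis))  # near 1: follows the shape of the cloud
```

Absolute values are used because the sign of a singular vector is arbitrary.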
Variance is spread *about the mean*, and the covariance formula $C = \tfrac{1}{n-1}\tilde X^{\mathsf{T}}\tilde X$ measures that spread only when every column of $\tilde X$ has mean zero. *Skip centering and PC1 is pulled toward the cloud's center of mass, an artifact of location, not shape.*

Q3. What are the two structural properties of the covariance matrix, and what does each one buy you?
Answer
It is **symmetric** (which unlocks the spectral theorem of Chapter 27: real eigenvalues, perpendicular eigenvectors) and **positive semidefinite** (which guarantees that the eigenvalues, which are the variances, are non-negative; Chapter 28). *Symmetry gives the perpendicular axes; positive semidefiniteness gives meaningful non-negative variances.*

Q4. The eigenvalues of a covariance matrix are $6, 3, 1$. What is the explained-variance ratio of the first principal component?
Answer
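The arithmetic for this question fits in a few lines of NumPy. A minimal sketch, with the eigenvalues from the question hard-coded and illustrative variable names:

```python
import numpy as np

eigenvalues = np.array([6.0, 3.0, 1.0])   # values given in the question

# Each component's share of the total variance.
explained_ratio = eigenvalues / eigenvalues.sum()
# Running total: variance captured by the first k components together.
cumulative = np.cumsum(explained_ratio)

print(explained_ratio)   # approximately [0.6, 0.3, 0.1]
print(cumulative)        # approximately [0.6, 0.9, 1.0]
```

The cumulative ratio is what is usually inspected when choosing how many components to keep.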
The total variance is $6 + 3 + 1 = 10$, so PC1 explains $6/10 = 60\%$. *The eigenvalue divided by the sum of all eigenvalues is the fraction of variance that component captures.*

Q5. Why are the principal components always mutually perpendicular?
Answer
They are eigenvectors of the *symmetric* covariance matrix, and the spectral theorem (Chapter 27, §27.5) guarantees a symmetric matrix has an orthonormal eigenbasis. *PCA's orthogonality is the spectral theorem's orthogonality, inherited directly.*

Q6. State the two equivalent routes to the principal components and say which is numerically preferred.
Answer
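Both routes can be run side by side to confirm they agree. The sketch below uses assumed synthetic data and illustrative names; it checks that the variances match via $\lambda_i = \sigma_i^2/(n-1)$ and that the directions match up to sign.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # assumed synthetic data
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: eigendecomposition of the covariance matrix.
C = Xc.T @ Xc / (n - 1)
lam, V_eig = np.linalg.eigh(C)
lam, V_eig = lam[::-1], V_eig[:, ::-1]   # reorder to descending variance

# Route 2: SVD of the centered data, never forming C.
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
lam_svd = s**2 / (n - 1)

same_values = np.allclose(lam, lam_svd)
# |cosine| between matching components; 1 means same direction up to sign.
alignment = np.abs(np.sum(V_eig * Vt.T, axis=0))

print(same_values, alignment)
```

`np.linalg.eigh` returns eigenvalues in ascending order while `np.linalg.svd` returns singular values in descending order, hence the reversal in Route 1.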
Route 1: eigenvectors of the covariance matrix $C = \tfrac{1}{n-1}\tilde X^{\mathsf{T}}\tilde X$ (spectral theorem). Route 2: right singular vectors of the centered data matrix $\tilde X = U\Sigma V^{\mathsf{T}}$ (SVD), with eigenvalues $\lambda_i = \sigma_i^2/(n-1)$. **The SVD route is preferred** because it never forms the covariance matrix, avoiding the product $\tilde X^{\mathsf{T}}\tilde X$ that squares the condition number (Chapters 20, 38).

Q7. Why is forming the covariance matrix a bad idea numerically, even though it gives the "right" answer in exact arithmetic?
Answer
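The squaring of the condition number is easy to observe directly. In this sketch the matrix `A` stands in for a centered data matrix and is built with prescribed singular values $1, 10^{-2}, 10^{-4}$; that construction is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
# Assumed construction: 100 x 3 matrix with known singular values.
U, _ = np.linalg.qr(rng.normal(size=(100, 3)))   # orthonormal columns
V, _ = np.linalg.qr(rng.normal(size=(3, 3)))     # orthogonal matrix
s_true = np.array([1.0, 1e-2, 1e-4])
A = U @ np.diag(s_true) @ V.T

cond_A = np.linalg.cond(A)            # ratio of extreme singular values of A
cond_gram = np.linalg.cond(A.T @ A)   # same ratio for the Gram matrix A^T A

print(cond_A, cond_gram)              # about 1e4 and about 1e8
```

In double precision both numbers are still computed accurately here; the loss becomes visible when the squared condition number approaches the reciprocal of machine epsilon (roughly $10^{16}$ for float64, $10^{7}$ for float32).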
Computing $\tilde X^{\mathsf{T}}\tilde X$ *squares the condition number* (Chapter 38): a data matrix whose singular values span a factor of $10^4$ produces a covariance whose eigenvalues span a factor of $10^8$, and in finite precision the small eigenvalues (the subtle components) can be swamped by rounding error. *The SVD of $\tilde X$ never squares the data, so it avoids that loss of precision.*

Q8. True or false: PCA can recover the structure of data that lies on a curved 2D sheet rolled up in 3D (a "Swiss roll").
Answer
**False.** PCA finds *flat* subspaces (lines and planes through the data mean, which is the origin after centering); it cannot unroll a curved manifold and will fit a poor flat approximation. *Nonlinear methods (kernel PCA, t-SNE, UMAP, autoencoders) exist for curved structure.*

Q9. You have features "age in years" and "income in dollars." Why might PCA on the raw data be misleading, and what is the fix?
Answer
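The scale effect shows up clearly on made-up numbers. This is a minimal sketch: the age and income distributions are invented for illustration, and `pc1` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(4)
# Assumed synthetic features: age in years, income in dollars, positively correlated.
age = rng.normal(40, 12, size=1000)
income = 50_000 + 800 * (age - 40) + rng.normal(0, 15_000, size=1000)
X = np.column_stack([age, income])

def pc1(A):
    """First principal direction of the rows of A (centers A internally)."""
    Ac = A - A.mean(axis=0)
    _, _, Vt = np.linalg.svd(Ac, full_matrices=False)
    return Vt[0]

raw = pc1(X)                                       # PCA on raw units
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize: variance 1 per feature
std = pc1(Z)                                       # PCA on the correlation structure

print(np.abs(raw))   # almost entirely the income axis
print(np.abs(std))   # equal weight on both features, about 0.707 each
```

With exactly two standardized features, PC1 always has equal absolute weights; with more features the weights depend on the correlations.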
Income has vastly larger numerical variance (a dollar is a small unit, so the numbers are large), so PC1 chases income purely because of its scale, not its importance. **The fix is to standardize**: subtract the mean and divide by the standard deviation so every feature has variance 1 (equivalently, run PCA on the correlation matrix). *PCA's "importance" is variance, and variance has units.*

Q10. What does the variance of the data projected onto the $k$-th principal component equal?
Answer
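The identity can be verified by projecting data onto each component and measuring the sample variance of the result. A minimal sketch on assumed synthetic data, using the $\tfrac{1}{n-1}$ convention throughout:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3))  # assumed synthetic data
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)        # covariance with the 1/(n-1) convention

lam, W = np.linalg.eigh(C)
lam, W = lam[::-1], W[:, ::-1]      # reorder to descending variance

scores = Xc @ W                     # column k: data projected onto component k
score_var = scores.var(axis=0, ddof=1)

print(lam)
print(score_var)                    # matches lam entry by entry
```

The `ddof=1` argument matters: the default `ddof=0` would give variances smaller by a factor of $(n-1)/n$.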
The variance equals exactly the $k$-th eigenvalue $\lambda_k$ of the covariance matrix. *This is the precise content of "the eigenvalue is the variance along its component," proved by the weighted-average argument of §32.4.*

Q11. When you reduce data to its top $k$ components and reconstruct, what is the squared reconstruction error?
Answer
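The reconstruction-error formula can be confirmed by projecting onto the top $k$ components, mapping back, and summing the squared residuals over all data points. A sketch on assumed synthetic data with $k = 2$ (both choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(250, 5)) @ rng.normal(size=(5, 5))  # assumed synthetic data
Xc = X - X.mean(axis=0)
n, k = Xc.shape[0], 2

_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
lam = s**2 / (n - 1)                 # eigenvalues of the covariance matrix

Vk = Vt[:k].T                        # top-k components as columns
X_hat = Xc @ Vk @ Vk.T               # project onto the subspace, then reconstruct

sq_error = np.sum((Xc - X_hat) ** 2)         # total squared error over all points
predicted = (n - 1) * lam[k:].sum()          # (n-1) * sum of discarded eigenvalues

print(sq_error, predicted)
```

Dividing `sq_error` by $n-1$ gives the sum of the discarded eigenvalues directly.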
Summed over all $n$ data points, it is $(n-1)$ times the sum of the *discarded* eigenvalues, $(n-1)\sum_{i>k}\lambda_i$: the variance in the directions you threw away. *Keeping the high-variance components minimizes reconstruction error; this is the Pearson "best-fit subspace" view of PCA.*

Q12. PCA rests on which two theorems from earlier in the book, and what does each contribute?
Answer
The **spectral theorem** (Chapter 27): the covariance matrix is symmetric, so its eigenvectors (the components) are real and perpendicular; this gives PCA its *meaning*. And the **SVD** (Chapter 30): the components are the right singular vectors of the centered data; this gives PCA its *computation*. *PCA is these two theorems applied to a covariance matrix.*

Q13. (Bonus) What is the difference between scores and loadings?