Chapter 32 Exercises — Principal Component Analysis
Work the tiers in order; each builds on the last. ⭐ checks understanding, ⭐⭐ asks for hand computation, ⭐⭐⭐ asks for a proof (A) or code (C), and ⭐⭐⭐⭐ is an open-ended application. Problems marked [code] want a short numpy snippet (use np.linalg.eigh for the covariance matrix or np.linalg.svd for the centered data); [proof] wants a rigorous argument in the §10 four-part style. Throughout, $X$ is a data matrix with samples in rows and features in columns, $\tilde X$ is its centered version, and $C = \tfrac{1}{n-1}\tilde X^{\mathsf{T}}\tilde X$ is the covariance matrix. Always center before computing covariance, and state when you invoke the spectral theorem (it needs symmetry).
Tier 1 — ⭐ Conceptual
32.1. State in one sentence what PCA does geometrically, using the words "variance," "perpendicular," and "ellipse" (or "ellipsoid").
32.2. Why must you center the data before computing the covariance matrix? Describe in one sentence what goes wrong if you skip centering.
32.3. The covariance matrix $C$ has two key properties that make PCA work. Name them and say which earlier theorem each one unlocks.
32.4. What does the eigenvalue $\lambda_k$ of the covariance matrix represent, physically, about the data? What does the eigenvector $\mathbf{q}_k$ represent?
32.5. Give the two equivalent routes to the principal components — one through Chapter 27 and one through Chapter 30 — in a sentence each. Which is preferred for computation, and why?
32.6. Why are the principal components always mutually perpendicular? Name the chapter and theorem responsible.
32.7. Define the explained-variance ratio of the $k$-th component. What do these ratios sum to over all components, and why?
32.8. True or false, each with a one-line reason: (a) PCA can find structure in data that lies on a curved surface; (b) PCA is unaffected by rescaling one feature from meters to millimeters; (c) the first principal component is the direction of least variance; (d) the variance of the data projected onto a principal component equals that component's eigenvalue.
32.9. What is the difference between a score and a loading in PCA? Which describes a sample and which describes a feature?
32.10. When you reduce data to $k$ components and reconstruct it, what does the reconstruction error equal in terms of the eigenvalues you dropped?
Tier 2 — ⭐⭐ Hand computation
32.11. A dataset has three centered samples in $\mathbb{R}^2$: $(2, 0)$, $(-1, 1)$, $(-1, -1)$. Confirm the columns sum to zero, then compute the covariance matrix $C = \tfrac{1}{n-1}\tilde X^{\mathsf{T}}\tilde X$ (use $n - 1 = 2$).
32.12. For the covariance matrix $C = \begin{psmallmatrix}5 & 0\\ 0 & 2\end{psmallmatrix}$, write down the principal components and their variances by inspection (no computation needed — explain why). What is the explained-variance ratio of each?
32.13. Orthogonally diagonalize the covariance matrix $C = \begin{psmallmatrix}3 & 1\\ 1 & 3\end{psmallmatrix}$ by hand: find the eigenvalues, the orthonormal eigenvectors (the principal components), and verify they are perpendicular. Which direction is PC1?
32.14. Using your answer to 32.13, compute the explained-variance ratio of each component, and state how much variance you would retain by keeping only PC1.
32.15. Given the raw data $X = \begin{psmallmatrix}4 & 1\\ 2 & 3\\ 0 & 5\end{psmallmatrix}$, center it (compute the column means and subtract), then compute the covariance matrix.
32.16. A covariance matrix has eigenvalues $\lambda_1 = 9, \lambda_2 = 4, \lambda_3 = 1, \lambda_4 = 0.5$. (a) What is the total variance? (b) What fraction does PC1 explain? (c) How many components do you need to explain at least $90\%$ of the variance?
32.17. For the data in §32.5 (covariance $C = \begin{psmallmatrix}2.5 & 1.5\\ 1.5 & 2.5\end{psmallmatrix}$, components $\tfrac{1}{\sqrt2}(1,1)$ and $\tfrac{1}{\sqrt2}(1,-1)$), compute the score of the centered point $(2, 2)$ on each principal component by hand. Which component does it load on?
32.18. A centered data matrix $\tilde X$ has SVD with singular values $\sigma_1 = 6, \sigma_2 = 3$ and $n = 10$ samples. What are the two principal-component variances (eigenvalues of the covariance matrix)?
32.19. Compute the variance of the data along the direction $\mathbf{w} = \tfrac{1}{\sqrt5}(1, 2)$ for the covariance matrix $C = \begin{psmallmatrix}4 & 1\\ 1 & 2\end{psmallmatrix}$, using the quadratic form $\mathbf{w}^{\mathsf{T}}C\mathbf{w}$.
32.20. A quadratic-form fact you will use constantly: show by hand that for $C = \begin{psmallmatrix}2 & 1\\ 1 & 2\end{psmallmatrix}$, the variance $\mathbf{w}^{\mathsf{T}}C\mathbf{w}$ along the unit vector $\mathbf{w} = (\cos\theta, \sin\theta)$ equals $2 + \sin(2\theta)$, and confirm it is maximized at $\theta = 45°$.
Tier 3 — ⭐⭐⭐ Proof (A) / Coding (C)
32.21. [proof] Prove that the covariance matrix $C = \tfrac{1}{n-1}\tilde X^{\mathsf{T}}\tilde X$ is symmetric and positive semidefinite, and explain in one sentence each why each property is needed for PCA. Use the four-part proof shape (§10).
32.22. [proof] Prove that the maximum of the variance $\mathbf{w}^{\mathsf{T}}C\mathbf{w}$ over all unit vectors $\mathbf{w}$ equals the largest eigenvalue $\lambda_1$ of $C$, attained at the top eigenvector. (This is the variance-maximization theorem of §32.4 — reproduce the Rayleigh-quotient argument, stating where you use symmetry.)
32.23. [proof] Prove that the right singular vectors of the centered data matrix $\tilde X = U\Sigma V^{\mathsf{T}}$ are the eigenvectors of the covariance matrix $C$, and that the eigenvalues are $\lambda_i = \sigma_i^2 / (n-1)$. (Show $C = V\bigl(\tfrac{1}{n-1}\Sigma^2\bigr)V^{\mathsf{T}}$ and invoke uniqueness of the spectral decomposition.)
32.24. [proof] Prove that the variance of the data projected onto a unit principal component $\mathbf{q}_j$ equals its eigenvalue $\lambda_j$. (Hint: the projected values are $\tilde X\mathbf{q}_j$; compute their variance using $C\mathbf{q}_j = \lambda_j\mathbf{q}_j$.)
32.25. [code] Write a function covariance(X) that centers $X$ and returns $\tfrac{1}{n-1}\tilde X^{\mathsf{T}}\tilde X$. Verify it matches np.cov(X, rowvar=False) on a random $20\times4$ matrix.
32.26. [code] Implement PCA two ways on a random $50\times3$ dataset: (a) eigh of the covariance matrix, sorted descending; (b) np.linalg.svd of the centered data. Confirm the explained-variance ratios match to machine precision, and that the components agree up to sign.
32.27. [code] For a random $100\times5$ dataset, compute the explained-variance ratios and the cumulative explained variance. Print the smallest $k$ such that the top $k$ components explain at least $95\%$ of the variance.
32.28. [code] Demonstrate the centering pitfall: run "PCA" on a dataset without centering (use $X^{\mathsf{T}}X$ directly) and with centering, for data whose mean is far from the origin (e.g. add $(100, 100)$ to every point). Show that the uncentered top component points toward the mean while the centered one captures the true spread direction.
32.29. [code] Demonstrate the scaling pitfall: build a 2-feature dataset where feature 2 is feature 1 plus noise but multiplied by 1000. Run PCA on the raw data and on standardized data (each feature scaled to unit variance), and show how the top component changes.
Tier 4 — ⭐⭐⭐⭐ Application
32.30. [code] Reconstruction error. For a random $40\times6$ dataset, project onto the top $k$ components and reconstruct, for each $k = 1, \dots, 6$. Plot the squared reconstruction error against $k$, and verify it equals $(n-1)$ times the sum of the discarded eigenvalues at each $k$.
32.31. [code] Eigen-images. Generate 200 synthetic "images" of $10\times10$ pixels, each a random blend of 4 fixed prototype patterns plus small noise (as in §32.7.1). Run PCA, plot the scree of explained-variance ratios, and confirm the elbow falls at $k = 4$. Reshape the top 4 components back into $10\times10$ "eigen-images" and display them.
32.32. [code] Whitening. Take a correlated 2D dataset, compute its PCA, and produce the whitened data $\tilde X V\Lambda^{-1/2}$. Verify the whitened data has covariance equal to the identity (to tolerance), and scatter-plot the data before and after to show the tilted ellipse becoming a round cloud.
32.33. [code, toolkit] Implement pca(X, k) in toolkit/pca.py per the Build Your Toolkit callout: center, compute components via both the covariance route and the SVD route (and assert they match up to sign), sort descending, and return the top-$k$ components, their variances, the explained-variance ratio, and the projected scores. Verify your components against np.linalg.svd of the centered data and your explained-variance ratios against sklearn.decomposition.PCA(n_components=k).explained_variance_ratio_.
32.34. [application] Choosing $k$ honestly. You run PCA on a 50-feature survey dataset and find the cumulative explained variance reaches $80\%$ at $k=3$, $90\%$ at $k=8$, and $95\%$ at $k=20$. Write a short paragraph advising a colleague on how to choose $k$, naming at least two considerations beyond the raw threshold (e.g. interpretability, downstream task, the scree elbow).
32.35. [application] When not to use PCA. Describe a concrete dataset on which PCA would give a misleading or useless answer, explain why (linear assumption, scaling, or interpretability), and name an alternative technique that addresses the problem.