Chapter 19 — Exercises
Work the ⭐ and ⭐⭐ problems by hand first; reach for numpy only to check. The ⭐⭐⭐ problems split into proofs (math track) and coding (computational track), and the ⭐⭐⭐⭐ problems are open-ended applications. Throughout, $P = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$ denotes the projection matrix onto $C(A)$, and "projection" always means orthogonal projection unless stated otherwise.
⭐ Conceptual (warm-ups)
19.1 In one sentence each, say what the projection $\mathbf{p}$, the error $\mathbf{e}$, and the coefficient vector $\hat{\mathbf{x}}$ are when you project $\mathbf{b}$ onto $C(A)$. Which of the three is a list of coefficients (a single scalar $\hat c$ when $A$ has one column), and which two live in the output space alongside $\mathbf{b}$?
19.2 True or false, with a one-line reason: (a) the error $\mathbf{e}$ is orthogonal to every vector in $C(A)$; (b) the projection of a vector already in the subspace is the vector itself; (c) the projection matrix $P$ is invertible; (d) every idempotent matrix is an orthogonal projection.
19.3 A projection matrix satisfies $P^2 = P$. Explain in plain English why projecting twice gives the same result as projecting once. What does this say geometrically about where $P\mathbf{b}$ lands?
19.4 What condition must the matrix $A$ satisfy for $(A^{\mathsf{T}}A)^{-1}$ to exist? Give a small concrete $3\times 2$ matrix $A$ for which it fails, and say what goes wrong with the projection coefficients (not the projection itself).
19.5 The two defining properties of an orthogonal projection matrix are idempotence and symmetry. Which one fails for an oblique projection, and what does its failure mean for the error vector?
19.6 Why are the only possible eigenvalues of an orthogonal projection matrix $0$ and $1$? Tie each eigenvalue to a geometric fate of a vector under the projection.
19.7 Explain the sentence "least squares is orthogonal projection" to someone who has read Chapter 17 but not this chapter. What subspace are we projecting onto, and what vector are we projecting?
⭐⭐ Computation (by hand, then check with numpy)
19.8 Project $\mathbf{b} = (5, 6)$ onto the line through $\mathbf{a} = (1, 0)$ (the $x$-axis). Find $\hat c$, $\mathbf{p}$, and $\mathbf{e}$, and verify $\mathbf{a}\cdot\mathbf{e} = 0$.
19.9 Project $\mathbf{b} = (3, 5)$ onto the line through $\mathbf{a} = (1, 1)$. Then find the projection matrix $P$ for this line, and confirm $P\mathbf{b}$ gives the same $\mathbf{p}$. What is $\operatorname{tr}(P)$, and why?
19.10 For the line through $\mathbf{a} = (1, 2, 2)$ in $\mathbb{R}^3$, write down the projection matrix $P = \mathbf{a}\mathbf{a}^{\mathsf{T}}/(\mathbf{a}^{\mathsf{T}}\mathbf{a})$ explicitly. Verify $P^2 = P$ on paper for at least one entry, and check $\operatorname{tr}(P) = 1$.
19.11 Project $\mathbf{b} = (1, 2, 3)$ onto the $xy$-plane, written as $C(A)$ with $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$. Find $\hat{\mathbf{x}}$, $\mathbf{p}$, and $\mathbf{e}$, and identify which fundamental subspace $\mathbf{e}$ lives in.
19.12 Let $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$ and $\mathbf{b} = (1, 1, 3)$. Solve the normal equations by hand to get $\hat{\mathbf{x}}$, then compute the residual $\mathbf{e}$ and the sum of squared errors $\lVert\mathbf{e}\rVert^2$. (This is fitting a line $y = c_0 + c_1 x$ to the points $(0,1),(1,1),(2,3)$.)
19.13 For the matrix $A$ of Exercise 19.12, compute the full projection matrix $P = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$. Confirm $P\mathbf{b}$ equals $A\hat{\mathbf{x}}$ and that $\operatorname{tr}(P) = 2$.
19.14 Project $\mathbf{b} = (1, 1, 1)$ onto the plane spanned by the orthonormal vectors $\mathbf{q}_1 = (1, 0, 0)$ and $\mathbf{q}_2 = (0, 1, 0)$ using the formula $\mathbf{p} = (\mathbf{q}_1\cdot\mathbf{b})\mathbf{q}_1 + (\mathbf{q}_2\cdot\mathbf{b})\mathbf{q}_2$. Why was no matrix inverse needed?
19.15 For the line through $\mathbf{a} = (1, 1)$, find the complementary projector $I - P$. Apply it to $\mathbf{b} = (3, 5)$ and confirm the result equals the error $\mathbf{e}$ from projecting $(3,5)$ onto the line. Onto what subspace does $I - P$ project?
19.16 Verify by hand that $A^{\mathsf{T}}A$ is symmetric for $A = \begin{bmatrix} 2 & 1 \\ 0 & 3 \\ 1 & 1 \end{bmatrix}$ by computing it and checking $(A^{\mathsf{T}}A)^{\mathsf{T}} = A^{\mathsf{T}}A$. Is it invertible? (Compute its determinant.)
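The hand computations in this section can be checked with a few lines of NumPy. What follows is a minimal sketch, not part of the book's toolkit: the helper name check_projection is hypothetical, and it assumes $A$ has independent columns so that the normal equations have a unique solution.

```python
import numpy as np


def check_projection(A, b):
    """Return (x_hat, p, e) for the orthogonal projection of b onto C(A).

    Hypothetical helper for checking hand work; assumes A has
    independent columns, so A^T A is invertible.
    """
    A = np.asarray(A, dtype=float)
    if A.ndim == 1:
        # A single vector a is treated as a one-column matrix (a line).
        A = A.reshape(-1, 1)
    b = np.asarray(b, dtype=float)
    # Solve the normal equations A^T A x = A^T b (no explicit inverse needed).
    x_hat = np.linalg.solve(A.T @ A, A.T @ b)
    p = A @ x_hat  # projection, lives in C(A)
    e = b - p      # error, should be orthogonal to every column of A
    return x_hat, p, e


if __name__ == "__main__":
    # Data from Exercise 19.12; compare the printout with your hand answer.
    A = np.array([[1, 0], [1, 1], [1, 2]], dtype=float)
    b = np.array([1, 1, 3], dtype=float)
    x_hat, p, e = check_projection(A, b)
    print("x_hat   =", x_hat)
    print("p       =", p)
    print("e       =", e)
    print("A^T e   =", A.T @ e)  # should be zero up to rounding
    print("||e||^2 =", e @ e)
```

The sketch solves the normal equations with np.linalg.solve instead of forming $(A^{\mathsf{T}}A)^{-1}$, which gives the same $\hat{\mathbf{x}}$ with less rounding error. Passing a single vector such as (1, 1) for A handles the line-projection exercises.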
⭐⭐⭐ Proof (math track)
19.17 (Proof) Prove that if $A$ has full column rank, then $N(A^{\mathsf{T}}A) = N(A)$, and hence $A^{\mathsf{T}}A$ is invertible. (Hint: for the hard direction, multiply $A^{\mathsf{T}}A\mathbf{x} = \mathbf{0}$ on the left by $\mathbf{x}^{\mathsf{T}}$ and recognize $\lVert A\mathbf{x}\rVert^2$.)
19.18 (Proof) Prove directly from $P = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$ that $P^{\mathsf{T}} = P$. State every transpose rule you use and where the symmetry of $A^{\mathsf{T}}A$ enters.
19.19 (Proof) Prove the closest-point theorem: if $\mathbf{p}$ is the orthogonal projection of $\mathbf{b}$ onto a subspace $S$, then $\lVert\mathbf{b} - \mathbf{y}\rVert > \lVert\mathbf{b} - \mathbf{p}\rVert$ for every $\mathbf{y}\in S$ with $\mathbf{y}\ne\mathbf{p}$. Identify exactly where you use the Pythagorean theorem and where you use that $S$ is a subspace.
19.20 (Proof) Show that if $P$ is an orthogonal projection ($P^2 = P$, $P^{\mathsf{T}} = P$), then $I - P$ is also an orthogonal projection, and that $P(I - P) = 0$. Interpret $P(I-P) = 0$ geometrically.
19.21 (Proof) Prove that an orthogonal projection $P$ (other than the identity) is singular. Then prove its eigenvalues are all $0$ or $1$. (Hint for the eigenvalue part: if $P\mathbf{v} = \lambda\mathbf{v}$ with $\mathbf{v}\ne\mathbf{0}$, apply $P$ again and use $P^2 = P$ to get $\lambda^2 = \lambda$.)
19.22 (Proof) Let $\mathbf{q}_1, \dots, \mathbf{q}_n$ be orthonormal and $Q$ the matrix with these columns. Prove that $Q^{\mathsf{T}}Q = I_n$ and hence that the projection matrix onto $C(Q)$ is $P = QQ^{\mathsf{T}}$. Then show $\mathbf{p} = QQ^{\mathsf{T}}\mathbf{b}$ expands to $\sum_i (\mathbf{q}_i\cdot\mathbf{b})\mathbf{q}_i$. (Note that $QQ^{\mathsf{T}} \ne I_m$ in general when $n < m$ — explain why not.)
⭐⭐⭐ Coding (computational track)
19.23 (Code) Finish the toolkit/projection.py functions project_onto(b, A) and projection_matrix(A) (pure Python, reusing matmul/transpose and your gaussian_elimination). Write a test that checks, for $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$: (a) project_onto([6,0,0], A) returns [5, 2, -1]; (b) your P matches A @ np.linalg.inv(A.T @ A) @ A.T; (c) P @ P equals P; (d) P.T equals P. Because the entries are floating-point numbers, make every comparison up to a small tolerance (for example with np.allclose) rather than with ==. Then feed it a rank-deficient $A$ (two equal columns) and confirm your solver raises an exception rather than returning nonsense.
19.24 (Code) Write a function distance_to_subspace(b, A) that returns the distance from b to $C(A)$ — i.e. $\lVert\mathbf{b} - P\mathbf{b}\rVert$. Verify the closest-point theorem empirically: generate 1000 random points $\mathbf{y} = A\mathbf{x}$ in the subspace and confirm none is closer to b than the projection. Report the smallest random distance found and compare it to the true minimum.
19.25 (Code) Using toolkit/visualizer.py from Chapter 1, plot the action of the projection matrix onto the line through $\mathbf{a} = (3, 4)$. Confirm visually that the unit square collapses onto the line, print $\det(P)$ (should be $0$) and the eigenvalues (should be $\{0, 1\}$), and overlay the projection of $\mathbf{b} = (1, 4)$ to show the perpendicular drop.
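Checks like (b) through (d) in Exercise 19.23 are easier to write against a reference implementation with tolerance-based comparisons. The following is one possible sketch, assuming NumPy is available for testing; the names reference_projection_matrix and is_orthogonal_projector are hypothetical and are meant to sit beside your pure-Python functions, not replace them.

```python
import numpy as np


def reference_projection_matrix(A):
    """NumPy reference for P = A (A^T A)^{-1} A^T (hypothetical helper).

    Raises ValueError when the columns of A are dependent, since the
    formula is undefined in that case.
    """
    A = np.asarray(A, dtype=float)
    if np.linalg.matrix_rank(A) < A.shape[1]:
        raise ValueError("columns of A are linearly dependent")
    # solve(A^T A, A^T) computes (A^T A)^{-1} A^T without forming the inverse.
    return A @ np.linalg.solve(A.T @ A, A.T)


def is_orthogonal_projector(P, tol=1e-10):
    """Check idempotence (P P = P) and symmetry (P^T = P) up to a tolerance."""
    P = np.asarray(P, dtype=float)
    return np.allclose(P @ P, P, atol=tol) and np.allclose(P.T, P, atol=tol)


if __name__ == "__main__":
    A = np.array([[1, 0], [1, 1], [1, 2]], dtype=float)
    P = reference_projection_matrix(A)
    print("orthogonal projector?", is_orthogonal_projector(P))
    print("P @ (6, 0, 0) =", P @ np.array([6.0, 0.0, 0.0]))

    # Two equal columns: the reference refuses instead of returning nonsense.
    try:
        reference_projection_matrix([[1, 1], [1, 1], [1, 1]])
    except ValueError as err:
        print("rank-deficient input rejected:", err)
```

The rank check uses np.linalg.matrix_rank, which decides rank from singular values with a default tolerance; a pure-Python solver would more likely detect the problem as a missing pivot during elimination.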
⭐⭐⭐⭐ Application (open-ended)
19.26 (Application — data science) You measure a sensor's raw output $r$ at known reference values and want a calibration line $y = c_0 + c_1 r$. Given the data $r = (1,2,3,4,5)$ and references $y = (1.2, 1.9, 3.2, 3.9, 5.3)$, build the design matrix, solve the least-squares problem two ways (normal equations and np.linalg.lstsq), and report the fitted coefficients, the residual vector, and the coefficient of determination $R^2$. Then interpret: is the sensor reading high or low, and roughly by how much per unit? (Cross-check your numbers against Case Study 1.)
19.27 (Application — signals) A measured signal is contaminated by a known constant (DC) offset and a known low-frequency hum. Model the contaminants as the columns of a matrix $A$, build $P$, and use $I - P$ to remove them. Demonstrate on a synthetic signal: add a constant and a sinusoid of one frequency to a "clean" sinusoid of a different frequency, then show that $(I-P)\mathbf{y}$ recovers the clean component to near machine precision. Explain why the recovery is exact when the clean component is orthogonal to the contaminant subspace, and only approximate otherwise.
19.28 (Application — your choice) Pick any domain (economics factor models, recommender systems, computer graphics, geodesy) and write a half-page explaining one real use of orthogonal projection there. Identify the data vector $\mathbf{b}$, the subspace $C(A)$, what the projection $\mathbf{p}$ represents, and what the error $\mathbf{e}$ represents. End by stating what would break if the columns of $A$ were not independent.
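As a starting point for Exercise 19.27, the contaminant-removal step can be sketched as below. Everything specific here is an assumption made for illustration: the helper names remove_contaminants and make_example, the sample count, the frequencies, and the amplitudes 0.7 and 0.3. The hum is taken to have known frequency and phase, so it contributes a single column to $A$.

```python
import numpy as np


def remove_contaminants(y, A):
    """Apply I - P to y, where P projects orthogonally onto C(A).

    Hypothetical helper; assumes the columns of A are independent.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    P = A @ np.linalg.solve(A.T @ A, A.T)  # P = A (A^T A)^{-1} A^T
    return (np.eye(len(y)) - P) @ y


def make_example(n_samples=256, hum_freq=1, clean_freq=5):
    """Build a synthetic (A, clean, y) triple; all parameter values are assumed."""
    # Uniform samples over one unit-length window.
    t = np.arange(n_samples) / n_samples
    # Contaminant subspace: a constant (DC) column and the hum.
    A = np.column_stack([np.ones(n_samples), np.sin(2 * np.pi * hum_freq * t)])
    clean = np.sin(2 * np.pi * clean_freq * t)
    # Measured signal: clean part plus arbitrary amounts of each contaminant.
    y = clean + 0.7 * A[:, 0] + 0.3 * A[:, 1]
    return A, clean, y


if __name__ == "__main__":
    A, clean, y = make_example()
    recovered = remove_contaminants(y, A)
    print("max recovery error:", np.max(np.abs(recovered - clean)))
    print("A^T clean =", A.T @ clean)  # near zero when clean is orthogonal to C(A)
```

With whole numbers of cycles per window, the sampled sinusoids are orthogonal to each other and to the constant column, so the recovery error is at rounding level. Calling make_example(clean_freq=5.5) breaks that orthogonality and leaves a visible leftover, which is the contrast the exercise asks you to explain.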