Chapter 19 — Quiz
Twelve conceptual checks on orthogonal projection. Answer before expanding each solution. These are understanding questions, not computation drills — if you can explain the why, the formulas follow.
Q1. When you project $\mathbf{b}$ orthogonally onto a subspace $S$, what is the defining geometric property of the error vector $\mathbf{e} = \mathbf{b} - \mathbf{p}$?
Answer
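A quick numerical check of the orthogonality property, as a minimal sketch: the matrix `A`, the vector `b`, and the coefficient vector `c` are illustrative choices rather than values from the chapter, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical 3x2 matrix with independent columns; S = C(A) is a plane in R^3.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Solve the normal equations A^T A x = A^T b, then form p and e = b - p.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ x_hat
e = b - p

# e is perpendicular to each column of A ...
print(A.T @ e)          # both entries are zero up to rounding

# ... and therefore to any combination A @ c of the columns.
c = np.array([2.0, -7.0])
print(e @ (A @ c))      # zero up to rounding
```

Here $\mathbf{p} = (5, 2, -1)$ and $\mathbf{e} = (1, -2, 1)$, which can be confirmed by hand.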
The error $\mathbf{e}$ is **orthogonal to the subspace** $S$: perpendicular to *every* vector in it, not just to some of them. Algebraically, if $S = C(A)$, this is the condition $A^{\mathsf{T}}\mathbf{e} = \mathbf{0}$, which is equivalent to the normal equations $A^{\mathsf{T}}A\hat{\mathbf{x}} = A^{\mathsf{T}}\mathbf{b}$ when $\mathbf{p} = A\hat{\mathbf{x}}$. This single property is what defines the orthogonal projection and what forces $\mathbf{p}$ to be the closest point.
Q2. Why is the formula $\mathbf{p} = (\mathbf{a}\cdot\mathbf{b})\,\mathbf{a}$ wrong for projecting $\mathbf{b}$ onto the line through $\mathbf{a}$, and when does it accidentally work?
Answer
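The effect of the missing denominator is easy to see on a small example. This is a sketch with made-up vectors `a` and `b`, assuming NumPy; the names `p_wrong`, `p_right`, and `p_unit` are hypothetical.

```python
import numpy as np

a = np.array([2.0, 0.0])   # not a unit vector: a . a = 4
b = np.array([3.0, 4.0])

p_wrong = (a @ b) * a                # formula without the normalization
p_right = (a @ b) / (a @ a) * a      # projection onto the line through a

# With a unit vector along the same line, the short formula is correct.
u = a / np.linalg.norm(a)
p_unit = (u @ b) * u

print(p_right)   # the point (3, 0)
print(p_wrong)   # the point (12, 0), larger by the factor a . a = 4
print(p_unit)    # the point (3, 0) again
```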
It is missing the normalization: the correct formula is $\mathbf{p} = \dfrac{\mathbf{a}\cdot\mathbf{b}}{\mathbf{a}\cdot\mathbf{a}}\,\mathbf{a}$. It works only when $\mathbf{a}\cdot\mathbf{a} = 1$, i.e. when $\mathbf{a}$ is a **unit vector**. The denominator $\mathbf{a}\cdot\mathbf{a} = \lVert\mathbf{a}\rVert^2$ compensates for the length of $\mathbf{a}$; forgetting it scales the result by a factor of $\lVert\mathbf{a}\rVert^2$ (an overshoot when $\lVert\mathbf{a}\rVert > 1$, an undershoot when $\lVert\mathbf{a}\rVert < 1$).
Q3. The projection matrix satisfies $P^2 = P$. State this property's name and explain why it holds geometrically.
Answer
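For a projection onto $C(A)$ written as $P = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$ (an assumption here: $A$ has independent columns, so the inverse exists), the identity can also be verified by direct multiplication, since the middle factors cancel:

$$
P^2 = A(A^{\mathsf{T}}A)^{-1}\underbrace{A^{\mathsf{T}}A\,(A^{\mathsf{T}}A)^{-1}}_{I}A^{\mathsf{T}} = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}} = P.
$$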
It is **idempotence**. Geometrically: projecting $\mathbf{b}$ lands it inside the subspace; projecting an already-in-subspace vector leaves it fixed (it is its own closest point). So projecting a second time does nothing, and $P(P\mathbf{b}) = P\mathbf{b}$ for all $\mathbf{b}$, i.e. $P^2 = P$.
Q4. Both $M = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$ and an orthogonal projection $P$ satisfy $X^2 = X$. What distinguishes the orthogonal projection, and why does the distinction matter?
Answer
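The sketch below compares $M$ from the question with an orthogonal projection onto the same line. The choice of `P` (projection onto the x-axis, which is the column space of $M$) and the test vector `b` are illustrative assumptions, and NumPy is assumed.

```python
import numpy as np

M = np.array([[1.0, 1.0],
              [0.0, 0.0]])   # idempotent but not symmetric
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])   # symmetric: orthogonal projection onto the x-axis

b = np.array([0.0, 1.0])
axis = np.array([1.0, 0.0])  # direction of the target line

for name, X in (("M", M), ("P", P)):
    image = X @ b
    error = b - image
    print(name,
          "| idempotent:", np.allclose(X @ X, X),
          "| symmetric:", np.allclose(X, X.T),
          "| error . axis:", error @ axis,
          "| distance:", np.linalg.norm(error))
```

Both matrices pass the idempotence check, but only `P` leaves an error perpendicular to the line. `M` sends `b` to $(1, 0)$, at distance $\sqrt{2}$, while `P` sends it to the origin, at distance $1$.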
The orthogonal projection is **symmetric** ($P^{\mathsf{T}} = P$); the matrix $M$ is not. $M$ is an *oblique* projection: it still flattens onto a line, but along a slanted direction, so its error is *not* perpendicular to the target subspace and its result is *not* the closest point. For an idempotent matrix, symmetry is exactly the condition that makes the projection orthogonal (drop a true perpendicular) and hence distance-minimizing.
Q5. Why is an orthogonal projection matrix $P$ (other than $I$) never invertible?
Answer
A projection throws information away: the entire orthogonal complement of the subspace is collapsed to $\mathbf{0}$, so distinct vectors can share a projection and $P$ cannot be undone. Algebraically, $\det(P) = 0$ because $P$ has eigenvalue $0$ (any vector perpendicular to the subspace maps to $\mathbf{0}$). The only invertible projection is $I$ itself, which projects onto the whole space and discards nothing.
Q6. What condition on the columns of $A$ guarantees that $(A^{\mathsf{T}}A)^{-1}$ exists, and what breaks if it fails?
Answer
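A small rank-deficient example shows the failure mode. This is a sketch with a made-up matrix whose second column is twice the first; the coefficient vectors `x1` and `x2` were chosen by hand, and NumPy is assumed.

```python
import numpy as np

# Hypothetical matrix with collinear columns (column 2 = 2 * column 1).
A = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [0.0, 0.0]])
b = np.array([1.0, 3.0, 5.0])

# A^T A is 2x2 but has rank 1, so it has no inverse.
print(np.linalg.matrix_rank(A.T @ A))

# Two different coefficient vectors that both satisfy A^T A x = A^T b.
x1 = np.array([2.0, 0.0])
x2 = np.array([0.0, 1.0])
for x in (x1, x2):
    assert np.allclose(A.T @ A @ x, A.T @ b)

# Both give the same projection p = (2, 2, 0).
print(A @ x1, A @ x2)
```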
$A$ must have **full column rank**: its columns must be linearly independent. If they are dependent (e.g. collinear features), $A^{\mathsf{T}}A$ is singular, the inverse does not exist, and the normal equations have infinitely many solutions $\hat{\mathbf{x}}$. Note: the *projection* $\mathbf{p}$ still exists and is unique (the closest point always exists); only the *coefficients* become non-unique.
Q7. In what precise sense is least-squares regression an orthogonal projection?
Answer
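The link between a least-squares fit and the projection can be checked numerically. The sketch below uses made-up data for a straight-line fit and compares NumPy's `np.linalg.lstsq` with the explicit matrix $P = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$. Forming `P` explicitly is for illustration only and is not how one would usually fit a model.

```python
import numpy as np

# Hypothetical data: fit b ~ c0 + c1 * t at t = 0, 1, 2.
t = np.array([0.0, 1.0, 2.0])
A = np.column_stack([np.ones_like(t), t])   # intercept column and predictor
b = np.array([6.0, 0.0, 0.0])

# Least-squares coefficients and fitted values.
coef, *_ = np.linalg.lstsq(A, b, rcond=None)
fitted = A @ coef

# The same fitted values come from projecting b onto C(A).
P = A @ np.linalg.inv(A.T @ A) @ A.T
residual = b - fitted

print(np.allclose(fitted, P @ b))           # True
print(np.allclose(A.T @ residual, 0.0))     # True: residual is orthogonal to the columns
```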
As $\mathbf{x}$ ranges over $\mathbb{R}^n$, the predictions $A\mathbf{x}$ range over the column space $C(A)$. Minimizing $\lVert A\mathbf{x} - \mathbf{b}\rVert$ therefore means finding the point of $C(A)$ closest to the data vector $\mathbf{b}$, which *is* the orthogonal projection $\mathbf{p} = P\mathbf{b}$. The fitted values are $\hat{\mathbf{b}} = P\mathbf{b}$ (so $P$ is called the **hat matrix**), the residuals are the orthogonal error, and "residuals uncorrelated with predictors" is exactly $A^{\mathsf{T}}\mathbf{e} = \mathbf{0}$: the residual is orthogonal to every predictor column, which is zero sample correlation when the model includes an intercept column.
Q8. Why is projecting onto an orthonormal basis so much simpler than projecting onto a general basis?
Answer
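The simplification can be seen by computing one projection two ways. This sketch assumes NumPy and uses its reduced QR factorization (`np.linalg.qr`) to obtain an orthonormal basis for $C(A)$; the matrix `A` and vector `b` are illustrative.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Reduced QR: Q is 3x2 with orthonormal columns spanning C(A).
Q, _ = np.linalg.qr(A)

# Orthonormal basis: add up one-dimensional projections, no system to solve.
p_sum = sum((q @ b) * q for q in Q.T)

# General basis: a linear system with A^T A has to be solved.
p_general = A @ np.linalg.solve(A.T @ A, A.T @ b)

print(np.allclose(Q.T @ Q, np.eye(2)))   # True
print(np.allclose(p_sum, p_general))     # True
```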
For an orthonormal basis, $Q^{\mathsf{T}}Q = I$, so the inverse in $P = Q(Q^{\mathsf{T}}Q)^{-1}Q^{\mathsf{T}}$ disappears, leaving $P = QQ^{\mathsf{T}}$. The projection becomes $\mathbf{p} = \sum_i (\mathbf{q}_i\cdot\mathbf{b})\,\mathbf{q}_i$: each coefficient is just a dot product, and the one-dimensional projections simply add because the orthogonal directions do not interfere. No system to solve. This is why Gram–Schmidt (Chapter 20) bothers to manufacture orthonormal bases.
Q9. The trace of a projection matrix $P$ equals what geometric quantity, and why?
Answer
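A numerical illustration of the trace and eigenvalue count, as a sketch: the $3 \times 2$ matrix `A` is an arbitrary example with independent columns, so its column space has dimension 2. NumPy is assumed.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

# Symmetrize before eigvalsh to remove any rounding asymmetry in P.
eigenvalues = np.linalg.eigvalsh((P + P.T) / 2)

print(np.trace(P))                 # close to 2, the dimension of C(A)
print(np.round(eigenvalues, 8))    # one value near 0 and two near 1
```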
$\operatorname{tr}(P) = \dim S$, the **dimension of the subspace** projected onto. A projection's eigenvalues are all $0$ or $1$; the number of $1$'s is the dimension of the kept subspace, and the trace equals the sum of the eigenvalues. So the trace literally counts the surviving (eigenvalue-$1$) directions.
Q10. If $P$ projects onto $S$, what does $I - P$ do, and how are $P$ and $I - P$ related?
Answer
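The relationship between $P$ and $I - P$ can be checked on a concrete case. In this sketch the matrix `A`, the vector `b`, and the name `R` for $I - P$ are all illustrative assumptions; NumPy is assumed.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T
R = np.eye(3) - P          # hypothetical name for I - P

p = P @ b                  # part of b inside S = C(A)
e = R @ b                  # part of b perpendicular to S

print(np.allclose(R @ R, R))               # True: I - P is itself a projection
print(np.allclose(P @ R, np.zeros((3, 3))))  # True: P(I - P) = 0
print(np.allclose(p + e, b))               # True: the two parts rebuild b
print(np.isclose(p @ e, 0.0))              # True: the parts are perpendicular
```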
$I - P$ is the orthogonal projection onto the **orthogonal complement** $S^{\perp}$; it returns the error $\mathbf{e} = \mathbf{b} - \mathbf{p}$. The two are *complementary*, not inverse: $P + (I - P) = I$ splits every $\mathbf{b}$ into its in-$S$ part and its perpendicular part, and $P(I - P) = 0$. Neither recovers what the other discards; together they partition $\mathbf{b}$ into two perpendicular pieces.
Q11. A vector $\mathbf{b}$ is decomposed as $\mathbf{b} = \mathbf{p} + \mathbf{e}$ via projection onto $C(A)$. Which two of the four fundamental subspaces do $\mathbf{p}$ and $\mathbf{e}$ live in, and what is the relationship between those subspaces?
Answer
$\mathbf{p}$ lives in the **column space** $C(A)$; the error $\mathbf{e}$ lives in the **left null space** $N(A^{\mathsf{T}})$ (since $A^{\mathsf{T}}\mathbf{e} = \mathbf{0}$). These two subspaces are **orthogonal complements** in $\mathbb{R}^m$, perpendicular to each other and together filling the whole output space, which is exactly why the decomposition $\mathbf{b} = \mathbf{p} + \mathbf{e}$ exists and is unique.
Q12. When you project $\mathbf{b}$ onto a subspace, the squared lengths satisfy $\lVert\mathbf{b}\rVert^2 = \lVert\mathbf{p}\rVert^2 + \lVert\mathbf{e}\rVert^2$. What theorem is this, and why does it hold here?