Chapter 19 — Quiz

Twelve conceptual checks on orthogonal projection. Answer before expanding each solution. These are understanding questions, not computation drills — if you can explain the why, the formulas follow.

Q1. When you project $\mathbf{b}$ orthogonally onto a subspace $S$, what is the defining geometric property of the error vector $\mathbf{e} = \mathbf{b} - \mathbf{p}$?

Answer

The error $\mathbf{e}$ is **orthogonal to the subspace** $S$ — perpendicular to *every* vector in it, not just to some of them. Algebraically, if $S = C(A)$, this is the normal equation $A^{\mathsf{T}}\mathbf{e} = \mathbf{0}$. This single property is what defines the orthogonal projection and what forces $\mathbf{p}$ to be the closest point.
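A small numerical check can make the orthogonality condition concrete. The sketch below assumes NumPy and uses an illustrative matrix `A` and vector `b` that are not taken from the chapter. It solves the normal equations, forms the error, and confirms that the error is perpendicular to each column of `A` and therefore to any combination of them.

```python
import numpy as np

# Illustrative data (an assumption, not from the chapter):
# the two columns of A span a plane S = C(A) in R^3.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Solve the normal equations A^T A x = A^T b, then form p and e.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
p = A @ x_hat
e = b - p

# e is orthogonal to each column of A (numerically zero vector).
print(A.T @ e)

# Orthogonality to the columns extends to every vector A @ c in the subspace.
c = np.array([2.0, -3.0])          # arbitrary coefficients
print(np.dot(A @ c, e))            # numerically zero
```

Checking only the columns is enough because every vector in $C(A)$ is a combination of them.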

Q2. Why is the formula $\mathbf{p} = (\mathbf{a}\cdot\mathbf{b})\,\mathbf{a}$ wrong for projecting $\mathbf{b}$ onto the line through $\mathbf{a}$, and when does it accidentally work?

Answer

It is missing the normalization: the correct formula is $\mathbf{p} = \dfrac{\mathbf{a}\cdot\mathbf{b}}{\mathbf{a}\cdot\mathbf{a}}\,\mathbf{a}$. The unnormalized version works only when $\mathbf{a}\cdot\mathbf{a} = 1$, i.e. when $\mathbf{a}$ is a **unit vector**. The denominator $\mathbf{a}\cdot\mathbf{a} = \lVert\mathbf{a}\rVert^2$ compensates for the length of $\mathbf{a}$; forgetting it leaves the result off by a factor of $\lVert\mathbf{a}\rVert^2$ (too long when $\lVert\mathbf{a}\rVert > 1$, too short when $\lVert\mathbf{a}\rVert < 1$).
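To see the scaling error numerically, here is a minimal sketch assuming NumPy. The function names `project_onto_line` and `unnormalized` and the sample vectors are hypothetical, chosen only for illustration.

```python
import numpy as np

def project_onto_line(a, b):
    """Orthogonal projection of b onto the line through a (a must be nonzero)."""
    return (np.dot(a, b) / np.dot(a, a)) * a

def unnormalized(a, b):
    """The flawed formula (a . b) a, which is correct only when ||a|| = 1."""
    return np.dot(a, b) * a

b = np.array([3.0, 4.0])
a = np.array([2.0, 0.0])             # ||a||^2 = 4

p_right = project_onto_line(a, b)    # (3, 0)
p_wrong = unnormalized(a, b)         # (12, 0), which is 4 times p_right

u = a / np.linalg.norm(a)            # unit vector along the same line
p_unit = unnormalized(u, b)          # agrees with p_right because ||u|| = 1

print(p_right, p_wrong, p_unit)
```

With a direction shorter than a unit vector the same flawed formula would return a vector that is too short, so the error is a scaling by $\lVert\mathbf{a}\rVert^2$ in either direction.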

Q3. The projection matrix satisfies $P^2 = P$. State this property's name and explain why it holds geometrically.

Answer

It is **idempotence**. Geometrically: projecting $\mathbf{b}$ lands it inside the subspace; projecting an already-in-subspace vector leaves it fixed (it is its own closest point). So projecting a second time does nothing, and $P(P\mathbf{b}) = P\mathbf{b}$ for all $\mathbf{b}$, i.e. $P^2 = P$.
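The geometric argument has a one-line algebraic counterpart. Assuming the subspace is $C(A)$ for a matrix $A$ with independent columns, so that $P = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$, the middle factors cancel:

$$
P^2 = A(A^{\mathsf{T}}A)^{-1}\underbrace{A^{\mathsf{T}}A\,(A^{\mathsf{T}}A)^{-1}}_{I}A^{\mathsf{T}} = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}} = P.
$$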

Q4. Both $M = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$ and an orthogonal projection $P$ satisfy $X^2 = X$. What distinguishes the orthogonal projection, and why does the distinction matter?

Answer

The orthogonal projection is **symmetric** ($P^{\mathsf{T}} = P$); the matrix $M$ is not. $M$ is an *oblique* projection — it still flattens onto a line, but along a slanted direction, so its error is *not* perpendicular to the target subspace and its result is *not* the closest point. Symmetry is exactly the condition that makes a projection orthogonal (drop a true perpendicular) and hence distance-minimizing.
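The difference shows up immediately on a test vector. The sketch below assumes NumPy. `M` is the matrix from the question, `P` is the orthogonal projection onto the same line (the x-axis, which is the column space of `M`), and the vector `b` is an illustrative choice.

```python
import numpy as np

M = np.array([[1.0, 1.0],
              [0.0, 0.0]])   # oblique projection from the question
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])   # orthogonal projection onto the x-axis

b = np.array([0.0, 1.0])     # illustrative test vector (an assumption)
axis = np.array([1.0, 0.0])  # direction spanning the target line

# Both matrices are idempotent, but only P is symmetric.
print(np.allclose(M @ M, M), np.allclose(M, M.T))   # True False
print(np.allclose(P @ P, P), np.allclose(P, P.T))   # True True

e_oblique = b - M @ b        # (-1, 1): not perpendicular to the x-axis
e_orth = b - P @ b           # (0, 1): perpendicular to the x-axis

print(np.dot(e_oblique, axis), np.dot(e_orth, axis))
print(np.linalg.norm(e_oblique), np.linalg.norm(e_orth))
```

The oblique result `M @ b` lies on the x-axis but sits farther from `b` than `P @ b` does, which is the sense in which only the symmetric projection is distance-minimizing.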

Q5. Why is an orthogonal projection matrix $P$ (other than $I$) never invertible?

Answer

A projection throws information away: the entire orthogonal complement of the subspace is collapsed to $\mathbf{0}$, so distinct vectors can share a projection and $P$ cannot be undone. Algebraically, $\det(P) = 0$ because $P$ has eigenvalue $0$ (any nonzero vector perpendicular to the subspace maps to $\mathbf{0}$, and such a vector exists whenever the subspace is not the whole space). The only invertible projection is $I$ itself, which projects onto the whole space and discards nothing.
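The loss of information is easy to exhibit. This sketch assumes NumPy and an illustrative projection onto the line through $(1, 1)$. It shows two different inputs with the same projection and a nonzero vector that is sent to zero.

```python
import numpy as np

a = np.array([1.0, 1.0])                 # illustrative direction (an assumption)
P = np.outer(a, a) / np.dot(a, a)        # projection onto the line through a

b1 = np.array([2.0, 0.0])
b2 = np.array([0.0, 2.0])
print(P @ b1, P @ b2)                    # identical outputs from distinct inputs

n = np.array([1.0, -1.0])                # perpendicular to a
print(P @ n)                             # collapsed to the zero vector

print(np.linalg.det(P))                  # zero (up to rounding)
```

Since `b1` and `b2` share the output $(1, 1)$, no inverse map could tell them apart.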

Q6. What condition on the columns of $A$ guarantees that $(A^{\mathsf{T}}A)^{-1}$ exists, and what breaks if it fails?

Answer

$A$ must have **full column rank** — its columns must be linearly independent. If they are dependent (e.g. collinear features), $A^{\mathsf{T}}A$ is singular, the inverse does not exist, and the normal equations have infinitely many solutions $\hat{\mathbf{x}}$. Note: the *projection* $\mathbf{p}$ still exists and is unique (the closest point always exists); only the *coefficients* become non-unique.
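The split between a unique projection and non-unique coefficients can be checked directly. The sketch below assumes NumPy and a deliberately rank-deficient illustrative `A` whose second column is twice the first. It uses `np.linalg.lstsq`, which returns one particular least-squares solution, and then shifts it along the null space of `A`.

```python
import numpy as np

# Illustrative rank-deficient matrix (an assumption): column 2 = 2 * column 1.
A = np.array([[1.0, 2.0],
              [1.0, 2.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# A^T A is 2x2 but has rank 1, so it has no inverse.
print(np.linalg.matrix_rank(A.T @ A))

# One least-squares solution, and a second one shifted along the null space.
x1 = np.linalg.lstsq(A, b, rcond=None)[0]
null_dir = np.array([2.0, -1.0])         # A @ null_dir = 0
x2 = x1 + null_dir

# Different coefficients, same projection.
print(x1, x2)
print(A @ x1, A @ x2)
```

Both coefficient vectors satisfy the normal equations, yet they produce the same point of $C(A)$.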

Q7. In what precise sense is least-squares regression an orthogonal projection?

Answer

As $\mathbf{x}$ ranges over $\mathbb{R}^n$, the predictions $A\mathbf{x}$ range over the column space $C(A)$. Minimizing $\lVert A\mathbf{x} - \mathbf{b}\rVert$ therefore means finding the point of $C(A)$ closest to the data vector $\mathbf{b}$, which *is* the orthogonal projection $\mathbf{p} = P\mathbf{b}$. The fitted values are $\hat{\mathbf{b}} = P\mathbf{b}$ (so $P$ is called the **hat matrix**), the residuals are the orthogonal error $\mathbf{e} = \mathbf{b} - \hat{\mathbf{b}}$, and the statistical statement "residuals uncorrelated with predictors" is the orthogonality condition $A^{\mathsf{T}}\mathbf{e} = \mathbf{0}$ (orthogonality coincides with zero sample correlation when $A$ includes an intercept column, so that the residuals sum to zero).
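The equivalence can be verified on a tiny line fit. This is a sketch assuming NumPy, with made-up data points `t` and `b`. It compares the fitted values from a least-squares solver with the result of applying the hat matrix, which is formed explicitly here only for illustration.

```python
import numpy as np

# Made-up data for a straight-line fit b ~ c + d * t (an assumption).
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 3.0, 4.0, 4.0])
A = np.column_stack([np.ones_like(t), t])    # intercept column and predictor

# Route 1: a least-squares solver.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

# Route 2: project b onto C(A) with the hat matrix P = A (A^T A)^{-1} A^T.
P = A @ np.linalg.inv(A.T @ A) @ A.T
fitted = P @ b
residuals = b - fitted

print(x_hat)               # intercept and slope
print(fitted, A @ x_hat)   # the two routes agree
print(A.T @ residuals)     # numerically zero: residuals orthogonal to predictors
```

In practice the hat matrix is rarely built as a dense matrix; the solver route gives the same fitted values at lower cost.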

Q8. Why is projecting onto an orthonormal basis so much simpler than projecting onto a general basis?

Answer

For an orthonormal basis, $Q^{\mathsf{T}}Q = I$, so the inverse in $P = Q(Q^{\mathsf{T}}Q)^{-1}Q^{\mathsf{T}}$ disappears, leaving $P = QQ^{\mathsf{T}}$. The projection becomes $\mathbf{p} = \sum_i (\mathbf{q}_i\cdot\mathbf{b})\,\mathbf{q}_i$ — each coefficient is just a dot product, and the one-dimensional projections simply add because the orthogonal directions do not interfere. No system to solve. This is why Gram–Schmidt (Chapter 20) bothers to manufacture orthonormal bases.
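The simplification can be checked numerically. The sketch below assumes NumPy and an illustrative `A` and `b`. It obtains an orthonormal basis for $C(A)$ from `np.linalg.qr` (a stand-in here for any orthonormalization procedure) and compares $QQ^{\mathsf{T}}$ with the general formula.

```python
import numpy as np

# Illustrative basis and vector (assumptions, not from the chapter).
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

# Reduced QR: Q is 3x2 with orthonormal columns spanning C(A).
Q, _ = np.linalg.qr(A)

P_general = A @ np.linalg.inv(A.T @ A) @ A.T   # needs an inverse
P_orthonormal = Q @ Q.T                        # no inverse needed

# Projection as a sum of one-dimensional projections onto each q_i.
p_sum = sum(np.dot(q, b) * q for q in Q.T)

print(np.allclose(Q.T @ Q, np.eye(2)))           # True: columns are orthonormal
print(np.allclose(P_general, P_orthonormal))     # True: same projection matrix
print(np.allclose(p_sum, P_orthonormal @ b))     # True: the pieces simply add
```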

Q9. The trace of a projection matrix $P$ equals what geometric quantity, and why?

Answer

$\operatorname{tr}(P) = \dim S$, the **dimension of the subspace** projected onto. A projection's eigenvalues are all $0$ or $1$; the number of $1$'s is the dimension of the kept subspace, and the trace equals the sum of the eigenvalues. So the trace literally counts the surviving (eigenvalue-$1$) directions.
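As a quick check of the counting argument, the sketch below assumes NumPy and an illustrative $4 \times 2$ matrix `A` with independent columns, so the subspace has dimension 2. It compares the trace, the eigenvalues, and the rank.

```python
import numpy as np

# Illustrative matrix (an assumption): two independent columns in R^4.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

# P is symmetric, so eigvalsh applies; eigenvalues come back in ascending order.
eigenvalues = np.linalg.eigvalsh(P)

print(eigenvalues)                   # two values near 0, two near 1
print(np.trace(P))                   # close to 2
print(np.linalg.matrix_rank(A))      # 2, the dimension of C(A)
```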

Q10. If $P$ projects onto $S$, what does $I - P$ do, and how are $P$ and $I - P$ related?

Answer

$I - P$ is the orthogonal projection onto the **orthogonal complement** $S^{\perp}$ — it returns the error $\mathbf{e} = \mathbf{b} - \mathbf{p}$. The two are *complementary*, not inverse: $P + (I - P) = I$ splits every $\mathbf{b}$ into its in-$S$ part and its perpendicular part, and $P(I - P) = 0$. Neither recovers what the other discards; together they partition $\mathbf{b}$ into two perpendicular pieces.
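These claims follow in a few lines from $P^2 = P$ and $P^{\mathsf{T}} = P$ alone, as the following short derivation shows:

$$
\begin{aligned}
(I - P)^2 &= I - 2P + P^2 = I - P, \\
(I - P)^{\mathsf{T}} &= I - P^{\mathsf{T}} = I - P, \\
P(I - P) &= P - P^2 = 0.
\end{aligned}
$$

The first two lines say that $I - P$ is itself an orthogonal projection; the third says that its output $(I - P)\mathbf{b} = \mathbf{b} - \mathbf{p} = \mathbf{e}$ is annihilated by $P$, so it has no component in $S$.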

Q11. A vector $\mathbf{b}$ is decomposed as $\mathbf{b} = \mathbf{p} + \mathbf{e}$ via projection onto $C(A)$. Which two of the four fundamental subspaces do $\mathbf{p}$ and $\mathbf{e}$ live in, and what is the relationship between those subspaces?

Answer

$\mathbf{p}$ lives in the **column space** $C(A)$; the error $\mathbf{e}$ lives in the **left null space** $N(A^{\mathsf{T}})$ (since $A^{\mathsf{T}}\mathbf{e} = \mathbf{0}$). These two subspaces are **orthogonal complements** in $\mathbb{R}^m$ — perpendicular to each other and together filling the whole output space — which is exactly why the decomposition $\mathbf{b} = \mathbf{p} + \mathbf{e}$ exists and is unique.
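One way to see both subspaces at once is through orthonormal bases for them. The sketch below assumes NumPy and an illustrative `A` and `b`. It uses the singular value decomposition, which is not part of this chapter and serves only as a convenient source of bases: for a rank-$r$ matrix, the first $r$ columns of `U` span $C(A)$ and the remaining columns span $N(A^{\mathsf{T}})$.

```python
import numpy as np

# Illustrative data (assumptions, not from the chapter).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

U, s, Vt = np.linalg.svd(A)          # full SVD: U is 3x3
r = np.linalg.matrix_rank(A)

col_basis = U[:, :r]                 # orthonormal basis of C(A)
left_null_basis = U[:, r:]           # orthonormal basis of N(A^T)

p = col_basis @ (col_basis.T @ b)              # component in C(A)
e = left_null_basis @ (left_null_basis.T @ b)  # component in N(A^T)

print(p, e)
print(np.allclose(p + e, b))         # True: the two pieces rebuild b
print(np.allclose(A.T @ e, 0.0))     # True: e lies in the left null space
```

The basis sizes, $r$ and $m - r$, add up to $m$, which is the dimension count behind "together filling the whole output space."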

Q12. When you project $\mathbf{b}$ onto a subspace, the squared lengths satisfy $\lVert\mathbf{b}\rVert^2 = \lVert\mathbf{p}\rVert^2 + \lVert\mathbf{e}\rVert^2$. What theorem is this, and why does it hold here?

Answer

It is the **Pythagorean theorem**, and it holds because $\mathbf{p}$ and $\mathbf{e}$ are orthogonal ($\mathbf{p}\cdot\mathbf{e} = 0$, since $\mathbf{p}\in S$ and $\mathbf{e}\perp S$). Expanding $\lVert\mathbf{p} + \mathbf{e}\rVert^2 = \lVert\mathbf{p}\rVert^2 + 2\,\mathbf{p}\cdot\mathbf{e} + \lVert\mathbf{e}\rVert^2$, the cross term vanishes. This is the same orthogonality that drives the closest-point proof, and in statistics it is the decomposition of total variation into "explained" plus "residual."
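A small worked example, with numbers chosen purely for illustration: take $\mathbf{b} = (6, 0, 0)$ and project onto the plane spanned by $(1, 1, 1)$ and $(0, 1, 2)$. The projection is $\mathbf{p} = (5, 2, -1)$ and the error is $\mathbf{e} = (1, -2, 1)$, which is perpendicular to both spanning vectors. Then

$$
\mathbf{p}\cdot\mathbf{e} = 5 - 4 - 1 = 0, \qquad
\lVert\mathbf{p}\rVert^2 + \lVert\mathbf{e}\rVert^2 = (25 + 4 + 1) + (1 + 4 + 1) = 30 + 6 = 36 = \lVert\mathbf{b}\rVert^2 .
$$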