Quiz — Chapter 20: Gram-Schmidt and QR Decomposition

Q: Gram-Schmidt is best described as which earlier operation, applied repeatedly?

Orthogonal projection (Chapter 19). Each step subtracts a vector's projection onto the directions already built, leaving a perpendicular remainder. Gram-Schmidt is repeated projection — it contains no new mathematics beyond the projection theorem.

Q: What is the precise condition on the input vectors for Gram-Schmidt to succeed (produce an orthonormal basis of the same size)?

The vectors must be linearly independent. If they are dependent, some remainder is exactly , and you cannot normalize the zero vector — the process signals the dependence by failing at that step.

Q: In the factorization , what two properties define and ?

has orthonormal columns (so ) and is upper-triangular. In the canonical form, additionally has a positive diagonal, which makes the factorization unique.

Q: Why is upper-triangular and not lower-triangular?

Because Gram-Schmidt processes vectors in order, and the -th original vector is rebuilt using only — never a later . "No looking ahead" puts zeros below the diagonal; the sequential structure is the source of triangularity.

Q: You compute QR by hand and get a positive diagonal in ; numpy gives one negative diagonal entry. Who is right?

Both. QR is unique only up to the signs of 's columns (and matching rows of ); numpy makes no positive-diagonal promise. Flip the sign of any -column whose is negative — and flip that row of — to reach the canonical form and match.

Q: What is the QR-based formula for the least-squares solution, and how is it solved?

, solved by back substitution (one upward pass, since is upper-triangular). Same projection-onto-the-column-space answer as Chapter 17, but computed without forming .

Q: Why is solving least squares via QR more numerically stable than the normal equations ?

Forming squares the condition number (), amplifying roundoff; QR never forms it, so its conditioning stays at . For correlated predictors, can destroy half your significant digits before you even solve.

Q: For a tall matrix (), what are the shapes of and in the *reduced* QR (numpy's default)?

is (orthonormal columns) and is (square, upper-triangular). The full QR instead gives an orthogonal and an ; ask numpy for `mode='complete'` to get it.

Q: Classical and modified Gram-Schmidt produce the same answer in exact arithmetic. Why prefer modified in practice?

In floating point, classical GS loses orthogonality when the inputs are nearly dependent (catastrophic cancellation); modified GS — subtracting each shadow from the running remainder — stays accurate. Same formula, different order of operations, vastly different stability.

Q: What does subtracting a vector's projection onto the all-ones direction do to that vector, statistically?

It centers the vector (subtracts its mean), so the result sums to zero. Orthogonality to means , i.e. the entries sum to zero — the geometric and statistical statements coincide, which is why PCA centers data first.

DataField.Dev

Quiz — Chapter 20: Gram-Schmidt and QR Decomposition

Twelve conceptual checks. Try each before opening the answer. One-line explanations follow each answer.

Q1. Gram-Schmidt is best described as which earlier operation, applied repeatedly?

Answer

**Orthogonal projection** (Chapter 19). Each step subtracts a vector's projection onto the directions already built, leaving a perpendicular remainder. *Gram-Schmidt is repeated projection — it contains no new mathematics beyond the projection theorem.*

Q2. What is the precise condition on the input vectors for Gram-Schmidt to succeed (produce an orthonormal basis of the same size)?

Answer

The vectors must be **linearly independent**. *If they are dependent, some remainder $\mathbf{u}_k$ is exactly $\mathbf{0}$, and you cannot normalize the zero vector — the process signals the dependence by failing at that step.*

Q3. In the factorization $A = QR$, what two properties define $Q$ and $R$?

Answer

$Q$ has **orthonormal columns** (so $Q^{\mathsf{T}}Q = I$) and $R$ is **upper-triangular**. *In the canonical form, $R$ additionally has a positive diagonal, which makes the factorization unique.*

Q4. Why is $R$ upper-triangular and not lower-triangular?

Answer

Because Gram-Schmidt processes vectors *in order*, and the $k$-th original vector $\mathbf{a}_k$ is rebuilt using only $\mathbf{q}_1, \dots, \mathbf{q}_k$ — never a later $\mathbf{q}$. *"No looking ahead" puts zeros below the diagonal; the sequential structure is the source of triangularity.*

Q5. You compute QR by hand and get a positive diagonal in $R$; numpy gives one negative diagonal entry. Who is right?

Answer

**Both.** QR is unique only up to the signs of $Q$'s columns (and matching rows of $R$); numpy makes no positive-diagonal promise. *Flip the sign of any $Q$-column whose $R_{kk}$ is negative — and flip that row of $R$ — to reach the canonical form and match.*

Q6. What is the QR-based formula for the least-squares solution, and how is it solved?

Answer

$R\hat{\mathbf{x}} = Q^{\mathsf{T}}\mathbf{b}$, solved by **back substitution** (one upward pass, since $R$ is upper-triangular). *Same projection-onto-the-column-space answer as Chapter 17, but computed without forming $A^{\mathsf{T}}A$.*

Q7. Why is solving least squares via QR more numerically stable than the normal equations $A^{\mathsf{T}}A\hat{\mathbf{x}} = A^{\mathsf{T}}\mathbf{b}$?

Answer

Forming $A^{\mathsf{T}}A$ **squares the condition number** ($\kappa \to \kappa^2$), amplifying roundoff; QR never forms it, so its conditioning stays at $\kappa$. *For correlated predictors, $\kappa^2$ can destroy half your significant digits before you even solve.*

Q8. For a tall $m \times n$ matrix ($m > n$), what are the shapes of $Q$ and $R$ in the reduced QR (numpy's default)?

Answer

$Q$ is $m \times n$ (orthonormal columns) and $R$ is $n \times n$ (square, upper-triangular). *The **full** QR instead gives an $m \times m$ orthogonal $Q$ and an $m \times n$ $R$; ask numpy for `mode='complete'` to get it.*

Q9. Classical and modified Gram-Schmidt produce the same answer in exact arithmetic. Why prefer modified in practice?

Answer

In floating point, **classical GS loses orthogonality** when the inputs are nearly dependent (catastrophic cancellation); modified GS — subtracting each shadow from the *running* remainder — stays accurate. *Same formula, different order of operations, vastly different stability.*

Q10. What does subtracting a vector's projection onto the all-ones direction $\mathbf{1} = (1,1,\dots,1)$ do to that vector, statistically?

Answer

It **centers** the vector (subtracts its mean), so the result sums to zero. *Orthogonality to $\mathbf{1}$ means $\mathbf{1}\cdot\mathbf{u} = 0$, i.e. the entries sum to zero — the geometric and statistical statements coincide, which is why PCA centers data first.*

Q11. Iterating QR — factor $A = QR$, then form $RQ$, factor again, repeat — converges (under mild conditions) to a matrix whose diagonal reveals what?

Answer

The **eigenvalues** of $A$. *This is the QR algorithm (Chapters 24, 29); each step $RQ = Q^{\mathsf{T}}AQ$ is a similarity transformation, so eigenvalues are preserved while the matrix is rotated toward upper-triangular form.*

Q12. Does every finite-dimensional subspace of $\mathbb{R}^m$ have an orthonormal basis? How do you know?

Answer

**Yes.** Take any basis (Chapter 15 guarantees one) and run Gram-Schmidt. *The process is constructive — it not only proves existence but hands you the basis, which is why later results like the Spectral Theorem and SVD may freely assume orthonormal bases exist.*