Exercises — Chapter 20: Gram-Schmidt and QR Decomposition
Problems are graded by difficulty: ⭐ conceptual · ⭐⭐ computation by hand · ⭐⭐⭐ proof (math) or coding · ⭐⭐⭐⭐ application. Problems marked [proof] ask for a rigorous argument; [code] ask for a short numpy program. Throughout, "orthonormalize" means run Gram-Schmidt and normalize, and "QR" means the reduced factorization with a positive $R$-diagonal unless stated otherwise.
⭐ Conceptual
20.1 In one sentence, state what each Gram-Schmidt step does to a new vector, in terms of the projection idea from Chapter 19.
20.2 True or false, with a one-line reason each: (a) Gram-Schmidt works on any spanning set, independent or not. (b) In $A = QR$, the matrix $R$ is always square. (c) If $Q$ has orthonormal columns then $Q^{\mathsf{T}}Q = I$. (d) If $Q$ has orthonormal columns then $QQ^{\mathsf{T}} = I$.
20.3 Explain why the matrix $R$ in $A = QR$ comes out upper-triangular rather than lower-triangular. Tie your answer to the order in which Gram-Schmidt processes the vectors.
20.4 Why is the QR factorization not unique? Describe the full family of QR factorizations of a fixed matrix $A$ (with independent columns), and state the standard extra condition that makes it unique.
20.5 A colleague computes QR in numpy and finds the first column of their $Q$ is the negative of yours, though both factor the same $A$. Are either of you wrong? Explain.
20.6 In words, why does building an orthonormal basis make finding a vector's coordinates a matter of dot products rather than solving a linear system?
20.7 What does a small diagonal entry $R_{kk}$ tell you about the original columns of $A$? Why is it a warning sign for numerical computation?
⭐⭐ Computation (by hand)
20.8 Orthonormalize the two vectors $\mathbf{a}_1 = (3, 0)$ and $\mathbf{a}_2 = (1, 2)$ in $\mathbb{R}^2$. Give $\mathbf{q}_1$ and $\mathbf{q}_2$, and verify $\mathbf{q}_1 \cdot \mathbf{q}_2 = 0$.
20.9 Orthonormalize $\mathbf{a}_1 = (1, 1, 1)$ and $\mathbf{a}_2 = (1, 0, -1)$ in $\mathbb{R}^3$. (Notice $\mathbf{a}_1 \cdot \mathbf{a}_2 = 0$ already — what does that mean for Step 2?)
20.10 Orthonormalize the three vectors $\mathbf{a}_1 = (2, 0, 0)$, $\mathbf{a}_2 = (1, 1, 0)$, $\mathbf{a}_3 = (1, 1, 1)$. Then write down the full $3 \times 3$ matrix $R$ in $A = QR$ by reading off the lengths and dot products.
20.11 For the matrix $A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \\ 1 & 1 \\ 1 & -1 \end{bmatrix}$, find the reduced QR factorization by hand. (Hint: the two columns are already orthogonal — only normalization and reading off $R$ remain.)
20.12 Using your QR from the running example in §20.6, where $R = \begin{bmatrix} \sqrt2 & 1/\sqrt2 & 1/\sqrt2 \\ 0 & \sqrt6/2 & 1/\sqrt6 \\ 0 & 0 & 2/\sqrt3 \end{bmatrix}$, compute $\det(A)$ two ways: (a) as the product of $R$'s diagonal entries, and (b) directly as the determinant of the original $3\times3$ matrix with columns $(1,1,0),(1,0,1),(0,1,1)$. Confirm they agree in absolute value.
20.13 Let $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}$ (the design matrix of §20.7) with $Q, R$ as computed there. For data $\mathbf{b} = (2, 2, 4, 5)$, solve $R\hat{\mathbf{x}} = Q^{\mathsf{T}}\mathbf{b}$ by back substitution to find the best-fit line $y = c_0 + c_1 t$.
20.14 Given $R = \begin{bmatrix} 2 & 4 & 2 \\ 0 & 3 & 6 \\ 0 & 0 & 5 \end{bmatrix}$ and $Q^{\mathsf{T}}\mathbf{b} = (8, 9, 5)$, solve $R\hat{\mathbf{x}} = Q^{\mathsf{T}}\mathbf{b}$ by back substitution. (Practice the triangular solve in isolation.)
20.15 Show by hand that the vectors $\mathbf{q}_1 = \tfrac{1}{\sqrt2}(1,1,0)$, $\mathbf{q}_2 = \tfrac{1}{\sqrt6}(1,-1,2)$, $\mathbf{q}_3 = \tfrac{1}{\sqrt3}(-1,1,1)$ from §20.3 are orthonormal by computing all three pairwise dot products and all three norms.
⭐⭐⭐ Proof (math track)
20.16 [proof] Prove that if $\mathbf{q}_1, \dots, \mathbf{q}_k$ are orthonormal, then for any vector $\mathbf{a}$, the vector $\mathbf{u} = \mathbf{a} - \sum_{i=1}^{k}(\mathbf{q}_i\cdot\mathbf{a})\mathbf{q}_i$ is orthogonal to every $\mathbf{q}_j$ ($j \le k$). (This is the single fact the Gram-Schmidt proof of §20.4 leans on.)
20.17 [proof] Prove that the columns of a matrix $Q$ are orthonormal if and only if $Q^{\mathsf{T}}Q = I$. (Interpret the $(i,j)$ entry of $Q^{\mathsf{T}}Q$ as a dot product.)
20.18 [proof] Suppose $A = QR$ is a reduced QR factorization with $R$ having a positive diagonal, and $A = Q'R'$ is another with $R'$ also positive-diagonal. Prove $Q = Q'$ and $R = R'$ (uniqueness). (Hint: $Q^{\mathsf{T}}Q' = R(R')^{-1}$ is simultaneously orthogonal and upper-triangular with positive diagonal; show such a matrix must be $I$.)
20.19 [proof] Prove that if $A$ is $m \times n$ with linearly independent columns, then $A^{\mathsf{T}}A$ is invertible, and use $A = QR$ to show $A^{\mathsf{T}}A = R^{\mathsf{T}}R$. (This is the Cholesky factorization of $A^{\mathsf{T}}A$; you have produced it for free.)
20.20 [proof] Show that for a square matrix $Q$ with orthonormal columns, $\det(Q) = \pm 1$. Conclude that for a square $A = QR$, $|\det(A)| = \prod_k |R_{kk}|$.
⭐⭐⭐ Coding (computation track)
20.21 [code] Implement classical Gram-Schmidt as a function classical_gs(A) taking a numpy array whose columns are the input vectors and returning Q. Test it on the §20.3 example and verify Q.T @ Q is close to the identity.
20.22 [code] Implement modified Gram-Schmidt as modified_gs(A) (subtract each shadow from the running remainder). On the near-dependent matrix from §20.11 of the chapter (with eps = 1e-8), print max|Q^T Q - I| for both your classical and modified versions and confirm modified is dramatically better.
20.23 [code] Write qr_from_gs(A) returning (Q, R) built from your modified Gram-Schmidt, reading $R$'s diagonal from remainder lengths and above-diagonal from dot products. Verify max|A - Q@R| is near machine zero, then compare to np.linalg.qr(A) after fixing signs with np.sign(np.diag(R)).
⭐⭐⭐⭐ Application
20.24 [code] Polynomial fitting, stable vs. unstable. Generate 30 points from $y = \sin(2\pi t)$ on $[0,1]$ with small noise. Fit a degree-10 polynomial two ways: (a) build the Vandermonde design matrix and solve the normal equations $A^{\mathsf{T}}A\hat{\mathbf{x}} = A^{\mathsf{T}}\mathbf{b}$; (b) solve via QR, $R\hat{\mathbf{x}} = Q^{\mathsf{T}}\mathbf{b}$. Print cond(A) and cond(A.T@A), and compare the residuals. Comment on what you observe as the degree rises.
20.25 [code] Orthonormal camera frame (graphics). Given a camera forward direction $\mathbf{f} = (1, 0, 2)$ and an approximate up vector $\mathbf{u}_{\text{approx}} = (0, 1, 0)$ (which is not perpendicular to $\mathbf{f}$), use one Gram-Schmidt step to produce a true up vector perpendicular to $\mathbf{f}$, normalize both, and form the right vector via the cross product. Verify the three resulting axes are orthonormal ($Q^{\mathsf{T}}Q = I$).
20.26 [code/proof] Why orthogonal predictors decouple. Build a design matrix with two predictors. First make them correlated ($\mathbf{a}_2 \approx \mathbf{a}_1 + \text{noise}$); fit and record the coefficients. Then Gram-Schmidt the predictors to orthogonal ones, refit, and observe that each orthogonal coefficient is just $\mathbf{q}_i \cdot \mathbf{b}$ — independent of the other. Explain in two sentences why orthogonality removes the coupling between regression coefficients (connect to the "non-interfering directions" theme of Part IV).