Exercises — Chapter 38: Numerical Linear Algebra

Work the tiers in order; each builds on the last. Problems marked [hand] are pencil-and-paper, [code] want a short numpy experiment, and [proof] ask for an argument. Solutions to the conceptual tier appear in quiz.md; the rest are for self-checking against numpy.


⭐ Tier 1 — Conceptual (understand)

1.1 In one sentence each, state the difference between the conditioning of a problem and the stability of an algorithm. Which one can you fix by choosing a better method, and which one is fixed by the problem itself?

1.2 What is machine epsilon $\varepsilon_{\text{mach}}$ for IEEE double precision, and what does it physically represent? (Give both the numerical value and a one-line interpretation.)

1.3 True or false, with a reason: "If np.linalg.solve(A, b) returns an answer with a tiny residual $\lVert A\hat{\mathbf{x}}-\mathbf{b}\rVert$, then the answer is accurate to nearly full precision."

1.4 Why is $\kappa(A) \ge 1$ for every matrix? What kind of matrix achieves $\kappa(A) = 1$, and what does that say about its geometric action?

1.5 Explain in plain English why testing a == b for two floating-point numbers is dangerous. What should you write instead?

1.6 Order these three operations from safest to most dangerous for floating-point precision, and justify: (a) multiplying two numbers, (b) adding two positive numbers, (c) subtracting two nearly-equal numbers.

1.7 Your colleague says "I'll just compute $A^{-1}$ once and multiply by it whenever I need to solve $A\mathbf{x}=\mathbf{b}$." Give two distinct reasons (one about accuracy, one about cost) why np.linalg.solve is the better habit.

1.8 A least-squares problem has $\kappa(A) = 10^7$. Estimate the condition number of the normal-equations system $A^{\mathsf{T}}A$, and explain why this makes the normal equations a poor choice in double precision.


⭐⭐ Tier 2 — Computation (apply, by hand or short code)

2.1 [hand] A matrix has singular values $\sigma_1 = 50$, $\sigma_2 = 2$, $\sigma_3 = 0.001$. Compute $\kappa(A)$ in the 2-norm. Roughly how many decimal digits of accuracy would you expect to lose when solving $A\mathbf{x}=\mathbf{b}$ in double precision?

2.2 [hand] You solve $A\mathbf{x}=\mathbf{b}$ where $\kappa(A) = 10^8$, and your right-hand side $\mathbf{b}$ is known to a relative accuracy of $10^{-12}$. Use the perturbation bound to give the worst-case relative error in $\mathbf{x}$. How many correct digits can you guarantee?

2.3 [hand] For the matrix $A = \begin{bmatrix} 1 & 1 \\ 1 & 1.001 \end{bmatrix}$, perform Gaussian elimination by hand to solve $A\mathbf{x} = (2, 2.001)^{\mathsf{T}}$. Then solve $A\mathbf{x}' = (2, 2.002)^{\mathsf{T}}$. By what factor did the relative change in $\mathbf{b}$ get amplified into a relative change in $\mathbf{x}$?

2.4 [code] Write the Hilbert matrix $H_5$ in numpy (from scipy.linalg import hilbert). Report np.linalg.cond(H) and verify it equals $\sigma_{\max}/\sigma_{\min}$ from np.linalg.svd. They should match to all printed digits.

2.5 [code] Construct the problem $H_8 \mathbf{x} = \mathbf{b}$ with $\mathbf{x}_{\text{true}} = (1,1,\ldots,1)$ and $\mathbf{b} = H_8\mathbf{x}_{\text{true}}$. Solve it and report (i) the relative error $\lVert\hat{\mathbf{x}}-\mathbf{x}_{\text{true}}\rVert/\lVert\mathbf{x}_{\text{true}}\rVert$ and (ii) the relative residual $\lVert H_8\hat{\mathbf{x}}-\mathbf{b}\rVert/\lVert\mathbf{b}\rVert$. Comment on the gap between them.

2.6 [code] Evaluate (1 - np.cos(x))/x**2 for x = 1e-6, 1e-7, 1e-8. The true limit as $x\to 0$ is $1/2$. At which $x$ does catastrophic cancellation visibly ruin the answer? Then recompute using $1-\cos x = 2\sin^2(x/2)$ and show the stable version stays near $0.5$.

2.7 [hand] Explain, with the numbers, what goes wrong if you eliminate without pivoting in $\begin{bmatrix} 10^{-18} & 1 \\ 1 & 1\end{bmatrix}\mathbf{x} = \begin{bmatrix} 1 \\ 2\end{bmatrix}$. What is the multiplier, and which entry gets "lost" to rounding?

2.8 [code] Build the $4\times 4$ diagonally dominant matrix from Section 38.6 and run 10 Jacobi sweeps and 10 Gauss–Seidel sweeps from $\mathbf{x}^{(0)} = \mathbf{0}$. Plot or print the error at each iteration. Confirm Gauss–Seidel reaches a smaller error faster.

2.9 [code] For the same matrix, form the Jacobi iteration matrix $T = -D^{-1}(L+U)$ and compute its spectral radius (max(abs(np.linalg.eigvals(T)))). Verify it is less than 1, and relate its value to how fast the iteration converged in 2.8.


⭐⭐⭐ Tier 3 — Proof (A) and Coding (C)

3.1 [proof] Prove that for invertible $A$, $\kappa(A) = \kappa(A^{-1})$. (Hint: the singular values of $A^{-1}$ are the reciprocals of those of $A$.)

3.2 [proof] Prove that $\kappa(cA) = \kappa(A)$ for any nonzero scalar $c$. Explain why this means a near-singular determinant is not a reliable indicator of ill-conditioning.

3.3 [proof] Show that for an orthogonal matrix $Q$ (with $Q^{\mathsf{T}}Q = I$), every singular value equals 1, and hence $\kappa(Q) = 1$. Interpret this geometrically: why does an orthogonal transformation never amplify error?

3.4 [proof] Derive the relationship $\kappa(A^{\mathsf{T}}A) = \kappa(A)^2$ using the SVD $A = U\Sigma V^{\mathsf{T}}$. Then explain in one sentence why this single fact dooms the normal equations for ill-conditioned least squares.

3.5 [proof] Starting from the iteration $\mathbf{x}^{(k+1)} = T\mathbf{x}^{(k)} + \mathbf{c}$ with fixed point $\mathbf{x}^*$, show that the error satisfies $\mathbf{e}^{(k)} = T^k \mathbf{e}^{(0)}$. Conclude that the iteration converges for every starting guess if and only if the spectral radius $\rho(T) < 1$.

3.6 [code] Implement condition_number(A) using only your own SVD (or, if you must, np.linalg.svd) — return $\sigma_{\max}/\sigma_{\min}$. Test it against np.linalg.cond on five random matrices and on $H_6$; they should agree to ~10 digits.

3.7 [code] Write a function lost_digits(A) that returns $\log_{10}\kappa(A)$ and prints "you can trust about $16 - \lceil\log_{10}\kappa\rceil$ digits in double precision." Run it on $H_n$ for $n = 2, \ldots, 14$ and tabulate the predicted trustworthy digits against the measured error from solving the all-ones problem.

3.8 [code] Reproduce the normal-equations-vs-QR comparison from Section 38.5 on a $50\times 12$ Vandermonde matrix. Report $\kappa(A)$, $\kappa(A^{\mathsf{T}}A)$, and the distance of each method's coefficients from np.linalg.lstsq. Confirm the normal-equations answer is dramatically worse.

3.9 [code] Implement Gauss–Seidel from scratch and use it to solve a $100\times 100$ tridiagonal diagonally dominant system. Stop when the residual falls below $10^{-10}$ and report the iteration count. Compare the cost (number of operations touched) to what a dense direct solve would require.


⭐⭐⭐⭐ Tier 4 — Application & Synthesis

4.1 [code, application] The regression that blew up. Generate 40 data points $(x_i, y_i)$ from $y = 1 + 2x + 3x^2$ plus small noise, with $x_i$ clustered in a narrow interval $[5, 5.2]$ (so the polynomial columns are nearly collinear). Fit a quadratic by (a) the normal equations and (b) QR. Report $\kappa$ of the design matrix and the fitted coefficients from each method. Which gives sane coefficients? Then re-center the $x$ values to $[-0.1, 0.1]$ and show that centering lowers the condition number and rescues the normal-equations fit. Write two sentences on the lesson for practitioners.

4.2 [code, application] Precision budget. Solve $H_n \mathbf{x} = \mathbf{b}$ (all-ones truth) in both float32 and float64 for $n = 4, 6, 8, 10$. Tabulate the relative error for each precision against $\kappa(H_n)$. At what condition number does float32 lose all useful digits? Does it match the rule "single precision dies near $\kappa \approx 10^7$"?

4.3 [code, application] Iterative scaling. Take the $N\times N$ 1D-Laplacian (tridiagonal, $2$ on the diagonal, $-1$ off) for $N = 100, 500, 1000$. Solve $A\mathbf{x} = \mathbf{1}$ with scipy.sparse.linalg.cg and record the iteration count and the number of stored nonzeros versus $N^2$. Comment on why a direct dense solver becomes impractical as $N$ grows but CG does not.

4.4 [proof + reflection] When is a small residual enough? Suppose you solve $A\mathbf{x} = \mathbf{b}$ and obtain a relative residual of $10^{-14}$. Using the bound (relative error) $\le \kappa(A) \times$ (relative residual), state the largest $\kappa(A)$ for which you can still guarantee at least 6 correct digits. Then describe a realistic scenario (name the field) where you would not know $\kappa(A)$ in advance, and explain what you would compute to find out before trusting the answer.

4.5 [code, application] Stabilizing on purpose (ridge). For the ill-conditioned Vandermonde problem from 3.8, add a small multiple $\lambda I$ to $A^{\mathsf{T}}A$ (ridge regression) for $\lambda = 0, 10^{-8}, 10^{-4}, 10^{-1}$. Report $\kappa(A^{\mathsf{T}}A + \lambda I)$ and the coefficient error for each $\lambda$. Explain how regularization lowers the condition number and why that trades a little bias for a lot of stability.