Quiz — Chapter 38: Numerical Linear Algebra
Twelve conceptual checks. Try each before opening the answer. One-line explanations follow.
Q1. What does the condition number $\kappa(A)$ measure?
Answer
The worst-case factor by which a *relative* error in the input ($\mathbf{b}$, or $A$ itself) is amplified into a *relative* error in the solution $\mathbf{x}$. In the 2-norm it equals $\sigma_{\max}/\sigma_{\min}$ from the SVD. *It is a property of the problem, not of how you solve it.*Q2. A matrix has $\kappa(A) = 10^{10}$. Roughly how many of your ~16 double-precision digits should you expect to lose when solving $A\mathbf{x}=\mathbf{b}$?
Answer
About $\log_{10}(10^{10}) = 10$ digits, leaving roughly 6 trustworthy digits. The rule of thumb: you lose about $\log_{10}\kappa(A)$ digits.Q3. True or false: a tiny residual $\lVert A\hat{\mathbf{x}} - \mathbf{b}\rVert$ guarantees a tiny error $\lVert\hat{\mathbf{x}} - \mathbf{x}_{\text{true}}\rVert$.
Answer
**False.** A small residual only certifies that $\hat{\mathbf{x}}$ solves a *nearby* problem (backward stability). Whether the nearby problem has a nearby answer depends on conditioning: (relative error) $\le \kappa(A) \times$ (relative residual). For the Hilbert matrix, the residual is $\sim 10^{-16}$ while the error can be near 1.Q4. Why is $0.1 + 0.2 == 0.3$ False in Python (and essentially every language)?
Answer
None of $0.1$, $0.2$, $0.3$ is representable exactly in binary floating point — each is an infinite repeating binary fraction, rounded on input. The rounded sum lands one machine tick above the rounded $0.3$. The error ($\sim 5.6\times 10^{-17}$) is tiny, but `==` demands *exact* equality, which is the wrong question for floats. Compare with a tolerance instead.Q5. What is machine epsilon for IEEE double precision, and what does it represent?
Answer
$\varepsilon_{\text{mach}} = 2^{-52} \approx 2.22\times 10^{-16}$. It is the gap between $1$ and the next larger representable double — equivalently, the smallest number you can add to $1$ and get something different from $1$. It bounds the relative rounding error of a single arithmetic operation (which is at most $\tfrac12\varepsilon_{\text{mach}}$).Q6. Why do we pivot in LU decomposition, in numerical terms?
Answer
A tiny pivot forces a huge multiplier ($1/\text{pivot}$), and multiplying a row by a huge number swamps its other entries, losing them to rounding — *even when the matrix is perfectly well-conditioned*. Partial pivoting puts the largest available entry on the diagonal, keeping multipliers $\le 1$ and making LU backward stable. Pivoting fixes an *algorithm* instability, not a *problem* ill-conditioning.Q7. Why do QR and SVD beat the normal equations for least squares?
Answer
Forming $A^{\mathsf{T}}A$ squares the condition number: $\kappa(A^{\mathsf{T}}A) = \kappa(A)^2$, which doubles the digits you lose. QR ($A=QR$) and SVD ($A=U\Sigma V^{\mathsf{T}}$) solve the least-squares problem without ever forming $A^{\mathsf{T}}A$, so they work with the original $\kappa(A)$. $Q$ and $U$ are orthogonal ($\kappa=1$), so they never amplify error.Q8. What is catastrophic cancellation, and what operation causes it?
Answer
The loss of significant digits when *subtracting two nearly-equal numbers*: the matching leading digits cancel, exposing the rounding error that lived in the trailing digits. The subtraction doesn't create error — it deletes the good digits in front of the existing error. The fix is algebraic (rewrite to avoid the subtraction), e.g., the stable form of the quadratic formula.Q9. Why is the Hilbert matrix so ill-conditioned despite having only innocent-looking unit-fraction entries?
Answer
Its columns are *nearly linearly dependent* — it is the Gram matrix of the monomials $1, t, t^2, \ldots$ on $[0,1]$, and high-degree monomials look almost identical there. Near-parallel columns make $\sigma_{\min}$ nearly zero, so $\kappa = \sigma_{\max}/\sigma_{\min}$ is enormous (growing exponentially with size, reaching $\sim 10^{16}$ at $n=12$).Q10. What does it mean for an algorithm to be backward stable?
Answer
The computed answer $\hat{\mathbf{x}}$ is the *exact* solution of a slightly perturbed problem — e.g., $(A+\Delta A)\hat{\mathbf{x}} = \mathbf{b}$ with $\lVert\Delta A\rVert/\lVert A\rVert \sim \varepsilon_{\text{mach}}$. In words: the right answer to nearly the right question. Combined with conditioning, it gives forward error $\lesssim \kappa(A)\,\varepsilon_{\text{mach}}$.Q11. When should you use an iterative method (Jacobi, Gauss–Seidel, conjugate gradient) instead of a direct method (LU, QR)?
Answer
When the matrix is *large and sparse* — millions of rows but mostly zeros (finite-element models, fluid simulations, web graphs). Direct methods would need to store and factor a dense matrix that doesn't fit in memory; iterative methods only need cheap sparse matrix-vector products and stop when "close enough."Q12. When you call np.linalg.solve(A, b), what is actually doing the work, and which algorithm does it use?