Quiz — Chapter 38: Numerical Linear Algebra

Q: What does the condition number measure?

The worst-case factor by which a relative error in the input (, or itself) is amplified into a relative error in the solution . In the 2-norm it equals from the SVD. It is a property of the problem, not of how you solve it.

Q: A matrix has . Roughly how many of your ~16 double-precision digits should you expect to lose when solving ?

About digits, leaving roughly 6 trustworthy digits. The rule of thumb: you lose about digits.

Q: True or false: a tiny residual guarantees a tiny error .

False. A small residual only certifies that solves a nearby problem (backward stability). Whether the nearby problem has a nearby answer depends on conditioning: (relative error) (relative residual). For the Hilbert matrix, the residual is while the error can be near 1.

Q: What is machine epsilon for IEEE double precision, and what does it represent?

. It is the gap between and the next larger representable double — equivalently, the smallest number you can add to and get something different from . It bounds the relative rounding error of a single arithmetic operation (which is at most ).

Q: Why do QR and SVD beat the normal equations for least squares?

Forming squares the condition number: , which doubles the digits you lose. QR () and SVD () solve the least-squares problem without ever forming , so they work with the original . and are orthogonal (), so they never amplify error.

Q: Why is the Hilbert matrix so ill-conditioned despite having only innocent-looking unit-fraction entries?

Its columns are nearly linearly dependent — it is the Gram matrix of the monomials on , and high-degree monomials look almost identical there. Near-parallel columns make nearly zero, so is enormous (growing exponentially with size, reaching at ).

Q: What does it mean for an algorithm to be *backward stable*?

The computed answer is the exact solution of a slightly perturbed problem — e.g., with . In words: the right answer to nearly the right question. Combined with conditioning, it gives forward error .

DataField.Dev

Quiz — Chapter 38: Numerical Linear Algebra

Twelve conceptual checks. Try each before opening the answer. One-line explanations follow.

Q1. What does the condition number $\kappa(A)$ measure?

Answer

The worst-case factor by which a *relative* error in the input ($\mathbf{b}$, or $A$ itself) is amplified into a *relative* error in the solution $\mathbf{x}$. In the 2-norm it equals $\sigma_{\max}/\sigma_{\min}$ from the SVD. *It is a property of the problem, not of how you solve it.*

Q2. A matrix has $\kappa(A) = 10^{10}$. Roughly how many of your ~16 double-precision digits should you expect to lose when solving $A\mathbf{x}=\mathbf{b}$?

Answer

About $\log_{10}(10^{10}) = 10$ digits, leaving roughly 6 trustworthy digits. The rule of thumb: you lose about $\log_{10}\kappa(A)$ digits.

Q3. True or false: a tiny residual $\lVert A\hat{\mathbf{x}} - \mathbf{b}\rVert$ guarantees a tiny error $\lVert\hat{\mathbf{x}} - \mathbf{x}_{\text{true}}\rVert$.

Answer

**False.** A small residual only certifies that $\hat{\mathbf{x}}$ solves a *nearby* problem (backward stability). Whether the nearby problem has a nearby answer depends on conditioning: (relative error) $\le \kappa(A) \times$ (relative residual). For the Hilbert matrix, the residual is $\sim 10^{-16}$ while the error can be near 1.

Q4. Why is $0.1 + 0.2 == 0.3$ False in Python (and essentially every language)?

Answer

None of $0.1$, $0.2$, $0.3$ is representable exactly in binary floating point — each is an infinite repeating binary fraction, rounded on input. The rounded sum lands one machine tick above the rounded $0.3$. The error ($\sim 5.6\times 10^{-17}$) is tiny, but `==` demands *exact* equality, which is the wrong question for floats. Compare with a tolerance instead.

Q5. What is machine epsilon for IEEE double precision, and what does it represent?

Answer

$\varepsilon_{\text{mach}} = 2^{-52} \approx 2.22\times 10^{-16}$. It is the gap between $1$ and the next larger representable double — equivalently, the smallest number you can add to $1$ and get something different from $1$. It bounds the relative rounding error of a single arithmetic operation (which is at most $\tfrac12\varepsilon_{\text{mach}}$).

Q6. Why do we pivot in LU decomposition, in numerical terms?

Answer

A tiny pivot forces a huge multiplier ($1/\text{pivot}$), and multiplying a row by a huge number swamps its other entries, losing them to rounding — *even when the matrix is perfectly well-conditioned*. Partial pivoting puts the largest available entry on the diagonal, keeping multipliers $\le 1$ and making LU backward stable. Pivoting fixes an *algorithm* instability, not a *problem* ill-conditioning.

Q7. Why do QR and SVD beat the normal equations for least squares?

Answer

Forming $A^{\mathsf{T}}A$ squares the condition number: $\kappa(A^{\mathsf{T}}A) = \kappa(A)^2$, which doubles the digits you lose. QR ($A=QR$) and SVD ($A=U\Sigma V^{\mathsf{T}}$) solve the least-squares problem without ever forming $A^{\mathsf{T}}A$, so they work with the original $\kappa(A)$. $Q$ and $U$ are orthogonal ($\kappa=1$), so they never amplify error.

Q8. What is catastrophic cancellation, and what operation causes it?

Answer

The loss of significant digits when *subtracting two nearly-equal numbers*: the matching leading digits cancel, exposing the rounding error that lived in the trailing digits. The subtraction doesn't create error — it deletes the good digits in front of the existing error. The fix is algebraic (rewrite to avoid the subtraction), e.g., the stable form of the quadratic formula.

Q9. Why is the Hilbert matrix so ill-conditioned despite having only innocent-looking unit-fraction entries?

Answer

Its columns are *nearly linearly dependent* — it is the Gram matrix of the monomials $1, t, t^2, \ldots$ on $[0,1]$, and high-degree monomials look almost identical there. Near-parallel columns make $\sigma_{\min}$ nearly zero, so $\kappa = \sigma_{\max}/\sigma_{\min}$ is enormous (growing exponentially with size, reaching $\sim 10^{16}$ at $n=12$).

Q10. What does it mean for an algorithm to be backward stable?

Answer

The computed answer $\hat{\mathbf{x}}$ is the *exact* solution of a slightly perturbed problem — e.g., $(A+\Delta A)\hat{\mathbf{x}} = \mathbf{b}$ with $\lVert\Delta A\rVert/\lVert A\rVert \sim \varepsilon_{\text{mach}}$. In words: the right answer to nearly the right question. Combined with conditioning, it gives forward error $\lesssim \kappa(A)\,\varepsilon_{\text{mach}}$.

Q11. When should you use an iterative method (Jacobi, Gauss–Seidel, conjugate gradient) instead of a direct method (LU, QR)?

Answer

When the matrix is *large and sparse* — millions of rows but mostly zeros (finite-element models, fluid simulations, web graphs). Direct methods would need to store and factor a dense matrix that doesn't fit in memory; iterative methods only need cheap sparse matrix-vector products and stop when "close enough."

Q12. When you call np.linalg.solve(A, b), what is actually doing the work, and which algorithm does it use?

Answer

LAPACK (compiled Fortran, often via an optimized BLAS like OpenBLAS or MKL) — specifically a *pivoted LU* decomposition followed by triangular solves (`*gesv`). The stable algorithm choice is already made for you. This is also why a from-scratch toolkit that skips pivoting, or computes the SVD via $A^{\mathsf{T}}A$, can disagree with numpy in the last digits: production software is the math plus decades of stability engineering.