Case Study 30.2 — The Pseudoinverse: Solving the Unsolvable in Engineering

Field: engineering & scientific computing (sensor calibration, structural measurement, control). Connection: this case study turns the pseudoinverse of §30.10 into the everyday workhorse of engineering — the tool that solves systems with no exact solution, and the most numerically reliable way to do least squares, made trustworthy by the singular values (and the condition number that tees up Chapter 38).

The problem: more measurements than unknowns

Engineering is drowning in overdetermined systems — situations with more equations than unknowns, where no exact solution exists and you must find the best approximate one. A surveyor takes a dozen angle measurements to locate three points. A structural engineer reads six strain gauges to estimate three deflection parameters of a loaded beam. A GPS receiver hears from eight satellites to pin down four unknowns (three coordinates and a clock offset). In every case the measurements are noisy and slightly inconsistent, so the system $A\mathbf{x} = \mathbf{b}$ has no exact solution: there is no $\mathbf{x}$ that satisfies all the equations at once. Yet the engineer must still produce an answer — the single best estimate of the unknowns. This is the least-squares problem, and the singular value decomposition solves it completely, for any matrix, through the pseudoinverse $A^{+}$ (§30.10).

We met least squares as projection onto the column space in Chapter 19, and as line-fitting in Chapter 17. The SVD unifies and generalizes both: the pseudoinverse $A^{+} = V\Sigma^{+}U^{\mathsf{T}}$ produces the least-squares solution $\hat{\mathbf{x}} = A^{+}\mathbf{b}$ for every matrix $A$ — overdetermined or underdetermined, full-rank or rank-deficient. And because it is built from the SVD, it inherits the SVD's numerical reliability, which is why it is the method engineering software actually uses when accuracy matters.

Why is least squares — minimizing the sum of squared residuals — the right notion of "best"? The geometric answer is the one from Chapter 19, and the SVD makes it crisp. The vector $\mathbf{b}$ of measurements lives in $\mathbb{R}^m$ (one dimension per equation), but the achievable outputs $A\mathbf{x}$ fill only the column space of $A$ — a lower-dimensional subspace (here a 3-dimensional subspace of $\mathbb{R}^6$, spanned by the first three left singular vectors). When $\mathbf{b}$ lies off that subspace — as noisy measurements always do — there is no exact solution, and the closest reachable point is the orthogonal projection of $\mathbf{b}$ onto the column space. Minimizing $\lVert A\mathbf{x} - \mathbf{b}\rVert$ (the Euclidean distance, hence squared residuals) finds exactly that projection, and the pseudoinverse computes the $\mathbf{x}$ that lands there. So least squares is not an arbitrary choice of objective; it is the unique answer to "what is the nearest consistent system to the one my noisy data gave me?" The left singular vectors $\mathbf{u}_i$ are the orthonormal basis of the column space onto which the projection happens — the SVD and the projection of Chapter 19 are the same geometry.

A concrete example: fitting a plane to sensor readings

Suppose you are calibrating a flat surface — a machined plate, a solar panel, or the deflection field of a loaded floor. You have a sensor that reads a height $z$ at known $(x, y)$ positions, and you model the surface as a tilted plane $z = a + bx + cy$ with three unknown parameters $a, b, c$. You take six readings (more than the three unknowns, for robustness against noise) at positions $(0,0), (1,0), (0,1), (1,1), (2,1), (1,2)$, getting heights $\mathbf{z} = (1.1, 2.8, 4.15, 5.9, 8.05, 8.95)$. Each reading gives one equation $a + bx_i + cy_i = z_i$, so the system is $A\mathbf{x} = \mathbf{z}$ with the $6\times 3$ design matrix

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{bmatrix}, \qquad \mathbf{x} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}.$$

Six equations, three unknowns: overdetermined. Because the readings carry measurement noise, no plane passes exactly through all six points — the system has no exact solution. We want the plane that comes closest in the least-squares sense, minimizing $\sum_i (a + bx_i + cy_i - z_i)^2 = \lVert A\mathbf{x} - \mathbf{z}\rVert^2$. The pseudoinverse delivers it in one line.

# Least squares via the SVD pseudoinverse: fit a plane z = a + bx + cy to 6 readings.
import numpy as np
pts = np.array([[0,0], [1,0], [0,1], [1,1], [2,1], [1,2]], dtype=float)
A = np.column_stack([np.ones(len(pts)), pts[:, 0], pts[:, 1]])   # 6x3 design matrix
z = np.array([1.1, 2.8, 4.15, 5.9, 8.05, 8.95])                  # noisy height readings

U, S, Vt = np.linalg.svd(A, full_matrices=False)                 # reduced SVD
A_plus = Vt.T @ np.diag(1.0 / S) @ U.T                           # A⁺ = V Σ⁺ Uᵀ
coef = A_plus @ z                                                # least-squares [a, b, c]

print("singular values:", np.round(S, 4))
print("condition number:", round(np.linalg.cond(A), 4))
print("best-fit plane: z =", round(coef[0],4), "+", round(coef[1],4), "x +",
      round(coef[2],4), "y")
print("residual ‖A x̂ − z‖ =", round(np.linalg.norm(A @ coef - z), 4))
print("matches np.linalg.lstsq?", np.allclose(coef, np.linalg.lstsq(A, z, rcond=None)[0]))
singular values: [4.0843 1.4142 1.1484]
condition number: 3.5564
best-fit plane: z = 1.0295 + 1.9148 x + 3.0398 y
residual ‖A x̂ − z‖ = 0.2601
matches np.linalg.lstsq? True

The pseudoinverse recovers the best-fit plane $z \approx 1.03 + 1.91x + 3.04y$. (The data were generated from a true plane $z = 1 + 2x + 3y$ plus small measurement noise, so the estimate is satisfyingly close — the least-squares fit has averaged out the noise across the six readings.) The residual norm $0.26$ is the total mismatch that no plane could eliminate, and the answer matches numpy's dedicated lstsq solver exactly. Notice what the pseudoinverse did: it took an unsolvable system and returned the provably best approximate solution, with nothing more than the SVD.

Why the singular values certify the answer

Here is where the SVD does more than any other method could. Look at the singular values, $\{4.08, 1.41, 1.15\}$, and the condition number, $\kappa(A) = \sigma_1/\sigma_3 = 4.08/1.15 \approx 3.56$. That small condition number is a certificate of trust: it says the three columns of the design matrix (the three "directions" the measurements probe) are well separated, so the least-squares estimate is stable — small errors in the readings produce only small errors in $a, b, c$. The engineer can report the answer with confidence.

Contrast that with a poorly designed experiment. Suppose all your sensor positions were nearly collinear — clustered along a line instead of spread across the plane. Then two columns of $A$ would be nearly proportional, the smallest singular value would be tiny, and the condition number would explode:

# A badly designed experiment: nearly collinear sample points → ill-conditioned.
import numpy as np
B = np.array([[1, 1.000],
              [1, 1.001],
              [1, 0.999],
              [1, 1.0005]])                    # second column nearly constant
print("singular values:", np.round(np.linalg.svd(B, compute_uv=False), 4))
print("condition number:", round(np.linalg.cond(B), 1))
singular values: [2.8286e+00 1.0000e-03]
condition number: 2704.8

A condition number of $2705$ warns that the fit is dangerously sensitive: the data barely constrain the slope, so tiny measurement errors get amplified roughly $2700$-fold into the estimated parameters. The SVD has diagnosed a flaw in the experiment before any conclusion is drawn — it tells the engineer to spread the sensors out and re-measure. No other tool gives this warning so cleanly; the singular values turn "is my measurement design any good?" from intuition into a number (the theme we develop fully in Chapter 38).

Rank deficiency and the minimum-norm solution

The pseudoinverse handles an even harder case that defeats the normal equations entirely: a rank-deficient system, where the unknowns are not all determined by the data. Imagine two of your three plane parameters are not separately identifiable — perhaps every sensor sits at $x = y$, so the data can only determine $b + c$, not $b$ and $c$ individually. Then $A$ has rank 2, not 3; the normal-equations matrix $A^{\mathsf{T}}A$ is singular and cannot be inverted, and the least-squares problem has infinitely many solutions (any split of $b + c$ fits equally well).

The SVD pseudoinverse calmly returns the minimum-norm solution among those infinitely many — the one with the smallest $\lVert\mathbf{x}\rVert$, which spreads the under-determined quantity as evenly as possible and refuses to invent large, unsupported parameter values. This happens automatically: the rule "reciprocate the nonzero singular values, leave the zeros at zero" (§30.10) means the pseudoinverse never adds any component along the under-determined directions (the null space). Where Gaussian elimination throws an error and the normal equations blow up, the SVD gives the sanest possible answer. This is why control engineers, roboticists, and computer-graphics programmers reach for the pseudoinverse: redundant robot arms (more joints than task dimensions) and ill-posed inverse problems are exactly the rank-deficient case, and the minimum-norm pseudoinverse solution is the principled choice.

A caution that mirrors Case Study 30.1: when a singular value is small but nonzero, reciprocating it ($1/\sigma$ becomes huge) amplifies noise catastrophically. Production engineering code uses the truncated SVD pseudoinverse — discarding singular values below a tolerance — to trade a little bias for enormous stability. This is the regularization idea (kin to ridge regression and Tikhonov regularization) that makes least squares usable on real, noisy, ill-conditioned data, and it is invisible without the SVD.

Takeaways

  • Overdetermined systems (more measurements than unknowns) are everywhere in engineering and have no exact solution; the least-squares solution is the best approximation, minimizing $\lVert A\mathbf{x} - \mathbf{b}\rVert$.
  • The SVD pseudoinverse $A^{+} = V\Sigma^{+}U^{\mathsf{T}}$ gives the least-squares solution $\hat{\mathbf{x}} = A^{+}\mathbf{b}$ for any matrix — and is more numerically reliable than the normal equations, because it never forms the condition-number-squaring $A^{\mathsf{T}}A$ (§30.10).
  • The condition number $\sigma_1/\sigma_{\min}$ certifies the fit: small means a stable, trustworthy estimate; huge means the experiment is poorly designed and the answer is fragile (the diagnostic of Chapter 38).
  • For rank-deficient systems (under-determined unknowns), the pseudoinverse returns the minimum-norm solution automatically — the principled choice where the normal equations fail outright. This is the everyday tool of control, robotics, and inverse problems.
  • One decomposition, the SVD, unifies projection (Chapter 19), regression (Chapter 17), and the pseudoinverse — learn it once, use it everywhere, the theme of Part VI.