Case Study 2 — Calibrating a Sensor: Least Squares Turns Voltage into Temperature

DataField.Dev

Field: engineering / experimental physics (instrumentation). Concepts used: overdetermined systems, design matrix, normal equations, projection onto $C(A)$, residual norm, $R^2$, polynomial regression, model selection. Why it matters: every sensor on Earth — the thermometer in your thermostat, the pressure transducer in a jet engine, the strain gauge on a bridge — outputs a raw electrical signal that must be converted to a physical quantity. That conversion rule is found by least squares, and its residuals define the instrument's accuracy. Calibration is regression.

The problem: a sensor speaks in volts, not degrees

A thermistor is a cheap, rugged temperature sensor whose electrical behavior changes with heat; wired into a simple circuit, it produces an output voltage that rises as the temperature rises. But the sensor does not know what a "degree" is — it only emits volts. To make it useful, an engineer must find the rule that converts voltage $V$ to temperature $T$. The cleanest such rule is linear, $$ T \approx c_0 + c_1 V, $$ where $c_1$ is the sensor's sensitivity (degrees per volt) and $c_0$ is an offset. Finding $c_0$ and $c_1$ is calibration: place the sensor in baths of known temperature, record the voltage at each, and fit the best line through the (voltage, temperature) pairs. With more calibration points than parameters — which we always want, to average out measurement noise — this is an overdetermined system, and the best line is the least-squares projection of the temperature vector onto the column space of the voltage design matrix.

Suppose we run the thermistor through six reference baths and record:

Bath	Voltage $V$ (V)	True temp $T$ (°C)
1	0.50	10.2
2	0.95	20.5
3	1.55	35.1
4	2.05	47.8
5	2.60	61.0
6	3.10	73.5

The points look very nearly straight — as a well-behaved sensor should — but not perfectly straight, because each temperature reading carries small measurement error. No single line passes through all six points, so we fit the best one.

Setting up the projection

Each calibration point gives one equation $c_0 + c_1 V_i = T_i$. Stack them: the design matrix $A$ has an all-ones column (for the offset $c_0$) and a voltage column (for the sensitivity $c_1$); the target $\mathbf{b}$ holds the six known temperatures. $$ A = \begin{bmatrix} 1 & 0.50 \\ 1 & 0.95 \\ 1 & 1.55 \\ 1 & 2.05 \\ 1 & 2.60 \\ 1 & 3.10 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} 10.2 \\ 20.5 \\ 35.1 \\ 47.8 \\ 61.0 \\ 73.5 \end{bmatrix}, \qquad \mathbf{x} = \begin{bmatrix} c_0 \\ c_1 \end{bmatrix}. $$ This is a $6 \times 2$ tall system. The column space $C(A)$ is a 2-dimensional plane in $\mathbb{R}^6$ — every temperature vector a straight-line calibration could produce. The actual temperature vector $\mathbf{b}$ lies just slightly off that plane (because of the small reading errors), and least squares drops the perpendicular: it finds the line whose six predicted temperatures collectively sit closest to the six true temperatures.
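
The claim that $\mathbf{b}$ sits off the plane $C(A)$ can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names `rank_A` and `rank_Ab` are illustrative. If appending $\mathbf{b}$ to $A$ as a third column raises the rank from 2 to 3, then $\mathbf{b}$ is not a combination of the columns of $A$ and no exact line exists.

```python
import numpy as np

V = np.array([0.50, 0.95, 1.55, 2.05, 2.60, 3.10])   # volts
T = np.array([10.2, 20.5, 35.1, 47.8, 61.0, 73.5])   # degrees C

A = np.column_stack([np.ones_like(V), V])             # 6x2 design matrix [1, V]

rank_A = np.linalg.matrix_rank(A)                          # dimension of C(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, T]))   # rank of [A | b]

print("shape of A:", A.shape)
print("rank(A) =", rank_A, "  rank([A|b]) =", rank_Ab)
print("exact solution exists:", rank_A == rank_Ab)
```

A rank of 2 for $A$ confirms that the two columns are independent, so the least-squares solution is unique.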

Solving the normal equations

The Gram matrix and right-hand side are quick to assemble, and the $2\times 2$ system is the same kind we solved by hand in §17.6.

# Linear sensor calibration: fit T = c0 + c1 * V by least squares.
import numpy as np
V = np.array([0.50, 0.95, 1.55, 2.05, 2.60, 3.10])   # volts
T = np.array([10.2, 20.5, 35.1, 47.8, 61.0, 73.5])   # degrees C
A = np.column_stack([np.ones_like(V), V])             # design matrix [1, V]

c = np.linalg.solve(A.T @ A, A.T @ T)                 # normal equations (2x2)
print("A^T A =\n", A.T @ A)            # [[ 6.    10.75 ], [10.75 24.1275]]
print("A^T b =", np.round(A.T @ T, 2)) # [248.1  563.42]
print("c0, c1 =", np.round(c, 4))      # [-2.4221 24.431 ]

r = T - A @ c
R2 = 1 - (r @ r) / np.sum((T - T.mean())**2)
print("max |residual| (deg C) =", round(np.max(np.abs(r)), 3))  # 0.407
print("R^2 =", round(R2, 6))                                    # 0.999852
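
As a worked check on the printed values, the same $2\times 2$ system can be solved by hand. Cramer's rule is used here only as one convenient route; elimination gives the same answer. $$ \begin{bmatrix} 6 & 10.75 \\ 10.75 & 24.1275 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} 248.1 \\ 563.42 \end{bmatrix}, \qquad \det(A^{\mathsf{T}}A) = 6(24.1275) - 10.75^2 = 29.2025, $$ $$ c_1 = \frac{6(563.42) - 10.75(248.1)}{29.2025} = \frac{713.445}{29.2025} \approx 24.431, \qquad c_0 = \frac{248.1(24.1275) - 10.75(563.42)}{29.2025} = \frac{-70.73225}{29.2025} \approx -2.422. $$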

The fitted calibration is $$ \widehat{T} = -2.42 + 24.43\,V \quad (^{\circ}\mathrm{C}), $$ so the sensor's sensitivity is about 24.43 °C per volt with a small −2.42 °C offset. The fit is superb: $R^2 = 0.99985$, and the largest residual over all six baths is only about 0.41 °C. That worst-case residual, $\max_i |r_i|$, is the headline accuracy number an instrumentation engineer cares about: it says "trust this sensor to roughly half a degree across its calibrated range." The size of the residual vector $\mathbf{r}$, whether reported as this worst case or as the norm $\lVert\mathbf{r}\rVert$ that least squares minimizes, is the deliverable of a calibration, not the prettiness of the coefficients.
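
The accuracy figures for the linear fit can be summarized in more than one way. This is a small sketch, assuming NumPy; the names `two_norm`, `max_abs` and `rms` are illustrative, and which summary to quote on a datasheet is a convention choice rather than something the fit dictates.

```python
import numpy as np

V = np.array([0.50, 0.95, 1.55, 2.05, 2.60, 3.10])   # volts
T = np.array([10.2, 20.5, 35.1, 47.8, 61.0, 73.5])   # degrees C
A = np.column_stack([np.ones_like(V), V])             # design matrix [1, V]

c, *_ = np.linalg.lstsq(A, T, rcond=None)             # least-squares coefficients
r = T - A @ c                                          # residual vector

two_norm = np.linalg.norm(r)           # ||r||_2, the quantity least squares minimizes
max_abs = np.linalg.norm(r, np.inf)    # worst-case error over the six baths
rms = two_norm / np.sqrt(len(r))       # root-mean-square error per bath

print("||r||_2   = %.3f deg C" % two_norm)
print("max |r_i| = %.3f deg C" % max_abs)
print("RMS error = %.3f deg C" % rms)
```

The worst-case value is never smaller than the RMS value, so quoting it is the more conservative choice.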

With the calibration in hand, converting a new reading is one line. A fresh measurement of $V = 1.80$ V becomes:

print("V = 1.80 V  ->  T = %.2f C" % (np.array([1, 1.80]) @ c))   # 41.55 C

— about 41.6 °C. The sensor now speaks in degrees. Geometrically, we evaluated the fitted line (the projection of the calibration temperatures onto $C(A)$) at a new voltage and read off the predicted temperature.
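
In practice the conversion is usually wrapped in a small function. The sketch below is one possible shape, not a prescribed interface: `volts_to_celsius` is a hypothetical helper, and its refusal to convert readings outside the calibrated voltage span is an added assumption, motivated by the fact that the accuracy figure was only established between the lowest and highest bath.

```python
import numpy as np

V = np.array([0.50, 0.95, 1.55, 2.05, 2.60, 3.10])   # volts
T = np.array([10.2, 20.5, 35.1, 47.8, 61.0, 73.5])   # degrees C
A = np.column_stack([np.ones_like(V), V])             # design matrix [1, V]

c0, c1 = np.linalg.lstsq(A, T, rcond=None)[0]         # fitted offset and sensitivity
V_MIN, V_MAX = float(V.min()), float(V.max())         # calibrated voltage span


def volts_to_celsius(v):
    """Hypothetical helper: apply the fitted line, refusing to extrapolate."""
    if not (V_MIN <= v <= V_MAX):
        raise ValueError(
            f"{v} V is outside the calibrated range [{V_MIN}, {V_MAX}] V"
        )
    return c0 + c1 * v


print("V = 1.80 V  ->  T = %.2f C" % volts_to_celsius(1.80))
```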

Should we add a quadratic term?

Real sensors are rarely exactly linear; physics often introduces gentle curvature. A natural next question — and a chance to use §17.7 — is whether a quadratic calibration $T = c_0 + c_1 V + c_2 V^2$ fits meaningfully better. We add a $V^2$ column to the design matrix (making it $6\times 3$) and re-project.

# Quadratic calibration: add a V^2 column; same projection, solved here with np.linalg.lstsq.
Aq = np.column_stack([np.ones_like(V), V, V**2])      # [1, V, V^2]
cq, *_ = np.linalg.lstsq(Aq, T, rcond=None)
rq = T - Aq @ cq
print("quad coeffs =", np.round(cq, 4))               # [-1.7824 23.4772  0.2659]
print("quad R^2 =", round(1 - (rq @ rq)/np.sum((T-T.mean())**2), 6))  # 0.999917
print("quad max|resid| =", round(np.max(np.abs(rq)), 3))             # 0.337

The quadratic fit nudges $R^2$ from $0.99985$ to $0.99992$ and the worst-case residual from $0.41$ °C down to $0.34$ °C: a real but tiny improvement. This is exactly the model-selection judgment §17.7 warned about: adding the $V^2$ column enlarges $C(A)$, so the residual can only shrink, but here it shrinks almost imperceptibly. The small quadratic coefficient ($c_2 \approx 0.27$) confirms the sensor is very nearly linear over this range. A good engineer would keep the linear calibration: it is simpler, its coefficients have clear physical meaning, and the roughly $0.07$ °C of worst-case error it gives up is small next to the half-degree accuracy already quoted. Reaching for the quadratic here would risk chasing noise, fitting the measurement errors rather than the physics: the over-fitting trap in miniature.
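
The statement that enlarging $C(A)$ can only shrink the residual is easy to watch happen. The sketch below, assuming NumPy and using `np.vander` to build the polynomial design matrices, fits degrees 1 through 5 to the same six points. At degree 5 there are six coefficients for six points, so the polynomial interpolates the data and the residual falls to rounding error, which reflects a fit to the measurement errors and says nothing about the sensor.

```python
import numpy as np

V = np.array([0.50, 0.95, 1.55, 2.05, 2.60, 3.10])   # volts
T = np.array([10.2, 20.5, 35.1, 47.8, 61.0, 73.5])   # degrees C

ssr = []                                               # sum of squared residuals per degree
for degree in range(1, 6):
    A_d = np.vander(V, degree + 1, increasing=True)   # columns 1, V, ..., V^degree
    coef, *_ = np.linalg.lstsq(A_d, T, rcond=None)
    resid = T - A_d @ coef
    ssr.append(float(resid @ resid))
    print(f"degree {degree}: SSR = {ssr[-1]:.3e}")
```

A shrinking residual alone is therefore not evidence of a better calibration; the size of the improvement has to be weighed against the extra parameters.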

Why this is the four-subspaces picture at work

Step back and notice that nothing in this calibration used statistics beyond the word "noise." Everything was linear algebra: a tall system with no exact solution, a column space of achievable calibrations, a data vector hovering just off it, and a perpendicular dropped to find the closest line. The orthogonality condition $A^{\mathsf{T}}\mathbf{r} = \mathbf{0}$ says the residual lies in the left null space $N(A^{\mathsf{T}})$, the orthogonal complement of $C(A)$. Concretely, the residuals sum to zero (orthogonality to the all-ones column) and carry no remaining linear trend against voltage (orthogonality to the voltage column): the line has extracted all the straight-line information the data contains, and what is left is scatter this model cannot reduce.
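
The orthogonality condition can be verified directly on the calibration data. This is a minimal sketch, assuming NumPy; `p` denotes the projection $A\widehat{\mathbf{x}}$, a name introduced here for illustration. Both entries of $A^{\mathsf{T}}\mathbf{r}$ should vanish up to floating-point rounding, and the right-angle split gives $\lVert\mathbf{b}\rVert^2 = \lVert\mathbf{p}\rVert^2 + \lVert\mathbf{r}\rVert^2$.

```python
import numpy as np

V = np.array([0.50, 0.95, 1.55, 2.05, 2.60, 3.10])   # volts
T = np.array([10.2, 20.5, 35.1, 47.8, 61.0, 73.5])   # degrees C
A = np.column_stack([np.ones_like(V), V])             # design matrix [1, V]

c, *_ = np.linalg.lstsq(A, T, rcond=None)
p = A @ c                                              # projection of b onto C(A)
r = T - p                                              # residual, in N(A^T)

print("A^T r =", A.T @ r)                              # both entries ~ 0 up to rounding
print("||b||^2         =", T @ T)
print("||p||^2+||r||^2 =", p @ p + r @ r)              # matches ||b||^2 (Pythagoras)
```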

The same procedure scales across all of instrumentation. A pressure transducer is calibrated against a deadweight tester the identical way; an accelerometer against a shaker table; a pH probe against buffer solutions. Whenever the response is approximately polynomial in the raw signal, the design matrix is the Vandermonde matrix of §17.7 and the calibration constants are a least-squares projection. When the relationship is genuinely nonlinear (a thermocouple's voltage, say, follows a high-order polynomial of temperature), engineers fit a higher-degree polynomial — still linear least squares, still a projection onto a column space — being careful, as here, not to let the degree run away into over-fitting. The condition-number caution of §17.9 becomes practical at this point: high-degree Vandermonde matrices are notoriously ill-conditioned, so production calibration software fits them via QR or with orthogonal polynomials rather than by inverting $A^{\mathsf{T}}A$. The geometry is the constant; only the columns and the solver change. For the statistical machinery layered on top — confidence intervals on the calibration constants and tests of fit — see regression in statistics, and for the broader data-fitting toolkit, linear regression.
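
The conditioning caution can be made concrete on the thermistor data. The sketch below assumes NumPy and shows one common alternative to the normal equations, a reduced QR factorization; it is an illustration of the idea, not a description of any particular calibration package. Forming $A^{\mathsf{T}}A$ squares the condition number of $A$, which is harmless at degree 1 but grows quickly as columns of higher powers are added.

```python
import numpy as np

V = np.array([0.50, 0.95, 1.55, 2.05, 2.60, 3.10])   # volts
T = np.array([10.2, 20.5, 35.1, 47.8, 61.0, 73.5])   # degrees C

# Linear fit two ways: QR factorization versus normal equations.
A = np.vander(V, 2, increasing=True)                  # columns 1, V
Q, R = np.linalg.qr(A)                                # reduced QR: Q is 6x2, R is 2x2
c_qr = np.linalg.solve(R, Q.T @ T)                    # solve R c = Q^T b
c_ne = np.linalg.solve(A.T @ A, A.T @ T)              # normal equations
print("QR and normal equations agree:", np.allclose(c_qr, c_ne))

# Condition numbers of A and A^T A as the polynomial degree grows.
for degree in (1, 3, 5):
    A_d = np.vander(V, degree + 1, increasing=True)
    cond_A = np.linalg.cond(A_d)
    cond_gram = np.linalg.cond(A_d.T @ A_d)
    print(f"degree {degree}: cond(A) = {cond_A:.3e}, cond(A^T A) = {cond_gram:.3e}")
```

For the two-parameter calibration both routes return the same constants; the difference only matters once the design matrix becomes poorly conditioned.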