Case Study 2 — Denoising a Signal: Subtracting What You Can Project

DataField.Dev

Field: signal processing / audio. Where Case Study 1 used projection to keep the part of a vector inside a subspace, this one uses the complementary projector $I - P$ to remove a known contaminant — the everyday engineering act of subtracting hum, drift, and interference from a measurement.

The problem: a recording contaminated by known interference

You are recording a faint signal — say a sensor trace, an audio clip, or a biomedical waveform — in an electrically noisy room. Two contaminants are corrupting it, and crucially you know their shape:

A DC offset: a constant baseline added to every sample (a steady bias from the amplifier).
A low-frequency hum: a single-frequency interference (think of the $50$ or $60$ Hz buzz that mains electricity injects into everything), which over your sampling window looks like one cosine and one sine of a known frequency.

What you want is the underlying signal — here, for a clean demonstration, a component oscillating at a different frequency than the hum. The contaminants and the desired signal are tangled together in your recorded samples. The question is how to pull them apart, and the answer is orthogonal projection.

The geometry: the interference lives in a subspace

Model the recorded signal as a vector $\mathbf{y}\in\mathbb{R}^n$ of $n$ samples taken at sample times $t$. The key realization is that the known interference lives in a small, known subspace. A DC offset is any multiple of the constant vector $\mathbf{1} = (1,1,\dots,1)$. A hum at a fixed frequency $f$ is any combination of the sampled cosine $\cos(2\pi f\, t)$ and sine $\sin(2\pi f\, t)$. So every possible contaminant of these two kinds (a constant offset plus a sinusoid of frequency $f$) is a vector in the interference subspace $$S = \operatorname{span}\{\,\mathbf{1},\ \cos(2\pi f\,t),\ \sin(2\pi f\,t)\,\} = C(A),$$ where $A$ is the $n\times 3$ matrix whose columns are those three sampled waveforms. We do not know the amount of each contaminant (that depends on the room), but we know the directions, and that is enough.

Now the strategy is pure Chapter 19. The recorded signal splits orthogonally as $$\mathbf{y} = \underbrace{\mathbf{p}}_{\text{interference: in } S} + \underbrace{\mathbf{e}}_{\text{everything else: } \perp\, S}.$$ The projection $\mathbf{p} = P\mathbf{y}$ is the best reconstruction of the contaminant from its known shapes; the error $\mathbf{e} = (I - P)\mathbf{y}$ is what remains after the contaminant is subtracted. To denoise, we keep the error and throw away the projection — the exact opposite of regression, where we kept the projection and threw away the error. The complementary projector $I - P$ is the denoising filter.

The same projection idea, two opposite uses: regression keeps $\mathbf{p}$ (the part explained by the model) and discards $\mathbf{e}$; interference removal keeps $\mathbf{e}$ (the part not explained by the known junk) and discards $\mathbf{p}$. Which you keep depends on whether your subspace models the signal or the noise.
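The split $\mathbf{y} = \mathbf{p} + \mathbf{e}$ can be checked numerically before any denoising is attempted. The sketch below is illustrative only: it assumes eight samples, a hum at one cycle per window, and a random stand-in for the recording (the random data and the seed are assumptions, not part of the text). It finds the projection with `np.linalg.lstsq` rather than forming $P$ explicitly.

```python
import numpy as np

n = 8                                   # assumed window length
t = np.arange(n)

# Columns span the interference subspace S: constant, cosine, sine.
A = np.column_stack([
    np.ones(n),
    np.cos(2 * np.pi * t / n),
    np.sin(2 * np.pi * t / n),
])

# Stand-in recording: any vector in R^n splits the same way.
rng = np.random.default_rng(0)
y = rng.normal(size=n)

# Least squares gives the coefficients of the projection onto C(A).
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
p = A @ coeffs                          # component inside S
e = y - p                               # component orthogonal to S

print("p + e == y?        ", np.allclose(p + e, y))
print("A^T e == 0?        ", np.allclose(A.T @ e, 0))
print("|y|^2 == |p|^2+|e|^2?", np.isclose(y @ y, p @ p + e @ e))
```

All three checks should print `True`: the pieces add back to the recording, the residual is perpendicular to every column of $A$, and the squared lengths obey the Pythagorean relation.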

A pleasant gift: the interference basis is orthogonal

The three contaminant waveforms (the constant, the cosine, and the sine) are mutually orthogonal when they are sampled uniformly over a whole number of periods. This is the seed of the entire theory of Fourier series (Chapter 22), and here it makes the computation almost trivial: because the columns of $A$ are orthogonal, the Gram matrix $A^{\mathsf{T}}A$ is diagonal, so the normal equations decouple and recovering the interference amounts to three independent one-dimensional projections (exactly the simplification of §19.9). Let us watch it work on eight samples.

# Removing a known interference (DC + one-frequency hum) by projecting it out.
# (Chapter 19, Case Study 2)
import numpy as np
n = 8
t = np.arange(n)

# The KNOWN interference subspace: constant (DC) + cosine/sine at frequency k=1.
dc  = np.ones(n)
cos = np.cos(2 * np.pi * 1 * t / n)
sin = np.sin(2 * np.pi * 1 * t / n)
A = np.column_stack([dc, cos, sin])        # n-by-3 interference design matrix

# The CLEAN signal we want to recover: a different frequency (k=2).
clean = 2.0 * np.cos(2 * np.pi * 2 * t / n)
# The recording = clean signal + contamination (DC=3, +4*cos -1*sin at freq 1).
y = clean + 3 * dc + 4 * cos - 1 * sin

# Build the projection matrix onto the interference subspace and its complement.
P = A @ np.linalg.inv(A.T @ A) @ A.T
interference = P @ y                       # p : best reconstruction of the junk
cleaned      = (np.eye(n) - P) @ y         # e = (I - P) y : the recovered signal

print("A^T A (diagonal -> orthogonal columns):\n", np.round(A.T @ A, 6))
print("recovered interference coeffs:",
      np.round(np.linalg.solve(A.T @ A, A.T @ y), 6))   # [ 3.  4. -1.]
print("cleaned matches clean signal?", np.allclose(cleaned, clean))   # True
print("max |cleaned - clean| =", float(np.max(np.abs(cleaned - clean))))  # ~1e-15
print("trace(P) =", round(float(np.trace(P)), 6))      # 3.0  = dim of interference space
print("P^2 == P?", np.allclose(P @ P, P), "  P^T == P?", np.allclose(P, P.T))

The output confirms the story. The Gram matrix A^T A prints as the diagonal matrix $\operatorname{diag}(8, 4, 4)$ — the columns are orthogonal, just as the Fourier structure promises. The recovered interference coefficients are [3. 4. -1.], exactly the DC level, cosine amplitude, and sine amplitude we injected; the projection has perfectly identified the contaminant from its known shapes. The cleaned signal matches the true clean component to about $10^{-15}$ — machine precision. And trace(P) = 3.0 equals the dimension of the three-dimensional interference subspace, while P^2 == P and P^T == P confirm $P$ is a genuine orthogonal projection.
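Because the Gram matrix is diagonal, the same coefficients can be obtained without solving any linear system: each one is a single ratio of dot products, $(\mathbf{a}\cdot\mathbf{y})/(\mathbf{a}\cdot\mathbf{a})$. The following is a minimal sketch of that shortcut on the same eight-sample recording; the dictionary `basis` and the variable names are illustrative choices, not part of the original listing.

```python
import numpy as np

n = 8
t = np.arange(n)

# The three orthogonal interference directions (same as the columns of A).
basis = {
    "dc":  np.ones(n),
    "cos": np.cos(2 * np.pi * 1 * t / n),
    "sin": np.sin(2 * np.pi * 1 * t / n),
}

clean = 2.0 * np.cos(2 * np.pi * 2 * t / n)
y = clean + 3 * basis["dc"] + 4 * basis["cos"] - 1 * basis["sin"]

# Three independent one-dimensional projections: (a . y) / (a . a).
coeffs = {name: (a @ y) / (a @ a) for name, a in basis.items()}

# Subtract each projected component to obtain the cleaned signal.
interference = sum(c * basis[name] for name, c in coeffs.items())
cleaned = y - interference

print({name: round(float(c), 6) for name, c in coeffs.items()})
print("cleaned matches clean signal?", np.allclose(cleaned, clean))
```

This shortcut is valid only because the columns are orthogonal. With a non-orthogonal basis the one-dimensional projections would overlap, and the full normal equations would be needed.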

Why the recovery is exact here (and only approximate in general)

The denoising worked perfectly (not approximately, but to within floating-point rounding), and the reason is worth dwelling on, because it pinpoints exactly when subtraction-by-projection is lossless. The clean signal we wanted to keep was a frequency-$2$ cosine, and the interference subspace was spanned by frequency-$0$ (DC) and frequency-$1$ waveforms. By the orthogonality of Fourier components, the clean signal is orthogonal to the entire interference subspace: a frequency-$2$ wave dotted with a frequency-$1$ wave, or with the constant, sums to zero over a full window.

When the thing you want to keep is orthogonal to the thing you are removing, projection separates them with surgical precision. The recording $\mathbf{y} = \text{clean} + \text{interference}$ is already its own orthogonal decomposition relative to $S$: the interference is the entire $S$-component (so $P\mathbf{y} = \text{interference}$ exactly) and the clean signal is the entire $S^{\perp}$-component (so $(I-P)\mathbf{y} = \text{clean}$ exactly). No leakage, because the two pieces never overlapped in the first place.

If instead the signal you wanted to keep had shared a frequency with the interference — say the clean signal also had a frequency-$1$ component — then projecting out the frequency-$1$ subspace would have removed that part of the genuine signal too. The removal would be approximate: you would lose whatever fraction of the real signal happened to point into the interference subspace. This is the fundamental limit of subtraction-by-projection, and it is just the orthogonal decomposition being honest: $I - P$ removes the entire $S$-component, signal and noise alike. You can cleanly remove only what is orthogonal to what you wish to keep.

The lesson in one line. Projection-based denoising is exact precisely when signal $\perp$ noise; otherwise it removes the noise and the signal's shadow on the noise subspace. Designing a good filter means choosing an interference subspace that captures the contaminant while staying as orthogonal as possible to the true signal.
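The failure mode is easy to reproduce. The sketch below is a hypothetical variation on the eight-sample example: the wanted signal is given an extra frequency-$1$ cosine of amplitude $1.5$ (an assumed value chosen for illustration), so part of it lies inside the interference subspace.

```python
import numpy as np

n = 8
t = np.arange(n)
dc = np.ones(n)
c1 = np.cos(2 * np.pi * 1 * t / n)      # hum cosine (frequency 1)
s1 = np.sin(2 * np.pi * 1 * t / n)      # hum sine   (frequency 1)
c2 = np.cos(2 * np.pi * 2 * t / n)      # frequency-2 cosine
A = np.column_stack([dc, c1, s1])

# Wanted signal now overlaps the interference subspace (assumed amplitudes).
signal = 2.0 * c2 + 1.5 * c1
y = signal + 3 * dc + 4 * c1 - 1 * s1

# Project out the interference subspace, exactly as before.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
cleaned = y - A @ coeffs
lost = signal - cleaned                 # part of the true signal removed

print("fitted coefficients:", np.round(coeffs, 6))
print("only the frequency-2 part survives?", np.allclose(cleaned, 2.0 * c2))
print("fraction of signal energy lost:",
      round(float(lost @ lost) / float(signal @ signal), 6))
```

The fitted cosine coefficient comes out as $5.5$ rather than $4$: the projection cannot tell the hum's cosine from the signal's, so it attributes both to the interference and removes them together. The lost piece is exactly the signal's shadow on $S$, here $1.5\cos$ at frequency $1$.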

From this toy to real signal processing

The eight-sample demonstration is the seed of techniques used everywhere. Detrending a time series — removing a linear or polynomial drift before analysis — is projecting onto the span of $\{\mathbf{1}, t, t^2, \dots\}$ and keeping the residual; econometric and climate data are routinely detrended this way. Notch filtering mains hum out of audio or ECG recordings is removing the span of the offending sinusoids, exactly our $I - P$. Background subtraction in spectroscopy fits and removes a smooth baseline (a low-dimensional subspace of slow shapes) to expose sharp peaks. In each case the engineer identifies a subspace that the contaminant lives in and projects it away with $I - P$.
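Detrending follows the same recipe with a polynomial basis in place of the sinusoids. The sketch below is a toy under stated assumptions: the quadratic drift, the five-cycle oscillation, the 200-sample window, and the helper `project_out` are all invented for illustration. Note that a sinusoid is generally not orthogonal to low-degree polynomials, so the residual is not claimed to equal the oscillation exactly; what the code checks is that the drift is removed entirely and that the residual is perpendicular to the drift subspace.

```python
import numpy as np

n = 200
t = np.linspace(-1.0, 1.0, n)           # scaled time keeps the basis well conditioned

# Drift subspace: span{1, t, t^2}.
T = np.column_stack([np.ones(n), t, t**2])

wiggle = np.sin(2 * np.pi * 5 * t)      # assumed signal of interest
drift = 0.8 - 1.2 * t + 2.0 * t**2      # assumed slow contaminant
y = wiggle + drift

def project_out(basis, v):
    """Return (I - P) v, the part of v orthogonal to the columns of basis."""
    coeffs, *_ = np.linalg.lstsq(basis, v, rcond=None)
    return v - basis @ coeffs

detrended = project_out(T, y)

print("drift fully removed?       ", np.allclose(project_out(T, drift), 0))
print("residual orthogonal to T?  ", np.allclose(T.T @ detrended, 0))
print("same as detrending wiggle? ", np.allclose(detrended, project_out(T, wiggle)))
```

The last check expresses the linearity of $I - P$: detrending the contaminated recording gives the same result as detrending the oscillation alone, because the drift contributes nothing to the residual.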

The connection to Chapter 22 is direct and is the reason this case study sits where it does in the book. Computing the amount of each frequency present in a signal (the Fourier coefficients) is projecting the signal onto an orthonormal basis of sinusoids $\mathbf{q}_k$ (the sampled sinusoids scaled to unit length), one frequency at a time, with the simple formula $\mathbf{p} = \sum_k (\mathbf{q}_k\cdot\mathbf{y})\mathbf{q}_k$ from §19.9. Filtering is then nothing but keeping some of those projected components and discarding others. The Fast Fourier Transform is a fast algorithm for computing all of these projections at once, in $O(n\log n)$ rather than $O(n^2)$ operations. When you reach Chapter 22 and treat functions as vectors, you will recognize that you have already been doing Fourier analysis here, in finite dimensions and with three basis vectors, under the name "removing interference."
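The link to the FFT can be made concrete on the same eight-sample recording. The sketch below uses NumPy's `np.fft.rfft` and `np.fft.irfft`; the scaling factors ($1/n$ for the DC bin, $2/n$ for the frequency-$1$ bin) are what convert NumPy's unnormalized DFT bins into the projection coefficients used in this case study. Variable names are illustrative.

```python
import numpy as np

n = 8
t = np.arange(n)
dc = np.ones(n)
c1 = np.cos(2 * np.pi * 1 * t / n)
s1 = np.sin(2 * np.pi * 1 * t / n)
clean = 2.0 * np.cos(2 * np.pi * 2 * t / n)
y = clean + 3 * dc + 4 * c1 - 1 * s1

# One FFT computes the dot products with every sampled sinusoid at once.
Y = np.fft.rfft(y)                      # bins k = 0, 1, ..., n/2

# Rescale the bins into projection coefficients.
dc_amp  = Y[0].real / n                 # (1 . y) / (1 . 1)
cos_amp = 2 * Y[1].real / n             # (cos . y) / (cos . cos)
sin_amp = -2 * Y[1].imag / n            # (sin . y) / (sin . sin)
print("DC, cos, sin amplitudes:",
      np.round([dc_amp, cos_amp, sin_amp], 6))

# Filtering = discarding projected components: zero bins 0 and 1.
Y_filtered = Y.copy()
Y_filtered[[0, 1]] = 0
cleaned = np.fft.irfft(Y_filtered, n=n)
print("FFT-filtered matches clean signal?", np.allclose(cleaned, clean))
```

Zeroing bins $0$ and $1$ and inverting is the same operation as applying $I - P$ for this particular interference subspace, which is why the result agrees with the projection-based cleaning.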

Takeaway

Denoising by projection inverts the logic of regression: you model the noise as a low-dimensional subspace and use the complementary projector $I - P$ to subtract it, keeping the orthogonal remainder as your cleaned signal. The recovery is exact precisely when the signal is orthogonal to the noise subspace, because the orthogonal decomposition $\mathbf{y} = \mathbf{p} + \mathbf{e}$ then separates the two cleanly. The orthogonality of sinusoids that made our Gram matrix diagonal is the same orthogonality that powers Fourier analysis and image-compression algorithms, a thread the chapter ties forward to dimensionality reduction and the broader idea that the most informative representations of data are built from perpendicular directions.