Case Study 21.2 — Rotating Data Without Distorting It: Decorrelation and Whitening

Field: data science & machine learning (non-physics). Forward links: the orthogonal matrix here is the eigenvector matrix from the spectral theorem (Chapter 27); the full pipeline becomes Principal Component Analysis (Chapter 32).

The problem

A data scientist at a credit-scoring firm has a dataset where two features — say, annual income and total credit limit — are strongly correlated. Correlated features are a nuisance: they make many algorithms unstable, they obscure which direction in the data actually carries information, and they confuse distance-based methods. The standard fix is to rotate the data into new axes along which the features are uncorrelated — a step called decorrelation, and when followed by rescaling, whitening. The transformation that does the rotating is an orthogonal matrix, and this case study is about why that choice is exactly right.

The non-negotiable requirement is this: rotating the data must not change the relationships between data points. Two customers who were similar (close together) must stay equally close after the rotation; two who were dissimilar must stay equally far. If the transformation distorted distances, every downstream model — clustering, $k$-nearest-neighbors, anomaly detection — would be quietly corrupted, because those models read meaning directly from distances. We need a transformation that re-expresses the data in better coordinates while preserving its geometry exactly. That is precisely the defining property of an orthogonal matrix (§21.3–21.4).

Distances survive a rotation — exactly

First, the guarantee that makes everything else legitimate. Apply a rotation $Q$ to every data point (each row of the data matrix). Because $Q$ is orthogonal, it preserves the length of every vector, and therefore the distance between every pair of points: $\lVert Q\mathbf{x}_i - Q\mathbf{x}_j\rVert = \lVert Q(\mathbf{x}_i - \mathbf{x}_j)\rVert = \lVert\mathbf{x}_i - \mathbf{x}_j\rVert$. The middle step is linearity; the last is §21.3. Let us confirm it on a handful of points:

# A rotation preserves every pairwise distance between data points — exactly.
import numpy as np
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))                       # five 2-D data points (rows)
theta = np.deg2rad(37)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Xr = X @ Q.T                                          # rotate each row

def pairwise(M):
    return np.array([np.linalg.norm(M[i] - M[j])
                     for i in range(len(M)) for j in range(i+1, len(M))])

print("original distances:", np.round(pairwise(X), 4))
print("rotated  distances:", np.round(pairwise(Xr), 4))
print("all preserved:", np.allclose(pairwise(X), pairwise(Xr)))
original distances: [0.5666 0.8253 1.5978 1.4044 1.2038 1.0722 1.9195 1.9306 1.6357 2.9877]
rotated  distances: [0.5666 0.8253 1.5978 1.4044 1.2038 1.0722 1.9195 1.9306 1.6357 2.9877]
all preserved: True

Every pairwise distance agrees to four decimals. Mathematically the equality is exact; in floating-point arithmetic the two computations agree up to rounding error, which is why the check uses np.allclose rather than strict equality. The rotation has reoriented the cloud of points without deforming it at all. This is the green light: we may rotate the data into any orientation we like, confident that its internal geometry is untouched.
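
The distance guarantee rests on the single identity $Q^{\mathsf{T}}Q = I$. As a minimal sketch, reusing the same five points and the same 37-degree rotation, the check below confirms that identity and shows that the whole matrix of inner products between points is unchanged, which covers lengths and angles as well as distances. The variable names `orthogonal` and `gram_same` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))                   # five 2-D data points (rows)
theta = np.deg2rad(37)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Xr = X @ Q.T                                      # rotate each row

# Defining property of an orthogonal matrix: Q^T Q = I (up to rounding).
orthogonal = np.allclose(Q.T @ Q, np.eye(2))

# Gram matrices hold every inner product x_i . x_j; if they match, all
# lengths, angles, and pairwise distances match too.
gram_same = np.allclose(X @ X.T, Xr @ Xr.T)

print("Q^T Q = I:", orthogonal)
print("inner products preserved:", gram_same)
```

Since $\lVert\mathbf{x}_i - \mathbf{x}_j\rVert^2$ can be written entirely in terms of inner products, an unchanged Gram matrix implies unchanged distances.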

Finding the right rotation: the eigenvectors of the covariance

Which rotation decorrelates the features? The directions we want are the eigenvectors of the covariance matrix $C$. By the spectral theorem, which we prove in full in Chapter 27, the eigenvectors of a symmetric matrix such as $C$ can be chosen to form an orthonormal set, so stacking them as columns produces an orthogonal matrix $Q$. Rotating the data into those eigen-axes replaces the covariance $C$ with $Q^{\mathsf{T}} C Q = \Lambda$, the diagonal matrix of eigenvalues, so the new features are uncorrelated (every off-diagonal covariance is zero):

# The covariance eigenvectors form an orthogonal matrix; rotating into them decorrelates.
import numpy as np
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[3, 1.5], [1.5, 1]], size=1000)   # correlated
cov = np.cov(data.T)
evals, evecs = np.linalg.eigh(cov)                    # symmetric -> orthonormal eigenvectors

print("eigenvector matrix orthogonal?", np.allclose(evecs.T @ evecs, np.eye(2)))
print("det(eigenvectors) =", round(np.linalg.det(evecs), 4))

rot = data @ evecs                                    # rotate data into the eigen-axes
print("covariance AFTER rotation:\n", np.round(np.cov(rot.T), 4))
eigenvector matrix orthogonal? True
det(eigenvectors) = -1.0
covariance AFTER rotation:
 [[ 0.1899 -0.    ]
 [-0.      3.9684]]

Two things to notice. First, the eigenvector matrix is orthogonal, $Q^{\mathsf{T}}Q = I$, exactly as this chapter's theory and the spectral theorem (Chapter 27) require. Second, its determinant here is $-1$, which means NumPy handed us an improper orthogonal matrix (one that includes a reflection) rather than a pure rotation. That is perfectly fine: the sign of each eigenvector is arbitrary, and a reflection is just as much an isometry as a rotation (§21.5), so it preserves distances and decorrelates the data equally well; the only difference is a flip of one axis, which does not affect any distance-based analysis. (If your application specifically needs a proper rotation, flip the sign of one eigenvector to force $\det = +1$; the move costs nothing geometrically.) After the orthogonal transformation, the off-diagonal covariance is zero to four decimals: the features are decorrelated, and the two diagonal entries ($\approx 0.19$ and $\approx 3.97$) are the variances along the new axes, which are the eigenvalues of the covariance.
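
The sign flip mentioned in the parenthesis can be sketched as follows. The helper `as_proper_rotation` is a hypothetical name introduced here for illustration, not a NumPy function. It negates one column only when the determinant is negative, so it works whichever sign convention the eigensolver happens to return.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[3, 1.5], [1.5, 1]], size=1000)
evals, evecs = np.linalg.eigh(np.cov(data.T))

def as_proper_rotation(V):
    """Hypothetical helper: return a copy of orthogonal V with det = +1."""
    V = V.copy()
    if np.linalg.det(V) < 0:
        V[:, 0] *= -1        # a negated eigenvector is still an eigenvector
    return V

R = as_proper_rotation(evecs)
det_R = np.linalg.det(R)
cov_rot = np.cov((data @ R).T)                    # still diagonal after the flip

print("det(R) =", round(det_R, 4))
print("still orthogonal?", np.allclose(R.T @ R, np.eye(2)))
print("still decorrelated?", np.allclose(cov_rot, np.diag(evals)))
```

Flipping a column changes neither the orthogonality of the matrix nor the variances along the new axes, so the decorrelated covariance is the same diagonal matrix as before.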

Why orthogonal, and not just any invertible matrix?

You could decorrelate with a non-orthogonal matrix — there are infinitely many invertible matrices that diagonalize a covariance. But almost all of them distort distances, and that distortion silently rewrites the meaning of your data. An orthogonal change of basis is the unique kind that re-expresses the data in new coordinates while guaranteeing that lengths, angles, and pairwise distances are all preserved exactly. It is the data scientist's "safe move": you get to choose friendlier axes — uncorrelated, ranked by variance — without paying any price in geometric fidelity. This is the same reason the discrete Fourier transform (§21.11) can safely move a signal to the frequency domain, and the same reason the SVD (Chapter 30) can decompose any matrix using orthogonal factors. Rotate into good coordinates, keep all the geometry.
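
To make the contrast concrete, here is a small sketch under stated assumptions. The matrix `M` is a hypothetical non-orthogonal alternative built by stretching the eigenvector columns by arbitrary factors (2 and 0.5 are illustrative choices only). It diagonalizes the covariance just as well as the eigenvector matrix does, but it changes the distances between points.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[3, 1.5], [1.5, 1]], size=1000)
evals, evecs = np.linalg.eigh(np.cov(data.T))

# Hypothetical non-orthogonal decorrelator: eigenvector columns rescaled by
# arbitrary nonzero factors (illustrative values, not from the text).
M = evecs @ np.diag([2.0, 0.5])

def pairwise(P):
    """All pairwise Euclidean distances between the rows of P."""
    return np.array([np.linalg.norm(P[i] - P[j])
                     for i in range(len(P)) for j in range(i + 1, len(P))])

sample = data[:5]                                 # a handful of points to compare
cov_M = np.cov((data @ M).T)
offdiag_zero = np.isclose(cov_M[0, 1], 0.0)       # M also decorrelates
M_orthogonal = np.allclose(M.T @ M, np.eye(2))    # but M is not orthogonal
dist_orth_ok = np.allclose(pairwise(sample), pairwise(sample @ evecs))
dist_M_ok = np.allclose(pairwise(sample), pairwise(sample @ M))

print("M decorrelates:", offdiag_zero)
print("M orthogonal:", M_orthogonal)
print("distances preserved by evecs:", dist_orth_ok)
print("distances preserved by M:", dist_M_ok)
```

Both matrices produce a diagonal covariance; only the orthogonal one leaves the pairwise distances of the sample unchanged.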

This pipeline (center the data, find the orthogonal eigenvector matrix of the covariance, rotate) is the heart of Principal Component Analysis, which we build in full in Chapter 32. Whitening adds one final step: after the orthogonal rotation, divide each new axis by the square root of its variance, so that every direction has unit variance and the data becomes "spherical." The rotation is the part that preserves geometry; the rescaling is the part that standardizes it, and it deliberately changes distances by stretching or shrinking each axis. The rotation must come first, and it must be orthogonal, because it is the only step in the pipeline guaranteed not to alter the relationships your models depend on.
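
The full sequence of center, rotate, and rescale can be sketched in a few lines. This is a minimal illustration on the same synthetic data, not a production implementation. It assumes every eigenvalue is comfortably above zero; when some variances are tiny, practitioners often add a small constant before taking the square root.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[3, 1.5], [1.5, 1]], size=1000)

centered = data - data.mean(axis=0)                  # 1. center each feature
evals, evecs = np.linalg.eigh(np.cov(centered.T))    # 2. orthonormal eigenvectors
rotated = centered @ evecs                           # 3. rotate (distances preserved)
whitened = rotated / np.sqrt(evals)                  # 4. rescale each axis to unit variance

cov_white = np.cov(whitened.T)
print("covariance after whitening:\n", np.round(cov_white, 4))
print("identity?", np.allclose(cov_white, np.eye(2)))
```

Step 3 is the orthogonal part; step 4 is a per-axis stretch, which is why the whitened covariance is the identity matrix while distances between points are no longer the original ones.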

Takeaways

  • Decorrelating or whitening data requires rotating it into new axes without distorting the relationships between points — exactly the job of an orthogonal matrix (an isometry, §21.3–21.4).
  • A rotation preserves every pairwise distance exactly, which is what makes it safe to apply before any distance-based model.
  • The decorrelating directions are the orthonormal eigenvectors of the covariance matrix (guaranteed orthonormal by the spectral theorem, Chapter 27); NumPy may return an eigenvector matrix with $\det = -1$ (a reflection), which is an equally valid isometry.
  • This "rotate with an orthogonal matrix, then optionally rescale" pattern is Principal Component Analysis (Chapter 32) and echoes the SVD (Chapter 30) — orthogonal transformations are the data scientist's trustworthy change of coordinates.