Part VI — Matrix Decompositions

In Part V you learned to see a transformation's true character by finding the directions it leaves invariant. But that gift came with fine print: diagonalization works only for square matrices, and only when they have enough eigenvectors. Part VI removes the fine print entirely. Its centerpiece, the Singular Value Decomposition, applies to every matrix that exists — tall, wide, square, singular, real or complex — and it is, by a wide margin, the most useful single result in all of applied linear algebra. If you remember one theorem from this book a decade from now, make it the SVD.

The big question: is there a way to understand any matrix as a sequence of the simplest possible motions? The answer is a clean and almost startling yes. Every linear transformation, no matter how it distorts space, factors into exactly three steps: rotate, then stretch along perpendicular axes, then rotate again. That is the geometric content of $A = U\Sigma V^{\mathsf{T}}$ — $V^{\mathsf{T}}$ rotates the input, $\Sigma$ stretches each axis by a singular value, and $U$ rotates the result into place. Our 2D visualizer makes its final and most satisfying appearance here, decomposing a single deformation of the unit square into this rotate–stretch–rotate sequence you can watch frame by frame.

Chapter 30, The Singular Value Decomposition, builds $A = U\Sigma V^{\mathsf{T}}$ from the spectral theorem applied to $A^{\mathsf{T}}A$, proving it exists for every matrix and revealing the singular values as the true stretch factors of a transformation. Chapter 31, SVD Applications, delivers the book's signature "wow" moment — the image-compression anchor teased in Chapter 30. Keep only the largest few singular values and you get a low-rank approximation: a rank-10 image is a blurry ghost, but a rank-200 reconstruction is visually indistinguishable from the original, using a fraction of the data. The same low-rank idea denoises signals and fills in missing entries. Chapter 32, Principal Component Analysis, reveals that the workhorse of data science is the SVD in disguise: center your data, form its covariance, and the principal components — the directions of maximum variance — are exactly its singular directions. This is how high-dimensional data gets compressed to its essential few dimensions.

Part VI closes with Chapter 33, Application: Machine Learning, where these decompositions power the systems reshaping the world. A neural-network layer is a matrix multiplication followed by a nonlinearity; word and image embeddings are learned vectors in high-dimensional space; and the recommender systems behind streaming and shopping are, at their core, matrix factorizations — the very low-rank idea from Chapter 31, applied to a giant table of who-likes-what.

The theme linear algebra is the most applied branch of pure mathematics reaches full force here: one decomposition — the SVD — compresses images, denoises data, extracts principal components, solves least-squares, and underlies recommender systems. Learn it once, use it everywhere is not a slogan in this part; it is a literal description of what the SVD does. And the geometry-algebra unity holds to the end: a singular value is at once an entry of a diagonal matrix, the length of an axis of a deformed sphere, and the amount of variance captured along a principal direction.

A note on what makes this part click rather than merely compute. It is tempting to treat the SVD as a black box you call np.linalg.svd on. Resist that. The reward of Part VI is geometric — every matrix is a rotation, a set of independent stretches, and another rotation — and once you hold that picture, low-rank approximation, PCA, and recommender systems stop being separate tricks and reveal themselves as one idea seen from different angles.

By the end of Part VI you will be able to: compute and interpret the SVD $A=U\Sigma V^{\mathsf{T}}$ as rotate–stretch–rotate, for any matrix; construct low-rank approximations and apply them to compress images and denoise data; perform PCA and explain it as variance maximization along singular directions; and describe how neural-network layers, embeddings, and matrix-factorization recommenders rest on these decompositions. Your toolkit gains svd.py and pca.py, verified against numpy.

You have now seen the core of the subject end to end. Part VII opens the doors that lead onward — abstract inner product spaces, the Jordan form for the matrices diagonalization couldn't tame, the matrix exponential that solves differential equations, and the numerical realities of doing all this on a finite-precision machine.

Chapters in This Part