Chapter 30 — Further Reading

DataField.Dev

Annotated pointers for going deeper on the singular value decomposition — the geometry, the existence proof via $A^{\mathsf{T}}A$, the four-subspaces connection, the pseudoinverse, and the road to applications. Page and section numbers are approximate and edition-dependent; treat them as a guide.

Core textbooks

Gilbert Strang, Introduction to Linear Algebra (5th/6th ed.), Chapter 7 ("The Singular Value Decomposition"). The closest match to this chapter's spirit, and the source of the framing we have adopted. Strang calls the SVD the climax of the course, the place all of linear algebra has been heading, and presents it exactly as we did: $A = U\Sigma V^{\mathsf{T}}$ built from the eigen-decompositions of $A^{\mathsf{T}}A$ and $AA^{\mathsf{T}}$, the singular values as $\sqrt{\lambda_i}$ (with $\lambda_i$ the eigenvalues of $A^{\mathsf{T}}A$), the four-subspaces picture realized by $U$ and $V$, and the geometry of the ellipse. His treatment of the SVD flows directly into low-rank approximation (our Chapter 31) and PCA (our Chapter 32). If you read one other treatment, read this one.
Lloyd N. Trefethen & David Bau III, Numerical Linear Algebra, Lectures 4–5 (and the surrounding lectures on the SVD). The definitive numerical and geometric treatment. Trefethen and Bau put the SVD among the fundamentals in Part I, well before Gaussian elimination appears, arguing it is the right foundation for the whole subject. Lecture 4 develops the geometry (the image of the unit sphere is a hyperellipse whose semi-axis lengths are the singular values, precisely our §30.1–30.2), distinguishes the reduced from the full SVD (our §30.9.1), and proves existence and uniqueness rigorously; Lecture 5 connects the SVD to rank, norms, the eigenvalue decomposition, and low-rank approximation. Read this for the cleanest modern proof and for the link to the condition number and stability (our §30.9 and Chapter 38).
Sheldon Axler, Linear Algebra Done Right (3rd/4th ed.), the sections on the singular value decomposition and the polar decomposition (Chapter 7). The proof-led, coordinate-free view. Axler derives the SVD abstractly for operators on inner-product spaces and pairs it with the polar decomposition $A = QP$ (our Exercise 30.26), illuminating the SVD as "a rotation times a positive-semidefinite stretch." Read this for the why at full generality and as preparation for the inner-product spaces of Chapter 34.
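
The thread shared by these three treatments can be checked numerically. The following is a minimal NumPy sketch, not code from any of the books: the 2×2 matrix and all variable names are hypothetical choices made for illustration. It confirms that the singular values are the square roots of the eigenvalues of $A^{\mathsf{T}}A$, that the image of the unit circle has longest and shortest radii $\sigma_1$ and $\sigma_2$, and that $Q = UV^{\mathsf{T}}$ with $P = V\Sigma V^{\mathsf{T}}$ gives a polar decomposition $A = QP$.

```python
import numpy as np

# Hypothetical example matrix, chosen only for illustration.
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])

# Singular values from the SVD (NumPy returns them in descending order).
U, s, Vt = np.linalg.svd(A)

# Eigenvalues of the symmetric matrix A^T A (eigvalsh returns ascending order),
# reversed and square-rooted so they line up with s.
lam = np.linalg.eigvalsh(A.T @ A)
s_from_eig = np.sqrt(lam[::-1])

print(s)           # approximately [6.708, 2.236], i.e. sqrt(45) and sqrt(5)
print(s_from_eig)  # same values, obtained via A^T A

# Geometry: map sampled points of the unit circle through A and measure
# how far each image lies from the origin.
theta = np.linspace(0.0, 2.0 * np.pi, 2001)
circle = np.vstack([np.cos(theta), np.sin(theta)])
lengths = np.linalg.norm(A @ circle, axis=0)
print(lengths.max(), lengths.min())  # close to sigma_1 and sigma_2

# Polar decomposition assembled from the SVD factors.
Q = U @ Vt                       # orthogonal factor
P = Vt.T @ np.diag(s) @ Vt       # symmetric positive-semidefinite factor
print(np.allclose(Q @ P, A))     # True
```

The sketch forms $A^{\mathsf{T}}A$ explicitly only because the matrix is tiny and well conditioned; it is a teaching check, not how production software computes the SVD.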

On the applications and the pseudoinverse

Stephen Boyd & Lieven Vandenberghe, Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares (VMLS) (freely available online), the chapters on least squares and the pseudoinverse. The applications-first angle. VMLS treats least squares, the pseudoinverse, and overdetermined systems as the practical core of linear algebra, matching Case Study 30.2, and connects them to data fitting, control, and estimation. Excellent for engineers and data scientists.
Carl Eckart & Gale Young, "The approximation of one matrix by another of lower rank" (1936). The original paper behind the low-rank-approximation result we tee up for Chapter 31 (the Eckart–Young theorem). Short and historically important. The precise attribution is contested, since Schmidt and Weyl had related results, but this is the standard citation.
Trevor Hastie, Robert Tibshirani & Jerome Friedman, The Elements of Statistical Learning (free online), the sections on PCA and the SVD. Establishes the SVD as the computational core of PCA (our Chapter 32) and of many statistical-learning methods. This is the rigorous version of the dimensionality-reduction thread, and the natural next step once you have worked through our PCA chapter.
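
As a small illustration of the pseudoinverse-and-least-squares theme in these references, here is a hedged NumPy sketch. The four data points and the straight-line model are hypothetical, invented only for this example. It builds $A^{+} = V\Sigma^{-1}U^{\mathsf{T}}$ from the reduced SVD, which is valid here because the matrix has full column rank, and compares the result with numpy.linalg.pinv and numpy.linalg.lstsq.

```python
import numpy as np

# Hypothetical data: fit y ~ c0 + c1 * t at four sample points.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.9, 5.1, 7.0])

# Overdetermined system: 4 equations, 2 unknowns.
A = np.column_stack([np.ones_like(t), t])

# Reduced SVD, then invert the (all nonzero) singular values.
# Assumption: A has full column rank, so no singular value is zero.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

# Least-squares coefficients via the pseudoinverse.
x_svd = A_pinv @ y

# Reference answers from NumPy's own routines.
x_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(A_pinv, np.linalg.pinv(A)))  # True
print(np.allclose(x_svd, x_lstsq))             # True

# The residual is orthogonal to the column space of A (the normal equations).
residual = A @ x_svd - y
print(np.allclose(A.T @ residual, 0.0))        # True
```

For a rank-deficient matrix the zero singular values must be left at zero rather than inverted, which is what numpy.linalg.pinv handles through its cutoff tolerance.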

On latent semantic analysis and text (Case Study 30.1)

Scott Deerwester, Susan Dumais, et al., "Indexing by Latent Semantic Analysis" (1990). The founding LSA paper — the rigorous source for Case Study 30.1, showing how the truncated SVD of a term-document matrix surfaces latent topics and improves retrieval. A clean, readable demonstration that the SVD extracts meaning from co-occurrence.
Christopher Manning, Prabhakar Raghavan & Hinrich Schütze, Introduction to Information Retrieval (free online), the chapter on matrix decompositions and LSI. A modern, textbook treatment of LSA/LSI and the SVD in information retrieval, connecting it forward to word embeddings and the methods of Chapter 33.

On the SVD's computation and numerics

Gene H. Golub & Charles F. Van Loan, Matrix Computations (4th ed.). The encyclopedic reference on how the SVD is actually computed — the Golub–Kahan bidiagonalization and the implicit-QR-based SVD algorithm — and why real software never forms $A^{\mathsf{T}}A$ (our §30.5 Math-Major Sidebar). Graduate-level; use as a lookup once you care about the algorithm behind np.linalg.svd.
NumPy / SciPy documentation for numpy.linalg.svd, numpy.linalg.pinv, numpy.linalg.matrix_rank, numpy.linalg.cond, and scipy.linalg.svd. Note the full_matrices argument (full vs reduced SVD, §30.9.1) and that svd returns $V^{\mathsf{T}}$, not $V$ (the convention trap of §30.4). matrix_rank and cond (with its default 2-norm) are SVD-based: matrix_rank counts the singular values above a tolerance, and cond takes the ratio of the largest singular value to the smallest, exactly as in §30.7 and §30.9.
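
The conventions mentioned in these two entries are easy to confirm directly. The sketch below uses a hypothetical random 5×3 matrix and our own variable names. It shows the shapes produced by full_matrices, the fact that the third return value is $V^{\mathsf{T}}$, how matrix_rank and cond relate to the singular values, and why forming $A^{\mathsf{T}}A$ is numerically risky: its condition number is the square of that of $A$.

```python
import numpy as np

# Hypothetical tall matrix with a fixed seed so the run is repeatable.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# Full vs reduced SVD: only the shape of U changes for a tall matrix.
U_full, s, Vt = np.linalg.svd(A)                        # U_full is 5x5
U_red, s_red, Vt_red = np.linalg.svd(A, full_matrices=False)  # U_red is 5x3
print(U_full.shape, U_red.shape, Vt.shape)

# svd returns V^T, so reconstruction uses Vt as-is, with no extra transpose.
A_rebuilt = U_red @ np.diag(s_red) @ Vt_red
print(np.allclose(A_rebuilt, A))  # True

# The right singular vectors are the ROWS of Vt (columns of V = Vt.T).
V = Vt.T
print(np.allclose(A @ V[:, 0], s[0] * U_full[:, 0]))  # True

# matrix_rank counts singular values above a tolerance; this mirrors
# NumPy's documented default tolerance.
tol = s.max() * max(A.shape) * np.finfo(A.dtype).eps
rank_by_count = int(np.sum(s > tol))
print(rank_by_count == np.linalg.matrix_rank(A))  # True

# cond (default 2-norm) is the ratio of largest to smallest singular value.
cond_by_ratio = s[0] / s[-1]
print(np.isclose(cond_by_ratio, np.linalg.cond(A)))  # True

# Forming A^T A squares the condition number.
cond_gram = np.linalg.cond(A.T @ A)
print(np.isclose(cond_gram, cond_by_ratio ** 2, rtol=1e-6))  # True
```

With a well-conditioned matrix like this one the squaring is harmless, but for a matrix whose condition number is near $10^{8}$ the Gram matrix is already at the edge of double precision.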

Visual and intuitive

3Blue1Brown, Essence of Linear Algebra. While there is no single SVD video, the series' treatment of linear transformations, changes of basis, and eigenvectors builds exactly the geometric intuition — "what does this matrix DO to space?" — that makes the rotate–stretch–rotate picture of §30.1 and the visualizer of §30.11 click. Watch the determinant and change-of-basis episodes alongside this chapter.
Gilbert Strang's MIT 18.06 lectures (free on MIT OpenCourseWare and YouTube), the lectures on the SVD. Strang narrates $A = U\Sigma V^{\mathsf{T}}$, the $A^{\mathsf{T}}A$ construction, and the four-subspaces connection with the same geometry-first emphasis as this chapter — and audibly delights in calling it the high point of the subject. An excellent second exposure.

Historical

The early history of the SVD is contested: Beltrami (1873) and Jordan (1874) discovered it independently, Sylvester, Schmidt, and Weyl extended it, the Eckart–Young low-rank result followed in 1936, and the stable Golub–Kahan algorithm in 1965. Good starting points are G. W. Stewart's survey "On the Early History of the Singular Value Decomposition" (SIAM Review, 1993) and the biographical entries in the MacTutor History of Mathematics archive (online). Stewart's survey is the authoritative account; treat dates drawn from any single secondary source with mild caution and corroborate them before citing, since the multiple independent discoveries make the precise history genuinely tangled.