Chapter 31 — Further Reading

DataField.Dev

Annotated pointers for going deeper on low-rank approximation, the Eckart-Young theorem, image compression, denoising, and dimensionality reduction. Page and section numbers are approximate and edition-dependent; use them as a guide rather than a precise locator.

Core textbooks

Gilbert Strang, Introduction to Linear Algebra (5th/6th ed.), Chapter 7 ("The Singular Value Decomposition"). The closest match to this chapter's spirit. Strang presents the SVD geometry-first, states the Eckart-Young result that the truncated SVD is the best low-rank approximation, and uses image compression as the motivating example — the same anchor we use. His framing of $A_k = \sum_{i=1}^k\sigma_i\mathbf{u}_i\mathbf{v}_i^{\mathsf{T}}$ and the storage savings is the canonical undergraduate presentation, and Section 7.3 leads directly into our Chapter 32 (PCA).
Sheldon Axler, Linear Algebra Done Right (3rd/4th ed.), the singular value decomposition sections (Chapter 7 in the 4th ed.). The proof-led, coordinate-free view. Axler develops the SVD and the singular value characterization abstractly; pair this with the Math-Major Sidebar in §31.4.1 for a rigorous route to the Eckart-Young optimality via the min-max (Courant-Fischer) characterization of singular values.
Stephen Boyd & Lieven Vandenberghe, Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares (VMLS) (freely and legally available online). The applications-first angle. VMLS emphasizes least squares, data matrices, and low-rank models as practical tools, matching this chapter's recommender and dimensionality-reduction applications and the storage/compression accounting of §31.3.

On the Eckart-Young theorem and matrix approximation

Carl Eckart & Gale Young, "The approximation of one matrix by another of lower rank," Psychometrika 1 (1936), 211–218. The original source for the optimality of the truncated SVD, written, strikingly, in a data-analysis (factor-analysis) context rather than pure mathematics. Short and readable; of historical interest as the moment low-rank approximation entered applied work.
Leon Mirsky, "Symmetric gauge functions and unitarily invariant norms," Quart. J. Math. 11 (1960), 50–59. The generalization of Eckart-Young to all unitarily invariant norms (covering both the Frobenius and operator norms at once), which is why the full statement is often called the Eckart-Young-Mirsky theorem.
Roger Horn & Charles Johnson, Matrix Analysis (2nd ed.), Chapter 7 (singular values and the SVD). The definitive reference for the singular-value inequalities behind the Eckart-Young lower bound (Exercises 31.16 and 31.18). Encyclopedic; use as a lookup once you want every detail.
Gene Golub & Charles Van Loan, Matrix Computations (4th ed.). The standard reference on computing the SVD and low-rank approximations stably and efficiently; the bridge to our Chapter 38 (Numerical Linear Algebra) and to the randomized algorithms below.
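For orientation, the result these references establish can be stated compactly. The notation follows the chapter's $A_k$; the symbols $r$ (the rank of $A$) and $B$ (an arbitrary competitor of rank at most $k$, with $k < r$) are introduced here only for illustration.

$$
A_k = \sum_{i=1}^{k} \sigma_i \mathbf{u}_i \mathbf{v}_i^{\mathsf{T}}, \qquad
\min_{\operatorname{rank}(B) \le k} \|A - B\|_2 = \|A - A_k\|_2 = \sigma_{k+1}, \qquad
\min_{\operatorname{rank}(B) \le k} \|A - B\|_F = \|A - A_k\|_F = \Bigl( \sum_{i=k+1}^{r} \sigma_i^2 \Bigr)^{1/2}.
$$

Mirsky's generalization says that the same $A_k$ is a minimizer for every unitarily invariant norm, not only these two.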

On image compression and the SVD demo

numpy / scipy documentation for numpy.linalg.svd (use full_matrices=False for the "thin" SVD, as in this chapter), numpy.linalg.norm (the "fro" Frobenius norm and the 2 operator norm), and scipy.linalg.svd. These are the exact routines behind every code block in the chapter.
Pillow (PIL) documentation, especially Image.open(...).convert("L") for loading a grayscale image as a matrix — the first step of toolkit/capstone/image_compression.py (our Build Your Toolkit). For the broader engineering of pixels, color channels, and file formats, see the companion treatment of working with images.
Many free "SVD image compression" tutorials and notebooks demonstrate the rank-10 → rank-200 progression on real photographs; running one (or the chapter's toolkit script) on your own photo is the single most convincing way to see the Eckart-Young theorem in action. Beware tutorials that confuse the shape of the rank-$k$ reconstruction (still a full-size array) with the storage cost of its factors, which is the §31.1 Common Pitfall.
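The truncation-and-error check that these tutorials walk through can be sketched in a few lines. This is a minimal illustration, not the chapter's toolkit script: the helper name `truncated_svd`, the synthetic random matrix standing in for a grayscale image, and the sizes `m`, `n`, `k` are all assumptions made for the example. The storage figure `k * (m + n + 1)` is the usual count of numbers needed to keep `k` left vectors, `k` right vectors, and `k` singular values.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k truncation of A and all singular values of A.

    Hypothetical helper for illustration only.
    """
    # Thin SVD: U is m x p, s has length p, Vt is p x n, with p = min(m, n).
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first k columns of U by the first k singular values,
    # then multiply by the first k rows of Vt.
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return A_k, s

rng = np.random.default_rng(0)
m, n, k = 60, 40, 5
A = rng.standard_normal((m, n))  # synthetic stand-in for a grayscale image

A_k, s = truncated_svd(A, k)

# Approximation errors in the operator (2) norm and the Frobenius norm.
err_2 = np.linalg.norm(A - A_k, 2)
err_fro = np.linalg.norm(A - A_k, "fro")

# A_k has the same shape as A, but its factors need far fewer numbers.
full_storage = m * n
rank_k_storage = k * (m + n + 1)

print(f"2-norm error   {err_2:.4f}  vs sigma_(k+1)       {s[k]:.4f}")
print(f"Frobenius error {err_fro:.4f} vs tail root-sum-sq {np.sqrt(np.sum(s[k:] ** 2)):.4f}")
print(f"storage: full {full_storage}, rank-{k} factors {rank_k_storage}")
```

The two printed error pairs should agree to rounding, which is the Eckart-Young error formula observed numerically. To try this on a real photograph, replace `A` with an array loaded via Pillow as described above.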

On denoising and dimensionality reduction

Trevor Hastie, Robert Tibshirani & Jerome Friedman, The Elements of Statistical Learning (free online), the principal-components and matrix-completion sections. The rigorous version of our dimensionality-reduction and recommender material, and the natural sequel once you reach Chapter 32. Connects directly to the broader practice of dimensionality reduction.
Jonathon Shlens, "A Tutorial on Principal Component Analysis" (free online). A famously clear, geometry-first walkthrough of PCA as the SVD of centered data — the exact forward link of §31.8 to Chapter 32.
Emmanuel Candès, Xiaodong Li, Yi Ma & John Wright, "Robust Principal Component Analysis?" (J. ACM, 2011). The paper that put the low-rank-plus-sparse decomposition $A = L + S$ on a rigorous footing — the production method behind the background-removal of Case Study 31.2. Technical, but the introduction is accessible and motivating.
Nathan Halko, Per-Gunnar Martinsson & Joel Tropp, "Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions" (SIAM Review, 2011). The modern randomized SVD, which computes just the top $k$ singular values and vectors without forming the whole decomposition; this is what makes low-rank approximation feasible at the scale of real images and datasets. The computational sequel to this chapter; revisited in Chapter 38.
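To make the randomized idea concrete, here is a minimal sketch of the basic scheme in the spirit of Halko, Martinsson and Tropp: sample the range of the matrix with a random test matrix, orthonormalize, and take a small exact SVD. It is a simplified illustration under stated assumptions, not the paper's full algorithm (no power iterations, no error estimation). The function name `randomized_svd`, the default `oversample=10`, and the synthetic exactly-rank-8 test matrix are choices made for this example.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate the top-k SVD of A by random range sampling.

    Hypothetical helper for illustration; simplified basic scheme only.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ell = min(k + oversample, n)            # number of random probes
    Omega = rng.standard_normal((n, ell))   # Gaussian test matrix
    Y = A @ Omega                           # samples of the column space of A
    Q, _ = np.linalg.qr(Y)                  # orthonormal basis for those samples
    B = Q.T @ A                             # small matrix: A restricted to that basis
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                         # lift left vectors back to R^m
    return U[:, :k], s[:k], Vt[:k, :]

# Synthetic matrix of exact rank 8 (an assumption that makes the check clean).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 100))

U, s_rand, Vt = randomized_svd(A, k=8)
s_exact = np.linalg.svd(A, compute_uv=False)

# Relative reconstruction error of the randomized rank-8 factorization.
rel_err = np.linalg.norm(A - (U * s_rand) @ Vt) / np.linalg.norm(A)
print("max singular-value gap:", np.max(np.abs(s_rand - s_exact[:8])))
print("relative reconstruction error:", rel_err)
```

On a matrix whose rank is at most `k`, the sampled basis captures the whole column space and the result matches the exact SVD to rounding. On matrices with slowly decaying singular values the basic scheme is less accurate, which is where the refinements in the paper come in.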

On recommender systems (Case Study 31.1)

Yehuda Koren, Robert Bell & Chris Volinsky, "Matrix Factorization Techniques for Recommender Systems" (IEEE Computer, 2009). The accessible, widely cited account of how matrix factorization (the low-rank idea of this chapter) won the Netflix Prize, including the matrix-completion refinement for missing entries discussed in the case study. The best single starting point; revisited in our Chapter 33.

Visual and intuitive

3Blue1Brown, Essence of Linear Algebra (free), and Steve Brunton's SVD lecture series (free on YouTube). Brunton's videos in particular walk through SVD image compression and denoising with the same geometry-first, application-driven emphasis as this chapter — an excellent second exposure.
Gilbert Strang's MIT 18.06 lectures (free on MIT OpenCourseWare), the SVD lectures. Strang narrates the truncated SVD and image compression with the geometry-first emphasis of this chapter.

Historical

For the history of Eckart-Young (1936), Mirsky's generalization (1960), and the 19th-century origins of the SVD in the work of Beltrami and Jordan, reliable starting points are the historical notes in Strang and Golub-Van Loan and the biographical entries in the MacTutor History of Mathematics archive (online). Treat any single secondary source's dates with mild caution and corroborate before citing.