Chapter 19 — Further Reading

DataField.Dev

Annotated pointers for going deeper on orthogonal projection, the projection matrix, and least squares. Mapped to the three standard texts this book tracks — Strang (geometric/applied), Axler (abstract/proof-first), Boyd–Vandenberghe (applications/optimization) — plus free online resources.

Primary textbook sections

Gilbert Strang, Introduction to Linear Algebra (6th ed.), §4.2–4.3. The closest match to this chapter. §4.2 ("Projections") derives projection onto a line and onto a subspace and builds $P = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$ exactly as we do, emphasizing $P^2 = P$ and $P^{\mathsf{T}} = P$; §4.3 ("Least Squares Approximations") connects it to regression and the normal equations. Strang's geometric framing — projection as the heart of least squares — is the spirit of this entire chapter. Read this first if you want a second pass over the same material.
Gilbert Strang, MIT 18.06 lectures (OpenCourseWare / YouTube), Lectures 15–16. "Projections onto Subspaces" and "Projection Matrices and Least Squares." Strang at the chalkboard deriving the projection matrix and motivating least squares; the closest free supplement to this chapter, and for many readers the place where the geometry clicks. Free at ocw.mit.edu (course 18.06).
Sheldon Axler, Linear Algebra Done Right (4th ed.), §6C ("Orthogonal Complements and Minimization Problems"). The abstract, inner-product-space treatment: the orthogonal decomposition $V = U \oplus U^{\perp}$, the orthogonal projection $P_U$ defined coordinate-free, and the minimization theorem (our closest-point theorem) proved in full generality. Axler defines the projection by its properties rather than by a matrix formula, which is the right viewpoint for the function spaces of Chapters 22 and 34. For the math track.
Stephen Boyd & Lieven Vandenberghe, Introduction to Applied Linear Algebra (VMLS) (2018), Chapters 12–13. "Least Squares" and "Least Squares Data Fitting." The applications-and-optimization angle: least squares as the foundation of data fitting, the normal equations, and why you solve them via QR (Chapter 20) rather than by inverting $A^{\mathsf{T}}A$. Excellent on the engineering of least squares and packed with real examples. Free PDF at stanford.edu/~boyd/vmls/.
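As a quick numerical companion to the sections listed above, here is a minimal NumPy sketch that builds $P = A(A^{\mathsf{T}}A)^{-1}A^{\mathsf{T}}$ for a small full-column-rank matrix and checks $P^2 = P$, $P^{\mathsf{T}} = P$, and the orthogonality of the residual to the columns of $A$. The matrix `A` and vector `b` are made-up illustration data, and the explicit inverse is used only to mirror the formula as written, not as a recommended way to compute.

```python
import numpy as np

# Illustrative data (an assumption): fit a line c + d*t through (0, 6), (1, 0), (2, 0).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Projection matrix onto the column space of A, written exactly as the formula reads.
P = A @ np.linalg.inv(A.T @ A) @ A.T

p = P @ b   # projection of b onto col(A)
e = b - p   # residual, which should be orthogonal to every column of A

is_idempotent = np.allclose(P @ P, P)                # P^2 = P
is_symmetric = np.allclose(P.T, P)                   # P^T = P
residual_is_orthogonal = np.allclose(A.T @ e, 0.0)   # A^T e = 0

# Normal equations A^T A x = A^T b give the least-squares coefficients.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

print(is_idempotent, is_symmetric, residual_is_orthogonal)
print(x_hat, p, e)
```

The same projection `p` can be recovered as `A @ x_hat`, which is the link between the projection-matrix view and the normal-equations view.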

Going deeper on specific threads

Numerical caution — why not to form $A^{\mathsf{T}}A$: Trefethen & Bau, Numerical Linear Algebra (1997), Lectures 11 and 18–19, explain how forming $A^{\mathsf{T}}A$ squares the condition number of $A$ (in the 2-norm, $\kappa(A^{\mathsf{T}}A) = \kappa(A)^2$) and why QR/SVD-based least squares is the professional standard. This is the rigorous backing for the chapter's Computational Note and a preview of Chapter 38.
The hat matrix in statistics: any regression text (e.g. Hastie, Tibshirani & Friedman, The Elements of Statistical Learning, §3.2, free PDF at hastie.su.domains/ElemStatLearn/) develops $\hat{\mathbf{y}} = H\mathbf{y}$ with $H = X(X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}$ — our projection matrix under the statistician's name — and its leverage interpretation (the diagonal of $H$). Confirms "least squares is projection" from the data-science side.
Oblique vs. orthogonal projections: Meyer, Matrix Analysis and Applied Linear Algebra (2000), §5.9 and §5.13, gives the cleanest treatment of general (oblique) projectors and the characterization that symmetric idempotents are exactly the orthogonal ones — the rigorous version of §19.7's Warning and Math-Major Sidebar.
3Blue1Brown, Essence of Linear Algebra (YouTube). While there is no episode titled "projection," the chapters on dot products, change of basis, and the geometry of linear maps build the visual intuition that makes projection obvious. Pair with Strang's lectures for the geometry-first experience this book aims for.
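Three of the threads above are easy to probe numerically. The sketch below is an illustration under stated assumptions: the nearly collinear design matrix `X` and the 2×2 matrix `P_oblique` are hypothetical examples, not taken from any of the cited texts. It compares the condition number of `X` with that of `X.T @ X`, reads leverages off the diagonal of the hat matrix, and shows an idempotent matrix that is not symmetric (an oblique projector).

```python
import numpy as np

# Hypothetical design matrix: intercept column plus two nearly collinear columns.
t = np.linspace(0.0, 1.0, 8)
X = np.column_stack([np.ones_like(t), t, t + 1e-2 * t**2])

kappa_X = np.linalg.cond(X)            # 2-norm condition number of X
kappa_gram = np.linalg.cond(X.T @ X)   # condition number of the Gram matrix X^T X

# In exact arithmetic kappa_gram equals kappa_X**2; floating point agrees approximately.
ratio = kappa_gram / kappa_X**2

# Hat matrix via a thin QR factorization: H = Q Q^T, which avoids forming (X^T X)^{-1}.
Q, _ = np.linalg.qr(X)
H = Q @ Q.T
leverage = np.diag(H)   # leverages; they sum to the rank of X

# An oblique projector onto the x-axis along the direction (1, -1):
# idempotent, but not symmetric, so not an orthogonal projection.
P_oblique = np.array([[1.0, 1.0],
                      [0.0, 0.0]])

print(kappa_X, kappa_gram, ratio)
print(leverage, leverage.sum())
print(np.allclose(P_oblique @ P_oblique, P_oblique), np.allclose(P_oblique.T, P_oblique))
```

Making the third column even closer to the second (a smaller coefficient on `t**2`) drives `kappa_gram` toward the limits of double precision much faster than `kappa_X`, which is the practical content of the warning.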

Documentation and tools

numpy.linalg.lstsq — the function you should actually use for least squares; solves $\min\lVert A\mathbf{x} - \mathbf{b}\rVert$ via SVD without forming $A^{\mathsf{T}}A$. See the NumPy reference manual.
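scipy.linalg.lstsq and numpy.linalg.qr — the QR route of Chapter 20, the recommended path for production least squares. Note that scipy.linalg.lstsq defaults to the SVD-based LAPACK driver gelsd; pass lapack_driver='gelsy' to use the driver built on QR with column pivoting.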
numpy.linalg.matrix_rank — checking the full-column-rank condition before trusting projection coefficients.
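The three tools above can be combined in a few lines. This is a minimal sketch on synthetic data (the line $y = 2 + 3t$ plus small noise is an assumption chosen for illustration): it checks the rank condition, solves with `numpy.linalg.lstsq`, and repeats the solve through a reduced QR factorization.

```python
import numpy as np

# Synthetic data (assumed for illustration): noisy samples of y = 2 + 3t.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
A = np.column_stack([np.ones_like(t), t])
b = 2.0 + 3.0 * t + 0.01 * rng.standard_normal(t.size)

# Check the full-column-rank condition before trusting the coefficients.
has_full_column_rank = np.linalg.matrix_rank(A) == A.shape[1]

# SVD-based least squares; rcond=None selects NumPy's machine-precision-based cutoff.
x_svd, residuals, rank, sing_vals = np.linalg.lstsq(A, b, rcond=None)

# QR route: A = QR with orthonormal columns in Q, then solve R x = Q^T b.
# np.linalg.solve does not exploit that R is triangular;
# scipy.linalg.solve_triangular would, if SciPy is available.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

print(has_full_column_rank, rank)
print(x_svd, x_qr)
```

Neither route forms $A^{\mathsf{T}}A$, and on a well-conditioned problem like this one the two coefficient vectors agree to roughly machine precision.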

How to use these

If you want one more geometric pass, read Strang §4.2–4.3 and watch 18.06 Lectures 15–16. If you are on the math track and want the abstract picture that generalizes to function spaces, read Axler §6C. If you care about applications and getting the numerics right, read Boyd–Vandenberghe Ch. 12–13 and skim Trefethen & Bau Lecture 11. All four agree on the central message of this chapter: orthogonal projection is the geometry, and least squares is what it looks like when you apply it to data.