Chapter 28 — Further Reading

DataField.Dev


Annotated pointers for going deeper on positive definite matrices, quadratic forms, the definiteness tests, and their applications in optimization and statistics. Page/section numbers are approximate and edition-dependent; use them as a guide rather than a precise locator.

Core textbooks

Gilbert Strang, Introduction to Linear Algebra (5th/6th ed.), Section 6.5 ("Positive Definite Matrices"). The closest match to this chapter's spirit and the canonical undergraduate treatment. Strang famously calls positive definite matrices "the high point of linear algebra." His presentation of the five equivalent tests (we focused on three: eigenvalues, pivots, leading minors) is exactly the geometry-first, application-rich approach of this chapter. So are the energy interpretation $\mathbf{x}^{\mathsf{T}}A\mathbf{x} > 0$ for every $\mathbf{x} \neq \mathbf{0}$ and the factorization $A = R^{\mathsf{T}}R$ with $R$ having independent columns. His section on the ellipse $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = 1$ and its axes is the source picture for our Figure 28.1. Read this first.
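Here is a minimal sketch, assuming NumPy, that shows the three tests agreeing on a concrete matrix. The function name `definiteness_tests` is hypothetical. It checks the $3\times 3$ second-difference matrix in three ways: its eigenvalues, its pivots under elimination without row exchanges, and its leading principal minors. It then samples the energy $\mathbf{x}^{\mathsf{T}}A\mathbf{x}$ at random points.

```python
import numpy as np

def definiteness_tests(A):
    """Apply three equivalent positive definiteness tests to a symmetric matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]

    # Test 1: every eigenvalue is positive (eigvalsh assumes A is symmetric).
    eig_ok = bool(np.all(np.linalg.eigvalsh(A) > 0))

    # Test 2: every pivot of elimination without row exchanges is positive.
    U = A.copy()
    pivots = []
    for k in range(n):
        pivots.append(U[k, k])
        if U[k, k] == 0:
            break  # a zero pivot: elimination without exchanges stops here
        for i in range(k + 1, n):
            U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
    pivot_ok = len(pivots) == n and all(p > 0 for p in pivots)

    # Test 3 (Sylvester's criterion): every leading principal minor is positive.
    minors = [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]
    minor_ok = all(m > 0 for m in minors)

    return eig_ok, pivot_ok, minor_ok, pivots, minors


# Second-difference matrix: pivots 2, 3/2, 4/3 and leading minors 2, 3, 4.
A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
eig_ok, pivot_ok, minor_ok, pivots, minors = definiteness_tests(A)

# Energy test, sampled: x^T A x > 0 for many random nonzero x.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 3))
energies = np.einsum("ij,jk,ik->i", X, A, X)

# An indefinite matrix (eigenvalues 3 and -1) fails all three tests.
B = np.array([[1.0, 2.0], [2.0, 1.0]])
```

The sampled energy check can only suggest positive definiteness, because it probes finitely many vectors. The other three tests settle the question up to rounding error.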
Sheldon Axler, Linear Algebra Done Right (3rd/4th ed.), Chapter 7 ("Operators on Inner Product Spaces"), sections on positive operators and the polar/singular-value decompositions. The proof-led, coordinate-free view. Axler develops positive operators abstractly as self-adjoint with non-negative eigenvalues, so Axler's "positive" is this chapter's "positive semidefinite." He proves each has a unique positive square root and connects them to the SVD. This is the rigorous backbone behind our Cholesky "square root" intuition and the §28.7 link to $A^{\mathsf{T}}A$, and the natural bridge from this chapter to Chapter 30.
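Axler's square root and the Cholesky factor both write $A = R^{\mathsf{T}}R$, but with different $R$. The minimal sketch below assumes NumPy and a symmetric positive semidefinite input. It builds the symmetric square root from the spectral decomposition $A = Q\Lambda Q^{\mathsf{T}}$ as $S = Q\Lambda^{1/2}Q^{\mathsf{T}}$. The helper name `psd_sqrt` and the clipping of tiny negative eigenvalues are our own choices.

```python
import numpy as np

def psd_sqrt(A):
    """Symmetric positive semidefinite square root of a symmetric PSD matrix A."""
    lam, Q = np.linalg.eigh(A)        # A = Q diag(lam) Q^T
    lam = np.clip(lam, 0.0, None)     # clip tiny negative round-off before sqrt
    return Q @ np.diag(np.sqrt(lam)) @ Q.T

A = np.array([[4.0, 2.0], [2.0, 3.0]])
S = psd_sqrt(A)              # symmetric, with S @ S == A
L = np.linalg.cholesky(A)    # lower triangular, with L @ L.T == A
```

The symmetric root takes $R = S$, while Cholesky takes $R = L^{\mathsf{T}}$. The Cholesky route produces a triangular factor and is cheaper to compute than an eigendecomposition.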
Stephen Boyd & Lieven Vandenberghe, Convex Optimization, Appendix A and Chapters 2–3. The definitive applied reference for the optimization half of this chapter. The appendix reviews positive (semi)definiteness, the matrix inequality $A \succeq 0$ notation, and Schur complements. Chapters 2–3 build convexity on a single foundation: a twice-differentiable function on an open convex domain is convex iff its Hessian is positive semidefinite at every point of that domain (our §28.6.1). Freely and legally available online, and the standard text for why optimization loves bowls.
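The Schur complement test that Boyd and Vandenberghe review can be checked numerically. Take a block matrix $M = \begin{bmatrix} A & B \\ B^{\mathsf{T}} & C \end{bmatrix}$ with $A \succ 0$. Then $M \succ 0$ exactly when $C - B^{\mathsf{T}}A^{-1}B \succ 0$. Below is a minimal sketch assuming NumPy; the helpers `is_pd` and `schur_pd` are hypothetical names. `is_pd` relies on the fact that, in exact arithmetic, Cholesky succeeds on a symmetric matrix exactly when it is positive definite.

```python
import numpy as np

def is_pd(M):
    """True if the symmetric matrix M admits a Cholesky factorization."""
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

def schur_pd(A, B, C):
    """Test [[A, B], [B^T, C]] > 0 via A > 0 and C - B^T A^{-1} B > 0."""
    if not is_pd(A):
        return False
    schur = C - B.T @ np.linalg.solve(A, B)
    return is_pd(schur)

A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = np.array([[1.0], [0.5]])
C_good = np.array([[2.0]])   # Schur complement 2 - 0.75 = 1.25 > 0
C_bad = np.array([[0.5]])    # Schur complement 0.5 - 0.75 = -0.25 < 0
```

Eliminating the first block of unknowns from $M$ leaves exactly this Schur complement, which is why it behaves like a block pivot.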
Stephen Boyd & Lieven Vandenberghe, Introduction to Applied Linear Algebra (VMLS), Chapters 10–11 and 15. A gentler applied on-ramp covering quadratic forms, least squares, and the positive semidefinite Gram matrix $A^{\mathsf{T}}A$, with the data-science framing of §28.7. Also free online.
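The Gram matrix claim is easy to probe. For any $\mathbf{z}$, $\mathbf{z}^{\mathsf{T}}(A^{\mathsf{T}}A)\mathbf{z} = \lVert A\mathbf{z}\rVert^2 \ge 0$, and equality holds for some $\mathbf{z} \neq \mathbf{0}$ exactly when the columns of $A$ are dependent. Below is a minimal NumPy sketch on random data; the data and the dependent column are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((50, 3))   # 50 rows, independent columns (almost surely)
G = A.T @ A                        # Gram matrix

# Append a column equal to the sum of the first two, making the columns dependent.
A_dep = np.column_stack([A, A[:, 0] + A[:, 1]])
G_dep = A_dep.T @ A_dep

eig_G = np.linalg.eigvalsh(G)          # all positive: G is positive definite
eig_G_dep = np.linalg.eigvalsh(G_dep)  # smallest is zero up to rounding: only semidefinite

# The energy of a Gram matrix is a squared length.
z = rng.standard_normal(3)
energy = z @ G @ z
squared_length = np.linalg.norm(A @ z) ** 2
```

A covariance matrix is a scaled Gram matrix of centered data. It is therefore always semidefinite, and it is definite only when no centered feature is an exact linear combination of the others.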

On the statistics and data-science side

Trevor Hastie, Robert Tibshirani & Jerome Friedman, The Elements of Statistical Learning, Chapters 3–4. For covariance matrices, the Mahalanobis distance of Case Study 28.1, and how positive definiteness underwrites ridge regression. Ridge's normal equations use $X^{\mathsf{T}}X + \lambda I$, the same diagonal shift as regularizing a covariance to $\Sigma + \varepsilon I$. Freely available online. The forward connection to PCA (their Chapter 14) is our Chapter 32.
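The Mahalanobis distance and the $\Sigma + \varepsilon I$ shift both reduce to a Cholesky factor. Since $\Sigma = LL^{\mathsf{T}}$ gives $\mathbf{r}^{\mathsf{T}}\Sigma^{-1}\mathbf{r} = \lVert L^{-1}\mathbf{r}\rVert^2$, the squared distance needs only one triangular solve. Adding $\varepsilon I$ lets a singular covariance be factored at all. The minimal sketch below assumes NumPy and SciPy with $\mathbf{r} = \mathbf{x} - \boldsymbol{\mu}$. The function name `mahalanobis_sq` and the value of `eps` are illustrative choices.

```python
import numpy as np
from scipy.linalg import solve_triangular

def mahalanobis_sq(x, mu, Sigma, eps=0.0):
    """Squared Mahalanobis distance r^T (Sigma + eps*I)^{-1} r with r = x - mu."""
    r = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    # Cholesky raises LinAlgError unless Sigma + eps*I is positive definite.
    L = np.linalg.cholesky(Sigma + eps * np.eye(r.size))
    z = solve_triangular(L, r, lower=True)  # z = L^{-1} r
    return float(z @ z)                      # ||L^{-1} r||^2 = r^T Sigma^{-1} r

# Diagonal covariance: each coordinate is measured in units of its standard deviation.
d_diag = mahalanobis_sq([2.0, 1.0], [0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])

# Perfectly correlated (singular) covariance, regularized with eps = 0.01.
# The direction [1, -1] has zero variance, so a step along it counts as very far.
d_reg = mahalanobis_sq([1.0, -1.0], [0.0, 0.0], [[1.0, 1.0], [1.0, 1.0]], eps=0.01)
```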
Christopher Bishop, Pattern Recognition and Machine Learning, Chapter 2 (especially §2.3 on the Gaussian). The multivariate Gaussian's density is proportional to $\exp(-\tfrac12\mathbf{r}^{\mathsf{T}}\Sigma^{-1}\mathbf{r})$ with $\mathbf{r} = \mathbf{x} - \boldsymbol{\mu}$. The exponent is a positive definite quadratic form, and the density's contours are exactly the Mahalanobis ellipses of Case Study 28.1. The clearest derivation of why the covariance must be positive definite for the density to be normalizable.
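The normalizing constant shows why positive definiteness matters. The full density is $(2\pi)^{-d/2}\det(\Sigma)^{-1/2}\exp(-\tfrac12\mathbf{r}^{\mathsf{T}}\Sigma^{-1}\mathbf{r})$, which needs $\det\Sigma > 0$ and an upward bowl in the exponent. The minimal sketch below assumes NumPy and SciPy. It evaluates the log-density through a Cholesky factor, using $\log\det\Sigma = 2\sum_i \log L_{ii}$, and checks the result against `scipy.stats.multivariate_normal`. The function name `gaussian_logpdf` is hypothetical.

```python
import numpy as np
from scipy.linalg import solve_triangular
from scipy.stats import multivariate_normal

def gaussian_logpdf(x, mu, Sigma):
    """Log-density of N(mu, Sigma) at x, computed via the Cholesky factor of Sigma."""
    r = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    L = np.linalg.cholesky(np.asarray(Sigma, dtype=float))  # requires Sigma positive definite
    z = solve_triangular(L, r, lower=True)                   # ||z||^2 = r^T Sigma^{-1} r
    log_det = 2.0 * np.sum(np.log(np.diag(L)))               # log det Sigma
    d = r.size
    return -0.5 * (z @ z) - 0.5 * log_det - 0.5 * d * np.log(2.0 * np.pi)

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
x = np.array([0.5, 0.0])

ours = gaussian_logpdf(x, mu, Sigma)
reference = multivariate_normal(mean=mu, cov=Sigma).logpdf(x)
```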
The companion volume's treatment of covariance matrices develops the statistical foundation — variance, covariance, correlation — that this chapter's §28.7 reframes as positive semidefiniteness, and connects directly to the data ellipse.

Numerical and computational

Lloyd N. Trefethen & David Bau, Numerical Linear Algebra, Lecture 23 ("Cholesky Factorization"). The definitive treatment of the Cholesky factorization $A = LL^{\mathsf{T}}$. It covers the algorithm, its cost advantage over $LU$ (about $n^3/3$ flops versus about $2n^3/3$, roughly half the work), and its outstanding numerical stability: it needs no pivoting precisely because $A$ is positive definite. The rigorous version of §28.8.
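Here is a minimal column-by-column sketch in NumPy for readers who want to see the algorithm before reading Lecture 23. It is one common variant, not Trefethen and Bau's exact pseudocode (they write the upper factor $R = L^{\mathsf{T}}$), and the name `cholesky_lower` is hypothetical. Its nested loops do about $n^3/3$ flops with no pivoting. In exact arithmetic it fails only when a diagonal quantity is nonpositive, which happens exactly when $A$ is not positive definite. The squared diagonal of $L$ recovers the elimination pivots.

```python
import numpy as np

def cholesky_lower(A):
    """Return lower-triangular L with A = L @ L.T; raise ValueError if A is not positive definite."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros_like(A)
    for j in range(n):
        # Diagonal entry: what remains of A[j, j] after removing earlier columns.
        d = A[j, j] - L[j, :j] @ L[j, :j]
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        L[j, j] = np.sqrt(d)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
    return L

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
L = cholesky_lower(A)
```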
Gene Golub & Charles Van Loan, Matrix Computations (4th ed.), Chapter 4. The encyclopedic reference for Cholesky, the $LDL^{\mathsf{T}}$ factorization, and the modified-Cholesky methods used in optimization to force a Hessian to be positive definite when it is not. Use as a lookup.
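The simplest idea behind modified Cholesky is to shift an indefinite Hessian by $\tau I$ and increase $\tau$ until a Cholesky factorization succeeds. The minimal sketch below assumes NumPy and covers only that shift strategy. The function name, the starting value `beta`, and the doubling rule are illustrative choices; the references above describe more careful schemes.

```python
import numpy as np

def add_identity_until_pd(H, beta=1e-3, max_tries=60):
    """Return (L, tau) with H + tau*I = L @ L.T, increasing tau until Cholesky succeeds."""
    H = np.asarray(H, dtype=float)
    n = H.shape[0]
    min_diag = np.min(np.diag(H))
    # A nonpositive diagonal entry rules out positive definiteness, so start with a shift.
    tau = 0.0 if min_diag > 0 else -min_diag + beta
    for _ in range(max_tries):
        try:
            L = np.linalg.cholesky(H + tau * np.eye(n))
            return L, tau
        except np.linalg.LinAlgError:
            tau = max(2.0 * tau, beta)  # illustrative rule: double the shift
    raise RuntimeError("no suitable shift found")

H = np.array([[1.0, 2.0], [2.0, 1.0]])   # indefinite: eigenvalues 3 and -1
L, tau = add_identity_until_pd(H)         # tau must end above 1 to cancel the -1
```

A positive definite modified Hessian guarantees that the resulting Newton-like step points downhill whenever the gradient is nonzero.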
numpy / scipy documentation for numpy.linalg.cholesky, numpy.linalg.eigvalsh, scipy.linalg.ldl (the $LDL^{\mathsf{T}}$ factorization giving the pivots), and scipy.linalg.cho_factor / cho_solve (solving $A\mathbf{x}=\mathbf{b}$ via Cholesky, the fast method for positive definite systems). Use eigvalsh, not eig, for symmetric matrices — see the Computational Note in §28.8.
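Here is a minimal tour of the calls named above on the $3\times 3$ second-difference matrix, assuming current NumPy and SciPy. Check return conventions in the documentation for your installed version. For example, `ldl` returns the block-diagonal factor as a full matrix together with a permutation array.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, ldl

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
b = np.array([1.0, 0.0, 1.0])

# Eigenvalues of a symmetric matrix: real, returned in ascending order.
eigenvalues = np.linalg.eigvalsh(A)

# Lower-triangular Cholesky factor; raises LinAlgError if A is not positive definite.
L = np.linalg.cholesky(A)

# Symmetric LDL^T factorization: A == lu @ d @ lu.T with d block diagonal.
# By Sylvester's law of inertia, d has as many positive eigenvalues as A.
lu, d, perm = ldl(A)

# Solve A x = b by factoring once with Cholesky and then reusing the factor.
c_and_lower = cho_factor(A)
x = cho_solve(c_and_lower, b)
```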

On the optimization connection

Jorge Nocedal & Stephen Wright, Numerical Optimization (2nd ed.), Chapters 2–3. For the second-derivative test, the role of the Hessian's definiteness in classifying critical points (§28.6.2), and conditioning — why a large condition number slows gradient descent and how preconditioning and Newton's method reshape the bowl (Case Study 28.2). The standard graduate reference.
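Case Study 28.2's point about conditioning can be reproduced on the simplest bowl, $f(\mathbf{x}) = \tfrac12\mathbf{x}^{\mathsf{T}}A\mathbf{x}$, whose gradient is $A\mathbf{x}$. The minimal sketch below assumes NumPy and the fixed step $2/(\lambda_{\min}+\lambda_{\max})$. With that step the error shrinks by at most a factor of $(\kappa-1)/(\kappa+1)$ per iteration, where $\kappa = \lambda_{\max}/\lambda_{\min}$. The function name and tolerance are illustrative. Newton's method rescales by $A^{-1}$ and reaches the minimizer of a quadratic in one step.

```python
import numpy as np

def gd_iterations(A, x0, tol=1e-8, max_iter=100_000):
    """Count gradient descent steps on f(x) = 0.5 x^T A x until ||x|| < tol."""
    lam = np.linalg.eigvalsh(A)        # ascending eigenvalues
    step = 2.0 / (lam[0] + lam[-1])    # fixed step minimizing the worst-case contraction
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        if np.linalg.norm(x) < tol:    # the minimizer is x = 0
            return k
        x = x - step * (A @ x)         # gradient of f is A x
    return max_iter

x0 = np.array([1.0, 1.0])
A_round = np.diag([1.0, 2.0])      # condition number 2: a nearly round bowl
A_narrow = np.diag([1.0, 100.0])   # condition number 100: a narrow valley

iters_round = gd_iterations(A_round, x0)
iters_narrow = gd_iterations(A_narrow, x0)

# One Newton step x - A^{-1}(A x) lands exactly on the minimizer of a quadratic.
x_newton = x0 - np.linalg.solve(A_narrow, A_narrow @ x0)
```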
The companion volume's treatment of optimization develops gradient descent, convexity, and conditioning in the data-science setting, picking up exactly where Case Study 28.2 leaves off.
For the calculus prerequisite — the multivariable second-derivative test and the Hessian — see the second-derivative test, which this chapter recasts in the language of definiteness.
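The definiteness form of the second-derivative test fits in a few lines. The minimal sketch below assumes NumPy. The function name and the tolerance `tol`, which decides when an eigenvalue counts as zero, are illustrative choices. A semidefinite Hessian with a zero eigenvalue leaves the test inconclusive. The origin for $x^2 + y^4$ is an example: it is a minimum, but the Hessian there cannot show it.

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from the eigenvalues of its symmetric Hessian H."""
    lam = np.linalg.eigvalsh(H)
    if np.all(lam > tol):
        return "local minimum"   # H positive definite: an upward bowl
    if np.all(lam < -tol):
        return "local maximum"   # H negative definite: a downward bowl
    if lam.min() < -tol and lam.max() > tol:
        return "saddle point"    # H indefinite: up in some directions, down in others
    return "inconclusive"        # semidefinite with a zero eigenvalue: higher-order terms decide

H_min = np.array([[2.0, 0.0], [0.0, 6.0]])      # f = x^2 + 3y^2 at the origin
H_saddle = np.array([[2.0, 0.0], [0.0, -2.0]])  # f = x^2 - y^2 at the origin
H_flat = np.array([[2.0, 0.0], [0.0, 0.0]])     # f = x^2 + y^4 at the origin
```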

Visual and intuitive

3Blue1Brown, Essence of Linear Algebra. While there is no episode dedicated to quadratic forms, the eigenvector/eigenvalue episodes pair naturally with §28.3's claim that the eigenvectors are the principal axes and the eigenvalues are the curvatures. The series' "what does this do to space?" framing is the right mindset for the bowl-and-ellipse picture.
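The principal-axis claim for the ellipse $\mathbf{x}^{\mathsf{T}}A\mathbf{x} = 1$ can be checked directly. Write $A = Q\Lambda Q^{\mathsf{T}}$. The semi-axes point along the eigenvectors $\mathbf{q}_i$ with lengths $1/\sqrt{\lambda_i}$, so the smallest eigenvalue (the gentlest curvature) gives the longest axis. The minimal NumPy sketch below uses an arbitrary example matrix with eigenvalues $1$ and $6$.

```python
import numpy as np

A = np.array([[5.0, 2.0], [2.0, 2.0]])   # example matrix with eigenvalues 1 and 6
lam, Q = np.linalg.eigh(A)               # ascending eigenvalues; eigenvectors are the columns of Q
semi_axes = 1.0 / np.sqrt(lam)           # the smallest eigenvalue gives the longest semi-axis
endpoints = Q / np.sqrt(lam)             # column i is the axis endpoint q_i / sqrt(lam_i)

# Trace the whole ellipse x = Q diag(1/sqrt(lam)) [cos t, sin t]^T.
t = np.linspace(0.0, 2.0 * np.pi, 200)
pts = Q @ (np.vstack([np.cos(t), np.sin(t)]) / np.sqrt(lam)[:, None])

# Each column of pts should satisfy x^T A x = 1.
values = np.einsum("ij,ik,kj->j", pts, A, pts)
```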
Gilbert Strang's MIT OpenCourseWare lectures (18.06), the positive-definite lecture. A free recorded lecture that walks through the tests and the ellipse picture in Strang's inimitable geometry-first style — the spoken companion to his Chapter 6.5.

Historical

Sylvester's criterion and Sylvester's law of inertia are attributed to James Joseph Sylvester, who studied the invariants of quadratic forms in the 1850s [verify]. The "law of inertia" name reflects that the inertia $(n_+, n_-, n_0)$ is preserved under any invertible change of variables $\mathbf{x} = S\mathbf{y}$, which replaces $A$ by $S^{\mathsf{T}}AS$. The inertia counts the positive, negative, and zero eigenvalues. Quadratic forms themselves were studied much earlier by Gauss, Lagrange, and Cauchy, and the principal-axis theorem has a long history that is easy to garble. Corroborate specific attributions before citing.
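Sylvester's law of inertia can be checked numerically. The eigenvalues of $S^{\mathsf{T}}AS$ differ from those of $A$, but the counts of positive, negative, and zero eigenvalues do not. The minimal sketch below assumes NumPy. The helper `inertia` and its relative tolerance for "zero" are illustrative choices, and $S$ is an arbitrary invertible matrix with $\det S = 7$.

```python
import numpy as np

def inertia(A, tol=1e-10):
    """Return (n_plus, n_minus, n_zero): counts of positive, negative, and zero eigenvalues."""
    lam = np.linalg.eigvalsh(A)
    scale = max(1.0, np.abs(lam).max())   # treat |eigenvalue| <= tol*scale as zero
    n_plus = int(np.sum(lam > tol * scale))
    n_minus = int(np.sum(lam < -tol * scale))
    return n_plus, n_minus, len(lam) - n_plus - n_minus

A = np.diag([3.0, -2.0, 0.0])
S = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])   # invertible change of variables x = S y
B = S.T @ A @ S                   # the same quadratic form in the y variables
```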
The Cholesky factorization is named for André-Louis Cholesky, a French military officer and geodesist who developed it for solving the normal equations in surveying; it was published posthumously around 1924, after his death in the First World War [verify]. For the history, the historical notes in Strang and the biographical entries in the MacTutor History of Mathematics archive (online) are reliable starting points; treat any single secondary source's dates with mild caution.