Appendix G — Historical Notes

Linear algebra did not arrive as the clean, geometric subject you have just read. It was assembled over roughly two centuries, out of problems that did not look related: solving equations, computing determinants, understanding rotations, ranking webpages. The transformation viewpoint that organizes this book — "a matrix is a function that transforms space" — is, historically, one of the last ideas to crystallize, not the first. What follows is a narrative of how the pieces came together, in roughly the order the mathematics matured rather than the order we teach it.

A word on the dates. The history of mathematics is genuinely contested at the edges: who "first" had an idea, when a notation was "invented," and what a long-dead mathematician "really meant" are often matters of scholarly debate. Where a date or attribution is the standard textbook account but I cannot vouch for it with confidence, I mark it [verify]. Treat those flags as honest uncertainty, not decoration.

G.1 Elimination before matrices: China, Newton, and Gauss

The oldest idea in this book is also the one we teach first: systematically eliminating unknowns from a system of linear equations. A version of it appears in the Chinese text Jiuzhang Suanshu (The Nine Chapters on the Mathematical Art), whose eighth chapter solves systems by an array-and-elimination procedure essentially identical to modern row reduction — and which predates the European treatment by well over a thousand years (the text is usually dated to the Han dynasty, around 100 BCE–100 CE [verify]). This is worth pausing on: elimination is not a European invention, and it is far older than the word "matrix."

In Europe, Isaac Newton wrote down rules for eliminating unknowns in the late 1600s [verify], and the method that now bears Carl Friedrich Gauss's name was used by Gauss in the early 1800s in the course of his work on celestial mechanics and least squares — famously in computing the orbit of the asteroid Ceres around 1801 [verify]. Crucially, Gauss was not "doing matrix theory." He was solving equations with numbers. The array of coefficients was a bookkeeping device, not yet an object with a life of its own. The algorithm of Chapter 4 is, in spirit, Gauss's — but the picture of it as a sequence of shear-and-scale transformations is a much later gloss. The name Gauss–Jordan, for the reduced form, attaches Wilhelm Jordan (a geodesist, 1842–1899) to the fully reduced version [verify]; note this is a different Jordan from Camille Jordan of §G.6, a confusion worth flagging because both names appear in linear algebra courses.

G.2 Determinants: the answer that came before the question

Strangely, determinants were studied before matrices. The idea of a single number that decides whether a system has a unique solution — and that, geometrically, measures a signed volume — was developed in the 1700s and 1800s by a long list of mathematicians: Leibniz had the germ of it in correspondence (around the 1690s [verify]), and over the next century Cramer (the rule of 1750 [verify]), Vandermonde, Laplace (cofactor expansion, late 1700s [verify]), and Cauchy all contributed. Augustin-Louis Cauchy gave an extensive, systematic treatment in the early 1800s and is often credited with fixing the modern usage of the term déterminant [verify].

So the determinant of Chapter 11 arrived as a computational answer — a formula for solvability — decades before anyone organized the coefficients into a standalone "matrix" to take the determinant of. The geometric reading we lead with, "the determinant is the factor by which the transformation scales area or volume," is the modern synthesis: it unites the old algebraic formula with the transformation viewpoint that did not yet exist when the formula was found.

G.3 Cayley, Sylvester, and the birth of the matrix (1850s)

The word matrix was coined by James Joseph Sylvester, in 1850 [verify], for a rectangular array out of which determinants (minors) could be cut — he used the Latin sense of "matrix" as a womb or source, the thing the determinants are born from. It was his friend and collaborator Arthur Cayley who, in his Memoir on the Theory of Matrices of 1858 [verify], treated the array itself as an algebraic object you can add and, crucially, multiply — and who recognized that matrix multiplication corresponds to the composition of linear substitutions (what we would call composition of transformations, the heart of Chapter 8). Cayley also stated the result now called the Cayley–Hamilton theorem — that a matrix satisfies its own characteristic equation — though he verified it only in small cases; the general proof came later [verify].

This is the moment the central object of our book is born, and it is worth noting why multiplication is defined the "strange" way it is. Cayley did not invent the row-times- column rule arbitrarily; he reverse-engineered it so that the matrix of a composition of two substitutions equals the product of their matrices. That is exactly the derivation Chapter 8 insists on: multiplication is composition first, and the arithmetic rule second. The history vindicates the pedagogy.

G.4 Grassmann, Peano, and the abstract vector space (1844–1888)

While Cayley and Sylvester worked with concrete arrays of numbers, a deeper and stranger idea was developing in parallel: that vectors and the spaces they live in could be defined abstractly, by axioms, without reference to coordinates at all. Hermann Grassmann's Ausdehnungslehre ("Theory of Extension"), first published in 1844 and revised in 1862, laid out a remarkably modern theory of linear combinations, independence, and dimension — but it was written so opaquely that it was largely ignored for decades [verify].

The clean axiomatic definition of a vector space that you met in Chapter 5 is usually credited to Giuseppe Peano, who in 1888 gave essentially the modern axioms [verify]. The abstraction did not become standard, though, until the 20th century, when it was absorbed into functional analysis and made canonical by the influential textbooks of the 1930s and later (van der Waerden's Moderne Algebra, 1930–31, and others) [verify]. The lesson of Chapter 5 — that the same axioms govern arrows, polynomials, functions, and matrices — is Grassmann's and Peano's insight, finally given the prominence it deserved roughly half a century after it was first written down.

G.5 Gram, Schmidt, and orthogonalization (1880s–1900s)

The procedure of Chapter 20 for turning any basis into an orthonormal one carries two names. Jørgen Pedersen Gram, a Danish mathematician (and actuary), developed the underlying ideas in the context of least-squares and series expansions in the 1880s [verify]; Erhard Schmidt presented the explicit step-by-step algorithm in the early 1900s, around 1907 [verify], in his work on integral equations and what became Hilbert space. As is common, the algorithm was known in some form earlier — Laplace and Cauchy had pieces of it — so "Gram–Schmidt" is a label of convenience as much as strict priority [verify]. The closely related QR factorization that Chapter 20 builds from it is a 20th-century numerical-analysis formulation; the modified Gram–Schmidt and Householder approaches that make it numerically stable (Chapter 38) belong to the era of digital computing.

G.6 Camille Jordan and canonical form (1870)

Not every matrix can be diagonalized, and the precise description of what to do instead is the Jordan normal form of Chapter 36. It is named for Camille Jordan (1838–1922), who treated it in his Traité des substitutions of 1870 [verify] — again in the language of substitutions, the 19th-century word for what we call linear maps. (As noted in §G.1, this is a different Jordan from Wilhelm Jordan of Gauss–Jordan elimination — a frequent source of confusion.) The generalized eigenvectors and Jordan blocks of Chapter 36 give the complete answer to "what is the simplest matrix similar to a given one?" — the question diagonalization answers only in the lucky, non-defective case.

G.7 Perron, Frobenius, and positive matrices (1907–1912)

The theorem that guarantees PageRank works — that a suitable nonnegative matrix has a positive dominant eigenvalue with a positive eigenvector — is the Perron–Frobenius theorem. Oskar Perron proved the version for strictly positive matrices around 1907, and Georg Frobenius extended it to nonnegative (irreducible) matrices in a series of papers ending around 1912 [verify]. They had no inkling of webpages; their motivation was the algebra of matrices with nonnegative entries. Yet this is the theorem that, a century later, underwrites the dominant-eigenvector ranking of Chapter 29 and the steady states of Markov chains throughout the book. It is a clean example of the book's recurring theme that linear algebra is the most applied branch of pure mathematics: a result proven for its own sake becomes, much later, the mathematics of everything.

G.8 Beltrami, Jordan, and the SVD (1873–1936)

The singular value decomposition of Chapter 30 — the factorization $A=U\Sigma V^{\mathsf{T}}$ that every matrix admits — has a tangled and genuinely multiple origin. Eugenio Beltrami (1873) and, independently, Camille Jordan (1874) discovered it for square real matrices; James Joseph Sylvester arrived at it again in 1889; and the extension to rectangular and complex matrices, plus the form we now teach, was developed into the 20th century, with Carl Eckart and Gale Young's 1936 theorem on best low-rank approximation (the result behind image compression in Chapter 31) a key modern landmark [verify]. The SVD is thus a case where several people, decades apart, found the same deep fact from different directions — which is itself a hint of how fundamental it is. Its rise to the center of applied linear algebra is recent, driven by the numerically stable algorithms of Gene Golub and William Kahan in the 1960s and by the data-analysis applications that followed [verify].

G.9 Hilbert and the spectrum (early 1900s)

The word spectrum for the set of eigenvalues — and the whole circle of ideas that lets us push eigen-theory into infinite-dimensional function spaces — comes from David Hilbert's work on integral equations in the first decade of the 1900s [verify]. (The term predates its use in physics; the suggestive coincidence with the spectrum of light, and the later discovery that atomic spectra are literally eigenvalue problems in quantum mechanics, was not Hilbert's intent [verify].) The spectral theorem of Chapter 27 — that a symmetric real, or Hermitian complex, operator has an orthonormal eigenbasis of real eigenvalues — is the finite-dimensional heart of this theory, and the inner-product spaces of Chapter 34 and the quantum applications throughout the book are its descendants. "Hilbert space," the setting of Chapter 34's deepest material, is named in his honor.

G.10 Pearson, Hotelling, and PCA (1901–1933)

Principal component analysis (Chapter 32) — finding the orthogonal directions of greatest variance in data — was introduced by Karl Pearson in 1901, geometrically, as fitting the best line or plane to a cloud of points [verify]. Harold Hotelling reframed and named it in 1933, in the statistical language of components that successively capture maximal variance [verify]. The two viewpoints — Pearson's geometric best-fit subspace and Hotelling's variance-maximizing components — are the same computation, and Chapter 32 shows they are both just the eigenvectors of the covariance matrix (equivalently, the SVD of the centered data). PCA is one of the oldest data-science techniques in this book and remains among the most used.

G.11 Page, Brin, and PageRank (1996–1998)

The most recent history in this book is also the most famous. Larry Page and Sergey Brin, then graduate students at Stanford, developed PageRank in 1996–1998, with the algorithm described in the now-classic papers and Page's patent filing of that period [verify]. The idea of Chapter 29 — model the web as a giant stochastic matrix, then take its dominant eigenvector as a ranking of importance, computed by power iteration — turned a 19th-century theorem (Perron–Frobenius, §G.7) and an 1100-page web-link matrix into a company. The "damping factor" that makes the matrix well-behaved is a small, clever patch that guarantees the Perron–Frobenius hypotheses hold. PageRank is the book's clearest demonstration that the classical machinery you are learning is not historical — it is, quite literally, the mathematics that organized the modern internet.


The thread through all of this is the book's first theme. For most of the story above, the mathematicians were not thinking "a matrix is a function that transforms space." They were solving equations, computing volumes, ranking objects, fitting data. The transformation viewpoint is the modern lens that reveals all of these as the same subject — and it is the lens this book hands you on page one, so that two centuries of separately discovered ideas arrive already unified.