Appendix G — Historical Notes
Linear algebra did not arrive as the clean, geometric subject you have just read. It was assembled, for the most part over roughly two centuries, out of problems that did not look related: solving equations, computing determinants, understanding rotations, ranking webpages. The transformation viewpoint that organizes this book ("a matrix is a function that transforms space") is, historically, one of the last ideas to crystallize, not the first. What follows is a narrative of how the pieces came together, in roughly the order the mathematics matured rather than the order we teach it.
A word on the dates. The history of mathematics is genuinely contested at the edges:
who "first" had an idea, when a notation was "invented," and what a long-dead
mathematician "really meant" are often matters of scholarly debate. Where a date or
attribution is the standard textbook account but I cannot vouch for it with confidence,
I mark it [verify]. Treat those flags as honest uncertainty, not decoration.
G.1 Elimination before matrices: China, Newton, and Gauss
The oldest idea in this book is also the one we teach first: systematically eliminating
unknowns from a system of linear equations. A version of it appears in the Chinese text
Jiuzhang Suanshu (The Nine Chapters on the Mathematical Art), whose eighth chapter
solves systems by an array-and-elimination procedure essentially identical to modern row
reduction — and which predates the European treatment by well over a thousand years
(the text is usually dated to the Han dynasty, around 100 BCE–100 CE [verify]). This is
worth pausing on: elimination is not a European invention, and it is far older than the
word "matrix."
In Europe, Isaac Newton wrote down rules for eliminating unknowns in the late 1600s
[verify], and the method that now bears Carl Friedrich Gauss's name was used by Gauss
in the early 1800s in the course of his work on celestial mechanics and least squares —
famously in computing the orbit of the asteroid Ceres around 1801 [verify]. Crucially,
Gauss was not "doing matrix theory." He was solving equations with numbers. The array of
coefficients was a bookkeeping device, not yet an object with a life of its own. The
algorithm of Chapter 4 is, in spirit, Gauss's — but the picture of it as a sequence of
shear-and-scale transformations is a much later gloss. The name Gauss–Jordan, for the
reduced form, attaches Wilhelm Jordan (a geodesist, 1842–1899) to the fully reduced
version [verify]; note this is a different Jordan from Camille Jordan of §G.6, a
confusion worth flagging because both names appear in linear algebra courses.
G.2 Determinants: the answer that came before the question
Strangely, determinants were studied before matrices. The idea of a single number
that decides whether a system has a unique solution — and that, geometrically, measures a
signed volume — was developed in the 1700s and 1800s by a long list of mathematicians:
Leibniz had the germ of it in correspondence (around the 1690s [verify]), and over the
next century Cramer (the rule of 1750 [verify]), Vandermonde, Laplace (cofactor
expansion, late 1700s [verify]), and Cauchy all contributed. Augustin-Louis Cauchy gave
an extensive, systematic treatment in the early 1800s and is often credited with fixing
the modern usage of the term déterminant [verify].
So the determinant of Chapter 11 arrived as a computational answer — a formula for solvability — decades before anyone organized the coefficients into a standalone "matrix" to take the determinant of. The geometric reading we lead with, "the determinant is the factor by which the transformation scales area or volume," is the modern synthesis: it unites the old algebraic formula with the transformation viewpoint that did not yet exist when the formula was found.
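A 2×2 illustration of both readings may help. The symbols $a, b, c, d, e, f$ are chosen here for illustration and are not any historical author's notation. Eliminating one unknown at a time from a two-equation system produces the same quantity $ad - bc$ in front of each unknown, and that quantity is also the signed area of the parallelogram onto which the matrix sends the unit square:

```latex
\begin{align*}
  &\text{System:} & ax + by &= e, & cx + dy &= f. \\
  &\text{Eliminate } y: & (ad - bc)\,x &= ed - bf. \\
  &\text{Eliminate } x: & (ad - bc)\,y &= af - ce. \\
  &\text{Unique solution:} & ad - bc &\neq 0. \\
  &\text{Signed area:} &
  \det\begin{pmatrix} a & b \\ c & d \end{pmatrix} &= ad - bc
  && \text{(parallelogram spanned by the columns } (a, c) \text{ and } (b, d)\text{)}.
\end{align*}
```

The first three lines are the "formula for solvability" that came first historically. The last line is the area-scaling reading that came later.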
G.3 Cayley, Sylvester, and the birth of the matrix (1850s)
The word matrix was coined by James Joseph Sylvester, in 1850 [verify], for a
rectangular array out of which determinants (minors) could be cut — he used the Latin
sense of "matrix" as a womb or source, the thing the determinants are born from. It was
his friend and collaborator Arthur Cayley who, in his Memoir on the Theory of Matrices
of 1858 [verify], treated the array itself as an algebraic object you can add and,
crucially, multiply — and who recognized that matrix multiplication corresponds to the
composition of linear substitutions (what we would call composition of transformations,
the heart of Chapter 8). Cayley also stated the result now called the Cayley–Hamilton
theorem — that a matrix satisfies its own characteristic equation — though he verified it
only in small cases; the general proof came later [verify].
This is the moment the central object of our book is born, and it is worth noting why multiplication is defined the "strange" way it is. Cayley did not invent the row-times-column rule arbitrarily; he reverse-engineered it so that the matrix of a composition of two substitutions equals the product of their matrices. That is exactly the derivation Chapter 8 insists on: multiplication is composition first, and the arithmetic rule second. The history vindicates the pedagogy.
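The reverse-engineering can be seen in the 2×2 case. The following is a sketch in modern notation, with letters chosen here for illustration rather than taken from Cayley's memoir. Apply one linear substitution, feed its output into a second, and collect coefficients:

```latex
\begin{align*}
  &\text{First substitution } (B): & x' &= a x + b y, & y' &= c x + d y. \\
  &\text{Second substitution } (A): & x'' &= p x' + q y', & y'' &= r x' + s y'. \\
  &\text{Composition:} & x'' &= (pa + qc)\,x + (pb + qd)\,y, \\
  & & y'' &= (ra + sc)\,x + (rb + sd)\,y.
\end{align*}
\[
  AB =
  \begin{pmatrix} p & q \\ r & s \end{pmatrix}
  \begin{pmatrix} a & b \\ c & d \end{pmatrix}
  =
  \begin{pmatrix} pa + qc & pb + qd \\ ra + sc & rb + sd \end{pmatrix}.
\]
```

Each entry of the composed substitution is a row of $A$ times a column of $B$. Read this way, the row-times-column rule is simply what composition forces.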
G.4 Grassmann, Peano, and the abstract vector space (1844–1888)
While Cayley and Sylvester worked with concrete arrays of numbers, a deeper and stranger
idea was developing in parallel: that vectors and the spaces they live in could be defined
abstractly, by axioms, without reference to coordinates at all. Hermann Grassmann's
Ausdehnungslehre ("Theory of Extension"), first published in 1844 and revised in 1862,
laid out a remarkably modern theory of linear combinations, independence, and dimension —
but it was written so opaquely that it was largely ignored for decades [verify].
The clean axiomatic definition of a vector space that you met in Chapter 5 is usually
credited to Giuseppe Peano, who in 1888 gave essentially the modern axioms [verify]. The
abstraction did not become standard, though, until the 20th century, when it was absorbed
into functional analysis and made canonical by the influential textbooks of the 1930s and
later (van der Waerden's Moderne Algebra, 1930–31, and others) [verify]. The lesson of
Chapter 5 — that the same axioms govern arrows, polynomials, functions, and matrices — is
Grassmann's and Peano's insight, finally given the prominence it deserved roughly half a
century after it was first written down.
G.5 Gram, Schmidt, and orthogonalization (1880s–1900s)
The procedure of Chapter 20 for turning any basis into an orthonormal one carries two
names. Jørgen Pedersen Gram, a Danish mathematician (and actuary), developed the
underlying ideas in the context of least-squares and series expansions in the 1880s
[verify]; Erhard Schmidt presented the explicit step-by-step algorithm in the early
1900s, around 1907 [verify], in his work on integral equations and what became Hilbert
space. As is common, the algorithm was known in some form earlier — Laplace and Cauchy had
pieces of it — so "Gram–Schmidt" is a label of convenience as much as strict priority
[verify]. The closely related QR factorization that Chapter 20 builds from it is a
20th-century numerical-analysis formulation; the modified Gram–Schmidt and Householder
approaches that make it numerically stable (Chapter 38) belong to the era of digital
computing.
G.6 Camille Jordan and canonical form (1870)
Not every matrix can be diagonalized, and the precise description of what to do instead is
the Jordan normal form of Chapter 36. It is named for Camille Jordan (1838–1922), who
treated it in his Traité des substitutions of 1870 [verify] — again in the language of
substitutions, the 19th-century word for what we call linear maps. (As noted in §G.1, this
is a different Jordan from Wilhelm Jordan of Gauss–Jordan elimination — a frequent source
of confusion.) The generalized eigenvectors and Jordan blocks of Chapter 36 give the
complete answer to "what is the simplest matrix similar to a given one?" — the question
diagonalization answers only in the lucky, non-defective case.
G.7 Perron, Frobenius, and positive matrices (1907–1912)
The theorem that guarantees PageRank works — that a suitable nonnegative matrix has a
positive dominant eigenvalue with a positive eigenvector — is the Perron–Frobenius
theorem. Oskar Perron proved the version for strictly positive matrices around 1907, and
Georg Frobenius extended it to nonnegative (irreducible) matrices in a series of papers
ending around 1912 [verify]. They had no inkling of webpages; their motivation was the
algebra of matrices with nonnegative entries. Yet this is the theorem that, a century
later, underwrites the dominant-eigenvector ranking of Chapter 29 and the steady states of
Markov chains throughout the book. It is a clean example of the book's recurring theme that
linear algebra is the most applied branch of pure mathematics: a result proven for its own
sake becomes, much later, the mathematics of everything.
G.8 Beltrami, Jordan, and the SVD (1873–1936)
The singular value decomposition of Chapter 30 — the factorization $A=U\Sigma V^{\mathsf{T}}$
that every matrix admits — has a tangled and genuinely multiple origin. Eugenio Beltrami
(1873) and, independently, Camille Jordan (1874) discovered it for square real matrices;
James Joseph Sylvester arrived at it again in 1889; and the extension to rectangular and
complex matrices, plus the form we now teach, was developed into the 20th century, with
Carl Eckart and Gale Young's 1936 theorem on best low-rank approximation (the result behind
image compression in Chapter 31) a key modern landmark [verify]. The SVD is thus a case
where several people, decades apart, found the same deep fact from different directions —
which is itself a hint of how fundamental it is. Its rise to the center of applied linear
algebra is recent, driven by the numerically stable algorithms of Gene Golub and William
Kahan in the 1960s and by the data-analysis applications that followed [verify].
G.9 Hilbert and the spectrum (early 1900s)
The word spectrum for the set of eigenvalues — and the whole circle of ideas that lets
us push eigen-theory into infinite-dimensional function spaces — comes from David Hilbert's
work on integral equations in the first decade of the 1900s [verify]. (Hilbert's usage
predates quantum mechanics; the suggestive coincidence with the spectrum of light, and the
later discovery that atomic spectra are literally eigenvalue problems in quantum mechanics,
was not Hilbert's intent [verify].) The spectral theorem of Chapter 27, that a real
symmetric or complex Hermitian operator has real eigenvalues and an orthonormal eigenbasis, is
the finite-dimensional heart of this theory, and the inner-product spaces of Chapter 34 and
the quantum applications throughout the book are its descendants. "Hilbert space," the
setting of Chapter 34's deepest material, is named in his honor.
G.10 Pearson, Hotelling, and PCA (1901–1933)
Principal component analysis (Chapter 32) — finding the orthogonal directions of greatest
variance in data — was introduced by Karl Pearson in 1901, geometrically, as fitting the
best line or plane to a cloud of points [verify]. Harold Hotelling reframed and named it
in 1933, in the statistical language of components that successively capture maximal
variance [verify]. The two viewpoints, Pearson's geometric best-fit subspace and
Hotelling's variance-maximizing components, are the same computation, and Chapter 32
shows that both reduce to the eigenvectors of the covariance matrix (equivalently, the
singular vectors of the centered data matrix). PCA is one of the oldest data-science techniques in this book and
remains among the most used.
G.11 Page, Brin, and PageRank (1996–1998)
The most recent history in this book is also the most famous. Larry Page and Sergey Brin,
then graduate students at Stanford, developed PageRank in 1996–1998, with the algorithm
described in the now-classic papers and Page's patent filing of that period [verify]. The
idea of Chapter 29 (model the web as a giant stochastic matrix, then take its dominant
eigenvector as a ranking of importance, computed by power iteration) turned an
early-20th-century theorem (Perron–Frobenius, §G.7) and an enormous web-link matrix into a
company. The "damping factor" that makes the matrix well-behaved is a small, clever patch:
it makes every entry of the matrix positive, which guarantees the Perron–Frobenius
hypotheses hold. PageRank is the book's clearest demonstration that the classical machinery
you are learning is not merely historical: it is, quite literally, the mathematics that
organized the modern internet.
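The idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the production algorithm or the code of Chapter 29. The function name `pagerank`, the four-page `toy_web`, the damping value 0.85, and the choice to spread the rank of link-less pages evenly are all assumptions made here.

```python
def pagerank(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Rank pages by power iteration on a damped link matrix.

    `links` maps each page to the list of pages it links to; every
    linked page is assumed to appear as a key as well.
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform vector

    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread evenly
        # (an assumption of this sketch) so the total stays equal to 1.
        dangling = sum(rank[p] for p in pages if not links[p])
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page passes its damped rank equally along its out-links.
        for p in pages:
            for q in links[p]:
                new[q] += damping * rank[p] / len(links[p])

        change = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if change < tol:
            break
    return rank


# Hypothetical four-page web: every page except D links, directly or not, to C.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

Each pass of the loop is one multiplication of the current rank vector by the damped matrix. The `(1.0 - damping) / n` term is the damping patch: it gives every page a small positive share of rank, which is what keeps the iteration converging to a single positive vector.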
The thread through all of this is the book's first theme. For most of the story above, the mathematicians were not thinking "a matrix is a function that transforms space." They were solving equations, computing volumes, ranking objects, fitting data. The transformation viewpoint is the modern lens that reveals all of these as the same subject — and it is the lens this book hands you on page one, so that two centuries of separately discovered ideas arrive already unified.