Chapter 8 — Further Reading

DataField.Dev

Annotated pointers for going deeper on matrix multiplication as composition and non-commutativity. Each entry says what to read and why it complements this chapter. Start with the 3Blue1Brown video — it animates the exact "composition" picture this chapter is built on.

Watch first (composition, animated)

3Blue1Brown, Essence of Linear Algebra, Chapter 4: "Matrix multiplication as composition" (free, YouTube). The perfect companion to this chapter: Grant Sanderson animates applying one transformation after another and shows, frame by frame, that the product matrix is the combined motion — and that swapping the order changes the result. This is our §8.3–§8.7 in motion. Chapter 5 ("Three-dimensional linear transformations") extends the same idea to 3D, where non-commutativity becomes even more vivid.

Core textbooks (the standard references for this book)

Gilbert Strang, Introduction to Linear Algebra (5th/6th ed.), §§2.4–2.5 ("Matrix operations" and "Inverse matrices" lead-in). Strang introduces matrix multiplication through several of the same readings we use — the column picture, the row picture, and the entry rule — and stresses that $AB$ combines transformations. His "five ways to multiply matrices" discussion is the direct ancestor of our five-readings table (including the outer-product / column-times-row view). His MIT 18.06 Lecture 3 ("Multiplication and inverse matrices") is the matching free video. Best for: the multiple viewpoints and an intuition-first tone that matches ours.
Sheldon Axler, Linear Algebra Done Right (4th ed.), §3C ("Matrices"), which covers the matrix of a linear map and matrix multiplication. Axler defines the matrix of a composition of linear maps and derives matrix multiplication so that it represents composition: exactly our §8.3–§8.4, done abstractly and rigorously. He is explicit that the product is defined to make composition work, not the other way around. Best for: math majors who want the proof-first treatment; pairs with our A-track derivation and the associativity proof in §8.10.
Stephen Boyd & Lieven Vandenberghe, Introduction to Applied Linear Algebra (VMLS) (free PDF), Chapter 10 ("Matrix multiplication"). The applied/data-science angle: matrix multiplication as the workhorse of networks, Markov chains, and feature transformations, with careful attention to the cost of multiplication (flop counts) and to associativity as a way to reduce that cost. Best for: CS/data-science readers; directly complements our Markov case study and the matrix-chain efficiency remark.
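
The point about associativity and cost can be made concrete with a little arithmetic. The sketch below counts scalar multiplications only, using the entry rule, and the shapes are hypothetical, chosen to make the gap obvious; `matmul_cost` is an illustrative helper, not something from VMLS or this book's toolkit.

```python
def matmul_cost(m, n, p):
    """Scalar multiplications needed to multiply an m-by-n matrix by an
    n-by-p matrix with the entry rule (one length-n dot product per entry)."""
    return m * n * p

# Hypothetical shapes: A is 1000x10, B is 10x1000, x is a 1000x1 column.
# (AB)x forms the 1000x1000 product first, then applies it to x.
cost_AB_then_x = matmul_cost(1000, 10, 1000) + matmul_cost(1000, 1000, 1)

# A(Bx) only ever multiplies a matrix by a column vector.
cost_Bx_then_A = matmul_cost(10, 1000, 1) + matmul_cost(1000, 10, 1)

print(cost_AB_then_x)  # 11000000
print(cost_Bx_then_A)  # 20000
```

Both groupings give the same vector, because multiplication is associative; only the amount of work differs.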

On specific topics in this chapter

Non-commutativity. For why it matters and when matrices commute, see Strang's discussion of commuting matrices and, for the deeper "shared eigenvectors" condition we previewed, Axler Chapter 5 (or any treatment of simultaneous diagonalization) once you have read our Chapter 23. 3Blue1Brown Chapter 4 gives the cleanest geometric intuition for $AB \ne BA$.
The transpose and the adjoint. The transpose's defining property, $(A\mathbf{x})\cdot\mathbf{y} = \mathbf{x}\cdot(A^{\mathsf{T}}\mathbf{y})$, is developed fully in Part IV; Strang §4.1 and Axler Chapter 7 ("Operators on inner product spaces") are the references. The order-reversal rule $(AB)^{\mathsf{T}} = B^{\mathsf{T}}A^{\mathsf{T}}$ appears in every linear-algebra text's "properties of the transpose" section.
Matrix powers and Markov chains. Grinstead & Snell, Introduction to Probability (free PDF), Chapter 11 ("Markov Chains"), is the classic accessible treatment of the transition-matrix-powers idea behind Case Study 2, including steady states. We make the eigenvalue connection precise in Chapter 23.
Fast matrix multiplication. For the surprising fact that the naive $O(n^3)$ triple loop is not optimal, look up Strassen's algorithm (1969) and the active research on the matrix-multiplication exponent $\omega$. A historical/expository starting point is any algorithms text (e.g. Cormen et al., Introduction to Algorithms, the divide-and-conquer chapter). Best for: CS readers curious why this "simple" operation has a deep complexity theory.
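
To see why the triple loop is not the last word, here is a minimal sketch of Strassen's trick on a single $2 \times 2$ product: seven scalar multiplications instead of eight, at the price of extra additions. Applied recursively to blocks, this is what brings the exponent down from 3 to $\log_2 7 \approx 2.81$. The function names and test matrices are illustrative choices, not taken from any particular text.

```python
def naive_2x2(A, B):
    """Entry rule: 8 scalar multiplications."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def strassen_2x2(A, B):
    """Strassen's seven products for a 2x2 times 2x2 multiplication."""
    (a11, a12), (a21, a22) = A
    (b11, b12), (b21, b22) = B
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # Recombine the seven products using additions and subtractions only.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(naive_2x2(A, B))     # [[19, 22], [43, 50]]
print(strassen_2x2(A, B))  # [[19, 22], [43, 50]]
```

In the recursive version the entries are themselves matrix blocks, so the formulas must never swap the order of a product, which is exactly the non-commutativity this chapter stresses.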

Free, interactive, and visual

MIT 18.06 (Strang), OpenCourseWare — full free lecture videos, problem sets, and exams; the canonical geometry-forward free course. Lectures 3–4 cover multiplication, inverses, and the start of factorization.
The recurring toolkit/visualizer.py in this book's repository — re-run the §8.7 experiment yourself: render R @ S and S @ R side by side, then try your own pair of transforms in both orders. Typing two matrices, multiplying both ways, and looking is the fastest cure for the instinct that $AB = BA$.
Immersive Math, Immersive Linear Algebra (free, interactive) — the chapters on matrices and transformations include manipulable figures that echo our composition pictures.
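
If the visualizer is not at hand, the numbers alone already show the order dependence. The sketch below is a stand-in, not the book's `toolkit/visualizer.py`: it uses plain nested lists, and the particular `R` (a 90-degree rotation) and `S` (a horizontal shear) are assumed examples rather than the matrices from §8.7.

```python
def matmul(A, B):
    """Entry rule for nested lists: (AB)[i][j] = sum over k of A[i][k] * B[k][j]."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

R = [[0, -1],
     [1,  0]]   # assumed: rotate 90 degrees counterclockwise
S = [[1, 1],
     [0, 1]]    # assumed: horizontal shear

RS = matmul(R, S)   # shear first, then rotate
SR = matmul(S, R)   # rotate first, then shear

print(RS)        # [[0, -1], [1, 1]]
print(SR)        # [[1, -1], [1, 0]]
print(RS == SR)  # False
```

The first columns tell the story: `RS` sends $(1, 0)$ to $(0, 1)$, while `SR` sends it to $(1, 1)$.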

For the applications in this chapter

Graphics (Case Study 1): Marschner & Shirley, Fundamentals of Computer Graphics, transformation chapters, for the model/view/projection chain and homogeneous coordinates, both built on exactly our composition rules. For a game-development framing, see the transformations in games material, including why transform order is a notorious source of bugs.
Markov chains in business/economics (Case Study 2): Grinstead & Snell (above) for the probability. For the linear-algebra-of-iteration view, this book's Chapter 29 (PageRank) is the climax. The layer-by-layer composition in neural network layers shows the same "compose a learned matrix with the flowing data" pattern at industrial scale.
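
The transition-matrix-powers idea can be tried in a few lines. This is a minimal sketch with a made-up two-state chain, assuming the column convention (each column of `P` sums to 1 and states are column vectors); Grinstead & Snell use the row convention instead, so their matrices are the transposes of these.

```python
def matvec(M, v):
    """Apply matrix M (nested lists) to column vector v (a flat list)."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

# Hypothetical transition matrix: column j holds the probabilities of
# moving out of state j, so each column sums to 1.
P = [[0.9, 0.5],
     [0.1, 0.5]]

state = [1.0, 0.0]       # start with everything in state 0
for _ in range(50):      # after the loop, state is (P^50) applied to the start
    state = matvec(P, state)

print(state)  # close to the steady state [5/6, 1/6]
```

Solving $P\boldsymbol{\pi} = \boldsymbol{\pi}$ by hand for this `P` gives $\boldsymbol{\pi} = (5/6, 1/6)$, which is what the iteration settles on regardless of the starting distribution.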

Where to go next in this book

Continue with Chapter 9 (the inverse: undoing a transformation, where the order-reversal rule $(AB)^{-1} = B^{-1}A^{-1}$ mirrors the transpose rule you met here), then Chapter 10 (LU/PLU: factoring a matrix into simpler products) and Chapter 11 (the determinant, where $\det(AB) = \det(A)\det(B)$ makes the "area-scalings multiply" preview rigorous).
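
As a preview, the two order-reversal rules and the determinant product rule can be spot-checked on one pair of $2 \times 2$ matrices. This is a sketch, not a proof: the matrices are arbitrary invertible examples, the helper names are illustrative, and exact fractions are used so the equality tests are not blurred by rounding.

```python
from fractions import Fraction

def matmul(A, B):
    """Entry rule for 2x2 nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(M):
    return [[M[j][i] for j in range(2)] for i in range(2)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    d = det2(M)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[ M[1][1] / d, -M[0][1] / d],
            [-M[1][0] / d,  M[0][0] / d]]

A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]
B = [[Fraction(1), Fraction(3)], [Fraction(0), Fraction(2)]]
AB = matmul(A, B)

print(transpose(AB) == matmul(transpose(B), transpose(A)))  # True
print(inv2(AB) == matmul(inv2(B), inv2(A)))                 # True
print(inv2(AB) == matmul(inv2(A), inv2(B)))                 # False
print(det2(AB) == det2(A) * det2(B))                        # True
```

The third line is the instructive one: keeping the original order fails for this pair, because undoing "B then A" means undoing A first.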