Chapter 18 — Further Reading

DataField.Dev

Annotated pointers for going deeper on the dot product, norms, angles, Cauchy–Schwarz, and cosine similarity. The three "anchor" textbooks below are referenced throughout this book; we map each chapter to the relevant sections so you can read in parallel. Section numbers follow the most widely circulated editions and may shift slightly between printings.

The three anchor textbooks

Gilbert Strang, Introduction to Linear Algebra (5th ed.), §1.2 (Lengths and Dot Products) and §4.1 (Orthogonality of the Four Subspaces). Strang introduces the dot product and the cosine formula early, in §1.2, with his characteristic geometric framing: the angle, the unit vector, and the Schwarz inequality all appear there. §4.1 then connects orthogonality to the four fundamental subspaces, supplying the right angles this chapter only previewed in §18.11 and that Chapter 19 makes computational. Read §1.2 alongside our §18.1–§18.6; his $\cos\theta=\frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\lVert\mathbf{v}\rVert}$ is the same formula we derive from the law of cosines in §18.3. Best matched to all three learning paths.
Sheldon Axler, Linear Algebra Done Right (4th ed.), Chapter 6 (Inner Product Spaces), §6A. Axler is the rigorous, proof-first complement, and Chapter 6 is the place to see this material done abstractly. He defines an inner product by axioms (the properties of our §18.2 sidebar, split into five separate axioms and stated for complex as well as real scalars), defines the norm as $\lVert v\rVert=\sqrt{\langle v,v\rangle}$ exactly as we do, and proves the Cauchy–Schwarz and triangle inequalities in the general setting. His Cauchy–Schwarz proof uses orthogonal decomposition rather than our quadratic-discriminant argument, so reading both shows you two routes to the same theorem. Math majors should read §6A in parallel with our §18.4 and §18.7; it is the natural bridge to this book's Chapter 34.
Stephen Boyd & Lieven Vandenberghe, Introduction to Applied Linear Algebra (VMLS), Chapters 1–3. The applied, data-oriented view, and the closest in spirit to this chapter's anchor. Chapter 1 introduces vectors and the inner product (including its reading as a weighted sum); Chapter 2 covers linear functions and shows that each scalar-valued linear function is an inner product with a fixed vector; Chapter 3 ("Norm and distance") covers norm, distance, standard deviation, and angle, including cosine similarity, the angle between vectors, and the correlation coefficient as a centered cosine, which is precisely our §18.6, §18.9, and Case Study 2. Their document-and-word-count examples are the textbook ancestors of our search and recommender case studies. Freely and legally downloadable as a PDF (see below); best matched to the CS/data-science path.
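The formulas these three books share fit in a few lines of plain Python. This is a minimal sketch, not code from any of the books: it computes the angle from the cosine formula, checks it against the law of cosines, splits $\mathbf{u}$ into a multiple of $\mathbf{v}$ plus a perpendicular remainder (the step behind Axler's Cauchy–Schwarz proof), and computes the correlation coefficient as the cosine of the de-meaned vectors. The helper names (`dot`, `norm`, `angle`, `orthogonal_decomposition`, `correlation`) are hypothetical choices for this illustration.

```python
import math

def dot(u, v):
    """Inner product: the sum of componentwise products."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length, defined as sqrt(<u, u>)."""
    return math.sqrt(dot(u, u))

def angle(u, v):
    """Angle in radians from cos(theta) = u.v / (||u|| ||v||); u, v must be nonzero."""
    c = dot(u, v) / (norm(u) * norm(v))
    # Clamp so round-off cannot push the cosine just outside acos's domain [-1, 1].
    return math.acos(max(-1.0, min(1.0, c)))

def orthogonal_decomposition(u, v):
    """Write u = c*v + w with w perpendicular to v (v must be nonzero)."""
    c = dot(u, v) / dot(v, v)
    w = [a - c * b for a, b in zip(u, v)]
    return c, w

def correlation(x, y):
    """Correlation coefficient as the cosine of the de-meaned (centered) vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    xc = [a - mx for a in x]
    yc = [b - my for b in y]
    return dot(xc, yc) / (norm(xc) * norm(yc))

u, v = [1.0, 0.0], [1.0, 1.0]
print(math.degrees(angle(u, v)))  # about 45

# Law of cosines: ||u - v||^2 = ||u||^2 + ||v||^2 - 2 ||u|| ||v|| cos(theta)
lhs = norm([a - b for a, b in zip(u, v)]) ** 2
rhs = norm(u) ** 2 + norm(v) ** 2 - 2 * norm(u) * norm(v) * math.cos(angle(u, v))
print(math.isclose(lhs, rhs))  # True

# Because w is perpendicular to v, ||u||^2 = c^2 ||v||^2 + ||w||^2 >= (u.v)^2 / ||v||^2,
# which rearranges to Cauchy-Schwarz.
c, w = orthogonal_decomposition([3.0, 4.0], [1.0, 2.0])
print(abs(dot(w, [1.0, 2.0])) < 1e-12)  # True

print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))  # approximately 1.0
```

The clamp in `angle` is a small design choice: for nearly parallel vectors, floating-point round-off can produce a cosine a hair above 1, which `math.acos` would reject.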

Free online resources

MIT OpenCourseWare, 18.06 Linear Algebra (Gilbert Strang), Lecture 1 (and the orthogonality lectures, 14–15). Strang motivates the dot product geometrically from the first lecture; Lectures 14–15 ("Orthogonal Vectors and Subspaces," "Projections onto Subspaces") carry the right-angle idea into the projection of our Chapter 19. Full video, transcripts, and problem sets, free. Watching Lecture 1 is a gentle on-ramp; the orthogonality lectures preview where Part IV is going.
3Blue1Brown, Essence of Linear Algebra, "Dot products and duality" (Chapter 9 of the series). Grant Sanderson animates the dot product as a projection and connects $\mathbf{u}\cdot\mathbf{v}=\lVert\mathbf{u}\rVert\lVert\mathbf{v}\rVert\cos\theta$ to the shadow picture of our §18.1 and §18.7. His "duality" framing (taking a dot product with a fixed vector is applying a linear map) is a beautiful complement and an echo of the row-vector-eats-column-vector idea from Chapter 2. Watch it if the shadow/projection reading has not yet clicked.
Khan Academy, Linear Algebra, "Vectors and spaces" → "Vector dot and cross products." Gentler, exercise-rich coverage of the dot product, length, the angle formula, and the Cauchy–Schwarz and triangle inequalities (which Khan proves at an accessible level), with immediate auto-graded practice. Good for shoring up the ⭐ and ⭐⭐ exercises before the harder tiers.
Boyd & Vandenberghe, VMLS free PDF and Python companion. The full textbook and a Python language companion (with NumPy examples) are available at no cost from the book's website. The companion shows inner products, norms, distance, angle, and correlation in code, reinforcing the C-track exercises and both case studies, and using the same cosine-similarity-for-documents example as our anchor.
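Two of the readings above can be checked numerically: the shadow picture from the 3Blue1Brown video, where $\mathbf{u}\cdot\mathbf{v}$ equals $\lVert\mathbf{u}\rVert$ times the signed length of $\mathbf{v}$'s shadow on $\mathbf{u}$, and the duality picture, where a fixed $\mathbf{u}$ acts as a linear map on inputs $\mathbf{x}$. The sketch also spot-checks the Cauchy–Schwarz and triangle inequalities that the Khan Academy unit proves. It is plain Python with hypothetical helper names (`shadow_length`, `as_linear_map`), not code taken from either resource.

```python
import math

def dot(u, v):
    """Inner product: the sum of componentwise products."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length sqrt(u . u)."""
    return math.sqrt(dot(u, u))

def shadow_length(v, u):
    """Signed length of v's shadow on the line through u (the scalar projection)."""
    return dot(u, v) / norm(u)

def as_linear_map(u):
    """Duality: the fixed vector u acts as the function x -> u . x."""
    return lambda x: dot(u, x)

u, v = [3.0, 4.0], [2.0, 1.0]

# Shadow reading: u . v = ||u|| * (signed shadow length of v on u).
print(dot(u, v), norm(u) * shadow_length(v, u))  # 10.0 10.0

# Duality: x -> u . x is linear, so f(a x + b y) = a f(x) + b f(y).
f = as_linear_map(u)
x, y, a, b = [1.0, 2.0], [-1.0, 0.5], 2.0, -3.0
combo = [a * xi + b * yi for xi, yi in zip(x, y)]
print(f(combo) == a * f(x) + b * f(y))  # True

# Cauchy-Schwarz and the triangle inequality for this pair.
print(abs(dot(u, v)) <= norm(u) * norm(v))  # True
print(norm([p + q for p, q in zip(u, v)]) <= norm(u) + norm(v))  # True
```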

On the applications in this chapter

Cosine similarity, search, and embeddings (Case Study 1). For the data-science framing of how text becomes vectors and how cosine ranks relevance, see Boyd–Vandenberghe Chapter 3 and any introductory information-retrieval treatment (the TF-IDF + cosine "vector space model" is standard). For the modern learned-vector version, the chapter's link on word embeddings gives the conceptual picture, and a comparison of the broader menu of similarity measures (Euclidean, cosine, Jaccard) shows when each is appropriate. The word-analogy observation $\mathbf{king}-\mathbf{man}+\mathbf{woman}\approx\mathbf{queen}$ comes from Mikolov et al.'s 2013 word2vec work; as in our case study, it is evaluated by cosine similarity.
Collaborative filtering and centered cosine (Case Study 2). The use of cosine and Pearson (centered cosine) similarity in memory-based recommenders is covered in standard recommender-systems texts; the classic reference points are the GroupLens project's 1990s work on user-based collaborative filtering. The migration from memory-based similarity to learned embeddings is matrix factorization, developed in this book's Chapter 33.
Cauchy–Schwarz across mathematics. The inequality recurs far beyond linear algebra — in probability (the correlation of two random variables lies in $[-1,1]$), in analysis, and in optimization. Steele's The Cauchy–Schwarz Master Class is an entire delightful book of proofs and applications of this one inequality, for readers who want to see how deep a single line can go. Our §18.7 proof (via a nonnegative quadratic) is the most common one; Axler's (via orthogonal projection) and the AM–GM-based proofs are worth comparing.
Norms and the curse of dimensionality (§18.6). The near-orthogonality of random high-dimensional vectors, and more generally the strange geometry of high-dimensional space, is treated in the opening chapters of Blum, Hopcroft, and Kannan's Foundations of Data Science (free PDF online) — the right next read if the §18.6 experiment surprised you. It explains why nearest-neighbor methods behave counterintuitively in high dimensions and why cosine is often the measure of choice.
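The TF-IDF + cosine vector space model mentioned for Case Study 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a library API: it lowercases and splits on whitespace, weights each term by its raw count times $\log(N/\mathrm{df})$ (one common idf variant among several), and stores vectors as sparse dictionaries. The function names and the three-sentence corpus are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def build_idf(docs):
    """idf(t) = log(N / df(t)), where df(t) counts the documents containing t."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf(text, idf):
    """Sparse vector {term: count * idf}; terms unseen in the corpus get weight 0."""
    return {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(text)).items()}

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    if na == 0.0 or nb == 0.0:
        return 0.0  # a zero vector has no direction, so score it as unrelated
    return dot / (na * nb)

def rank(query, docs):
    """Return (score, doc_index) pairs, most similar first."""
    idf = build_idf(docs)
    q = tfidf(query, idf)
    scores = [(cosine(q, tfidf(d, idf)), i) for i, d in enumerate(docs)]
    return sorted(scores, reverse=True)

docs = [
    "the dot product measures alignment",
    "the cosine of the angle between vectors",
    "matrix factorization for recommender systems",
]
for score, i in rank("dot product", docs):
    print(f"{score:.3f}  {docs[i]}")
```

Here only the first document shares terms with the query, so it ranks first and the other two score 0. A term that appears in every document gets idf $\log 1 = 0$ and drops out of the comparison, which is the point of the weighting: such words say nothing about which document is relevant.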

A note on where this is going

The dot product is the foundation of the entire orthogonality story, and this chapter's biggest unfinished promise is projection. We defined the scalar projection of one vector onto another (§18.7) and saw the four subspaces meet at right angles (§18.11), but we have not yet projected a vector onto a whole subspace or used that to solve anything. That is Chapter 19: Strang §4.2–§4.3 (projections, least squares) and Axler §6B–§6C (orthonormal bases, orthogonal complements, and orthogonal projections) are the parallel reading. The single sentence to carry forward is that the closest point in a subspace to a given vector is its orthogonal projection, found by making the error perpendicular to the subspace: a stack of dot-product-equals-zero conditions, one for each vector spanning the subspace, all of which you now command. Hold that picture; it turns the geometry of this chapter into the least-squares engine of applied mathematics.
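For a plane spanned by two independent vectors $\mathbf{a}_1,\mathbf{a}_2$, requiring the error $\mathbf{e}=\mathbf{b}-\hat{x}_1\mathbf{a}_1-\hat{x}_2\mathbf{a}_2$ to satisfy $\mathbf{a}_1\cdot\mathbf{e}=0$ and $\mathbf{a}_2\cdot\mathbf{e}=0$ gives two linear equations for $\hat{x}_1,\hat{x}_2$: the normal equations $A^{\mathsf T}A\hat{\mathbf{x}}=A^{\mathsf T}\mathbf{b}$, where $A$ has columns $\mathbf{a}_1,\mathbf{a}_2$. The plain-Python sketch below previews Chapter 19 under a simplifying assumption of this illustration: the subspace is two-dimensional, so the $2\times 2$ system can be solved by Cramer's rule. The function name `project_onto_plane` is hypothetical.

```python
def dot(u, v):
    """Inner product: the sum of componentwise products."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_plane(a1, a2, b):
    """Project b onto span{a1, a2} by solving the 2x2 normal equations A^T A x = A^T b."""
    # Entries of A^T A (a symmetric Gram matrix) and of A^T b.
    g11, g12, g22 = dot(a1, a1), dot(a1, a2), dot(a2, a2)
    r1, r2 = dot(a1, b), dot(a2, b)
    det = g11 * g22 - g12 * g12
    if det == 0:
        raise ValueError("a1 and a2 must be linearly independent")
    # Cramer's rule for [[g11, g12], [g12, g22]] x = [r1, r2].
    x1 = (r1 * g22 - g12 * r2) / det
    x2 = (g11 * r2 - g12 * r1) / det
    p = [x1 * c1 + x2 * c2 for c1, c2 in zip(a1, a2)]  # the projection A x_hat
    e = [bi - pi for bi, pi in zip(b, p)]               # the error b - p
    return (x1, x2), p, e

a1, a2, b = [1.0, 1.0, 1.0], [0.0, 1.0, 2.0], [6.0, 0.0, 0.0]
x_hat, p, e = project_onto_plane(a1, a2, b)
print(x_hat)  # (5.0, -3.0)
print(p)      # [5.0, 2.0, -1.0]
print(e, dot(e, a1), dot(e, a2))  # [1.0, -2.0, 1.0] 0.0 0.0
```

The error is perpendicular to both spanning vectors, as the closest-point condition requires. With these particular columns, $\hat{\mathbf{x}}=(5,-3)$ is also the least-squares line $y=5-3t$ through the points $(0,6)$, $(1,0)$, $(2,0)$, which is exactly the projection-to-least-squares step Chapter 19 develops.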