Chapter 30 — Further Reading
Annotated pointers for going deeper on the multivariable chain rule, the gradient, directional derivatives, tangent planes, and gradient descent. The section numbers below map this chapter onto the two reference texts the book is benchmarked against (see the appendix chapter mappings).
Standard Coverage — Textbook Mapping
Stewart, Calculus: Early Transcendentals (9th ed., Cengage).
- §14.5: The Chain Rule. Stewart's tree-diagram presentation of the multivariable chain rule (our §30.2), including the case of several independent variables and implicit differentiation. The closest match to this chapter's Cases 1–3.
- §14.6: Directional Derivatives and the Gradient Vector. The core of this chapter: the gradient, $D_{\mathbf u}f = \nabla f\cdot\mathbf u$, the maximum rate of change $\|\nabla f\|$, and perpendicularity to level sets (our §30.3–30.5). Tangent planes to level surfaces (our §30.6) appear here; the graph-form tangent plane is in §14.4.
- §14.4: Tangent Planes and Linear Approximations. The graph-form tangent plane and linearization (our §30.6–30.7); pairs with Chapter 29.
- Exercises: Stewart §14.6 #1–60 closely parallel our Parts B–E; his "maximum rate of change" and "level curve" problems match Part D.
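A quick numerical check can make the §14.6 formula concrete. The sketch below is illustrative only: the function $f(x, y) = x^2y + 3y$, the point $(1, 2)$, and all helper names are hypothetical choices, and the partial derivatives are worked out by hand. It compares $\nabla f\cdot\mathbf u$ with a central-difference estimate along a unit vector $\mathbf u$, then confirms that the rate of change along $\nabla f/\|\nabla f\|$ equals $\|\nabla f\|$.

```python
import math

def f(x, y):
    # Hypothetical example surface: f(x, y) = x^2 y + 3y
    return x**2 * y + 3 * y

def grad_f(x, y):
    # Partials computed by hand: f_x = 2xy, f_y = x^2 + 3
    return (2 * x * y, x**2 + 3)

def directional_derivative(x, y, u):
    # D_u f = grad f . u  (u must be a unit vector)
    gx, gy = grad_f(x, y)
    return gx * u[0] + gy * u[1]

def finite_difference(x, y, u, h=1e-6):
    # Central difference of f along u: [f(p + h u) - f(p - h u)] / (2h)
    return (f(x + h * u[0], y + h * u[1]) - f(x - h * u[0], y - h * u[1])) / (2 * h)

p = (1.0, 2.0)
u = (math.cos(math.pi / 6), math.sin(math.pi / 6))   # unit vector at 30 degrees
print(directional_derivative(*p, u), finite_difference(*p, u))

# The largest rate of change is ||grad f||, attained in the direction of grad f
gx, gy = grad_f(*p)
norm = math.hypot(gx, gy)
print(norm, directional_derivative(*p, (gx / norm, gy / norm)))
```

The unit-length requirement matters: with a non-unit vector the dot product is scaled by that vector's length and no longer measures a rate per unit distance.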
OpenStax, Calculus Volume 3 (Strang & Herman, free).
- §4.5: The Chain Rule (functions of several variables). Tree diagrams and the multivariable chain rule (our §30.2), with a generalized chain-rule statement and worked examples.
- §4.6: Directional Derivatives and the Gradient. Directional derivatives, the gradient, steepest ascent, and gradient ⟂ level curves (our §30.3–30.5). Excellent free worked examples mirroring our temperature-plate and hill problems.
- §4.4: Tangent Planes and Linear Approximations. Tangent planes and differentials (our §30.6–30.7).
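For the simplest tree, $z = f(x, y)$ with $x = x(t)$ and $y = y(t)$, the rule in both texts reads $\dfrac{dz}{dt} = f_x\,\dfrac{dx}{dt} + f_y\,\dfrac{dy}{dt}$, one product per branch. The sketch below checks that sum against a direct numerical derivative of the composite; the choice $f(x, y) = x^2y$ with $x = \cos t$, $y = \sin t$ is a hypothetical example, not one taken from either book.

```python
import math

# Hypothetical example: z = f(x, y) = x^2 y with x = cos t and y = sin t
def f(x, y):
    return x**2 * y

def f_x(x, y):
    return 2 * x * y      # partial derivative with respect to x

def f_y(x, y):
    return x**2           # partial derivative with respect to y

def dz_dt_chain(t):
    x, y = math.cos(t), math.sin(t)
    dx_dt, dy_dt = -math.sin(t), math.cos(t)
    # One product per branch of the tree diagram, summed
    return f_x(x, y) * dx_dt + f_y(x, y) * dy_dt

def dz_dt_numeric(t, h=1e-6):
    # Differentiate the composite z(t) = f(cos t, sin t) directly
    z = lambda s: f(math.cos(s), math.sin(s))
    return (z(t + h) - z(t - h)) / (2 * h)

for t in (0.0, 0.7, 2.0):
    print(t, dz_dt_chain(t), dz_dt_numeric(t))
```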
Note. Neither Stewart nor OpenStax develops gradient descent as an algorithm — that material (our §30.8–30.9) is this book's bridge into machine learning and is best continued in the ML references below.
Going Deeper — Rigor (for math majors)
- Spivak, Calculus on Manifolds (1965), Ch. 2. The fully rigorous account of differentiability in several variables — why the existence of all directional derivatives does not imply differentiability (the subtlety in our §30.5 Math Major Sidebar). The total derivative as a linear map; the gradient as its representing vector.
- Marsden & Tromba, Vector Calculus (6th ed.), Ch. 2–3. A middle path between Stewart and Spivak: careful proofs of the chain rule and gradient properties with strong geometric pictures. Good on tangent planes and the normal-vector interpretation.
- Apostol, Mathematical Analysis (2nd ed.), Ch. 12. The chain rule and total derivative at analysis level, with clean hypotheses (continuous partials ⟹ differentiable).
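Both points in these entries can be watched numerically. In the sketch below, the smooth function $f(x,y) = e^x\sin y$ and the counterexample $g(x,y) = x^2y/(x^4+y^2)$ with $g(0,0) = 0$ are our own illustrative choices; the §30.5 sidebar may use a different example. For $f$, whose partials are continuous, the linearization error divided by $\|\mathbf h\|$ shrinks toward $0$, as Apostol's theorem guarantees; only one direction is sampled, so this illustrates the definition rather than proving differentiability. For $g$, every directional derivative at the origin exists (it equals $a^2/b$ for $\mathbf u = (a, b)$ with $b \ne 0$, and $0$ when $b = 0$), yet $g = 1/2$ along $y = x^2$, so $g$ is not continuous at the origin and hence not differentiable there.

```python
import math

# (a) Hypothetical smooth example f(x, y) = e^x sin y, expanded at the origin.
#     grad f(0, 0) = (e^0 sin 0, e^0 cos 0) = (0, 1).
def f(x, y):
    return math.exp(x) * math.sin(y)

def error_ratio(t, a=0.6, b=0.8):
    # |f(h) - f(0) - grad f(0) . h| / |h| along h = t * (a, b)
    h1, h2 = t * a, t * b
    remainder = f(h1, h2) - f(0.0, 0.0) - (0.0 * h1 + 1.0 * h2)
    return abs(remainder) / math.hypot(h1, h2)

# (b) Classic counterexample g(x, y) = x^2 y / (x^4 + y^2), with g(0, 0) = 0.
def g(x, y):
    if x == 0 and y == 0:
        return 0.0
    return x**2 * y / (x**4 + y**2)

def difference_quotient(t, a, b):
    # (g(t a, t b) - g(0, 0)) / t, which tends to a^2 / b if b != 0 and to 0 if b == 0
    return (g(t * a, t * b) - g(0.0, 0.0)) / t

for t in (1e-1, 1e-3, 1e-5):
    print("f: error / |h| at t =", t, "is", error_ratio(t))
print("g: quotient along (0.6, 0.8):", difference_quotient(1e-6, 0.6, 0.8))
# Along the parabola y = x^2, g stays at 1/2 instead of approaching g(0, 0) = 0
print("g on y = x^2:", [g(x, x**2) for x in (1e-1, 1e-3, 1e-5)])
```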
Gradient Descent and Machine Learning (the §30.8–30.9 thread)
- Goodfellow, Bengio & Courville, Deep Learning (2016), Ch. 4 and §6.5. Ch. 4 covers gradient-based optimization and the role of the learning rate / conditioning (our §30.8 warnings); §6.5 is back-propagation — the reverse-mode chain rule of §30.2 applied to networks. Free online at deeplearningbook.org.
- Nocedal & Wright, Numerical Optimization (2nd ed.), Ch. 2–3. The standard treatment of the underlying optimization theory: line search, step-size selection, and why poor conditioning slows steepest descent (our $f = x^2 + 10y^2$ example). Useful background for understanding why momentum and adaptive methods such as Adam were developed.
- Kingma & Ba, "Adam: A Method for Stochastic Optimization" (2014). The paper behind the optimizer family most commonly used to train transformers. It starts from the plain update `x = x - eta * grad(x)` and adds a running average of past gradients (momentum) and per-parameter adaptive step sizes.
- 3Blue1Brown, "Gradient descent, how neural networks learn" (video). A visual companion to §30.8–30.9: the loss surface, the negative gradient, and stepping downhill, rendered as animation.
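The conditioning warning and Adam's remedy can both be seen on $f(x,y) = x^2 + 10y^2$. Plain gradient descent multiplies $x$ by $1 - 2\eta$ and $y$ by $1 - 20\eta$ each step, so stability requires $0 < \eta < 0.1$ (set by the steep $y$ direction), and then the factor on $x$ stays above $0.8$. The best single step, $\eta = 1/11$, makes both factors equal in size, $9/11 = (\kappa - 1)/(\kappa + 1)$ with condition number $\kappa = 10$. The `adam` function below is a plain-Python sketch of the update in Algorithm 1 of Kingma & Ba with the paper's default constants, not any library's implementation; the objective, start point, and step sizes are illustrative assumptions.

```python
import math

def grad(p):
    # f(x, y) = x^2 + 10 y^2, so grad f = (2x, 20y); condition number 10
    return [2 * p[0], 20 * p[1]]

def gradient_descent(p, eta, steps):
    # Plain update x <- x - eta * grad f(x), applied to every coordinate
    p = list(p)
    for _ in range(steps):
        g = grad(p)
        p = [pi - eta * gi for pi, gi in zip(p, g)]
    return p

def adam(p, steps, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Update rule of Kingma & Ba (2014), Algorithm 1, with the paper's default constants
    p = list(p)
    m = [0.0] * len(p)   # running mean of gradients (momentum-like)
    v = [0.0] * len(p)   # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(p)
        for i in range(len(p)):
            m[i] = beta1 * m[i] + (1 - beta1) * g[i]
            v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
            m_hat = m[i] / (1 - beta1 ** t)   # bias corrections for the zero start
            v_hat = v[i] / (1 - beta2 ** t)
            p[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return p

start = [1.0, 1.0]
for eta in (0.05, 1 / 11, 0.11):
    print("GD, eta =", eta, gradient_descent(start, eta, steps=50))
print("Adam, one step:", adam(start, steps=1))
```

With $\eta = 0.05$, $y$ is eliminated in one step while $x$ decays only like $0.9^k$; with $\eta = 0.11$, $y$ oscillates with growing amplitude. Adam's first step moves both coordinates by about the learning rate, because $\hat m/\sqrt{\hat v}$ starts out as the sign of each gradient component, even though $\partial f/\partial y$ is ten times $\partial f/\partial x$ at $(1, 1)$.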
The Gradient in the Sciences (the §30.10 thread)
- Griffiths, Introduction to Electrodynamics (4th ed.), Ch. 1–2. $\mathbf E = -\nabla V$ and the gradient operator in physics; the cleanest introduction to $\nabla$ as a vector operator (foreshadowing divergence and curl in our Chapters 34–37).
- Incropera et al., Fundamentals of Heat and Mass Transfer, Ch. 2. Fourier's law $\mathbf q = -k\nabla T$ in full engineering context — one row of our §30.10 "flux $= -$ conductivity $\times$ gradient" table.
- Gonzalez & Woods, Digital Image Processing (4th ed.), Ch. 10. The image gradient $\nabla I$ and edge detection (Sobel, Canny) — the §30.10 computer-vision application.
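The three gradients in these entries can each be computed in a few lines. In the sketch below, every specific is a hypothetical choice made for illustration: units with $kq = 1$, so that $V = 1/r$ and $\mathbf E = \hat{\mathbf r}/r^2$; a conductivity $k = 1.4\ \text{W/(m·K)}$ and temperature field $T = 300 - 50x + 20y^2$; and a $5\times 5$ image with a vertical edge. The continuous gradients are central-difference estimates. The image gradient uses the Sobel kernels, left unscaled (dividing by 8 turns each response into a per-pixel derivative estimate), with border pixels skipped.

```python
import math

def central_gradient(F, p, h=1e-6):
    # Central-difference estimate of grad F at point p (a tuple of floats)
    g = []
    for i in range(len(p)):
        up = list(p)
        dn = list(p)
        up[i] += h
        dn[i] -= h
        g.append((F(*up) - F(*dn)) / (2 * h))
    return g

# 1) Electrostatics: E = -grad V for a point charge, in units where k*q = 1 (assumed)
V = lambda x, y, z: 1.0 / math.sqrt(x * x + y * y + z * z)
p = (1.0, 2.0, 2.0)                          # r = 3
E_numeric = [-g for g in central_gradient(V, p)]
E_exact = [c / 27.0 for c in p]              # r_hat / r^2 = (x, y, z) / r^3

# 2) Heat conduction: q = -k grad T for a hypothetical field T = 300 - 50x + 20y^2
k = 1.4                                      # W/(m K), hypothetical conductivity
T = lambda x, y: 300.0 - 50.0 * x + 20.0 * y**2
q = [-k * g for g in central_gradient(T, (0.0, 0.5))]   # exact: (70, -28) W/m^2

# 3) Images: Sobel estimate of grad I on a 5x5 image with a vertical edge
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
img = [[0, 0, 10, 10, 10] for _ in range(5)]   # dark left, bright right

def sobel_at(img, r, c):
    # Correlate both 3x3 kernels with the neighborhood of interior pixel (r, c)
    ix = sum(SOBEL_X[i][j] * img[r - 1 + i][c - 1 + j] for i in range(3) for j in range(3))
    iy = sum(SOBEL_Y[i][j] * img[r - 1 + i][c - 1 + j] for i in range(3) for j in range(3))
    return ix, iy, math.hypot(ix, iy)

print(E_numeric, E_exact)
print(q)
print([sobel_at(img, 2, c) for c in (1, 2, 3)])
```

The numerical field matches $(1, 2, 2)/27$; the heat flux $(70, -28)\ \text{W/m}^2$ points opposite to $\nabla T$, from hot toward cold; and the Sobel response is $(40, 0)$ on the two columns straddling the edge and zero in the flat region, so $\nabla I$ points across the edge toward the brighter side.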
Historical
- Crowe, A History of Vector Analysis (1967/1994). The origins of $\nabla$ (Hamilton, Tait's "nabla") and the gradient (Maxwell's usage) referenced in the §30.3 Historical Note. How vector notation slowly displaced quaternions.
How to Use These
If you want more practice, drill Stewart §14.5–14.6 or OpenStax §4.5–4.6. If you want the rigorous foundation behind $D_{\mathbf u}f = \nabla f\cdot\mathbf u$, read Spivak or Marsden & Tromba. If you came for the AI payoff, go straight to Goodfellow §6.5 and watch a gradient descent visualization, then continue to Chapter 31 for constrained optimization (Lagrange multipliers) and the Hessian test.