Chapter 6 — Further Reading

DataField.Dev

Annotated pointers for going deeper on the derivative as a function, its failure modes, higher derivatives, and the gradient-descent anchor. Full citations live in appendices/bibliography.md; the chapter-by-chapter maps to the two reference texts are in appendices/appendix-h-stewart-chapter-mapping.md and appendices/appendix-i-openstax-chapter-mapping.md.

Companion Textbook Sections

Stewart, Calculus: Early Transcendentals (9th ed.), §2.7–§2.8 and §3.7. §2.7 introduces derivatives and rates of change. §2.8, "The Derivative as a Function," is the direct parallel to this chapter: it covers graphing $f'$ from $f$, the non-differentiability gallery, and higher derivatives. §3.7 applies rates of change to the natural and social sciences. Stewart's worked problems on sketching $f'$ from a given graph are excellent extra drill for our §6.4. (Our gradient-descent material, §6.8, has no Stewart counterpart; it is one of the ways this book reaches past the standard syllabus.)

OpenStax, Calculus Volume 1 (Strang & Herman), §3.2. "The Derivative as a Function" matches our §6.2–§6.4 closely and is free online; it includes the same corner/cusp/vertical-tangent classification as our §6.7 and closes with higher-order derivatives. A no-cost first stop for more examples and exercises.

Spivak, Calculus (4th ed.), Chapter 9 — "Derivatives." For readers who want the rigorous, proof-first treatment behind the Math Major Sidebars. Spivak develops differentiability with full $\varepsilon$–$\delta$ care and is the right place to see why differentiable implies continuous (our §6.3) proven without hand-waving.

On Non-Differentiability and the $C^n$ Hierarchy

Abbott, Understanding Analysis (2nd ed.), Chapter 5. The cleanest undergraduate account of the gap between continuous, differentiable, and continuously differentiable. Abbott builds a continuous nowhere-differentiable function in the spirit of Weierstrass's example (our §6.9 Historical Note) and works through why it is continuous everywhere and differentiable nowhere, which is the payoff our chapter only sketches.

Weierstrass's original pathology, recounted in most analysis texts, is worth meeting once: it is the moment mathematicians learned that "continuous" and "smooth" are genuinely different demands, and it is one reason our limit-based definitions look the way they do.

On Higher Derivatives in Physics and Engineering

Halliday, Resnick & Walker, Fundamentals of Physics (10th ed.), Chapter 2. Position, velocity, and acceleration developed from scratch — the physical backbone of our §6.6. Read this to connect $s$, $s'$, $s''$ to motion you can feel.

Pendrill, A.-M. (2013). "Acceleration in one, two, and three dimensions in launched roller coasters," Physics Education 48(3): 376. A delightful applied look at why jerk — the third derivative — is engineered into ride comfort, expanding the elevator example of §6.6.

On the Gradient-Descent Anchor (Case Study 1)

Goodfellow, Bengio & Courville, Deep Learning (2016), Chapters 4 and 8. Free at deeplearningbook.org. Chapter 4 motivates numerical optimization; Chapter 8 is the authoritative treatment of how gradient descent actually trains deep networks. Return to it after Chapter 30, when you have the multivariable gradient.

Ruder, S. (2017). "An overview of gradient descent optimization algorithms," arXiv:1609.04747. A short, readable survey of the learning-rate refinements (momentum, RMSprop, Adam) that our §6.8 warning gestures at. Best read once you have run gradient descent yourself.

Boyd & Vandenberghe, Convex Optimization (2004). Free online. The rigorous foundation of when gradient descent converges cleanly — exactly the convex-bowl situation of the linear-regression loss in Case Study 1.

Karpathy, A., "Neural Networks: Zero to Hero" (YouTube series). Builds gradient descent and automatic differentiation from scratch in Python. The single best way to feel, rather than just read, what Case Study 1 describes.
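For a first taste before opening any of these sources, the convex-bowl situation can be sketched in a few lines of Python. This is a minimal illustration, not the code of Case Study 1: the toy data, the one-parameter model $y = wx$, the learning rate, and the function names are all assumptions chosen so that only a single-variable derivative is needed.

```python
# Hypothetical toy data lying exactly on y = 2x (not taken from the chapter).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def loss(w):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def loss_derivative(w):
    """Derivative of the loss with respect to w: the mean of 2 * (w*x - y) * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(w=0.0, learning_rate=0.05, steps=100):
    """Repeatedly step against the slope of the loss; returns the final w."""
    for _ in range(steps):
        w -= learning_rate * loss_derivative(w)
    return w

w_final = gradient_descent()
print(w_final, loss(w_final))
```

Because this loss is a parabola in $w$, each step shrinks the distance to the minimizer $w = 2$ by a fixed factor, provided the learning rate is small enough. Raising the assumed learning rate well above the value used here makes the iterates overshoot and diverge, which is the behavior the §6.8 warning describes.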

On Automatic Differentiation (the engine behind training)

Baydin, A. G., et al. (2018). "Automatic differentiation in machine learning: a survey," Journal of Machine Learning Research 18: 1–43. Explains how frameworks compute the millions of derivatives gradient descent needs — a mechanical chain rule (Chapter 7) applied through an entire network.

On Derivatives in Medicine (Case Study 2)

Pan, J., & Tompkins, W. J. (1985). "A Real-Time QRS Detection Algorithm," IEEE Transactions on Biomedical Engineering BME-32(3): 230–236. The classic heartbeat detector built directly on the derivative of the ECG — the slope-threshold idea of Case Study 2, in its original form.

Drew, B. J., et al. (2014). "Insights into the problem of alarm fatigue with physiologic monitor devices," PLOS ONE 9(10): e110274. On the real-world cost of false alarms, which is a large part of why derivative thresholds in monitoring software have to be tuned with care.
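The slope-threshold idea of Case Study 2 can be sketched as below. This is a deliberately bare illustration and not the Pan and Tompkins algorithm, which adds filtering, squaring, windowed integration, and adaptive thresholds. The sample values, the time step, the threshold, and the function name are hypothetical.

```python
def slope_crossings(samples, dt, threshold):
    """Return indices where the forward-difference slope rises above threshold."""
    # Approximate the derivative between consecutive samples.
    slopes = [(samples[i + 1] - samples[i]) / dt for i in range(len(samples) - 1)]
    hits = []
    for i, s in enumerate(slopes):
        was_above = i > 0 and slopes[i - 1] > threshold
        # Record only the first sample of each run above the threshold.
        if s > threshold and not was_above:
            hits.append(i)
    return hits

# Hypothetical signal: two sharp spikes on a flat baseline, sampled every 0.01 s.
signal = [0.0, 0.0, 0.1, 1.0, 0.1, 0.0, 0.0, 0.1, 1.0, 0.1, 0.0]
beats = slope_crossings(signal, dt=0.01, threshold=50.0)
print(beats)  # → [2, 7]
```

Lowering the assumed threshold to 5.0 would also flag the gentle rise just before each spike, a small-scale version of the false-alarm trade-off that Drew et al. discuss.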