Chapter 11 — Key Takeaways

The whole chapter rests on one picture: up close, a smooth curve looks like its tangent line. Every result below is that picture put to work.

Linear Approximation (§11.2–11.3)

The linearization of $f$ at $a$ is $$L(x) = f(a) + f'(a)(x - a), \qquad f(x) \approx L(x) \text{ for } x \text{ near } a.$$
$L$ is just the tangent line written as an evaluable function: it passes through $(a, f(a))$ with slope $f'(a)$.
Recipe: choose an anchor $a$ near your target where $f(a)$ and $f'(a)$ are easy; evaluate $L$ at the target.
Famous special cases, all linearizations at $0$: $\ln(1+x)\approx x$, $\sin\theta\approx\theta$, $e^x\approx 1+x$, $(1+x)^k \approx 1 + kx$.
The approximation degrades as you leave the anchor; pick a closer anchor when the target is far.
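
The recipe can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the helper name `linearize` and the choice of $\sqrt{7}$ as the target are assumptions made for the example.

```python
import math

def linearize(f, df, a):
    """Return the function L(x) = f(a) + f'(a) * (x - a), the tangent line at a."""
    fa, dfa = f(a), df(a)
    return lambda x: fa + dfa * (x - a)

# Illustrative target: estimate sqrt(7) from a nearby anchor and a farther one.
f = math.sqrt
df = lambda x: 0.5 / math.sqrt(x)   # derivative of sqrt(x)

target = 7.0
for a in (9.0, 4.0):
    L = linearize(f, df, a)
    approx = L(target)
    error = abs(approx - f(target))
    print(f"anchor a = {a}: L(7) = {approx:.5f}, error = {error:.5f}")
```

The anchor $a = 9$ is closer to the target than $a = 4$, and the printed error is correspondingly smaller.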

For $y = f(x)$, the differential is $dy = f'(x)\,dx$ — the change in $y$ predicted by the tangent line.
Contrast with the actual change $\Delta y = f(x+\Delta x) - f(x)$, which rides the curve. The fundamental approximation: $$\Delta y \approx dy = f'(x)\,\Delta x \qquad (\Delta x \text{ small}).$$
The gap $\Delta y - dy$ shrinks quadratically as the step shrinks — why the linear approximation is so good for small steps.
Differentials turn a messy difference (like $5.02^3 - 5^3$) into a single multiplication ($dV = 3s^2\,ds$).
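
A short numerical check of the cube example, assuming side $s = 5$ and a few illustrative step sizes (the function names are hypothetical). For $ds = 0.02$ the differential gives $dV = 1.5$ against a true change of $1.506008$, and halving the step cuts the gap by roughly a factor of four.

```python
def volume(s):
    """Volume of a cube with side s."""
    return s ** 3

def d_volume(s, ds):
    """Differential dV = 3 s^2 ds: the change predicted by the tangent line."""
    return 3 * s ** 2 * ds

s = 5.0
for ds in (0.02, 0.01, 0.005):
    actual = volume(s + ds) - volume(s)   # Delta V: the change along the curve
    linear = d_volume(s, ds)              # dV: the change along the tangent line
    gap = actual - linear                 # equals 3*s*ds^2 + ds^3 for a cube
    print(f"ds = {ds}: Delta V = {actual:.6f}, dV = {linear:.6f}, gap = {gap:.6f}")
```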

A measurement uncertainty $\Delta x$ propagates through $y = f(x)$ as $$\Delta y \approx |f'(x)|\,\Delta x.$$ The derivative is the amplification factor.
Relative error $\Delta y / y$ and percentage error ($\times 100\%$) are what scientists usually report — dimensionless and comparable.
Power-law rule: for $y = x^n$, $$\frac{\Delta y}{y} \approx |n|\,\frac{\Delta x}{x}.$$ A cube ($n=3$) triples relative error; a square root ($n=\tfrac12$) halves it.
For a product/quotient of powers, each input's relative error is weighted by its exponent and the (worst-case) errors add: e.g. $g = 4\pi^2\ell\,T^{-2}$ gives $\Delta g/g \approx \Delta\ell/\ell + 2\,\Delta T/T$.
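
The exponent-weighted rule can be sketched as a one-line sum. The helper name and the pendulum readings below are assumed values chosen only to illustrate the formula for $g$.

```python
import math

def relative_error_power_law(exponents, rel_errors):
    """Worst-case relative error of a product of powers: sum of |n_i| * (dx_i / x_i)."""
    return sum(abs(n) * r for n, r in zip(exponents, rel_errors))

# Hypothetical pendulum readings for g = 4 pi^2 * ell * T^(-2).
ell, d_ell = 1.000, 0.002   # length and its uncertainty, in metres (assumed)
T, d_T = 2.007, 0.005       # period and its uncertainty, in seconds (assumed)

g = 4 * math.pi ** 2 * ell / T ** 2
rel_g = relative_error_power_law([1, -2], [d_ell / ell, d_T / T])
print(f"g = {g:.3f} m/s^2, relative error = {100 * rel_g:.2f} %")
```

With these readings the period contributes more to the error in $g$ than the length does, because its relative error is doubled by the exponent $-2$.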

To solve $f(x) = 0$, replace $f$ by its tangent line at $x_n$, solve the linear equation, and iterate: $$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$
Geometry: slide down the tangent line to the $x$-axis; that intercept is the next guess.
Special cases collapse to classic algorithms: $f(x)=x^2-S$ gives the Babylonian $x_{n+1}=\tfrac12(x_n + S/x_n)$; $f(x)=1/x-d$ gives the division-free $x_{n+1}=x_n(2-d\,x_n)$.
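
A minimal sketch of the iteration and its two special cases. The function names, tolerance, iteration cap, and fixed step counts are assumptions for illustration. The reciprocal iteration is assumed to start with $0 < x_0 < 2/d$, the range in which it converges.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Newton's method: repeat x <- x - f(x)/f'(x) until the step is below tol.

    Returns (x, iterations). Raises if f'(x) is zero or the iteration cap is hit.
    """
    x = x0
    for n in range(1, max_iter + 1):
        slope = df(x)
        if slope == 0:
            raise ZeroDivisionError("f'(x) = 0: the tangent line never meets the x-axis")
        step = f(x) / slope
        x -= step
        if abs(step) < tol:
            return x, n
    raise RuntimeError("no convergence within max_iter iterations")

def babylonian_sqrt(S, x0=1.0, steps=6):
    """Newton on f(x) = x^2 - S: x <- (x + S/x) / 2."""
    x = x0
    for _ in range(steps):
        x = 0.5 * (x + S / x)
    return x

def reciprocal(d, x0, steps=6):
    """Newton on f(x) = 1/x - d: x <- x * (2 - d*x), using no division."""
    x = x0
    for _ in range(steps):
        x = x * (2 - d * x)
    return x

root, n = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
print(root, n)
print(babylonian_sqrt(2.0), reciprocal(3.0, 0.3))
```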

Near a simple root $r$ ($f'(r)\neq 0$), the error $e_n = x_n - r$ satisfies $$e_{n+1} \approx \frac{f''(r)}{2 f'(r)}\,e_n^2.$$
The new error is proportional to the square of the old, so the number of correct digits roughly doubles each step. From a start with two correct digits, three or four steps reach machine precision (about 16 digits).
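
The error law can be checked numerically. The sketch below assumes the test function $f(x) = x^2 - 2$ with root $r = \sqrt 2$ and start $x_0 = 1$, for which the predicted constant is $f''(r)/(2f'(r)) = 1/(2\sqrt 2) \approx 0.3536$.

```python
import math

f = lambda x: x * x - 2
df = lambda x: 2 * x
r = math.sqrt(2)

# Run four Newton steps from x0 = 1 and record the error after each one.
x = 1.0
errors = []
for _ in range(4):
    x -= f(x) / df(x)
    errors.append(abs(x - r))

predicted = 1 / (2 * r)   # f''(r) / (2 f'(r)) for this f
for e_old, e_new in zip(errors, errors[1:]):
    ratio = e_new / e_old ** 2
    print(f"e_n = {e_old:.3e}, e_(n+1)/e_n^2 = {ratio:.4f} (predicted {predicted:.4f})")
```

The ratio settles toward the predicted constant as the iterates approach the root, which is what the approximation sign in the error law promises.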

Failure	Cause
Bad starting guess	Tangent line points away from the root; no global sense of direction.
Near-zero derivative $f'(x_n)\approx 0$	Step $f/f'$ blows up; a flat spot flings the guess far away.
Cycling	Iterates fall into a repeating loop (e.g. $x^3-2x+2$ from $x_0=0$ cycles $0\to1\to0$).
Multiple (repeated) root	$f'(r)=0$ at the root degrades convergence from quadratic to linear.
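
Two rows of the table are easy to reproduce. The sketch uses the cycling example from the table and, as an assumed illustration of a repeated root, $f(x) = (x-1)^2$ started at $x_0 = 2$. The helper `newton_iterates` is a hypothetical name.

```python
def newton_iterates(f, df, x0, steps):
    """Return the list [x0, x1, ..., x_steps] of raw Newton iterates (no safeguards)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs

# Cycling: x^3 - 2x + 2 from x0 = 0 bounces between 0 and 1 forever.
cycle = newton_iterates(lambda x: x**3 - 2*x + 2, lambda x: 3*x**2 - 2, 0.0, 6)
print(cycle)    # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]

# Double root at x = 1: the error only halves each step (linear convergence).
double = newton_iterates(lambda x: (x - 1)**2, lambda x: 2*(x - 1), 2.0, 5)
print(double)   # [2.0, 1.5, 1.25, 1.125, 1.0625, 1.03125]
```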

Newton's method gives no warning when it fails: always plug the answer back into $f$, cap the iterations, and prefer a bracketing method when reliability matters (§11.12).

Bisection: needs only continuity and a sign-change bracket (Intermediate Value Theorem, Chapter 4); it cannot fail once a bracket is found, but it converges only linearly, halving the bracket (one binary digit) per step.
Secant method: Newton with a finite-difference stand-in for $f'$; superlinear (order $\approx 1.618$), no derivative needed.
Production root-finders blend these for robustness plus speed.
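
A minimal sketch of the two derivative-free methods, assuming the test problem $x^2 - 2 = 0$ on the bracket $[1, 2]$. The function names, tolerances, and iteration cap are illustrative choices, and this is not how any particular production solver is written.

```python
def bisect(f, lo, hi, tol=1e-10):
    """Bisection: halve a sign-change bracket [lo, hi] until it is narrower than tol."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    steps = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:      # sign change in the left half: keep [lo, mid]
            hi = mid
        else:                      # otherwise the root is in [mid, hi]
            lo, f_lo = mid, f_mid
        steps += 1
    return 0.5 * (lo + hi), steps

def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method: Newton with f' replaced by the slope through the last two points."""
    f0, f1 = f(x0), f(x1)
    for n in range(1, max_iter + 1):
        if f1 == f0:
            raise ZeroDivisionError("flat secant line: cannot take a step")
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2, n
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
    raise RuntimeError("no convergence within max_iter iterations")

g = lambda x: x * x - 2
root_b, steps_b = bisect(g, 1.0, 2.0)
root_s, steps_s = secant(g, 1.0, 2.0)
print(f"bisection: {root_b:.10f} in {steps_b} steps")
print(f"secant:    {root_s:.10f} in {steps_s} steps")
```

On this problem the secant method needs far fewer steps than bisection, but only bisection comes with a guarantee.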

$|f(x) - L(x)| \le \dfrac{M}{2}(x-a)^2$ with $M = \max|f''|$ on the interval between $a$ and $x$: the error is quadratic in distance from the anchor and proportional to curvature.
Keeping more terms gives the quadratic approximation and then the Taylor series — linear approximation is the first-order Taylor polynomial (Chapter 23).
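
The bound can be checked on a concrete case. The sketch assumes $f(x) = \sqrt{x}$, anchor $a = 9$, and target $x = 7$. Since $|f''(t)| = \tfrac14 t^{-3/2}$ is decreasing, its maximum on $[7, 9]$ is at $t = 7$.

```python
import math

def linearization_error_bound(M, a, x):
    """Upper bound (M/2) * (x - a)^2 on |f(x) - L(x)|, where M bounds |f''| between a and x."""
    return 0.5 * M * (x - a) ** 2

a, x = 9.0, 7.0

# Linearization of sqrt at a, evaluated at x.
L_x = math.sqrt(a) + (x - a) / (2 * math.sqrt(a))
actual = abs(math.sqrt(x) - L_x)

# |f''(t)| = 1 / (4 t^(3/2)) is largest at the left end t = 7 of [7, 9].
M = 1 / (4 * 7.0 ** 1.5)
bound = linearization_error_bound(M, a, x)

print(f"actual error = {actual:.5f}, bound = {bound:.5f}")
```

The actual error sits below the bound, as it must, and is of the same order of magnitude.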

Anchoring too far from the target. Approximating $\sqrt 7 \approx 2.646$ from $a=4$ gives $2.75$ (error $\approx 0.10$); from $a=9$ it gives $2.667$ (error $\approx 0.02$). Always check the target is near $a$.
Confusing $dy$ with $\Delta y$. $dy$ is the tangent-line (linear) change; $\Delta y$ is the true change. They agree only approximately, for small steps.
Forgetting the exponent weighting in error propagation. A squared quantity contributes twice its relative error, not once.
Trusting Newton blindly. Verify the output, cap iterations, and watch for $f'\approx 0$, cycling, or repeated roots.
Expecting quadratic convergence at a double root. There convergence is only linear.
Dropping the thin space in differentials. Write $dy = f'(x)\,dx$ and $\int f(x)\,dx$ (continuity §2).

Backward: the tangent line and differentiability come from Chapter 6; the critical-point / optimization use of Newton on $g'$ connects to Chapter 10; bisection rests on the Intermediate Value Theorem from Chapter 4.
Forward — Taylor series (Chapter 23): the linearization is the first-order Taylor polynomial; higher orders give the quadratic approximation and the full series.
Forward — tangent plane (Chapter 29): in two variables the tangent line becomes a tangent plane, $L(x,y)=f(a,b)+f_x(a,b)(x-a)+f_y(a,b)(y-b)$, and differentials become $dz = f_x\,dx + f_y\,dy$.
Forward — differential notation (Chapter 12): the $dx$ introduced here becomes the integration element $\int f(x)\,dx$ that opens Part III.
Anchor example — gradient descent / optimization (Chapters 6 → 30): "Newton's method for optimization," $x_{n+1}=x_n-g'(x_n)/g''(x_n)$, is the curvature-aware cousin of gradient descent.
Anchor example — Euler's formula (Chapter 24): the linearizations $\sin\theta\approx\theta$, $\cos\theta\approx 1$ are the opening terms of the series that link exponentials and trigonometry.