Case Study 1 — When the Limit Meets the Machine: Floating-Point and the Derivative

DataField.Dev

Case Study 1 — When the Limit Meets the Machine: Floating-Point and the Derivative

Field: Numerical computing / scientific software Calculus used: the intuitive limit (Section 3.2), the $0/0$ form (Section 3.3), the difference quotient that becomes the derivative (Section 3.11)

A Number That Isn't There

Open a Python interpreter and type the most innocent arithmetic you can imagine:

>>> 0.1 + 0.2
0.30000000000000004

The answer is not $0.3$. It is $0.3$ plus about $4 \times 10^{-17}$. This is not a bug in Python, and it is not a bug in your computer. It is a direct consequence of the same idea this chapter is about: the gap between an exact mathematical value and the values you can actually get close to with finite resources. A computer cannot store the real number $1/10$ any more than a numerical table can store the exact limit of a function — both can only approach the target. This case study follows that gap all the way into the heart of the limit definition of the derivative, and shows that the machine's failure is, read correctly, a beautiful illustration of the limit concept.

What the Computer Actually Stores

Modern hardware represents real numbers in 64 bits using the IEEE 754 double-precision format. With 64 bits you can name only about $2^{64} \approx 1.8\times 10^{19}$ distinct values — a finite scattering of points along the genuinely continuous real line. When you write 0.1, the machine cannot store $1/10$ exactly, because $1/10$ has no finite binary expansion (just as $1/3$ has no finite decimal expansion). It stores the nearest representable double, which is about

$$0.1000000000000000055511151231257827\ldots,$$

off from $1/10$ by roughly $5.5\times 10^{-18}$. Add two such almost-right numbers and you get an almost-right sum. The tiny error you saw is the rounding of $0.1$ and $0.2$ surfacing in the seventeenth decimal place. The relative error of any single stored number is at most machine epsilon, $\varepsilon_{\text{mach}} \approx 2.2\times 10^{-16}$ — the resolution of the silicon's view of the real line.

The Derivative as a Limit, Computed Honestly

Now bring in calculus. Section 3.11 previews that the derivative is a limit:

$$f'(a) = \lim_{h \to 0}\frac{f(a+h) - f(a)}{h}.$$

The mathematics is unambiguous: take $h$ as small as you like, and the difference quotient homes in on $f'(a)$. A natural first instinct is to simulate the limit — just plug in a very small $h$ and read off the answer. Let us try it for $f(x) = x^2$ at $a = 1$, where the exact derivative is $f'(1) = 2$.

import numpy as np

def difference_quotient(f, a, h):
    return (f(a + h) - f(a)) / h

f = lambda x: x**2
for k in range(1, 19):
    h = 10.0 ** (-k)
    print(f"h = 1e-{k:<2d}  DQ = {difference_quotient(f, 1.0, h):.16f}")

The output tells a story in three acts:

h = 1e-1   DQ = 2.1000000000000001
h = 1e-2   DQ = 2.0100000000000002
h = 1e-3   DQ = 2.0009999999999999
h = 1e-5   DQ = 2.0000099999990418
h = 1e-7   DQ = 2.0000000254181981
h = 1e-8   DQ = 1.9999999878450826   <- best zone
h = 1e-10  DQ = 2.0000001654807317
h = 1e-12  DQ = 2.0001778801510751
h = 1e-14  DQ = 1.9984014443252818
h = 1e-16  DQ = 2.2204460492503131
h = 1e-17  DQ = 0.0000000000000000

Act one (large $h$): the quotient is close to $2$ but not exact, because the mathematical error of a forward difference is proportional to $h$. For $f(x)=x^2$ the algebra is clean: $\frac{(1+h)^2 - 1}{h} = \frac{2h + h^2}{h} = 2 + h$, so the true error is exactly $h$. As $h$ shrinks from $10^{-1}$ to $10^{-3}$ the estimate improves by a factor of $100$ per row, exactly as $2 + h \to 2$ predicts.

Act two (the sweet spot, $h \approx 10^{-8}$): the estimate is as good as double precision allows — correct to about eight digits.

Act three (tiny $h$): the answer gets worse, and at $h = 10^{-17}$ it collapses to exactly $0$. Here the mathematics says the quotient should be nearly perfect, yet the machine returns garbage. What went wrong?

Catastrophic Cancellation: the $0/0$ Form in Silicon

The culprit is the subtraction $f(a+h) - f(a)$ in the numerator. When $h$ is tiny, $f(a+h)$ and $f(a)$ are almost equal numbers, and subtracting two nearly equal floating-point numbers is the single most dangerous operation in numerical computing. Both inputs carry a rounding error of relative size $\varepsilon_{\text{mach}}$, an absolute error of about $\varepsilon_{\text{mach}}\,|f(a)| \approx 10^{-16}$. The true difference $f(a+h)-f(a)\approx 2h$ shrinks as $h$ does, but the rounding error does not — so the relative error in the numerator blows up like $\varepsilon_{\text{mach}}/h$. Dividing by $h$ then inflates the whole thing. By $h = 10^{-17}$, $1 + h$ rounds to exactly $1.0$, the numerator is the genuine $0/0$ this chapter warned about (Section 3.3), and the quotient is $0/10^{-17} = 0$.

This is the lesson of Section 3.3 rendered in hardware. The form $0/0$ is indeterminate: the symbol carries no value, and only by keeping numerator and denominator separately resolvable — never letting them both collapse to the machine's zero — can you recover the true answer. The mathematician resolves $0/0$ with algebra (cancel the $h$ before letting it vanish); the numerical analyst resolves it by not letting $h$ get small enough to trigger cancellation in the first place.

The Optimal Step and a V-Shaped Error

Two competing errors are now visible. The truncation error (the mathematics of the forward difference) is proportional to $h$ and shrinks as $h\to 0$. The rounding error (the machine) is proportional to $\varepsilon_{\text{mach}}/h$ and grows as $h\to 0$. The total error is roughly

$$E(h) \approx C\,h + \frac{\varepsilon_{\text{mach}}}{h}.$$

Minimizing over $h$ (a calculus exercise you will be able to do after Chapter 10) gives an optimum near $h^\star \sim \sqrt{\varepsilon_{\text{mach}}} \approx 10^{-8}$ — precisely the sweet spot the table revealed. Plot the absolute error against $h$ on log-log axes and you get an unmistakable V: error falling as $h$ shrinks (the mathematics winning), bottoming out near $10^{-8}$, then rising again (the machine losing).

import numpy as np
import matplotlib.pyplot as plt

f, a, true = np.sin, 1.0, np.cos(1.0)
hs = 10.0 ** np.arange(-1, -16, -1)
err = [abs((f(a + h) - f(a)) / h - true) for h in hs]
plt.loglog(hs, err, 'o-')
plt.xlabel('h'); plt.ylabel('absolute error')
plt.title('Forward-difference error for the derivative of sin at x = 1')
plt.gca().invert_xaxis(); plt.grid(True); plt.show()
# The curve falls, reaches a minimum near h ~ 1e-8, then rises.

How Real Software Sidesteps the Gap

Production numerical code rarely computes derivatives by naive forward differences, precisely because of this trade-off. Three standard escapes:

Central differences. Use $\dfrac{f(a+h) - f(a-h)}{2h}$. The leading truncation error cancels, leaving an error proportional to $h^2$, so a larger (safer) $h$ achieves higher accuracy. Below, the central estimate is already accurate to eleven digits:

python def central_diff(f, a, h=1e-5): return (f(a + h) - f(a - h)) / (2 * h) print(central_diff(lambda x: x**2, 1.0)) # 2.0000000000131...

Symbolic differentiation (sympy): differentiate the formula exactly, then evaluate. No subtraction, no cancellation — but it needs a closed-form expression.
Automatic differentiation (PyTorch, JAX, TensorFlow): propagate exact derivative rules through the program itself, computing $f'$ to full machine precision with no step size at all. This technique is what made training deep neural networks — our gradient-descent anchor from Chapter 6 — practical at scale.

The Moral

The exact mathematical limit lives in a universe of infinite precision: $\lim_{h\to 0}$ has one perfect value. The computational analog lives in a universe where every operation spends a little precision, and where "let $h\to 0$" eventually stops helping and starts hurting. Recognizing that gap — and choosing $h$, or a smarter algorithm, to respect it — is the founding insight of numerical analysis. And it is, at bottom, this chapter's insight: a limit is about what the nearby values approach, never about evaluating the expression at the forbidden point. The machine, by failing exactly at $h = 0$, proves the point in silicon.

Discussion Questions

Why is $0.1 + 0.2 \neq 0.3$ in floating point? Relate your answer to the fact that $1/10$ has no finite binary expansion.
The optimal forward-difference step is $h^\star \sim \sqrt{\varepsilon_{\text{mach}}}$. Using the model $E(h) \approx C h + \varepsilon_{\text{mach}}/h$, explain where the square root comes from.
The central difference has truncation error $\propto h^2$. Argue why its optimal step is larger than the forward difference's, and why that makes it more accurate in practice.
Symbolic differentiation never suffers cancellation. Why isn't it always used? (Consider functions defined only by a black-box simulation, or by data.)
With true arbitrary-precision arithmetic (no rounding), would the naive difference quotient converge to $f'(a)$ as $h\to 0$? At what cost?

Annotated Further Reading

Goldberg, D. (1991). "What every computer scientist should know about floating-point arithmetic." ACM Computing Surveys 23(1), 5–48. The canonical, careful explanation of IEEE 754 and cancellation. Read sections 1–2 first.
Higham, N. J. (2002). Accuracy and Stability of Numerical Algorithms (2nd ed.). SIAM. The definitive treatment of rounding error; Chapter 1 frames the forward/backward error ideas used above.
Baydin, A. G., et al. (2018). "Automatic differentiation in machine learning: a survey." JMLR 18, 1–43. Explains the AD technique that replaced finite differences in modern ML.

The calculus limit assumes infinite precision is free. The machine charges for every digit. The art of scientific computing is computing the limit anyway — by knowing exactly where the charge comes due.