Prerequisites
- chapter-03-the-limit
- chapter-04-continuity
Learning Objectives
- Compute the average rate of change of a function over an interval
- Define the instantaneous rate of change as a limit of average rates
- State the formal definition of the derivative $f'(a)$ as a limit
- Compute simple derivatives from the definition
- Interpret the derivative as slope of tangent, velocity, or rate of change in any application
- Identify points where a function has no derivative (corners, cusps, vertical tangents)
In This Chapter
- 5.1 Average Rate of Change
- 5.2 From Average to Instantaneous
- 5.3 The Tangent Line Problem, Revisited
- 5.4 The Velocity Problem
- 5.5 The Derivative as a Function
- 5.6 Notations for the Derivative
- 5.7 When Derivatives Fail to Exist
- 5.8 Interpreting the Derivative
- 5.9 A Few Quick Differentiations
- 5.10 Summary: The End of Part I
Rates of Change: The Idea That Started Everything
Four chapters into the book, you finally have the tools to finish what we started in Chapter 1.
Back then we computed the slope of $y = x^2$ at $x = 1$ informally — by writing the secant slope as $2 + h$ and letting $h$ "approach zero" without ever saying precisely what that meant. Chapter 3 made the meaning of "approach" exact through the limit. Chapter 4 introduced continuity, the regularity condition under which limits behave the way our intuition wants. With those two ideas in hand, the loose argument from Chapter 1 can be made completely rigorous.
This chapter does that. It defines the derivative — the central object of differential calculus — and it is the last chapter of Part I. Everything you have learned about limits was, in a sense, preparation for this single definition. The derivative is built from a limit; it could not exist without one.
A word about what this chapter is and is not. This is the chapter where the derivative is born as a concept: we motivate it, define it, and interpret it from three angles (slope, velocity, rate of change). What this chapter does not do is teach you the fast computational rules — the power rule, product rule, chain rule. Those wait for Chapter 7. Here we compute derivatives the slow, honest way, straight from the definition, because that is how you build genuine understanding of what the symbol $f'(a)$ actually means. Tomorrow we compute fluently; today we understand.
The Key Insight. A derivative is an instantaneous rate of change, and "instantaneous rate" is a contradiction in terms until limits resolve it. A rate needs two moments to compare — but an instant is a single moment. The limit is the device that squeezes a family of two-moment average rates down onto a single instant. That squeeze is the whole idea of the chapter; everything else is consequence.
5.1 Average Rate of Change
Suppose you drive from your house to a restaurant 30 miles away, and the trip takes 45 minutes ($= 0.75$ hours). What was your average speed?
$$\text{average speed} = \frac{\text{distance}}{\text{time}} = \frac{30 \text{ mi}}{0.75 \text{ hr}} = 40 \text{ mph}.$$
This is your average speed over the trip. You almost certainly did not drive at exactly 40 mph the whole way — you accelerated from rest, cruised at 65 on the highway, slowed for traffic, idled at a light. The average over the trip is 40 mph, but the instantaneous speed at any given moment was generally something else. Hold that distinction; it is the engine of the entire chapter.
Generalizing the arithmetic: for any function $f$ and any interval $[a, b]$ with $a \neq b$, the average rate of change of $f$ over $[a, b]$ is
$$\text{average rate of change of } f \text{ on } [a, b] = \frac{f(b) - f(a)}{b - a}.$$
This is a single number. It is the change in output, $\Delta y = f(b) - f(a)$, divided by the change in input, $\Delta x = b - a$. And geometrically it is exactly the slope of the secant line connecting the two endpoints $(a, f(a))$ and $(b, f(b))$ on the graph.
Geometric Intuition. Picture the graph of $f$ and lay a straightedge across it touching the two points $(a, f(a))$ and $(b, f(b))$. That straightedge is the secant line, and its tilt — its slope — is the average rate of change. It captures the overall tilt of the function across the interval, blind to whatever wiggling happens in between. Two wildly different curves through the same two endpoints share the same average rate of change. The average can't see the middle; only the instantaneous rate can.
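The point that the average "can't see the middle" can be checked with a few lines of Python. This is only a minimal sketch: the helper name `average_rate_of_change` and the two sample curves are illustrative choices, not anything defined earlier in the book, and `Fraction` is used simply to keep the arithmetic exact.

```python
from fractions import Fraction


def average_rate_of_change(f, a, b):
    """Slope of the secant line through (a, f(a)) and (b, f(b)); requires a != b."""
    if a == b:
        raise ValueError("the interval must have a != b")
    return (f(b) - f(a)) / (b - a)


# Two illustrative curves (assumed for this sketch).
# Both pass through (0, 0) and (1, 1) but behave differently in between.
def square(x):
    return x ** 2


def cube(x):
    return x ** 3


a, b = Fraction(0), Fraction(1)
print("x^2 on [0, 1]:", average_rate_of_change(square, a, b))
print("x^3 on [0, 1]:", average_rate_of_change(cube, a, b))
```

Both calls report the same secant slope, because the formula only ever looks at the two endpoints.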
Three readings of the same number
The strength of the average-rate idea is that it is field-agnostic. The same ratio $\Delta y / \Delta x$ means something concrete in every quantitative discipline.
Position → velocity. If $s(t)$ is the position of an object at time $t$, then the average rate of change of $s$ over $[t_1, t_2]$ is the average velocity: total displacement divided by elapsed time.
Population → growth rate. If $P(t)$ is a population at time $t$, the average rate of change of $P$ over $[t_1, t_2]$ is the average rate of growth (net individuals added per unit time).
Cost → marginal cost. If $C(q)$ is the total cost of producing $q$ units of a good, then $\dfrac{C(q_2) - C(q_1)}{q_2 - q_1}$ is the average cost of each additional unit over the production range — the discrete cousin of marginal cost.
These are the same arithmetic wearing three uniforms. The recurring theme of the book — calculus appears in every quantitative field — starts right here, before we have even taken a limit.
Common Pitfall. The average rate of change is not the average of the values of $f$. The average value of $f$ on $[a,b]$ (which we'll define carefully in Chapter 14) is $\frac{1}{b-a}\int_a^b f$, a totally different quantity. The average rate of change is $\frac{f(b)-f(a)}{b-a}$ — a ratio of two differences. Many students blur "average of the function" and "average rate of change of the function." They are different objects with different units: the first has the units of $f$; the second has the units of $f$ per unit of $x$.
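To see that the two quantities really differ, here is a rough numerical comparison. It is a sketch under stated assumptions: the sample function $f(x) = x^2$ on $[0, 2]$ is chosen only for illustration, and since the integral is not defined until Chapter 14, the average value is approximated by the mean of many sampled values of $f$ (the helper `approximate_average_value` is hypothetical).

```python
def average_rate_of_change(f, a, b):
    """Ratio of two differences: (f(b) - f(a)) / (b - a)."""
    return (f(b) - f(a)) / (b - a)


def approximate_average_value(f, a, b, samples=10_000):
    """Mean of f at evenly spaced midpoints: a numerical stand-in for the average value."""
    width = (b - a) / samples
    total = sum(f(a + (k + 0.5) * width) for k in range(samples))
    return total / samples


def f(x):
    return x ** 2  # sample function, assumed for illustration


rate = average_rate_of_change(f, 0.0, 2.0)
value = approximate_average_value(f, 0.0, 2.0)
print(f"average rate of change on [0, 2]: {rate}")
print(f"average value on [0, 2] (approx.): {value:.4f}")
```

The first number is a slope (units of $f$ per unit of $x$); the second is a typical height of the graph (units of $f$). For this function they come out near $2$ and $4/3$ respectively.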
Check Your Understanding. A tank holds $V(t) = 100 - 5t^2$ liters of water at time $t$ (minutes), valid for $0 \le t \le 4$. What is the average rate of change of volume over $[1, 3]$, and what does its sign tell you?
Answer
$V(3) = 100 - 45 = 55$, $V(1) = 100 - 5 = 95$. Average rate $= \dfrac{55 - 95}{3 - 1} = \dfrac{-40}{2} = -20$ liters per minute. The negative sign tells you the tank is draining — on average, the volume falls by 20 L each minute over this interval. The sign of a rate always encodes direction: negative means the quantity is decreasing.
5.2 From Average to Instantaneous
The average rate over an interval is useful, but it is often not what we really want. We want the rate at a single instant. What is your exact speed at the precise moment you pass the halfway point of the trip? What is the population growth rate exactly at midnight on January 1, not averaged over the surrounding month?
There is a genuine conceptual obstacle here, and it is worth feeling it before we resolve it. A rate of change compares two states: where you were, where you are now. But "an instant" is a single state. Put $b = a$ into the average-rate formula and you get $\frac{f(a) - f(a)}{a - a} = \frac{0}{0}$ — undefined, meaningless. So you cannot just plug in. The naive approach collapses.
The resolution is the central move of all of calculus: don't reach the instant — approach it. Take the average rate over a small interval $[a, a+h]$ and shrink $h$ toward zero. Each value of $h$ gives a legitimate average rate (no division by zero, because $h \neq 0$). Then ask what number those average rates approach. That limiting number is the instantaneous rate:
$$\text{instantaneous rate of change at } a = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}.$$
The expression inside the limit,
$$\frac{f(a + h) - f(a)}{h},$$
is so important it gets a name: the difference quotient of $f$ at $a$. It is just the average rate of change over the interval between $a$ and $a+h$, or equivalently the slope of the secant line from $(a, f(a))$ to $(a+h, f(a+h))$. The variable $h$ is the signed width of that interval: for $h > 0$ the second point lies to the right of $a$, and for $h < 0$ it lies to the left. As $h \to 0$, the second secant point slides toward the fixed point $(a, f(a))$, and the secant line pivots toward a limiting position. That limiting line is the tangent line, and its slope is the limit above.
When this limit exists, we say $f$ is differentiable at $a$, and we call the limit the derivative of $f$ at $a$, written $f'(a)$:
$$\boxed{\,f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}\,}.$$
This is the definition. Memorize it; understand it; revere it a little. Every formula, rule, and theorem in differential calculus is, at bottom, an unpacking of this one limit.
The Key Insight. The derivative converts a family of average rates — one for each nonzero $h$ — into a single instantaneous rate. The limit is the magic that performs the conversion. Without limits, "instantaneous rate of change" is the meaningless $0/0$; with limits, it is a precise, finite number. This is the calculus revolution in one sentence: limits make the instantaneous computable.
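The "family of average rates" can be tabulated directly. The sketch below revisits $f(x) = x^2$ at $a = 1$ from Chapter 1; the helper name `difference_quotient` is an illustrative choice, and `Fraction` keeps each quotient exact before it is printed as a decimal.

```python
from fractions import Fraction


def difference_quotient(f, a, h):
    """Average rate of change of f between a and a + h (h must be nonzero)."""
    if h == 0:
        raise ZeroDivisionError("h must be nonzero: at h = 0 the quotient is the undefined 0/0")
    return (f(a + h) - f(a)) / h


def f(x):
    return x ** 2


a = Fraction(1)
for k in range(1, 6):
    for sign in (1, -1):
        h = sign * Fraction(1, 10 ** k)
        q = difference_quotient(f, a, h)
        print(f"h = {float(h):+.5f}   quotient = {float(q):.5f}")
```

Every row is a legitimate average rate, and from either side the values close in on $2$. No row ever uses $h = 0$; the limit is read off from the trend, not from a final substitution.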
An equivalent formulation
The same derivative can be written with the second point named $x$ instead of $a + h$. Let $x = a + h$, so $h = x - a$, and $h \to 0$ becomes $x \to a$:
$$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$
These two forms are identical in content. The first ($h \to 0$ form) is usually easier for algebraic computation, because you expand around a single anchor point. The second ($x \to a$ form) is sometimes cleaner when $f$ is given as a quotient or factors nicely. Keep both in your toolkit; we'll use whichever is more convenient.
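As a small illustration of the $x \to a$ form, consider $f(x) = 1/x$ at $a = 2$. This function is an assumption made for the sketch (it is not worked in the chapter), and the helper name `quotient_x_form` is hypothetical. Combining the fractions by hand gives $\frac{1/x - 1/a}{x - a} = -\frac{1}{ax}$, and the code checks that simplification exactly at a few sample points.

```python
from fractions import Fraction


def quotient_x_form(f, a, x):
    """(f(x) - f(a)) / (x - a), the x -> a form of the difference quotient (x != a)."""
    return (f(x) - f(a)) / (x - a)


def reciprocal(x):
    return 1 / x


a = Fraction(2)
for x in (Fraction(21, 10), Fraction(201, 100), Fraction(1999, 1000)):
    q = quotient_x_form(reciprocal, a, x)
    # By hand: (1/x - 1/a) / (x - a) = (a - x) / (a x (x - a)) = -1 / (a x).
    assert q == -1 / (a * x)
    print(f"x = {float(x):.4f}   quotient = {float(q):.6f}")
```

As $x$ approaches $2$, the simplified form $-1/(2x)$ approaches $-1/4$, which is what the printed values suggest.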
Historical Note. The difference quotient predates the rigorous limit by nearly two centuries. Pierre de Fermat (1630s) computed slopes by a method of "adequality" (essentially forming $\frac{f(a+h)-f(a)}{h}$, simplifying, then discarding the leftover $h$ terms) without any rigorous notion of limit. Newton (1660s) and Leibniz (1670s) systematized this into calculus, but their "infinitesimals" drew withering criticism from Bishop George Berkeley, who in 1734 mocked vanishing quantities as "ghosts of departed quantities." The limit, formalized by Cauchy and Weierstrass in the 1800s, finally exorcised the ghosts. You are learning the version that took two centuries to make honest.
5.3 The Tangent Line Problem, Revisited
Recall the tangent line problem from Chapter 1: given a curve $y = f(x)$ and a point $(a, f(a))$ on it, find the slope of the curve at that point. In Chapter 1 we could only gesture at the answer. Now we can state it cleanly.
The slope of the tangent line is exactly the derivative:
$$\text{slope of tangent at } (a, f(a)) = f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}.$$
This is the secant-to-tangent limit made precise. Each secant slope is a difference quotient; the tangent slope is their limit. Once you have the slope and a point, the line itself follows from point-slope form:
$$y - f(a) = f'(a)\,(x - a).$$
This is the equation of the tangent line to $y = f(x)$ at $x = a$.
Geometric Intuition. Imagine zooming in on the graph of a differentiable function at the point $(a, f(a))$ — doubling the magnification again and again. The curve, however wavy it looked from far away, flattens out and becomes indistinguishable from a straight line. That line is the tangent. Differentiability is local straightness: a function is differentiable at a point exactly when, viewed under enough magnification, its graph looks like a single line there. This is the deepest picture in the chapter, and it will return in Chapter 11 as the foundation of linear approximation.
Worked example: the tangent to a parabola
Find the equation of the tangent line to $f(x) = x^2$ at the point $(3, 9)$.
Step 1 — Compute $f'(3)$ from the definition.
$$f'(3) = \lim_{h \to 0} \frac{f(3 + h) - f(3)}{h} = \lim_{h \to 0} \frac{(3 + h)^2 - 9}{h}.$$
Expand the square: $(3+h)^2 = 9 + 6h + h^2$, so
$$f'(3) = \lim_{h \to 0} \frac{9 + 6h + h^2 - 9}{h} = \lim_{h \to 0} \frac{6h + h^2}{h} = \lim_{h \to 0} \frac{h(6 + h)}{h} = \lim_{h \to 0} (6 + h) = 6.$$
The cancellation of $h$ in the numerator and denominator is the crucial step — it is only legal because $h \neq 0$ throughout the limit process (we approach $0$, we never arrive). This is precisely the indeterminate-form resolution from Chapter 3 in action.
Step 2 — Write the tangent line. With slope $6$ through $(3, 9)$:
$$y - 9 = 6(x - 3), \qquad \text{i.e.} \qquad y = 6x - 9.$$
Step 3 — Sanity check. At $x = 3$, the line gives $y = 6(3) - 9 = 9$, so it passes through $(3,9)$. ✓ Its slope is $6$. Compare a nearby secant: from $(3,9)$ to $(3.01, 9.0601)$ the secant slope is $\frac{9.0601 - 9}{0.01} = 6.01$ — within a hair of $6$, exactly as the limit predicts. ✓
Common Pitfall. A tangent line is not "a line that touches the curve at exactly one point." That folk definition works for circles but fails everywhere else. The tangent to $y = x^3$ at the origin is the $x$-axis, which also crosses the curve there; the tangent to $y = \sin x$ touches at infinitely many points. The correct definition is the limit-of-secants one: the tangent is the limiting position of secant lines as the second point slides into the first. Slope, not intersection count, is what defines it.
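The worked example above can be probed numerically. The sketch below takes the tangent line $y = 6x - 9$ found for $f(x) = x^2$ at $(3, 9)$ and measures, for shrinking $h$, both the secant slope and the vertical gap between the curve and the tangent. The function names are illustrative, not notation from the book.

```python
from fractions import Fraction


def f(x):
    return x ** 2


def tangent_line(x):
    """Tangent to y = x^2 at (3, 9), using the slope f'(3) = 6 computed from the definition."""
    return 9 + 6 * (x - 3)


for k in range(1, 5):
    h = Fraction(1, 10 ** k)
    x = 3 + h
    secant_slope = (f(x) - f(3)) / h   # slope of the secant from (3, 9) to (x, f(x))
    gap = f(x) - tangent_line(x)       # vertical distance between curve and tangent
    print(f"h = {float(h):.4f}   secant slope = {float(secant_slope):.4f}   gap = {float(gap):.8f}")
```

The secant slopes are $6 + h$ and the gap is exactly $h^2$, so the gap shrinks much faster than $h$ itself. That is the numerical face of local straightness: near the point of tangency the curve and the line become hard to tell apart.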
5.4 The Velocity Problem
Now give the derivative a physical body. Let $s(t)$ be the position of an object moving along a line (say the $x$-axis) at time $t$. What is the object's velocity at a particular instant $t_0$?
Average velocity over $[t_0, t_0 + h]$ is the difference quotient of $s$: $\frac{s(t_0 + h) - s(t_0)}{h}$. The instantaneous velocity at $t_0$ is its limit — which is to say, the derivative of position:
$$v(t_0) = s'(t_0) = \lim_{h \to 0} \frac{s(t_0 + h) - s(t_0)}{h}.$$
In words: velocity is the rate of change of position. Two further objects follow immediately. Speed is the magnitude of velocity, $|v(t)|$ — velocity carries a sign (direction), speed does not. Acceleration is the rate of change of velocity, hence the derivative of the derivative of position:
$$a(t) = v'(t) = s''(t),$$
the second derivative of position. We'll formalize second derivatives in §5.5; for now, note that the chain of physical meaning — position, velocity, acceleration — is just the derivative applied once, twice.
Worked example: a falling object
Galileo established, through experiments rolling balls down inclined planes around 1604–1638, that a body falling freely from rest covers a distance proportional to the square of the elapsed time:
$$s(t) = \tfrac{1}{2}\, g\, t^2,$$
where $g \approx 9.8 \text{ m/s}^2$ is the acceleration due to gravity near Earth's surface. What is the velocity at $t_0 = 2$ s?
Form the difference quotient and take the limit:
$$v(2) = \lim_{h \to 0} \frac{s(2 + h) - s(2)}{h} = \lim_{h \to 0} \frac{\tfrac{1}{2} g (2+h)^2 - \tfrac{1}{2} g (2)^2}{h}.$$
Expand the numerator. Since $(2+h)^2 = 4 + 4h + h^2$,
$$\tfrac{1}{2} g (4 + 4h + h^2) - \tfrac{1}{2} g (4) = \tfrac{1}{2} g (4h + h^2) = 2gh + \tfrac{1}{2} g h^2.$$
Divide by $h$ and take the limit:
$$v(2) = \lim_{h \to 0} \frac{2gh + \tfrac{1}{2} g h^2}{h} = \lim_{h \to 0} \left(2g + \tfrac{1}{2} g h\right) = 2g \approx 19.6 \text{ m/s}.$$
Run the same computation at a general time $t$ and the $2$ becomes $t$: the velocity is $v(t) = g t$. The velocity of a freely falling body grows linearly with time — uniform acceleration $g$ produces a velocity that climbs by $9.8$ m/s every second. We will recover $v(t) = gt$ in one line via the power rule of Chapter 7, never again touching the limit definition. That is the trade Part II offers: the definition gives understanding, the rules give speed.
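The same limit can be watched numerically. This sketch computes average velocities of the falling body over shorter and shorter intervals starting at $t_0 = 2$ s, using the text's approximate value $g = 9.8$ m/s² stored as an exact fraction. The names `position` and `average_velocity` are illustrative.

```python
from fractions import Fraction

G = Fraction(49, 5)  # 9.8 m/s^2, the approximate value used in the text


def position(t):
    """Distance fallen from rest after t seconds: s(t) = (1/2) g t^2."""
    return G * t ** 2 / 2


def average_velocity(t0, h):
    """Average velocity over the time interval between t0 and t0 + h (h != 0)."""
    return (position(t0 + h) - position(t0)) / h


for k in range(0, 5):
    h = Fraction(1, 10 ** k)
    v_avg = average_velocity(2, h)
    print(f"h = {float(h):.4f} s   average velocity = {float(v_avg):.5f} m/s")
```

Each average equals $2g + \tfrac{1}{2}gh$, matching the hand algebra, and the printed values settle toward $19.6$ m/s as $h$ shrinks.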
Real-World Application — Crash safety engineering. Automotive engineers live in the world of second and third derivatives of position. Acceleration $a(t) = s''(t)$ determines the force a crash imposes on an occupant (Newton's $F = ma$); airbag deployment fires when measured acceleration crosses a threshold within milliseconds. The third derivative, $s'''(t)$, is called jerk — the rate of change of acceleration — and minimizing jerk is what makes an elevator stop or a train brake feel smooth rather than lurching. Every one of these quantities is a derivative of position, computed in real time by sensors taking difference quotients over tiny time steps.
5.5 The Derivative as a Function
So far we've defined the derivative at a point: $f'(a)$ is a single number, the slope of $f$ where $x = a$. But the point $a$ was arbitrary. We can run the same limit at any input where it exists. Doing so for every such input produces a brand-new function — the derivative function $f'$ — whose input is a location and whose output is the slope of $f$ at that location:
$$f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}.$$
The shift from "$f'(a)$, a number" to "$f'$, a function" is the bridge into Chapter 6, which is built entirely around treating $f'$ as an object in its own right — something you can graph, analyze, and differentiate again. Here we just take the first step.
In Leibniz notation the derivative function is written
$$f'(x) = \frac{df}{dx} = \frac{d}{dx}\big[f(x)\big].$$
All three expressions name the same thing: the function whose value at $x$ is the slope of $f$ at $x$. The symbol $\frac{d}{dx}$ is an operator — a machine that eats a function and emits its derivative function.
Worked example: differentiating $x^2$ everywhere at once
We computed $f'(3) = 6$ for $f(x) = x^2$. Now do it for a general $x$:
$$f'(x) = \lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \to 0} \frac{x^2 + 2xh + h^2 - x^2}{h} = \lim_{h \to 0} \frac{2xh + h^2}{h} = \lim_{h \to 0} (2x + h) = 2x.$$
So $f'(x) = 2x$ for every real $x$. One computation replaces infinitely many. As a check, it reproduces every point value: $f'(1) = 2$, $f'(3) = 6$, $f'(0) = 0$ (the parabola's bottom is flat), $f'(-2) = -4$ (negative slope on the left arm). The single function $f'(x) = 2x$ encodes the slope of the parabola everywhere.
We can verify this symbolically. Following the three-tier pattern of this book (pose it analytically, solve by hand, then confirm by machine), we ask the computer to check the pencil work:
import sympy as sp
x = sp.symbols('x')
f = x**2
fprime = sp.diff(f, x) # symbolic derivative
print(f"f(x) = {f}, f'(x) = {fprime}")
# f(x) = x**2, f'(x) = 2*x
# Evaluate the derivative function at specific points:
for a in (1, 3, 0, -2):
    print(f"f'({a}) = {fprime.subs(x, a)}")
# f'(1) = 2
# f'(3) = 6
# f'(0) = 0
# f'(-2) = -4
Computational Note. sympy.diff does not approximate; it applies the differentiation rules symbolically and returns an exact expression, here 2*x. This is the same machinery we'll use throughout the book to check hand computations. It also illustrates a recurring theme: hand computation builds understanding; machine computation builds power. You should be able to derive $2x$ from the limit yourself, but once you can, letting sympy confirm it frees you to tackle functions too messy to differentiate by hand without error.
Higher derivatives
Because $f'$ is itself a function, it can be differentiated. The derivative of the derivative is the second derivative, written $f''$:
$$f''(x) = \frac{d}{dx}\big[f'(x)\big] = \frac{d^2 f}{dx^2}.$$
For $f(x) = x^2$ we have $f'(x) = 2x$, so $f''(x) = 2$ (the slope of the line $2x$ is the constant $2$), and $f'''(x) = 0$ (the slope of a constant is zero). The hierarchy continues as far as you like. Higher derivatives have parallel notations:
$$f''(x),\quad f'''(x),\quad f^{(4)}(x),\ \ldots,\ f^{(n)}(x); \qquad\qquad \frac{d^n f}{dx^n} = \frac{d^n}{dx^n}\big[f(x)\big].$$
Each derivative reports the rate of change of the one before it. The second derivative will become a workhorse in Chapter 9, where its sign tells us about concavity — whether a graph curves upward like a cup or downward like a cap — and powers the second-derivative test for maxima and minima.
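The idea that each derivative is the rate of change of the one before it can be mimicked by nesting difference quotients. The sketch below is an approximation with a fixed small $h$, not a limit, and the helper `quotient_function` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction


def quotient_function(f, h):
    """Return the function x -> (f(x + h) - f(x)) / h for a fixed nonzero h."""
    def g(x):
        return (f(x + h) - f(x)) / h
    return g


def f(x):
    return x ** 2


h = Fraction(1, 1000)
first = quotient_function(f, h)       # stands in for f'(x) = 2x
second = quotient_function(first, h)  # stands in for f''(x) = 2
third = quotient_function(second, h)  # stands in for f'''(x) = 0

x = Fraction(3)
print(float(first(x)), float(second(x)), float(third(x)))
```

For $f(x) = x^2$ the first quotient is $2x + h$, so it is off from $f'(3) = 6$ by exactly $h$. The second and third come out as exactly $2$ and $0$, which is a special feature of a quadratic; for other functions the nested quotients would only approximate the higher derivatives.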
Check Your Understanding. Using the definition, find $f'(x)$ for $f(x) = x^3$. (Hint: $(x+h)^3 = x^3 + 3x^2 h + 3x h^2 + h^3$.)
Answer
$f'(x) = \displaystyle\lim_{h\to 0}\frac{(x+h)^3 - x^3}{h} = \lim_{h\to 0}\frac{3x^2 h + 3x h^2 + h^3}{h} = \lim_{h\to 0}(3x^2 + 3xh + h^2) = 3x^2.$ So $\frac{d}{dx}(x^3) = 3x^2$. Notice the pattern forming with $\frac{d}{dx}(x^2) = 2x$: the exponent drops by one and multiplies out front. That is the power rule, peeking out — proved in general in Chapter 7.
5.6 Notations for the Derivative
Several notations for the derivative coexist in mathematics and science. They denote the same object; fluency means reading all of them without friction.
- Lagrange notation: $f'(x)$. Compact and clean. This book's default, especially when the emphasis is on the function $f$ itself.
- Leibniz notation: $\dfrac{df}{dx}$ or $\dfrac{dy}{dx}$ (when $y = f(x)$). It advertises the variable of differentiation and behaves suggestively like a fraction — which makes it indispensable for the chain rule (Chapter 7), implicit differentiation (Chapter 8), and the whole machinery of integration (Part III). It is not literally a fraction, but it is engineered to act like one in the right circumstances.
- Newton's dot notation: $\dot{f}$ or $\dot{y}$. Reserved almost exclusively for derivatives with respect to time, the native habitat being physics: $\dot{x}$ is velocity, $\ddot{x}$ is acceleration.
- Operator notation: $Df$ or $D_x f$. Treats differentiation as an operator $D$ acting on functions; convenient in differential equations and advanced theory.
This textbook uses Lagrange notation by default and Leibniz notation whenever a chain rule, a differential, or an integral is in play. Get comfortable switching between them; problems and applications will throw all four at you.
5.7 When Derivatives Fail to Exist
We say $f$ is differentiable at $a$ when the limit defining $f'(a)$ exists. A natural question follows: does every reasonable function have a derivative everywhere? The answer is no — and the ways differentiability can fail are themselves instructive, because each one corresponds to a recognizable feature of the graph. There are essentially four.
1. Corner (a "kink")
The textbook example is the absolute value $f(x) = |x|$ at $x = 0$. Examine the one-sided difference quotients separately:
- From the right: $\displaystyle\lim_{h \to 0^+} \frac{|h| - 0}{h} = \lim_{h \to 0^+} \frac{h}{h} = 1.$
- From the left: $\displaystyle\lim_{h \to 0^-} \frac{|h| - 0}{h} = \lim_{h \to 0^-} \frac{-h}{h} = -1.$
The left and right slopes disagree ($-1 \neq 1$), so the two-sided limit does not exist, and $f'(0)$ is undefined. Yet $|x|$ is perfectly continuous at $0$. Geometrically, the graph has a sharp corner there: two straight pieces meeting at an angle, with no single tangent line. Under magnification it never flattens to one line — it stays a "V." Local straightness fails.
2. Cusp
A cusp is a corner pushed to an extreme: the two one-sided slopes don't merely disagree, they run off to $+\infty$ and $-\infty$. The standard example is $f(x) = x^{2/3}$ at $x = 0$. Its difference quotient there is
$$\frac{(0+h)^{2/3} - 0}{h} = \frac{h^{2/3}}{h} = h^{-1/3}.$$
As $h \to 0^+$ this $\to +\infty$; as $h \to 0^-$ it $\to -\infty$. The graph comes to a sharp point with the two sides becoming vertical, like the tip of a spike. No finite slope exists.
3. Vertical tangent
Here the graph has no kink (both one-sided difference quotients head the same way), but the tangent line is vertical, so its slope is undefined (vertical lines have no slope). Take $f(x) = x^{1/3}$ at $x = 0$:
$$\frac{h^{1/3} - 0}{h} = h^{-2/3} \to +\infty \quad \text{as } h \to 0.$$
The slope grows without bound from both sides (since $h^{-2/3} > 0$ for all $h \neq 0$), so the limit is $+\infty$, not a real number. The graph passes through the origin rising vertically. There's a tangent line — it's just vertical, and $f'(0)$ does not exist as a number.
4. Discontinuity
If $f$ is not even continuous at $a$ — a jump, a hole, a blow-up — then it cannot possibly be differentiable at $a$. This is not a separate accident; it follows from a theorem that ties this chapter back to Chapter 4.
The Key Insight. Differentiability is stronger than continuity. A differentiable function is automatically continuous, but a continuous function need not be differentiable. Continuity says the graph has no breaks; differentiability says the graph has no breaks and no corners, cusps, or vertical tangents. Smoothness is a higher standard than mere connectedness.
We can prove the implication precisely.
Theorem (Differentiability implies continuity). If $f$ is differentiable at $a$, then $f$ is continuous at $a$.
Why we care. It tells us where not to bother looking for derivatives: wherever a function jumps or breaks, differentiability is already off the table. It also explains entry 4 above without further work.
Key idea. Continuity at $a$ means $f(a+h) \to f(a)$ as $h \to 0$, i.e. the change $f(a+h) - f(a)$ vanishes. We show that change vanishes by writing it as (difference quotient) $\times\, h$ and watching the $h$ crush it.
Proof. For $h \neq 0$, multiply and divide by $h$:
$$f(a + h) - f(a) = \frac{f(a+h) - f(a)}{h}\cdot h.$$
Take the limit as $h \to 0$. The first factor tends to $f'(a)$ (which exists, by hypothesis), and the second factor tends to $0$. Since both limits exist and are finite, the limit of the product is the product of the limits:
$$\lim_{h \to 0}\big[f(a + h) - f(a)\big] = f'(a)\cdot 0 = 0.$$
Therefore $\lim_{h \to 0} f(a + h) = f(a)$, which is exactly the statement that $f$ is continuous at $a$. $\blacksquare$
What it means. The hypothesis "$f'(a)$ exists" is load-bearing: it is what makes the first factor converge to a finite number, so that multiplying by $0$ kills the whole product. Strip the hypothesis and the argument collapses — which is correct, because continuous-but-not-differentiable functions are exactly the ones where this fails.
Warning. The converse is false, and spectacularly so. Continuity does not imply differentiability: $|x|$ at $0$ already shows one isolated failure. But it gets far worse: there exist functions that are continuous at every point yet differentiable at no point. The famous Weierstrass function $$W(x) = \sum_{n=0}^{\infty} a^n \cos\!\big(b^n \pi x\big), \qquad 0 < a < 1,\ \ b \text{ an odd integer},\ \ ab > 1 + \tfrac{3\pi}{2},$$ (for instance $a = \tfrac12,\, b = 13$) is unbroken everywhere but has no derivative at any point: an infinitely jagged, fractal-like curve with no tangent line anywhere. When Weierstrass presented it in 1872 it scandalized mathematicians who had assumed continuous curves must be "mostly smooth." It announced that intuition about graphs is not a safe guide, and that calculus needs the rigor of limits precisely because the world of functions is wilder than it looks.
Math Major Sidebar (optional). Why the condition $ab > 1 + \tfrac{3\pi}{2}$? The cosine pieces oscillate ever faster (frequency $b^n$) but with ever smaller amplitude ($a^n$). The sum converges uniformly — guaranteeing continuity — precisely because $a < 1$ makes $\sum a^n$ a convergent geometric series, so the Weierstrass $M$-test applies. Non-differentiability is the subtler half: the rapid oscillations win the tug-of-war against the shrinking amplitudes exactly when $ab$ is large enough, so the difference quotients oscillate without settling at every point. Hardy later sharpened the threshold to the cleaner $ab \ge 1$. The full proof belongs to a real-analysis course, but the moral is portable: uniform convergence preserves continuity, but it does not preserve differentiability.
import numpy as np
import matplotlib.pyplot as plt
# Visualize two ways differentiability fails at x = 0
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
x = np.linspace(-1, 1, 400)
axes[0].plot(x, np.abs(x), 'b-')
axes[0].set_title('$|x|$: corner at $x=0$ (left slope $-1$, right slope $+1$)')
axes[1].plot(x, np.cbrt(x), 'b-') # np.cbrt = real cube root, handles x<0
axes[1].set_title('$x^{1/3}$: vertical tangent at $x=0$')
for ax in axes:
    ax.grid(True, alpha=0.3)
    ax.axhline(0, color='gray', lw=0.5)
    ax.axvline(0, color='gray', lw=0.5)
plt.tight_layout()
plt.show()
# Figure 5.1: |x| shows a corner (two finite, unequal one-sided slopes);
# x^(1/3) shows a vertical tangent (slope -> +infinity). Neither has f'(0).
The figure (Figure 5.1) makes the failures visible: the "V" of $|x|$ can never be approximated by a single line at its vertex, and the cube-root curve stands up vertically at the origin where its slope runs off to infinity.
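The one-sided difference quotients behind the corner, cusp, and vertical tangent can also be tabulated. This is a rough floating-point sketch using only the standard library; the helper `real_cube_root` is introduced here because Python's `**` operator does not return a real cube root for negative inputs.

```python
import math


def real_cube_root(x):
    """Real cube root, defined for negative x as well."""
    return math.copysign(abs(x) ** (1 / 3), x)


def difference_quotient_at_zero(f, h):
    """(f(0 + h) - f(0)) / h for nonzero h."""
    return (f(h) - f(0.0)) / h


examples = {
    "|x| (corner)": abs,
    "x^(2/3) (cusp)": lambda x: real_cube_root(x) ** 2,
    "x^(1/3) (vertical tangent)": real_cube_root,
}

for name, f in examples.items():
    print(name)
    for h in (1e-2, 1e-4, 1e-6):
        left = difference_quotient_at_zero(f, -h)
        right = difference_quotient_at_zero(f, h)
        print(f"  h = {h:.0e}   from the left: {left:12.2f}   from the right: {right:12.2f}")
```

For the corner, the two columns sit at $-1$ and $+1$. For the cusp, they grow without bound in opposite directions. For the vertical tangent, both grow without bound in the positive direction. In none of the three cases do the quotients settle on a single finite number.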
5.8 Interpreting the Derivative
The power of the derivative is that one mathematical object wears many costumes. $f'(a)$ is a single limit, but it answers a different question in each field that uses it. All of these readings are the same number; only the story around it changes.
- Geometric: the slope of the tangent line to $y = f(x)$ at $(a, f(a))$. The "steepness" of the graph at that point.
- Kinematic (physics): an instantaneous rate of change in time. If $f$ is position, $f'$ is velocity; if $f$ is velocity, $f'$ is acceleration.
- Marginal (economics): marginal cost, marginal revenue, marginal utility — the gain or cost of one more unit of input, the central concept of microeconomics.
- Sensitivity (engineering): how much the output responds to a small change in input — the heart of error analysis and control theory.
- Local linear approximation: $f(a + h) \approx f(a) + f'(a)\, h$ for small $h$. The derivative is the best linear predictor of the function near $a$ — the engine of Chapter 11.
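The last reading in the list, local linear approximation, is easy to test with a derivative already found from the definition. The sketch below uses $f(x) = x^3$ with $f'(x) = 3x^2$ at $a = 2$; the choice of point and step sizes is arbitrary, and the function name `linear_approximation` is illustrative.

```python
def f(x):
    return x ** 3


def linear_approximation(a, h):
    """f(a) + f'(a) * h for f(x) = x^3, using f'(x) = 3x^2 derived from the definition."""
    return f(a) + 3 * a ** 2 * h


a = 2.0
for h in (0.1, 0.01, 0.001):
    exact = f(a + h)
    approx = linear_approximation(a, h)
    print(f"h = {h}   exact = {exact:.9f}   linear = {approx:.9f}   error = {exact - approx:.9f}")
```

Each time $h$ shrinks by a factor of ten, the error shrinks by roughly a factor of one hundred. The straight-line prediction gets better much faster than the step gets smaller.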
Real-World Application — Gradient descent, the engine of AI. When a lab trains a large language model, it adjusts each of the model's billions of parameters to reduce a loss function that measures how wrong the model currently is. The size and direction of each adjustment is set by the derivative of the loss with respect to that parameter — how much the loss would change per unit nudge of the parameter. Step every parameter a little bit "downhill" against its derivative, repeat billions of times, and the model learns. This algorithm is gradient descent, and it is, quite literally, the derivative doing the work. We introduce it properly in Chapter 6 (the derivative as a direction to step), develop the single-variable picture there, and reach its full multivariable form — the gradient — in Chapter 30, where it meets machine learning head-on. Keep it in mind: the humble difference quotient of this chapter is the seed of modern AI.
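A one-parameter toy version of this procedure fits in a few lines. This is only a sketch under loud assumptions: the loss $L(w) = (w - 3)^2$, the starting point, the learning rate, and the step count are all hypothetical choices for illustration, and real training systems compute derivatives by other means than a raw difference quotient.

```python
def loss(w):
    """Toy loss with its minimum at w = 3 (a hypothetical example, not a real model)."""
    return (w - 3.0) ** 2


def estimated_derivative(f, w, h=1e-6):
    """Difference quotient with a small fixed h, standing in for the limit f'(w)."""
    return (f(w + h) - f(w)) / h


def gradient_descent(f, w, learning_rate=0.1, steps=100):
    """Repeatedly move w a small step against the estimated derivative."""
    for _ in range(steps):
        w = w - learning_rate * estimated_derivative(f, w)
    return w


w_final = gradient_descent(loss, w=0.0)
print(f"final w = {w_final:.4f}, loss = {loss(w_final):.2e}")
```

When the estimated derivative is negative the update increases $w$, and when it is positive the update decreases $w$, so each step heads downhill. Starting from $w = 0$, the iterates move toward the minimizer at $w = 3$.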
5.9 A Few Quick Differentiations
We won't develop the general differentiation rules until Chapter 7. But the limit definition already lets us differentiate a few simple functions outright — and doing so by hand cements what the definition is before we hand the labor to formulas.
Constant function
If $f(x) = c$ (a constant), then $f(a + h) - f(a) = c - c = 0$ for every $h$. So
$$f'(a) = \lim_{h \to 0} \frac{0}{h} = \lim_{h \to 0} 0 = 0.$$
The derivative of a constant is zero. Geometrically obvious: a constant function is a horizontal line, slope $0$ everywhere. Nothing is changing, so the rate of change is nothing.
Linear function
If $f(x) = mx + b$, then
$$f(a + h) - f(a) = \big[m(a + h) + b\big] - \big[ma + b\big] = mh.$$
So $f'(a) = \lim_{h \to 0} \frac{mh}{h} = \lim_{h \to 0} m = m.$ The derivative of a linear function is its slope $m$, at every point. Again geometrically inevitable: a line has the same slope everywhere, so its rate of change is the constant $m$.
Powers
We've now shown $\frac{d}{dx}(x^2) = 2x$ and (in the Check Your Understanding above) $\frac{d}{dx}(x^3) = 3x^2$ — both from the definition. A pattern is unmistakable:
$$\frac{d}{dx}(x^2) = 2x^1, \qquad \frac{d}{dx}(x^3) = 3x^2.$$
The exponent comes down as a coefficient, and the new exponent is one less. In Chapter 7 we prove this holds for every constant exponent — the celebrated power rule, $\frac{d}{dx}(x^n) = n\,x^{n-1}$. For now, savor that you derived two instances of it by hand, straight from the limit, with no rule to lean on.
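The simplified difference quotients from this section can be spot-checked with exact arithmetic. In the sketch below, the sample point and the constants `c`, `m`, and `b` are arbitrary illustrative values, and `difference_quotient` is a helper name chosen for this example.

```python
from fractions import Fraction


def difference_quotient(f, a, h):
    """(f(a + h) - f(a)) / h for nonzero h."""
    return (f(a + h) - f(a)) / h


a = Fraction(5, 2)               # arbitrary sample point
c, m, b = 7, Fraction(3, 4), -2  # arbitrary constants for the constant and linear cases

for k in (1, 3, 5):
    h = Fraction(1, 10 ** k)
    assert difference_quotient(lambda x: c, a, h) == 0                   # constant: 0
    assert difference_quotient(lambda x: m * x + b, a, h) == m           # linear: m
    assert difference_quotient(lambda x: x ** 2, a, h) == 2 * a + h      # x^2: 2a + h
    assert difference_quotient(lambda x: x ** 3, a, h) == 3 * a ** 2 + 3 * a * h + h ** 2

print("all four simplified quotients match the hand algebra")
```

For the constant and linear functions the quotient does not depend on $h$ at all, so the limit is immediate. For the powers, the leftover terms each carry a factor of $h$ and vanish in the limit, leaving $2a$ and $3a^2$.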
Add to Your Modeling Portfolio. Every chapter contributes one tool to your running model. This chapter's tool is the interpretation of the derivative as a rate. Take your modeling function $f(t)$ and write down, in your domain's own language, what $f'(t)$ measures. Track A — Biology: if $f$ is a population $P(t)$, then $f'(t) = P'(t)$ is the net growth rate (births minus deaths per unit time). State its units and what a positive vs. negative value means for your species. Track B — Economics: if $f$ is total cost $C(q)$, then $f'(q) = C'(q)$ is marginal cost — the cost of the next unit. Identify the quantity in your model whose derivative is its "marginal" version. Track C — Physics: if $f$ is position $s(t)$, then $f'(t)$ is velocity and $f''(t)$ is acceleration. Write both for your moving system and note their units. Track D — Data Science: if $f$ is a loss function $L(w)$ of a parameter $w$, then $f'(w) = L'(w)$ tells gradient descent which way and how far to step. Sketch how the sign of $L'(w)$ decides whether $w$ should increase or decrease — you are previewing Chapter 6.
5.10 Summary: The End of Part I
You have reached the end of Part I. Pause and take stock, because the foundation you've poured will carry the next thirty-five chapters.
You can now:
- read and write function notation fluently (Chapter 2);
- evaluate a wide range of limits, including indeterminate forms (Chapter 3);
- recognize continuity and classify discontinuities (Chapter 4);
- apply the Intermediate Value Theorem and the bisection method (Chapter 4);
- state the definition of the derivative and compute simple derivatives directly from it (this chapter);
- find tangent lines, instantaneous velocities, and rates of change in any field (this chapter);
- identify the four ways differentiability fails — corner, cusp, vertical tangent, discontinuity (this chapter).
You understand:
- the derivative is the limit of the difference quotient, the instantaneous version of an average rate;
- tangent slope, instantaneous velocity, and rate of change are the same mathematical object viewed through different windows;
- differentiability implies continuity, but the converse is false: a graph can be unbroken without being smooth.
Three insights run through all of this: that change can be measured at an instant, that one limit unifies geometry, physics, and economics, and that the limit is what makes it all rigorous. Together they are the conceptual core of differential calculus. The recurring theme of the book is exactly this: calculus is the mathematics of change, and the derivative is how we measure change as it happens.
What Part II holds
Part I was about meaning: what a derivative is and why it matters. Part II is about power: how to compute derivatives without grinding through the limit every time. Chapter 6 develops the derivative as a function in its own right — graphing $f'$, reading $f$ from $f'$, and introducing the gradient-descent anchor that recurs to the end of the book. Chapter 7 delivers the rules — power, product, quotient, and chain — that turn the slow limit computations of this chapter into a fluent, near-instant skill. By the end of Chapter 7 you should be able to differentiate any function built from polynomials, roots, exponentials, logarithms, and trigonometric functions in well under a minute.
The investment of Part I was understanding. The payoff of Part II is fluency. And far down the road, in Chapter 14, you'll discover that this whole apparatus of rates and slopes is one half of a single great theorem — the Fundamental Theorem of Calculus — whose other half is the integral, and which ties differentiation and accumulation together as inverse operations. Everything you build from here points toward that summit.
Check Your Understanding. In one sentence each, explain why (a) instantaneous rate of change requires a limit but average rate of change does not, and (b) a function can be continuous at a point yet have no derivative there.
Answer
(a) Average rate compares two distinct moments, so it is an ordinary quotient $\Delta y/\Delta x$ with no division by zero; instantaneous rate would compare a moment with itself, giving $0/0$, which only a limit can resolve. (b) Continuity forbids breaks but permits corners, cusps, and vertical tangents; at such a point the difference quotient has no finite two-sided limit even though the function itself is well-behaved there (e.g. $|x|$ at $0$, where the one-sided limits are $-1$ and $+1$). If both are clear to you, you have the conceptual heart of the chapter.
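The $|x|$ example in part (b) is easy to see numerically. This short Python sketch evaluates the difference quotient of the absolute value at $0$ from each side; the particular step sizes are arbitrary.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


for k in range(1, 6):
    h = 10.0 ** (-k)
    right = difference_quotient(abs, 0.0, h)   # h > 0: approach from the right
    left = difference_quotient(abs, 0.0, -h)   # h < 0: approach from the left
    print(h, right, left)
```

The right-hand slopes stay at $1$ and the left-hand slopes stay at $-1$ however small the step becomes. The two sides never agree, so the two-sided limit, and with it the derivative at $0$, does not exist.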