Prerequisites
- chapter-06-the-derivative
Learning Objectives
- Apply the power, sum/difference, constant-multiple, product, quotient, and chain rules fluently
- Derive each rule from the limit definition of the derivative
- Recognize composite functions and apply the chain rule, including nested compositions
- Differentiate trigonometric, exponential, and logarithmic functions from memory
- Combine multiple rules in a single problem and verify results with sympy
In This Chapter
- 7.1 From Definition to Machinery
- 7.2 The Power Rule
- 7.3 Sum, Difference, and Constant Multiple Rules
- 7.4 The Product Rule
- 7.5 The Quotient Rule
- 7.6 The Chain Rule — The Heart of Differential Calculus
- 7.7 Derivatives of the Library Functions
- 7.8 Combining the Rules
- 7.9 Logarithmic Differentiation
- 7.10 Verifying with sympy
- 7.11 The Differentiation Cheat Sheet
- 7.12 Why Learn This When Computers Can Do It?
- Looking Ahead
- Reflection
Chapter 7 — Differentiation Rules
7.1 From Definition to Machinery
In Chapter 6 you learned what a derivative is: the limit of a difference quotient, the slope of a tangent line, the instantaneous rate of change. You also learned how to compute one — by writing $f'(x) = \lim_{h\to 0}\frac{f(x+h)-f(x)}{h}$ and grinding through the algebra. That definition is the bedrock, and you must never forget it. But as a method it is exhausting. Differentiating $f(x) = x^2\sin(3x+1)$ straight from the limit would take a page of trigonometric identities and squeeze theorems.
The pioneering insight of Newton and Leibniz was that differentiation obeys algebraic rules. If you know the derivatives of a few basic functions, you can combine those functions — adding, subtracting, multiplying, dividing, composing — and predict the derivative of the combination without ever returning to the limit. The limit does the work once, inside the proof of each rule; thereafter you apply the rule like a multiplication fact.
The Key Insight. The derivative is a machine that respects the structure of how functions are built. Sums, products, quotients, and compositions each have their own differentiation rule, derived once from the limit and then reused forever. Learn the rules, learn which one matches which structure, and you can differentiate any elementary function mechanically — turning calculus from an exercise in limits into an exercise in pattern recognition.
This chapter is the computational workhorse of the entire book. Here are the rules we will build and prove:
- Power rule: $\dfrac{d}{dx}(x^n) = n\,x^{n-1}$
- Sum / difference / constant multiple: $(f \pm g)' = f' \pm g'$, $(cf)' = cf'$
- Product rule: $(fg)' = f'g + fg'$
- Quotient rule: $\left(\dfrac{f}{g}\right)' = \dfrac{f'g - fg'}{g^2}$
- Chain rule: $\big(f(g(x))\big)' = f'(g(x))\cdot g'(x)$ — the most important rule in differential calculus.
Layered on top are the derivatives of the library functions — $\sin$, $\cos$, $e^x$, $\ln x$ and friends. Combine the library with the rules and you can differentiate anything elementary. This chapter rewards practice more than any other: differentiation must become as automatic as your multiplication tables. Read the derivations once for understanding, then drill the examples until your hand moves before your conscious mind catches up.
7.2 The Power Rule
We begin with the rule you will use most often, because polynomials are everywhere.
Power Rule. For any real number $n$, $$\frac{d}{dx}\big(x^n\big) = n\,x^{n-1}.$$
In Chapter 6 you verified this by hand for $n = 2$ (getting $2x$) and $n = 3$ (getting $3x^2$) from the limit definition. Let us prove it in general for a positive integer $n$, then explain why it extends to every real exponent.
Proof (positive integer $n$). Apply the binomial theorem to expand $(x+h)^n$:
$$(x + h)^n = x^n + n\,x^{n-1} h + \binom{n}{2} x^{n-2} h^2 + \cdots + h^n.$$
Form the difference quotient. The leading $x^n$ cancels:
$$\frac{(x + h)^n - x^n}{h} = \frac{n\,x^{n-1} h + \binom{n}{2} x^{n-2} h^2 + \cdots + h^n}{h} = n\,x^{n-1} + \binom{n}{2} x^{n-2} h + \cdots + h^{n-1}.$$
Every term except the first still carries a factor of $h$. As $h \to 0$ those terms all vanish, leaving
$$\frac{d}{dx}(x^n) = n\,x^{n-1}. \qquad \blacksquare$$
The formula holds far beyond positive integers. It is true for negative integers (provable via the quotient rule in §7.5), for rational exponents (via implicit differentiation, Chapter 8), and for any real exponent at all (via logarithmic differentiation, §7.9). The power rule is valid for every real $n$ at every $x$ where both sides of the formula are defined (for a general real exponent, that means $x > 0$); that uniformity is part of its beauty.
It is worth seeing the power rule at all three of our rigor levels at once, because it models how the whole book works. Intuitively, $x^n$ is a product of $n$ copies of $x$; nudging $x$ changes one copy at a time while the other $n-1$ stay fixed, giving $n$ contributions of size $x^{n-1}$. A cube's volume $x^3$, for example, grows at three times the face area $x^2$ per unit increase in side length. Computationally, the rule is a one-step recipe: bring the exponent down front, subtract one. Formally, the binomial proof above establishes it for positive integers, and the two later proofs (implicit and logarithmic differentiation) extend it to every real exponent with full rigor. The same concept at three altitudes: keep that pattern in mind, because every major idea in this book is presented this way.
Worked examples.
- $\dfrac{d}{dx}(x^5) = 5x^4.$
- $\dfrac{d}{dx}\big(x^{1/2}\big) = \tfrac12 x^{-1/2} = \dfrac{1}{2\sqrt{x}}.$ (The derivative of $\sqrt{x}$.)
- $\dfrac{d}{dx}\big(x^{-3}\big) = -3x^{-4} = -\dfrac{3}{x^4}.$
- $\dfrac{d}{dx}\big(x^{\pi}\big) = \pi\,x^{\pi - 1}.$ The exponent need not be rational.
Two special cases worth memorizing. Setting $n = 1$ gives $\dfrac{d}{dx}(x) = 1$ — a line of slope $1$ has slope $1$, as it must. And a constant $c = c\cdot x^0$ has derivative $0$: a horizontal line has zero slope, so $\dfrac{d}{dx}(c) = 0$.
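The limit in the proof can be watched converging. The sketch below (the helper name `difference_quotient` is ours, chosen for illustration) evaluates the difference quotient of $x^n$ at $x = 2$ for shrinking $h$ and compares it with $n\,x^{n-1}$, including non-integer exponents that the binomial proof does not cover.

```python
import math

def difference_quotient(f, x, h):
    """Forward difference quotient (f(x+h) - f(x)) / h from the limit definition."""
    return (f(x + h) - f(x)) / h

x0 = 2.0
for n in (5, 0.5, -3, math.pi):
    exact = n * x0 ** (n - 1)              # power-rule prediction
    for h in (1e-2, 1e-4, 1e-6):
        approx = difference_quotient(lambda x: x ** n, x0, h)
        print(f"n={n:<8.5g} h={h:.0e}  quotient={approx:+.6f}  n*x^(n-1)={exact:+.6f}")
```

As $h$ shrinks, the quotient column should settle onto the power-rule column for every exponent, which is numerical evidence (not a proof) for the general statement.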
Common Pitfall. The power rule applies to a variable raised to a constant, like $x^5$. It does not apply to a constant raised to a variable, like $2^x$. Many students reflexively write $\frac{d}{dx}(2^x) = x\,2^{x-1}$; this is wrong. The base $2$ is fixed and the exponent varies, which is an exponential function, not a power. We handle $b^x$ correctly in §7.7: $\frac{d}{dx}(2^x) = 2^x\ln 2$. Ask yourself: is the variable in the base or the exponent? Power rule for the base; exponential rule for the exponent.
Check Your Understanding. Differentiate $f(x) = \dfrac{4}{x^3} + 5\sqrt[3]{x}$.
Answer
Rewrite with exponents: $f(x) = 4x^{-3} + 5x^{1/3}$. Then $f'(x) = 4\cdot(-3)x^{-4} + 5\cdot\tfrac13 x^{-2/3} = -\dfrac{12}{x^4} + \dfrac{5}{3x^{2/3}}.$ The whole trick is converting roots and reciprocals to powers before differentiating.
7.3 Sum, Difference, and Constant Multiple Rules
These three rules are so natural you might not notice you are using them, but they deserve a name and a one-line justification.
Linearity of the derivative. If $f$ and $g$ are differentiable and $c$ is a constant, then $$(f + g)' = f' + g', \qquad (f - g)' = f' - g', \qquad (cf)' = c\,f'.$$
Why. Each follows directly from the limit laws of Chapter 3, because the difference quotient distributes over sums and pulls out constants. For the sum:
$$(f+g)'(x) = \lim_{h\to 0}\frac{[f(x+h)+g(x+h)] - [f(x)+g(x)]}{h} = \lim_{h\to 0}\frac{f(x+h)-f(x)}{h} + \lim_{h\to 0}\frac{g(x+h)-g(x)}{h} = f'(x)+g'(x).$$
The limit of a sum is the sum of the limits — exactly the limit law from Chapter 3 — and the constant-multiple case is identical with $c$ factored out front. Taken together these say the derivative is a linear operator: it passes through sums and scales through constants. That single property is what makes differentiating a polynomial trivial.
Worked example. Differentiate term by term:
$$\frac{d}{dx}\big(3x^4 - 5x^2 + 7\big) = 3\cdot 4x^3 - 5\cdot 2x + 0 = 12x^3 - 10x.$$
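Linearity plus the power rule reduces polynomial differentiation to arithmetic on coefficients. A minimal sketch, assuming a polynomial is stored as a list of coefficients whose index equals the exponent (both that representation and the name `poly_derivative` are illustrative choices):

```python
def poly_derivative(coeffs):
    """Differentiate a polynomial given as [a0, a1, a2, ...] meaning a0 + a1*x + a2*x^2 + ...

    Each term a_k * x^k becomes k * a_k * x^(k-1); the constant term drops out.
    """
    return [k * a for k, a in enumerate(coeffs)][1:]

# 3x^4 - 5x^2 + 7, listed from the constant term up to x^4
p = [7, 0, -5, 0, 3]
print(poly_derivative(p))   # [0, -10, 0, 12], i.e. 12x^3 - 10x
```

A constant polynomial such as `[7]` returns the empty list, which stands for the zero polynomial.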
Geometric Intuition. Adding two functions stacks their graphs vertically: at each $x$, the height of $f+g$ is the sum of the two heights. Slopes stack the same way — if one ramp rises $2$ units per step and another rises $3$, walking up both at once rises $5$. The constant-multiple rule is a vertical stretch: doubling a function's height doubles the steepness of every tangent line. Linearity of the derivative is just "stacked graphs have stacked slopes."
7.4 The Product Rule
Now the structure gets interesting. The derivative of a product is not the product of the derivatives — a fact that surprises everyone at first.
Product Rule. If $f$ and $g$ are differentiable, then $$(fg)' = f'g + fg'.$$
In words: derivative of the first times the second, plus the first times the derivative of the second. There are two terms, and each one differentiates exactly one factor while leaving the other alone.
Proof. Start from the definition and use the classic trick of adding and subtracting a bridging term — here $f(x+h)\,g(x)$ — to manufacture two recognizable difference quotients:
$$(fg)'(x) = \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x)g(x)}{h}.$$
Insert $-f(x+h)g(x) + f(x+h)g(x)$ in the numerator (a net of zero):
$$= \lim_{h \to 0} \frac{f(x+h)g(x+h) - f(x+h)g(x) \;+\; f(x+h)g(x) - f(x)g(x)}{h}.$$
Group and factor:
$$= \lim_{h \to 0} \left[\,f(x+h)\cdot \frac{g(x+h) - g(x)}{h} \;+\; g(x) \cdot \frac{f(x+h) - f(x)}{h}\,\right].$$
Now take the limit. The factor $f(x+h) \to f(x)$ because $f$, being differentiable, is continuous (Chapter 6). The two difference quotients become $g'(x)$ and $f'(x)$. Therefore
$$(fg)'(x) = f(x)g'(x) + g(x)f'(x). \qquad \blacksquare$$
Geometric Intuition. Picture $f(x)$ and $g(x)$ as the side lengths of a rectangle, so $fg$ is its area. Nudge $x$ by $dx$: one side grows by $df = f'\,dx$, the other by $dg = g'\,dx$. The area gains a tall thin strip ($g\cdot df$), a wide flat strip ($f\cdot dg$), and a tiny corner square ($df\cdot dg$). The corner is second order — proportional to $dx^2$ — so it vanishes in the limit, leaving $d(fg) = f\,dg + g\,df$. The product rule is the statement that "a rectangle grows along its two sides, and the corner doesn't matter."
Common Pitfall. Do not write $(fg)' = f'g'$. The product of the derivatives is almost never the derivative of the product. Test it on $f = g = x$: the false rule gives $(x\cdot x)' = 1\cdot 1 = 1$, but $(x^2)' = 2x$. The product rule has two terms for a reason — keep both.
Worked examples.
- $\dfrac{d}{dx}\big(x^2 e^x\big)$ with $f = x^2$ ($f' = 2x$) and $g = e^x$ ($g' = e^x$): $$2x\cdot e^x + x^2\cdot e^x = (x^2 + 2x)e^x.$$
- $\dfrac{d}{dx}\big(x \sin x\big) = 1\cdot \sin x + x\cdot \cos x = \sin x + x\cos x.$
- $\dfrac{d}{dx}\big(\sqrt{x}\,\ln x\big) = \dfrac{1}{2\sqrt{x}}\cdot \ln x + \sqrt{x}\cdot \dfrac{1}{x} = \dfrac{\ln x}{2\sqrt{x}} + \dfrac{1}{\sqrt{x}}.$
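The product rule and the tempting false rule can be compared against a measured slope. The sketch below does this for $x^2 e^x$ at an arbitrarily chosen point; `central_slope` is a hypothetical helper written for this check, not a library function.

```python
import math

def central_slope(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.3
f, fp = (lambda x: x**2), (lambda x: 2*x)   # f and f'
g, gp = math.exp, math.exp                  # g and g'

measured = central_slope(lambda x: f(x) * g(x), x0)
product_rule = fp(x0) * g(x0) + f(x0) * gp(x0)   # f'g + fg'
naive = fp(x0) * gp(x0)                           # the incorrect f'g'

print(f"measured slope : {measured:.6f}")
print(f"f'g + fg'      : {product_rule:.6f}")
print(f"f'g' (wrong)   : {naive:.6f}")
```

The first two numbers should agree closely, while the third is visibly off, which is the pitfall above in numerical form.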
Math Major Sidebar — The product rule generalizes (the Leibniz rule). For three factors, apply the rule twice: $(fgh)' = f'gh + fg'h + fgh'$ — differentiate each factor in turn, leaving the others alone. For an $n$-fold product the pattern continues: sum over which single factor you differentiate. There is even a higher-order analog for the $n$-th derivative of a product, $(fg)^{(n)} = \sum_{k=0}^n \binom{n}{k} f^{(k)} g^{(n-k)}$, which mirrors the binomial theorem exactly — the same coefficients $\binom{n}{k}$ appear. This is the general Leibniz rule, and the structural echo of the binomial theorem is not a coincidence; both count the ways to distribute an operation across a product.
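For concreteness, the $n = 2$ case follows from applying the product rule twice, and the binomial coefficients $1, 2, 1$ appear on their own:

$$(fg)'' = \big(f'g + fg'\big)' = (f''g + f'g') + (f'g' + fg'') = f''g + 2f'g' + fg''.$$

With $f = x^2$ and $g = e^x$ this gives $(x^2e^x)'' = 2e^x + 2\cdot 2x\,e^x + x^2e^x = (x^2 + 4x + 2)e^x$, which agrees with differentiating the first-derivative result $(x^2 + 2x)e^x$ from the worked examples once more.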
7.5 The Quotient Rule
Division of functions gets its own rule. It looks fussier than the product rule, but it follows from it.
Quotient Rule. If $g(x) \neq 0$, then $$\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}.$$
Mnemonic. "Low D-high minus high D-low, over the square of the low" — where "high" is the numerator $f$, "low" is the denominator $g$, and "D" means "derivative of." The order matters: unlike the product rule, the quotient rule has a minus sign, so $f'g$ must come first.
Derivation. Write the quotient as a product, $\dfrac{f}{g} = f\cdot g^{-1}$, and apply the product rule together with the power-and-chain rule $\big(g^{-1}\big)' = -g^{-2}g'$ (we formalize the chain rule in §7.6, but this case is just the reciprocal):
$$\left(f\cdot g^{-1}\right)' = f'\,g^{-1} + f\cdot\big(-g^{-2}g'\big) = \frac{f'}{g} - \frac{fg'}{g^2} = \frac{f'g - fg'}{g^2}.$$
The two fractions, placed over the common denominator $g^2$, give the boxed formula. So the quotient rule is not a new axiom — it is the product rule wearing a disguise.
Worked examples.
- $\dfrac{d}{dx}\!\left(\dfrac{x^2 + 1}{x - 1}\right) = \dfrac{2x(x-1) - (x^2+1)(1)}{(x-1)^2} = \dfrac{2x^2 - 2x - x^2 - 1}{(x-1)^2} = \dfrac{x^2 - 2x - 1}{(x-1)^2}.$
- Deriving $\tan x$. Since $\tan x = \dfrac{\sin x}{\cos x}$, with $(\sin x)' = \cos x$ and $(\cos x)' = -\sin x$: $$\frac{d}{dx}(\tan x) = \frac{\cos x\cdot\cos x - \sin x\cdot(-\sin x)}{\cos^2 x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \frac{1}{\cos^2 x} = \sec^2 x.$$ The Pythagorean identity collapses the numerator to $1$, giving the clean result $\dfrac{d}{dx}(\tan x) = \sec^2 x$. The remaining trig derivatives ($\sec$, $\csc$, $\cot$) come out the same way; they are exercises.
Common Pitfall. The numerator of the quotient rule is not symmetric — $f'g - fg' \ne fg' - f'g$. Swapping the two terms flips the sign of every answer. When you are unsure which term comes first, fall back on the product-rule derivation ($f\cdot g^{-1}$), which never lets you misorder the subtraction. And don't forget to square the denominator.
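A quick numerical check makes the sign issue tangible. The sketch below evaluates the first worked example, $\frac{x^2+1}{x-1}$, at $x = 3$ with the terms in the right order and in the wrong order, and compares both with a measured slope. The helper names are illustrative.

```python
def central_slope(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

q = lambda x: (x**2 + 1) / (x - 1)

def quotient_rule(x):
    f, fp = x**2 + 1, 2*x      # numerator and its derivative
    g, gp = x - 1, 1.0         # denominator and its derivative
    return (fp * g - f * gp) / g**2

def swapped(x):
    f, fp = x**2 + 1, 2*x
    g, gp = x - 1, 1.0
    return (f * gp - fp * g) / g**2   # terms in the wrong order

x0 = 3.0
print(central_slope(q, x0), quotient_rule(x0), swapped(x0))
```

At $x = 3$ the correct formula gives $\frac{9 - 6 - 1}{4} = 0.5$, and the swapped version gives $-0.5$: same magnitude, opposite sign.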
7.6 The Chain Rule — The Heart of Differential Calculus
Here is the rule that matters most. The overwhelming majority of derivatives you will ever compute involve composite functions — a function inside a function — and the chain rule is the only tool that handles them.
A composite function $f(g(x))$ applies $g$ first, then feeds the result into $f$. Think of $\sin(x^2)$: square first, then take the sine. Or $e^{3x}$: triple first, then exponentiate. Or $\sqrt{1 + x^4}$: build $1 + x^4$, then take the square root. Most "real" functions are towers of compositions.
Chain Rule. If $y = f(g(x))$, where $g$ is differentiable at $x$ and $f$ is differentiable at $g(x)$, then $$\frac{dy}{dx} = f'\big(g(x)\big)\cdot g'(x).$$ Equivalently, in Leibniz notation, if $u = g(x)$ and $y = f(u)$, then $$\frac{dy}{dx} = \frac{dy}{du}\cdot\frac{du}{dx}.$$
In words: differentiate the outer function (leaving the inner alone), then multiply by the derivative of the inner. The Leibniz form is a memory aid that looks like fractions cancelling — and there is real content to that, which the derivation explains.
Why it should be true (the intuition). If $y$ changes $3$ times as fast as $u$, and $u$ changes $5$ times as fast as $x$, then $y$ changes $3\times 5 = 15$ times as fast as $x$. Rates of change multiply through a chain of dependencies. That is the whole idea; the boxed formula is just this sentence written symbolically.
Derivation sketch. Write the difference quotient and multiply by a clever form of $1$, namely $\dfrac{g(x+h)-g(x)}{g(x+h)-g(x)}$:
$$\frac{f(g(x+h)) - f(g(x))}{h} = \underbrace{\frac{f(g(x+h)) - f(g(x))}{g(x+h) - g(x)}}_{\to\, f'(g(x))} \cdot \underbrace{\frac{g(x+h) - g(x)}{h}}_{\to\, g'(x)}.$$
As $h \to 0$, the inner change $g(x+h) - g(x) \to 0$ (because $g$ is continuous), so the first factor is a difference quotient for $f$ at the point $g(x)$ and approaches $f'(g(x))$; the second factor approaches $g'(x)$. The product approaches $f'(g(x))\,g'(x)$. $\;\blacksquare$
Math Major Sidebar — Repairing the derivation. The sketch above has a genuine flaw: if $g$ is constant near $x$, then $g(x+h) - g(x) = 0$ and we divided by zero. The honest proof avoids this by defining an auxiliary function $\phi(t) = \frac{f(t) - f(g(x))}{t - g(x)}$ for $t \ne g(x)$ and $\phi(g(x)) = f'(g(x))$; differentiability of $f$ makes $\phi$ continuous at $g(x)$. Then $f(g(x+h)) - f(g(x)) = \phi(g(x+h))\,\big(g(x+h)-g(x)\big)$ holds even when the inner difference is zero (both sides are $0$), so dividing by $h$ and taking the limit gives $f'(g(x))\,g'(x)$ with no illegal division. This is the standard rigorous argument (Spivak, Calculus); the intuition is unchanged, but the bookkeeping is now airtight.
Worked Examples — One Layer
Example 1. $y = (3x^2 + 1)^7$. Outer is $u^7$, inner is $u = 3x^2 + 1$. $$y' = 7(3x^2+1)^6 \cdot 6x = 42x(3x^2+1)^6.$$
Example 2. $y = \sin(x^2)$. Outer $\sin$ (derivative $\cos$), inner $x^2$ (derivative $2x$). $$y' = \cos(x^2)\cdot 2x = 2x\cos(x^2).$$
Example 3. $y = e^{x^2}$. Outer $e^u$ (derivative $e^u$), inner $x^2$ (derivative $2x$). $$y' = e^{x^2}\cdot 2x = 2x\,e^{x^2}.$$
Example 4. $y = \ln(\sin x)$. Outer $\ln$ (derivative $1/u$), inner $\sin x$ (derivative $\cos x$). $$y' = \frac{1}{\sin x}\cdot \cos x = \cot x.$$
Common Pitfall. The single most common chain-rule error is forgetting to multiply by the inner derivative. The derivative of $\sin(x^2)$ is $\cos(x^2)\cdot 2x$, not $\cos(x^2)$ and not $\cos(2x)$. The outer derivative $\cos(x^2)$ is only half the answer; the inner derivative $2x$ must come along. Whenever you write the outer derivative, immediately ask "times the derivative of what's inside?" and write that factor before you can forget it.
Check Your Understanding. Differentiate $y = \cos(5x - 1)$ and $y = \sqrt{x^2 + 9}$.
Answer
For $\cos(5x-1)$: outer $\cos$ gives $-\sin$, inner $5x-1$ has derivative $5$, so $y' = -5\sin(5x-1)$. For $\sqrt{x^2+9} = (x^2+9)^{1/2}$: outer power gives $\tfrac12(x^2+9)^{-1/2}$, inner $x^2+9$ has derivative $2x$, so $y' = \dfrac{2x}{2\sqrt{x^2+9}} = \dfrac{x}{\sqrt{x^2+9}}.$
Worked Examples — Multiple Layers
When the composition has several layers — $f(g(h(x)))$ — apply the chain rule repeatedly, peeling from the outside in:
$$\big(f\circ g\circ h\big)'(x) = f'\big(g(h(x))\big)\cdot g'\big(h(x)\big)\cdot h'(x).$$
Example. $y = \sin\!\big(\sqrt{x^2 + 1}\big)$. Three layers: outer $\sin u$, middle $\sqrt{v}$, inner $x^2 + 1$.
$$y' = \cos\!\big(\sqrt{x^2+1}\big) \cdot \frac{1}{2\sqrt{x^2+1}} \cdot 2x = \frac{x\,\cos\!\big(\sqrt{x^2+1}\big)}{\sqrt{x^2+1}}.$$
Differentiate the outermost first ($\cos$ of the inside), then the next ($\frac{1}{2\sqrt{\cdot}}$), then the innermost ($2x$), and multiply all three. The $2x$ and the $2$ in the denominator partially cancel.
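One way to watch the layer-by-layer multiplication happen is forward-mode automatic differentiation with dual numbers, which carries a value and a slope through each layer and applies the chain rule at every step. The `Dual` class and the helpers `d_sqrt` and `d_sin` below are illustrative names, not part of any library used in this book, and the sketch supports only the operations this one example needs.

```python
import math

class Dual:
    """A value together with its derivative with respect to x."""
    def __init__(self, val, der):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.der + other.der)       # sum rule

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)      # product rule

def d_sqrt(u):
    r = math.sqrt(u.val)
    return Dual(r, u.der / (2 * r))                          # (sqrt u)' = u' / (2 sqrt u)

def d_sin(u):
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)    # (sin u)' = cos(u) * u'

x = Dual(1.5, 1.0)                 # seed: dx/dx = 1
y = d_sin(d_sqrt(x * x + 1))       # y = sin(sqrt(x^2 + 1))

s = math.sqrt(1.5**2 + 1)
formula = 1.5 * math.cos(s) / s    # the hand-derived result
print(y.der, formula)
```

Each helper multiplies its own local derivative by the slope handed up from the layer inside it, so the final `y.der` is the product of all three factors in the formula above.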
The Key Insight. The chain rule is the calculus operation for compositions, the most natural way functions combine, and it is the central rule of differential calculus — every other rule is convenience, but the chain rule is necessity. You will use it constantly: in implicit differentiation and related rates (Chapter 8), in differentiating inverse functions, in separable differential equations (Chapter 19), in the multivariable gradient and backpropagation that trains neural networks (Chapter 30), and throughout the integral theorems of vector calculus (Part VII). If you master one thing in this chapter, master this.
Real-World Application — Backpropagation in deep learning (data science). A neural network is a giant composition: the input passes through layer after layer, each a function of the last. Training the network means computing the derivative of a loss function with respect to millions of internal weights, and that derivative is computed by one massive application of the chain rule, run from the output back toward the input. This is backpropagation — the algorithm behind every modern AI system. The "rates of change multiply through a chain" intuition from this section is, quite literally, how a language model learns. We meet this anchor example again in Chapter 30, where the chain rule becomes the multivariable gradient.
Seeing the Chain Rule Numerically
The chain rule is not just an algebraic identity — it is a measurable fact about slopes, and we can confirm it numerically. The claim is that the slope of $\sin(x^2)$ at any point equals $2x\cos(x^2)$. Let us check by comparing the symbolic formula against a raw finite-difference estimate of the slope.
# Confirm the chain rule: the measured slope of sin(x^2) matches 2x*cos(x^2).
import numpy as np
f = lambda x: np.sin(x**2) # composite function
fprime = lambda x: 2*x*np.cos(x**2) # chain-rule prediction
h = 1e-6 # tiny step for the difference quotient
xs = np.array([0.5, 1.0, 1.5, 2.0])
measured = (f(xs + h) - f(xs - h)) / (2*h) # central-difference slope estimate
predicted = fprime(xs)
for x, m, p in zip(xs, measured, predicted):
    print(f"x={x:.1f} measured slope={m:+.6f} chain-rule={p:+.6f}")
# Output:
# x=0.5 measured slope=+0.968912 chain-rule=+0.968912
# x=1.0 measured slope=+1.080605 chain-rule=+1.080605
# x=1.5 measured slope=-1.884521 chain-rule=-1.884521
# x=2.0 measured slope=-2.614574 chain-rule=-2.614574
The measured slopes — obtained by literally rising-over-running with a microscopic step — agree with the chain-rule formula to six decimal places. The factor of $2x$ that the chain rule forces us to include is exactly what the geometry demands; drop it and the columns would not match. This is the geometry-and-algebra theme made concrete: the formula $2x\cos(x^2)$ and the measured slope are the same number, viewed two ways.
7.7 Derivatives of the Library Functions
The rules above tell you how to combine functions. You also need the derivatives of the basic building blocks. Memorize these — they are the vocabulary of calculus.
$$\frac{d}{dx}(\sin x) = \cos x \qquad \frac{d}{dx}(\cos x) = -\sin x \qquad \frac{d}{dx}(\tan x) = \sec^2 x$$ $$\frac{d}{dx}(\sec x) = \sec x\tan x \qquad \frac{d}{dx}(\csc x) = -\csc x\cot x \qquad \frac{d}{dx}(\cot x) = -\csc^2 x$$ $$\frac{d}{dx}(e^x) = e^x \qquad \frac{d}{dx}(\ln x) = \frac{1}{x}\ \ (x>0) \qquad \frac{d}{dx}(b^x) = b^x\ln b \qquad \frac{d}{dx}(\log_b x) = \frac{1}{x\ln b}$$ $$\frac{d}{dx}(\arcsin x) = \frac{1}{\sqrt{1 - x^2}} \qquad \frac{d}{dx}(\arccos x) = -\frac{1}{\sqrt{1 - x^2}} \qquad \frac{d}{dx}(\arctan x) = \frac{1}{1 + x^2}$$
These are not arbitrary facts to swallow; the most important ones can be derived, and seeing the derivation makes them stick.
Why $\frac{d}{dx}(\sin x) = \cos x$
Start from the definition and use the angle-sum identity $\sin(x+h) = \sin x\cos h + \cos x\sin h$:
$$\frac{\sin(x+h) - \sin x}{h} = \frac{\sin x\cos h + \cos x\sin h - \sin x}{h} = \sin x\cdot\frac{\cos h - 1}{h} + \cos x\cdot\frac{\sin h}{h}.$$
Now invoke the two foundational trig limits from Chapter 3:
- $\dfrac{\sin h}{h} \to 1$ as $h \to 0$ (the famous squeeze-theorem limit), and
- $\dfrac{\cos h - 1}{h} \to 0$ as $h \to 0$ (provable from $\cos h - 1 = -2\sin^2(h/2)$).
Therefore the difference quotient tends to $\sin x\cdot 0 + \cos x\cdot 1 = \cos x$. $\;\blacksquare$ The same computation with $\cos(x+h)$ gives $\dfrac{d}{dx}(\cos x) = -\sin x$ — note the minus sign, which is the reason sine and cosine endlessly cycle into one another under differentiation.
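The two trig limits that power this proof are easy to observe numerically. A small sketch using only the standard library (the chosen step sizes are arbitrary):

```python
import math

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    sin_ratio = math.sin(h) / h            # should approach 1
    cos_ratio = (math.cos(h) - 1) / h      # should approach 0
    print(f"h={h:.0e}  sin(h)/h={sin_ratio:.8f}  (cos(h)-1)/h={cos_ratio:+.8f}")
```

The second column creeps up to $1$ and the third shrinks toward $0$ roughly like $-h/2$, consistent with $\cos h - 1 = -2\sin^2(h/2)$.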
Warning. These trig derivatives are valid only when $x$ is measured in radians. If $x$ is in degrees, the limit $\frac{\sin h}{h}$ no longer equals $1$ (it equals $\pi/180$), and an unwanted constant infects every derivative: $\frac{d}{dx}(\sin x^\circ) = \frac{\pi}{180}\cos x^\circ$. This is the reason mathematics always uses radians. Degrees would clutter every formula in calculus with factors of $\pi/180$.
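The stray factor is measurable. The sketch below treats the input as degrees (via a hypothetical helper `sin_deg`) and compares the measured slope at $30^\circ$ with $\frac{\pi}{180}\cos 30^\circ$ and with the bare $\cos 30^\circ$.

```python
import math

def sin_deg(x):
    """Sine of an angle given in degrees."""
    return math.sin(math.radians(x))

x0, h = 30.0, 1e-4
measured = (sin_deg(x0 + h) - sin_deg(x0 - h)) / (2 * h)   # slope per degree
predicted = (math.pi / 180) * math.cos(math.radians(x0))
print(measured, predicted, math.cos(math.radians(x0)))
```

The measured slope should match the prediction with the $\pi/180$ factor and be far smaller than $\cos 30^\circ$ alone.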
Why $\frac{d}{dx}(e^x) = e^x$
This is the defining property of the number $e$: it is the unique base whose exponential function is its own derivative. Among all curves $b^x$, only $e^x$ has slope equal to its own height at every point. We will derive this rigorously in Chapter 23, where the series $e^x = \sum_{n=0}^\infty \frac{x^n}{n!}$ is differentiated term by term and reproduces itself. For now, take it as the special property that makes $e \approx 2.71828$ the natural base of calculus.
The general exponential follows by the chain rule: writing $b^x = e^{x\ln b}$,
$$\frac{d}{dx}(b^x) = \frac{d}{dx}\,e^{x\ln b} = e^{x\ln b}\cdot \ln b = b^x\ln b.$$
The extra factor $\ln b$ is exactly why $e$ is special — it is the only base for which $\ln b = 1$ and the annoying constant disappears.
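Since $\frac{d}{dx}(b^x) = b^x\ln b$, the slope of $b^x$ at $x = 0$ should equal $\ln b$ itself. The sketch below measures that slope for a few bases; `slope_at_zero` is an illustrative helper.

```python
import math

def slope_at_zero(b, h=1e-6):
    """Central-difference slope of b**x at x = 0."""
    return (b**h - b**(-h)) / (2 * h)

for b in (2.0, math.e, 3.0, 10.0):
    print(f"b={b:<8.5f} slope at 0 = {slope_at_zero(b):.6f}   ln b = {math.log(b):.6f}")
```

Only the row for $b = e$ should show a slope of $1$; every other base carries its own constant $\ln b$.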
Historical Note. Leonhard Euler (1707–1783) identified $e$ as the natural base of exponentials and logarithms in the 1730s, and the symbol "$e$" is due to him (likely for exponential, not for his own name — Euler was not one for self-promotion). The property $\frac{d}{dx}e^x = e^x$ makes $e$ the backbone of every growth-and-decay model in science: radioactive decay, compound interest, population dynamics, and the cooling of your coffee all run on $e^{kt}$ precisely because differentiation returns the function unchanged.
Real-World Application — Radioactive decay and carbon dating (physics/chemistry). A quantity decaying at a rate proportional to itself satisfies $\frac{dN}{dt} = -kN$, whose solution is $N(t) = N_0 e^{-kt}$. Differentiating this with the chain rule returns $-kN_0e^{-kt} = -kN(t)$, confirming it solves the equation. Because $e^x$ is its own derivative, exponentials are the only functions with this self-proportional-rate property — which is why carbon-14 dating, drug half-lives, and capacitor discharge all share the same mathematical form. We develop these models fully in Chapter 19.
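As a rough illustration of the decay model, the sketch below takes the commonly quoted carbon-14 half-life of about 5730 years as an assumed input (it is not a figure from this chapter), derives $k$ from it, checks $\frac{dN}{dt} = -kN$ numerically, and inverts the model to estimate an age. All function names are illustrative.

```python
import math

HALF_LIFE = 5730.0                 # years; commonly quoted carbon-14 value (assumed input)
k = math.log(2) / HALF_LIFE        # from N(t_half) = N0/2, i.e. exp(-k * t_half) = 1/2

def N(t, N0=1.0):
    """Amount remaining after t years: N0 * exp(-k t)."""
    return N0 * math.exp(-k * t)

# Check dN/dt = -k N with a central difference at t = 1000 years.
t0, h = 1000.0, 1e-2
dN_dt = (N(t0 + h) - N(t0 - h)) / (2 * h)
print(dN_dt, -k * N(t0))

def age_from_fraction(fraction):
    """Invert N(t)/N0 = exp(-k t) for t."""
    return -math.log(fraction) / k

print(age_from_fraction(0.25))     # a quarter remaining means two half-lives
```

Real dating work involves calibration steps that this model ignores; the point here is only that the exponential satisfies its own differential equation.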
7.8 Combining the Rules
Real problems mix rules. The strategy is always the same — identify the outermost operation first, apply its rule, and recurse inward on the smaller pieces it leaves behind.
Strategy.
1. Look at the function as a whole. What is the outermost operation: a sum, a product, a quotient, or a composition?
2. Apply the matching rule for that outermost operation.
3. The rule hands you smaller derivative subproblems. Repeat steps 1–2 on each.
4. When you reach a library function (polynomial, trig, exp, log), write down its known derivative and stop.
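The strategy is literally a recursive algorithm. The sketch below is one possible rendering of it, assuming expressions are stored as nested tuples; the tuple format and the names `diff` and `evaluate` are invented for this illustration and are not how sympy represents expressions. It performs no simplification.

```python
import math

# Expressions are nested tuples: ("x",), ("const", c), ("add", a, b), ("mul", a, b),
# ("div", a, b), ("pow", a, n) with constant n, and ("sin"|"cos"|"exp"|"ln", a).

def diff(e):
    """Differentiate by matching the outermost operation, then recursing inward."""
    op = e[0]
    if op == "x":
        return ("const", 1.0)
    if op == "const":
        return ("const", 0.0)
    if op == "add":                                   # sum rule
        return ("add", diff(e[1]), diff(e[2]))
    if op == "mul":                                   # product rule: f'g + fg'
        f, g = e[1], e[2]
        return ("add", ("mul", diff(f), g), ("mul", f, diff(g)))
    if op == "div":                                   # quotient rule: (f'g - fg') / g^2
        f, g = e[1], e[2]
        top = ("add", ("mul", diff(f), g),
                      ("mul", ("const", -1.0), ("mul", f, diff(g))))
        return ("div", top, ("pow", g, 2))
    if op == "pow":                                   # power rule times chain-rule factor
        u, n = e[1], e[2]
        return ("mul", ("mul", ("const", n), ("pow", u, n - 1)), diff(u))
    # Library functions: known outer derivative times the chain-rule factor diff(u).
    u = e[1]
    outer = {"sin": ("cos", u),
             "cos": ("mul", ("const", -1.0), ("sin", u)),
             "exp": ("exp", u),
             "ln":  ("div", ("const", 1.0), u)}[op]
    return ("mul", outer, diff(u))

def evaluate(e, x):
    """Evaluate an expression tree at a number x."""
    op = e[0]
    if op == "x":
        return x
    if op == "const":
        return e[1]
    if op == "pow":
        return evaluate(e[1], x) ** e[2]
    if op in ("add", "mul", "div"):
        a, b = evaluate(e[1], x), evaluate(e[2], x)
        return a + b if op == "add" else a * b if op == "mul" else a / b
    funcs = {"sin": math.sin, "cos": math.cos, "exp": math.exp, "ln": math.log}
    return funcs[op](evaluate(e[1], x))

# y = x^2 * sin(3x + 1): product outermost, chain rule inside.
X = ("x",)
inner = ("add", ("mul", ("const", 3.0), X), ("const", 1.0))
y = ("mul", ("pow", X, 2), ("sin", inner))

x0 = 0.7
by_recursion = evaluate(diff(y), x0)
by_hand = 2*x0*math.sin(3*x0 + 1) + 3*x0**2*math.cos(3*x0 + 1)
print(by_recursion, by_hand)
```

Each branch of `diff` is one rule from this chapter, and the recursion stops exactly where step 4 says it should: at `x`, a constant, or a library function.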
Example: Product + Chain
Differentiate $y = x^2\sin(3x + 1)$. The outermost operation is a product of $x^2$ and $\sin(3x+1)$, so start with the product rule:
$$y' = (x^2)'\cdot \sin(3x+1) + x^2\cdot \big(\sin(3x+1)\big)'.$$
The two subproblems: $(x^2)' = 2x$, and $\big(\sin(3x+1)\big)'$ needs the chain rule, giving $\cos(3x+1)\cdot 3$. Assemble:
$$y' = 2x\sin(3x+1) + 3x^2\cos(3x+1).$$
Example: Quotient + Chain
Differentiate $y = \dfrac{e^{x^2}}{x^2 + 1}$. Outermost operation is a quotient:
$$y' = \frac{\big(e^{x^2}\big)'(x^2+1) - e^{x^2}\,(x^2+1)'}{(x^2+1)^2}.$$
Subproblems: $\big(e^{x^2}\big)' = 2x\,e^{x^2}$ (chain rule) and $(x^2+1)' = 2x$. Substitute and factor the common $2x\,e^{x^2}$:
$$y' = \frac{2x\,e^{x^2}(x^2+1) - 2x\,e^{x^2}}{(x^2+1)^2} = \frac{2x\,e^{x^2}\big[(x^2+1) - 1\big]}{(x^2+1)^2} = \frac{2x^3\,e^{x^2}}{(x^2+1)^2}.$$
Example: Product of Two Chains
Differentiate $f(x) = (3x^2+1)^5\sin(2x)$. Product rule outermost, with each factor needing the chain rule:
$$\big[(3x^2+1)^5\big]' = 5(3x^2+1)^4\cdot 6x = 30x(3x^2+1)^4, \qquad \big[\sin(2x)\big]' = 2\cos(2x).$$
$$f'(x) = 30x(3x^2+1)^4\sin(2x) + 2(3x^2+1)^5\cos(2x).$$
Check Your Understanding. Differentiate $g(x) = e^{x^2}\ln(x^2 + 1)$.
Answer
Product rule with two chain-rule factors. $\big(e^{x^2}\big)' = 2x\,e^{x^2}$ and $\big(\ln(x^2+1)\big)' = \dfrac{2x}{x^2+1}$. So $g'(x) = 2x\,e^{x^2}\ln(x^2+1) + e^{x^2}\cdot\dfrac{2x}{x^2+1}.$
Common Pitfall: corners, cusps, and discontinuities. Every rule in this chapter assumes the functions involved are differentiable. The rules say nothing useful at points where a derivative fails to exist: jumps and other discontinuities (no derivative at all), corners like $|x|$ at $0$ (left and right slopes disagree), and cusps like $x^{2/3}$ at $0$ (the tangent goes vertical). Before mechanically applying a rule, make sure you are at a point where the derivative actually exists. The machinery is fast, but it is not a license to ignore where it breaks.
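One-sided difference quotients show what failure looks like at a corner and at a cusp. In the sketch below, `one_sided_slopes` is an illustrative helper, and $x^{2/3}$ is written as `abs(x) ** (2 / 3)` because Python returns a complex number when a negative float is raised to a fractional power.

```python
def one_sided_slopes(f, x, h=1e-6):
    """Right- and left-hand difference quotients of f at x."""
    right = (f(x + h) - f(x)) / h
    left = (f(x) - f(x - h)) / h
    return right, left

print(one_sided_slopes(abs, 0.0))                 # corner: the two slopes disagree

cusp = lambda x: abs(x) ** (2 / 3)                # real-valued x^(2/3)
for h in (1e-2, 1e-4, 1e-6):
    print(h, one_sided_slopes(cusp, 0.0, h))      # cusp: slopes grow without bound
```

For $|x|$ the right and left quotients stay at $+1$ and $-1$ however small $h$ gets; for the cusp they grow like $\pm h^{-1/3}$, so neither case settles on a single derivative.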
7.9 Logarithmic Differentiation
Some functions defeat the standard rules — especially when the variable appears in both a base and an exponent. The trick is to take the natural logarithm of both sides first, which converts products into sums and exponents into coefficients, then differentiate.
The motivating problem: $y = x^x$ for $x > 0$. Neither the power rule (which needs a constant exponent) nor the exponential rule (which needs a constant base) applies — here the variable is in both places. Take $\ln$ of both sides:
$$\ln y = x\ln x.$$
Differentiate both sides with respect to $x$. The left side needs the chain rule, $\frac{d}{dx}\ln y = \frac{1}{y}\,y'$ (this is implicit differentiation, formalized in Chapter 8 — every $y$ picks up a factor of $y'$ as you differentiate). The right side is a product:
$$\frac{1}{y}\,y' = 1\cdot\ln x + x\cdot\frac{1}{x} = \ln x + 1.$$
Solve for $y'$ by multiplying through by $y$, then substitute $y = x^x$:
$$y' = y(\ln x + 1) = x^x(\ln x + 1).$$
So $\dfrac{d}{dx}(x^x) = x^x(\ln x + 1)$.
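The same three steps settle the claim from §7.2 that the power rule holds for every real exponent. For $y = x^r$ with $x > 0$ and $r$ any real constant:

$$\ln y = r\ln x \quad\Longrightarrow\quad \frac{y'}{y} = \frac{r}{x} \quad\Longrightarrow\quad y' = \frac{r}{x}\,x^r = r\,x^{r-1}.$$

Nothing in this argument required $r$ to be an integer, or even rational.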
A second example: $y = x^{\sin x}$. Same recipe. $\ln y = \sin x\,\ln x$, so
$$\frac{y'}{y} = \cos x\,\ln x + \sin x\cdot\frac{1}{x} \quad\Longrightarrow\quad y' = x^{\sin x}\left(\cos x\,\ln x + \frac{\sin x}{x}\right).$$
Logarithmic differentiation also tames hairy products and quotients — taking $\ln$ turns $\frac{u\,v}{w}$ into $\ln u + \ln v - \ln w$, after which differentiating is a sum of simple terms. It is the cleanest route whenever a function is a tangle of factors.
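As an illustration with a made-up function, take $y = \dfrac{(x^2+1)^3\sqrt{x}}{(x+2)^4}$ for $x > 0$. Then $\ln y = 3\ln(x^2+1) + \tfrac12\ln x - 4\ln(x+2)$, so $\dfrac{y'}{y} = \dfrac{6x}{x^2+1} + \dfrac{1}{2x} - \dfrac{4}{x+2}$. The sketch below checks that formula against a finite-difference slope at an arbitrary point.

```python
import math

def y(x):
    return (x**2 + 1)**3 * math.sqrt(x) / (x + 2)**4

def y_prime(x):
    # y'/y = 6x/(x^2+1) + 1/(2x) - 4/(x+2), from differentiating ln y term by term
    log_derivative = 6*x / (x**2 + 1) + 1 / (2*x) - 4 / (x + 2)
    return y(x) * log_derivative

x0, h = 1.5, 1e-6
measured = (y(x0 + h) - y(x0 - h)) / (2 * h)   # central-difference slope
print(measured, y_prime(x0))
```

Doing the same derivative with the quotient, product, and chain rules directly is possible but considerably messier, which is the practical argument for taking logarithms first.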
7.10 Verifying with sympy
Hand computation builds understanding; the machine builds confidence and power. Following our standard three-tier pattern (state the problem, solve by hand, then confirm symbolically), here is sympy checking six of the hand results above.
# Verify the chapter's hand-computed derivatives symbolically with sympy.
import sympy as sp
x = sp.symbols('x', positive=True) # positive=True so x**x and ln x are well-defined
checks = {
    "x**5": sp.diff(x**5, x),
    "sin(x**2)": sp.diff(sp.sin(x**2), x),
    "e^{x^2}/(x^2+1)": sp.simplify(sp.diff(sp.exp(x**2)/(x**2 + 1), x)),
    "x**x": sp.diff(x**x, x),
    "(3x^2+1)^5 sin(2x)": sp.diff((3*x**2 + 1)**5 * sp.sin(2*x), x),
    "tan(x)": sp.simplify(sp.diff(sp.tan(x), x)),
}
for name, deriv in checks.items():
    print(f"{name:22s} -> {deriv}")
# Output:
# x**5                   -> 5*x**4
# sin(x**2)              -> 2*x*cos(x**2)
# e^{x^2}/(x^2+1)        -> 2*x**3*exp(x**2)/(x**2 + 1)**2
# x**x                   -> x**x*(log(x) + 1)
# (3x^2+1)^5 sin(2x)     -> 30*x*(3*x**2 + 1)**4*sin(2*x) + 2*(3*x**2 + 1)**5*cos(2*x)
# tan(x)                 -> an expression equal to sec(x)**2 (the printed form depends on the sympy version, e.g. cos(x)**(-2) or tan(x)**2 + 1)
Every line matches the hand computation — the quotient-plus-chain result simplifies to $\frac{2x^3 e^{x^2}}{(x^2+1)^2}$, and $x^x$ gives $x^x(\ln x + 1)$ exactly as we derived. (sympy writes $\ln$ as log; in sympy, log is the natural logarithm.)
Computational Note. sympy differentiates by applying these same rules internally. It is not magic, just the chain, product, and quotient rules executed flawlessly and tirelessly. Use it to check your work, never to replace learning the rules. When you derive by hand and the machine agrees, you gain certainty; when it disagrees, you have found a bug, usually a dropped chain-rule factor or a sign error in the quotient rule. The disagreements are where the learning happens. (For numerical, formula-free differentiation, useful when no symbolic form exists, numpy.gradient and scipy.misc.derivative use finite differences, the difference quotient with a small fixed $h$; we lean on those in later chapters. Note that scipy.misc.derivative is deprecated and has been removed from recent SciPy releases, so check your installed version.)
7.11 The Differentiation Cheat Sheet
Print this, tape it to your wall, and refer to it until every entry lives in your memory.
The rules:
| Rule | Statement |
|---|---|
| Power | $(x^n)' = n\,x^{n-1}$ (every real $n$) |
| Sum / difference | $(f \pm g)' = f' \pm g'$ |
| Constant multiple | $(cf)' = c\,f'$ |
| Product | $(fg)' = f'g + fg'$ |
| Quotient | $\left(\dfrac{f}{g}\right)' = \dfrac{f'g - fg'}{g^2}$ |
| Reciprocal | $\left(\dfrac{1}{g}\right)' = -\dfrac{g'}{g^2}$ |
| Chain | $\big(f(g(x))\big)' = f'(g(x))\,g'(x)$ |
| Logarithmic | $\big(\ln f\big)' = \dfrac{f'}{f}$ |
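
One way to convince yourself that the table holds for arbitrary differentiable functions is to let sympy differentiate undefined functions $f$ and $g$ and compare against each formula. The sketch below covers the product, quotient, reciprocal, and logarithmic rows; the dictionary name `rules` and the use of `sp.simplify` to test for a zero difference are illustrative choices.

```python
import sympy as sp

x = sp.symbols("x")
f = sp.Function("f")(x)   # generic differentiable functions
g = sp.Function("g")(x)
fp, gp = sp.diff(f, x), sp.diff(g, x)

# Each entry pairs sympy's derivative with the cheat-sheet formula.
rules = {
    "product":     (sp.diff(f * g, x), fp * g + f * gp),
    "quotient":    (sp.diff(f / g, x), (fp * g - f * gp) / g**2),
    "reciprocal":  (sp.diff(1 / g, x), -gp / g**2),
    "logarithmic": (sp.diff(sp.log(f), x), fp / f),
}

# A rule checks out when the difference of the two sides simplifies to zero.
results = {name: sp.simplify(lhs - rhs) == 0 for name, (lhs, rhs) in rules.items()}
for name, ok in results.items():
    print(f"{name:12s} {ok}")
```
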
The library of standard derivatives:
| Function | Derivative | Function | Derivative |
|---|---|---|---|
| $c$ | $0$ | $\sin x$ | $\cos x$ |
| $x^n$ | $n\,x^{n-1}$ | $\cos x$ | $-\sin x$ |
| $e^x$ | $e^x$ | $\tan x$ | $\sec^2 x$ |
| $b^x$ | $b^x\ln b$ | $\sec x$ | $\sec x\tan x$ |
| $\ln x$ | $1/x$ | $\csc x$ | $-\csc x\cot x$ |
| $\log_b x$ | $1/(x\ln b)$ | $\cot x$ | $-\csc^2 x$ |
| $\arcsin x$ | $1/\sqrt{1-x^2}$ | $\arctan x$ | $1/(1+x^2)$ |
| $\arccos x$ | $-1/\sqrt{1-x^2}$ | $\sinh x$ | $\cosh x$ |
Hyperbolic functions round out the library. Defined as $\sinh x = \frac{e^x - e^{-x}}{2}$ and $\cosh x = \frac{e^x + e^{-x}}{2}$, they satisfy the identity $\cosh^2 x - \sinh^2 x = 1$ (a sign flip from $\sin^2 x + \cos^2 x = 1$). Differentiate them straight from the exponential definition: $(\sinh x)' = \frac{e^x + e^{-x}}{2} = \cosh x$, and $(\cosh x)' = \frac{e^x - e^{-x}}{2} = \sinh x$. Note that, unlike the circular cosine, $\cosh$ picks up no minus sign. Hyperbolic functions describe the shape of a hanging chain or free-hanging cable (the catenary $y = a\cosh(x/a)$) and the "rapidity" of special relativity. (The main cables of a suspension bridge, which carry a roughly uniform deck load, hang closer to a parabola than to a catenary.)
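The hyperbolic derivatives and the identity can be checked directly from the exponential definitions. This is a small sketch; the variable names `sinh_def` and `cosh_def` are illustrative, and `sp.expand` is used here simply to multiply everything out so the comparison reduces to zero.

```python
import sympy as sp

x = sp.symbols("x", real=True)

# Exponential definitions of the hyperbolic functions
sinh_def = (sp.exp(x) - sp.exp(-x)) / 2
cosh_def = (sp.exp(x) + sp.exp(-x)) / 2

# (sinh x)' should equal cosh x, and (cosh x)' should equal sinh x
sinh_ok = sp.expand(sp.diff(sinh_def, x) - cosh_def) == 0
cosh_ok = sp.expand(sp.diff(cosh_def, x) - sinh_def) == 0

# cosh^2 x - sinh^2 x = 1
identity_ok = sp.expand(cosh_def**2 - sinh_def**2 - 1) == 0

print(sinh_ok, cosh_ok, identity_ok)
```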
Add to Your Modeling Portfolio. Return to the modeling function $f(t)$ you wrote down in Chapter 2 and, using the rules of this chapter, compute its derivative $f'(t)$ symbolically, by hand, then confirm it with sympy. You now have every tool needed, whatever form your model takes. Biology: if you chose logistic growth $P(t) = \dfrac{K}{1 + Ae^{-rt}}$, differentiate it (quotient + chain) to get the growth rate $P'(t)$, the right-hand side of the logistic differential equation you will solve in Chapter 19. Economics: differentiate your cost or revenue function to obtain marginal cost or revenue, the central object of marginal analysis. Physics: differentiate your position function $s(t)$ to get velocity $v(t) = s'(t)$, and again for acceleration $a(t) = s''(t)$. Data Science: differentiate your loss function $L(w)$ to get the gradient $L'(w)$ that gradient descent (the anchor example from Chapter 6) steps against, the seed of Chapter 30. Write the explicit derivative formula into your portfolio; you will differentiate it again, numerically and symbolically, in later chapters.
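For the biology track, a confirmation might look like the sketch below. It differentiates the logistic function and checks that the result equals $rP\,(1 - P/K)$, the standard form of the logistic right-hand side; that this is the exact form used in Chapter 19 is an assumption here. Declaring the symbols positive is also an illustrative choice.

```python
import sympy as sp

t, K, A, r = sp.symbols("t K A r", positive=True)

# Logistic growth model from the portfolio prompt
P = K / (1 + A * sp.exp(-r * t))

# Growth rate via sympy (by hand: quotient rule plus chain rule)
dP = sp.diff(P, t)

# Standard logistic right-hand side, assumed form: r * P * (1 - P/K)
rhs = r * P * (1 - P / K)

residual = sp.simplify(dP - rhs)
print("P'(t) =", sp.simplify(dP))
print("P'(t) - r P (1 - P/K) =", residual)
```

A residual of zero confirms that the hand-computed derivative and the differential-equation form describe the same growth rate.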
7.12 Why Learn This When Computers Can Do It?
Every modern computer-algebra system (sympy, Mathematica, Maple) differentiates instantly and almost always correctly. So why drill these rules by hand? Four reasons.
- Understanding. Knowing how a derivative is built tells you what it means. When the product rule produces two terms, you understand a marginal-cost calculation or a related-rates problem in a way that a black-box answer never conveys.
- Verification. Computer systems are reliable but not infallible — input mistakes, branch-cut subtleties, and simplification quirks all occur. You must be able to sanity-check the machine.
- Speed and fluency. Simple derivatives belong in your head. Stopping to launch a tool for $\frac{d}{dx}(x^2\sin x)$ breaks your train of thought; fluency keeps you in flow.
- Foundation. Implicit differentiation, related rates, optimization, differential equations, Taylor series, and the entire edifice of vector calculus all assume you can differentiate without thinking. The rules here are the floor every later chapter stands on.
The deepest reason connects to automatic differentiation (autodiff), the technology behind modern machine learning. Autodiff computes derivatives that are exact up to floating-point rounding by applying the chain rule, programmatically, to every elementary operation a program performs. It comes in two flavors: forward mode, efficient when there are few inputs, and reverse mode, known in neural-network training as backpropagation, efficient when many inputs feed a few outputs. It is, at heart, the chain rule of §7.6 executed millions of times per second. The rules you are drilling by hand are the very rules a computer applies to train a neural network. Understanding them by hand is understanding what the machine is doing.
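To make the idea concrete, here is a toy sketch of forward-mode autodiff using "dual numbers": each value carries its derivative along, and every operation updates both using a rule from this chapter. The class name `Dual` and the helper `sin` are hypothetical names for illustration; real autodiff libraries are far more general. The test function is $x^2\sin(3x+1)$ from §7.1.

```python
import math

class Dual:
    """A value paired with its derivative (toy forward-mode autodiff)."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(other):
        # Plain numbers are constants, so their derivative is 0.
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual.lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

def sin(u):
    # Chain rule: (sin u)' = cos(u) * u'
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)

def f(x):
    return x * x * sin(3 * x + 1)

x0 = 0.7                      # arbitrary evaluation point
out = f(Dual(x0, 1.0))        # seed with dx/dx = 1

# Hand-derived: f'(x) = 2x sin(3x+1) + 3x^2 cos(3x+1)
exact = 2 * x0 * math.sin(3 * x0 + 1) + 3 * x0**2 * math.cos(3 * x0 + 1)
print(out.der, exact)
```

No limit and no symbolic algebra appear anywhere: the derivative emerges from the sum, product, and chain rules applied one operation at a time.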
Looking Ahead
You can now differentiate any elementary function — any combination of powers, roots, trig, exponentials, and logarithms — by recognizing its structure and applying the matching rule. That fluency is the prerequisite for everything that follows.
Chapter 8 develops two essential applications of the chain rule: implicit differentiation, for curves like $x^2 + y^2 = 1$ where $y$ is not isolated (differentiate both sides, and each $y$ picks up a factor of $y'$, giving $2x + 2y\,y' = 0$, so $y' = -x/y$); and related rates, where two quantities change together in time. Chapters 9 and 10 turn derivatives loose on curve sketching and optimization. Chapter 11 uses the derivative to build linear approximations and Newton's method. And far ahead, in Chapter 30, the chain rule reappears as the multivariable gradient that powers machine learning. From here on, differentiation is assumed — it is the tool, not the topic.
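As a preview of the implicit-differentiation step described above, the sketch below treats $y$ as an unknown function of $x$, differentiates $x^2 + y^2 = 1$, and solves for $y'$. The variable names `circle` and `y_prime` are illustrative.

```python
import sympy as sp

x = sp.symbols("x")
y = sp.Function("y")(x)               # y is an unknown function of x

circle = x**2 + y**2 - 1              # x^2 + y^2 = 1, moved to one side

# Differentiating both sides: the chain rule attaches a factor y' to the y^2 term.
differentiated = sp.diff(circle, x)

# Solve the resulting equation for y'.
y_prime = sp.solve(differentiated, sp.Derivative(y, x))[0]
print(y_prime)
```

The result agrees with $y' = -x/y$ obtained by hand.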
Reflection
Differentiation began, in Chapter 6, as a limit: slow, careful, computed one function at a time. This chapter turned it into an algebra. The product rule, the quotient rule, and above all the chain rule let you build the derivative of any elementary function from the derivatives of its parts, mechanically and without fear. That is the gift Newton and Leibniz gave us: not just the idea of the derivative, but a calculus of it, a system of rules that makes the infinite limit routine. The chain rule in particular is one of the most important equations in all of mathematics; it drives the training of neural networks and underlies much of what is still to come. Drill these rules until they are reflex. Then turn the page, and let the derivative go to work.