Prerequisites
- Chapter 28: Vector-Valued Functions
Learning Objectives
- Identify the domain and range of $f(x,y)$ and $f(x,y,z)$.
- Sketch surfaces $z = f(x,y)$ and level curves $f(x,y) = c$, and read a contour map.
- Compute multivariable limits, and prove non-existence by the two-path method.
- Recognize continuity in several variables.
- Compute partial derivatives and higher/mixed partials; apply Clairaut's theorem.
- Interpret partial derivatives geometrically and physically, and build the tangent plane and linearization.
In This Chapter
- 29.1 Beyond One Variable
- 29.2 Domains and Ranges
- 29.3 Graphs of $f(x, y)$ Are Surfaces
- 29.4 Level Curves and Contour Maps
- 29.5 Level Surfaces
- 29.6 Limits in Several Variables — and Why They Are Subtler
- 29.7 Continuity
- 29.8 Partial Derivatives
- 29.9 Higher-Order Partial Derivatives and Clairaut's Theorem
- 29.10 The Tangent Plane and Linearization
- 29.11 Functions of Three Variables
- 29.12 Computation: Surfaces, Contours, and Partials in Python
- 29.13 Applications Across Fields
- 29.14 The Geometric Picture: Features of a Surface
- Looking Ahead
- Reflection
Chapter 29 — Functions of Several Variables
29.1 Beyond One Variable
For twenty-eight chapters, almost every function in this book has had the shape $f: \mathbb{R} \to \mathbb{R}$ — one number in, one number out. Its graph is a curve in the plane, and its derivative is a single slope. That world was rich enough to carry us from limits all the way through Taylor series and vector-valued curves. But it is too small for the questions that actually drive science.
The temperature of a metal plate does not depend on one number; it depends on where you are — a pair $(x, y)$. The pressure of a gas depends on volume and temperature. A firm's output depends on labor and capital. The loss a neural network incurs depends on millions of weights at once. The natural object here is a function of several variables:
$$f: \mathbb{R}^2 \to \mathbb{R}, \qquad (x, y) \mapsto f(x, y).$$
Two inputs, one output. Here are five examples to keep in mind as we go:
- Temperature $T(x, y)$ at the point $(x, y)$ on a heated plate.
- Pressure $P(V, T)$ of a fixed quantity of gas (Section 29.13).
- Profit or output $Q(L, K)$ as a function of labor $L$ and capital $K$ (Section 29.13).
- Elevation $h(x, y)$ of the terrain at the point $(x, y)$ on a map.
- The joint probability density $p(x, y)$ of two random variables.
The single most important shift to absorb is geometric. The graph of $f(x)$ was a curve living in 2D. The graph of $f(x, y)$ is a surface living in 3D: over each point $(x, y)$ of the plane you raise a pin to height $z = f(x, y)$, and the tips of all those pins form a surface. And the derivative — which was one slope — now becomes a whole family of slopes, one for every direction you could walk away from a point. Taming that family is the project of this chapter and the next.
Higher still, $f(x, y, z): \mathbb{R}^3 \to \mathbb{R}$ assigns a number to each point of space — a temperature field in a room, a density inside a star. Its "graph" would live in 4D, beyond drawing, but as we will see, we can still picture it through its level surfaces.
The Key Insight. A multivariable function sends a point of higher-dimensional space to a single number. The geometry expands accordingly — graphs become surfaces, level sets become curves or surfaces — and the derivative fractures into infinitely many directional slopes. Calculus survives the move intact, but every one-variable idea acquires a direction.
This chapter weaves together three of the book's recurring threads. Geometry and algebra are inseparable: every surface here has a formula and every formula a picture, and we never present one without the other. Approximation is the soul of calculus: the chapter ends, as Chapter 11 did in one variable, by replacing a curved surface with the flat plane that best hugs it. And calculus appears in every quantitative field: our worked applications run from thermodynamics to economics to machine learning.
29.2 Domains and Ranges
A function is not fully specified until you say where its inputs may live.
Domain
The domain is the set of valid inputs — a region of $\mathbb{R}^2$ rather than an interval of $\mathbb{R}$. Finding it is the same game as in one variable: forbid division by zero, negative arguments under even roots, and non-positive arguments of logarithms. The difference is that the answer is now a two-dimensional region, which you should sketch.
- $f(x, y) = \sqrt{x^2 + y^2}$: no restriction; the domain is all of $\mathbb{R}^2$.
- $f(x, y) = \sqrt{1 - x^2 - y^2}$: we need $1 - x^2 - y^2 \ge 0$, i.e. $x^2 + y^2 \le 1$ — the closed unit disk.
- $f(x, y) = \ln(x - y)$: we need $x - y > 0$, i.e. $y < x$ — the open half-plane below the line $y = x$.
- $f(x, y) = \dfrac{1}{x^2 - y}$: we need $y \ne x^2$ — the plane with a parabola removed.
A domain in $\mathbb{R}^2$ can be all of the plane, or a disk, half-plane, strip, or wedge, or the complement of some curve. Whenever you meet a new $f$, sketch its domain first; the geometry of where $f$ is defined often previews the geometry of how $f$ behaves.
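The domain conditions above can also be written as boolean tests, which is a handy way to check a sketch point by point. This is a minimal illustration; the helper names are made up for this example and do not come from any library.

```python
def in_domain_sqrt_disk(x, y):
    """Domain of sqrt(1 - x^2 - y^2): the closed unit disk."""
    return x**2 + y**2 <= 1

def in_domain_log(x, y):
    """Domain of ln(x - y): the open half-plane y < x."""
    return x - y > 0

def in_domain_reciprocal(x, y):
    """Domain of 1 / (x^2 - y): every point off the parabola y = x^2."""
    return x**2 - y != 0

# A few sample points, tested against each condition in turn.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
for px, py in points:
    print((px, py),
          in_domain_sqrt_disk(px, py),
          in_domain_log(px, py),
          in_domain_reciprocal(px, py))
```

Evaluating such a test over a grid and shading the points that pass is a quick way to draw the region before plotting $f$ itself.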
Range
The range is the set of attained outputs. For $f(x, y) = x^2 + y^2$ the outputs are all $\ge 0$ and every nonnegative value occurs, so the range is $[0, \infty)$. For $f(x, y) = \sin(x + y)$ the range is $[-1, 1]$, because $x + y$ already sweeps all of $\mathbb{R}$.
Check Your Understanding. Find the domain and range of $f(x, y) = \sqrt{9 - x^2 - y^2}$.
Answer
We need $9 - x^2 - y^2 \ge 0$, i.e. $x^2 + y^2 \le 9$: the closed disk of radius $3$ centered at the origin. The output ranges from $0$ (on the boundary circle) up to $\sqrt 9 = 3$ (at the center), so the range is $[0, 3]$. The graph is the upper hemisphere of radius $3$.
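A numerical sanity check of this answer is sketched below: sample a grid, keep the points inside the disk, and look at the smallest and largest outputs. The grid resolution is an arbitrary choice, and sampling can only suggest a range, not prove it.

```python
import numpy as np

# Sample the square [-3, 3] x [-3, 3] and keep only points in the domain.
xs = np.linspace(-3, 3, 601)
X, Y = np.meshgrid(xs, xs)
S = X**2 + Y**2
inside = S <= 9                      # the closed disk of radius 3

values = np.sqrt(9 - S[inside])      # f evaluated on the sampled domain
print(values.min(), values.max())    # smallest and largest sampled outputs
```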
29.3 Graphs of $f(x, y)$ Are Surfaces
The graph of $f$ is the set
$$\{(x, y, f(x, y)) : (x, y) \in \operatorname{dom} f\} \subset \mathbb{R}^3.$$
It is a surface floating above (or below) the $xy$-plane. A handful of basic surfaces appears again and again, and you should know each one by its formula, its name, and its shape:
| Formula | Name | Shape |
|---|---|---|
| $z = ax + by + c$ | plane | flat, tilted |
| $z = x^2 + y^2$ | (circular) paraboloid | a bowl opening up |
| $z = -x^2 - y^2$ | paraboloid | a dome opening down |
| $z = x^2 - y^2$ | hyperbolic paraboloid | a saddle |
| $z = \sqrt{x^2 + y^2}$ | cone | an ice-cream cone, vertex at origin |
| $z = \sqrt{1 - x^2 - y^2}$ | hemisphere | the top half of the unit sphere |
The saddle: the surface with no one-variable analog
Of these, the saddle $z = x^2 - y^2$ deserves special attention, because it exhibits behavior that cannot occur in one variable. Stand at the origin. Walk in the $x$-direction and the height $x^2$ increases — you are climbing. Walk in the $y$-direction and the height $-y^2$ decreases — you are descending. The same point is simultaneously a low point along one path and a high point along another. This is a saddle point, and it is the reason multivariable optimization (Chapter 31) is genuinely harder than the one-variable version: "the derivative is zero" no longer means "max or min."
Geometric Intuition. Read $z = f(x, y)$ as a landscape. The plane is a map; $f(x, y)$ is the altitude above the point $(x, y)$. Then peaks are local maxima, basins are local minima, and a mountain pass — high if you cross it, low if you walk the ridge — is a saddle. Every fact about $f$ in this chapter has a hiking interpretation, and that picture is worth more than any formula.
29.4 Level Curves and Contour Maps
A surface in 3D is hard to draw on a flat page. Cartographers solved this problem centuries ago, and their solution is pure multivariable calculus.
A level curve (or contour) of $f$ is the set of points where $f$ takes one fixed value $c$:
$$\{(x, y) : f(x, y) = c\}.$$
In the landscape picture these are points of equal altitude — exactly the contour lines on a topographic map. A contour map draws several level curves $f = c_1, c_2, c_3, \dots$ at evenly spaced values of $c$, projecting the 3D surface down to an informative 2D plot.
Example: the paraboloid
For $f(x, y) = x^2 + y^2$, the level curve $f = c$ is $x^2 + y^2 = c$. For $c > 0$ this is a circle of radius $\sqrt c$; for $c = 0$ it collapses to the single point at the origin; for $c < 0$ it is empty. The contour map is a family of concentric circles. Crucially, equally spaced heights $c = 1, 2, 3, 4$ give radii $1, \sqrt2, \sqrt3, 2$ — the circles crowd together as you move outward, telling you the bowl gets steeper the farther out you go.
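The crowding of the circles can be checked with a few lines of arithmetic. The sketch below computes the radius $\sqrt{c}$ for each of the four heights and the gap between consecutive circles.

```python
import math

levels = [1, 2, 3, 4]                     # equally spaced heights c
radii = [math.sqrt(c) for c in levels]    # x^2 + y^2 = c is a circle of radius sqrt(c)
gaps = [r2 - r1 for r1, r2 in zip(radii, radii[1:])]

for c, r in zip(levels, radii):
    print(f"c = {c}: radius = {r:.4f}")
print("gaps between consecutive circles:", [round(g, 4) for g in gaps])
```

The gaps shrink from one ring to the next, which is the numerical form of the statement that the bowl steepens as you move outward.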
Example: the saddle
For $f(x, y) = x^2 - y^2$, the level curve $x^2 - y^2 = c$ is a hyperbola:
- $c > 0$: hyperbolas opening left and right (crossing the $x$-axis);
- $c < 0$: hyperbolas opening up and down (crossing the $y$-axis);
- $c = 0$: the degenerate case $x^2 = y^2$, i.e. the two lines $y = \pm x$ crossing at the origin.
The contour map shows a characteristic "X" at the origin with hyperbolas fanning into the four regions — the signature of a saddle.
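The degenerate $c = 0$ case and the axis crossings for $c > 0$ can be confirmed symbolically. This is a small sketch using sympy; the symbol `c` is declared positive only to illustrate the first bullet.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x**2 - y**2

# The c = 0 level curve: factoring exposes the two lines through the origin.
print(sp.factor(f))
zero_level = sp.solve(sp.Eq(f, 0), y)
print(zero_level)                       # the two branches of x^2 = y^2

# For c > 0 the hyperbola crosses the x-axis (set y = 0 and solve for x).
c = sp.symbols("c", positive=True)
x_intercepts = sp.solve(sp.Eq(f.subs(y, 0), c), x)
print(x_intercepts)
```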
Reading a contour map
The whole power of contour maps is that you can read the surface's steepness and shape without ever lifting it into 3D:
- Closely spaced contours mean the altitude changes fast over a short distance — steep terrain.
- Widely spaced contours mean gentle terrain.
- Nested closed loops mark a peak or a pit (a local max or min) at their center.
- An "X" or cross pattern marks a saddle.
This is exactly how a hiker reads a trail map, and it is no coincidence: the spacing of contours encodes the magnitude of the steepest slope — a quantity we will name the gradient $\nabla f$ in Chapter 30. For now, notice that the contour map already contains the derivative information; Chapter 30 just extracts it.
Common Pitfall. Many students see contour labels that rise in equal steps and conclude that the surface rises at an even rate. But steepness is carried by the spacing between contours on the page, not by the labels: where the lines bunch up, the surface is steep even though the labels increase by the same step. On the paraboloid the labels rise uniformly ($1,2,3,4$) yet the rings crowd together as you move outward; the surface is steepening, and only the spacing reveals it.
Computational Note. You almost never plot contours by hand. In matplotlib, plt.contour(X, Y, Z, levels=...) draws level curves from a grid of $f$ values (plt.clabel adds the numeric labels), plt.contourf(...) fills the bands with color, and plt.pcolormesh(...) renders a heatmap. Section 29.12 puts plt.contour to work.
29.5 Level Surfaces
For $f(x, y, z): \mathbb{R}^3 \to \mathbb{R}$ the graph would need a fourth axis, so we cannot draw it. The contour idea rescues us. A level surface is the set where $f$ equals a constant:
$$\{(x, y, z) : f(x, y, z) = c\}.$$
This is a genuine 2D surface sitting in 3D space — the three-dimensional analog of a level curve.
- $f(x, y, z) = x^2 + y^2 + z^2$: level surfaces are spheres of radius $\sqrt c$ (for $c > 0$).
- $f(x, y, z) = z - x^2 - y^2$: level surfaces $z = x^2 + y^2 + c$ are paraboloids stacked at different heights.
- $f(x, y, z) = x^2 + y^2 - z^2$: level surfaces are hyperboloids — one-sheeted for $c > 0$, the cone for $c = 0$, two-sheeted for $c < 0$.
Real-World Application — Isosurfaces in physics and medicine. A temperature field $T(x, y, z)$ in a room has level surfaces called isotherms: every point on one isotherm is at the same temperature. The same idea names equipotential surfaces in electrostatics, isobars (level surfaces of pressure) in meteorology, and — in a CT or MRI scan — the surfaces of constant tissue density that medical software renders to show the boundary of an organ or tumor. "Level surface" is the mathematician's word for what a radiologist sees on screen.
29.6 Limits in Several Variables — and Why They Are Subtler
Limits power everything in calculus, and they survive the move to several variables — but with a genuinely new wrinkle that you must respect.
The definition
We say
$$\lim_{(x, y) \to (a, b)} f(x, y) = L$$
if for every $\varepsilon > 0$ there is a $\delta > 0$ such that
$$0 < \sqrt{(x - a)^2 + (y - b)^2} < \delta \;\implies\; |f(x, y) - L| < \varepsilon.$$
This is the one-variable $\varepsilon$–$\delta$ definition with one change: the distance $|x - a|$ on the input side becomes the planar distance $\sqrt{(x-a)^2 + (y-b)^2}$ from $(x,y)$ to $(a,b)$. The $\delta$-condition now describes a small disk around $(a, b)$, not a small interval.
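To see the definition in action, here is a minimal worked check that $\lim_{(x,y)\to(0,0)} (x^2 + y^2) = 0$. The choice $\delta = \sqrt{\varepsilon}$ is one convenient option, not the only one that works.

$$0 < \sqrt{x^2 + y^2} < \delta = \sqrt{\varepsilon} \;\implies\; \left|(x^2 + y^2) - 0\right| = \left(\sqrt{x^2 + y^2}\right)^2 < \delta^2 = \varepsilon.$$

Because the output is exactly the square of the planar distance to the origin, shrinking the disk to radius $\sqrt{\varepsilon}$ forces the output to within $\varepsilon$ of $0$.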
The new subtlety: infinitely many paths
In one variable, $x$ can approach $a$ from only two sides — left and right. The whole theory of one-sided limits rests on there being exactly two directions of approach. In two variables, $(x, y)$ can approach $(a, b)$ along infinitely many paths: straight lines from any angle, parabolas, spirals, anything that ends at $(a, b)$.
For the limit to exist, $f$ must approach the same value $L$ along every path. This is a far stronger demand. It is the central new idea of multivariable limits, and it powers both the standard test and the standard trap.
The Key Insight. A two-variable limit exists only if $f$ approaches one common value along every path into the point. Agreement along a few paths proves nothing; disagreement along just two paths disproves existence.
The two-path test: proving a limit does not exist
The cleanest way to show a limit fails is to find two paths that give different answers. The classic specimen is
$$f(x, y) = \frac{2xy}{x^2 + y^2}, \qquad (x, y) \to (0, 0).$$
Approach the origin along several lines:
- Along the $x$-axis ($y = 0$): $f(x, 0) = \dfrac{0}{x^2} = 0$. The path-limit is $0$.
- Along the $y$-axis ($x = 0$): $f(0, y) = \dfrac{0}{y^2} = 0$. The path-limit is $0$.
- Along the diagonal $y = x$: $f(x, x) = \dfrac{2x^2}{2x^2} = 1$. The path-limit is $1$.
The $x$-axis and the diagonal already disagree ($0 \ne 1$), so the limit does not exist. We can see why it is hopeless by approaching along the whole family of lines $y = mx$:
$$f(x, mx) = \frac{2x(mx)}{x^2 + m^2 x^2} = \frac{2m x^2}{(1 + m^2) x^2} = \frac{2m}{1 + m^2}.$$
The $x^2$ cancels, leaving a value that depends only on the slope $m$ of the line you came in on: $0$ along the axes ($m = 0$, or the vertical line $x = 0$), $1$ along $y = x$ ($m = 1$), $-1$ along $y = -x$ ($m = -1$), and every value in between. The function is constant on each line through the origin, but the constant changes from line to line, so there is no hope of a single limit.
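The same computation can be handed to sympy as a check. The sketch below restricts $f$ to the line $y = mx$ and then evaluates the two disagreeing path-limits.

```python
import sympy as sp

x, y, m = sp.symbols("x y m", real=True)
f = 2*x*y / (x**2 + y**2)

# Restrict f to the line y = m*x; the x^2 cancels and only the slope remains.
along_line = sp.simplify(f.subs(y, m*x))
print(along_line)

# Path-limits along the x-axis (y = 0) and the diagonal (y = x).
limit_x_axis = sp.limit(f.subs(y, 0), x, 0)
limit_diagonal = sp.limit(f.subs(y, x), x, 0)
print(limit_x_axis, limit_diagonal)
```

Two different path-limits are all the two-path test needs.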
Common Pitfall. Many students check the limit along the $x$-axis and the $y$-axis, find they agree, and declare the limit exists. They are wrong, and this exact mistake is the most common multivariable-limit error. Agreement on the two axes is necessary but nowhere near sufficient. Always test the diagonal $y = x$ and the family $y = mx$; for stubborn cases, try parabolic paths $y = mx^2$ as well.
Proving a limit does exist: the polar trick
The two-path test can only disprove a limit: finding two matching paths never proves that the infinitely many others match. To prove existence you must control $f$ on every path at once. The cleanest tool is to switch to polar coordinates centered at the limit point: write $x = r\cos\theta$, $y = r\sin\theta$, so that $r \to 0$ captures approach from every direction simultaneously. If $|f - L|$ is bounded by a quantity that depends on $r$ alone and tends to $0$ (typically because the $\theta$-dependence is bounded and multiplied by a positive power of $r$), the limit is $L$. Consider
$$f(x, y) = \frac{x^3 + y^3}{x^2 + y^2}.$$
In polar coordinates,
$$f = \frac{r^3 \cos^3\theta + r^3 \sin^3\theta}{r^2} = r\,(\cos^3\theta + \sin^3\theta).$$
The bracket stays bounded — $|\cos^3\theta + \sin^3\theta| \le 2$ for every $\theta$ — and the leading factor $r \to 0$. So $|f| \le 2r \to 0$ no matter the direction, and the limit is $0$. Here the polar substitution converts "infinitely many paths" into a single variable $r$, which is exactly why it works where the two-path test cannot.
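The polar substitution can be reproduced symbolically. This is a sketch assuming sympy's `simplify` collapses $\cos^2\theta + \sin^2\theta$ in the denominator, which it normally does; the printed form of the angular factor may differ from the hand-written one while being equal to it.

```python
import sympy as sp

r = sp.symbols("r", positive=True)
theta = sp.symbols("theta", real=True)
x, y = sp.symbols("x y", real=True)

f = (x**3 + y**3) / (x**2 + y**2)

# Substitute x = r cos(theta), y = r sin(theta) and simplify.
f_polar = sp.simplify(f.subs({x: r*sp.cos(theta), y: r*sp.sin(theta)}))
print(f_polar)                  # r times a bounded function of theta

# Dividing out r leaves only the angular factor.
angular = sp.simplify(f_polar / r)
print(angular)
```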
Math Major Sidebar — Polar is a tool, not a theorem. Switching to polar makes the radial approach transparent, but a limit can still fail along a curved path even when every straight ray gives the same path-limit. The function $f(x,y) = \dfrac{x^2 y}{x^4 + y^2}$ tends to $0$ along every line $y = mx$, yet along the parabola $y = x^2$ it gives $\dfrac{x^4}{2x^4} = \tfrac12$. So "the same limit along every ray" does not imply that the limit exists. You still need a genuine bound like $|f - L| \le g(r) \to 0$, with $g$ independent of $\theta$, to conclude. The squeeze theorem (Chapter 3) is what is really doing the work; polar coordinates just hand you a convenient bounding function.
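The sidebar's example is easy to probe with sympy. The sketch below checks a few sample slopes (chosen arbitrarily) and then the parabolic path.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x**2 * y / (x**4 + y**2)

# Along several sample lines y = m*x the path-limit at the origin is 0.
slopes = [1, -2, sp.Rational(1, 3)]
line_limits = [sp.limit(f.subs(y, m*x), x, 0) for m in slopes]
print(line_limits)

# Along the parabola y = x^2 the function is constant, so its path-limit differs.
along_parabola = sp.simplify(f.subs(y, x**2))
print(along_parabola)
```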
29.7 Continuity
With limits in hand, continuity reads exactly as in Chapter 4. The function $f(x, y)$ is continuous at $(a, b)$ if
$$\lim_{(x, y) \to (a, b)} f(x, y) = f(a, b)$$
— the limit exists, the function is defined there, and the two agree.
The reassuring news is that the algebra of continuity carries over wholesale. Sums, products, quotients (where the denominator is nonzero), and compositions of continuous functions are continuous. Consequently every polynomial in $x$ and $y$ is continuous on all of $\mathbb{R}^2$, every rational function is continuous wherever its denominator is nonzero, and $\sin$, $\cos$, $e^{(\cdot)}$, and $\ln$ remain continuous on their domains when fed continuous inputs. The pathological path-dependent functions of Section 29.6 are deliberately engineered; the functions that arise from modeling real systems are continuous almost everywhere you look.
Check Your Understanding. Is $f(x, y) = \dfrac{2xy}{x^2 + y^2}$ (with $f(0,0)$ defined to be $0$) continuous at the origin?
Answer
No. We showed in Section 29.6 that $\lim_{(x,y)\to(0,0)} f$ does not exist (it is $0$ along the axes but $1$ along $y = x$). Since the limit fails to exist, $f$ cannot be continuous at $(0,0)$ no matter how we define $f(0,0)$ — there is no value that would make the limit equal $f(0,0)$. Away from the origin $f$ is a quotient of polynomials with nonzero denominator, hence continuous there.
29.8 Partial Derivatives
Now the heart of the chapter. In one variable the derivative was a single limit of a difference quotient. With two inputs, which input do we wiggle? The answer is: one at a time.
The partial derivative of $f$ with respect to $x$ is what you get by holding $y$ fixed and differentiating in $x$ alone:
$$\frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}.$$
Note that only the first slot changes by $h$; $y$ is frozen as a spectator constant. Symmetrically, the partial with respect to $y$ freezes $x$:
$$\frac{\partial f}{\partial y} = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h}.$$
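The two difference quotients translate directly into code. The sketch below uses a small fixed step `h` in place of the limit, so it returns approximations; the step size and the test function $x^2 y$ are illustrative choices.

```python
def partial_x(f, x, y, h=1e-6):
    """Forward-difference estimate of df/dx at (x, y): only x is nudged."""
    return (f(x + h, y) - f(x, y)) / h

def partial_y(f, x, y, h=1e-6):
    """Forward-difference estimate of df/dy at (x, y): only y is nudged."""
    return (f(x, y + h) - f(x, y)) / h

def f(x, y):
    # Illustrative function: with y frozen it is a multiple of x^2,
    # and with x frozen it is a multiple of y.
    return x**2 * y

print(partial_x(f, 1.0, 2.0))   # approximates 2*x*y = 4 at (1, 2)
print(partial_y(f, 1.0, 2.0))   # approximates x^2 = 1 at (1, 2)
```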
Notation
The curly symbol $\partial$ ("partial," or "del" / "round d") signals that other variables are being held fixed. Several notations coexist, all meaning the same thing:
$$\frac{\partial f}{\partial x} = f_x = D_x f = \partial_x f.$$
We use the Leibniz form $\frac{\partial f}{\partial x}$ when emphasizing what is held constant and the compact subscript form $f_x$ when computing.
The procedure is the easiest part
Here is the good news that surprises every student: computing a partial derivative requires no new differentiation rules at all. To find $f_x$, pretend every other variable is a constant and apply the ordinary single-variable rules from Chapter 7.
Worked example. Let $f(x, y) = x^2 y + 3xy^2$.
- For $f_x$, treat $y$ as a constant. Then $x^2 y$ differentiates to $2xy$ (the constant $y$ rides along), and $3xy^2$ differentiates to $3y^2$ (the constant $3y^2$ times the derivative of $x$). So $$f_x = 2xy + 3y^2.$$
- For $f_y$, treat $x$ as a constant. Then $x^2 y$ differentiates to $x^2$, and $3xy^2$ differentiates to $6xy$. So $$f_y = x^2 + 6xy.$$
That is the entire technique: differentiate one variable, freeze the rest.
A second example with a product and a chain rule. Let $g(x, y) = e^{xy}\sin x$. For $g_x$, hold $y$ constant; this is a product in $x$, so $$g_x = \frac{\partial}{\partial x}\!\big(e^{xy}\big)\sin x + e^{xy}\cos x = y\,e^{xy}\sin x + e^{xy}\cos x,$$ where the chain rule supplies the factor $y$ (the derivative of the exponent $xy$ in $x$). For $g_y$, hold $x$ constant; now $\sin x$ is itself a constant and only the exponential moves: $$g_y = x\,e^{xy}\sin x.$$ Notice how differently the two come out — that asymmetry is the whole point of "partial."
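Both partials of $g$ can be checked against sympy. The sketch below differentiates symbolically and compares with the hand-computed answers.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
g = sp.exp(x*y) * sp.sin(x)

g_x = sp.diff(g, x)     # product rule plus chain rule in x
g_y = sp.diff(g, y)     # sin(x) is a constant factor when y varies

print("g_x =", g_x)
print("g_y =", g_y)

# Compare with the hand computations.
hand_gx = y*sp.exp(x*y)*sp.sin(x) + sp.exp(x*y)*sp.cos(x)
hand_gy = x*sp.exp(x*y)*sp.sin(x)
print(sp.simplify(g_x - hand_gx) == 0, sp.simplify(g_y - hand_gy) == 0)
```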
Geometric interpretation: slicing the surface
What does $f_x$ mean on the surface? Fix $y = b$. The equation $z = f(x, b)$ is now a function of $x$ alone — it is the curve you get by slicing the surface with the vertical plane $y = b$, a single trail cut across the landscape running in the $x$-direction. Then $f_x(a, b)$ is simply the ordinary slope of that trail at $x = a$. Likewise $f_y(a, b)$ is the slope of the trail cut by the plane $x = a$, running in the $y$-direction.
Geometric Intuition. Stand on the hillside at the point $(a, b)$. Face due east (the $+x$ direction) and the steepness under your feet is $f_x(a, b)$. Turn to face due north (the $+y$ direction) and the steepness is $f_y(a, b)$. These are just two of the infinitely many slopes available — you could face any compass direction and feel a different steepness. Partial derivatives are the east–west and north–south slopes; Chapter 30's directional derivative handles every other heading, and the gradient packages them all at once.
Real-World Application — Marginal analysis in economics. When output depends on labor and capital, $Q(L, K)$, the partial $\partial Q / \partial L$ is the marginal product of labor: the extra output from one more unit of labor with capital held fixed. "Holding the other variable fixed" is not a mathematical convenience here; it is the precise economic question a manager asks ("if I hire one more worker but buy no new machines, how much more do I produce?"). The "marginal" quantities of economics, such as marginal cost, marginal utility, and marginal product, are partial derivatives, and the marginal rate of substitution is a ratio of two of them. We return to this in Section 29.13.
29.9 Higher-Order Partial Derivatives and Clairaut's Theorem
Partial derivatives are themselves functions of $x$ and $y$, so we can differentiate them again. With two first partials, each differentiable in two ways, there are four second-order partial derivatives:
$$f_{xx} = \frac{\partial^2 f}{\partial x^2}, \qquad f_{yy} = \frac{\partial^2 f}{\partial y^2},$$ $$f_{xy} = \frac{\partial}{\partial y}\!\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial y\,\partial x}, \qquad f_{yx} = \frac{\partial}{\partial x}\!\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial x\,\partial y}.$$
The last two — where the variables differ — are the mixed partials.
Warning — read the subscript order carefully. The subscript notation and the Leibniz notation run in opposite orders. In $f_{xy}$ you differentiate by $x$ first, then $y$, reading left to right. In $\frac{\partial^2 f}{\partial y\,\partial x}$ the rightmost $\partial x$ acts first, reading right to left (inside-out, like nested function application). Both denote the same object: differentiate in $x$, then in $y$. Keeping the two conventions straight prevents a surprising amount of grief.
Clairaut's theorem
Compute both mixed partials of $f(x, y) = x^2 y + 3xy^2$ from Section 29.8:
$$f_x = 2xy + 3y^2 \;\Rightarrow\; f_{xy} = 2x + 6y,$$ $$f_y = x^2 + 6xy \;\Rightarrow\; f_{yx} = 2x + 6y.$$
They are equal. This is not luck. It is the content of a beautiful theorem.
Clairaut's Theorem (equality of mixed partials). If $f_{xy}$ and $f_{yx}$ are both continuous on an open region containing $(a, b)$, then $$f_{xy}(a, b) = f_{yx}(a, b).$$
In words: the order of differentiation does not matter; under continuity, the mixed-partial operators commute. This is a genuinely deep fact. There is no obvious reason that "wiggle in $x$ then in $y$" should agree with "wiggle in $y$ then in $x$," yet for every function with continuous second partials it does. The theorem will silently underwrite later results: it is exactly why gradient fields are curl-free (Chapter 34) and why the Hessian matrix in the second-derivative test is symmetric (Chapter 31).
Math Major Sidebar — When Clairaut fails. Continuity of the mixed partials is not decorative; drop it and equality can break. The standard counterexample is $$f(x, y) = \frac{xy(x^2 - y^2)}{x^2 + y^2}, \quad f(0,0) = 0.$$ Computing carefully off and on the axes gives $f_x(0, y) = -y$ and $f_y(x, 0) = x$, so $$f_{xy}(0,0) = \frac{\partial}{\partial y}f_x(0,y)\Big|_{y=0} = -1, \qquad f_{yx}(0,0) = \frac{\partial}{\partial x}f_y(x,0)\Big|_{x=0} = +1.$$ The mixed partials exist everywhere but are discontinuous at the origin, and there $f_{xy} = -1 \ne 1 = f_{yx}$. (You can confirm all four values with sympy.) Such functions are constructed, not encountered — but they show the hypothesis earns its keep.
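One way to carry out the sympy confirmation is sketched below. Since `sp.diff` only sees the formula valid away from the origin, the values at $(0,0)$ are taken as limits of difference quotients; the step symbol `h` is introduced just for this example.

```python
import sympy as sp

x, y, h = sp.symbols("x y h", real=True)
f = x*y*(x**2 - y**2) / (x**2 + y**2)   # valid away from the origin; f(0,0) = 0

# First partials restricted to the axes.
fx_on_y_axis = sp.simplify(sp.diff(f, x).subs(x, 0))   # f_x(0, y)
fy_on_x_axis = sp.simplify(sp.diff(f, y).subs(y, 0))   # f_y(x, 0)
print(fx_on_y_axis, fy_on_x_axis)

# f vanishes on both axes, so f_x(0,0) = f_y(0,0) = 0 by the limit definition.
# Mixed partials at the origin as limits of difference quotients:
fxy_origin = sp.limit((fx_on_y_axis.subs(y, h) - 0) / h, h, 0)
fyx_origin = sp.limit((fy_on_x_axis.subs(x, h) - 0) / h, h, 0)
print(fxy_origin, fyx_origin)
```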
29.10 The Tangent Plane and Linearization
We now repay a debt to Chapter 11. There, the derivative $f'(a)$ gave the tangent line, and the linearization $L(x) = f(a) + f'(a)(x - a)$ was the best straight-line approximation to a curve near $a$. The multivariable story is the exact analog, with the line replaced by a plane.
The tangent plane
At a point $(a, b)$ on the surface $z = f(x, y)$, the two partials give two slopes: $f_x(a, b)$ in the $x$-direction and $f_y(a, b)$ in the $y$-direction. The unique plane through the surface point $(a, b, f(a, b))$ having those two slopes is the tangent plane:
$$z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b).$$
Compare it term for term with the tangent line $y = f(a) + f'(a)(x - a)$ from Chapter 11: the constant $f(a)$ becomes $f(a, b)$, and the single slope-times-displacement term $f'(a)(x-a)$ splits into one term per input direction. The tangent plane is the flat sheet that hugs the surface most closely near $(a, b)$ — the surface's best linear approximation, just as the tangent line was the curve's.
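The tangent-plane formula is mechanical enough to wrap in a small helper. The function name `tangent_plane` and the sample point $(1, 2)$ are illustrative choices for this sketch.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)

def tangent_plane(f, a, b):
    """Return f(a,b) + f_x(a,b)(x - a) + f_y(a,b)(y - b) as an expression."""
    point = {x: a, y: b}
    fx = sp.diff(f, x).subs(point)
    fy = sp.diff(f, y).subs(point)
    return f.subs(point) + fx*(x - a) + fy*(y - b)

# Illustrative surface: the paraboloid z = x^2 + y^2 at the point (1, 2).
plane = sp.expand(tangent_plane(x**2 + y**2, 1, 2))
print(plane)
```

Here $f(1,2) = 5$, $f_x = 2$, and $f_y = 4$, so the plane is $z = 5 + 2(x - 1) + 4(y - 2)$, which the helper expands for you.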
Linearization and the total differential
Calling the right-hand side $L(x, y)$, the linearization of $f$ at $(a, b)$ is
$$L(x, y) = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b),$$
and for $(x, y)$ near $(a, b)$ we have $f(x, y) \approx L(x, y)$. Writing $dx = \Delta x = x - a$, $dy = \Delta y = y - b$, and $df = L(x, y) - f(a, b)$ for the resulting first-order change in $f$, the same statement in differential form is the total differential:
$$df = f_x\,dx + f_y\,dy.$$
This says the small change in $f$ is, to first order, the change due to $x$ (rate $f_x$ times step $dx$) plus the change due to $y$ (rate $f_y$ times step $dy$). It is the engine behind error propagation: if two measured quantities carry small errors $dx$ and $dy$, the resulting error in a computed quantity $f$ is approximately $f_x\,dx + f_y\,dy$.
Worked example. Let $f(x, y) = x^2 + y^2$ and estimate $f(1.1, 0.9)$ from the base point $(1, 1)$. There $f(1,1) = 2$, and the partials $f_x = 2x$ and $f_y = 2y$ give $f_x(1,1) = 2$ and $f_y(1,1) = 2$. With $\Delta x = 0.1$, $\Delta y = -0.1$,
$$L(1.1, 0.9) = 2 + 2(0.1) + 2(-0.1) = 2.$$
The exact value is $1.1^2 + 0.9^2 = 1.21 + 0.81 = 2.02$, so the linear approximation $2$ is off by only $0.02$ — and that error is precisely the second-order piece $(\Delta x)^2 + (\Delta y)^2 = 0.02$ that linearization discards. The tangent plane nails the value to first order, exactly as Chapter 11 promised in one dimension. Approximation is the soul of calculus, and the tangent plane is its multivariable face.
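The arithmetic of this example can be replayed in a few lines. The sketch below builds the linearization at $(1,1)$ by hand and compares it with the exact value.

```python
def f(x, y):
    return x**2 + y**2

a, b = 1.0, 1.0
fx, fy = 2*a, 2*b               # f_x = 2x and f_y = 2y evaluated at (1, 1)

def L(x, y):
    """Linearization of f at the base point (a, b)."""
    return f(a, b) + fx*(x - a) + fy*(y - b)

approx = L(1.1, 0.9)
exact = f(1.1, 0.9)
second_order = (1.1 - a)**2 + (0.9 - b)**2   # the piece linearization drops

print(approx, exact, exact - approx, second_order)
```

Up to floating-point rounding, the gap between `exact` and `approx` matches `second_order`.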
Common Pitfall. A linearization is only trustworthy near its base point. Students sometimes compute $L$ at $(a, b)$ and then evaluate it far away — say, using a base point $(1,1)$ to estimate $f(5, 5)$ — and trust the result. Don't. The approximation error grows roughly like the square of the distance from $(a, b)$, so a linearization good to three decimals at distance $0.1$ may be useless at distance $1$. Pick a base point close to where you actually need the value.
29.11 Functions of Three Variables
Everything generalizes cleanly to $f(x, y, z): \mathbb{R}^3 \to \mathbb{R}$ and beyond:
- The domain is a region of $\mathbb{R}^3$; the range is still a set of real numbers.
- Level sets are the level surfaces of Section 29.5.
- There are now three first partials, $f_x$, $f_y$, $f_z$, each found by freezing the other two variables.
- There are nine second-order partials, but Clairaut pairs them up: $f_{xy} = f_{yx}$, $f_{xz} = f_{zx}$, $f_{yz} = f_{zy}$ (under continuity), leaving six distinct values.
The picture to hold is a temperature field $T(x, y, z)$ filling a room. The partial $T_x$ is how fast the temperature rises as you step east at a point; the level surfaces are the isotherms; and the tangent-plane idea becomes a tangent hyperplane — still the best linear approximation, now one dimension up.
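The count of nine second partials pairing down to six can be illustrated with sympy. The field `T` below is a hypothetical smooth function chosen only for the demonstration.

```python
import sympy as sp

x, y, z = sp.symbols("x y z", real=True)
T = x**2 * y + y * sp.sin(z) + sp.exp(x*z)   # hypothetical smooth field

variables = (x, y, z)
# All nine second-order partials, keyed by the order of differentiation.
second = {(u, v): sp.diff(T, u, v) for u in variables for v in variables}
print(len(second), "second-order partials computed")

# Clairaut: swapping the order changes nothing for this smooth T.
for u, v in [(x, y), (x, z), (y, z)]:
    same = sp.simplify(second[(u, v)] - second[(v, u)]) == 0
    print(f"T_{u}{v} == T_{v}{u}:", same)
```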
29.12 Computation: Surfaces, Contours, and Partials in Python
Our standard three-tier pattern — state it, do it by hand, check it by machine — extends naturally. First, visualize: a single function deserves both its 3D surface and its 2D contour map side by side, because each view reveals what the other hides.
# Plot z = x^2 - y^2 (a saddle) as a 3D surface AND its contour map.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D # noqa: F401 (registers 3d projection)
x = np.linspace(-3, 3, 60)
y = np.linspace(-3, 3, 60)
X, Y = np.meshgrid(x, y)
Z = X**2 - Y**2 # the saddle surface
fig = plt.figure(figsize=(12, 5))
ax1 = fig.add_subplot(121, projection="3d")
ax1.plot_surface(X, Y, Z, cmap="viridis")
ax1.set_title("Surface z = x^2 - y^2") # a saddle: up in x, down in y
ax2 = fig.add_subplot(122)
cs = ax2.contour(X, Y, Z, levels=12) # level curves x^2 - y^2 = c
ax2.clabel(cs, inline=True, fontsize=8)
ax2.set_aspect("equal")
ax2.set_title("Contour map (hyperbolas + 'X' at origin)")
plt.tight_layout()
plt.show()
# Left: the saddle surface. Right: hyperbolic level curves with the
# tell-tale 'X' (the c = 0 lines y = +/-x) crossing at the origin.
The contour map on the right shows exactly the hyperbolas of Section 29.4, with the degenerate cross at the center — the 2D fingerprint of a saddle. Swap in Z = X**2 + Y**2 and the right panel becomes the concentric circles of the paraboloid instead.
Second, verify the calculus. Partial derivatives and Clairaut's theorem are perfect jobs for sympy, which differentiates symbolically and will confirm our hand computations from Sections 29.8–29.9 to the letter.
# Verify the partials of f = x^2*y + 3*x*y^2 and Clairaut's theorem.
import sympy as sp
x, y = sp.symbols("x y")
f = x**2 * y + 3 * x * y**2
print("f_x =", sp.diff(f, x)) # f_x = 2*x*y + 3*y**2
print("f_y =", sp.diff(f, y)) # f_y = x**2 + 6*x*y
print("f_xy =", sp.diff(f, x, y)) # f_xy = 2*x + 6*y (x first, then y)
print("f_yx =", sp.diff(f, y, x)) # f_yx = 2*x + 6*y (equal -> Clairaut)
print("equal?", sp.simplify(sp.diff(f, x, y) - sp.diff(f, y, x)) == 0) # True
Computational Note. In sp.diff(f, x, y) the variables are consumed left to right: differentiate by x first, then by y. That matches the subscript order $f_{xy}$ but is the reverse of the Leibniz denominator order $\partial^2 f / \partial y\,\partial x$ (Section 29.9). Because Clairaut holds for every polynomial, sp.diff(f, x, y) and sp.diff(f, y, x) return identical expressions here. The habit of tracking order still matters the moment you meet a non-smooth function like the Sidebar's counterexample: there the two mixed partials differ at the origin, and you see it only by evaluating the difference quotients at $(0,0)$ with sp.limit, because sp.diff applied to the formula describes the function away from the origin.
29.13 Applications Across Fields
Multivariable functions are where calculus meets the working sciences, because almost nothing in the real world depends on a single input. Here are three readings of the same machinery.
Economics: the Cobb–Douglas production function
A firm's output as a function of labor $L$ and capital $K$ is often modeled as
$$Q(L, K) = A\,L^{\alpha} K^{\beta}, \qquad A > 0,\; \alpha, \beta > 0.$$
The partials are the marginal products:
$$Q_L = \alpha A\,L^{\alpha - 1} K^{\beta} = \frac{\alpha Q}{L}, \qquad Q_K = \beta A\,L^{\alpha} K^{\beta - 1} = \frac{\beta Q}{K}.$$
So $Q_L$ is the rate at which output grows per additional unit of labor at fixed capital (approximately the extra output from one more worker), and $Q_K$ is the analog for capital. The level curves $Q(L, K) = c$ are the isoquants: combinations of labor and capital that produce the same output. A firm minimizing cost slides along an isoquant until it touches the cheapest budget line. With $A = 1$, $\alpha = 0.3$, $\beta = 0.7$, at $(L, K) = (100, 100)$ we have $100^{0.3} \cdot 100^{0.7} = 100^{1} = 100$, so $Q = 100$ and $Q_L = 0.3 \cdot 100 / 100 = 0.3$: a marginal product of labor of $0.3$ units of output per worker.
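The marginal-product computation can be checked numerically. The sketch below uses the same parameter values as the text; the function names and the finite-difference step are illustrative choices, not part of the model.

```python
def cobb_douglas(L, K, A=1.0, alpha=0.3, beta=0.7):
    """Output Q(L, K) = A * L**alpha * K**beta."""
    return A * L**alpha * K**beta


def marginal_product_of_labor(L, K, A=1.0, alpha=0.3, beta=0.7):
    """Q_L = alpha * Q / L, the closed form derived in the text."""
    return alpha * cobb_douglas(L, K, A, alpha, beta) / L


L0, K0 = 100.0, 100.0
step = 1e-4  # illustrative finite-difference step

exact = marginal_product_of_labor(L0, K0)
# Central difference in L with K held fixed: a numerical partial derivative.
approx = (cobb_douglas(L0 + step, K0) - cobb_douglas(L0 - step, K0)) / (2 * step)
# Actual gain in output from one whole extra worker.
one_more_worker = cobb_douglas(L0 + 1, K0) - cobb_douglas(L0, K0)

print("Q_L (formula)           =", exact)
print("Q_L (central difference) =", approx)
print("Q(101, 100) - Q(100, 100) =", one_more_worker)
```

The gain from one whole extra worker comes out slightly below $0.3$, because $Q_L$ itself shrinks as $L$ grows when $\alpha < 1$ (diminishing returns). The partial derivative is the instantaneous rate, not the exact increment.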
Thermodynamics: the ideal gas law
For a fixed amount of gas, pressure depends on volume and temperature:
$$P(V, T) = \frac{nRT}{V}.$$
Its partials encode the physics:
$$\frac{\partial P}{\partial V} = -\frac{nRT}{V^2} < 0 \quad(\text{compress the gas and pressure rises}),$$
$$\frac{\partial P}{\partial T} = \frac{nR}{V} > 0 \quad(\text{heat the gas and pressure rises}).$$
The signs alone tell the story an engineer needs: at fixed temperature, pressure falls as the volume expands, and at fixed volume it climbs as the temperature rises. This is the behavior that engine and refrigeration cycles exploit. Because $P$ is a function of two variables, its level sets $P = c$ are level curves in the $(V, T)$ plane: the constant-pressure (isobaric) lines $T = cV/(nR)$ that a thermodynamics text plots.
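The sign claims can be confirmed symbolically. This is a small sketch in the style of Section 29.12; declaring the symbols positive is an assumption that matches the physical setting (positive amount of gas, absolute temperature, and volume).

```python
import sympy as sp

# Positive symbols let sympy decide the sign of each partial.
n, R, T, V = sp.symbols("n R T V", positive=True)
P = n * R * T / V

dP_dV = sp.diff(P, V)  # volume changes, temperature held fixed
dP_dT = sp.diff(P, T)  # temperature changes, volume held fixed

print("dP/dV =", dP_dV, "| negative?", dP_dV.is_negative)
print("dP/dT =", dP_dT, "| positive?", dP_dT.is_positive)
```

Without the positive=True assumption sympy cannot determine either sign and would report None for both checks, which is a useful reminder that the signs depend on the physical domain.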
Data science: loss surfaces and gradient descent
This is the anchor example threaded through the whole book (introduced in Chapter 6). A machine-learning model with parameters $w_1, w_2, \dots, w_n$ has a loss function $L(w_1, \dots, w_n)$ measuring how badly it fits the data. Training means minimizing $L$ — finding the lowest point of a surface in $n$-dimensional parameter space, where $n$ may be in the millions.
Gradient descent computes every partial $\partial L / \partial w_i$ and steps each weight downhill against it. Those partials are precisely the objects of this chapter, and Chapter 30 will bundle them into the gradient $\nabla L$, the single vector that points uphill fastest and whose negative is the direction gradient descent travels. Much of what modern machine learning does to fit a model is partial differentiation at scale. The thread that began as "the derivative tells you which way to step" in Chapter 6 reaches its multivariable climax in Chapter 30.
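As a preview of how those partials drive training, here is a minimal sketch of gradient descent on a two-parameter mean-squared-error loss for the line $y = w_0 + w_1 x$. The data, the learning rate, and the iteration count are all invented for illustration; this is a toy under those assumptions, not the treatment Chapter 30 will give.

```python
# Toy data lying exactly on y = 1 + 2x (invented for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def loss(w0, w1):
    """Mean squared error of the line y = w0 + w1 * x on the toy data."""
    return sum((w0 + w1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def partials(w0, w1):
    """Return (dL/dw0, dL/dw1), differentiated by hand from the loss formula."""
    residuals = [w0 + w1 * x - y for x, y in zip(xs, ys)]
    dL_dw0 = 2 * sum(residuals) / len(xs)
    dL_dw1 = 2 * sum(r * x for r, x in zip(residuals, xs)) / len(xs)
    return dL_dw0, dL_dw1


w0, w1 = 0.0, 0.0
learning_rate = 0.05  # assumed step size, small enough for this data
for _ in range(2000):
    g0, g1 = partials(w0, w1)
    # Step each weight against its own partial derivative.
    w0 -= learning_rate * g0
    w1 -= learning_rate * g1

print("w0 =", w0, "w1 =", w1, "loss =", loss(w0, w1))
```

Each update uses nothing beyond the two partial derivatives. The weights settle near $w_0 = 1$, $w_1 = 2$, the line the toy data was generated from.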
Add to Your Modeling Portfolio. Introduce a two-input function into your model, sketch its contour map, and compute both partial derivatives, interpreting each as a marginal rate. Biology: model a growth rate $g(N, F)$ as a function of population $N$ and available food $F$; interpret $g_N$ and $g_F$. Economics: build a Cobb–Douglas output $Q(L, K)$ or utility $U(x, y)$; compute the marginal products (or marginal utilities) and sketch one isoquant (or indifference curve). Physics: write the gravitational potential $\Phi(x, y, z) = -GM/\sqrt{x^2+y^2+z^2}$; describe its level surfaces and compute $\partial \Phi/\partial x$. Data Science: define a two-parameter loss $L(w_0, w_1)$ for a linear fit (mean squared error), plot its bowl-shaped contour map, and compute $\partial L/\partial w_0$ and $\partial L/\partial w_1$ — the exact quantities your gradient-descent code will use in Chapter 30.
29.14 The Geometric Picture: Features of a Surface
Pulling the chapter together, the surface $z = f(x, y)$ has a small vocabulary of features, and partial derivatives are the tools that detect them:
- Peaks (local maxima) and pits (local minima): the highest and lowest points of their neighborhood.
- Saddle points: critical points that are neither — up in one direction, down in another (Section 29.3).
- Ridges and valleys: lines along which the surface is locally highest or lowest in one direction.
At any peak, pit, or saddle the surface is momentarily level in both axis directions, so
$$f_x = 0 \quad\text{and}\quad f_y = 0$$
which is the multivariable echo of "$f'(a) = 0$ at a critical point" from Chapter 9. But $f_x = f_y = 0$ does not by itself distinguish a peak from a pit from a saddle: as the saddle taught us, a surface can be level at a point and still curve upward in one direction and downward in another, and first-order slopes are blind to that curvature. Sorting them out requires the second partials assembled into the second-derivative test, the central business of Chapter 31 (multivariable optimization), where Clairaut's symmetry from this chapter guarantees that the matrix of second partials is symmetric.
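A short sympy sketch illustrates the point. The three surfaces below are illustrative choices (a pit, a peak, and the saddle of Section 29.3). All three have the same critical point, so the first partials alone cannot tell them apart, while their second partials differ. The code only displays the second partials; it does not apply the test of Chapter 31.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)

# Illustrative surfaces: one pit, one peak, one saddle.
surfaces = {
    "pit": x**2 + y**2,
    "peak": -(x**2) - y**2,
    "saddle": x**2 - y**2,
}

critical_points = {}
second_partials = {}
for name, f in surfaces.items():
    fx, fy = sp.diff(f, x), sp.diff(f, y)
    # Solve f_x = 0 and f_y = 0 simultaneously.
    critical_points[name] = sp.solve([fx, fy], [x, y], dict=True)
    # Matrix [[f_xx, f_xy], [f_yx, f_yy]]; symmetric by Clairaut's theorem.
    second_partials[name] = sp.hessian(f, (x, y))
    print(name, critical_points[name], second_partials[name].tolist())
```

Every surface reports the origin as its only critical point, yet the diagonal entries of the second-partials matrix carry different signs in each case. That sign pattern is the information the second-derivative test reads.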
Looking Ahead
You now have the objects of multivariable calculus — surfaces, contours, level surfaces, limits, continuity, partial derivatives, and the tangent plane — and you can compute and visualize all of them. Three large doors open from here.
Chapter 30 introduces the gradient $\nabla f = (f_x, f_y)$, the vector that collects the partial derivatives and points in the direction of steepest ascent, and the directional derivative $D_{\mathbf{u}} f$, which gives the slope in any direction — finally answering the "you could face any compass heading" promise of Section 29.8. It also extends the chain rule to several variables, tracking how change propagates through chains of dependencies. This is where the gradient-descent thread from Chapter 6 reaches machine learning in full.
Chapter 31 uses these tools to optimize: finding and classifying peaks, pits, and saddles with the second-derivative test, and handling constraints with Lagrange multipliers.
Chapters 32 and 33 turn from differentiating multivariable functions to integrating them — double and triple integrals, accumulating a function over a region of the plane or space, the multivariable sequel to the definite integral of Chapter 13. The Fundamental Theorem of Calculus from Chapter 14 will reappear, generalized, in Chapters 35–38.
Reflection
The leap from one variable to several is the leap from a line to a landscape. The ideas you built over twenty-eight chapters do not break under the strain — limits, continuity, derivatives, and linear approximation all survive — but each acquires a direction, and that single word reorganizes everything. A "limit" must now agree along infinitely many paths. A "derivative" splits into a slope for every heading, of which the partials capture two. And the tangent line that approximated a curve becomes the tangent plane that approximates a surface, carrying the soul of calculus — approximation with precision — into higher dimensions. Hold the landscape picture firmly: in the next chapter we learn to find, at any point on it, the single direction that climbs the fastest.