Chapter 30 — Quiz

Ten questions covering the multivariable chain rule, the gradient, directional derivatives, steepest ascent, tangent planes, and gradient descent. Work each one by hand, then reveal the answer.


1. Let $z = x^2 y$ with $x = t$ and $y = t^2$. Use the Case 1 chain rule to find $\dfrac{dz}{dt}$.

Answer $f_x = 2xy,\ f_y = x^2,\ \frac{dx}{dt}=1,\ \frac{dy}{dt}=2t$. So $\frac{dz}{dt} = 2xy(1) + x^2(2t) = 2t\cdot t^2 + t^2\cdot 2t = 2t^3 + 2t^3 = 4t^3$. Check: $z = t^2\cdot t^2 = t^4$, so $\frac{dz}{dt} = 4t^3$. ✓ (§30.2, Case 1.)
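The hand computation can also be sanity-checked numerically. The sketch below is only an illustration: the helper names (`dz_dt_chain`, `central_difference`) and the test point `t0 = 1.5` are assumptions chosen for this check, not part of the chapter.

```python
def f(x, y):
    return x**2 * y


def z(t):
    # Compose f with the path x = t, y = t^2.
    return f(t, t**2)


def dz_dt_chain(t):
    # Case 1 chain rule: f_x * dx/dt + f_y * dy/dt.
    x, y = t, t**2
    f_x, f_y = 2 * x * y, x**2
    dx_dt, dy_dt = 1.0, 2 * t
    return f_x * dx_dt + f_y * dy_dt


def central_difference(g, t, h=1e-5):
    # Symmetric finite-difference approximation of g'(t).
    return (g(t + h) - g(t - h)) / (2 * h)


t0 = 1.5  # arbitrary test point (an assumption for this sketch)
chain = dz_dt_chain(t0)
numeric = central_difference(z, t0)
closed_form = 4 * t0**3

print(chain, numeric, closed_form)
```

All three printed values should agree up to finite-difference error, which is the same cross-check the answer performs symbolically with $z = t^4$.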

2. A common error is to write $\dfrac{dz}{dt} = \dfrac{\partial f}{\partial x} + \dfrac{\partial f}{\partial y}$ for $z = f(x(t), y(t))$. What essential factors are missing, and why?

Answer Each term must be multiplied by the inner derivative: $\frac{dz}{dt} = f_x\frac{dx}{dt} + f_y\frac{dy}{dt}$. The missing factors $\frac{dx}{dt}$ and $\frac{dy}{dt}$ encode *how fast each intermediate variable is moving*. Dropping them is the multivariable cousin of forgetting the chain-rule factor in single-variable calculus (Chapter 7). (§30.2, Common Pitfall.)

3. Compute the gradient $\nabla f$ for $f(x, y) = x^2 + 3xy + y^2$ at the point $(1, 2)$.

Answer $f_x = 2x + 3y$, $f_y = 3x + 2y$. At $(1,2)$: $f_x = 2 + 6 = 8$, $f_y = 3 + 4 = 7$. So $\nabla f(1,2) = \langle 8, 7\rangle$. (§30.3.)
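One way to double-check a hand-computed gradient is to compare it with central differences in each coordinate. This is a minimal sketch; `numeric_grad` and the step size `h` are illustrative assumptions.

```python
def f(x, y):
    return x**2 + 3 * x * y + y**2


def grad_f(x, y):
    # Analytic partials: f_x = 2x + 3y, f_y = 3x + 2y.
    return (2 * x + 3 * y, 3 * x + 2 * y)


def numeric_grad(func, x, y, h=1e-6):
    # Central difference in each coordinate, holding the other fixed.
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (df_dx, df_dy)


analytic = grad_f(1, 2)
approx = numeric_grad(f, 1.0, 2.0)

print(analytic, approx)
```

The analytic tuple is exactly `(8, 7)`; the numerical estimate should match it to within rounding error.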

4. For $f(x, y) = x^2 + y^2$ at $(1, 1)$, find the directional derivative in the direction of $\langle 3, 4\rangle$.

Answer Normalize: $\|\langle 3,4\rangle\| = 5$, so $\mathbf{u} = \langle 3/5, 4/5\rangle$. $\nabla f(1,1) = \langle 2, 2\rangle$. Then $D_{\mathbf{u}}f = \langle 2,2\rangle\cdot\langle 3/5,4/5\rangle = \tfrac{6}{5} + \tfrac{8}{5} = \tfrac{14}{5} = 2.8$. (§30.5. Forgetting to normalize gives the wrong answer $14$.)
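The normalization step is easy to see in code. The sketch below (function names are illustrative assumptions) computes the directional derivative with the direction normalized inside the helper, and also the un-normalized dot product that produces the wrong answer.

```python
import math


def grad_f(x, y):
    # Gradient of f(x, y) = x^2 + y^2.
    return (2 * x, 2 * y)


def directional_derivative(grad, direction):
    # Normalize the direction first, then take the dot product.
    norm = math.hypot(*direction)
    unit = tuple(c / norm for c in direction)
    return sum(g * u for g, u in zip(grad, unit))


grad = grad_f(1, 1)
direction = (3, 4)

correct = directional_derivative(grad, direction)
# Skipping normalization scales the result by the length of the direction.
unnormalized = sum(g * d for g, d in zip(grad, direction))

print(correct, unnormalized)
```

The un-normalized value is larger than the correct one by exactly the factor $\|\langle 3,4\rangle\| = 5$.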

5. At a point where $\nabla f = \langle 3, -4\rangle$, in what direction does $f$ increase fastest, and what is the maximum rate of increase?

Answer Fastest increase is along $\nabla f = \langle 3, -4\rangle$ itself; as a unit direction, $\tfrac{1}{5}\langle 3,-4\rangle$. The maximum rate is $\|\nabla f\| = \sqrt{9 + 16} = 5$. (§30.4 Superpower 1 / §30.5.)
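A brute-force scan over unit directions gives an independent check of this claim. The sketch below assumes a grid of 3600 angles, an arbitrary resolution chosen for illustration, and picks the angle with the largest directional derivative.

```python
import math

grad = (3.0, -4.0)


def directional_derivative(grad, theta):
    # Dot product of the gradient with the unit vector at angle theta.
    u = (math.cos(theta), math.sin(theta))
    return grad[0] * u[0] + grad[1] * u[1]


n = 3600  # grid resolution (an assumption for this sketch)
angles = [2 * math.pi * k / n for k in range(n)]
best_theta = max(angles, key=lambda th: directional_derivative(grad, th))
best_rate = directional_derivative(grad, best_theta)
best_u = (math.cos(best_theta), math.sin(best_theta))

norm = math.hypot(*grad)
unit_grad = (grad[0] / norm, grad[1] / norm)

print(best_u, best_rate)
print(unit_grad, norm)
```

Up to the grid resolution, the best direction found by the scan should agree with $\tfrac{1}{5}\langle 3,-4\rangle$ and the best rate with $\|\nabla f\|$.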

6. True or false: at any point, $\nabla f$ is tangent to the level curve $f = c$ through that point. Explain.

Answer **False.** $\nabla f$ is **perpendicular** (normal) to the level curve, not tangent. Moving *along* a level curve keeps $f$ constant, so the rate of change is zero there; the direction of *fastest* change must be perpendicular to the direction of *no* change. (§30.4 Superpower 2; proved via $D_{\mathbf{u}}f = \|\nabla f\|\cos\theta = 0$ when $\theta = \pi/2$.)
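For a concrete instance, take $f(x, y) = x^2 + y^2$, whose level curves are circles; this function and the point $(1, 1)$ are assumptions borrowed for illustration, since the question is stated for general $f$. The sketch checks that the gradient has zero dot product with the tangent to the level curve.

```python
import math


def grad_f(x, y):
    # Gradient of the illustrative function f(x, y) = x^2 + y^2.
    return (2 * x, 2 * y)


def level_curve(t, radius):
    # Parametrization of the level curve x^2 + y^2 = radius^2.
    return (radius * math.cos(t), radius * math.sin(t))


def level_curve_tangent(t, radius):
    # Derivative of the parametrization with respect to t.
    return (-radius * math.sin(t), radius * math.cos(t))


radius = math.sqrt(2)  # level curve f = 2, which passes through (1, 1)
t0 = math.pi / 4

point = level_curve(t0, radius)
tangent = level_curve_tangent(t0, radius)
gradient = grad_f(*point)
dot = gradient[0] * tangent[0] + gradient[1] * tangent[1]

print(point, tangent, gradient, dot)
```

A dot product of zero (up to rounding) is exactly the statement that the gradient is normal, not tangent, to the level curve at that point.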

7. Find the tangent plane to the sphere $x^2 + y^2 + z^2 = 6$ at the point $(1, 1, 2)$.

Answer Set $F = x^2+y^2+z^2-6$; $\nabla F = \langle 2x,2y,2z\rangle = \langle 2,2,4\rangle$ at $(1,1,2)$. Plane: $2(x-1)+2(y-1)+4(z-2)=0 \Rightarrow 2x+2y+4z = 12 \Rightarrow x + y + 2z = 6$. (§30.6. The normal $\langle 2,2,4\rangle \parallel \langle 1,1,2\rangle$ points radially outward, as it must.)
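The same recipe (the normal is $\nabla F$, and the plane passes through the point) can be written as a short sketch. The helper name `tangent_plane` and the reduction by the greatest common divisor are illustrative assumptions; the reduction only works here because the coefficients are integers.

```python
import math
from functools import reduce


def grad_F(x, y, z):
    # Gradient of F(x, y, z) = x^2 + y^2 + z^2 - 6.
    return (2 * x, 2 * y, 2 * z)


def tangent_plane(point):
    # Plane n . (r - p) = 0, rewritten as n . r = n . p.
    normal = grad_F(*point)
    d = sum(n * p for n, p in zip(normal, point))
    return normal, d


point = (1, 1, 2)
normal, d = tangent_plane(point)

# Divide out the common integer factor to get the simplified equation.
g = reduce(math.gcd, (*normal, d))
simple_normal = tuple(n // g for n in normal)
simple_d = d // g

print(normal, d)
print(simple_normal, simple_d)
```

The first printed pair corresponds to $2x + 2y + 4z = 12$ and the second to the simplified form $x + y + 2z = 6$.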

8. In gradient descent, the update is $\mathbf{x}_{k+1} = \mathbf{x}_k - \eta\,\nabla f(\mathbf{x}_k)$. Why the minus sign, and what role does $\eta$ play?

Answer The minus sign sends each step in the direction $-\nabla f$, which is the direction of **steepest descent** (fastest decrease of $f$), so $f$ decreases provided the step is small enough. $\eta > 0$ is the **learning rate** (step size): too small means glacial convergence, too large means overshoot and possibly divergence. (§30.8.)
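The effect of the learning rate can be seen in a small experiment. The sketch below is an illustration under assumptions: it borrows $f(x, y) = x^2 + 4y^2$ and the start $(2, 1)$ from the next question, and the three values of $\eta$ and the 20-step budget are arbitrary choices.

```python
def f(x, y):
    return x**2 + 4 * y**2


def grad_f(x, y):
    return (2 * x, 8 * y)


def run_gradient_descent(start, eta, steps):
    # Repeatedly apply x_{k+1} = x_k - eta * grad f(x_k).
    x, y = start
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y - eta * gy
    return (x, y)


start = (2.0, 1.0)
final_values = {}
for eta in (0.01, 0.1, 0.3):
    final_point = run_gradient_descent(start, eta, steps=20)
    final_values[eta] = f(*final_point)
    print(eta, final_values[eta])
```

For this particular function, the smallest rate makes slow progress, the middle rate gets close to the minimum, and the largest rate overshoots in $y$ on every step so that $f$ grows instead of shrinking.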

9. For $f(x, y) = x^2 + 4y^2$, you stand at $(2, 1)$ with $\eta = 0.1$. Compute the result of one gradient-descent step.

Answer $\nabla f = \langle 2x, 8y\rangle = \langle 4, 8\rangle$ at $(2,1)$. Step: $\mathbf{x}_1 = (2,1) - 0.1\langle 4,8\rangle = (2 - 0.4,\ 1 - 0.8) = (1.6,\ 0.2)$. Check $f$: $f(2,1)=4+4=8$, $f(1.6,0.2)=2.56+0.16=2.72$ — decreased. ✓ (§30.8.)
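The single update is short enough to write out directly. This is a minimal sketch; the helper name `gradient_descent_step` is an illustrative assumption.

```python
def f(x, y):
    return x**2 + 4 * y**2


def grad_f(x, y):
    return (2 * x, 8 * y)


def gradient_descent_step(point, eta):
    # One update: subtract eta times the gradient, componentwise.
    g = grad_f(*point)
    return tuple(p - eta * gi for p, gi in zip(point, g))


x0 = (2.0, 1.0)
x1 = gradient_descent_step(x0, eta=0.1)

print(x1, f(*x0), f(*x1))
```

Floating-point rounding may show the second coordinate as a value very slightly different from 0.2; the point and the drop in $f$ otherwise match the hand computation.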

10. Explain in one or two sentences how the multivariable chain rule (§30.2) and the gradient (§30.3) together make training a neural network possible.

Answer Training minimizes a loss $L(\boldsymbol\theta)$ over the parameters by gradient descent: $\boldsymbol\theta \leftarrow \boldsymbol\theta - \eta\nabla L$. The gradient $\nabla L$ (one partial per parameter) is computed by **backpropagation**, which is the reverse-mode multivariable chain rule applied to the network's computational graph. Chain rule produces the gradient; the gradient drives the descent. (§30.8–30.9.)
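To make the idea concrete, here is a toy sketch of backpropagation by hand on a one-neuron model. Everything specific in it is an assumption for illustration: the model $a = \sigma(wx + b)$, the squared-error loss, the single data point, and the parameter values. Real networks apply the same reverse-mode chain rule across many layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    # Forward pass: z = w*x + b, a = sigmoid(z), L = (a - y)^2.
    a = sigmoid(w * x + b)
    return (a - y) ** 2


def backprop(w, b, x, y):
    # Forward pass, keeping the intermediate values.
    z = w * x + b
    a = sigmoid(z)
    # Backward pass: multiply local derivatives from the loss inward.
    dL_da = 2 * (a - y)
    da_dz = a * (1 - a)
    dL_dz = dL_da * da_dz
    # dz/dw = x and dz/db = 1.
    return (dL_dz * x, dL_dz * 1.0)


def numeric_grad(w, b, x, y, h=1e-6):
    # Central differences, used only to check the backprop result.
    dL_dw = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
    dL_db = (loss(w, b + h, x, y) - loss(w, b - h, x, y)) / (2 * h)
    return (dL_dw, dL_db)


# Hypothetical parameters, data point, and learning rate.
w, b, x, y, eta = 0.5, -0.2, 1.5, 1.0, 0.5

grad = backprop(w, b, x, y)
approx = numeric_grad(w, b, x, y)

# One gradient-descent update on the parameters.
w_new, b_new = w - eta * grad[0], b - eta * grad[1]
loss_before = loss(w, b, x, y)
loss_after = loss(w_new, b_new, x, y)

print(grad, approx)
print(loss_before, loss_after)
```

The chain rule supplies `grad`, the finite-difference estimate confirms it, and the single update lowers the loss on this data point.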

Scoring Guide

| Score | Interpretation |
|-------|----------------|
| 9–10 | Mastery. You can compute gradients and directional derivatives fluently, reason geometrically about level sets and steepest ascent, and explain gradient descent. Move on to Chapter 31. |
| 7–8 | Solid. Re-check the one or two ideas you missed, most often normalization (Q4) or the gradient-is-normal fact (Q6). |
| 5–6 | Partial. Re-read §30.3–30.5 and redo Exercises 14–29 before continuing. |
| 0–4 | Revisit the chapter from §30.2. Focus on the worked examples and the Check Your Understanding boxes, then retry this quiz. |