Chapter 30 — Quiz
10 questions covering the multivariable chain rule, the gradient, directional derivatives, steepest ascent, tangent planes, and gradient descent. Work each by hand, then reveal the answer.
1. Let $z = x^2 y$ with $x = t$ and $y = t^2$. Use the Case 1 chain rule to find $\dfrac{dz}{dt}$.
Answer
$f_x = 2xy,\ f_y = x^2,\ \frac{dx}{dt}=1,\ \frac{dy}{dt}=2t$. So $\frac{dz}{dt} = 2xy(1) + x^2(2t) = 2t\cdot t^2 + t^2\cdot 2t = 2t^3 + 2t^3 = 4t^3$. Check: $z = t^2\cdot t^2 = t^4$, so $\frac{dz}{dt} = 4t^3$. ✓ (§30.2, Case 1.)
2. A common error is to write $\dfrac{dz}{dt} = \dfrac{\partial f}{\partial x} + \dfrac{\partial f}{\partial y}$ for $z = f(x(t), y(t))$. What essential factors are missing, and why?
Answer
Each term must be multiplied by the inner derivative: $\frac{dz}{dt} = f_x\frac{dx}{dt} + f_y\frac{dy}{dt}$. The missing factors $\frac{dx}{dt}$ and $\frac{dy}{dt}$ encode *how fast each intermediate variable is moving*. Dropping them is the multivariable cousin of forgetting the chain-rule factor in single-variable calculus (Chapter 7). (§30.2, Common Pitfall.)
3. Compute the gradient $\nabla f$ for $f(x, y) = x^2 + 3xy + y^2$ at the point $(1, 2)$.
Answer
$f_x = 2x + 3y$, $f_y = 3x + 2y$. At $(1,2)$: $f_x = 2 + 6 = 8$, $f_y = 3 + 4 = 7$. So $\nabla f(1,2) = \langle 8, 7\rangle$. (§30.3.)
4. For $f(x, y) = x^2 + y^2$ at $(1, 1)$, find the directional derivative in the direction of $\langle 3, 4\rangle$.
Answer
Normalize: $\|\langle 3,4\rangle\| = 5$, so $\mathbf{u} = \langle 3/5, 4/5\rangle$. $\nabla f(1,1) = \langle 2, 2\rangle$. Then $D_{\mathbf{u}}f = \langle 2,2\rangle\cdot\langle 3/5,4/5\rangle = \tfrac{6}{5} + \tfrac{8}{5} = \tfrac{14}{5} = 2.8$. (§30.5. Forgetting to normalize gives the wrong answer $14$.)
5. At a point where $\nabla f = \langle 3, -4\rangle$, in what direction does $f$ increase fastest, and what is the maximum rate of increase?
Answer
Fastest increase is along $\nabla f = \langle 3, -4\rangle$ itself; as a unit direction, $\tfrac{1}{5}\langle 3,-4\rangle$. The maximum rate is $\|\nabla f\| = \sqrt{9 + 16} = 5$. (§30.4 Superpower 1 / §30.5.)
6. True or false: at any point, $\nabla f$ is tangent to the level curve $f = c$ through that point. Explain.
Answer
**False.** $\nabla f$ is **perpendicular** (normal) to the level curve, not tangent. Moving *along* a level curve keeps $f$ constant, so the rate of change is zero there; the direction of *fastest* change must be perpendicular to the direction of *no* change. (§30.4 Superpower 2; proved via $D_{\mathbf{u}}f = \|\nabla f\|\cos\theta = 0$ when $\theta = \pi/2$.)
7. Find the tangent plane to the sphere $x^2 + y^2 + z^2 = 6$ at the point $(1, 1, 2)$.
Answer
Set $F = x^2+y^2+z^2-6$; $\nabla F = \langle 2x,2y,2z\rangle = \langle 2,2,4\rangle$ at $(1,1,2)$. Plane: $2(x-1)+2(y-1)+4(z-2)=0 \Rightarrow 2x+2y+4z = 12 \Rightarrow x + y + 2z = 6$. (§30.6. The normal $\langle 2,2,4\rangle \parallel \langle 1,1,2\rangle$ points radially outward, as it must.)
8. In gradient descent, the update is $\mathbf{x}_{k+1} = \mathbf{x}_k - \eta\,\nabla f(\mathbf{x}_k)$. Why the minus sign, and what role does $\eta$ play?
Answer
The minus sign sends each step in the direction $-\nabla f$, which is the direction of **steepest descent** (fastest decrease of $f$), so the function goes down. $\eta > 0$ is the **learning rate** (step size): too small means glacial convergence, too large means overshoot and divergence. (§30.8.)
9. For $f(x, y) = x^2 + 4y^2$, you stand at $(2, 1)$ with $\eta = 0.1$. Compute the result of one gradient-descent step.
Answer
$\nabla f = \langle 2x, 8y\rangle = \langle 4, 8\rangle$ at $(2,1)$. Step: $\mathbf{x}_1 = (2,1) - 0.1\langle 4,8\rangle = (2 - 0.4,\ 1 - 0.8) = (1.6,\ 0.2)$. Check $f$: $f(2,1)=4+4=8$ and $f(1.6,0.2)=2.56+0.16=2.72$, so $f$ decreased. ✓ (§30.8.)
10. Explain in one or two sentences how the multivariable chain rule (§30.2) and the gradient (§30.3) together make training a neural network possible.
Answer
Training minimizes a loss $L(\boldsymbol\theta)$ over the parameters by gradient descent: $\boldsymbol\theta \leftarrow \boldsymbol\theta - \eta\nabla L$. The gradient $\nabla L$ (one partial per parameter) is computed by **backpropagation**, which is the reverse-mode multivariable chain rule applied to the network's computational graph. Chain rule produces the gradient; the gradient drives the descent. (§30.8–30.9.)
Scoring Guide
| Score | Interpretation |
|---|---|
| 9–10 | Mastery. You can compute gradients and directional derivatives fluently, reason geometrically about level sets and steepest ascent, and explain gradient descent. Move on to Chapter 31. |
| 7–8 | Solid. Re-check the one or two ideas you missed — most often normalization (Q4) or the gradient-is-normal fact (Q6). |
| 5–6 | Partial. Re-read §30.3–30.5 and redo Exercises 14–29 before continuing. |
| 0–4 | Revisit the chapter from §30.2. Focus on the worked examples and the Check Your Understanding boxes, then retry this quiz. |
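
If you want an independent check on your hand computations, the answers to Questions 3, 4, and 9 can be reproduced numerically. The sketch below is not a method from the chapter: it approximates each partial derivative with a central difference, and the helper names `grad`, `directional_derivative`, and `descent_step` are hypothetical, chosen only for this illustration. It uses only the Python standard library.

```python
import math

def grad(f, p, h=1e-5):
    """Central-difference estimate of the gradient of f at the point p.

    This is a numerical approximation (an assumption of this sketch),
    not the symbolic partial derivatives computed by hand in the quiz.
    """
    g = []
    for i in range(len(p)):
        fwd = list(p)
        bwd = list(p)
        fwd[i] += h
        bwd[i] -= h
        g.append((f(*fwd) - f(*bwd)) / (2 * h))
    return g

def directional_derivative(f, p, v):
    """D_u f at p, where u is v normalized to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    u = [c / norm for c in v]
    return sum(gi * ui for gi, ui in zip(grad(f, p), u))

def descent_step(f, p, eta):
    """One gradient-descent update: p - eta * grad f(p)."""
    return [pi - eta * gi for pi, gi in zip(p, grad(f, p))]

# Question 3: gradient of x^2 + 3xy + y^2 at (1, 2); hand answer <8, 7>.
q3 = grad(lambda x, y: x**2 + 3*x*y + y**2, (1.0, 2.0))

# Question 4: directional derivative of x^2 + y^2 at (1, 1) along <3, 4>;
# hand answer 14/5 = 2.8.
q4 = directional_derivative(lambda x, y: x**2 + y**2, (1.0, 1.0), (3.0, 4.0))

# Question 9: one step on x^2 + 4y^2 from (2, 1) with eta = 0.1;
# hand answer (1.6, 0.2).
q9 = descent_step(lambda x, y: x**2 + 4*y**2, (2.0, 1.0), 0.1)

print(q3, q4, q9)
```

Because the result is a floating-point approximation, expect values that agree with the hand answers to several decimal places rather than exactly. Passing the unnormalized vector straight into the dot product instead of through `directional_derivative` reproduces the wrong answer $14$ flagged in Question 4.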