Chapter 30 — Quiz

DataField.Dev


10 questions covering the multivariable chain rule, the gradient, directional derivatives, steepest ascent, tangent planes, and gradient descent. Work each by hand, then reveal the answer.

1. Let $z = x^2 y$ with $x = t$ and $y = t^2$. Use the Case 1 chain rule to find $\dfrac{dz}{dt}$.

Answer

$f_x = 2xy,\ f_y = x^2,\ \frac{dx}{dt}=1,\ \frac{dy}{dt}=2t$. So $\frac{dz}{dt} = 2xy(1) + x^2(2t) = 2t\cdot t^2 + t^2\cdot 2t = 2t^3 + 2t^3 = 4t^3$. Check: $z = t^2\cdot t^2 = t^4$, so $\frac{dz}{dt} = 4t^3$. ✓ (§30.2, Case 1.)
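A numerical sanity check can back up the hand computation. The sketch below is illustrative only (the function names are invented for this example, not taken from the chapter): it evaluates the Case 1 formula and compares it with a central-difference estimate of $\frac{dz}{dt}$.

```python
def z_of_t(t):
    """z = x^2 * y with x = t and y = t^2, as a function of t alone."""
    x, y = t, t**2
    return x**2 * y


def chain_rule_dz_dt(t):
    """Case 1 chain rule: dz/dt = f_x * dx/dt + f_y * dy/dt."""
    x, y = t, t**2
    f_x, f_y = 2 * x * y, x**2
    dx_dt, dy_dt = 1.0, 2 * t
    return f_x * dx_dt + f_y * dy_dt


def central_difference(g, t, h=1e-5):
    """Symmetric finite-difference estimate of g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)


t = 1.5
print(chain_rule_dz_dt(t))            # chain rule value
print(4 * t**3)                       # closed form 4t^3
print(central_difference(z_of_t, t))  # numerical estimate, close to the two above
```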

2. A common error is to write $\dfrac{dz}{dt} = \dfrac{\partial f}{\partial x} + \dfrac{\partial f}{\partial y}$ for $z = f(x(t), y(t))$. What essential factors are missing, and why?

Answer

Each term must be multiplied by the inner derivative: $\frac{dz}{dt} = f_x\frac{dx}{dt} + f_y\frac{dy}{dt}$. The missing factors $\frac{dx}{dt}$ and $\frac{dy}{dt}$ encode *how fast each intermediate variable is moving*. Dropping them is the multivariable cousin of forgetting the chain-rule factor in single-variable calculus (Chapter 7). (§30.2, Common Pitfall.)
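To see the size of the error, here is a minimal sketch that compares the correct and incorrect formulas against a finite-difference estimate. The path $x = 3t$, $y = t^2$ and the function $f(x, y) = x^2 y$ are assumptions chosen for illustration so that $\frac{dx}{dt} \neq 1$; they are not part of the question.

```python
def z_of_t(t):
    """z = f(x(t), y(t)) with f = x^2 * y, x = 3t, y = t^2 (assumed example)."""
    x, y = 3 * t, t**2
    return x**2 * y


def correct_dz_dt(t):
    """Chain rule with the inner derivatives dx/dt = 3 and dy/dt = 2t."""
    x, y = 3 * t, t**2
    f_x, f_y = 2 * x * y, x**2
    return f_x * 3.0 + f_y * (2 * t)


def incorrect_dz_dt(t):
    """The common error: adds the partials but drops dx/dt and dy/dt."""
    x, y = 3 * t, t**2
    f_x, f_y = 2 * x * y, x**2
    return f_x + f_y


def central_difference(g, t, h=1e-5):
    """Symmetric finite-difference estimate of g'(t)."""
    return (g(t + h) - g(t - h)) / (2 * h)


t = 1.0
print("correct:  ", correct_dz_dt(t))
print("incorrect:", incorrect_dz_dt(t))
print("numerical:", central_difference(z_of_t, t))
```

Only the version that keeps the inner derivatives agrees with the numerical estimate.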

3. Compute the gradient $\nabla f$ for $f(x, y) = x^2 + 3xy + y^2$ at the point $(1, 2)$.

Answer

$f_x = 2x + 3y$, $f_y = 3x + 2y$. At $(1,2)$: $f_x = 2 + 6 = 8$, $f_y = 3 + 4 = 7$. So $\nabla f(1,2) = \langle 8, 7\rangle$. (§30.3.)
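The same gradient can be cross-checked numerically. This is a small sketch with hypothetical helper names; it compares the hand-derived partials with central-difference estimates at $(1, 2)$.

```python
def f(x, y):
    return x**2 + 3 * x * y + y**2


def grad_f(x, y):
    """Hand-derived partials: f_x = 2x + 3y, f_y = 3x + 2y."""
    return (2 * x + 3 * y, 3 * x + 2 * y)


def numerical_gradient(func, x, y, h=1e-6):
    """Central-difference estimate of (f_x, f_y) at (x, y)."""
    df_dx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (df_dx, df_dy)


print(grad_f(1, 2))                 # (8, 7)
print(numerical_gradient(f, 1, 2))  # approximately the same pair
```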

4. For $f(x, y) = x^2 + y^2$ at $(1, 1)$, find the directional derivative in the direction of $\langle 3, 4\rangle$.

Answer

Normalize: $\|\langle 3,4\rangle\| = 5$, so $\mathbf{u} = \langle 3/5, 4/5\rangle$. $\nabla f(1,1) = \langle 2, 2\rangle$. Then $D_{\mathbf{u}}f = \langle 2,2\rangle\cdot\langle 3/5,4/5\rangle = \tfrac{6}{5} + \tfrac{8}{5} = \tfrac{14}{5} = 2.8$. (§30.5. Forgetting to normalize gives the wrong answer $14$.)
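The normalization step is easy to build into a helper so it cannot be forgotten. The following is a sketch under the assumption that the gradient has already been computed by hand; the function name is made up for this example.

```python
import math


def directional_derivative(grad, direction):
    """Return grad . u, where u is `direction` scaled to unit length."""
    norm = math.hypot(*direction)
    if norm == 0:
        raise ValueError("direction must be a nonzero vector")
    unit = tuple(component / norm for component in direction)
    return sum(g * u for g, u in zip(grad, unit))


grad_at_point = (2.0, 2.0)   # gradient of x^2 + y^2 at (1, 1)
direction = (3.0, 4.0)

# Correct: uses the unit vector <3/5, 4/5>.
print(directional_derivative(grad_at_point, direction))

# The un-normalized dot product, shown only to illustrate the mistake.
print(sum(g * d for g, d in zip(grad_at_point, direction)))
```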

5. At a point where $\nabla f = \langle 3, -4\rangle$, in what direction does $f$ increase fastest, and what is the maximum rate of increase?

Answer

Fastest increase is along $\nabla f = \langle 3, -4\rangle$ itself; as a unit direction, $\tfrac{1}{5}\langle 3,-4\rangle$. The maximum rate is $\|\nabla f\| = \sqrt{9 + 16} = 5$. (§30.4 Superpower 1 / §30.5.)
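One way to convince yourself of this is a brute-force search over unit directions. The sketch below (an illustration, with an arbitrary choice of 3600 sampled angles) evaluates $\nabla f \cdot \mathbf{u}$ for $\mathbf{u} = \langle \cos\theta, \sin\theta \rangle$ and reports the best angle found.

```python
import math

grad = (3.0, -4.0)


def rate_in_direction(theta):
    """Directional derivative along the unit vector at angle theta."""
    u = (math.cos(theta), math.sin(theta))
    return grad[0] * u[0] + grad[1] * u[1]


# Sample unit directions every 0.1 degree around the circle.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_theta = max(angles, key=rate_in_direction)
best_rate = rate_in_direction(best_theta)

# Angle of the gradient itself, mapped into [0, 2*pi).
grad_angle = math.atan2(grad[1], grad[0]) % (2 * math.pi)

print(best_theta, grad_angle)          # nearly equal angles
print(best_rate, math.hypot(*grad))    # best rate is close to the norm, 5
```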

6. True or false: at any point, $\nabla f$ is tangent to the level curve $f = c$ through that point. Explain.

Answer

**False.** $\nabla f$ is **perpendicular** (normal) to the level curve, not tangent, wherever $\nabla f \neq \mathbf{0}$. Moving *along* a level curve keeps $f$ constant, so the rate of change is zero there; the direction of *fastest* change must be perpendicular to the direction of *no* change. (§30.4 Superpower 2; proved via $D_{\mathbf{u}}f = \|\nabla f\|\cos\theta = 0$ when $\theta = \pi/2$.)
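The perpendicularity can be checked on a concrete case. As an assumed example (the question itself is general), take $f(x, y) = x^2 + 4y^2$, whose level curve $f = c$ is the ellipse $x = \sqrt{c}\cos s$, $y = \tfrac{1}{2}\sqrt{c}\sin s$. The sketch computes $\nabla f \cdot \mathbf{T}$ at several points, where $\mathbf{T}$ is the tangent vector of that parametrization.

```python
import math


def grad_f(x, y):
    """Gradient of the assumed example f(x, y) = x^2 + 4y^2."""
    return (2 * x, 8 * y)


def level_curve_point(c, s):
    """Point on the level curve f = c at parameter s."""
    r = math.sqrt(c)
    return (r * math.cos(s), 0.5 * r * math.sin(s))


def level_curve_tangent(c, s):
    """Derivative of level_curve_point with respect to s."""
    r = math.sqrt(c)
    return (-r * math.sin(s), 0.5 * r * math.cos(s))


def gradient_dot_tangent(c, s):
    gx, gy = grad_f(*level_curve_point(c, s))
    tx, ty = level_curve_tangent(c, s)
    return gx * tx + gy * ty


for s in (0.0, 0.7, 2.0, 4.5):
    print(s, gradient_dot_tangent(8.0, s))  # zero up to rounding error
```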

7. Find the tangent plane to the sphere $x^2 + y^2 + z^2 = 6$ at the point $(1, 1, 2)$.

Answer

Set $F = x^2+y^2+z^2-6$; $\nabla F = \langle 2x,2y,2z\rangle = \langle 2,2,4\rangle$ at $(1,1,2)$. Plane: $2(x-1)+2(y-1)+4(z-2)=0 \Rightarrow 2x+2y+4z = 12 \Rightarrow x + y + 2z = 6$. (§30.6. The normal $\langle 2,2,4\rangle \parallel \langle 1,1,2\rangle$ points radially outward, as it must.)
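The recipe (gradient as normal, then point-normal form) is mechanical enough to script. This is a minimal sketch with a hypothetical `tangent_plane` helper that returns the coefficients of $ax + by + cz = d$ for the sphere's $F$.

```python
def grad_F(x, y, z):
    """Gradient of F(x, y, z) = x^2 + y^2 + z^2 - 6."""
    return (2 * x, 2 * y, 2 * z)


def tangent_plane(point):
    """Return (a, b, c, d) for the plane a*x + b*y + c*z = d through `point`."""
    normal = grad_F(*point)
    d = sum(n * p for n, p in zip(normal, point))
    return (*normal, d)


a, b, c, d = tangent_plane((1, 1, 2))
print(a, b, c, d)  # 2 2 4 12, i.e. 2x + 2y + 4z = 12, or x + y + 2z = 6
```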

8. In gradient descent, the update is $\mathbf{x}_{k+1} = \mathbf{x}_k - \eta\,\nabla f(\mathbf{x}_k)$. Why the minus sign, and what role does $\eta$ play?

Answer

The minus sign sends each step in the direction $-\nabla f$, which is the direction of **steepest descent** (fastest decrease of $f$), so a sufficiently small step lowers the value of $f$. $\eta > 0$ is the **learning rate** (step size): too small means glacial convergence, too large means overshoot and possible divergence. (§30.8.)
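The effect of $\eta$ is easy to observe experimentally. The sketch below is an illustration, not the chapter's code: it assumes the test function $f(x, y) = x^2 + 4y^2$, the start point $(2, 1)$, and three arbitrarily chosen learning rates, then runs 50 updates with each.

```python
def f(x, y):
    return x**2 + 4 * y**2


def grad_f(x, y):
    return (2 * x, 8 * y)


def run_gradient_descent(eta, steps=50, start=(2.0, 1.0)):
    """Apply x_{k+1} = x_k - eta * grad f(x_k) and return the final f value."""
    x, y = start
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y - eta * gy
    return f(x, y)


# Small, moderate, and too-large learning rates (values chosen for illustration).
for eta in (0.001, 0.1, 0.3):
    print(eta, run_gradient_descent(eta))
```

With the smallest rate, $f$ has barely moved from its starting value of 8; the moderate rate drives it close to 0; the largest rate makes the $y$ coordinate grow in magnitude at every step, so $f$ blows up.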

9. For $f(x, y) = x^2 + 4y^2$, you stand at $(2, 1)$ with $\eta = 0.1$. Compute the result of one gradient-descent step.

Answer

$\nabla f = \langle 2x, 8y\rangle = \langle 4, 8\rangle$ at $(2,1)$. Step: $\mathbf{x}_1 = (2,1) - 0.1\langle 4,8\rangle = (2 - 0.4,\ 1 - 0.8) = (1.6,\ 0.2)$. Check $f$: $f(2,1)=4+4=8$, $f(1.6,0.2)=2.56+0.16=2.72$ — decreased. ✓ (§30.8.)
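The single step can be reproduced in a few lines. This sketch uses a hypothetical `gradient_descent_step` helper; floating-point arithmetic may show the second coordinate as very slightly different from 0.2.

```python
def f(x, y):
    return x**2 + 4 * y**2


def grad_f(x, y):
    return (2 * x, 8 * y)


def gradient_descent_step(point, eta):
    """One update: point - eta * grad f(point)."""
    gx, gy = grad_f(*point)
    return (point[0] - eta * gx, point[1] - eta * gy)


start = (2.0, 1.0)
new_point = gradient_descent_step(start, eta=0.1)
print(new_point)                   # approximately (1.6, 0.2)
print(f(*start), f(*new_point))    # f drops from 8 to about 2.72
```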

10. Explain in one or two sentences how the multivariable chain rule (§30.2) and the gradient (§30.3) together make training a neural network possible.

Answer

Training minimizes a loss $L(\boldsymbol\theta)$ over the parameters by gradient descent: $\boldsymbol\theta \leftarrow \boldsymbol\theta - \eta\nabla L$. The gradient $\nabla L$ (one partial per parameter) is computed by **backpropagation**, which is the reverse-mode multivariable chain rule applied to the network's computational graph. Chain rule produces the gradient; the gradient drives the descent. (§30.8–30.9.)
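A toy example makes the division of labor concrete. The sketch below assumes a deliberately tiny "network" with two scalar parameters, $\hat{y} = w_2 \tanh(w_1 x)$, and a squared-error loss; the model, data point, and hyperparameters are all invented for illustration. `backprop` applies the chain rule from the loss backward, and `train` feeds the resulting gradient into the descent update.

```python
import math


def forward(w1, w2, x):
    """Forward pass: hidden value h, then prediction y_hat."""
    h = math.tanh(w1 * x)
    y_hat = w2 * h
    return h, y_hat


def loss(w1, w2, x, target):
    _, y_hat = forward(w1, w2, x)
    return (y_hat - target) ** 2


def backprop(w1, w2, x, target):
    """Reverse-mode chain rule: start at the loss and work backward."""
    h, y_hat = forward(w1, w2, x)
    dL_dyhat = 2 * (y_hat - target)      # dL/dy_hat
    dL_dw2 = dL_dyhat * h                # y_hat = w2 * h
    dL_dh = dL_dyhat * w2
    dL_dw1 = dL_dh * (1 - h**2) * x      # d tanh(u)/du = 1 - tanh(u)^2
    return dL_dw1, dL_dw2


def train(w1, w2, x, target, eta=0.1, steps=200):
    """Gradient descent: theta <- theta - eta * grad L."""
    for _ in range(steps):
        g1, g2 = backprop(w1, w2, x, target)
        w1, w2 = w1 - eta * g1, w2 - eta * g2
    return w1, w2


w1, w2 = 0.5, -0.3
print("loss before:", loss(w1, w2, 1.0, 0.8))
w1, w2 = train(w1, w2, 1.0, 0.8)
print("loss after: ", loss(w1, w2, 1.0, 0.8))
```

Real networks have many more parameters and layers, but the structure is the same: the chain rule produces each partial, and the update rule consumes them.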

Scoring Guide

| Score | Interpretation |
|---|---|
| 9–10 | Mastery. You can compute gradients and directional derivatives fluently, reason geometrically about level sets and steepest ascent, and explain gradient descent. Move on to Chapter 31. |
| 7–8 | Solid. Re-check the one or two ideas you missed, most often normalization (Q4) or the gradient-is-normal fact (Q6). |
| 5–6 | Partial. Re-read §30.3–30.5 and redo Exercises 14–29 before continuing. |
| 0–4 | Revisit the chapter from §30.2. Focus on the worked examples and the Check Your Understanding boxes, then retry this quiz. |