\(\renewcommand{\chaptername}{Unit} \newcommand{\derivativehomeworklink}[1]{\href{http://db.tt/cSeKG8XO}{#1}} \newcommand{\chpname}{unit} \newcommand{\sageurlforcurvature}{http://bmw.byuimath.com/dokuwiki/doku.php?id=curvature_calculator} \newcommand{\uday}{ \LARGE Day \theunitday \normalsize \flushleft \stepcounter{unitday} } \newcommand{\sageworkurl}{http://bmw.byuimath.com/dokuwiki/doku.php?id=work_calculator} \newcommand{\sagefluxurl}{http://bmw.byuimath.com/dokuwiki/doku.php?id=flux_calculator} \newcommand{\sageworkfluxurl}{http://bmw.byuimath.com/dokuwiki/doku.php?id=both_flux_and_work} \newcommand{\sagelineintegral}{http://bmw.byuimath.com/dokuwiki/doku.php?id=line_integral_calculator} \newcommand{\sagephysicalpropertiestwod}{http://bmw.byuimath.com/dokuwiki/doku.php?id=physical_properties_in_2d} \newcommand{\sagephysicalpropertiesthreed}{http://bmw.byuimath.com/dokuwiki/doku.php?id=physical_properties_in_3d} \newcommand{\sageDoubleIntegralCheckerURL}{http://bmw.byuimath.com/dokuwiki/doku.php?id=double_integral_calculator} \newcommand{\myscale}{1} \newcommand{\ds}{\displaystyle} \newcommand{\dfdx}[1]{\frac{d#1}{dx}} \newcommand{\ddx}{\frac{d}{dx}} \newcommand{\ii}{\vec \imath} \newcommand{\jj}{\vec \jmath} \newcommand{\kk}{\vec k} \newcommand{\vv}{\mathbf{v}} \newcommand{\RR}{\mathbb{R}} \newcommand{\R}{ \mathbb{R}} \newcommand{\inv}{^{-1}} \newcommand{\im}{\text{im }} \newcommand{\colvec}[1]{\begin{bmatrix}#1\end{bmatrix} } \newcommand{\cl}[1]{ \begin{matrix} #1 \end{matrix} } \newcommand{\bm}[1]{ \begin{bmatrix} #1 \end{bmatrix} } \DeclareMathOperator{\rank}{rank} \DeclareMathOperator{\rref}{rref} \DeclareMathOperator{\vspan}{span} \DeclareMathOperator{\trace}{tr} \DeclareMathOperator{\proj}{proj} \DeclareMathOperator{\curl}{curl} \newcommand{\blank}[1]{[14pt]{\rule{#1}{1pt}}} \newcommand{\vp}{^{\,\prime}} \newcommand{\lt}{<} \newcommand{\gt}{>} \newcommand{\amp}{&} \)

Section 7.4 The Chain Rule

Objectives

In this section you will learn how to...

  • Compute partial and total derivatives of multivariable and vector functions:

    • Find derivatives of composite functions, using the chain rule (matrix multiplication).

We'll now see how the chain rule generalizes to all dimensions. Just as before, the first-semester calculus rule carries over once we replace \(f'\) with the matrix \(Df\text{.}\) Let's recall the chain rule from first-semester calculus: if \(g\) is differentiable at \(x\) and \(f\) is differentiable at \(g(x)\text{,}\) then \((f\circ g)'(x)=f'(g(x))g'(x)\text{.}\)

Some people remember the theorem above as “the derivative of a composition is the derivative of the outside (evaluated at the inside) multiplied by the derivative of the inside.” If \(u=g(x)\text{,}\) we sometimes write \(\ds \frac{df}{dx}=\frac{df}{du}\frac{du}{dx}\text{.}\) The following exercise should help us master this notation.

Review 7.4.1

Suppose we know that \(\ds f'(x) = \frac{\sin(x)}{2x^2+3}\) and \(g(x)=\sqrt{x^2+1}\text{.}\) Notice we don't know \(f(x)\text{.}\)

Not knowing a function \(f\) is actually quite common in real life. We can often measure how something changes (a derivative) without knowing the function itself.

(a)

State \(f'(x)\) and \(g'(x)\text{.}\)

(b)

State \(f'(g(x))\text{,}\) and explain the difference between \(f'(x)\) and \(f'(g(x))\text{.}\)

(c)

Use the chain rule to compute \((f\circ g)'(x)\text{.}\)

Subsection 7.4.1 Higher Dimensional Chain Rule

We now generalize to higher dimensions. If we want to write \(\vec f(\vec g(\vec x))\text{,}\) then \(\vec x\) must be a vector in the domain of \(\vec g\text{.}\) After computing \(\vec g(\vec x)\text{,}\) we must get a vector that is in the domain of \(\vec f\text{.}\)

Since the chain rule in first-semester calculus states \((f(g(x)))'=f'(g(x))g'(x)\text{,}\) in higher dimensions it should state \(D(f(g(x))) = Df(g(x))Dg(x)\text{,}\) the product of two matrices.

Exercise 7.4.2

In Exercise 7.2.1, we showed that for a circular cylinder with volume \(V=\pi r^2 h\text{,}\) the derivative is

\begin{equation*} DV(r,h)=\begin{bmatrix}2\pi rh \amp \pi r^2 \end{bmatrix} . \end{equation*}

Recall that to get this derivative we assumed that the radius and height are both changing with respect to time. Now we actually use functions for each, letting \(r=3t\) and \(h=t^2\text{.}\) We'll write this parametrically as \(\vec x (t) = (r,h)(t) = (3t, t^2)\text{.}\)

(a)

In \(V=\pi r^2 h\text{,}\) replace \(r\) and \(h\) with what they are in terms of \(t\text{.}\)

(b)

Compute \(\dfrac{dV}{dt}\text{.}\)

(c)

Find the derivative of \(\vec x (t)\text{,}\) i.e. the derivative of \((r,h)(t)\text{.}\)

Hint

The output should be a \(2\times 1\) matrix.

(d)

We know \(DV(r,h)\) and \(D(r,h)(t)\text{.}\) In first-semester calculus, the chain rule was the product of derivatives. Multiply these matrices together to find \(\frac{dV}{dt}\text{.}\) That is, compute:

\begin{equation*} \dfrac{dV}{dt}=DV((r,h)(t))\cdot D(r,h)(t). \end{equation*}

(Did you get the same answer as in part (b)?)

For the results in parts (b) and (d) to match, you had to replace \(r\) and \(h\) with what they equaled in terms of \(t\text{.}\)

  • What part of the notation \(\dfrac{dV}{dt}=DV((r,h)(t))\cdot D(r,h)(t)\) tells you to replace \(r\) and \(h\) with what they equal in terms of \(t\text{?}\)
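One way to check your work on this exercise is numerically. The sketch below is only an illustration, not part of the exercise: the function names, the sample time `t0`, and the finite-difference step are all assumptions chosen for the check. It evaluates \(DV\) at \((r,h)(t)\text{,}\) multiplies by \(D(r,h)(t)\text{,}\) and compares with a difference quotient of \(V\) written directly in terms of \(t\text{.}\)

```python
import math

def volume(t):
    """V after substituting r = 3t and h = t**2 into V = pi * r**2 * h."""
    r, h = 3 * t, t ** 2
    return math.pi * r ** 2 * h

def dV_dt_chain(t):
    """Chain rule: the 1x2 matrix DV(r, h) times the 2x1 matrix D(r, h)(t)."""
    r, h = 3 * t, t ** 2                           # evaluate DV at the point (r, h)(t)
    DV = [2 * math.pi * r * h, math.pi * r ** 2]   # [dV/dr, dV/dh]
    Dx = [3, 2 * t]                                # [dr/dt, dh/dt]
    return DV[0] * Dx[0] + DV[1] * Dx[1]

def dV_dt_numeric(t, step=1e-6):
    """Central difference approximation of dV/dt, used only as a sanity check."""
    return (volume(t + step) - volume(t - step)) / (2 * step)

t0 = 2.0  # arbitrary sample time (an assumption for this check)
print(dV_dt_chain(t0), dV_dt_numeric(t0))
```

The two printed values should agree to several decimal places. Notice that `dV_dt_chain` computes `r` and `h` from `t` before building `DV`; that substitution is what the notation \(DV((r,h)(t))\) asks for.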

Subsection 7.4.2 Using the Chain Rule

Let's look at some physical examples involving motion and temperature, and try to connect what we know should happen to what the chain rule states.

Exercise 7.4.3

Consider \(f(x,y)=9-x^2-y^2\) and \(\vec r(t)=(2\cos t, 3\sin t)\text{.}\) Imagine the following scenario. A horse runs around outside in the cold. The horse's position at time \(t\) is given parametrically by the elliptical path \(\vec r(t)\text{.}\) The function \(T=f(x,y)\) gives the temperature of the air at any point \((x,y)\text{.}\)

(a)

At time \(t=0\text{,}\) what is the horse's position \(\vec r(0)\text{,}\) and what is the temperature \(f(\vec r(0))\) at that position?

(b)

Find the temperatures at \(t=\pi/2\text{,}\) \(t=\pi\text{,}\) and \(t=3\pi/2\) as well.

(c)

In the plane, draw the path of the horse for \(t\in [0,2\pi]\text{.}\)

(d)

On the same 2D graph, include a contour plot of the temperature function \(f\text{.}\) Make sure you include the level curves that pass through the points in part (a), and write the temperature on each level curve you draw.

If you end up with an ellipse and several concentric circles, then you've done this right.

(e)

As the horse runs around, the temperature of the air around the horse is constantly changing. At which \(t\) does the temperature around the horse reach a maximum? A minimum? Explain, using your graph.

This idea leads to an optimization technique, Lagrange multipliers, later in the semester.

(f)

As the horse moves past the point at \(t=\pi/4\text{,}\) is the temperature of the surrounding air increasing or decreasing? In other words, is \(\dfrac{df}{dt}\) positive or negative? Use your graph to explain.

(g)

We'll complete this part in class, but you're welcome to give it a try yourself. Draw the 3D surface plot of \(f\text{.}\) In the \(xy\)-plane of your 3D plot (so \(z=0\)) add the path of the horse. In class, we'll project the path of the horse up into the 3D surface.

Exercise 7.4.4

Consider again \(f(x,y)=9-x^2-y^2\) and \(\vec r(t)=(2\cos t, 3\sin t)\text{,}\) which means \(x=2\cos t\) and \(y=3\sin t\text{.}\)

(a)

At the point \(\vec r(t)\text{,}\) we'd like a formula for the temperature \(f(\vec r(t))\text{.}\) What is the temperature of the horse at any time \(t\text{?}\) [In \(f(x,y)\text{,}\) replace \(x\) and \(y\) with what they are in terms of \(t\text{.}\)]

(b)

Compute \(df/dt\) (the derivative as you did in first-semester calculus).

(c)

Construct a graph of \(f(t)\) (use software to draw this if you like). From your graph, at what time values do the maxima and minima occur?

(d)

What is \(df/dt\) at \(t=\pi/4\text{?}\)

(e)

Compare your work with the previous exercise.

Exercise 7.4.5

Consider again \(f(x,y)=9-x^2-y^2\) and \(\vec r(t)=(2\cos t, 3\sin t)\text{.}\)

(a)

Compute both \(Df(x,y)\) and \(D\vec r(t)\) as matrices. One should have two columns. The other should have one column (but two rows).

(b)

We can write the temperature at any time \(t\) symbolically as \(f(\vec r(t))\text{.}\) First-semester calculus suggests the derivative should be the product \((f(\vec r(t))) ' = f'(\vec r(t))\vec r'(t)\text{.}\)

(i)

Write this using \(D\) notation instead of prime notation.

(ii)

Compute the matrix product \(DfD\vec r\text{,}\) and then substitute \(x=2\cos t\) and \(y=3\sin t\text{.}\)

(c)

What is the change in temperature with respect to time at \(t=\pi/4\text{?}\) Is it positive or negative? Compare with the previous exercise.
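If you'd like to confirm your sign and value at \(t=\pi/4\) numerically, here is a minimal sketch. The function names and the finite-difference step are assumptions made for this illustration. It forms the product \(Df(\vec r(t))\,D\vec r(t)\) and compares it with a difference quotient of the temperature along the path.

```python
import math

def position(t):
    """The horse's path r(t) = (2 cos t, 3 sin t)."""
    return (2 * math.cos(t), 3 * math.sin(t))

def temperature(x, y):
    """The temperature f(x, y) = 9 - x**2 - y**2."""
    return 9 - x ** 2 - y ** 2

def df_dt_chain(t):
    """Matrix product Df(r(t)) * Dr(t): a 1x2 matrix times a 2x1 matrix."""
    x, y = position(t)                          # substitute x = 2 cos t, y = 3 sin t
    Df = [-2 * x, -2 * y]                       # [df/dx, df/dy]
    Dr = [-2 * math.sin(t), 3 * math.cos(t)]    # [dx/dt, dy/dt]
    return Df[0] * Dr[0] + Df[1] * Dr[1]

def df_dt_numeric(t, step=1e-6):
    """Central difference of f(r(t)), used only as a sanity check."""
    before = temperature(*position(t - step))
    after = temperature(*position(t + step))
    return (after - before) / (2 * step)

t0 = math.pi / 4
print(df_dt_chain(t0), df_dt_numeric(t0))
```

Both numbers should be negative and close to each other, which matches what the contour plot suggests: at \(t=\pi/4\) the horse is moving away from the warm center of the level curves.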

Subsection 7.4.3 Formally Defining the Chain Rule

The previous three exercises all focused on exactly the same concept. The first looked at the concept graphically, showing what it means to write \((f\circ \vec r)(t)=f(\vec r(t))\text{.}\) The second reduced the exercise to first-semester calculus. The third tackled the exercise by considering matrix derivatives. In all three cases, we wanted to understand the following question:

If \(z=f(x,y)\) is a function of \(x\) and \(y\text{,}\) and both \(x\) and \(y\) are functions of \(t\) (so \(\vec r(t)=(x(t),y(t))\)), then how do we discover how quickly \(f\) changes as we change \(t\text{?}\) In other words, what is the derivative of \(f\) with respect to \(t\text{?}\) Notationally, we seek \(\ds \frac{df}{dt}\text{,}\) which we formally write as \(\ds \frac{d}{dt}[f(x(t),y(t))]\) or \(\ds \frac{d}{dt} [f(\vec r(t))].\)

To answer this question, we use the chain rule, which is just matrix multiplication: \(D(\vec f\circ \vec g)(\vec x) = D\vec f(\vec g(\vec x))\,D\vec g(\vec x)\text{.}\)

This is exactly the same as the chain rule in first-semester calculus. The only difference is that now we have vector arrows above every variable and function, and we replaced the one-by-one matrices \(f'\) and \(g'\) with potentially larger matrices \(Df\) and \(Dg\text{.}\) If we write everything in vector notation, the chain rule in all dimensions is exactly the same as the chain rule in one dimension.

Subsection 7.4.4 Practicing with the Chain Rule

Exercise 7.4.6

Suppose that \(f(x,y) = x^2+xy\) and that \(x=2t+3\) and \(y=3t^2+4\text{.}\)

(a)

Rewrite the parametric equations \(x=2t+3\) and \(y=3t^2+4\) in vector form, so we can apply the chain rule. This means you need to create a function \(\vec r(t) = (?, ?)\text{.}\)

(b)

Compute the derivatives \(Df(x,y)\) and \(D\vec r(t)\text{,}\) and then multiply the matrices together to obtain \(\dfrac{df}{dt}\text{.}\)

(c)

How can you make your answer only depend on \(t\) (not \(x\) or \(y\))? Do so.

(d)

The chain rule states that \(D(f\circ \vec r)(t) = Df(\vec r(t))D\vec r(t)\text{.}\) Explain why we write \(Df(\vec r(t))\) instead of \(Df(x,y)\text{.}\)

If you'd like to make sure you are correct, try the following. Replace \(x\) and \(y\) in \(f=x^2+xy\) with what they are in terms of \(t\text{,}\) and then just use first-semester calculus to find \(df/dt\text{.}\) Is it the same?

Exercise 7.4.7

Suppose \(f(x,y,z) = x+2y+3z^2\) and \(x=u+v\text{,}\) \(y=2u-3v\text{,}\) and \(z=uv\text{.}\) Our goal is to find how much \(f\) changes if we were to change \(u\) (so \(\partial f/\partial u\)) or if we were to change \(v\) (so \(\partial f/\partial v\)). Try doing this exercise without looking at the steps below, but instead try to follow the patterns in the previous exercise on your own.

(a)

Rewrite the equations for \(x,y,\) and \(z\) in vector form \(\vec r(u,v)=(x,y,z)\text{.}\)

(b)

Compute the derivatives \(Df(x,y,z)\) and \(D\vec r(u,v)\text{,}\) and then multiply them together. Notice that since this composite function has 2 inputs, namely \(u\) and \(v\text{,}\) we should expect to get two columns when we are done.

(c)

What are \(\partial f/\partial u\) and \(\partial f/\partial v\text{?}\)

Hint

Remember, each input variable gets a column.
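To see the "one column per input variable" pattern concretely, the following sketch multiplies the two derivative matrices from this exercise at a single sample point. The helper `matmul`, the other function names, and the sample values `u0`, `v0` are assumptions made for illustration; the matrix entries come from differentiating \(f\) and \(\vec r\) by hand.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]

def r(u, v):
    """The parametrization r(u, v) = (x, y, z) = (u + v, 2u - 3v, uv)."""
    return (u + v, 2 * u - 3 * v, u * v)

def Df_at_r(u, v):
    """Df(x, y, z) = [1, 2, 6z], evaluated at the point r(u, v)."""
    x, y, z = r(u, v)
    return [[1, 2, 6 * z]]

def Dr(u, v):
    """Dr(u, v): column 1 holds u-partials, column 2 holds v-partials."""
    return [[1, 1],
            [2, -3],
            [v, u]]

def D_composite(u, v):
    """Chain rule: a 1x3 matrix times a 3x2 matrix gives a 1x2 matrix."""
    return matmul(Df_at_r(u, v), Dr(u, v))

u0, v0 = 1.0, 2.0  # arbitrary sample point (an assumption for this check)
print(D_composite(u0, v0))
```

The result has one row and two columns. The first entry is \(\partial f/\partial u\) and the second is \(\partial f/\partial v\text{,}\) each evaluated at the sample point.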

Exercise 7.4.8

Let \(\vec F(s,t) = (2s+t,3s-4t,t)\) and \(s=3pq\) and \(t=2p+q^2\text{.}\) This means that changing \(p\) and/or \(q\) should cause \(\vec F\) to change. Our goal is to find \(\partial \vec F/\partial p\) and \(\partial \vec F/\partial q\text{.}\) Note that since \(\vec F\) is a vector-valued function, the two partial derivatives should be vectors. Try doing this exercise without looking at the steps below, but instead try to follow the patterns in the previous exercises.

(a)

Rewrite the parametric equations for \(s\) and \(t\) in vector form.

(b)

Compute \(D\vec F(s,t)\) and the derivative of your vector function from part (a), and then multiply them together to find the derivative of \(\vec F\) with respect to \(p\) and \(q\text{.}\) How many columns should we expect to have when we are done multiplying matrices?

(c)

What are \(\partial \vec F/\partial p\) and \(\partial \vec F/\partial q\text{?}\)

Review 7.4.9

Suppose \(f(x,y)=x^2+3xy\) and \((x,y) = \vec r(t) = (3t,t^2)\text{.}\) Compute both \(Df(x,y)\) and \(D\vec r(t)\text{.}\) Then explain how you got your answer by writing what you did in terms of partial derivatives and regular derivatives.

Answer: We have \(Df(x,y) = \begin{bmatrix}2x+3y\amp 3x \end{bmatrix}\) and \(D\vec r(t) = \begin{bmatrix}3\\2t \end{bmatrix}\text{.}\) We just computed \(f_x\) and \(f_y\text{,}\) and \(dx/dt\) and \(dy/dt\text{,}\) which gave us \(Df(x,y) = \begin{bmatrix}\partial f/\partial x\amp \partial f/\partial y \end{bmatrix}\) and \(D\vec r(t) = \begin{bmatrix}dx/dt\\dy/dt \end{bmatrix}\text{.}\)

Subsection 7.4.5 The General Chain Rule

Exercise 7.4.10 General Chain Rule Formulas

Complete the following:

(a)

Suppose that \(w=f(x,y,z)\) and that \(x,y,z\) are all functions of one variable \(t\) (so \(x=g(t), y=h(t), z=k(t)\)). Use the chain rule with matrix multiplication to explain why

\begin{equation*} \frac{dw}{dt} = \frac{\partial f}{\partial x}\frac{dg}{dt}+\frac{\partial f}{\partial y}\frac{dh}{dt}+\frac{\partial f}{\partial z}\frac{dk}{dt} , \end{equation*}

which is equivalent to writing

\begin{equation*} \frac{dw}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}+\frac{\partial f}{\partial z}\frac{dz}{dt} . \end{equation*}
Hint

Rewrite the parametric equations for \(x\text{,}\) \(y\text{,}\) and \(z\) in vector form \(\vec r(t) = (x,y,z)\) and compute \(Dw(x,y,z)\) and \(D\vec r(t)\text{.}\)

(b)

Suppose that \(R=f(V,T,n,P)\text{,}\) and that \(V,T,n,P\) are all functions of \(x\text{.}\) Give a formula (similar to the above) for \(\dfrac{dR}{dx}.\)

Exercise 7.4.11

Suppose \(z=f(s,t)\) and \(s\) and \(t\) are functions of \(u\text{,}\) \(v\) and \(w\text{.}\) Use the chain rule to give a general formula for \(\partial z/\partial u\text{,}\) \(\partial z/\partial v\text{,}\) and \(\partial z/\partial w\text{.}\)

Review 7.4.12

If \(w=f(x,y,z)\) and \(x,y,z\) are functions of \(u\) and \(v\text{,}\) obtain formulas for \(\dfrac{\partial f}{\partial u}\) and \(\dfrac{\partial f}{\partial v}\text{.}\)

Answer: We have \(Df(x,y,z) =\begin{bmatrix}\dfrac{\partial f}{\partial x}\amp \dfrac{\partial f}{\partial y}\amp \dfrac{\partial f}{\partial z} \end{bmatrix}\text{.}\) The parametrization \(\vec r(u,v)=(x,y,z)\) has derivative \(D\vec r =\begin{bmatrix}\dfrac{\partial x}{\partial u}\amp \dfrac{\partial x}{\partial v}\\ \dfrac{\partial y}{\partial u}\amp \dfrac{\partial y}{\partial v}\\ \dfrac{\partial z}{\partial u}\amp \dfrac{\partial z}{\partial v} \end{bmatrix}\text{.}\) The product is \(D(f(\vec r(u,v))) =\begin{bmatrix}\dfrac{\partial f}{\partial x}\dfrac{\partial x}{\partial u}+ \dfrac{\partial f}{\partial y}\dfrac{\partial y}{\partial u}+ \dfrac{\partial f}{\partial z}\dfrac{\partial z}{\partial u}\amp \dfrac{\partial f}{\partial x}\dfrac{\partial x}{\partial v}+ \dfrac{\partial f}{\partial y}\dfrac{\partial y}{\partial v}+ \dfrac{\partial f}{\partial z}\dfrac{\partial z}{\partial v} \end{bmatrix}\text{.}\) The first column is \(\dfrac{\partial f}{\partial u}\text{,}\) and the second column is \(\dfrac{\partial f}{\partial v}\text{.}\)

You've now got the key ideas needed to use the chain rule in all dimensions. You'll find this shows up in many places in upper-level math, physics, and engineering courses. The following exercise will show you how you can use the general chain rule to get an extremely quick way to perform implicit differentiation from first-semester calculus.

Exercise 7.4.13

Suppose \(z=f(x,y)\text{.}\) If \(z\) is held constant, this produces a level curve. As an example, if \(f(x,y) = x^2+3xy-y^3\) then \(5=x^2+3xy-y^3\) is a level curve. Our goal in this exercise is to find \(dy/dx\) in terms of partial derivatives of \(f\text{.}\)

Suppose \(x=x\) and \(y=y(x)\text{,}\) so \(y\) is a function of \(x\text{.}\) We can write this in parametric form as \(\vec r(x) = (x,y(x))\text{.}\) We now have \(z=f(x,y)\) and \(\vec r(x)=(x,y(x))\text{.}\)

To practice the idea developed in this exercise, show that if \(w=F(x,y,z)\) is held constant at \(w=c\) and we assume that \(z=f(x,y)\) depends on \(x\) and \(y\text{,}\) then \(\frac{\partial z}{\partial x} = -\frac{F_x}{F_z}\) and \(\frac{\partial z}{\partial y} = -\frac{F_y}{F_z}\text{.}\)

(a)

Compute both \(Df(x,y)\) and \(D\vec r(x)\) symbolically. Don't use the function \(f(x,y)=x^2+3xy-y^3\) until the last step.

(b)

Use the chain rule to compute \(D(f(\vec r(x)))\text{.}\) What is \(dz/dx\) (i.e., \(df/dx\))?

(c)

Since \(z\) is held constant, we know that \(dz/dx=0\text{.}\) Use this fact, together with part (b), to explain why \(\ds \frac{dy}{dx} = -\frac{f_x}{f_y} = -\frac{\partial f/ \partial x}{\partial f/ \partial y}\text{.}\)

(d)

For the curve \(5=x^2+3xy-y^3\text{,}\) use this formula to compute \(dy/dx\text{.}\)
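The formula \(dy/dx=-f_x/f_y\) can be sanity-checked numerically at a point on the curve. The sketch below is an illustration under stated assumptions: the point \((4,-1)\) was chosen because it satisfies \(5=x^2+3xy-y^3\text{,}\) and the helper `solve_y` (a few steps of Newton's method in \(y\)) and the step size are not part of the exercise.

```python
def f(x, y):
    """f(x, y) = x**2 + 3xy - y**3; the level curve of interest is f = 5."""
    return x ** 2 + 3 * x * y - y ** 3

def dy_dx_formula(x, y):
    """Implicit differentiation via the chain rule: dy/dx = -f_x / f_y."""
    f_x = 2 * x + 3 * y
    f_y = 3 * x - 3 * y ** 2
    return -f_x / f_y

def solve_y(x, y_guess, level=5.0, iterations=50):
    """Newton's method in y for f(x, y) = level, starting near y_guess."""
    y = y_guess
    for _ in range(iterations):
        y -= (f(x, y) - level) / (3 * x - 3 * y ** 2)
    return y

x0, y0 = 4.0, -1.0   # a point on the curve: 16 - 12 + 1 = 5
step = 1e-5          # assumed step for the difference quotient

# Slope of the curve estimated by following y(x) on both sides of x0.
slope_numeric = (solve_y(x0 + step, y0) - solve_y(x0 - step, y0)) / (2 * step)
print(dy_dx_formula(x0, y0), slope_numeric)
```

The formula needs only two partial derivatives, while the numerical estimate has to re-solve the equation for \(y\) at nearby values of \(x\text{.}\) The two values should agree to several decimal places.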

Subsection 7.4.6 Computational Practice

These exercises are provided to help you build basic computational skills.
