0.1 The Chain Rule

A basic illustration of the chain rule comes from thinking about runners in a race. Suppose two brothers, Mark and Brian, hold an annual race to see who is the fastest. Last year Mark won the race, so this year he will be considered the pace setter. Let $M$ and $B$ be functions that represent the position of each of the brothers at any given time. Since Mark is the pace setter, his position depends only on time, so we write $M(t)$. In contrast, we measure Brian's velocity (and thus position) relative to Mark's, so we write $B(M(t))$. If Brian is running 2/3 as fast as Mark, then Brian's velocity is simply

\[ \frac{dB}{dt} = \frac{2}{3} \cdot \frac{dM}{dt}, \]

meaning that Brian's position changes at a rate 2/3 that of Mark's. Knowing Mark's velocity (the instantaneous rate of change of his position with respect to time), we can find Brian's velocity just by multiplying by 2/3, the rate at which Brian's position is changing relative to Mark's. Thus, if Mark is running at 10 mph, it follows that Brian is running at $20/3 \approx 6.67$ mph. Written symbolically, we observe

\[ \frac{dB}{dt} = \frac{dB}{dM} \cdot \frac{dM}{dt}. \]

This is called the chain rule. It states that if we know the rate of change of Brian's position with respect to Mark's and we know the rate of change of Mark's position with respect to time, we can find the rate of change of Brian's position with respect to time simply by multiplying them together. If we add a third runner to the race, Kevin, who is running twice as fast as Brian, then his velocity is given by

\[ \frac{dK}{dt} = 2 \cdot \frac{dB}{dt} = 2 \cdot \frac{dB}{dM} \cdot \frac{dM}{dt}. \]

Noting that 2 is just the rate of change of Kevin's position with respect to Brian's, we find that

\[ \frac{dK}{dt} = \frac{dK}{dB} \cdot \frac{dB}{dM} \cdot \frac{dM}{dt}. \]

In this way we can chain together any number of functions, and find the overall rate of change by looking at the relative rates of change at each step. Note that here we have the composition $K(B(M(t)))$, so we first differentiate $K$ with respect to $B$, then $B$ with respect to $M$, and finally $M$ with respect to $t$.
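The arithmetic of the race example can be sketched in a few lines of Python. This is only an illustration: the variable names are invented here, and the rates are the ones quoted in the text (Mark at 10 mph, Brian at 2/3 of Mark's rate, Kevin at twice Brian's rate).

```python
from fractions import Fraction

# Relative rates from the race example (all values taken from the text).
dM_dt = Fraction(10)     # Mark's speed in mph
dB_dM = Fraction(2, 3)   # rate of change of Brian's position relative to Mark's
dK_dB = Fraction(2)      # rate of change of Kevin's position relative to Brian's

# Chain rule: multiply the relative rates along the chain K -> B -> M -> t.
dB_dt = dB_dM * dM_dt
dK_dt = dK_dB * dB_dM * dM_dt

print(f"Brian: {dB_dt} mph (about {float(dB_dt):.2f})")
print(f"Kevin: {dK_dt} mph (about {float(dK_dt):.2f})")
```

Exact fractions are used so that 20/3 and 40/3 appear without rounding; the decimal values are printed only for comparison with the 6.67 mph quoted above.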

Theorem 0.1.1 (The Chain Rule). Suppose $f$ is differentiable at $g(x)$ and $g$ is differentiable at $x$. It follows that the composition $f \circ g$ is differentiable at $x$, and

\[ \frac{d}{dx}(f \circ g)(x) = f'(g(x)) \cdot g'(x). \]

In the composition $f(g(x))$ we call $f$ the outer function and $g$ the inner function. The basic mechanism of the chain rule is as follows: differentiate the outer function while treating the inner function as a single variable, evaluate the result at the inner function, then multiply by the derivative of the inner function. If there is a composition of more than two functions, the above process is simply repeated as many times as necessary.
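One way to see the mechanism at work is to compare the chain-rule formula against a numerical derivative. The sketch below is not part of the original notes: the pair $f(u) = u^3$, $g(x) = x^2 + 1$, the sample point, and the helper `central_difference` are all chosen here purely for illustration.

```python
def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Illustrative pair (an assumption, not from the text):
# outer f(u) = u^3, inner g(x) = x^2 + 1.
def f(u):
    return u ** 3

def f_prime(u):
    return 3 * u ** 2

def g(x):
    return x ** 2 + 1

def g_prime(x):
    return 2 * x

def chain_rule_derivative(x):
    """f'(g(x)) * g'(x): outer derivative at the inner function, times inner derivative."""
    return f_prime(g(x)) * g_prime(x)

x0 = 0.7
analytic = chain_rule_derivative(x0)
numeric = central_difference(lambda x: f(g(x)), x0)
print(analytic, numeric)
```

The two printed values should agree to several decimal places; the small remaining gap comes from the finite step size `h`, not from the chain rule.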

Example 1. Find the derivative of $e^{\alpha t}$ (with respect to $t$), $\alpha \in \mathbb{R}$.

Solution The above function is a composition of two functions, $e^u$ and $u = \alpha t$. Thus, we can apply the chain rule. We take the derivative of the outer function (which is $e^u$), evaluate the result at the inner function ($u = \alpha t$), differentiate the inner function (yielding $\alpha$), and then multiply the results.

\[ \frac{d}{dt} e^{\alpha t} = e^{\alpha t} \cdot \alpha = \alpha e^{\alpha t}. \]

Thus, to find the derivative of an exponential function whose argument is multiplied by a constant, simply multiply the exponential function by that constant. This is consistent with the derivative of the exponential function itself, where that constant is simply 1.

Example 2. Find the derivative of $e^{x^2}$.

Solution Once again we can apply the chain rule, to the composition of $e^u$ and $u = x^2$. We take the derivative of the outer function (which is $e^u$), evaluate the result at the inner function ($u = x^2$), differentiate the inner function (yielding $2x$), and then multiply the results.

\[ \frac{d}{dx} e^{x^2} = e^{x^2} \cdot 2x. \]

Example 3. Find the derivative of $a^t$ (with respect to $t$), where $a \in \mathbb{R}^+$.

Solution Using the chain rule we can find the derivative of a base $a$ exponential using the derivative of the base $e$ exponential. Write $a = e^{\ln(a)}$, which can be done because the exponential function and natural logarithm are inverses (as long as $a$ is a positive real number, as we have not defined the natural logarithm for negative numbers). Now we take the derivative using the chain rule, finding

\[ \frac{d}{dt} a^t = \frac{d}{dt} e^{\ln(a) t} = e^{\ln(a) t} \cdot \ln(a) = \ln(a) \cdot a^t. \]

Example 4. Find the derivative of $\frac{1}{1 + y^2}$ (with respect to $y$).

Solution In order to solve this problem it is helpful to first rewrite the function as $(1 + y^2)^{-1}$. Then we can see there is a composition of two functions, $u^{-1}$ and $u = 1 + y^2$. Applying the chain rule we find

\[ \frac{d}{dy}(1 + y^2)^{-1} = -(1 + y^2)^{-2} \cdot 2y = \frac{-2y}{(1 + y^2)^2}. \]

Example 5. Find the derivative of $e^{e^{2x+1}}$.

Solution In this case we're actually looking at a composition of three functions we know how to differentiate. How can we apply the chain rule in this case? Let us choose our two functions as $e^u$ and $u = e^{2x+1}$. The chain rule tells us

\[ \frac{d}{dx} e^{e^{2x+1}} = \frac{d}{du} e^u \cdot \frac{d}{dx} e^{2x+1} = e^{e^{2x+1}} \cdot \frac{d}{dx} e^{2x+1}. \]

Now we need to find the derivative of $e^{2x+1}$ to complete the problem. Choosing $e^u$ and $u = 2x + 1$ we find

\[ \frac{d}{dx} e^{2x+1} = e^{2x+1} \cdot 2 = 2e^{2x+1}. \]

Substituting this into our first equation we find

\[ \frac{d}{dx} e^{e^{2x+1}} = 2e^{e^{2x+1}} \cdot e^{2x+1}. \]

The above example illustrates an extremely useful observation we made earlier. If we have a composition of three functions, to find the derivative we simply multiply the derivatives of the three component functions (at each respective step) together. Thus, we find that

\[ \frac{d}{dx} f(g(h(x))) = \frac{df}{dg} \cdot \frac{dg}{dh} \cdot \frac{dh}{dx}, \]

where we evaluate $df/dg$ at $g(h(x))$ and $dg/dh$ at $h(x)$. We can easily extend the formula above to a composition of as many functions as we like.
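The extension to any number of functions can be sketched as a loop that carries along the current value and the running product of derivatives. This is a hypothetical helper written for illustration (the name `chain_derivative` and the list-of-pairs representation are assumptions, not notation from the text); it is checked against the closed form found in Example 5.

```python
import math

def chain_derivative(stages, x):
    """Derivative at x of a composition whose stages are listed innermost first.

    `stages` is a list of (function, derivative) pairs, so
    [(h, h'), (g, g'), (f, f')] represents f(g(h(x))).
    """
    value = x
    derivative = 1.0
    for func, func_prime in stages:
        # Each derivative is evaluated at the output of the previous stage.
        derivative *= func_prime(value)
        value = func(value)
    return derivative

# Example 5 as a three-stage chain: h(x) = 2x + 1, g(v) = e^v, f(u) = e^u.
stages = [
    (lambda x: 2 * x + 1, lambda x: 2.0),
    (math.exp, math.exp),
    (math.exp, math.exp),
]

x0 = 0.1
from_chain = chain_derivative(stages, x0)
closed_form = 2 * math.exp(math.exp(2 * x0 + 1)) * math.exp(2 * x0 + 1)
print(from_chain, closed_form)
```

Listing the stages innermost first means each derivative is automatically evaluated at the right point, which is the bookkeeping the formula above asks for.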

Example 6. Find the derivative of $(x^2 + 1)^3 + 2$.

Solution There is nothing specific about the chain rule that tells us how we must choose our inner and outer functions; the only requirement is that we choose functions that we know how to differentiate (even if we need to use the chain rule again to do so). Let us choose $u^3 + 2$ and $u = x^2 + 1$ as our functions, because we know how to differentiate both of them. We find

\[ \frac{d}{dx}\left((x^2 + 1)^3 + 2\right) = 3(x^2 + 1)^2 \cdot 2x. \]

Choosing those two functions as our chain of functions worked well because we knew how to differentiate both of them. Nevertheless, another chain of functions would work just as well. Let us choose $u^3 + 2$, $u = v + 1$ and $v = x^2$. Then we find

\[ \frac{d}{dx}\left((x^2 + 1)^3 + 2\right) = 3(x^2 + 1)^2 \cdot 1 \cdot 2x = 3(x^2 + 1)^2 \cdot 2x, \]

where $du/dv = 1$ and $dv/dx = 2x$. This result is exactly the same as our previous result, as it must be. The lesson to be learned here is that the most important thing to consider with the chain rule is choosing functions that are easy to differentiate. As long as the composition or chain of functions chosen to represent the original function is equivalent, it doesn't matter how many functions are chosen.

Example 7. Find the derivative of $y = (1 + a^{2t})^{-4}$ with respect to $t$.

Solution Letting our representative functions be $u^{-4}$, $u = 1 + a^v$ and $v = 2t$ we find

\[ \frac{dy}{dt} = -4(1 + a^{2t})^{-5} \cdot \ln(a) a^{2t} \cdot 2 = -\frac{8 \ln(a) a^{2t}}{(1 + a^{2t})^5}. \]

As stated in the previous section, while it is hard to find an intuitive justification for the quotient rule, it follows simply enough from the chain and product rules. We begin by writing

\[ \frac{u(x)}{v(x)} = u(x) \cdot (v(x))^{-1}, \]

where $(v(x))^{-1}$ denotes $v(x)$ raised to the power $-1$, not an inverse function. Differentiating we find

\[
\begin{aligned}
\frac{d}{dx}\left(u(x) \cdot (v(x))^{-1}\right) &= u'(x) \cdot (v(x))^{-1} + u(x) \cdot \left(-(v(x))^{-2} \cdot v'(x)\right) \\
&= \frac{(v(x))^2}{(v(x))^2} \cdot \left[u'(x) \cdot (v(x))^{-1} + u(x) \cdot \left(-(v(x))^{-2} \cdot v'(x)\right)\right] \\
&= \frac{u'(x)v(x) - u(x)v'(x)}{(v(x))^2}.
\end{aligned}
\]

In fact, one may sometimes find in practice that it is simply easier to use the product and chain rules in conjunction, rather than working with (and remembering) the quotient rule.
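The agreement between the two routes can also be checked numerically. In the sketch below the functions $u(x) = e^x$ and $v(x) = 1 + x^2$ are hypothetical choices made only for this illustration; any differentiable pair with $v(x) \neq 0$ would serve.

```python
import math

# Illustrative pair (an assumption, not from the text): u(x) = e^x, v(x) = 1 + x^2.
def u(x):
    return math.exp(x)

def du(x):
    return math.exp(x)

def v(x):
    return 1 + x ** 2

def dv(x):
    return 2 * x

def via_product_and_chain(x):
    """u' * v^(-1) + u * (-(v)^(-2) * v'), the first line of the derivation."""
    return du(x) * v(x) ** -1 + u(x) * (-(v(x) ** -2) * dv(x))

def via_quotient_rule(x):
    """(u' v - u v') / v^2, the final line of the derivation."""
    return (du(x) * v(x) - u(x) * dv(x)) / v(x) ** 2

for x in (-1.5, 0.0, 2.0):
    print(x, via_product_and_chain(x), via_quotient_rule(x))
```

Both columns should match up to floating-point rounding, reflecting that the quotient rule is the product and chain rules written over a common denominator.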

There is a final application of the chain rule which is extremely useful for increasing the number of functions we can differentiate. If we have a function $f$ and its inverse $f^{-1}$, by definition

\[ (f^{-1} \circ f)(x) = (f \circ f^{-1})(x) = x. \]

Now if two sides of an expression are equal, it follows that they must remain equal after differentiation (after all, how can the same function have two different derivatives?). Using this observation along with the chain rule, we find that

\[ (f \circ f^{-1})'(x) = f'(f^{-1}(x)) \cdot (f^{-1})'(x) = 1, \]

where the right-hand side comes from the fact that $dx/dx = 1$. Solving this equation for $(f^{-1})'(x)$ we find

\[ (f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}. \]

This amazing result is enough to allow us to find the derivative of a function we don't know, if we can find the derivative of its inverse function. Through this technique we can find the derivatives of logarithmic functions, and later will be able to use it to find the derivatives of inverse trigonometric functions.

Theorem 0.1.2 (Derivative of Inverse Functions). Let $f$ be a function with inverse function $f^{-1}$, differentiable at $f^{-1}(x)$ with $f'(f^{-1}(x)) \neq 0$. It follows that $(f^{-1})'(x)$ exists, and

\[ (f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}. \]

Example 8. Find the derivative of $\ln(x)$.

Solution If $f^{-1}(x) = \ln(x)$ then $f(x) = f'(x) = e^x$, so using the above result we find

\[ \frac{d}{dx} \ln(x) = \frac{1}{e^{\ln(x)}} = \frac{1}{x}, \]

because the exponential undoes the action of the logarithm (leaving us with $x$) in the above equation.

Example 9. Find the derivative of $\log_a(x)$.

Solution If $f^{-1}(x) = \log_a(x)$ then $f(x) = a^x$ and $f'(x) = \ln(a) a^x$. Using the above result we find

\[ \frac{d}{dx} \log_a(x) = \frac{1}{\ln(a) a^{\log_a(x)}} = \frac{1}{\ln(a) x}, \]

because the base $a$ exponential undoes the action of the logarithm (leaving us with $x$) in the above equation.
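The inverse-function formula and the two logarithm examples can be checked numerically. The following is a minimal sketch under stated assumptions: the helper names `inverse_derivative` and `central_difference`, the base $a = 2$, and the sample point $x = 3$ are chosen here for illustration only.

```python
import math

def inverse_derivative(f_prime, f_inverse, x):
    """(f^{-1})'(x) = 1 / f'(f^{-1}(x)), the formula of Theorem 0.1.2."""
    return 1.0 / f_prime(f_inverse(x))

def central_difference(func, x, h=1e-6):
    """Approximate func'(x) with a symmetric difference quotient."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 3.0  # sample point (assumption)
a = 2.0   # sample base (assumption)

# Example 8: f(x) = e^x, so f'(x) = e^x and f^{-1}(x) = ln(x).
d_ln = inverse_derivative(math.exp, math.log, x0)

# Example 9: f(x) = a^x, so f'(x) = ln(a) a^x and f^{-1}(x) = log_a(x).
d_log_a = inverse_derivative(
    lambda y: math.log(a) * a ** y,
    lambda x: math.log(x, a),
    x0,
)

print(d_ln, 1 / x0, central_difference(math.log, x0))
print(d_log_a, 1 / (math.log(a) * x0),
      central_difference(lambda x: math.log(x, a), x0))
```

In each printed row the three numbers (formula from the theorem, closed form from the example, numerical slope) should agree to several decimal places.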

Before finishing the current discussion on the chain rule, it is worth mentioning a subtlety in the chain rule. Given two functions that are differentiable, the chain rule tells us how to find the derivative of their composition. Nevertheless, it may be possible to compose functions which are not differentiable, and get a result which is differentiable. Let us consider the functions

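\[
f(u) = \begin{cases} 1, & u \in \mathbb{Q} \\ 0, & u \notin \mathbb{Q} \end{cases}
\qquad \text{and} \qquad
u(x) = \begin{cases} 0, & x \in \mathbb{Q} \\ 1, & x \notin \mathbb{Q}. \end{cases}
\]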

These in and of themselves seem to be very esoteric functions, so let's take a minute to think about what they really are. For the function $f$, whenever the input is a rational number, we get an output of 1. Otherwise, for an irrational number, the output is 0. This function is so badly broken up that we cannot draw it, but we can note that it is discontinuous everywhere (because between any two rationals is an irrational, and between any two irrationals is a rational). The second function $u$ behaves similarly to $f$, except that it is 0 for rationals and 1 for irrationals. It is equally badly behaved. However, when we look at the composition $f(u(x))$, something remarkable happens. No matter the input, $u$ outputs a rational number (either 0 or 1), so $f(u(x)) = 1$ for all $x$. This constant function is continuous everywhere and differentiable everywhere, with $(f \circ u)'(x) = 0$ for all $x$. There is no inconsistency here with the chain rule, because the component functions do not satisfy the hypotheses of the chain rule; the chain rule tells us nothing about the derivative of a composition of functions which are not differentiable.
