The MATLAB Notebook v1.5.2



Section 4 (Application to Multivariate Calculus): Classification of Critical Points for Functions of Several Variables.

We begin by describing some simple classes of functions of several variables. The simplest such function is, of course, a constant function. Next simplest is the class of linear functions. A linear function of several variables is the sum of terms, each of which is either a constant or a constant multiple of a variable. Thus the most general linear function of three variables, say x1, x2 and x3 has the form a1x1+a2xz+a3x3 + c where a1, a2, a3 and c are constants. Clearly such a function can be written as a•x +c, where a denotes a constant vector and x a variable vector, and this notation generalizes to any number of dimensions (even one). For this discussion, we are going to reverse our usual convention and regard x as a row vector instead of a column vector.

A quadratic function of several variables is the sum of a linear function (possibly 0 or constant) and additional terms, each of which is a constant multiple of either the square of a variable or the product of two distinct variables or, in other words, the sum of a linear function and a quadratic form. Such a function can be expressed in the form xAxT + a•x +c, where a and c are as above and A is a constant symmetric matrix.

In order to go beyond quadratic functions in vector notation, we would have to generalize symmetric matrices to symmetric multi-dimensional arrays, which you will be glad to know we have no intention of doing. What we want to do is relate these ideas to calculus, most particularly to first and second derivatives.

Let us look at a quadratic function f(x)=xAxT+a•x +c, let [pic]x be a vector of the same dimension as x, and write out f(x+[pic]x)= (x+[pic]x)A(xT+[pic]xT) + a•x+ a•[pic]x +c or, expanding the first term and reordering,

f(x+[pic]x)=xAxT + a•x + c + [pic]xAxT + xA[pic]xT + a•[pic]x + [pic]xA[pic]x.T

Because A is symmetric, the third and fourth term here are equal and their sum can be rendered as 2Ax•Δx, enabling us to write the entire expression as f(x+[pic]x)=f(x) + (2Ax +a)•[pic]x + [pic]xA[pic]x.T Regarding x as constant, this is a quadratic function of [pic]x.

If f is not quadratic, we can still fix x and consider the best quadratic approximation to f(x+[pic]x). This has the form

f(x+[pic]x) [pic] f(x)+(f(x)•[pic]x + ½ [pic]xHf(x)[pic]x.T

This is the best quadratic approximation in the sense that the difference between f(x+[pic]x) and the expression on the right is of the order of ||[pic]x||3 or smaller when [pic]x is small. (f, called the gradient of f, is the vector values function whose components are the partial derivatives of f, and Hf, called the Hessian of f, is the matrix valued function whose entries are the second partial derivatives of f. In the one dimensional case, this approximation is the Taylor polynomial of degree two of f at x.

In the case where f is quadratic, the Hessian is constant, but the gradient is not. In the more general case, neither the gradient nor the Hessian is constant. A critical point of f is a point at which the gradient of f is the zero vector or the gradient fails to exist.

Proposition 4.1: Local extreme values of a function occur only at critical points.

Proof: If f has a local extreme value, then f(x + [pic]x)-f(x) can have only one sign for small values of [pic]x. Let us assume (f(x) [pic] 0, and set [pic]x = t(f(x). Then

f(x+[pic]x) – f(x) [pic] t(f(x)•(f(x) + ½t2 (f(x)Hf(x)(f(x)T.

Since the error in this approximation is of the order of t3 for small values of t, it follows that, for small values of t, the sign of f(x+[pic]x)-f(x) is determined by the sign of t(f(x)•(f(x), which is the same as the sign of t. Consequently, f(x+[pic]x)-f(x) can be both positive and negative for small [pic]x, and f does not have an extreme value at x. The proposition is now proved by contraposition.

Although local extrema always occur at critical points, not every critical point is a local extremum. At a critical point, we have the approximation

f(x+[pic]x) – f(x) [pic] ½ [pic]xHf(x)[pic]xT,

with an error term of the order of ||[pic]x||3. The following is an almost immediate consequence.

Proposition 4.2:

5 A critical point for which the Hessian matrix is positive definite is a local minimum.

6 A critical point for which the Hessian matrix is negative definite is a local maximum.

7 A critical point for which the Hessian matrix is indefinite is not a local extremum.

Proof: If the Hessian matrix is positive definite, all its eigenvalues are positive and the quadratic approximation is at least ½[pic]||[pic]x||2, where [pic]>0 is the smallest eigenvalue of H. For sufficiently small ||[pic]x||, it follows that f(x+[pic]x)-f(x) is positive, and part a. is proved. Part b. is proved similarly. To prove part c., we observe that if the Hessian matrix is indefinite, then the quadratic approximation takes both signs. Sufficiently close to the origin, the error is too small to change the sign and it follows that f(x+[pic]x)-f(x) also takes both signs for small [pic]x so that x is not an extremum of f.

A critical point for which the Hessian is indefinite and non-singular is called a saddle point (in two dimensions) or cone point (in three dimensions or higher). A critical point for which the Hessian is singular is called a degenerate critical point.

The local behaviour of a degenerate critical point cannot be determined by the best quadratic approximation. To see this, consider the three functions x2 + y4, x2 - y4 and x2 + y3. Each of these functions has a degenerate critical point at the origin, and they all have the same best quadratic approximation, namely x2, since they have the same first and second partial derivatives at the origin. However their qualitative behaviour, even very close to the origin, is quite different. Only x2 + y4 takes an extreme value at the origin. Distinguishing among the behaviour of x2 - y4 x2 + y3 is left to a problem.

Let us look now at some specific examples, with the help of MATLAB.

Example 1:

syms x y

f=cos(x+x^3)+cos(y)

f =

cos(x+x^3)+cos(y)

The MATLAB function jacobian will be used to compute both the gradient and the Hessian of f as functions of the variables.

gradf=jacobian(f,[x,y])

gradf =

[ -sin(x+x^3)*(1+3*x^2), -sin(y)]

hessf=jacobian(gradf,[x,y])

hessf =

[ -cos(x+x^3)*(1+3*x^2)^2-6*sin(x+x^3)*x, 0]

[ 0, -cos(y)]

The critical points of f are the points at which the gradient vanishes. In this case, f many critical points, but we shall look only at the critical points at [0,0] and [0,pi]. In each case, we will classify the quadratic form corresponing to the Hessian and we will plot contours, both of f and of its quadratic approximation near the critical point. We will do this in a slightly more complicated way than is necessary in this case in order to see what we might do in general.

crit1=[0,0]

crit2=[0,pi]

val1=subs(f,[x,y],crit1)

val2=subs(f,[x,y],crit2)

hess1=subs(hessf,[x,y],crit1)

hess2=subs(hessf,[x,y],crit2)

crit1 =

0 0

crit2 =

0 3.1416

val1 =

2

val2 =

0

hess1 =

-1 0

0 -1

hess2 =

-1 0

0 1

In this case, the eigenvalues of the Hessian are self-evident and we can see that the critical point at the origin is a local maximum, while the other is a saddle point. If the Hessian were not diagonal, we would use eig to make the classification.

We now compute the quadratic approximation to f(x+[pic]x) – f(x) [pic] ½ [pic]xHf(x)[pic]xT. For this purpose, the MATLAB variables x and y correspond to the components of [pic]x.

q1=.5*[x,y]*hess1*[x,y]'

q2=.5*[x,y]*hess2*[x,y]'

q1 =

-1/2*x*conj(x)-1/2*y*conj(y)

q2 =

-1/2*x*conj(x)+1/2*y*conj(y)

Do not be disturbed by the appearance of conj what is happening here is that MATLAB assumes all symbolic variables are able to take on complex values and, for complex arrays, MATLAB computes the conjugate-transpose rather than just the transpose. For the real numbers with which we are concerned, this makes no difference. We will now use contours 2 to compare the contours of ½ [pic]xHf(x)[pic]xT for [pic]x near the origin with contours of f(x,y) for [x,y] near each of the critical points.

contours2(q1,[0,0],1,.3)

[pic]

contours2(f,[0,0],1,.3)

[pic]

contours2(q2,[0,0],1,.3)

[pic]

contours2(f,[0,pi],1,.3)

[pic]

We see, in each case, that the contours of the approximation are qualitatively very similar to the contours of the actual function. If we enlarged the domain of the plot considerably in each case, we would see much more divergence.

contours2(q1,[0,0],10,.3)

[pic]

contours2(f,[0,0],10,.3)

[pic]

contours2(q2,[0,0],10,.3)

[pic]

contours2(f,[0,pi],10,.3)

[pic]

This emphasizes the fact that the qualitative validity of the quadratic approximation is a local phenomenon. Precisely how local depends on the particular case, so you may need to experiment with the size of range variable.

Example 2:

Let us look now at a three dimensional example.

syms z

g=x^3+y^3+z^3+x^2+y^2+z^2+4*x*y+5*y*z

g =

x^3+y^3+z^3+x^2+y^2+z^2+4*x*y+5*y*z

gradg=jacobian(g,[x,y,z])

hessg=jacobian(gradg,[x,y,z])

gradg =

[ 3*x^2+2*x+4*y, 3*y^2+2*y+4*x+5*z, 3*z^2+2*z+5*y]

hessg =

[ 6*x+2, 4, 0]

[ 4, 6*y+2, 5]

[ 0, 5, 6*z+2]

g clearly has a critical point at the origin; let us see if we can find another. The arguments of MATLAB's solve command are functions that are to be set to 0, in this case the three components of the gradient of g. Solve will accept additional arguments specifying the variables to be solved for, but in this case they are not necessary because MATLAB simply assumes it is supposed to solve the three equations for the three variables. The semi-colon suppresses what would otherwise be a messy and uninformative output. The following double command displays the solutions in an easily readable form. (Try it with simply [xc,yc,zc] after the semi-colon to see what double accomplishes here.)

[xc,yc,zc]=solve(gradg(1),gradg(2),gradg(3));double([xc,yc,zc])

ans =

0 0 0

-2.3959 -3.1072 -2.6333

-1.5507 -1.0282 1.0175

-0.5106 - 0.9289i 0.7069 - 0.2469i -0.1376 + 1.0514i

-0.5106 + 0.9289i 0.7069 + 0.2469i -0.1376 - 1.0514i

0.5422 - 1.3557i 0.8868 + 1.7803i 0.6413 - 1.5221i

0.5422 + 1.3557i 0.8868 - 1.7803i 0.6413 + 1.5221i

1.2168 -1.7187 -2.0583

We see from the output that g has four real critical points. We now select the four real ones and name them crit1 through crit4.

crits=[xc,yc,zc];

crit1=double(crits(1,1:3))

crit2=double(crits(2,1:3))

crit3=double(crits(3,1:3))

crit4=double(crits(8,1:3))

crit1 =

0 0 0

crit2 =

-2.3959 -3.1072 -2.6333

crit3 =

-1.5507 -1.0282 1.0175

crit4 =

1.2168 -1.7187 -2.0583

Now we proceed to classify the critical points:

hess1=subs(hessg,[x,y,z],crit1)

eig(hess1)

hess1 =

2 4 0

4 2 5

0 5 2

ans =

8.4031

2.0000

-4.4031

We see that the critical point at the origin is a cone point.

hess2=subs(hessg,[x,y,z],crit2)

hess2 =

-12.3752 4.0000 0

4.0000 -16.6433 5.0000

0 5.0000 -13.7997

eig(hess2)

ans =

-8.2536

-12.9754

-21.5892

This is a local maximum.

hess3=subs(hessg,[x,y,z],crit3)

eig(hess3)

hess3 =

-7.3045 4.0000 0

4.0000 -4.1694 5.0000

0 5.0000 8.1052

ans =

-10.5076

-2.8547

9.9937

Another cone point.

hess4=subs(hessg,[x,y,z],crit4)

eig(hess4)

hess4 =

9.3005 4.0000 0

4.0000 -8.3125 5.0000

0 5.0000 -10.3501

ans =

10.2242

-4.8750

-14.7112

and still another cone point. Let us now use contours3 to compare the contour of g through the fourth critical point with the contour of the quadratic approximation.

contours3([x,y,z]*hess4*[x,y,z]',[0,0,0],1,0)

[pic]

contours3(g,crit4,1,0)

[pic]

Problems:

Use solve to find the critical points of the function (x2+1)(y2+4)(z2+9)+x3+y3+z3. (Solve will find many complex critical points, but only four real ones. ) Use the eigenvalues of the Hessian matrix at each real critical point to classify it as a local minimum, local maximum, cone point, or degenerate critical point.

Obtain contour plots near the origin for x2 + y4, x2 - y4 ,x2 + y3, and their common quadratic approximation. Explain how these plots illustrate the fact that the Hessian matrix at a degenerate critical point does not determine, even locally, the qualitative behaviour of the function.

Use contours3 to plot a contour of the function g in Example 2 near the local maximum crit2 . Compare it with the corresponding contour of the best quadratic approximation at crit.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download