


Math programming

Classical optimization (Max/min) – text 4.1-4.4

We now move to optimization of non-linear functions.

The first situation we will deal with is looking for max/min values of a differentiable function, with no constraints on our domain. We will therefore be concentrating on local extrema – with some nods toward the problem of global extrema.

You have seen most of this before:

1-variable case in Calc 1, recall questions on base concepts test

2-variable case in Calc 3, also on base concepts test (remember the gradient)

Sections 4.1-4.3 mainly review these, with some different points of view – in particular, use of the one-variable case to give ideas for the multi-variable case.

Terms defined:

absolute max/min; relative (local) max/min [strong/weak forms – depending on whether the inequality is strict]; “neighborhood” of a point – all points within some fixed (usually small) distance ε of the point. We rarely care about the size of the neighborhood – usually we only ask whether there is a neighborhood of the point in which ....

The discussion focuses on differentiable functions [so we do not consider possible critical points where the derivative is undefined] and on local optima – there is no consideration of restrictions on the variables.

4.2 covers necessary conditions for the existence of an extreme value at a point:

Reminder of results from 1-variable situation:

Must be at a stationary point (f′(x0) = 0): at a minimum, say, f(x0 + h) must be no smaller than f(x0) for h close to 0 (whether positive or negative), that is, we must have (f(x0 + h) – f(x0))/h ≥ 0 for h > 0 and ≤ 0 for h < 0, so the limit (as h → 0), which is f′(x0), can only be 0.

Extending to a multivariable function f: Rn → R, this must hold for every vector (direction) h (we divide by |h|, of course, not by h), so the gradient must be 0.
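(Not in the text, but a quick numerical check makes the sign argument concrete; a minimal Python sketch with a toy function of my own choosing:)

```python
# Toy illustration (my example, not the text's): at a local minimum x0 of f,
# the difference quotient (f(x0 + h) - f(x0)) / h is >= 0 for h > 0 and
# <= 0 for h < 0, so its limit f'(x0) can only be 0.

def f(x):
    return (x - 2.0) ** 2 + 1.0   # minimum at x0 = 2

x0 = 2.0
for h in (0.1, 0.01, -0.01, -0.1):
    q = (f(x0 + h) - f(x0)) / h
    print(f"h = {h:6.2f}   difference quotient = {q:7.4f}")
# quotients are positive for h > 0, negative for h < 0, and shrink toward 0
```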

4.3 discusses sufficient conditions for a relative optimum at a stationary point – for the single-variable case.

Still assuming the function has continuous derivatives in a neighborhood of the point x0.

One-variable case: based on mean value theorem:

If f: R → R is differentiable on [a,b], then there is a point q between a and b for which f′(q) = (f(b) – f(a))/(b – a), or (more useful for us) f(b) = f(a) + (b – a) f′(q)

If we represent the interval as running from x0 to some x0 + h, this says: if f and its derivatives are continuous at x0, then for any h in some neighborhood of 0, there is some c with 0 < c < 1 s.t. f(x0 + h) = f(x0) + h f′(c x0 + (1-c)(x0 + h))

[The expression c x0 + (1-c)(x0 + h) represents some point on the segment from x0 to x0 + h – useful when we extend this idea to the multivariate case]
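(A small sympy sketch – the function and interval are my own choices, not the text's – locating the mean value theorem point q for a concrete f:)

```python
# Find the mean value theorem point q in (a, b): f'(q) = (f(b) - f(a)) / (b - a).
import sympy as sp

x = sp.symbols('x', real=True)
f = x**3
a, b = 1, 3

slope = (f.subs(x, b) - f.subs(x, a)) / (b - a)        # (27 - 1) / 2 = 13
candidates = sp.solve(sp.Eq(sp.diff(f, x), slope), x)  # solves 3*q**2 = 13
print(candidates)   # [-sqrt(39)/3, sqrt(39)/3]; sqrt(39)/3 ~ 2.08 lies in (1, 3)
```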

Take higher derivatives, repeat, get Taylor’s theorem:

If f has enough derivatives in a neighborhood of x0 , then for any pt x0 + h in the neighborhood, there is a c, 0 < c < 1, s.t.

f(x0 + h) = f(x0) + h f′(x0) + (h²/2!) f″(x0) + (h³/3!) f(3)(x0) + ... + (hⁿ/n!) f(n)(c x0 + (1-c)(x0 + h)). [This is usually written using x – x0 rather than h and is the Taylor series expansion. If we extend to power n (of x – x0) and leave off the degree n+1 error term, this is the nth Taylor polynomial approximation.]
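(A hedged sympy sketch of the nth Taylor polynomial built directly from this formula; the function exp(x) and the expansion point 0 are my own choices, not the text's:)

```python
# Build the nth Taylor polynomial of f about x0 from the formula above.
import sympy as sp

x, h = sp.symbols('x h', real=True)
f = sp.exp(x)
x0, n = 0, 4

taylor_poly = sum(sp.diff(f, x, k).subs(x, x0) * h**k / sp.factorial(k)
                  for k in range(n + 1))
print(sp.expand(taylor_poly))           # h**4/24 + h**3/6 + h**2/2 + h + 1
print(float(taylor_poly.subs(h, 0.5)))  # ~1.6484, close to exp(0.5) ~ 1.6487
```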

One result you may not have seen:

Theorem 4.3.1 (p156) If f and its first n derivatives are continuous at x0, then f has a relative optimum at x0 if and only if n , the order of the first non-vanishing derivative at x0 , is even. In addition, in this case f has a relative maximum at x0 if f(n)(x0) < 0 and a relative minimum at x0 if f(n)(x0) > 0 .
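(A sketch of how Theorem 4.3.1 can be applied mechanically with sympy; the classify helper and the sample functions are mine, not the text's:)

```python
# Higher-order derivative test: find n, the order of the first non-vanishing
# derivative at x0, and classify the point as in Theorem 4.3.1.
import sympy as sp

x = sp.symbols('x', real=True)

def classify(f, x0, max_order=10):
    for n in range(1, max_order + 1):
        d = sp.diff(f, x, n).subs(x, x0)
        if d != 0:
            if n % 2 == 1:
                return f"order {n} (odd): no relative optimum"
            return f"order {n} (even): relative " + ("minimum" if d > 0 else "maximum")
    return "all tested derivatives vanish"

print(classify(x**2, 0))         # order 2 (even): relative minimum
print(classify(x**3, 0))         # order 3 (odd): no relative optimum
print(classify(-(x - 1)**4, 1))  # order 4 (even): relative maximum
```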

Section 4.4 carries the analogues of these results into functions of several variables

Necessary conditions for existence of an optimum:

If a function f: Rn → R is to have a maximum at X0, we must have f(X0) ≥ f(X) for all X in a neighborhood of X0. That is, for every vector h of small enough length, f(X0) must be at least as large as f(X0 + h).

Looking at the standard basis vectors E1, E2, . . ., En, we see we must have, for any number h with |h| small, (f(X0 + h Ei) – f(X0))/h ≤ 0 when h is positive and ≥ 0 when h is negative. Taking the limit as h → 0, we see that if there is a limit [if f is differentiable] it must be both ≤ 0 and ≥ 0 – that is, all the partial derivatives of f must be 0 at X0.

The points at which all partials are 0 are called stationary points of f.

Gradient of f at X0 (symbol ∇f(X0) ) is the vector of partial derivatives, evaluated at X0.

It will be convenient to refer to the Hessian matrix of f – the n×n matrix of second-order partials

H = [ ∂²f/∂xi∂xj ] – the entry in row i, column j is ∂²f/∂xi∂xj, evaluated at the point. (If the second-order partials are continuous, this matrix will be symmetric.)
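(A quick sympy sketch – toy function of my choosing – showing the gradient and the symmetric Hessian; sympy happens to provide a hessian helper:)

```python
# Gradient vector and Hessian matrix of a two-variable function.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**2 * x2 + sp.sin(x2)

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])
H = sp.hessian(f, (x1, x2))
print(grad.T)   # Matrix([[2*x1*x2, x1**2 + cos(x2)]])
print(H)        # Matrix([[2*x2, 2*x1], [2*x1, -sin(x2)]])  -- symmetric
```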

Taylor's theorem for several variables says:

If f(X) is continuous and has continuous second-order partials over an open convex set in Rn , then for any two points X1 and X2 = X1 + h in the convex set, there is a number c , 0 < c < 1 for which

f(X2) = f(X1) + ∇f(X1) hT + ½ h H[cX1 + (1-c)X2] hT

If ∇f(X1) = 0 (i.e. if we have a stationary point), then f(X2) will always be smaller than f(X1) (so f(X1) will be a local maximum) if h H(X1) hT is negative for every vector h close to 0. [Notice that for small (in length) h, the sign of h H[cX1 + (1-c)X2] hT will be the same as the sign of h H(X1) hT because the partials are continuous.]

Similarly, f(X1) will be a local minimum if h H(X1) hT is positive for every vector h close to 0.

To make use of this, we take advantage of some linear algebra (again):

Definition:

A symmetric n×n matrix A is

positive definite if X A XT > 0 for all nonzero X ∈ Rn

positive semidefinite if X A XT ≥ 0 for all nonzero X ∈ Rn

negative definite if X A XT < 0 for all nonzero X ∈ Rn

negative semidefinite if X A XT ≤ 0 for all nonzero X ∈ Rn

indefinite otherwise
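(Not the text's method – Theorem 4.4.4 below gives the book's pivot test – but a standard numerical check of definiteness uses the eigenvalues of the symmetric matrix; a minimal numpy sketch, assuming real symmetric input:)

```python
# Eigenvalue test: all eigenvalues positive -> positive definite, etc.
import numpy as np

def definiteness(A, tol=1e-12):
    eig = np.linalg.eigvalsh(A)          # real eigenvalues of a symmetric matrix
    if np.all(eig > tol):   return "positive definite"
    if np.all(eig >= -tol): return "positive semidefinite"
    if np.all(eig < -tol):  return "negative definite"
    if np.all(eig <= tol):  return "negative semidefinite"
    return "indefinite"

print(definiteness(np.array([[2.0, 1.0], [1.0, 2.0]])))   # positive definite
print(definiteness(np.array([[1.0, 0.0], [0.0, -3.0]])))  # indefinite
```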

Easiest sufficient conditions for existence of an optimum at a stationary point:

A stationary point gives a (local) minimum if the Hessian matrix at the point is positive definite and gives a maximum if the Hessian matrix is negative definite.

[These are sufficient - not necessary - conditions]

Theorem 4.4.4 The definiteness of a nonsingular symmetric matrix A is the same as the definiteness of the diagonal matrix D which contains, on its diagonal, the pivot elements (the reciprocals of the multipliers used to scale the pivot rows) obtained during the reduction of A to reduced form.
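(A rough Python sketch of this pivot test as I read it: reduce A without row exchanges, collect the pivot elements into D, and read definiteness off their signs. It assumes A is nonsingular and that no zero pivot appears:)

```python
# Reduce a symmetric matrix and collect the pivot elements (Theorem 4.4.4 sketch).
import numpy as np

def pivot_diagonal(A):
    A = np.array(A, dtype=float)
    n = A.shape[0]
    pivots = []
    for k in range(n):
        p = A[k, k]
        pivots.append(p)
        A[k, :] = A[k, :] / p                      # scale pivot row by 1/p
        for i in range(k + 1, n):
            A[i, :] = A[i, :] - A[i, k] * A[k, :]  # eliminate below the pivot
    return np.diag(pivots)                         # D with the pivots on the diagonal

D = pivot_diagonal([[2.0, 1.0], [1.0, 2.0]])
print(np.diag(D))   # [2.  1.5] -> all positive, so the matrix is positive definite
```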

We can also test directly from the definition of “positive definite”: is [h1, h2, . . ., hn] H(X0) [h1, h2, . . ., hn]T positive for all (not all zero) values of h1, h2, . . ., hn?

Example:

let f(X) = f(x1,x2) = x1² + x1x2 + x2² - 6x1 + 2

∇f(x1,x2) = (2x1 + x2 - 6, x1 + 2x2)

The only stationary point is (4, –2)

H =  [ 2   1 ]
     [ 1   2 ]

and the Hessian is constant here (f is quadratic), so H(4,–2) is this same matrix.

For any h = [h1, h2], h H(4,–2) hT = 2h1² + 2h1h2 + 2h2² = h1² + (h1 + h2)² + h2², which is clearly positive (unless h1 = h2 = 0).

This says 1.) the matrix is positive definite.

2.) For every point (x1,x2) close to (4, –2) (every point of the form (4+h1, –2 + h2 ) with h1, h2 close to 0), f(x1,x2) is larger than f(4,–2), so (4,–2) gives a local minimum for f.

Using Theorem 4.4.4:

In reducing this matrix, we multiply row 1 by 1/2 (the pivot element is H11 = 2) to get

[ 1   0.5 ]
[ 1   2   ]

and perform the pivot (subtract the new row 1 from row 2) to get

[ 1   0.5 ]
[ 0   1.5 ]

We then multiply row 2 by 1/1.5 (the pivot element is now H22 = 1.5). The diagonal matrix of pivot elements is

D = [ 2    0   ]
    [ 0    1.5 ]

This diagonal matrix is positive definite since (x1, x2) D (x1, x2)T = 2x1² + 1.5x2² (which is always positive unless x1 = x2 = 0).
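(The whole example can be checked with a short sympy sketch; the eigenvalue check at the end is a stand-in for the hand computations above:)

```python
# Verify the example: gradient, stationary point, Hessian, definiteness.
import sympy as sp

x1, x2 = sp.symbols('x1 x2', real=True)
f = x1**2 + x1*x2 + x2**2 - 6*x1 + 2

grad = [sp.diff(f, v) for v in (x1, x2)]
stationary = sp.solve(grad, (x1, x2))   # {x1: 4, x2: -2}
H = sp.hessian(f, (x1, x2))             # Matrix([[2, 1], [1, 2]])
print(stationary, H.eigenvals())        # eigenvalues 1 and 3, both > 0
print(f.subs(stationary))               # value at the local minimum: -10
```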
