Intelligent Systems: Reasoning and Recognition

James L. Crowley

ENSIMAG 2 / MoSIG M1

Second Semester 2011/2012

Lesson 17

18 April 2012

Normal Probability Density Functions

Notation
Bayesian Classification
Quadratic Discrimination
Discrimination using Log Likelihood
Example for K > 2 and D > 1
Canonical Form for the discrimination function
Noise and Discrimination
Decision Surfaces for different Noise assumptions
Two classes with equal means

Bibliographic sources:
"Pattern Recognition and Machine Learning", C. M. Bishop, Springer Verlag, 2006.
"Pattern Classification and Scene Analysis", R. E. Duda and P. E. Hart, Wiley, 1973.


Notation

$\vec{x}$ : A vector of D variables.
$\vec{X}$ : A vector of D random variables.
$D$ : The number of dimensions for the vector $\vec{x}$ or $\vec{X}$.
$E$ : An observation. An event.
$C_k$ : The class k.
$k$ : Class index.
$K$ : Total number of classes.
$\omega_k$ : The statement (assertion) that $E \in C_k$.
$p(\omega_k) = p(E \in C_k)$ : Probability that the observation E is a member of the class k. Note that $p(\omega_k)$ is lower case.
$P(X)$ : Probability density function for X.
$P(\vec{X})$ : Probability density function for $\vec{X}$.
$P(\vec{X} \mid \omega_k)$ : Probability density for $\vec{X}$ given the class k, where $\omega_k \equiv E \in C_k$.


Bayesian Classification

Our problem is to build a box that maps a set of features $\vec{X}$ from an observation E into a class $C_k$ from a set of K possible classes.

(Diagram: the features $x_1, x_2, \ldots, x_D$ enter the classifier, which outputs $\hat{\omega}_k = \mathrm{Class}(x_1, x_2, \ldots, x_D)$.)

Let $\omega_k$ be the proposition that the event E belongs to the class k: $\omega_k \equiv E \in C_k$.

!

!

In order to minimize the number of mistakes, we will maximize the probability that

"k # E $ Tk

!

"^ k = arg# max{Pr("k | X)} k

We will call on two tools for this:

1) Bayes' Rule:

$$p(\omega_k \mid \vec{X}) = \frac{P(\vec{X} \mid \omega_k)\, p(\omega_k)}{P(\vec{X})}$$

2) Normal Density Functions:

$$P(\vec{X} \mid \omega_k) = \frac{1}{(2\pi)^{D/2} \det(C_k)^{1/2}}\, e^{-\frac{1}{2}(\vec{X}-\vec{\mu}_k)^T C_k^{-1} (\vec{X}-\vec{\mu}_k)}$$

Last week we looked at Bayes' rule. Today we concentrate on Normal Density Functions.
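For illustration only, here is a minimal NumPy sketch of the density above. The function name normal_density() and its arguments are assumptions made for this example; they are not part of the lecture notes.

    import numpy as np

    def normal_density(X, mu, C):
        # Evaluate P(X | w_k) = N(X; mu_k, C_k) for a D-dimensional feature vector X.
        D = len(mu)
        diff = np.asarray(X, dtype=float) - np.asarray(mu, dtype=float)
        quad = diff @ np.linalg.solve(C, diff)              # (X - mu)^T C^{-1} (X - mu)
        norm = (2.0 * np.pi) ** (D / 2.0) * np.sqrt(np.linalg.det(C))
        return np.exp(-0.5 * quad) / norm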

Quadratic Discrimination

The classification function can be decomposed into two parts, $d()$ and $\vec{g}()$:

$$\hat{\omega}_k = d\left( \vec{g}(\vec{X}) \right)$$

$\vec{g}(\vec{X})$ : a discriminant function $\mathbb{R}^D \to \mathbb{R}^K$
$d()$ : a decision function $\mathbb{R}^K \to \{\hat{\omega}_k\}$

The discriminant is a vector of functions:

$$\vec{g}(\vec{X}) = \begin{pmatrix} g_1(\vec{X}) \\ g_2(\vec{X}) \\ \vdots \\ g_K(\vec{X}) \end{pmatrix}$$

Quadratic discrimination functions can be derived directly from $p(\omega_k \mid \vec{X})$:

$$p(\omega_k \mid \vec{X}) = \frac{P(\vec{X} \mid \omega_k)\, p(\omega_k)}{P(\vec{X})}$$

To minimize the number of errors, we will choose $\hat{\omega}_k$ such that

$$\hat{\omega}_k = \arg\max_{\omega_k} \left\{ \frac{P(\vec{X} \mid \omega_k)\, p(\omega_k)}{P(\vec{X})} \right\}$$

but because $P(\vec{X})$ is constant for all k, it is common to use:

$$\hat{\omega}_k = \arg\max_{\omega_k} \{ P(\vec{X} \mid \omega_k)\, p(\omega_k) \}$$

Remember that the confidence is

$$CF_{\hat{\omega}_k} = p(\hat{\omega}_k \mid \vec{X}) = \frac{P(\vec{X} \mid \hat{\omega}_k)\, p(\hat{\omega}_k)}{P(\vec{X})}$$

Thus the classifier can be decomposed into a selection among a set of parallel discriminant functions.


(Diagram: the features $x_1, x_2, \ldots, x_n$ feed a bank of discriminant functions $g_1, g_2, \ldots, g_K$, whose outputs go to a maximum selection.)

This is easily applied to the multivariate norm:

$$P(\vec{X} \mid \omega_k) = \mathcal{N}(\vec{X}; \vec{\mu}_k, C_k)$$
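A minimal sketch of this bank-of-discriminants structure, reusing the hypothetical normal_density() helper above and assuming the per-class parameters $(\vec{\mu}_k, C_k, p(\omega_k))$ have been estimated elsewhere:

    import numpy as np

    def classify(X, class_params):
        # class_params: list of (mu_k, C_k, p_k) tuples, one per class.
        # Returns the index k that maximizes g_k(X) = P(X | w_k) p(w_k).
        scores = [normal_density(X, mu, C) * p for (mu, C, p) in class_params]
        return int(np.argmax(scores))

    # Hypothetical two-class example in D = 2:
    params = [(np.array([0.0, 0.0]), np.eye(2), 0.5),
              (np.array([2.0, 2.0]), np.eye(2), 0.5)]
    k_hat = classify(np.array([1.8, 2.1]), params)          # -> 1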


Discrimination using Log Likelihood

As a simple example, let D=1


$$P(X = x \mid \omega_k) = \mathcal{N}(x; \mu_k, \sigma_k) = \frac{1}{\sigma_k \sqrt{2\pi}}\, e^{-\frac{(x-\mu_k)^2}{2\sigma_k^2}}$$

The discrimination function takes the form:

$$g_k(X) = P(X \mid \omega_k)\, p(\omega_k) = \frac{p(\omega_k)}{\sigma_k \sqrt{2\pi}}\, e^{-\frac{(x-\mu_k)^2}{2\sigma_k^2}}$$

!

Note that k = arg" max{gk (X)} = arg" max{Log{gk (X)}}

k

k

because Log{} is a monotonic function.

$$\hat{\omega}_k = \arg\max_k \{ \log\{ p(\omega_k)\, \mathcal{N}(x; \mu_k, \sigma_k) \} \}$$

$$\hat{\omega}_k = \arg\max_k \left\{ \log\left\{ \frac{1}{\sigma_k \sqrt{2\pi}}\, e^{-\frac{(x-\mu_k)^2}{2\sigma_k^2}} \right\} + \log\{ p(\omega_k) \} \right\}$$

$$\hat{\omega}_k = \arg\max_k \left\{ \log\left\{ \frac{1}{\sigma_k \sqrt{2\pi}} \right\} + \log\left\{ e^{-\frac{(x-\mu_k)^2}{2\sigma_k^2}} \right\} + \log\{ p(\omega_k) \} \right\}$$

$$\hat{\omega}_k = \arg\max_k \left\{ -\log\{ \sigma_k \sqrt{2\pi} \} - \frac{(x-\mu_k)^2}{2\sigma_k^2} + \log\{ p(\omega_k) \} \right\}$$

and, since $\log\{\sqrt{2\pi}\}$ is constant for all k, it can be dropped:

$$\hat{\omega}_k = \arg\max_k \left\{ -\log\{ \sigma_k \} - \frac{(x-\mu_k)^2}{2\sigma_k^2} + \log\{ p(\omega_k) \} \right\}$$
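As a numerical check on this 1-D result, a small sketch (the class parameters below are invented purely for illustration, and log_g() is a hypothetical helper name):

    import numpy as np

    def log_g(x, mu, sigma, prior):
        # -Log{sigma_k} - (x - mu_k)^2 / (2 sigma_k^2) + Log{p(w_k)}
        return -np.log(sigma) - (x - mu) ** 2 / (2.0 * sigma ** 2) + np.log(prior)

    # Hypothetical two-class example: (mu_k, sigma_k, p(w_k))
    classes = [(0.0, 1.0, 0.5), (3.0, 2.0, 0.5)]
    x = 1.2
    k_hat = int(np.argmax([log_g(x, m, s, p) for (m, s, p) in classes]))   # -> 0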


Example for K > 2 and D > 1

In the general case, there are D features.


$$g_k(\vec{X}) = p(\omega_k \mid \vec{X})$$

Thus the classifier is a machine that calculates K functions $g_k(\vec{X})$, followed by a maximum selection.

The discrimination function is $g_k(\vec{X}) = P(\vec{X} \mid \omega_k)\, p(\omega_k)$. Choose the class:

$$\hat{\omega}_k = \arg\max_k \{ g_k(\vec{X}) \}$$

From Bayes' rule:

$$\hat{\omega}_k = \arg\max_k \{ p(\omega_k \mid \vec{X}) \} = \arg\max_k \{ P(\vec{X} \mid \omega_k)\, p(\omega_k) \} = \arg\max_k \{ \log\{ P(\vec{X} \mid \omega_k) \} + \log\{ p(\omega_k) \} \}$$

For a Gaussian (Normal) density function

$$P(\vec{X} \mid \omega_k) = \mathcal{N}(\vec{X}; \vec{\mu}_k, C_k)$$

$$\log\{ P(\vec{X} \mid \omega_k) \} = \log\left\{ \frac{1}{(2\pi)^{D/2} \det(C_k)^{1/2}}\, e^{-\frac{1}{2}(\vec{X}-\vec{\mu}_k)^T C_k^{-1}(\vec{X}-\vec{\mu}_k)} \right\}$$

$$\log\{ P(\vec{X} \mid \omega_k) \} = -\frac{D}{2}\log(2\pi) - \frac{1}{2}\log\{\det(C_k)\} - \frac{1}{2}(\vec{X}-\vec{\mu}_k)^T C_k^{-1}(\vec{X}-\vec{\mu}_k)$$

We can observe that $-\frac{D}{2}\log(2\pi)$ can be ignored because it is constant for all k.

The discrimination function becomes:

$$g_k(\vec{X}) = -\frac{1}{2}\log\{\det(C_k)\} - \frac{1}{2}(\vec{X}-\vec{\mu}_k)^T C_k^{-1}(\vec{X}-\vec{\mu}_k) + \log\{ p(\omega_k) \}$$



Different families of Bayesian classifiers can be defined by variations of this formula. This becomes more evident if we reduce the equation to a quadratic polynomial.
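For a concrete feel of the formula before that reduction, a minimal NumPy sketch of the quadratic discriminant above (the function name and the assumption of already-estimated $\vec{\mu}_k$, $C_k$ and $p(\omega_k)$ are illustrative, not from the lecture):

    import numpy as np

    def quadratic_discriminant(X, mu, C, prior):
        # g_k(X) = -1/2 Log{det(C_k)} - 1/2 (X - mu_k)^T C_k^{-1} (X - mu_k) + Log{p(w_k)}
        diff = np.asarray(X, dtype=float) - np.asarray(mu, dtype=float)
        return (-0.5 * np.log(np.linalg.det(C))
                - 0.5 * (diff @ np.linalg.solve(C, diff))
                + np.log(prior))

The class with the largest value of quadratic_discriminant(X, mu_k, C_k, p_k) is selected, exactly as in the maximum-selection diagram earlier.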
