Learning From Data Lecture 9

Logistic Regression and Gradient Descent

M. Magdon-Ismail

CSCI 4100/6100

recap: Linear Classification and Regression

The linear signal: $s = \mathbf{w}^{\mathsf{T}}\mathbf{x}$.

Good Features are Important

Before looking at the data, we can reason that symmetry and intensity should be good features, based on our knowledge of the problem.

[Figure: a linear fit plotted over the features $x_1$ and $x_2$, with output $y$.]

Algorithms

Linear Classification. The pocket algorithm can tolerate errors; simple and efficient.

Linear Regression. Single-step learning:

$$\mathbf{w} = X^{\dagger}\mathbf{y} = (X^{\mathsf{T}}X)^{-1}X^{\mathsf{T}}\mathbf{y}$$

A very efficient, $O(N d^2)$, exact algorithm.
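The one-step solution above is direct to implement. Here is a minimal sketch (my illustration, not lecture code), assuming a data matrix X of shape N x (d+1) whose first column is all ones, and a target vector y of length N:

```python
import numpy as np

def linear_regression(X, y):
    """One-step linear regression: w = (X^T X)^{-1} X^T y.

    Uses the pseudo-inverse X^dagger, which also handles the case
    where X^T X is singular or ill-conditioned.
    """
    return np.linalg.pinv(X) @ y

# Hypothetical usage: N = 100 examples, d = 2 features plus a bias column.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.standard_normal((100, 2))])
y = X @ np.array([0.5, -1.0, 2.0]) + 0.1 * rng.standard_normal(100)
w = linear_regression(X, y)  # recovers weights close to [0.5, -1.0, 2.0]
```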


Predicting a Probability

Will someone have a heart attack over the next year?

age: 62 years | gender: male | blood sugar: 120 mg/dL | HDL: 50 | LDL: 120 | Mass: 190 lbs | Height: 5'10'' | ...

Classification: yes/no. Logistic regression: likelihood of heart attack, $y \in [0, 1]$:

$$h(\mathbf{x}) = \theta\!\left(\sum_{i=0}^{d} w_i x_i\right) = \theta(\mathbf{w}^{\mathsf{T}}\mathbf{x})$$
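A minimal sketch of this hypothesis in code (my own illustration, not from the lecture), where x is assumed to include the constant coordinate $x_0 = 1$:

```python
import numpy as np

def h(x, w):
    """Logistic regression hypothesis: theta(w^T x), a value in [0, 1]."""
    s = w @ x                      # the linear signal s = w^T x
    return 1.0 / (1.0 + np.exp(-s))
```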

What is θ?

[Figure: the logistic function $\theta(s)$, an S-shaped curve rising from 0 to 1 as $s$ runs from $-\infty$ to $+\infty$.]

$$\theta(s) = \frac{e^{s}}{1+e^{s}} = \frac{1}{1+e^{-s}}$$

$$\theta(-s) = \frac{e^{-s}}{1+e^{-s}} = \frac{1}{1+e^{s}} = 1 - \theta(s)$$
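Both identities are easy to check numerically. The sketch below (my illustration, not lecture code) uses each of the two equivalent forms on a different branch, which avoids exponentiating a large positive number:

```python
import numpy as np

def theta(s):
    """Numerically stable logistic function theta(s)."""
    s = np.asarray(s, dtype=float)
    out = np.empty_like(s)
    pos = s >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-s[pos]))   # 1/(1+e^{-s}) for s >= 0
    e = np.exp(s[~pos])
    out[~pos] = e / (1.0 + e)                  # e^{s}/(1+e^{s}) for s < 0
    return out

s = np.linspace(-5.0, 5.0, 11)
assert np.allclose(theta(-s), 1.0 - theta(s))  # theta(-s) = 1 - theta(s)
```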


The Data is Still Binary, ±1

$$\mathcal{D} = (\mathbf{x}_1, y_1 = \pm 1), \cdots, (\mathbf{x}_N, y_N = \pm 1)$$

$\mathbf{x}_n$: a person's health information. $y_n = \pm 1$: did they have a heart attack or not.

We cannot measure a probability. We can only see the occurrence of an event and try to infer a probability.


The Target Function is Inherently Noisy

$$f(\mathbf{x}) = \mathbb{P}[y = +1 \mid \mathbf{x}].$$

The data is generated from a noisy target function:

$$P(y \mid \mathbf{x}) = \begin{cases} f(\mathbf{x}) & \text{for } y = +1; \\ 1 - f(\mathbf{x}) & \text{for } y = -1. \end{cases}$$
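To make the noisy-target idea concrete, here is a small sketch (my own, not from the slides) that draws binary ±1 labels from given values of $f(\mathbf{x}_n)$:

```python
import numpy as np

def sample_labels(fx, rng):
    """Draw y_n = +1 with probability f(x_n), else y_n = -1.

    fx: array of target probabilities f(x_n), one per example.
    """
    return np.where(rng.random(fx.shape) < fx, 1, -1)

rng = np.random.default_rng(0)
fx = np.array([0.9, 0.5, 0.1])   # hypothetical values of f(x_n)
labels = sample_labels(fx, rng)  # one random draw of +/-1 labels
```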


What Makes an h Good?

`fitting' the data means finding a good h.

h is good if: $h(\mathbf{x}_n) \approx 1$ whenever $y_n = +1$; $h(\mathbf{x}_n) \approx 0$ whenever $y_n = -1$.

A simple error measure that captures this:

$$E_{\text{in}}(h) = \frac{1}{N}\sum_{n=1}^{N}\left(h(\mathbf{x}_n) - \tfrac{1}{2}(1+y_n)\right)^{2}.$$

Not very convenient (hard to minimize).
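This measure is simple to evaluate, even though it is awkward to minimize. A sketch of mine, assuming hx holds the predicted values $h(\mathbf{x}_n)$ and y the ±1 labels:

```python
import numpy as np

def simple_error(hx, y):
    """E_in(h) = (1/N) * sum (h(x_n) - (1+y_n)/2)^2."""
    target = (1 + y) / 2            # maps y = +1 -> 1 and y = -1 -> 0
    return np.mean((hx - target) ** 2)
```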


The Cross Entropy Error Measure

$$E_{\text{in}}(\mathbf{w}) = \frac{1}{N}\sum_{n=1}^{N}\ln\!\left(1 + e^{-y_n \mathbf{w}^{\mathsf{T}}\mathbf{x}_n}\right)$$

It looks complicated and ugly ($\ln$, $e^{(\cdot)}$, ...). But:

- it is based on an intuitive probabilistic interpretation of h;
- it is very convenient and mathematically friendly (`easy' to minimize).

Verify: $y_n = +1$ encourages $\mathbf{w}^{\mathsf{T}}\mathbf{x}_n \gg 0$, so $\theta(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n) \approx 1$; $y_n = -1$ encourages $\mathbf{w}^{\mathsf{T}}\mathbf{x}_n \ll 0$, so $\theta(\mathbf{w}^{\mathsf{T}}\mathbf{x}_n) \approx 0$.
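Here is a sketch (my own, not the lecture's code) of evaluating this error and its gradient, which is what gradient descent will need; np.logaddexp(0, -s) computes $\ln(1+e^{-s})$ without overflow:

```python
import numpy as np

def cross_entropy_error(w, X, y):
    """E_in(w) = (1/N) * sum ln(1 + exp(-y_n w^T x_n))."""
    s = y * (X @ w)                       # signals y_n * w^T x_n
    return np.mean(np.logaddexp(0.0, -s))

def cross_entropy_gradient(w, X, y):
    """Gradient of E_in: -(1/N) * sum y_n x_n theta(-y_n w^T x_n)."""
    s = y * (X @ w)
    return -(X.T @ (y / (1.0 + np.exp(s)))) / len(y)
```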

