Modern regression 2: The lasso - Carnegie Mellon University

Modern regression 2: The lasso

Ryan Tibshirani Data Mining: 36-462/36-662

March 21 2013

Optional reading: ISL 6.2.2, ESL 3.4.2, 3.4.3


Reminder: ridge regression and variable selection

Recall our setup: given a response vector $y \in \mathbb{R}^n$, and a matrix $X \in \mathbb{R}^{n \times p}$ of predictor variables (predictors on the columns)

Last time we saw that ridge regression,

$$\hat{\beta}^{\mathrm{ridge}} = \operatorname*{argmin}_{\beta \in \mathbb{R}^p} \; \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2$$

can have better prediction error than linear regression in a variety of scenarios, depending on the choice of $\lambda$. It worked best when there was a subset of the true coefficients that were small or zero

But it never sets coefficients to zero exactly, and therefore cannot perform variable selection in the linear model. While this didn't seem to hurt its prediction ability, it is not desirable for the purposes of interpretation (especially if the number of variables p is large)
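A minimal NumPy sketch of this point (simulated data, not the slides' example, though the dimensions n = 50, p = 30 mirror it): ridge has the closed-form solution $(X^T X + \lambda I)^{-1} X^T y$, which shrinks coefficients toward zero but never exactly to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 30
X = rng.standard_normal((n, p))
# 10 true coefficients are sizable, 20 are exactly zero
beta_true = np.concatenate([rng.uniform(0.5, 1.0, size=10), np.zeros(20)])
y = X @ beta_true + rng.standard_normal(n)

# Closed-form ridge solution at an illustrative lambda
lam = 25.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# The 20 truly-zero coefficients are shrunk close to zero, but none is
# exactly zero, so ridge cannot select variables
print("exact zeros in ridge fit:", int((beta_ridge == 0.0).sum()))
```

The shrinkage is visible in the magnitudes: the estimates for the truly-zero block sit near zero, but a thresholding-free estimator never lands on zero exactly.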


Recall our example: n = 50, p = 30; true coefficients: 10 are nonzero and pretty big, 20 are zero

[Figure: left, Linear MSE, Ridge MSE, Ridge Bias^2, and Ridge Var plotted against $\lambda$ (0 to 25); right, the 30 ridge coefficient estimates, distinguishing true nonzero from true zero coefficients]


Example: prostate data

Recall the prostate data example: we are interested in the level of prostate-specific antigen (PSA), which is elevated in men who have prostate cancer. We have measurements of PSA on n = 97 men with prostate cancer, and p = 8 clinical predictors. Ridge coefficient paths:

[Figure: ridge coefficient paths for the 8 predictors (lcavol, lweight, age, lbph, svi, lcp, gleason, pgg45), plotted against $\lambda$ (left, 0 to 1000) and against degrees of freedom df($\lambda$) (right, 0 to 8)]

What if the people who provided these data want us to derive a linear model using only a few of the 8 predictor variables to predict the level of PSA?


Now the lasso coefficient paths:

[Figure: lasso coefficient paths for the same 8 predictors, plotted against $\lambda$ (left, 0 to 80) and against degrees of freedom df($\lambda$) (right, 0 to 8)]

We might report the first 3 coefficients to enter the model: lcavol (the log cancer volume), svi (seminal vesicle invasion), and lweight (the log prostate weight)

How would we choose 3 (i.e., how would we choose $\lambda$)? We'll talk about this later
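Lasso fits like those on the paths above can be computed by cyclic coordinate descent with soft-thresholding, a standard lasso algorithm (not specific to these slides). A sketch on simulated data (the prostate data themselves are not bundled here; the dimensions n = 97, p = 8 and the $\lambda$ values are only illustrative):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: the solution of a 1-d lasso problem."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for min_b 0.5*||y - Xb||_2^2 + lam*||b||_1."""
    p = X.shape[1]
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual with predictor j's contribution removed
            r = y - X @ b + X[:, j] * b[j]
            b[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
    return b

rng = np.random.default_rng(1)
n, p = 97, 8
X = rng.standard_normal((n, p))
# 3 true signals, 5 truly-zero coefficients (hypothetical values)
beta_true = np.array([0.7, 0.4, 0.0, 0.0, 0.3, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

# Unlike ridge, the lasso produces exact zeros; as lam decreases,
# coefficients enter the model, tracing out paths like those above
for lam in [200.0, 40.0, 1.0]:
    b = lasso_cd(X, y, lam)
    print(f"lam = {lam:6.1f}: {int((b != 0).sum())} nonzero coefficients")
```

The soft-thresholding step is what sets coefficients to exactly zero whenever the univariate fit falls below $\lambda$, which is why the lasso, unlike ridge, performs variable selection.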

