Lecture 5: Multiple Linear Regression

CS109A Introduction to Data Science

Pavlos Protopapas and Kevin Rader

Lecture Outline

Simple Regression:

• Predictor variables Standard Errors
• Evaluating Significance of Predictors
• Hypothesis Testing
• How well do we know $\hat{f}$
• How well do we know $\hat{y}$

Multiple Linear Regression:

• Categorical Predictors
• Collinearity
• Hypothesis Testing
• Interaction Terms

Polynomial Regression

CS109A, PROTOPAPAS, RADER

Standard Errors

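The standard deviations of the sampling distributions of $\hat{\beta}_0$ and $\hat{\beta}_1$ (the square roots of their variances) are called their standard errors, $SE(\hat{\beta}_0)$ and $SE(\hat{\beta}_1)$.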

If our data is drawn from a larger set of observations, then we can empirically estimate the standard errors $SE(\hat{\beta}_0)$, $SE(\hat{\beta}_1)$ of $\hat{\beta}_0$ and $\hat{\beta}_1$ through bootstrapping.
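A minimal sketch of the bootstrap estimate, using only the Python standard library. The synthetic data set, the helper names `fit_line` and `bootstrap_se`, and the number of resamples are assumptions made for illustration; they do not come from the lecture.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def sample_std(values):
    """Sample standard deviation (divides by len(values) - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def bootstrap_se(xs, ys, n_boot=1000, seed=0):
    """Spread of the refitted coefficients over resampled data sets."""
    rng = random.Random(seed)
    n = len(xs)
    intercepts, slopes = [], []
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement, then refit the line.
        idx = [rng.randrange(n) for _ in range(n)]
        b0, b1 = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        intercepts.append(b0)
        slopes.append(b1)
    return sample_std(intercepts), sample_std(slopes)


# Synthetic data (assumed for illustration): y = 2 + 0.5 x + Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0, 1.0) for x in xs]

se_b0, se_b1 = bootstrap_se(xs, ys)
print(f"bootstrap SE(b0) = {se_b0:.4f}, SE(b1) = {se_b1:.4f}")
```

Resampling whole (x, y) pairs, rather than residuals, is one common choice; it does not rely on the noise having constant variance.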

If we know the variance $\sigma^2$ of the noise $\epsilon$, we can compute $SE(\hat{\beta}_0)$, $SE(\hat{\beta}_1)$ analytically, using the formulae below:

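$$SE(\hat{\beta}_0) = \sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}$$

$$SE(\hat{\beta}_1) = \frac{\sigma}{\sqrt{\sum_i (x_i - \bar{x})^2}}$$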
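The analytic formulae can be evaluated directly once the noise level is known. The sketch below assumes a known $\sigma$ and a small hypothetical set of predictor values; the function name `analytic_se` is illustrative.

```python
import math


def analytic_se(xs, sigma):
    """Standard errors of intercept and slope for a known noise level sigma."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)  # sum_i (x_i - x_bar)^2
    se_b0 = sigma * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
    se_b1 = sigma / math.sqrt(sxx)
    return se_b0, se_b1


# Hypothetical predictor values and noise level, chosen for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
se_b0, se_b1 = analytic_se(xs, sigma=1.0)
print(f"SE(b0) = {se_b0:.4f}, SE(b1) = {se_b1:.4f}")
```

Note that neither formula involves the responses: for a fixed $\sigma$, the standard errors depend only on $n$ and on how the $x_i$ are spread.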


Standard Errors

$$SE(\hat{\beta}_0) = \sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}, \qquad SE(\hat{\beta}_1) = \frac{\sigma}{\sqrt{\sum_i (x_i - \bar{x})^2}}$$

In practice, we do not know the theoretical value of $\sigma$ since we do not know the exact distribution of the noise $\epsilon$.

Remember:

$$y_i = f(x_i) + \epsilon_i \implies \epsilon_i = y_i - f(x_i)$$


Standard Errors

In practice, we do not know the theoretical value of $\sigma$ since we do not know the exact distribution of the noise $\epsilon$. However, if we make the following assumptions,

• the errors $\epsilon_i = y_i - \hat{y}_i$ and $\epsilon_j = y_j - \hat{y}_j$ are uncorrelated, for $i \neq j$,

• each $\epsilon_i$ is normally distributed with mean 0 and variance $\sigma^2$,

then we can empirically estimate $\sigma^2$ from the data and our regression line:

$$\sigma \approx \sqrt{\frac{n \cdot MSE}{n - 2}} = \sqrt{\frac{\sum_i (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{\frac{\sum_i (\hat{f}(x_i) - y_i)^2}{n - 2}}$$
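As a small worked sketch of this estimate, the code below fits a line to four hypothetical points and computes the residual-based estimate of $\sigma$ with the $n - 2$ denominator. The data values and the name `estimate_sigma` are assumptions made for illustration.

```python
import math


def estimate_sigma(xs, ys):
    """Estimate the noise standard deviation from least-squares residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares; note that n * MSE equals this quantity.
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Two parameters were fitted, hence the n - 2 in the denominator.
    return math.sqrt(rss / (n - 2))


# Hypothetical data lying close to the line y = x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.1, 2.9]
sigma_hat = estimate_sigma(xs, ys)
print(f"estimated sigma = {sigma_hat:.4f}")
```

The returned value can be substituted for $\sigma$ in the analytic standard error formulae when the true noise level is unknown.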

Standard Errors

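More data: $n$ and $\sum_i (x_i - \bar{x})^2$ increase.
Larger coverage: $\mathrm{var}(x)$, or equivalently $\sum_i (x_i - \bar{x})^2$, increases.
Better data: $\sigma^2$ decreases.

$$SE(\hat{\beta}_0) = \sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_i (x_i - \bar{x})^2}}$$

$$SE(\hat{\beta}_1) = \frac{\sigma}{\sqrt{\sum_i (x_i - \bar{x})^2}}$$

Better model: the residuals $(\hat{f}(x_i) - y_i)$ shrink, which lowers the estimate

$$\sigma \approx \sqrt{\frac{\sum_i (\hat{f}(x_i) - y_i)^2}{n - 2}}$$

Question: What happens to $SE(\hat{\beta}_0)$, $SE(\hat{\beta}_1)$ under these scenarios?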
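One way to explore the question is to plug each scenario into the analytic formula for the slope. The sketch below uses evenly spaced, hypothetical predictor values and assumed noise levels; the helpers `slope_se` and `grid` are illustrative names.

```python
import math


def slope_se(xs, sigma):
    """Analytic standard error of the slope for a known noise level sigma."""
    x_bar = sum(xs) / len(xs)
    return sigma / math.sqrt(sum((x - x_bar) ** 2 for x in xs))


def grid(lo, hi, n):
    """n evenly spaced predictor values between lo and hi (inclusive)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


baseline = slope_se(grid(0, 10, 50), sigma=1.0)
more_data = slope_se(grid(0, 10, 200), sigma=1.0)        # larger n
larger_coverage = slope_se(grid(0, 20, 50), sigma=1.0)   # wider range of x
better_data = slope_se(grid(0, 10, 50), sigma=0.5)       # smaller noise

for name, value in [
    ("baseline", baseline),
    ("more data", more_data),
    ("larger coverage", larger_coverage),
    ("better data", better_data),
]:
    print(f"{name:16s} SE(b1) = {value:.4f}")
```

In this setup each change lowers the slope's standard error relative to the baseline; doubling the range of x or halving $\sigma$ each halve it.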


Standard Errors

The following results are for the coefficients for TV advertising:

Method              SE
Analytic Formula    0.0061
Bootstrap           0.0061

The coefficients for TV advertising but restricting the coverage of x are:

Method              SE
Analytic Formula    0.0068
Bootstrap           0.0068

The coefficients for TV advertising but with added extra noise:

This makes no sense?

Method              SE
Analytic Formula    0.0028
Bootstrap           0.0023


Importance of predictors

We have discussed finding the importance of predictors by determining the cumulative distribution from $-\infty$ to 0.
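A rough sketch of this idea, assuming the distribution in question is the bootstrap distribution of the slope: the fraction of resampled slopes at or below zero approximates the cumulative distribution evaluated at 0. The synthetic data and the names `fit_slope` and `fraction_at_or_below_zero` are assumptions made for illustration.

```python
import random


def fit_slope(xs, ys):
    """Least-squares slope for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def fraction_at_or_below_zero(xs, ys, n_boot=1000, seed=0):
    """Share of bootstrap slopes that are <= 0 (empirical CDF at zero)."""
    rng = random.Random(seed)
    n = len(xs)
    count = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        if fit_slope([xs[i] for i in idx], [ys[i] for i in idx]) <= 0:
            count += 1
    return count / n_boot


# Synthetic data (assumed): one response depends on x, the other does not.
data_rng = random.Random(1)
xs = [data_rng.uniform(0, 10) for _ in range(100)]
ys_related = [1.0 + 0.5 * x + data_rng.gauss(0, 1.0) for x in xs]
ys_unrelated = [1.0 + data_rng.gauss(0, 1.0) for _ in xs]

p_related = fraction_at_or_below_zero(xs, ys_related)
p_unrelated = fraction_at_or_below_zero(xs, ys_unrelated)
print(f"related predictor:   fraction of slopes <= 0 is {p_related:.3f}")
print(f"unrelated predictor: fraction of slopes <= 0 is {p_unrelated:.3f}")
```

A fraction very close to 0 (or to 1) suggests the slope is reliably positive (or negative); a fraction well inside that range suggests the data cannot distinguish the slope from zero.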
