Multiple Linear Regression (MLR) Handouts


Yibi Huang

- Data and Models
- Least Squares Estimate, Fitted Values, Residuals
- Sum of Squares
- Do Regression in R
- Interpretation of Regression Coefficients
- t-Tests on Individual Regression Coefficients
- F-Tests on Multiple Regression Coefficients / Goodness-of-Fit

MLR - 1

Data for Multiple Linear Regression

Multiple linear regression is a generalized form of simple linear regression, in which the data contain multiple explanatory variables.

SLR:

             x     y
    case 1:  x1    y1
    case 2:  x2    y2
    ...      ...   ...
    case n:  xn    yn

MLR:

             x1    x2    ...   xp    y
    case 1:  x11   x12   ...   x1p   y1
    case 2:  x21   x22   ...   x2p   y2
    ...      ...   ...   ...   ...   ...
    case n:  xn1   xn2   ...   xnp   yn

For SLR, we observe pairs of variables. For MLR, we observe rows of variables. Each row (or pair) is called a case, a record, or a data point.

- $y_i$ is the response (or dependent variable) of the $i$th observation.
- There are $p$ explanatory variables (or covariates, predictors, independent variables), and $x_{ik}$ is the value of the $k$th explanatory variable $x_k$ for the $i$th case.
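As a concrete illustration of this layout (not part of the original handout), a small R data frame with p = 2 explanatory variables might look as follows; all names and numbers are made up:

    # A toy data set in the MLR layout: n = 4 cases, p = 2 explanatory variables.
    # Values and variable names are hypothetical, for illustration only.
    dat <- data.frame(
      x1 = c(1.2, 2.5, 3.1, 4.8),  # values x11, x21, x31, x41
      x2 = c(0.4, 1.1, 0.9, 2.3),  # values x12, x22, x32, x42
      y  = c(3.1, 5.6, 6.0, 9.8)   # responses y1, ..., y4
    )
    dat  # each row is one case (a record / data point)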

MLR - 2

Multiple Linear Regression Models

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad \text{where the } \varepsilon_i\text{'s are i.i.d. } N(0, \sigma^2)$$

In the model above, the $\varepsilon_i$'s (errors, or noise) are i.i.d. $N(0, \sigma^2)$. Parameters include:

- $\beta_0$ = intercept
- $\beta_k$ = regression coefficient (slope) for the $k$th explanatory variable, $k = 1, \ldots, p$
- $\sigma^2 = \operatorname{Var}(\varepsilon_i)$ is the variance of the errors

Observed (known): $y_i, x_{i1}, x_{i2}, \ldots, x_{ip}$
Unknown: $\beta_0, \beta_1, \ldots, \beta_p$, $\sigma^2$, the $\varepsilon_i$'s
Random variables: the $\varepsilon_i$'s and the $y_i$'s
Constants (nonrandom): the $\beta_k$'s, $\sigma^2$, the $x_{ik}$'s
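To make these roles concrete, here is a minimal simulation sketch of the model in R, with assumed values for n, p, the coefficients, and the error SD (none of these numbers come from the handout):

    set.seed(42)
    n <- 100; p <- 2
    beta  <- c(1, 2, -0.5)  # assumed true (beta0, beta1, beta2) -- unknown in practice
    sigma <- 0.3            # assumed true error SD -- unknown in practice
    x1 <- runif(n)          # known covariate values
    x2 <- runif(n)
    eps <- rnorm(n, mean = 0, sd = sigma)           # i.i.d. N(0, sigma^2) errors -- unobservable
    y   <- beta[1] + beta[2]*x1 + beta[3]*x2 + eps  # observed responses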

MLR - 3

Questions

What are the mean, the variance, and the distribution of $y_i$?

We assume the $\varepsilon_i$'s are independent. Are the $y_i$'s independent?
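A sketch of the answers (spelled out here for completeness; the handout poses them as exercises): since the $x_{ik}$'s and $\beta_k$'s are constants,

$$E(y_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \qquad \operatorname{Var}(y_i) = \operatorname{Var}(\varepsilon_i) = \sigma^2,$$

so $y_i \sim N(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},\ \sigma^2)$. Moreover, each $y_i$ is the error $\varepsilon_i$ shifted by a nonrandom constant, so the independence of the $\varepsilon_i$'s carries over to the $y_i$'s (though the $y_i$'s are not identically distributed, since their means differ).

MLR - 4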

Fitting the Model -- Least Squares Method

Recall for SLR, the least squares estimates $(\hat\beta_0, \hat\beta_1)$ for $(\beta_0, \beta_1)$ are the intercept and slope of the straight line with the minimum sum of squared vertical distances to the data points,

$$\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2.$$

[Figure: scatterplot of the data points showing the least squares regression line and an arbitrary straight line for comparison; x ranges over 0-8, y over 0-14]

MLR is just like SLR. The least squares estimates $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$ for $\beta_0, \ldots, \beta_p$ are the intercept and slopes of the (hyper)plane with the minimum sum of squared vertical distances to the data points,

$$\sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \cdots - \hat\beta_p x_{ip})^2.$$
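In practice this minimization is done by software. A minimal sketch in R, reusing the simulated x1, x2, y from the earlier example:

    # Least squares fit; lm() minimizes the residual sum of squares above.
    fit <- lm(y ~ x1 + x2)
    coef(fit)  # estimates (beta0-hat, beta1-hat, beta2-hat), close to c(1, 2, -0.5)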

MLR - 5

Solving the Least Squares Problem (1)

From now on, we use the "hat" symbol to differentiate the estimated coefficient $\hat\beta_j$ from the actual unknown coefficient $\beta_j$. To find the $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$ that minimize

$$L(\beta_0, \beta_1, \ldots, \beta_p) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})^2,$$

one can take the derivatives of $L$ with respect to each $\beta_j$,

$$\frac{\partial L}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip})$$

$$\frac{\partial L}{\partial \beta_k} = -2 \sum_{i=1}^{n} x_{ik} (y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_p x_{ip}), \quad k = 1, 2, \ldots, p,$$

and then equate them to 0. This results in a system of (p + 1) equations in (p + 1) unknowns.

MLR - 6

Solving the Least Squares Problem (2)

The least squares estimate $(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$ is the solution to the following system of equations, called the normal equations.

$$\begin{aligned}
n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^n x_{i1} + \cdots + \hat\beta_p \sum_{i=1}^n x_{ip} &= \sum_{i=1}^n y_i \\
\hat\beta_0 \sum_{i=1}^n x_{i1} + \hat\beta_1 \sum_{i=1}^n x_{i1}^2 + \cdots + \hat\beta_p \sum_{i=1}^n x_{i1} x_{ip} &= \sum_{i=1}^n x_{i1} y_i \\
&\;\;\vdots \\
\hat\beta_0 \sum_{i=1}^n x_{ik} + \hat\beta_1 \sum_{i=1}^n x_{ik} x_{i1} + \cdots + \hat\beta_p \sum_{i=1}^n x_{ik} x_{ip} &= \sum_{i=1}^n x_{ik} y_i \\
&\;\;\vdots \\
\hat\beta_0 \sum_{i=1}^n x_{ip} + \hat\beta_1 \sum_{i=1}^n x_{ip} x_{i1} + \cdots + \hat\beta_p \sum_{i=1}^n x_{ip}^2 &= \sum_{i=1}^n x_{ip} y_i
\end{aligned}$$

Don't worry about solving the equations. R and many other software packages can do the computation for us.

In general, $\hat\beta_j \ne \beta_j$, but they will be close under certain conditions.
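As a numerical check (not part of the handout), the normal equations can be written in matrix form as $(X^\top X)\hat\beta = X^\top y$ and solved directly; the solution matches what lm() returns, here using the simulated data from before:

    # Solve the normal equations (X'X) beta-hat = X'y directly.
    X <- cbind(1, x1, x2)                           # design matrix: intercept column + covariates
    beta_hat <- drop(solve(t(X) %*% X, t(X) %*% y))
    rbind(normal_eqs = beta_hat, lm = coef(lm(y ~ x1 + x2)))  # the two rows agree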

MLR - 7

Fitted Values

The fitted value or predicted value: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{i1} + \cdots + \hat\beta_p x_{ip}$

Again, the "hat" symbol is used to differentiate the fitted value $\hat y_i$ from the actual observed value $y_i$.
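Continuing the R sketch from the previous slides, the fitted values can be extracted from the model object or computed by hand from the plugged-in estimates:

    # Fitted values y-hat, two equivalent ways.
    y_hat <- fitted(fit)                     # extractor for the lm object above
    y_hat_by_hand <- drop(X %*% beta_hat)    # beta0-hat + beta1-hat*x1 + beta2-hat*x2
    all.equal(unname(y_hat), y_hat_by_hand)  # TRUE up to rounding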

MLR - 8
