


Multiple Regression Analysis

An extension of simple regression, in which we add independent variables to the analysis.

We try to incorporate all possible explanations of variation in the dependent variable (unlike simple regression, where there was just one explanatory variable, and with a nod to the impossibility of our task…)

The Generic Equation

Yi = β0 + β1X1i + β2X2i + β3X3i + … + βkXki + εi

The Estimating Equation

Ŷi = b0 + b1X1i + b2X2i + … + bkXki

where the b’s are now partial regression slope coefficients:

Partial regression slope coefficient: the expected change in Y with a one-unit change in X1 (say), when all other independent variables are held constant.
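For concreteness, here is a minimal Python sketch (numpy only, with made-up data, so the numbers and variable names are purely illustrative) that fits an estimating equation with two independent variables and reads off the partial slope coefficients:

    import numpy as np

    # Made-up data: 30 observations on two independent variables (illustration only)
    rng = np.random.default_rng(0)
    n = 30
    X = rng.normal(size=(n, 2))                                  # columns play the role of X1 and X2
    y = 5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)   # Y built from known slopes plus noise

    # Estimating equation: prepend a column of ones for b0, then solve by least squares
    Xd = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)

    print("b0, b1, b2 =", np.round(b, 2))
    # b1 is the expected change in Y for a one-unit change in X1, holding X2 constant
    # (a partial regression slope coefficient); likewise b2 for X2.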

In multiple regression, we follow our standard 11 steps to happiness, and next class we’ll get into step six, checking for multicollinearity. For now, we will just work on the changes to step 8, the statistical tests, and some definitions of key concepts.

Step 8a) Overall Goodness of Fit

In simple regression this was goodness of fit, using R2. For multiple regression, the overall goodness-of-fit test is:

Ho: β1 = β2 = … = βk = 0

Ha: at least one βi ≠ 0

The test statistic for this is the F test: F = (R²/k) / [(1 − R²)/(n − k − 1)] (where k is the number of independent variables)

This is based on R-squared, which equals SSDue/TSS (the variation explained by the regression over the total variation).

As your model gets better at predicting, F increases. F is bounded below by zero. To read the F table you will need an alpha; the numerator d.f. is k and the denominator d.f. is n − k − 1, written:

Fα, k, n−k−1

If Ho is rejected, conclude that at least one partial regression slope coefficient does not equal 0.

If you reject Ho, then you can interpret R-squared as the variation in Y about its mean that is explained by the combined linear influence of the independent variables, expressed as a percentage of the total variation of Y about its mean.

That is also to say

The variation of X1 explains some of the variation in Y,

The variation of X2 explains some of the variation in Y… etc.

R-squared is still bounded by 0 and 1, and higher is still better.
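Here is a rough sketch of the overall F test in Python (scipy plus numpy, again on made-up data; the alpha of .05 is just an example). It computes R-squared as SSDue/TSS, forms the F statistic, and compares it with the critical value:

    import numpy as np
    from scipy import stats

    # Made-up data: n = 30 observations, k = 2 independent variables
    rng = np.random.default_rng(0)
    n, k = 30, 2
    X = rng.normal(size=(n, k))
    y = 5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

    Xd = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    y_hat = Xd @ b

    ss_due = np.sum((y_hat - y.mean()) ** 2)    # variation explained by the regression (SSDue)
    tss = np.sum((y - y.mean()) ** 2)           # total variation of Y about its mean
    r2 = ss_due / tss

    F = (r2 / k) / ((1 - r2) / (n - k - 1))
    F_crit = stats.f.ppf(0.95, k, n - k - 1)    # critical value F(alpha = .05, k, n-k-1)
    print(f"R-squared = {r2:.3f}, F = {F:.1f}, critical F = {F_crit:.2f}")
    # Reject Ho when F > critical F: at least one partial slope coefficient is not zero.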

Example:

(discuss this column after R-bar squared)

Yi = β0 + β1X1i + εi            R2 = .46    R-bar squared = .42

Yi = β0 + β1X1i + β2X2i + εi    R2 = .47    R-bar squared = .40

BUT, the problem here is that whenever you add another independent variable, R2 will pick up at least a little more of the variation in Y simply because another variable has been added, so R2 will increase even if the new variable explains nothing real. (price of OJ example)

So we use R2 adjusted, or R-bar squared to correct for this interpretive problem.

R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1), and R-bar squared is always less than or equal to R-squared.

It tells us whether a variable should be added to or dropped from the equation (used or not used). You add a variable only if R-bar squared increases with its addition. If R-bar squared goes down, you should not have added it.

If R-bar squared increases, the gain in explained variation (SSDue) was large enough to outweigh the degree-of-freedom penalty for raising k by one; see the sketch below.
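A quick sketch of that decision rule, with made-up data in which the second variable is deliberately irrelevant (the adjusted R-squared formula used here is the one given above):

    import numpy as np

    def r2_and_rbar2(X, y):
        # R-squared and R-bar squared for an OLS fit with an intercept
        n, k = X.shape
        Xd = np.column_stack([np.ones(n), X])
        b, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ b
        r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
        rbar2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
        return r2, rbar2

    # Made-up data: X2 is pure noise with no real relationship to Y
    rng = np.random.default_rng(0)
    n = 30
    X1 = rng.normal(size=(n, 1))
    X2 = rng.normal(size=(n, 1))
    y = 5 + 2.0 * X1[:, 0] + rng.normal(size=n)

    print(r2_and_rbar2(X1, y))                    # model with X1 only
    print(r2_and_rbar2(np.hstack([X1, X2]), y))   # add the irrelevant X2: R-squared creeps up,
                                                  # but R-bar squared will typically fall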

Step 8b) Stays the same, done on every coefficient:

t-test assesses each coefficient to determine if it is statistically different than zero.

Ho: βk = 0

Ha: βk ≠ 0

t = bk / s(bk), where s(bk) is the standard error of bk, d.f. = n − k − 1, and k is the number of independent variables
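A short sketch of the coefficient t-tests in Python, again on made-up data; the standard errors come from the usual OLS variance formula, which is assumed here rather than taken from the notes:

    import numpy as np
    from scipy import stats

    # Made-up data: n = 30 observations, k = 2 independent variables
    rng = np.random.default_rng(0)
    n, k = 30, 2
    X = rng.normal(size=(n, k))
    y = 5 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

    Xd = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(Xd, y, rcond=None)

    resid = y - Xd @ b
    s2 = resid @ resid / (n - k - 1)                        # estimated error variance
    se = np.sqrt(np.diag(s2 * np.linalg.inv(Xd.T @ Xd)))    # standard errors of b0, b1, b2

    t_stats = b / se
    t_crit = stats.t.ppf(0.975, n - k - 1)                  # two-tailed critical t, alpha = .05
    for j in range(len(b)):
        print(f"b{j} = {b[j]:.2f}, t = {t_stats[j]:.2f}, reject Ho: {abs(t_stats[j]) > t_crit}")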
