BUAD 300
Dr. R. Robinson
Multiple Regression and Errors

Explanatory Power Measurement

Consider the following single independent-variable model (1).

$$Y = \beta_0 + \beta_1 X + \varepsilon \qquad (1)$$

As previously reviewed in Handout 10, the F ratio reported on your Minitab output table is a way of measuring the “explanatory power” of your model. (See page 628 of the 12th edition of your text, and the ANOVA table 14.6 in your text.) This ratio is given by

$$F = \frac{SSR/1}{SSE/(n-2)} \qquad (2)$$

where SSE is the “sum of squared residuals,” and SSR is the “regression sum of squares.” To find the critical value of the F ratio, check the F table with 1 degree of freedom in the numerator and $n-2$ degrees of freedom in the denominator. The “p value” next to the F statistic reported in Minitab is the probability of obtaining an F ratio at least this large if $\beta_1$ were zero (that is, if the regression had no explanatory power). A small p value is therefore evidence against $\beta_1 = 0$.

Multiple Regression

Consider the following “multiple regression” model (3), where Y, X and W are random variables.

$$Y = \beta_0 + \beta_1 X + \beta_2 W + \varepsilon \qquad (3)$$

The formulae for the OLS regression coefficients are given below. Note that $\beta_1$ multiplies X, and the variance of X is the denominator of its OLS estimate. The covariance $\sigma_{X,Y}$ of X with the dependent variable Y is the numerator. We have a similar OLS formula for $\beta_2$. (These simple ratios are exact only when X and W are uncorrelated, which is the independent variables assumption stated later in this handout.)

$$\beta_0 = \bar{Y} - \beta_1 \bar{X} - \beta_2 \bar{W}$$

$$\beta_1 = \frac{\sigma_{X,Y}}{\sigma_X^2}$$

$$\beta_2 = \frac{\sigma_{W,Y}}{\sigma_W^2}$$

$$R^2 = 1 - \frac{\mathrm{Var}(\varepsilon)}{\mathrm{Var}(Y)}$$

The F ratio reported by Minitab (in the Analysis of Variance table) is now given by (4),

$$F = \frac{SSR/k}{SSE/(n-k-1)} \qquad (4)$$

where for model (3) above, $k = 2$, the number of independent variables (on the right side of the equals sign).

This F ratio is used to test the joint hypothesis $H_0: \beta_1 = \beta_2 = 0$. (See pages 699-703 of the 12th edition of the text.) If this hypothesis is true, then the model has no explanatory power. The p-value next to the F ratio tells you the probability of obtaining an F ratio at least this large when $H_0$ is true.

Collinearity

Collinearity refers to the possible correlation between independent variables in a multiple regression.
When this occurs, it is difficult to estimate the true population coefficients from the OLS estimates, since one coefficient estimate may rob explanatory ability from the others. In this case, the t-tests are unreliable even when the errors are normal. For this reason, when you build regression models by adding independent variables, you need to calculate F ratios to judge whether the explanatory power of the model has increased. (This is more reliable than testing whether a coefficient is significantly different from zero.)

Suppose you begin with k variables in your model, and you add s variables to the original k variables. The appropriate F ratio for judging whether you added explanatory power is given by (5).

$$F = \frac{(SSE_k - SSE_{k+s})/s}{SSE_{k+s}/(n-k-s-1)} \qquad (5)$$

For this F ratio, there are $s$ degrees of freedom in the numerator and $n-k-s-1$ in the denominator. (See pages 767-770 of the 12th edition.) If you are only adding 1 independent variable to k original variables, then the F ratio of equation (5) reduces to

$$F = \frac{(SSE_k - SSE_{k+1})/1}{SSE_{k+1}/(n-k-1-1)} \qquad (6)$$

This is how we will handle the “model building problem” in this class, i.e., use the F ratio in (6). We will only add one variable at a time to the model.

Model building is an important topic in statistics. It depends on the combination of (i) the creative envisioning of what independent variables might impact the dependent variable, and (ii) the use of proper statistical techniques.

Model Building

When building a multiple regression model, one should calculate the correlation coefficients between the potential independent variables. Then one should try different combinations of independent variables while leaving out some of those that are correlated with those included, i.e., those with correlation coefficients that are large in magnitude.
Calculate the F ratios for each combination tried, and select the one with the best explanatory power, leaving out variables that are correlated with others already included if they add no explanatory power. The increase, or lack of change, in “explanatory power” as measured by (6) should be the deciding factor as to what should be included.

Error Assumptions

OLS assumptions for the error term $\varepsilon$ are:

1. $\varepsilon$ is normal with $E(\varepsilon) = 0$.
2. $\mathrm{Var}(\varepsilon)$ is constant across observations (homogeneous variance of errors).
3. $\varepsilon$ is not serially correlated.

We explored the meaning of each of these assumptions in previous handouts. To these we add the following:

Independent Variables Assumption for OLS: The independent variables are statistically independent of each other (not correlated).

Review questions:

1. From the ANOVA table for an OLS regression (see Minitab), know how to calculate the F ratio and how to test the hypothesis $H_0: \beta_1 = \beta_2 = 0$. Note: See the p-value next to the F ratio in the ANOVA table. This is the probability of obtaining an F ratio at least as large as the one observed if $H_0$ were true; a small p-value is evidence against $H_0$.
2. Know the OLS assumptions concerning the error term in a regression.
3. Know the OLS assumption concerning the independent variables in a regression.

Review questions:

1. For model equation (3) above, given the information below, what are the OLS estimates of $\beta_0$, $\beta_1$, $\beta_2$? What is $R^2$?

$\sigma_Y^2 = 10$, $\sigma_\varepsilon^2 = 2$, $\sigma_X^2 = 1$, $\sigma_W^2 = 2$, $\sigma_{X,W} = 2$, $\sigma_{X,Y} = 3$, $\sigma_{Y,W} = 1$, $\bar{Y} = 8$, $\bar{X} = 4$, $\bar{W} = 5$

$$\beta_1 = \frac{\sigma_{X,Y}}{\sigma_X^2} = \frac{3}{1} = 3$$

$$\beta_2 = \frac{\sigma_{W,Y}}{\sigma_W^2} = \frac{1}{2}$$

$$\beta_0 = \bar{Y} - \beta_1 \bar{X} - \beta_2 \bar{W} = 8 - (3)(4) - \left(\tfrac{1}{2}\right)(5) = -6.5$$

$$R^2 = 1 - \frac{\mathrm{Var}(\varepsilon)}{\mathrm{Var}(Y)} = 1 - \frac{2}{10} = 0.8$$

2. Given the data previously sent to you, estimate via Minitab the models (a) and (b) below.

$$\text{Earnings}_t = \beta_{0,a} + \beta_{1,a}\,\text{Hours}_t + \varepsilon_t \qquad (a)$$

$$\text{Earnings}_t = \beta_{0,b} + \beta_{1,b}\,\text{Hours}_t + \beta_{2,b}\,\text{Age}_t + \varepsilon_t \qquad (b)$$

Test the following hypotheses, each at the 5% significance level (95% confidence):

(i) $H_0$: Model (b) has greater explanatory power than model (a). (Note that model (b) has 2 independent variables. In equation (6), $k = 1$ is the number of variables in model (a), and $n = 25$, so the denominator has $n-k-1-1 = 22$ degrees of freedom.)

$$F_{calc} = \frac{(SSE_a - SSE_b)/1}{SSE_b/(n-k-1-1)} = \frac{4{,}729{,}128 - 3{,}620{,}591}{3{,}620{,}591/22} = 6.74$$

$F_{critical} = 4.30$
Decision rule: If $F_{calc} > F_{critical}$ then do not reject $H_0$.
Decision: Do not reject $H_0$!

(ii) $H_0: \beta_{1,b} = \beta_{2,b} = 0$.

$$F_{calc} = \frac{SSR/2}{SSE/(n-2-1)} = 893.44$$

$F_{critical} = 3.44$
Decision rule: If $F_{calc} > F_{critical}$ then reject $H_0$.
Decision: Reject $H_0$!

(iii) $H_0: \beta_{2,b} > 0$.

$t_{calc} = 2.60$
p value = .017
Decision rule: If p value > 5% then do not accept $H_0$.
Decision: Do not reject $H_0$!

(iv) $H_0: \beta_{1,a} > \beta_{1,b}$. (Note: d.f. ≈ 56.)

$$t_{calc} = \frac{\text{OLS } \beta_{1,a} - \text{OLS } \beta_{1,b}}{\sqrt{SE_a^2 + SE_b^2}} = \frac{244.43 - 244.77}{\sqrt{6.48^2 + 5.79^2}} = -0.039$$

$t_{critical} = 1.673$
Decision rule: If $t_{calc} > t_{critical}$ then do not reject $H_0$.
Decision: Reject $H_0$!

Estimation of model (a):

Regression Equation
Earnings = 409 + 244.43 Hours

Coefficients
Term       Coef     SE Coef   T-Value   P-Value   VIF
Constant   409      151       2.71      0.013
Hours      244.43   6.48      37.75     0.000     1.00

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
453.447   98.41%   98.34%      98.21%

Analysis of Variance
Source          DF   Adj SS      Adj MS      F-Value   P-Value
Regression      1    292962418   292962418   1424.82   0.000
  Hours         1    292962418   292962418   1424.82   0.000
Error           23   4729128     205614
  Lack-of-Fit   16   1613758     100860      0.23      0.993
  Pure Error    7    3115370     445053
Total           24   297691546

Estimation of model (b):

Regression Equation
Earnings = -39 + 244.77 Hours + 13.54 Age

Coefficients
Term       Coef     SE Coef   T-Value   P-Value   VIF
Constant   -39      219       -0.18     0.860
Hours      244.77   5.79      42.24     0.000     1.00
Age        13.54    5.22      2.60      0.017     1.00

Model Summary
S         R-sq     R-sq(adj)   R-sq(pred)
405.675   98.78%   98.67%      98.50%

Analysis of Variance
Source          DF   Adj SS      Adj MS      F-Value   P-Value
Regression      2    294070954   147035477   893.44    0.000
  Hours         1    293631079   293631079   1784.21   0.000
  Age           1    1108537     1108537     6.74      0.017
Error           22   3620591     164572
Total           24   297691546

3. Repeat the analysis and hypothesis tests presented above, but for model equations (a) and (c) as presented below. Given the data previously sent to you, estimate via Minitab the models (a) and (c) below.

$$\text{Earnings}_t = \beta_{0,a} + \beta_{1,a}\,\text{Hours}_t + \varepsilon_t \qquad (a)$$

$$\text{Earnings}_t = \beta_{0,c} + \beta_{1,c}\,\text{Hours}_t + \beta_{2,c}\,\text{Gender}_t + \varepsilon_t \qquad (c)$$

Test the following hypotheses, each at the 5% significance level (95% confidence):

(i) $H_0$: Model (c) has greater explanatory power than model (a). (Note that model (c) has 2 independent variables. In equation (6), $k = 1$ is the number of variables in model (a).)

$$F_{calc} = \frac{(SSE_a - SSE_c)/1}{SSE_c/(n-k-1-1)} =$$

$F_{critical}$ =
Decision rule: If $F_{calc} > F_{critical}$ then do not reject $H_0$.
Decision:

(ii) $H_0: \beta_{1,c} = \beta_{2,c} = 0$.

$$F_{calc} = \frac{SSR/2}{SSE/(n-2-1)} =$$

$F_{critical}$ =
Decision rule: If $F_{calc} > F_{critical}$ then reject $H_0$.
Decision:

(iii) $H_0: \beta_{2,c} > 0$.

$t_{calc}$ =
p value =
Decision rule: If p value > 5% then do not accept $H_0$.
Decision:

(iv) $H_0: \beta_{1,a} > \beta_{1,c}$. (Note: d.f. ≈ 56.)

$$t_{calc} = \frac{\text{OLS } \beta_{1,a} - \text{OLS } \beta_{1,c}}{\sqrt{SE_a^2 + SE_c^2}} =$$

$t_{critical}$ =
Decision rule: If $t_{calc} > t_{critical}$ then do not reject $H_0$.
Decision:
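The F ratios and R-sq values in the worked example for models (a) and (b) can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the Minitab workflow: the function names `partial_f`, `overall_f` and `r_squared` are made up for this example, the sums of squares are copied from the two ANOVA tables, and n = 25 is inferred from the Total DF of 24. The same functions could be reused for model (c) once its SSE and SSR are read from Minitab.

```python
# Sums of squares copied from the Minitab ANOVA tables for models (a) and (b).
N = 25                 # observations, inferred from Total DF = 24
SST = 297_691_546      # total sum of squares
SSE_A = 4_729_128      # error sum of squares, model (a): Hours only
SSE_B = 3_620_591      # error sum of squares, model (b): Hours and Age
SSR_B = 294_070_954    # regression sum of squares, model (b)


def partial_f(sse_small, sse_big, n, k, s=1):
    """Equation (5): F ratio for adding s variables to a model with k variables."""
    numerator = (sse_small - sse_big) / s
    denominator = sse_big / (n - k - s - 1)
    return numerator / denominator


def overall_f(ssr, sse, n, k):
    """Equation (4): F ratio for H0 that all k slope coefficients are zero."""
    return (ssr / k) / (sse / (n - k - 1))


def r_squared(sse, sst):
    """R-sq computed as 1 - SSE/SST."""
    return 1 - sse / sst


# Model (a) has k = 1 variable; adding Age gives model (b) with 2 variables.
f_add_age = partial_f(SSE_A, SSE_B, N, k=1)
f_model_b = overall_f(SSR_B, SSE_B, N, k=2)

print(f"partial F for adding Age: {f_add_age:.2f}")
print(f"overall F for model (b):  {f_model_b:.2f}")
print(f"R-sq model (a): {r_squared(SSE_A, SST):.2%}")
print(f"R-sq model (b): {r_squared(SSE_B, SST):.2%}")
```

The printed values should agree with the Minitab output: 6.74 for the Age row, 893.44 for the Regression row of model (b), and R-sq of 98.41% and 98.78%. Note that the partial F for adding one variable equals the square of that variable's t-value (2.60 squared is about 6.76, the small gap being rounding in the reported t).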