A Brief Introduction to Multiple Correlation/Regression as a Simplification of the Multivariate General Linear Model

In its most general form, the GLM (General Linear Model) relates a set of p predictor variables (X1 through Xp) to a set of q criterion variables (Y1 through Yq). We shall now briefly survey three special cases of the GLM: the univariate mean, bivariate correlation/regression, and multiple correlation/regression.

The Univariate Mean: A One Parameter (a) Model

If there is only one Y and no X, then the GLM simplifies to the computation of a mean. We apply the least squares criterion to reduce the squared deviations between Y and predicted Y to the smallest value possible for a linear model. The prediction equation is $\hat{Y} = a = \bar{Y}$. Error in prediction is estimated by the standard deviation of Y, $s_Y = \sqrt{\sum (Y - \bar{Y})^2 / (n - 1)}$. See this demonstration with SPSS.

Bivariate Regression: A Two Parameter (a and b) Model

If there is only one X and only one Y, then the GLM simplifies to the simple bivariate linear correlation/regression with which you are familiar. We apply the least squares criterion to reduce the squared deviations between Y and predicted Y to the smallest value possible for a linear model. That is, we find a and b such that, for $\hat{Y} = a + bX$, the error sum of squares $\sum (Y - \hat{Y})^2$ is minimal. The GLM is reduced to $Y = a + bX + e$, where e is the "error" term, the deviation of Y from predicted Y. The coefficient "a" is the Y-intercept, the value of predicted Y when X = 0 (the intercept was the mean of Y in the one-parameter model above), and "b" is the slope, the average amount of change in Y per unit change in X. Error in prediction is estimated by the standard error of estimate, $s_{Y \cdot X} = \sqrt{\sum (Y - \hat{Y})^2 / (n - 2)}$.

Although the model is linear, that is, specifies a straight-line relationship between X and Y, it may be modified to test nonlinear models. For example, if you think that the function relating Y to X is quadratic, you employ the model $\hat{Y} = a + b_1 X + b_2 X^2$.

It is often more convenient to work with variables that have all been standardized to some common mean and some common SD (standard deviation), such as 0 and 1 (Z scores). If scores are so standardized, the intercept, "a," drops out (becomes zero), and the standardized slope, the number of standard deviations that predicted Y changes for each change of one SD in X, is commonly referred to as beta ($\beta$). In a bivariate regression, $\beta$ is the Pearson r. If r = 1, then each change in X of one SD is associated with a one SD change in predicted Y.

The variables X and Y may be both continuous (Pearson r), one continuous and one dichotomous (point-biserial r), or both dichotomous (the phi coefficient, $\varphi$).

Multiple Correlation/Regression

In multiple correlation/regression, one has two or more predictor variables but only one criterion variable. The basic model is $\hat{Y} = a + b_1 X_1 + b_2 X_2 + \dots + b_p X_p$ or, employing standardized scores, $\hat{Z}_Y = \beta_1 Z_1 + \beta_2 Z_2 + \dots + \beta_p Z_p$. Again, we wish to find regression coefficients that produce a predicted Y that is minimally deviant from observed Y, by the least squares criterion. We are creating a linear combination of the X variables, $b_1 X_1 + b_2 X_2 + \dots + b_p X_p$, that is maximally correlated with Y. That is, we are creating a superordinate predictor variable that is a linear combination of the individual predictor variables, with the weighting coefficients (b1 through bp) chosen such that the Pearson r between the criterion variable and the linear combination is maximal. The value of this r between Y and the best linear combination of the X's is called R, the multiple correlation coefficient. Note that the GLM is not only linear, but additive. That is, we assume that the weighted effect of X1 combines additively with the weighted effect of X2 to determine their joint effect, $b_1 X_1 + b_2 X_2$, on predicted Y.
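The short SAS sketch below, in the spirit of the SAS programs referenced in this lesson, illustrates the point that R is simply the Pearson r between Y and the fitted linear combination of the X's. The data set name (demo), the variable names (y, x1-x3), and the simulated data are hypothetical, not taken from the lesson.

/* Simulate some data and show that r(Y, predicted Y) equals the multiple R. */
data demo;
  call streaminit(2020);
  do case = 1 to 100;
    x1 = rand('normal');
    x2 = rand('normal');
    x3 = rand('normal');
    y  = 0.5*x1 + 0.3*x2 + rand('normal');  /* an arbitrary population model */
    output;
  end;
run;

proc reg data=demo;
  model y = x1-x3 / stb;      /* stb also prints the standardized (beta) weights */
  output out=preds p=yhat;    /* yhat is the fitted linear combination of the Xs */
run; quit;

proc corr data=preds;
  var y yhat;                 /* this Pearson r equals the multiple R from PROC REG */
run;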
The sample R² is a biased estimator of the corresponding population parameter, especially when the number of cases is small relative to the number of predictors. Go to a table of random numbers and randomly select a three-digit number. This is the X score for Case 1. Select another. This is the Y score for Case 1. Select another. This is the X score for Case 2. Select another. This is the Y score for Case 2. Now plot these data. Note that you can fit them perfectly with a straight line, so R² = 1, even though the corresponding parameter equals zero. Please run the SAS program Variables-Cases-R. Read the comments in the program and pay special attention to the obtained value of R² in the output. The values of the scores for every variable were randomly selected from a normal distribution with a mean of zero and a variance of one. Draw a conclusion with respect to the relationship between R² and the number of cases and number of variables. Notice that the output includes an adjusted R², which is less biased than the unadjusted R².

As a simple example of multiple regression, consider using high school GPA and SAT scores to predict college GPA. R would give us an indication of the strength of the association between college GPA and the best linear combination of high school GPA and SAT scores. We could additionally look at the $\beta$ weights (also called standardized partial regression coefficients) to determine the relative contribution of each predictor variable towards predicting Y. These coefficients are called partial coefficients to emphasize that they reflect the contribution of a single X in predicting Y in the context of the other predictor variables in the model. That is, how much does predicted Y change per unit change in Xi when we partial out (remove, hold constant) the effects of all the other predictor variables? The weight applied to Xi can change dramatically if we change the context (add one or more additional X variables or delete one or more of the X variables currently in the model). An X which is highly correlated with Y could have a low weight simply because it is redundant with another X in the model.

Rather than throwing in all of the independent variables at once (a simultaneous multiple regression), we may enter them sequentially. With an a priori sequential analysis (also called a hierarchical analysis), we would enter the predictor variables in some a priori order. For example, for predicting college GPA, we might first enter high school GPA (X1), a predictor we consider "high priority" because it is cheap (all applicants can provide it at low cost). We would compute r² and interpret it as the proportion of variance in Y that is "explained" by high school GPA. Our next step might be to add SAT-V (X2) and SAT-Q (X3) to the model and compute the multiple regression for $\hat{Y} = a + b_1 X_1 + b_2 X_2 + b_3 X_3$. We entered SAT scores with a lower priority because they are more expensive to obtain: not all high school students have them and they cost money to obtain. We enter them together because you get both for one price. This is called setwise entry. We now compare the R² (squared multiple correlation coefficient) with the r² previously obtained to see how much additional variance in Y is explained by adding X2 and X3 to the X1 already in the model. If the increase in R² seems large enough to justify the additional expense involved in obtaining the X2 and X3 information, we retain X2 and X3 in the model. We might then add a yet lower priority predictor, such as X4, the result of an on-campus interview (costly), and see how much further the R² is increased, and so on.
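One way such a sequential (hierarchical) analysis could be sketched in SAS is shown below. The data set and variable names (gpa, colgpa, hsgpa, satv, satq, interview) are hypothetical stand-ins for the college GPA example; subtracting each model's R² from the next gives the increment in explained variance attributable to the newly entered set.

proc reg data=gpa;
  step1: model colgpa = hsgpa;                      /* r-squared for high school GPA alone */
  step2: model colgpa = hsgpa satv satq;            /* R-squared after adding the SAT set  */
  step3: model colgpa = hsgpa satv satq interview;  /* R-squared after adding the interview */
run; quit;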
In other cases we might first enter nuisance variables (covariates) for which we wish to achieve "statistical control" and then enter our predictor variable(s) of primary interest later. For example, we might be interested in the association between the amount of paternal care a youngster has received (Xp) and how healthy the youngster is (Y). Some of the correlation between Xp and Y might be due to the fact that youngsters from "good" families get lots of maternal care (Xm) and lots of paternal care, but it is the maternal care that causes the youngsters' good health. That is, Xp is correlated with Y mostly because it is correlated with Xm, which is in turn causing Y. If we want to find the effect of Xp on Y, we could first enter Xm and compute r², and then enter Xp and see how much R² increases. By first entering the covariate, we have statistically removed (part of) its effect on Y and obtained a clearer picture of the effect of Xp on Y (after removing the confounded nuisance variable's effect). This is, however, very risky business, because this adjustment may actually remove part of (or all of) the actual causal effect of Xp on Y. For example, it may be that good fathers give their youngsters lots of care, causing them to be healthy, and that mothers simply passively respond, spending more time with (paternally caused) healthy youngsters than with unhealthy youngsters. By first removing the noncausal "effect" of Xm on Y we, with our maternal bias, would have eliminated part of the truly causal effect of Xp on Y. Clearly our a priori biases can affect the results of such sequential analyses.

Stepwise multiple regression analysis employs one of several available statistical algorithms to order the entry (and/or deletion) of predictors from the model being constructed. I opine that stepwise analysis is one of the most misunderstood and abused statistical procedures employed by psychologists. Many psychologists mistakenly believe that such an analysis will tell you which predictors are importantly related to Y and which are not. That is a very dangerous delusion. Imagine that among your predictors are two, let us just call them A and B, each of which is well correlated with the criterion variable, Y. If A and B are redundant (explain essentially the same portion of the variance in Y), then one, but not both, of A and B will be retained in the final model constructed by the stepwise technique. Whether it is A or B that is retained will be due to sampling error. In some samples A will, by chance, be just a little better correlated with Y than is B, while in other samples B will be, by chance, just a little better correlated with Y than is A. With your sample, whether it is A or B that is retained in the model does not tell you which of A and B is more importantly related to Y. I strongly recommend against persons using stepwise techniques until they have received advanced instruction in their use and interpretation. See this warning.
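For reference only, the sketch below shows how a stepwise selection would be requested in SAS, again with the hypothetical gpa data set used above; given the warning just given, the model it selects should be treated as descriptive of this particular sample rather than as identifying the "important" predictors.

proc reg data=gpa;
  model colgpa = hsgpa satv satq interview
        / selection=stepwise slentry=0.15 slstay=0.15;  /* significance criteria for entry and removal */
run; quit;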
Assumptions

There are no assumptions involved in computing point estimates of the value of R, a, the $b_i$, or the standard error of estimate, $s_{Y \cdot X}$, but as soon as you use t or F to put a confidence interval on your estimate of one of these, or to test a hypothesis about one of these, there are assumptions. Exactly what the assumptions are depends on whether you have adopted a correlation model or a regression model, which depends on whether you treat the X variable(s) as fixed (regression) or random (correlation). Review this distinction between regression and correlation in the document Bivariate Linear Correlation and then work through my lesson on Producing and Interpreting Residuals Plots in SAS.

Regression Geometry

In a univariate regression the regression solution is a point in one-dimensional space. It is that point (the mean) that makes the (error) sum of squares as small as possible.

In a bivariate regression the regression solution is a line in two-dimensional space. It is that line that makes the error sum of squares (computed from the residuals, the differences between actual and predicted scores) as small as possible. In the two-dimensional scatter plot below, the line is the regression solution, the blue dots are the observed values, and the red bars are the residuals.

In a trivariate regression the regression solution is a plane in three-dimensional space. It is that plane that makes the error sum of squares as small as possible. In the three-dimensional scatter plot below, the solution is the gray plane, the blue and gray dots are the observed values, and the red bars are the residuals.

If we have more than two predictors, our scatter plot is in hyperspace and the solution is a "regression surface."
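As a taste of what the residuals lesson covers, the sketch below (not the lesson itself) saves the predicted values and residuals from the hypothetical demo regression used earlier and plots the residuals against predicted Y; these residuals are the same quantities pictured as the red bars in the plots described above.

proc reg data=demo plots=none;
  model y = x1-x3;
  output out=fit p=yhat r=resid;   /* save predicted values and residuals */
run; quit;

proc sgplot data=fit;
  scatter x=yhat y=resid;          /* residuals versus predicted Y */
  refline 0 / axis=y;              /* horizontal reference line at zero */
run;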
Return to Wuensch's Stats Lessons Page

Copyright 2020, Karl L. Wuensch - All rights reserved.
