Example of Three Predictor Multiple Regression/Correlation Analysis: Checking Assumptions, Transforming Variables, and Detecting Suppression

The data are from Guber, D.L. (1999). Getting what you pay for: The debate over equity in public school expenditures. Journal of Statistics Education, 7, 1-8. The research units are the fifty states in the USA. We shall pretend they represent a random sample from a population of interest. The criterion variable is mean SAT in the state. The predictors are Expenditure (spending per student, in thousands of dollars), Salary (mean salary of teachers, in thousands of dollars), and Pupil/Teacher Ratio. If we consider the predictor variables to be fixed (the regression model), then we do not worry about the shape of the distributions of the predictor variables. If we consider the predictor variables to be random (the correlation model), we do. It turns out that each of the predictors has a distinct positive skewness, which can be greatly reduced by a negative reciprocal transformation (-1/X). The transformed variables carry the suffix "_nr" in the output below.

Descriptive Statistics

Variable                                N     Mean        Skewness    Std. Error of Skewness
Expenditure (Expenditure per student)   50    5.90526     1.107       .337
Expend_nr                               50    -.1774      -.109       .337
Pupil_Teacher (Pupil Teacher Ratio)     50    16.858      1.334       .337
PupTeach_nr                             50    -.0603      .490        .337
Salary (Teacher Salary)                 50    34.82892    .757        .337
Salary_nr                               50    -.0295      .090        .337
Valid N (listwise)                      50

Here are the zero-order correlations for the untransformed variables:

Correlations

                                       Total_SAT   Expenditure   Pupil_Teacher   Salary
Total_SAT       Pearson Correlation    1           -.381**       .081            -.440**
                Sig. (2-tailed)                    .006          .575            .001
                N                      50          50            50              50
Expenditure     Pearson Correlation    -.381**     1             -.371**         .870**
                Sig. (2-tailed)        .006                      .008            .000
                N                      50          50            50              50
Pupil_Teacher   Pearson Correlation    .081        -.371**       1               -.001
                Sig. (2-tailed)        .575        .008                          .994
                N                      50          50            50              50
Salary          Pearson Correlation    -.440**     .870**        -.001           1
                Sig. (2-tailed)        .001        .000          .994
                N                      50          50            50              50

**. Correlation is significant at the 0.01 level (2-tailed).

Notice that the more spent on each student and the greater the teachers’ salaries, the lower the SAT scores. Oh my.

Here is a regression analysis with the untransformed variables. I asked SPSS for a plot of the standardized residuals versus the standardized predicted scores. I also asked for a histogram of the residuals.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .458a   .210       .158                68.653

a. Predictors: (Constant), Salary Teacher Salary, Pupil_Teacher Pupil Teacher Ratio, Expenditure Expenditure per student

ANOVAa

Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   57495.745        3    19165.248     4.066   .012b
    Residual     216811.935       46   4713.303
    Total        274307.680       49

a. Dependent Variable: Total_SAT
b. Predictors: (Constant), Salary Teacher Salary, Pupil_Teacher Pupil Teacher Ratio, Expenditure Expenditure per student

Coefficientsa

                                Standardized                    Correlations
Model                           Beta           t        Sig.    Zero-order   Part
1   (Constant)                                 9.639    .000
    Expenditure per student     .300           .747     .459    -.381        .098
    Pupil Teacher Ratio         .192           .968     .338    .081         .127
    Teacher Salary              -.701          -1.878   .067    -.440        -.246

a. Dependent Variable: Total_SAT

If you compare the beta weights with the zero-order correlations, it is obvious that we have some sort of suppression taking place. The beta for expenditure is positive but the zero-order correlation between SAT and expenditure is negative. For the other two predictors the absolute value of beta exceeds the absolute value of the zero-order correlation with SAT.

Here is a histogram of the residuals with a normal curve superimposed:

The residuals appear to be approximately normally distributed.

The plot of standardized residuals versus standardized predicted scores will allow us to check visually for heterogeneity of variance, nonlinear trends, and normality of the residuals across values of the predicted variable. I have drawn in the regression line (error = 0).
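The beta weights and part correlations in the Coefficients table for the untransformed variables can be recovered, to rounding error, from the zero-order correlations alone. Here is a minimal sketch in Python with NumPy. It is not part of the SPSS analysis, the function name betas_from_correlations is made up for illustration, and the inputs are the three-decimal correlations from the table, so expect agreement with SPSS to about two decimals only.

```python
import numpy as np

def betas_from_correlations(r_xx, r_xy):
    """Standardized weights, R squared, and part correlations computed from
    the predictor intercorrelations (r_xx) and the predictor-criterion
    correlations (r_xy)."""
    r_xx = np.asarray(r_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    inv = np.linalg.inv(r_xx)
    beta = inv @ r_xy                   # beta = inverse(Rxx) times rxy
    r_squared = float(beta @ r_xy)      # R^2 = sum of beta_i * r_iy
    tolerance = 1.0 / np.diag(inv)      # 1 - R^2 of each predictor on the others
    part = beta * np.sqrt(tolerance)    # part (semipartial) correlations
    return beta, r_squared, part

# Order: Expenditure, Pupil/Teacher Ratio, Salary (untransformed variables),
# values copied from the Correlations table.
names = ["Expenditure", "Pupil/Teacher", "Salary"]
r_xx = [[1.000, -0.371, 0.870],
        [-0.371, 1.000, -0.001],
        [0.870, -0.001, 1.000]]
r_xy = [-0.381, 0.081, -0.440]

beta, r_squared, part = betas_from_correlations(r_xx, r_xy)
for name, r, b, sr in zip(names, r_xy, beta, part):
    print(f"{name:14s} r = {r:6.3f}   beta = {b:6.3f}   part = {sr:6.3f}")
print(f"R squared = {r_squared:.3f}")
```

Putting r and beta side by side this way makes the suppression pattern easy to spot: a sign change for expenditure, and betas larger in absolute value than r for the other two predictors.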
I see no obvious problems in the plot of standardized residuals against standardized predicted scores.

Under the homoscedasticity assumption there should be no correlation between the predicted scores and error variance. The vertical spread of the dots in that plot should not vary as we move from left to right. I squared the residuals and correlated them with the predicted values. If the residuals were increasing in variance as the predicted values increase (a common sort of heteroscedasticity), this correlation would be positive. It is close to zero, confirming my eyeball conclusion that there is no problem with that fairly common sort of heteroscedasticity.

Now let us look at the results using the transformed data.

Correlations

                                     Total_SAT   Expend_nr   Salary_nr   PupTeach_nr
Total_SAT     Pearson Correlation    1           -.398**     -.467**     .089
              Sig. (2-tailed)                    .004        .001        .537
              N                      50          50          50          50
Expend_nr     Pearson Correlation    -.398**     1           .816**      -.425**
              Sig. (2-tailed)        .004                    .000        .002
              N                      50          50          50          50
Salary_nr     Pearson Correlation    -.467**     .816**      1           .015
              Sig. (2-tailed)        .001        .000                    .920
              N                      50          50          50          50
PupTeach_nr   Pearson Correlation    .089        -.425**     .015        1
              Sig. (2-tailed)        .537        .002        .920
              N                      50          50          50          50

**. Correlation is significant at the 0.01 level (2-tailed).

The correlation matrix looks much like it did with the untransformed data.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .482a   .232       .182                67.653

a. Predictors: (Constant), PupTeach_nr, Salary_nr, Expend_nr

The R² has increased a bit, from .210 to .232.

ANOVAa

Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   63771.502        3    21257.167     4.644   .006b
    Residual     210536.178       46   4576.873
    Total        274307.680       49

a. Dependent Variable: Total_SAT
b. Predictors: (Constant), PupTeach_nr, Salary_nr, Expend_nr

Coefficientsa

                    Standardized
Model               Beta           t        Sig.
1   (Constant)                     6.519    .000
    Expend_nr       .181           .530     .598
    Salary_nr       -.618          -1.997   .052
    PupTeach_nr     .176           .889     .379

a. Dependent Variable: Total_SAT

No major changes caused by the transformation, which is comforting.
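The check for heteroscedasticity described earlier (square the residuals, then correlate them with the predicted values) is easy to repeat for any model. Here is a minimal sketch in Python with NumPy. The data are simulated, not the state SAT data, and the function name squared_residual_correlation is made up for illustration. Keep in mind that this check is sensitive only to error variance that trends upward or downward with the predicted values.

```python
import numpy as np

def squared_residual_correlation(X, y):
    """Fit an ordinary least squares regression of y on X (with intercept)
    and return the Pearson correlation between the squared residuals
    and the predicted values."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    predicted = design @ coef
    residuals = y - predicted
    return float(np.corrcoef(residuals ** 2, predicted)[0, 1])

# Simulated data (an assumption for illustration): one predictor and
# two different error structures.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(1, 10, size=n)
y_homo = 2 + 3 * x + rng.normal(0, 2, size=n)             # constant error variance
y_hetero = 2 + 3 * x + rng.normal(0, 1, size=n) * 0.5 * x  # error SD grows with x

r_homo = squared_residual_correlation(x, y_homo)
r_hetero = squared_residual_correlation(x, y_hetero)
print(f"homoscedastic errors:   r = {r_homo:.3f}")
print(f"heteroscedastic errors: r = {r_hetero:.3f}")
```

With constant error variance the correlation should be near zero. When the error standard deviation grows with the predictor, it should be clearly positive.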
Trust me that the residual plots still look OK too.

I wonder what high school teachers would think about the negative relationship between average state salary for teachers and average state SAT score. If we want better education, should we lower teacher salaries? There is an important state characteristic that we should have included in our model but did not. Check out the JSE article to learn what that characteristic is.

Now, can we figure out what sort of suppression is going on here?

Coefficientsa

Model               Beta     Zero-order r   r to Beta
1   Expend_nr       .181     -.398          From negative to positive
    Salary_nr       -.618    -.467          Gets larger, same sign
    PupTeach_nr     .176     .089           Gets larger, same sign

a. Dependent Variable: Total_SAT

It looks like the expenditures variable is suppressing irrelevant variance in one or both of the other two predictors, or in a linear combination of them. Put another way, if we hold constant the effects of teacher salary and pupil/teacher ratio, then the relationship between expenditures and SAT goes from negative to positive. Maybe the money is best spent on things other than hiring more teachers or better paid teachers?

Let us look at two-predictor models.

No suppression between expenditures and teacher salary.

Coefficientsa

Model               Beta     Zero-order r   r to Beta
1   Expend_nr       -.439    -.398          Up a bit
    PupTeach_nr     -.097    .089           Changed sign

a. Dependent Variable: Total_SAT

A little bit of classical suppression here, but not dramatic.

Coefficientsa

Model               Beta     Zero-order r   r to Beta
1   Salary_nr       -.469    -.467          Beta stronger
    PupTeach_nr     .096     .089           Beta stronger

a. Dependent Variable: Total_SAT

A little bit of cooperative suppression here, but not dramatic.

Maybe the expenditures variable is suppressing irrelevant variance in a linear combination of teacher salary and pupil/teacher ratio.
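The two-predictor comparisons above need nothing more than the three correlations involved in each pair. Here is a minimal sketch in plain Python using the rounded correlations for the transformed variables. The function names and the wording of the labels are my own assumptions for illustration, not SPSS output.

```python
def two_predictor_betas(r1y, r2y, r12):
    """Standardized weights for a two-predictor model, computed from the two
    predictor-criterion correlations and the predictor intercorrelation."""
    denom = 1.0 - r12 ** 2
    beta1 = (r1y - r2y * r12) / denom
    beta2 = (r2y - r1y * r12) / denom
    return beta1, beta2

def r_to_beta(r, beta):
    """Describe how a predictor's weight moved relative to its zero-order r."""
    if r * beta < 0:
        return "changed sign"
    if abs(beta) > abs(r):
        return "larger, same sign"
    return "smaller, same sign"

# Zero-order correlations with Total_SAT, and among the transformed predictors,
# copied from the Correlations table for the transformed data.
r_y = {"Expend_nr": -0.398, "Salary_nr": -0.467, "PupTeach_nr": 0.089}
r_xx = {("Expend_nr", "Salary_nr"): 0.816,
        ("Expend_nr", "PupTeach_nr"): -0.425,
        ("Salary_nr", "PupTeach_nr"): 0.015}

results = {}
for (a, b), r12 in r_xx.items():
    beta_a, beta_b = two_predictor_betas(r_y[a], r_y[b], r12)
    results[(a, b)] = (beta_a, beta_b)
    print(f"{a} + {b}")
    print(f"  {a:12s} r = {r_y[a]:6.3f}  beta = {beta_a:6.3f}  ({r_to_beta(r_y[a], beta_a)})")
    print(f"  {b:12s} r = {r_y[b]:6.3f}  beta = {beta_b:6.3f}  ({r_to_beta(r_y[b], beta_b)})")
```

A pair shows suppression when at least one beta changes sign or exceeds its zero-order r in absolute value. For expenditures paired with salary, both betas come out smaller than the corresponding r, which is the no-suppression case.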
I predicted SAT from salary and pupil/teacher ratio and saved the predicted scores as “predicted23.” Those predicted scores are a linear combination of teacher salary and pupil/teacher ratio, with lower salaries and higher pupil/teacher ratios being associated with higher SAT scores. When I correlate predicted23 with SAT I get .477, the R for SAT predicted from salary and pupil/teacher ratio. Watch what happens when I add the expenditures variable to the predicted23 combination.

As you can see, the expenditures variable suppresses irrelevant variance in the predicted23 combination of the other two predictor variables. When you hold total amount of expenditures constant, there is an increase in the predictive value of a linear combination of teacher salary and pupil/teacher ratio.

The complete data set, with additional variables, SPSS format.

Codebook

DESCRIPTIVE ABSTRACT: This dataset contains variables that address the relationship between public school expenditures and academic performance, as measured by the SAT.

SOURCE: The variables in this dataset, all aggregated to the state level, were extracted from the 1997 _Digest of Education Statistics_, an annual publication of the U.S. Department of Education. Data from a number of different tables were downloaded from the National Center for Education Statistics (NCES) website (Available at: ) and merged into a single data file.

VARIABLE DESCRIPTIONS:
Name of state
Current expenditure per pupil in average daily attendance in public elementary and secondary schools, 1994-95 (in thousands of dollars)
Average pupil/teacher ratio in public elementary and secondary schools, Fall 1994
Estimated average annual salary of teachers in public elementary and secondary schools, 1994-95 (in thousands of dollars)
Percentage of all eligible students taking the SAT, 1994-95
Average verbal SAT score, 1994-95
Average math SAT score, 1994-95
Average total score on the SAT, 1994-95

SPECIAL NOTES: While an initial scatterplot shows that SAT performance is lower, on average, in high-spending states than in low-spending states, this statistical relationship is misleading because of an omitted variable. Once the percentage of students taking the exam is controlled for, the relationship between spending and performance reverses to become both positive and statistically significant.

THE STORY BEHIND THE DATA: I compiled this dataset in response to recent controversy over equity in public school expenditures. While some argue that the prevailing system of financing local schools is unfair, aggregate data reported in the news media seem to suggest, paradoxically, that school spending and academic performance are statistically unrelated. My goal was to create a workable dataset that would introduce students to this continuing debate and allow them the opportunity to build their own conclusions upon a base of solid statistical reasoning.

Karl L. Wuensch
East Carolina University, Dept. of Psychology
February, 2020