Plots of Residuals



Producing and Interpreting Residuals Plots in SPSS


In a linear regression analysis it is assumed that the distribution of the residuals, Y − Ŷ, is, in the population, normal at every level of predicted Y and constant in variance across levels of predicted Y. I shall illustrate how to check that assumption. Although I shall use a bivariate regression, the same technique would work for a multiple regression.

Start by downloading Residual-Skew.dat and Residual-Hetero.dat from my StatData page and ANOVA1.sav from my SPSS data page. Each line of the Residual-Skew data has four scores: X, Y, X2, and Y2. The delimiter is a blank space.

Create a new variable, SQRT_Y2, this way: click Transform, Compute, enter SQRT_Y2 as the Target Variable and SQRT(Y2) as the Numeric Expression, and click OK.
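If you prefer syntax to the point-and-click menus, a sketch like the following should read the data and create SQRT_Y2 (the file path is only an example; point it at wherever you saved Residual-Skew.dat):

* Read the space-delimited Residual-Skew.dat (example path only).
DATA LIST FILE='C:\Data\Residual-Skew.dat' LIST / X Y X2 Y2.
EXECUTE.

* SQRT_Y2 is the square root of Y2.
COMPUTE SQRT_Y2 = SQRT(Y2).
EXECUTE.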

First some descriptive statistics on the variables:

[pic]
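If you prefer syntax, a Descriptives command along these lines would produce the statistics above (a sketch; the variable names are those in the data file):

* Means, standard deviations, skewness, and kurtosis for all five variables.
DESCRIPTIVES VARIABLES=X Y X2 Y2 SQRT_Y2
  /STATISTICS=MEAN STDDEV SKEWNESS KURTOSIS.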

Notice that variables X and Y are not skewed – I generated them with a normal random number generator. Notice that X2 and Y2 are skewed and that taking the square root of Y2 reduces its skewness greatly.

Here we predict Y from X, produce a residuals plot, and save the residuals.

[pic]

[pic]

[pic]

[pic]
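The same analysis can also be run from a syntax window; here is a sketch (the subcommand names are standard REGRESSION keywords, but do check them against your version of SPSS):

* Regress Y on X.
* Plot standardized residuals against standardized predicted values,
* request a histogram of the standardized residuals, and save the
* unstandardized and standardized residuals to the data file.
REGRESSION
  /DEPENDENT Y
  /METHOD=ENTER X
  /SCATTERPLOT=(*ZRESID, *ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID)
  /SAVE RESID ZRESID.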

Here is a histogram of the residuals with a normal curve superimposed. The residuals look close to normal.

[pic]

Here is a plot of the residuals versus predicted Y. The pattern shown here indicates no problems with the assumption that the residuals are normally distributed at each level of predicted Y and constant in variance across levels of predicted Y. SPSS does not automatically draw in the regression line (the horizontal line at residual = 0). I double-clicked the chart and then selected Elements, Fit Line at Total to get that line.

[pic]

SPSS has saved the residuals, unstandardized (RES_1) and standardized (ZRE_1), to the data file:

[pic]

Analyze, Explore ZRE_1 to get a better picture of the standardized residuals. The plots look fine. As you can see, the skewness and kurtosis of the residuals are about what you would expect if they came from a normal distribution:
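In syntax, Explore is the EXAMINE command; a minimal sketch:

* Descriptives, histogram, normal Q-Q plot, and boxplot for the saved
  standardized residuals.
EXAMINE VARIABLES=ZRE_1
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /STATISTICS DESCRIPTIVES.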

[pic]

[pic]

Now predict Y from the skewed X2.

Conduct this analysis requesting the same plots and saving the residuals, just as above.

You will notice that the residuals plots and exploration of the saved residuals indicate no problems for the regression model. The skewness of X2 may be troublesome for the correlation model, but not for the regression model.

[pic]

Next, predict skewed Y2 from X.

[pic]

[pic]

[pic]

Notice that the residuals plot shows the residuals not to be normally distributed – they are pulled out (skewed) towards the top of the plot. Explore also shows trouble:

[pic]

[pic]

Notice the outliers in the boxplot.

[pic]

Maybe we can solve this problem by taking the square root of Y2. Predict SQRT_Y2 (the square root of Y2) from X.

[pic]

[pic]

[pic]

Notice that the transformation did wonders, reducing the skewness of the residuals to a comfortable level.

[pic]

We are done with the Residual-Skew data set now. Read into SPSS the ANOVA1.sav data file. Conduct a linear regression analysis to predict illness from dose of drug. Save the standardized residuals and obtain the same plots that we produced above.
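A syntax sketch for this step (the path to ANOVA1.sav is only an example; dose and illness are the variable names that appear in the output below):

* Open the ANOVA1.sav data file (adjust the path to your copy).
GET FILE='C:\Data\ANOVA1.sav'.

* Regress illness on dose; plot and save the standardized residuals.
REGRESSION
  /DEPENDENT illness
  /METHOD=ENTER dose
  /SCATTERPLOT=(*ZRESID, *ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID)
  /SAVE ZRESID.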

[pic]

[pic]

Look at the residuals plot. Oh my. Notice that the residuals are not symmetrically distributed about zero. They are mostly positive at low and high values of predicted Y and mostly negative at medium values of predicted Y. If you were to find the means of the residuals at each level of predicted Y and connect those means with a line, you would get a curve with one bend. This strongly suggests that the relationship between X and Y is not linear and that you should try a nonlinear model. Notice that the problem is not apparent when we look only at the marginal distribution of the residuals.

Produce the new variable Dose_SQ by squaring Dose (Transform, Compute), then click OK.
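In syntax this is a one-line Compute (a sketch):

* Dose_SQ is dose squared.
COMPUTE Dose_SQ = dose*dose.
EXECUTE.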

[pic]

Now predict Illness from a combination of Dose and Dose_SQ. Ask for the usual plots and save residuals and predicted scores.
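Here is a syntax sketch of that quadratic model. Note that the predicted values are saved; SPSS will name them PRE_1, as in the text, if this is the first time you have saved predicted values:

* Polynomial (quadratic) regression: illness from dose and dose squared.
REGRESSION
  /DEPENDENT illness
  /METHOD=ENTER dose Dose_SQ
  /SCATTERPLOT=(*ZRESID, *ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID)
  /SAVE PRED ZRESID.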

[pic]

Model Summary(b)

|Model |R       |R Square |Adjusted R Square |Std. Error of the Estimate |
|1     |.657(a) |.431     |.419              |9.238                      |

a  Predictors: (Constant), Dose_SQ, dose
b  Dependent Variable: illness

[pic]

Notice that the R has gone up a lot and is now significant, and the residuals plot looks fine.

Let us have a look at the regression line. We saved the predicted scores (PRE_1), so we can plot their means against dose of the drug:

Click Graphs, Line, Simple, Define.

[pic]

Select Line Represents Other statistic and scoot PRE_1 into the variable box. Scoot Dose into the Category Axis box. OK.
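The corresponding legacy GRAPH syntax would be roughly:

* Line chart of the mean predicted illness (PRE_1) at each dose.
GRAPH
  /LINE(SIMPLE)=MEAN(PRE_1) BY dose.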

[pic]

Wow, that is certainly no straight line. What we have done here is a polynomial regression, fitting the data with a quadratic line. A quadratic line can have one bend in it.

Let us get a scatter plot with the data and the quadratic regression line. Click Graphs, Scatter, Simple Scatter, Define. Scoot Illness into the Y-axis box and Dose into the X-axis box. OK. Double-click the graph to open the chart editor and select Elements, Fit Line at Total. SPSS will draw a nearly flat, straight line. In the Properties box change Fit Method from Linear to Quadratic.
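The scatterplot itself can be requested with syntax such as the following; the quadratic fit line is still added interactively in the chart editor, as described above:

* Scatterplot of illness (Y axis) against dose (X axis).
GRAPH
  /SCATTERPLOT(BIVAR)=dose WITH illness.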

[pic]

Click Apply and then close the chart editor.

[pic][pic]

We are done with the ANOVA1.sav data for now. Bring into SPSS the Residual-Hetero.dat data. Each case has two scores, X and Y. The delimiter is a blank space. Conduct a regression analysis predicting Y from X. Create residuals plots and save the standardized residuals as we have been doing with each analysis.
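A sketch of the syntax for this last analysis (again, the file path is only an example):

* Read the space-delimited Residual-Hetero.dat (example path only).
DATA LIST FILE='C:\Data\Residual-Hetero.dat' LIST / X Y.
EXECUTE.

* Regress Y on X; plot and save the standardized residuals.
REGRESSION
  /DEPENDENT Y
  /METHOD=ENTER X
  /SCATTERPLOT=(*ZRESID, *ZPRED)
  /RESIDUALS HISTOGRAM(ZRESID)
  /SAVE ZRESID.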

[pic]

As you can see, the residuals plot shows clear evidence of heteroscedasticity. In this case, the variability of the residuals increases as the value of predicted Y increases. I have been told that transforming one of the variables sometimes reduces heteroscedasticity, but in my experience it often does not help.

Return to Wuensch's SPSS Lessons Page

Copyright 2007, Karl L. Wuensch - All rights reserved.

