SPSS for Beginners



Regression explained in simple terms



A Vijay Gupta Publication

SPSS for Beginners © Vijay Gupta 2000. All rights reside with the author.

Regression explained

Copyright © 2000 Vijay Gupta

Published by VJBooks Inc.

All rights reserved. No part of this book may be used or reproduced in any form or by any means, or stored in a database or retrieval system, without prior written permission of the publisher except in the case of brief quotations embodied in reviews, articles, and research papers. Making copies of any part of this book for any purpose other than personal use is a violation of United States and international copyright laws. For information contact Vijay Gupta at vgupta1000@.

You can reach the author at vgupta1000@.

Library of Congress Catalog No.: Pending

ISBN: Pending

First year of printing: 2000

Date of this copy: April 23, 2000

This book is sold as is, without warranty of any kind, either express or implied, respecting the contents of this book, including but not limited to implied warranties for the book's quality, performance, merchantability, or fitness for any particular purpose. Neither the author, the publisher and its dealers, nor distributors shall be liable to the purchaser or any other person or entity with respect to any liability, loss, or damage caused or alleged to be caused directly or indirectly by the book.

Publisher: VJBooks Inc.

Editor: Vijay Gupta

Author: Vijay Gupta

About the Author

Vijay Gupta has taught statistics and econometrics to graduate students at Georgetown University. A Georgetown University graduate with a Master's degree in economics, he has a vision of making the tools of econometrics and statistics easily accessible to professionals and graduate students.

In addition, he has assisted the World Bank and other organizations with statistical analysis, design of international investments, cost-benefit and sensitivity analysis, and training and troubleshooting in several areas.

He is currently working on:

• a package of SPSS Scripts "Making the Formatting of Output Easy"

• a manual on Word

• a manual for Excel

• a tutorial for E-Views

• an Excel add-in "Tools for Enriching Excel's Data Analysis Capacity"

Expect them to be available during fall 2000. Early versions can be downloaded from .

LINEAR REGRESSION

Interpretation of regression output is discussed in section 1[1]. Our approach might conflict with practices you have employed in the past, such as always looking at the R-square first. As a result of our vast experience in using and teaching econometrics, we are firm believers in our approach. You will find the presentation to be quite simple - everything is in one place and displayed in an orderly manner.

The acceptance (as being reliable/true) of regression results hinges on diagnostic checking for the breakdown of classical assumptions[2]. If there is a breakdown, then the estimation is unreliable, and thus the interpretation from section 1 is unreliable. The table in section 2 succinctly lists the various possible breakdowns and their implications for the reliability of the regression results[3].

Why is the result not acceptable unless the assumptions are met? The reason is that the strong statements inferred from a regression (i.e. - "an increase in one unit of the value of variable X causes an increase in the value of variable Y by 0.21 units") depend on the presumption that the variables used in a regression, and the residuals from the regression, satisfy certain statistical properties. These are expressed in the properties of the distribution of the residuals (that explains why so many of the diagnostic tests shown in sections 3-4 and the corrective methods are based on the use of the residuals). If these properties are satisfied, then we can be confident in our interpretation of the results.

The above statements are based on complex formal mathematical proofs. Please check your textbook if you are curious about the formal foundations of the statements.

Section 3 provides a brief schema for checking for the breakdown of classical assumptions. The testing usually involves informal (graphical) and formal (distribution-based hypothesis tests like the F and T) testing, with the latter involving the running of other regressions and computing of variables.

1. Interpretation of regression results

Assume you want to run a regression of wage on age, work experience, education, gender, and a dummy for sector of employment (whether employed in the public sector).

wage = function(age, work experience, education, gender, sector)

or, as your textbook will have it,

wage = β1 + β2*age + β3*work experience + β4*education + β5*gender + β6*sector
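This book estimates the model through the SPSS dialogs, but the same least-squares fit can be sketched in a few lines of Python. The data below is entirely synthetic and the "true" betas are invented for the demonstration; only the list of regressors comes from the text.

```python
import numpy as np

# Hypothetical illustration (not from the book): fitting
# wage = b1 + b2*age + b3*experience + b4*education + b5*gender + b6*sector
# by ordinary least squares on synthetic data.
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 65, n)
experience = np.clip(age - 18 - rng.uniform(0, 10, n), 0, None)
education = rng.integers(8, 20, n).astype(float)
gender = rng.integers(0, 2, n).astype(float)   # dummy variable
sector = rng.integers(0, 2, n).astype(float)   # dummy: 1 = public sector
noise = rng.normal(0, 2, n)

# "True" betas chosen only to generate the synthetic wages
wage = (1.0 + 0.05*age + 0.1*experience + 0.8*education
        - 1.5*gender + 0.5*sector + noise)

# Design matrix with a column of ones for the intercept (beta 1)
X = np.column_stack([np.ones(n), age, experience, education, gender, sector])
beta, *_ = np.linalg.lstsq(X, wage, rcond=None)   # estimates of b1..b6
print(beta)
```

With enough data, the estimated coefficients land close to the values used to generate the wages, which is the sense in which regression "explains" the dependent variable.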

[pic]

Always look at the model fit (“ANOVA”) first. Do not make the mistake of looking at the R-square before checking the goodness of fit.

Significance of the model (“Did the model explain the deviations in the dependent variable?”)

The last column shows the goodness of fit of the model. The lower this number, the better the fit. Typically, if “Sig” is greater than 0.05, we conclude that our model could not fit the data[4].

The F is comparing the two models below:

wage = β1 + β2*age + β3*work experience + β4*education + β5*gender + β6*sector

wage = β1

In formal terms, the F is testing the hypothesis: β2 = β3 = β4 = β5 = β6 = 0

If the F is not significant, then we cannot say that model 1 is any better than model 2. The implication is obvious: the use of the independent variables has not assisted in predicting the dependent variable.

Sum of squares

The TSS (Total Sum of Squares) is the total deviation in the dependent variable. The aim of the regression is to explain these deviations (by finding the best betas that minimize the sum of the squares of these deviations).

The ESS (Explained Sum of Squares) is the amount of the TSS that could be explained by the model. The R-square, shown in the next table, is the ratio ESS/TSS. It captures the percent of deviation from the mean in the dependent variable that could be explained by the model.

The RSS (Residual Sum of Squares) is the amount that could not be explained (TSS minus ESS).

In the previous table, the column "Sum of Squares" holds the values for TSS, ESS, and RSS. The row "Total" is TSS (106809.9 in the example), the row "Regression" is ESS (54514.39 in the example), and the row "Residual" contains the RSS (52295.48 in the example).
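The sums of squares from the ANOVA table above can be checked numerically: TSS must equal ESS plus RSS (up to rounding), and the F statistic is built from these same quantities. The sample size n below is hypothetical, since it is not reported in this excerpt; only the three sums of squares come from the table.

```python
# Sums of squares from the ANOVA table in the text
tss = 106809.9   # "Total"
ess = 54514.39   # "Regression"
rss = 52295.48   # "Residual"

# Identity: TSS = ESS + RSS (small gap due to rounding in the printout)
gap = abs(tss - (ess + rss))

# F statistic comparing model 1 (all regressors) with model 2 (intercept
# only).  k = 6 estimated coefficients; n = 2000 is an assumed sample size.
k, n = 6, 2000
f_stat = (ess / (k - 1)) / (rss / (n - k))
print(gap, round(f_stat, 1))   # gap is tiny; F is very large here
```

A large F (and hence a tiny Sig.-F) says that adding the five regressors explains far more variation than an intercept-only model.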

[pic]

Adjusted R-square

Measures the proportion of the variance in the dependent variable (wage) that was explained by variations in the independent variables. In this example, the “Adjusted R-Square” shows that 50.9% of the variance was explained.

R-square

Measures the proportion of the variation in the dependent variable (wage) that was explained by variations in the independent variables. In this example, the “R-Square” tells us that 51% of the variation (and not the variance) was explained.

Std Error of Estimate

The Std. Error of the Estimate measures the dispersion of the dependent variable's estimate around its mean (in this example, the “Std. Error of the Estimate” is 5.13). Compare this to the mean of the “Predicted” values of the dependent variable. If the Std. Error is more than 10% of the mean, it is high.
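The reported R-square can be reproduced directly from the ANOVA sums of squares, since R-square = ESS/TSS. The adjusted R-square additionally needs the sample size n and the number of estimated coefficients k, which are not given in this excerpt, so the values used for them below are hypothetical.

```python
# Values from the ANOVA table in the text
ess, tss = 54514.39, 106809.9
r2 = ess / tss
print(round(r2, 3))   # ~0.510, i.e. the 51% reported in the text

# Adjusted R-square penalises extra regressors.  n and k are assumed
# here purely for illustration (k = 6 coefficients incl. the intercept).
n, k = 2000, 6
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
```

The adjustment always pulls the value below the plain R-square, and by more when k is large relative to n.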

The reliability of individual coefficients

The table “Coefficients” provides information on the confidence with which we can support the estimate for each coefficient (see the columns “T” and “Sig.”). If the value in “Sig.” is less than 0.05, then the estimate in column “B” can be asserted as true with a 95% level of confidence[5]. Always interpret the "Sig" value first. If this value is more than 0.1, then the coefficient estimate is not reliable because it has "too much" dispersion/variance.
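The "T" and "Sig." columns are linked: "Sig." is the two-sided p-value of the t statistic. For a large sample this can be approximated with the normal distribution, as in the sketch below (a simplification; SPSS uses the exact t distribution with the model's degrees of freedom).

```python
import math

def two_sided_p(t):
    """Approximate two-sided p-value of a t statistic for large samples,
    using the normal distribution: p = erfc(|t| / sqrt(2))."""
    return math.erfc(abs(t) / math.sqrt(2))

p = two_sided_p(2.0)
print(p)   # ~0.0455: below 0.05, so significant at the 95% level
```

So a coefficient with |T| around 2 or more will typically show "Sig." below 0.05 and can be treated as reliable by the rule given above.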

The individual coefficients

The table “Coefficients” provides information on the effect of individual variables (the "Estimated Coefficients" or “beta” -- see column “B”) on the dependent variable.

Confidence Interval

[pic]

Plot of residual versus predicted dependent variable

[pic]

This is the plot of the standardized predicted variable versus the standardized residuals. The pattern in this plot indicates the presence of mis-specification[6] and/or heteroskedasticity[7].

Plot of residuals versus independent variables

[pic]

The definite positive pattern indicates[8] the presence of heteroskedasticity caused, at least in part, by the variable education.

[pic]

The plot of age and the residual has no pattern[9], which implies that no heteroskedasticity is caused by this variable.
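The graphical check above has a simple numeric counterpart: if the spread of the residuals grows with a regressor, the correlation between that regressor and the absolute residuals will be clearly positive. The residuals below are a deterministic toy series constructed so that their spread grows with education; a real check would use the residuals saved from the regression (and formal tests exist as well).

```python
import numpy as np

# Toy illustration: residuals whose spread grows with "education".
education = np.linspace(8, 20, 50)
signs = np.where(np.arange(50) % 2 == 0, 1.0, -1.0)
residuals = education * signs   # alternating sign, widening spread

# Correlation between the regressor and the absolute residuals:
# clearly positive -> evidence of heteroskedasticity from this variable.
corr = np.corrcoef(education, np.abs(residuals))[0, 1]
print(round(corr, 2))   # 1.0 for this constructed example
```

For a variable like age in the text, whose plot shows no pattern, this correlation would sit near zero.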

Plots of the residuals

[pic]

[pic]

The histogram and the P-P plot of the residual suggest that the residual is probably normally distributed[10]. You can also use other tests to check for normality.
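One such numeric complement to the histogram and P-P plot is the Jarque-Bera statistic, which combines the skewness and kurtosis of the residuals; under normality it is approximately chi-square with 2 degrees of freedom (5% critical value 5.99). The residuals below are a symmetric toy sample used only to show the arithmetic.

```python
# Toy symmetric residuals, n = 50 (illustration only)
residuals = [-2.0, -1.0, 0.0, 1.0, 2.0] * 10

n = len(residuals)
mean = sum(residuals) / n
m2 = sum((r - mean) ** 2 for r in residuals) / n   # variance
m3 = sum((r - mean) ** 3 for r in residuals) / n
m4 = sum((r - mean) ** 4 for r in residuals) / n
skew = m3 / m2 ** 1.5
kurt = m4 / m2 ** 2
jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)   # Jarque-Bera statistic
print(round(skew, 3), round(jb, 2))   # skew 0; JB below 5.99, so this
                                      # small sample shows no strong
                                      # departure from normality
```

Skewness far from zero or a JB statistic above 5.99 would argue against treating the residuals as normal.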

Regression output interpretation guidelines

|Name of Statistic/Chart |What Does It Measure or Indicate? |Critical Values |Comment |
|Sig.-F |Whether the model as a whole is significant. It tests whether R-square is significantly different from zero. |Below .01 for 99% confidence in the ability of the model to explain the dependent variable; below .05 for 95% confidence; below .1 for 90% confidence. |The first statistic to look for in SPSS output. If Sig.-F is insignificant, then the regression as a whole has failed. No more interpretation is necessary (although some statisticians disagree on this point). You must conclude that the "Dependent variable cannot be explained by the independent/explanatory variables." The next steps could be rebuilding the model, using more data points, etc. |
|RSS, ESS & TSS |The main function of these values lies in calculating test statistics like the F-test, etc. |The ESS should be high compared to the TSS (the ratio equals the R-square). Note for interpreting the SPSS table, column "Sum of Squares": "Total" = TSS, "Regression" = ESS, and "Residual" = RSS. |If the R-squares of two models are very similar or rounded off to zero or one, then you might prefer to use the F-test formula that uses RSS and ESS. |
|SE of Regression |The standard error of the estimate of the predicted dependent variable. |There is no critical value. Just compare the std. error to the mean of the predicted dependent variable. The former should be small (less than 10% of the mean). |You may wish to comment on the SE, especially if it is too large or small relative to the mean of the predicted dependent variable. |
