An Introduction to Multiple Regression

Life is complex and one variable alone is usually not able to explain a social problem. For example, child abuse may have multiple factors that are associated with it, including:

• Family level of stress

• Family income

• Child’s age

• Parent’s age

• Degree of social isolation

• Parenting skills

• Quality of support system

• Parent’s coping skills

• Family size

These variables are called predictor variables, and knowing a family’s standing on them can help predict the likelihood that child abuse will occur. We choose predictor variables based on theory, prior research, and our own experience. Some predictor variables (independent variables) are more important than others; that is, they have a stronger relationship to what is being predicted (the dependent variable). We might also say that they explain more of the variance in the dependent variable.

All of the predictor variables taken together may be more helpful than any one predictor variable by itself. The combination of predictor variables that we believe are important in predicting the dependent variable is sometimes called a model. For example, knowing a family’s level of stress, income, and support system may be more helpful in predicting child abuse than knowing the income alone.

We rarely achieve perfect prediction with our list of predictor variables. Other things not considered or measured may also influence the likelihood of child abuse, for example, whether the parent was abused as a child, the parent’s involvement with drugs and alcohol, the child’s temperament, and even random events that cause stress or anger. Finally, some things are simply not useful in predicting child abuse. They explain little or none of the variance in child abuse, for example, the child’s hair color, the parent’s political affiliation, and the number of movie theaters in the neighborhood.

Knowing which variables are good predictors is very useful in making decisions about how to handle child abuse cases. It helps practitioners prioritize who receives immediate and intensive services and identify which services are most useful in developing an intervention plan.

Multiple regression is a statistical procedure that estimates the relationship between several independent (predictor) variables and a single dependent (criterion) variable. Multiple regression rests on a number of assumptions, including:

• Data are at the interval or ratio level.

• The relationship between the independent and dependent variables is linear.

• The residuals (prediction errors) should be approximately normally distributed and should vary about equally across all levels of the predicted values (homoscedasticity).

• Independent variables should not correlate highly with one another; as a rule of thumb, no more than about .60. Higher correlations (multicollinearity) suggest that two variables may simply be measuring the same thing.

• There is sufficient sample size. It is recommended that there be at least 10 to 20 cases (observations) for each variable in the analysis (Cohen & Cohen, 1983).
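Two of the assumptions above come with numeric rules of thumb (predictor intercorrelations of no more than about .60, and at least 10 to 20 cases per variable), and both can be checked before the regression is run. The Python sketch below uses only the standard library; the function names and the four-family data are hypothetical, made up purely for illustration, and counting the dependent variable as one of the "variables in the analysis" is an assumption.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.60):
    """Return (name, name, r) for every predictor pair whose |r| exceeds threshold."""
    flagged = []
    for (name_a, xs), (name_b, ys) in combinations(predictors.items(), 2):
        r = pearson_r(xs, ys)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged


def enough_cases(n_cases, n_variables, cases_per_variable=10):
    """True if there are at least cases_per_variable cases for each variable."""
    return n_cases >= cases_per_variable * n_variables


# Hypothetical scores for four families (made up for illustration only)
predictors = {
    "stress": [3, 5, 7, 9],
    "income": [60, 45, 30, 20],
    "family_size": [2, 5, 3, 4],
}
print(collinear_pairs(predictors))  # [('stress', 'income', -1.0)]

# Assumption: the dependent variable is counted as a variable too (3 predictors + 1)
print(enough_cases(n_cases=4, n_variables=len(predictors) + 1))  # False
```

Pairwise correlations only catch overlap between two predictors at a time. A predictor can also be largely predictable from a combination of several others, which diagnostics such as the variance inflation factor (VIF) are designed to detect.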

Multiple regression is based on multiple correlation. As noted above, several variables may be used as predictor variables, and the correlation of these variables in combination with the dependent variable may be higher than the correlation of any single predictor. For example, a child’s self-esteem may be predicted by the number of friends the child has as well as by the child’s score on a Family Relationships Scale. The Family Relationships Scale correlates .70 with self-esteem, and Number of Friends correlates .51. In this case, Family Relationships is a stronger predictor than Number of Friends. The correlation between the combination of predictors and the dependent variable is known as multiple R, the coefficient of multiple correlation. Here the combination of Family Relationships and Number of Friends correlates more highly with self-esteem than either variable alone: R = .84. R² is the proportion of variance in the dependent variable accounted for by the combination of predictors. In this case, R² = .84² ≈ .71; that is, 71% of the variance in self-esteem can be explained by the two predictor variables together.

Notice that multiple R is not simply the sum of the two correlations. Adding them gives 1.21 (.70 + .51), an impossible value, since correlation coefficients range only from -1 to +1. Correlations do not add in this way. If the two predictors were completely unrelated to each other, it would be their squared correlations that add: R² would be .70² + .51² = .49 + .26 = .75. The actual R² of .71 is somewhat lower because the predictors are themselves correlated: children with good family relationships also tend to have more friends, so part of the variance in self-esteem that one predictor explains is also explained by the other. Each variable nonetheless accounts for some of the variance in self-esteem independently. To summarize, if we know both the Family Relationships score and the Number of Friends, we can predict self-esteem better than if we know only one of them.
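For two predictors, multiple R can be computed directly from the three pairwise correlations with the standard formula R² = (r_y1² + r_y2² - 2·r_y1·r_y2·r_12) / (1 - r_12²), where r_y1 and r_y2 are each predictor's correlation with the dependent variable and r_12 is the correlation between the two predictors. The Python sketch below implements that formula; the function name is hypothetical, and because the text does not report the correlation between Family Relationships and Number of Friends, the example only contrasts the uncorrelated case with the reported R.

```python
from math import sqrt


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple R for a criterion Y and two predictors, from pairwise correlations.

    r_y1, r_y2 -- correlation of predictor 1 and predictor 2 with Y
    r_12       -- correlation between the two predictors
    """
    if abs(r_12) >= 1:
        raise ValueError("the two predictors must not be perfectly correlated")
    # R^2 = (r_y1^2 + r_y2^2 - 2*r_y1*r_y2*r_12) / (1 - r_12^2)
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


# Correlations with self-esteem from the example above
r_family, r_friends = 0.70, 0.51

# If the predictors were uncorrelated (r_12 = 0), R^2 is the sum of squared correlations
r_independent = multiple_r_two_predictors(r_family, r_friends, 0.0)
print(round(r_independent ** 2, 2))  # 0.75

# Squaring the reported multiple R gives the reported R^2
print(round(0.84 ** 2, 2))  # 0.71
```

Trying small positive values of r_12 shows R falling below the uncorrelated value: the term -2·r_y1·r_y2·r_12 removes the variance the two predictors share, which is the overlap described above.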

Reading Multiple Regression Tables

Statistical programs return a number of statistics when computing multiple regression. SPSS, for example, first calculates multiple R. The significance of multiple R is tested with the F statistic. If the p-value of the F test is ................

