
Family Science Review, 11, 354-373.

An Introduction to Structural Equation Modeling1

J.J. Hox
University of Amsterdam/Utrecht University

T.M. Bechger
CITO, Arnhem

Abstract

This article presents a short and non-technical introduction to Structural Equation Modeling (SEM). SEM is a powerful technique that can combine complex path models with latent variables (factors). Using SEM, researchers can specify confirmatory factor analysis models, regression models, and complex path models. We present the basic elements of a structural equation model, introduce the most common estimation technique, maximum likelihood (ML), and discuss some problems concerning the assessment and improvement of model fit, as well as model extensions to multigroup problems including factor means. Finally, we discuss some of the available software, and list useful handbooks and Internet sites.

What is Structural Equation Modeling?

Structural Equation Modeling, or SEM, is a very general statistical modeling technique that is widely used in the behavioral sciences. It can be viewed as a combination of factor analysis and regression or path analysis. Interest in SEM often centers on theoretical constructs, which are represented by the latent factors. The relationships between the theoretical constructs are represented by regression or path coefficients between the factors. The structural equation model implies a structure for the covariances between the observed variables, which provides the alternative name covariance structure modeling. However, the model can be extended to include means of observed variables or factors, which makes covariance structure modeling a less accurate name. Many researchers simply think of these models as `Lisrel models,' which is also less accurate. LISREL is an abbreviation of LInear Structural RELations, and the name used by Jöreskog for one of the first and most popular SEM programs. Nowadays structural equation models need not be linear, and the possibilities of SEM extend well beyond the original Lisrel program. Browne (1993), for instance, discusses the possibility of fitting nonlinear curves.

Structural equation modeling provides a very general and convenient framework for statistical analysis that includes several traditional multivariate procedures, for example factor analysis, regression analysis, discriminant analysis, and canonical correlation, as special cases. Structural equation models are often visualized by a graphical path diagram. The statistical model is usually represented in a set of matrix equations. In the early seventies, when this technique was first introduced in social and behavioral research, the software usually required setups that specify the model in terms of these matrices. Thus, researchers had to distill the matrix representation from the path diagram, and provide the software with a series of matrices for the different sets of parameters, such as factor loadings and regression coefficients. A recent development is software that allows researchers to specify the model directly as a path diagram. This works well for simple problems, but may become tedious for more complicated models. For that reason, current SEM software still supports command- or matrix-style model specifications as well.

1 Note: The authors thank Alexander Vazsonyi and three anonymous reviewers for their comments on a previous version. We thank Annemarie Meijer for her permission to use the quality of sleep data.

This article provides a brief and non-technical review of the basic issues involved in SEM, including issues of estimation, model fit, and statistical assumptions. We include a list of available software, introductory books, and useful Internet resources.

Examples of SEM-Models

In this section, we set the stage by discussing examples of a confirmatory factor analysis, regression analysis, and a general structural equation model with latent variables.

Structural equation modeling has its roots in path analysis, which was invented by the geneticist Sewall Wright (Wright, 1921). It is still customary to start a SEM analysis by drawing a path diagram. A path diagram consists of boxes and circles, which are connected by arrows. In Wright's notation, observed (or measured) variables are represented by a rectangle or square box, and latent (or unmeasured) factors by a circle or ellipse. Single headed arrows or `paths' are used to define causal relationships in the model, with the variable at the tail of the arrow causing the variable at the point. Double headed arrows indicate covariances or correlations, without a causal interpretation. Statistically, the single headed arrows or paths represent regression coefficients, and double-headed arrows covariances. Extensions of this notation have been developed to represent variances and means (cf. McArdle, 1996). The first example in Figure 1 is a representation of a confirmatory factor analysis model, with six observed variables and two factors.

Confirmatory Factor Analysis

The model in Figure 1 is a confirmatory factor model for data collected by Holzinger and Swineford, taken from the AMOS manual (Arbuckle, 1997, p. 375; see also Jöreskog & Sörbom, 1989, p. 247). The data are the scores of 73 girls on six intelligence tests. There are two hypothesized intelligence factors, a verbal and a spatial ability factor, which are drawn as latent factors that are assumed to cause the variation and covariation among the six observed variables. There is a double-headed arrow between the two factors, indicating that we assume the two factors are correlated. The arrows from the factors to the variables represent linear regression coefficients or `factor loadings.' We do not assume that the latent factors completely explain the observed variation; each observed variable is associated with a residual error term, which is also unmeasured and depicted by a circle.


[Figure 1: path diagram with two correlated latent factors, spatial and verbal. Spatial points to visperc, cubes, and lozenges; verbal points to paragraph, sentence, and wordmean. Each observed variable has its own error factor (e_v, e_c, e_l, e_p, e_s, e_w) with a loading fixed to 1, and one loading per factor is fixed to 1.]

Figure 1. Confirmatory factor analysis; Holzinger and Swineford data.

Factor analysis assumes that the covariances between a set of observed variables can be explained by a smaller number of underlying latent factors. In exploratory factor analysis, we proceed as if we had no hypothesis about the number of latent factors or about the relations between the latent factors and the observed variables. Statistical procedures are used to estimate the number of underlying factors and the factor loadings. In exploratory factor analysis, the model is arbitrary: all variables load on all factors. Typically, a transformation method such as Varimax rotation is used to improve the interpretation of the results. In contrast, the path diagram in Figure 1 represents a clear hypothesis about the factor structure. Models of this kind are called restricted or confirmatory factor analysis (CFA) models. In structural equation modeling, the confirmatory factor model is imposed on the data. In this case, the purpose of structural equation modeling is twofold. First, it aims to obtain estimates of the parameters of the model, i.e., the factor loadings, the variances and covariances of the factors, and the residual error variances of the observed variables. Second, it aims to assess the fit of the model, i.e., to assess whether the model provides a good description of the data. We will deal with these issues in more detail later.

Typically, some of the factor loadings are constrained or fixed to zero. In Figure 1, the absence of arrows going from the verbal factor to `visperc,' `cubes,' and `lozenges' means that the corresponding loadings in the factor matrix are fixed to zero. Similarly, the loadings of `paragraph,' `sentence,' and `wordmean' on the spatial factor are also fixed to zero. The factor model in Figure 1 shows a perfect simple structure: each variable loads on one factor only. Confirmatory factor analysis can specify such a structure exactly and test whether it is plausible, while exploratory factor analysis can only approximate such simple structures by rotation.

For each factor, we must also fix one loading to one. This is needed to give the latent factor an interpretable scale. If we do not fix one factor loading to one (or to some other nonzero value), the scale of the latent factor is undetermined. For each latent factor, we can either estimate the loadings given a fixed variance for the latent factor, which standardizes the scale of the factor to a Z-score, or we can estimate the factor variance given at least one fixed loading. Since the loadings are a function of the variance of the latent factor, and the variance of the latent factor is a function of the loadings, we cannot simultaneously estimate unique values for all of them. Thus, one solution is to fix the variance of each factor to one, and estimate all factor loadings. In SEM, it is more customary to use the other solution, which is to fix one loading for each factor to one, and estimate the factor variances. Only the Lisrel program fixes the variances of the factors by default.
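This indeterminacy can be illustrated numerically: under the factor model, the implied covariance between two indicators of the same factor is the product loading × factor variance × loading, so multiplying the factor variance by some constant c and dividing each loading by the square root of c leaves the implied covariance matrix unchanged. A minimal sketch (the numbers are illustrative, not the Holzinger-Swineford estimates):

```python
# One factor, three indicators: the implied covariance is
# sigma_ij = lam_i * phi * lam_j, plus theta_i on the diagonal.
def implied_cov(lam, phi, theta):
    p = len(lam)
    return [[lam[i] * phi * lam[j] + (theta[i] if i == j else 0.0)
             for j in range(p)] for i in range(p)]

lam = [1.0, 0.6, 1.2]      # loadings, first one fixed to 1
phi = 4.0                  # factor variance
theta = [0.5, 0.7, 0.4]    # residual error variances

# Rescale: multiply the factor variance by c, divide loadings by sqrt(c).
c = 4.0
lam2 = [l / c ** 0.5 for l in lam]
phi2 = phi * c

cov1 = implied_cov(lam, phi, theta)
cov2 = implied_cov(lam2, phi2, theta)
# cov1 and cov2 are identical: the data cannot distinguish the two
# parameterizations, so one loading (or the factor variance) must be fixed.
```

Because both parameter sets imply exactly the same covariance matrix, the data alone cannot choose between them; fixing one loading (or the factor variance) removes the ambiguity.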

The factors `e_v' to `e_w' to the right of Figure 1 represent measurement errors. We expect that the verbal and spatial factors will not perfectly predict the observed variables, and this is modeled by specifying a specific error factor for each observed variable. These measurement errors are often pictured as single arrows. The representation in Figure 1, which is the standard representation in the program AMOS (Arbuckle, 1997), makes clear that these errors are also unobserved factors.

In SEM, we must specify a model before we start the analysis. The model specification is usually guided by a combination of theory and empirical results from previous research. Once we have specified a model, we can estimate the factor loadings and (co)variances. We can also conduct a statistical chi-square test to assess how well the hypothesized model fits the data. If the chi-square is highly significant, the hypothesized model is rejected, and we may search for a better model. Of course, the chi-square test in SEM shares the problems common to all statistical tests: it requires assumptions, and its power depends on the sample size. We will address these problems later.

The chi-square for the two-factor model in Figure 1 is 7.9 with 8 degrees of freedom and a p-value of 0.45. Apparently, the two-factor model is a plausible model for these test data. The estimates of the factor loadings are presented in Table 1.
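The reported p-value can be reproduced directly from the chi-square statistic. For an even number of degrees of freedom the chi-square survival function has a simple closed form, so the check needs only the standard library:

```python
import math

def chi2_sf(x, df):
    """P(X > x) for a chi-square variable; closed form valid for even df."""
    assert df % 2 == 0
    k = x / 2.0
    return math.exp(-k) * sum(k ** i / math.factorial(i) for i in range(df // 2))

p = chi2_sf(7.9, 8)
# p is about 0.44, consistent with the reported 0.45 given rounding of the
# chi-square statistic; the two-factor model is not rejected.
```

Because 7.9 is close to the expected value of a chi-square variable with 8 degrees of freedom (namely 8), a p-value near one half is exactly what one should expect for a well-fitting model.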

Table 1. Factor loadings Holzinger & Swineford data (loading (s.e.)) and Squared Multiple Correlations (SMC)

            unstandardized loadings      standardized loadings
            spatial      verbal          spatial   verbal      SMC
visperc     1.00                         .70                   .49
cubes        .61 (.14)                   .65                   .43
lozenges    1.20 (.27)                   .74                   .54
paragraph                1.00                      .88         .77
sentence                 1.33 (.16)                .83         .68
wordmean                 2.23 (.26)                .84         .71

The standard errors in parentheses next to the unstandardized loadings can be used to assess their significance. The statistic formed by dividing an estimate by its standard error is called the critical ratio (C.R.). In large samples, the critical ratio can be referred to the standard normal distribution. Thus, a C.R. of 1.96 or higher (or -1.96 or lower) indicates two-sided significance at the customary 5% level. In SEM, for instance in the program EQS, the C.R. test is sometimes referred to as the Wald test. In our confirmatory factor analysis, the critical ratio tests indicate that all loadings are significant.
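The critical ratios for the free loadings in Table 1 can be checked directly from the estimates and their standard errors; the two-sided p-value follows from the standard normal distribution, which the standard library's error function provides:

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal critical ratio."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Free loadings (estimate, s.e.) from Table 1; the loadings fixed to 1
# are not estimated and therefore have no critical ratio.
loadings = {"cubes": (0.61, 0.14), "lozenges": (1.20, 0.27),
            "sentence": (1.33, 0.16), "wordmean": (2.23, 0.26)}

for name, (est, se) in loadings.items():
    cr = est / se
    # every C.R. exceeds 1.96, so each free loading is significant at 5%
    print(name, round(cr, 2), two_sided_p(cr) < 0.05)
```

The smallest critical ratio here is about 4.4 (for `cubes'), far beyond the 1.96 cutoff, which matches the statement that all loadings are significant.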

In SEM, it is usual to analyze the covariance matrix rather than the correlation matrix, for sound statistical reasons (see Bollen, 1989, or Loehlin, 1998, for details). However, the software also produces standardized estimates, which are generally used for interpretation. The standardized loadings in Table 1 can be interpreted in the same way as in exploratory factor analysis. The squared multiple correlations (SMCs) in Table 1 are in fact the communalities of the variables. Finally, the model estimates the correlation between the two factors as 0.49, which is moderately high. Exploratory factor analysis can also produce factor intercorrelations, but these are strongly influenced by details of the specific rotation method used. The ability of SEM to produce meaningful estimates of the correlations between factors is a key strength.
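Because each variable loads on only one factor in this simple-structure model, its communality is just its squared standardized loading, so the SMC column of Table 1 can be verified from the standardized loadings (small discrepancies are due to rounding in the published table):

```python
# Standardized loadings and SMCs as reported in Table 1.
std_loadings = {"visperc": 0.70, "cubes": 0.65, "lozenges": 0.74,
                "paragraph": 0.88, "sentence": 0.83, "wordmean": 0.84}
smc = {"visperc": 0.49, "cubes": 0.43, "lozenges": 0.54,
       "paragraph": 0.77, "sentence": 0.68, "wordmean": 0.71}

for name, lam in std_loadings.items():
    # with one loading per variable, communality = squared standardized loading
    assert abs(lam ** 2 - smc[name]) < 0.01
```

For example, .70 squared is .49, the SMC of `visperc'; the others agree to within a rounding error of .01.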

Multiple Regression Analysis

It is instructive to see how a familiar analysis procedure, such as multiple regression analysis, looks when represented as a path model. Figure 2 shows a multiple regression model that predicts the perceived burden of having children from the variables `child care by neighbors,' `integration in the neighborhood,' `inability to be alone,' `relatives in the area,' and `child care by relatives.' The example is taken from Goldsteen and Ross (1989).

[Figure 2: path diagram with five correlated predictors (neigcare, integrat, cantalon, relativs, relacare), each with a single-headed arrow to burden; the residual error factor b_err points to burden with a loading fixed to 1.]

Figure 2. Perceived burden of children, multiple regression. NOTE: neigcare = child care by neighbors; integrat = integration in neighborhood; cantalon = inability to be alone; relativs = relatives in area; relacare = child care by relatives.

Figure 2 makes two things quite clear. First, in multiple regression analysis we generally assume that the independent variables are correlated; in Figure 2 this assumption appears as the two-headed arrows between the predictor variables. Second, the residual error in multiple regression analysis is actually an unobserved, latent variable. Note that we again must fix the loading of the residual error factor to one to achieve identification.

If we estimate the model in Figure 2, we obtain unstandardized and standardized regression weights, a variance estimate for the residual errors, and the squared multiple correlation of the dependent variable `burden.' In this case, we do not get a chi-square test of model fit. The reason is that the model in Figure 2 estimates exactly as many parameters as we have data points. Let us count them. We have six variables, which gives us a 6×6 variance/covariance matrix. This matrix has six variances and 15 unique covariances, together 21 data points. The path model in Figure 2 implies estimation of the following parameters: the variances and covariances of the five independent variables
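The counting argument can be made explicit. With p observed variables there are p(p+1)/2 distinct variances and covariances; the regression model estimates the predictor (co)variances, the regression weights, and one residual variance. A quick check, assuming the five predictors and one outcome of Figure 2:

```python
def data_points(p):
    """Number of distinct elements of a p-by-p covariance matrix."""
    return p * (p + 1) // 2

n_pred = 5                          # predictors in Figure 2
p = n_pred + 1                      # plus the dependent variable `burden'
moments = data_points(p)            # 21 observed variances and covariances
params = (data_points(n_pred)       # 15 predictor variances/covariances
          + n_pred                  # 5 regression weights
          + 1)                      # 1 residual error variance
df = moments - params
# df == 0: the model is saturated, so it fits the data perfectly by
# construction and no chi-square test of fit is available.
```

With zero degrees of freedom the model reproduces the observed covariance matrix exactly, which is why no chi-square fit statistic is reported for ordinary multiple regression.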
