A Multilevel Structural Equation Model for Dyadic Data




Jason T. Newsom

Portland State University

I begin by giving a brief overview of latent growth models and multilevel regression (i.e., hierarchical linear models). I've assumed some familiarity with both of these techniques, but I've summarized them to illustrate their parallels in the case of repeated measures. I then proceed by proposing a (new, I think) structural equation model approach to hierarchical regression in the case of dyadic data. All comments and questions are welcome and encouraged (newsomj@pdx.edu).

Latent Growth Curve Models

Structural equation modeling (SEM) can be used to estimate individual growth curves by using repeated measures as indicators of two latent variables, an intercept variable (η0) and a slope variable (η1). Such models are called “latent growth curve models.” The interpretation of the intercept variable depends on how the loadings on the slope factor are fixed. For instance, one approach to defining the slope variable is to fix its loadings to the values 0, 1, 2, 3, 4, . . . t-1. In this case, the intercept latent variable represents the initial value, because the first loading on η1 is set to 0. One can also “center” these loadings by setting the middle time point to 0 (e.g., -3, -2, -1, 0, 1, 2, 3), so that the intercept factor represents the average score across all time points [see here for a Figure showing the general model specification (with estimated loadings on the slope factor, discussed later)].
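The two coding schemes for the slope loadings can be generated mechanically. The following is only an illustrative sketch in Python; the function names `slope_loadings` and `centered_slope_loadings` are hypothetical and do not belong to any SEM package.

```python
def slope_loadings(t):
    """Uncentered time codes 0, 1, ..., t-1 for t time points."""
    return [float(k) for k in range(t)]


def centered_slope_loadings(t):
    """Time codes shifted so that they average to zero across the t time points."""
    codes = slope_loadings(t)
    midpoint = sum(codes) / t
    return [c - midpoint for c in codes]


uncentered = slope_loadings(7)          # intercept = initial value
centered = centered_slope_loadings(7)   # intercept = average across time points
print(uncentered)
print(centered)
```

With an even number of time points no single occasion is coded 0; for two occasions the centered codes are -.5 and +.5, the same values used for the dyadic models later in the text.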

Mathematically, the latent growth curve model is represented by the following set of formulas:

level-1 equation (measurement model):

|$y_{it} = \lambda_{it}\eta_{0i} + \lambda_{it}\eta_{1i} + \varepsilon_{it}$ |(1.1) |

level-2 equations (structural model):

|$\eta_{0i} = \alpha_0 + \zeta_{0i}$ |(1.2) |

|$\eta_{1i} = \alpha_1 + \zeta_{1i}$ |(1.3) |

yit is the dependent variable. The subscripts i and t indicate a measurement within an individual, i, at each time point, t. η0 is a latent variable that represents the level-1 intercept, and η1 is a latent variable that represents the relationship between the time code and the dependent variable (i.e., the growth trajectory). λit are the loadings for each time point on the intercept latent variable (η0) and the slope latent variable (η1). (The intercept, ν, associated with each loading is assumed to be zero and is not shown above.) For simplicity, no level-2 predictors are presented in (1.2) or (1.3), but they could be included as predictors of the intercept or slope variables. In the level-2 equations, α0 and α1 are the intercepts, or average values, of η0 and η1, respectively, and ζ0 and ζ1 are error terms.

More traditionally, the structural model would be represented by grouping each variable into matrices:

|$\mathbf{y} = \Lambda\boldsymbol{\eta} + \boldsymbol{\varepsilon}$ |(1.4) |

|$\boldsymbol{\eta} = \boldsymbol{\alpha} + \boldsymbol{\zeta}$ |(1.5) |

In these equations, Λ is a t × 2 matrix representing the relationship between the t indicators, one for each time point, and the 2 latent variables, η0 and η1. The column in Λ that corresponds to η0 consists of all 1s, because each loading for this variable is set equal to 1 to define it as the intercept. θε is the matrix of measurement error variances for the indicators at each time point. α is a 2 × 1 vector containing the latent means for the intercept and slope, representing the average intercept and the average slope (i.e., trajectory) across individuals. ζ is the vector of error terms. The variances of the intercepts and slopes, η0 and η1, are obtained by estimating the ψ matrix.
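To make the matrix form concrete, the sketch below computes the means and covariances of the repeated measures that a linear growth model implies, using the standard results E(y) = Λα and Cov(y) = ΛψΛ' + θε. It is a plain-Python illustration, not an estimation routine; all numeric values are made up, and `implied_moments` is a hypothetical helper name.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def implied_moments(lam, alpha, psi, theta):
    """Return the mean vector and covariance matrix implied by y = Lambda*eta + eps."""
    mu = [row[0] for row in matmul(lam, [[a] for a in alpha])]
    systematic = matmul(matmul(lam, psi), transpose(lam))
    n = len(lam)
    sigma = [[systematic[i][j] + theta[i][j] for j in range(n)] for i in range(n)]
    return mu, sigma


# Hypothetical values for three time points coded 0, 1, 2.
lam = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]    # t x 2 loading matrix
alpha = [4.0, 0.5]                            # mean intercept, mean slope
psi = [[1.0, 0.0], [0.0, 0.25]]               # variances of eta0 and eta1
theta = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.5]]  # error variances

mu, sigma = implied_moments(lam, alpha, psi, theta)
print(mu)      # implied means rise by the mean slope at each time point
print(sigma)
```

Estimation works in the opposite direction: the software searches for the α, ψ, and θε values whose implied moments best reproduce the observed means and covariances.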

Multilevel Regression Models (Hierarchical Linear Models: HLM)

Multilevel regression models estimate predictive relationships when the data are nested or hierarchically structured, as in the case of students nested within schools. The statistical model used for hierarchically structured data is the same statistical model used for longitudinal analysis of individual growth curves. With growth curve models, longitudinal data measurements are considered to be nested within individuals. In general, a multilevel regression with a single level-1 predictor and no level-2 predictors can be written with two sets of equations:

level-1 equation:

|$y_{ig} = \beta_{0g} + \beta_{1g}x_{ig} + r_{ig}$ |(1.6) |

level-2 equations:

|$\beta_{0g} = \gamma_{00} + u_{0g}$ |(1.7) |

|$\beta_{1g} = \gamma_{01} + u_{1g}$ |(1.8) |

The first equation is the familiar regression equation, with r representing error or unexplained variance. The subscripts i and g index individuals and groups, respectively. In the second equation (1.7), the intercept values for each group serve as the dependent variable. For simplicity's sake, there are no predictors in equation (1.7) or (1.8). In (1.7), γ00 is the intercept (the mean of all group intercepts), and u0 is the error term, whose variance is the variance of the intercept values across groups. Since the intercepts represent adjusted means for each group (i.e., adjusting or controlling for the effects of x), the variance of u0 is the variance of the adjusted group means. In the third equation (1.8), the slopes from the level-1 equation, β1g, serve as the dependent variable. γ01 is the intercept in this equation and represents the average of all slopes, β1g, interpreted as the average effect of x on the dependent variable across all groups. u1 is the error term, whose variance is the variance of the slopes across groups (i.e., the variability in the relationship between x and y across the groups). By substituting equations (1.7) and (1.8) into equation (1.6), the HLM model can be expressed as,

|$y_{ig} = (\gamma_{00} + u_{0g}) + (\gamma_{01} + u_{1g})x_{ig} + r_{ig}$ |(1.9) |

or, by rearranging the terms,

|$y_{ig} = \gamma_{00} + \gamma_{01}x_{ig} + u_{0g} + u_{1g}x_{ig} + r_{ig}$ |(1.10) |
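The substitution can be checked numerically. The sketch below simulates data from the two-level equations and from the combined equation and confirms that they yield the same scores. The parameter values, the use of the case index as the predictor, and the function name `simulate_two_level` are all assumptions made for illustration.

```python
import random


def simulate_two_level(n_groups, n_per_group, gamma00, gamma01,
                       sd_u0, sd_u1, sd_r, seed=0):
    """Generate y from the level-1/level-2 equations and from the combined equation."""
    rng = random.Random(seed)
    rows = []
    for g in range(n_groups):
        u0 = rng.gauss(0.0, sd_u0)      # group deviation in intercept
        u1 = rng.gauss(0.0, sd_u1)      # group deviation in slope
        beta0 = gamma00 + u0            # level-2 equation for the intercept
        beta1 = gamma01 + u1            # level-2 equation for the slope
        for i in range(n_per_group):
            x = float(i)                # hypothetical predictor value
            r = rng.gauss(0.0, sd_r)    # level-1 residual
            y_two_step = beta0 + beta1 * x + r
            y_combined = gamma00 + gamma01 * x + u0 + u1 * x + r
            rows.append((g, x, y_two_step, y_combined))
    return rows


rows = simulate_two_level(n_groups=50, n_per_group=4, gamma00=4.0, gamma01=0.5,
                          sd_u0=1.0, sd_u1=0.5, sd_r=0.7)
largest_gap = max(abs(y1 - y2) for _, _, y1, y2 in rows)
print(largest_gap)  # differs from zero only by floating-point rounding
```

The combined form makes the fixed part (γ00 + γ01x) and the random part (u0 + u1x + r) of the model explicit.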

If growth curve models are tested, the level-1 x variable is replaced by time codes, xt (e.g., 0,1,2,3...t-1). The dependent variable at each time point is regressed on the time code at level-1. Instead of individuals nested within groups, repeated measures are nested within individuals. Level 2 consists of individuals rather than groups.

Comparing the SEM and HLM growth models

Both approaches to latent growth models are essentially equivalent. One important difference between the two approaches is that the SEM approach accounts for measurement error at each time point (in the θε matrix). The parallels between the SEM and HLM approaches can be seen by comparing their algebraic formulas.

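| |SEM |HLM |
|Level-1 |$y_{it} = \lambda_{it}\eta_{0i} + \lambda_{it}\eta_{1i} + \varepsilon_{it}$ (1.1) |$y_{ig} = \beta_{0g} + \beta_{1g}x_{ig} + r_{ig}$ (1.6) |
|Level-2 |$\eta_{0i} = \alpha_0 + \zeta_{0i}$ (1.2) |$\beta_{0g} = \gamma_{00} + u_{0g}$ (1.7) |
| |$\eta_{1i} = \alpha_1 + \zeta_{1i}$ (1.3) |$\beta_{1g} = \gamma_{01} + u_{1g}$ (1.8) |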

These formulas are parallel, although this may not be fully apparent at first glance. In SEM, loadings are used in place of level-1 regression coefficients. In equation (1.1), the level-1 intercept is represented by the product term λitη0i, which refers to the loadings for the latent intercept variable and the intercept variable itself. The Λ matrix is analogous to the X matrix in matrix regression, in which the first column is a vector of 1s used to produce the intercept. By setting the loadings for the intercept (η0i) to 1, the product of the loadings and the intercept (λitη0i) in (1.1) becomes equivalent to the intercept term, β0, of equation (1.6). The next term in equation (1.1), λitη1i, representing the slope factor, can also be considered identical to the slope term in equation (1.6), as long as the loadings in the Λ matrix are set to values that would be used as predictors in growth curve analysis, such as 0, 1, 2, . . . t-1. Here, λit in (1.1) is equivalent to xt in (1.6). Because the λs are equivalent to xs and the ηs are equivalent to βs, it would make more sense to re-express equation (1.1) as,

|$y_{it} = \eta_{0i} + \eta_{1i}\lambda_{it} + \varepsilon_{it}$ |(1.11) |

By estimating the means and variances of η0i and η1i, we can obtain estimates of the average latent intercept and average latent slope and the extent to which they vary across individuals.

A Multilevel Structural Equation Model for Dyadic Data

Because growth models (repeated measures) and two-level hierarchical regression models are identical statistical models, as I've shown above, it is possible to specify a multilevel SEM for certain hierarchical data situations using the same model specifications as those used in the growth model case. I start by describing a dyadic data situation (e.g., couples, twins, mother-child dyads), for which this approach is clearly the simplest and possibly the best suited. I describe the model specification and give an example. I then discuss the possibilities of applying this technique to other data analytic situations. The generalization of the approach should be relatively straightforward for small group sizes where the sample size is balanced (equal in all groups). Finally, I will discuss the possibilities of generalizing to situations in which group sizes are not equal but remain fairly small.

The usual multilevel approach has been to set up between and within covariance matrices, corresponding to level-2 and level-1 variables, respectively, and use a multigroup strategy to estimate the model (e.g., Muthen, 1997; Muthen & Satorra, 1995). This approach can be implemented in any of the current SEM software packages by creating between and within covariance matrices, which are then analyzed in a multigroup structural model. This process has recently been automated in Mplus. This approach has a few limitations. First, except in Mplus, the procedure of constructing and reading separate covariance matrices can be cumbersome and analytically intensive, especially for larger models. Second, the multigroup approach is limited to separate within and between models. That is, one cannot analyze relationships between level-1 and level-2 variables. Multilevel regression, in contrast, allows for prediction of level-1 intercepts by level-2 predictors and for prediction of level-1 slopes by level-2 variables (called "cross-level interactions"). Third, with small groups, nonconvergence of multilevel structural equation models can be a problem, especially with lower intraclass correlations (see Muthen & Satorra, 1995).

Data requirements. At minimum, one merely needs to have a single dependent measure obtained from each member of the dyad. Multiple indicators for each individual can also be used (see Specifications 2 and 3 below). For dyadic data, it would be optimal to have at least three indicators for each member of the couple. Members of each couple must also be nonexchangeable. That is, there must be a basis for distinguishing members of each couple in an identical manner in all groups. Examples might include husbands and wives, mother and child, first born and second born, or caregiver and care recipient. The data set is set up in a so-called "repeated measures" format, in which each case in the data matrix contains information about the dyad. For instance, each record contains information about the husband and the wife, recorded under different variable names (e.g., y1h, y2h, y3h, y4w, y5w, y6w).
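If the data were collected with one record per person, they must first be rearranged into one record per dyad. The sketch below shows one way to do this in plain Python. The record layout, the role codes 'h' and 'w', and the variable naming (y1h, y1w, and so on, which numbers items within each role rather than consecutively) are assumptions for illustration, and the scores are made up.

```python
def long_to_wide(records):
    """Collapse one-row-per-person records into one row per dyad.

    Each input record is (dyad_id, role, item_scores), where role is
    'h' (husband) or 'w' (wife) and item_scores is a list of item responses.
    """
    wide = {}
    for dyad_id, role, scores in records:
        row = wide.setdefault(dyad_id, {})
        for k, score in enumerate(scores, start=1):
            row[f"y{k}{role}"] = score   # e.g., y1h, y2h, y3h, y1w, ...
    return wide


# Made-up scores for two couples, three items per person.
person_records = [
    (1, "h", [4, 5, 3]),
    (1, "w", [2, 6, 4]),
    (2, "h", [3, 3, 5]),
    (2, "w", [5, 4, 4]),
]

wide = long_to_wide(person_records)
for dyad_id, row in wide.items():
    print(dyad_id, row)
```

Because the role determines the variable name, the husband's and wife's scores always occupy the same columns, which is what the nonexchangeability requirement makes possible.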

Example. To illustrate, I will use an example from a study I conducted recently examining interactions between spousal caregivers and care recipients. There are 118 couples (236 individuals), and each member of the couple was interviewed separately. I examine five items from the Veit and Ware (1983) positive affect subscale of the Mental Health Inventory. Items such as "How much of the time have you felt the future look hopeful and promising?" were rated on a 6-point scale of frequency of occurrence. Thus the analysis is based on 10 variables: 5 items for caregivers and 5 items for care recipients.

Model specification 1. I first take the simplest case, in which there is only one measure (i.e., indicator) of positive affect for caregivers and one for care recipients (the measure was computed by averaging the five items for each). This model specification follows that of the growth curve model described above for the case in which there are only two time points. The approach is depicted in Figure 1 below. Two latent variables are defined: a latent intercept, η0, and a latent slope, η1. The two observed measures serve as the indicators of both latent variables. The loadings are fixed to specified values and the measurement errors are fixed to zero. The intercept variable, η0, is defined by fixing the loadings on each of the two indicators to 1. The slope variable, η1, is defined by fixing the loadings on the same two indicators to 0 and 1. The average intercept and average slope across couples (i.e., γ00 and γ01 in HLM notation) are obtained by estimating the mean structures of each (these are not estimated by default in SEM software packages). The variances of the slopes and intercepts across couples are given by the variances of η0 and η1 (i.e., the PSI matrix).
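Because the loadings are fixed and the measurement errors are zero, Specification 1 implies that the caregiver score equals η0 and the care recipient score equals η0 + η1, so η1 is simply the within-couple difference. The sketch below computes the corresponding sample moments. It assumes that the intercept-slope covariance is freely estimated (which makes the model just-identified), uses the maximum likelihood divisor n, and runs on made-up scores rather than the study data; `spec1_moments` is a hypothetical helper name.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Covariance with divisor n (the maximum likelihood convention)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def spec1_moments(y_cg, y_cr):
    """Moments of eta0 = y_cg and eta1 = y_cr - y_cg under dummy coding (0, 1)."""
    diff = [cr - cg for cg, cr in zip(y_cg, y_cr)]
    return {
        "mean_intercept": mean(y_cg),
        "mean_slope": mean(diff),
        "var_intercept": covariance(y_cg, y_cg),
        "var_slope": covariance(diff, diff),
        "cov_intercept_slope": covariance(y_cg, diff),
    }


# Made-up composite scores for four couples.
caregiver = [4.0, 5.0, 3.0, 4.0]
recipient = [5.0, 4.0, 3.0, 6.0]

estimates = spec1_moments(caregiver, recipient)
print(estimates)
```

The mean slope is the difference between the care recipient and caregiver means, which is why testing it amounts to a paired comparison of the two members.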

Results. Parallel analyses were conducted using Mplus (Muthen & Muthen, 1999) and HLM 5 (Raudenbush, Bryk, Cheong, & Congdon, 2000). In HLM, it is not possible to estimate variances (i.e., random effects) for the slopes and intercepts simultaneously with dyadic data. One can obtain estimates by running separate models that fix the random effect of either the intercept or the slope to zero, but these estimates will differ from a simultaneous estimation of both. Therefore, they are not presented here. Analyses are presented for dummy coding of the caregiving variable (0 and 1) and for group-mean centering. When dummy coding is used, the intercept represents the average score for caregivers (because they were coded as 0). When group-mean centering is used, the average intercept represents the grand mean for all couples (caregivers and care recipients combined). In multilevel regression, group-mean centering is achieved by subtracting the dyad mean from each individual's score on the predictor (this can be done automatically in the HLM software). To obtain the group-mean centered solution using the SEM approach, however, one simply sets the loadings of the slope variable to -.5 and +.5, rather than 0 and 1.

As can be seen in Table 1, the means (and their standard errors) for the intercept and slope variables are highly similar for the SEM method and the HLM method. The average intercept, approximately 4.2, represents the average positive affect for caregivers. Table 2 presents results when a group-mean centered approach is used. Notice that the average intercept obtained with centering differs little from that obtained with dummy coding. This is because there is very little difference between caregivers and care recipients on positive affect scores. Thus the average of caregivers and care recipients combined is similar to the average for caregivers only. The average slope (-.056 using the SEM method) represents the relationship between the difference variable (caregivers vs. care recipients) and the dependent variable, and testing it is equivalent to testing the difference between the caregiver and care recipient means on positive affect.

Table 1. Mean and variance estimates when the caregiving variable is uncentered/dummy coded (0,1).

| |Mean (SE), HLM |Mean (SE), SEM |Variance (SE), HLM |Variance (SE), SEM |
|Intercept |4.221 (.088) |4.214 (.082) |NA |.791 (.103) |
|Slope |-.031 (.126) |-.056 (.099) |NA |1.141 (.149) |

Table 2. Mean and variance estimates when the caregiving variable is group-mean centered (-.5,+.5).

| |Mean (SE), HLM |Mean (SE), SEM |Variance (SE), HLM |Variance (SE), SEM |
|Intercept |4.205 (.072) |4.186 (.069) |NA |.562 (.073) |
|Slope |-.031 (.103) |-.056 (.099) |NA |1.141 (.149) |
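The relation between the dummy-coded and centered SEM solutions in Tables 1 and 2 can be worked out directly, since the centered codes are the dummy codes minus one half. In the sketch below, the superscript c (centered solution) and the symbol ψ01 (the intercept-slope covariance, which is not reported in the tables) are notation introduced here for illustration.

```latex
\begin{align*}
y &= \eta_0 + \lambda\,\eta_1
   = \left(\eta_0 + \tfrac{1}{2}\eta_1\right) + \left(\lambda - \tfrac{1}{2}\right)\eta_1, \\
\alpha_0^{c} &= \alpha_0 + \tfrac{1}{2}\alpha_1 = 4.214 + \tfrac{1}{2}(-.056) = 4.186, \\
\alpha_1^{c} &= \alpha_1 = -.056, \\
\psi_{00}^{c} &= \psi_{00} + \tfrac{1}{4}\psi_{11} + \psi_{01}, \qquad
\psi_{11}^{c} = \psi_{11}.
\end{align*}
```

The computed centered intercept mean, 4.186, matches the SEM value in Table 2, and the slope mean and variance are unchanged by centering, as the tables show.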

Model specification 2. The above approach has two shortcomings: it assumes no measurement error, and only single indicators are used. Another specification is possible when there are several indicators. In this specification, true latent variables are used for the intercept, η0, and the slope, η1. Each of the ten indicators (5 caregiver items, 5 care recipient items) defines the latent intercept, and each loading is set equal to 1. The slope variable represents a difference variable, or dummy variable, distinguishing between the caregiver and care recipient. In Figure 2, I have coded caregivers as 0 and care recipients as 1. To accomplish this, the loadings on the slope factor are set to 0 for each of the caregiver items and to 1 for each of the care recipient items. The slope variable, η1, then represents the difference between caregivers and recipients on positive affect. In this specification, the five items are assumed to be equivalent indicators for caregivers and for care recipients (as is the case when one computes an equally weighted composite score). Because of the 0-1 coding used for the latent slope variable, the intercept variable represents the latent mean for caregivers. One could also use an effects code (-1,+1 or -.5,+.5) for the slope variable, to center the variable within each group. In this case, the latent intercept would represent the average score for each couple. To account for intercorrelations between parallel items for caregivers and care recipients, I also estimate the correlations among measurement errors. The error correlations represent the association between the errors over and above the variance accounted for by the intercept and the difference factor. Ideally, one should test these correlations for significance. If they are not significant, the correlations can be set to zero. As in growth curve analysis, the individual intercepts in the measurement equation, ν, are set to zero.
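The fixed loading pattern for Specification 2 can be written out explicitly. The sketch below builds the 10 × 2 loading matrix and computes the item means that a given pair of latent means implies (with ν fixed at zero). The helper names are hypothetical; the latent means plugged in are the Specification 2 SEM estimates for dummy coding (4.253 and -.043).

```python
def dyad_loadings(n_items, code_first, code_second):
    """Loading matrix with columns (intercept, slope) for two dyad members.

    The first n_items rows belong to the member coded code_first (here the
    caregiver); the remaining n_items rows belong to the other member.
    """
    first = [[1.0, code_first] for _ in range(n_items)]
    second = [[1.0, code_second] for _ in range(n_items)]
    return first + second


def implied_item_means(lam, alpha0, alpha1):
    """Item means implied by the latent means when item intercepts are zero."""
    return [row[0] * alpha0 + row[1] * alpha1 for row in lam]


dummy = dyad_loadings(5, 0.0, 1.0)        # caregivers 0, care recipients 1
centered = dyad_loadings(5, -0.5, 0.5)    # group-mean centered alternative

item_means = implied_item_means(dummy, 4.253, -0.043)
print(item_means[0])   # every caregiver item: the intercept mean
print(item_means[5])   # every care recipient item: intercept mean plus slope mean
```

Every item within a member receives the same implied mean, which reflects the assumption that the five items are equivalent indicators.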

The model shown in Figure 2 below corresponds to a simple multilevel regression model with one level-1 predictor and no level-2 predictors, as in equations (1.6) through (1.8).

Results. Table 3 and Table 4 present the results previously obtained from HLM and new results using Mplus and Specification 2. The most notable difference between results obtained with Specification 1 and Specification 2 is that the variances (or random effects) are smaller under Specification 2. This is the result of estimation of measurement error by the use of true latent variables for the intercept and slope.

Table 3. Mean and variance estimates when the caregiving variable is uncentered/dummy coded (0,1).

| |Mean (SE), HLM |Mean (SE), SEM |Variance (SE), HLM |Variance (SE), SEM |
|Intercept |4.221 (.088) |4.253 (.075) |NA |.538 (.086) |
|Slope |-.031 (.126) |-.043 (.096) |NA |.824 (.142) |

Similar analyses were conducted using group-mean centering of the caregiving variable. Using SEM, the equivalent model to the group-mean centered model in HLM constrains the loadings on η1 to -.5 and +.5. With group-mean centering, the intercept value is the mean of both groups.

Table 4. Mean and variance estimates when the caregiving variable is group-mean centered (-.5,+.5).

| |Mean (SE), HLM |Mean (SE), SEM |Variance (SE), HLM |Variance (SE), SEM |
|Intercept |4.205 (.072) |4.226 (.068) |NA |.483 (.071) |
|Slope |-.031 (.103) |-.107 (.095) |NA |.820 (.139) |

Model specification 3. In Model Specification 2 above, loadings on η1 were assumed to be equal for caregivers and for care recipients. An alternative specification, using a second-order factor model for the slope variable, would not require this assumption. Using this model specification, the loadings for the first-order factors can be freely estimated, allowing for unequal loadings for each item. Two second-order factors, η0 and η1, can be used to represent the intercept and slope. As before, uncentered dummy coding (0,1) or group-mean centered coding (-.5,+.5) can be used to provide different interpretations for the intercept variable η0. If dummy coding is used, the intercept represents the mean for the member of the dyad who is assigned 0 for the slope variable; if centered codes are used, the intercept represents the mean for the full sample. Several details of the specification are important. First, the intercepts for the first-order latent variables (ηcg and ηcr) should be constrained to zero. Second, it is best to constrain loadings for the first-order factors to be equal for parallel items. Third, the disturbances and the correlation for the first-order factors should be set to zero. Fourth, the individual intercepts in the first-order measurement equation, ν, are set to zero. Figure 3 below illustrates the model using dummy coding.
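One way to see what the second-order specification implies is to multiply the first-order loading matrix by the second-order loading matrix. With the first-order disturbances fixed at zero and dummy coding, ηcg = η0 and ηcr = η0 + η1, so each item's loading on η0 and η1 is its first-order loading times the second-order code. The sketch below illustrates this with three items per member; the item loadings are made-up values, and the helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def first_order_loadings(item_loadings):
    """Rows are items; columns are the first-order factors (eta_cg, eta_cr).

    Parallel items share the same loading for caregivers and care recipients.
    """
    caregiver_rows = [[lam, 0.0] for lam in item_loadings]
    recipient_rows = [[0.0, lam] for lam in item_loadings]
    return caregiver_rows + recipient_rows


# Rows are first-order factors; columns are the second-order factors (eta0, eta1).
second_order = [
    [1.0, 0.0],   # eta_cg = eta0
    [1.0, 1.0],   # eta_cr = eta0 + eta1 (dummy coding)
]

item_loadings = [1.0, 0.8, 1.2]   # hypothetical first-order loadings
total = matmul(first_order_loadings(item_loadings), second_order)
for row in total:
    print(row)   # each item's effective loadings on (eta0, eta1)
```

Replacing the second-order rows with [1.0, -0.5] and [1.0, 0.5] gives the group-mean centered version.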

Results. As can be seen in Table 5, this model specification produces comparable values for the intercept and the slope mean. The two models, centered and uncentered, produce identical estimates for the slope and its standard error. The intercept values, however, differ because in the uncentered case (0,1) the intercept represents the mean of caregivers, whereas in the centered case (-.5,+.5) the intercept represents the mean of both caregivers and care recipients. As before, because the mean slope is not significant, caregivers and care recipients do not differ significantly on positive affect, and this explains the relatively small difference between the mean intercept values under centered and uncentered coding.

Table 5. Results from the second-order factor model approach.

| |Mean (SE), Uncentered |Mean (SE), Centered |Variance (SE), Uncentered |Variance (SE), Centered |
|Intercept |4.213 (.093) |4.173 (.083) |.610 (.094) |.468 (.069) |
|Slope |-.079 (.096) |-.079 (.096) |.870 (.142) |.870 (.142) |

Discussion

The SEM approaches described above provide several advantages over existing multilevel structural equation models. First, multilevel models with small groups (e.g., under 10) often fail to converge, and in some cases do not have sufficient degrees of freedom to estimate all random effects. The dyadic approaches presented here should not have similar problems with convergence. Second, although the models presented above did not include level-2 predictors (e.g., household income), these variables would be simple to include. This would provide a substantial advantage over the current multigroup specification of multilevel models, because slopes and intercepts could be used as predicted outcomes (or predictors). Level-1 predictors could be incorporated by predicting the first-order factors (either with standard latent variables or with other slopes and intercepts). Third, the model specification proposed does not require separate covariance matrices for within and between components or special data configurations in which the data are disaggregated. This makes implementation relatively easy in most SEM packages, including those that do not have special features for multilevel models.

Generalization of the Approach

(In this section I present some very rough, preliminary ideas regarding generalization of the dyadic multilevel model.)

The dyadic model specifications above could be generalized to data involving larger groups (e.g., 2, 3, 4, or 5 individuals per group), such as family units or small working groups. The application of these models, however, would be limited to the nonexchangeable case, in which each group member has a role consistently present across groups. As the number of individuals per group increases, models would become more complex. Of particular difficulty is the specification of the difference slope variable. A simple difference variable becomes more complicated to implement, because there must be s-1 dummy variables to represent comparisons (where s is the number of individuals per group). These dummy variables would become unwieldy as the number of group members increases. However, one could omit latent slopes under these conditions. If the second-order factor approach is used, other level-1 predictors could be included as predictors of the first-order factors. Through equality constraints on the causal parameters or use of multilevel models similar to those specified above, prediction of slopes by level-2 variables (cross-level interactions) could be implemented. Finally, although the examples presented above are based on equal group sizes (i.e., balanced n), application of these models would not necessarily be restricted to balanced data. Through the use of data imputation methods (such as the EM algorithm) now available in several SEM software packages, unbalanced data could also be analyzed. Data imputation with the EM algorithm can be shown to be equivalent to Bayesian estimates obtained in multilevel regression when group sizes are unequal.
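To give a sense of how the difference variables multiply, the sketch below generates the s-1 dummy-coded loading columns for a group of s nonexchangeable members and counts the variances and covariances among the resulting latent variables (one intercept plus s-1 slopes), assuming all of them are freely estimated. Both function names are hypothetical, and treating the first member as the reference category is an arbitrary choice.

```python
def dummy_codes(s):
    """Return s rows (one per group member) of s-1 dummy codes.

    Member 0 is the reference category and is coded 0 on every dummy variable.
    """
    return [[1.0 if member == k else 0.0 for k in range(1, s)]
            for member in range(s)]


def psi_parameter_count(s):
    """Unique variances and covariances among one intercept and s-1 slopes."""
    n_latent = 1 + (s - 1)
    return n_latent * (n_latent + 1) // 2


for s in (2, 3, 4, 5):
    print(s, dummy_codes(s), psi_parameter_count(s))
```

For dyads (s = 2) this reduces to the single 0-1 slope variable and three variance-covariance parameters; with five members there are four slope variables and fifteen such parameters.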
