Introduction



Introduction

You can get the datasets and SAS commands for this workshop on the web page:



The lab examples come from the book, “Linear Mixed Models: A Practical Guide Using Statistical Software”, by Brady T. West, Kathleen B. Welch, and Andrzej T. Gałecki, Chapman & Hall/CRC, 2006 (referred to as WWG). The model and figure numbers in the labs correspond to those in WWG.

Go to

to get more detailed analysis for each example in the book, along with the appropriate syntax in SAS (using SAS 9.1.3), SPSS, Stata, R, and HLM software programs for each analysis presented in the book.

In this workshop we will be using four actual datasets plus the hypothetical banana data.

1. rat_pup.dat This data set was provided by Jose Pinheiro and Doug Bates in their book, “Mixed-Effects Models in S and S-Plus” (2000). In the analysis of this data set we compare the birth weights of 322 rat pups in 27 litters (litter size varies from 2 to 18 pups) born to mothers treated with High, Low, and Control doses of a chemical.

2. classroom.csv This data set was from a study by of Instructional Improvement (SII; Hill, Rowan, and Ball, 2004) and includes data from 1190 first grade students from 312 classrooms in 107 schools.

3. ratbrain.dat This data set was reported by Douglas et. al., 2004 and shows measurement of the effects of two different chemicals on three different regions of the brain in five adult male rats.

4. autism.csv This data set was derived from a study of 156 children with Autism Spectrum Disorder (Oti, Anderson, Lord, 2006). Measurements were made at five basic ages for each child: 2, 3, 5, 9, and 13 years. Not all children were measured at all time points.

Lab Example 2

Two-Level Clustered Data

The Rat Pup Data

(Chapter 3 in WWG)

This is a two-level clustered data set, in which the clusters are litters, and individual pups are the units of analysis. This analysis is displayed in more detail than the analyses for later examples.

In this data set, 30 pregnant rats were treated with one of three doses of a drug (High, Low, or Control), and the birth weights of the rat pups that were born were measured. There were originally 10 litters for each of the drug doses. However, 3 litters in the high dose did not survive, resulting in 27 litters. There were 322 rat pups in the final study.

The table below shows a portion of the Rat Pup data. We have Level 1 covariates (e.g. Sex) that are specific for each rat pup and Level 2 covariates (e.g. Treatment and Litsize) that are constant for all rat pups within a given litter. The response variable, Weight, varies for each rat pup.

Portion of the Rat Pup Data Set

|LITTER |TREATMENT |LITSIZE |PUP_ID |WEIGHT |SEX |

|1 |Control |12 |2 |7.40 |Male |

|1 |Control |12 |3 |7.15 |Male |

|1 |Control |12 |4 |7.24 |Male |

|1 |Control |12 |5 |7.10 |Male |

|1 |Control |12 |6 |6.04 |Male |

|1 |Control |12 |7 |6.98 |Male |

|1 |Control |12 |8 |7.05 |Male |

|1 |Control |12 |9 |6.95 |Female |

|… | | | | | |

|11 |Low |16 |132 |5.65 |Male |

|11 |Low |16 |133 |5.78 |Male |

|… | | | | | |

|21 |High |14 |258 |5.09 |Male |

|21 |High |14 |259 |5.57 |Male |

Data Exploration

We first examine the data using descriptive statistics and take a graphical look at the data using Boxplots.

data ratpup;

infile "rat_pup.dat" firstobs=2 dlm="09"X;

input pup_id weight sex $litter litsize treatment $;

if treatment = "High" then treat = 1;

if treatment = "Low" then treat = 2;

if treatment = "Control" then treat = 3;

run;

proc format;

value trtfmt 1 = "High"

2 = "Low"

3 = "Control";

run;

In the data step we set up the new variable, Treat, so that the highest value (=3) is coded for Control, so that Control will be the “reference level” for Treat.

title "Summary statistics for weight by treatment and sex";

proc means data=ratpup maxdec=2;

class treat sex;

var weight;

format treat trtfmt.;

run;

|Analysis Variable : weight |

|Treat |Sex |N Obs |N |Mean |Std Dev |Minimum |Maximum |

| |Male |33 |33 |5.92 |0.69 |5.01 |7.70 |

|Low |Female |65 |65 |5.84 |0.45 |4.75 |7.73 |

| |Male |61 |61 |6.03 |0.38 |5.25 |7.13 |

|Control |Female |54 |54 |6.12 |0.69 |3.68 |7.57 |

| |Male |77 |77 |6.47 |0.75 |4.57 |8.33 |

There is a tendency for the mean Weight to be lower for the High and Low treatment groups, compared to the Control. We also see that the mean for males is somewhat higher than for females within each level of Treat.

We now look at boxplots of Weight for each combination of treatment and sex.

[pic]

We now graphically examine the relationship between Litter Size and Weight within each Treatment. We first create a new variable called RANKLIT that ranks the litters by size.

/*Create Ranklit*/

proc sort data=ratpup;

by litsize litter;

run;

data ratpup2;

set ratpup;

by litsize litter;

if first.litter then ranklit+1;

label ranklit="New Litter ID (Ordered by Size)";

run;

There appears to be a relationship between litter size and birthweight, so that larger litters tend to have smaller pups, within each treatment and sex, except for Low Males.

[pic]

[pic]

Models

We consider several competing models for the Rat Pup Data. We use a top-down model fitting strategy, where we first set up a “loaded” model that includes all the candidate fixed effects, then we decide on an appropriate covariance structure for the random effects, and finally we reduce the fixed effects portion of the model to arrive at our final model.

The table below shows a summary of each of the models we fit. You can use the model numbers to keep track of what we're doing as we move through the examples.

| | |General |Model |

| |Term / Variable |Notation | |

| | |

|Label |Null |Alternative |

| |(H0) |(HA) |

|1 |1 |14 |

|1 |2 |17 |

|1 |3 |21 |

|2 |1 |16 |

|2 |2 |16 |

|2 |3 |19 |

|… | | |

Note that there are three records per subject, corresponding to the three time points of data collection, and that the response variable has values which vary over time. There is also an explicit variable indicating the time points of the repeated measures.

There may be differing numbers of observations for each subject due to missing data in a repeated measures study, or perhaps due to loss to follow-up in a longitudinal study. Missing data is not a problem in a LMM setting, as long as the data can be considered to be MAR, i.e., Missing at Random.

If you have a data set that has a “horizontal” structure, in which the repeated measurements on the same subject are contained in different variables across the same row, you will need to restructure the data. This can be done using a data step, as shown in the example below:

The SAS commands below rearrange the Rat Brain data set so it is in long form appropriate for analysis using SAS Proc Mixed.

data ratbrain(keep=animal treat region activate);

set ratbrain_wide;

array origvar (2,3) Basal_BST Basal_LS Basal_VDB

Carb_BST Carb_LS Carb_VDB;

do treatment = 1 to 2;

do region = 1 to 3;

activate = origvar(treat,region);

output;

end;

end;

run;

The printout of the Rat Brain data set in the long form shows that we now have 6 observations for each rat, one for each combination of the two treatments (Basal and Carbachol), and three brain regions (BST, LS, and VDB). The dependent variable, ACTIVATE, is now present once on each row of data.

Long Data Set

Obs animal treat region activate

1 R111097 1 1 366.19

2 R111097 1 2 199.31

3 R111097 1 3 187.11

4 R111097 2 1 371.71

5 R111097 2 2 302.02

6 R111097 2 3 449.70

7 R111397 1 1 375.58

8 R111397 1 2 204.85

9 R111397 1 3 179.38

10 R111397 2 1 492.58

11 R111397 2 2 355.74

12 R111397 2 3 459.58

13 R100797 1 1 458.16

14 R100797 1 2 245.04

15 R100797 1 3 237.42

16 R100797 2 1 664.72

17 R100797 2 2 587.10

18 R100797 2 3 726.96

19 R100997 1 1 479.81

20 R100997 1 2 261.19

21 R100997 1 3 195.51

22 R100997 2 1 515.29

23 R100997 2 2 437.56

24 R100997 2 3 604.29

25 R110597 1 1 462.79

26 R110597 1 2 278.33

27 R110597 1 3 262.05

28 R110597 2 1 589.25

29 R110597 2 2 493.93

30 R110597 2 3 621.07

Analysis of the Rat Brain Data Using Proc Mixed

We will first examine using different ways to choose covariance structures for a simple repeated measures design. For the rat brain data, nucleotide activation was measured in three brain regions, with two treatments (Carbachol and Basal).

Analysis comparing regions for Basal (control) treatment using Proc Mixed

We begin the analysis using Proc Mixed by comparing a set of different models, using a marginal model approach (with no random effects), and different covariance structures for R. In the first model, we use an UN structured covariance structure for R. Note that because we are fitting a marginal model, there is no random statement.

title "Repeated Measures Anova Using Proc Mixed";

title2 "Compare Region for Basal Only";

proc mixed data=ratbrain covtest;

where treat = 1;

class region animal;

model activate=region;

repeated /subject=animal type=un r rcorr;

run;

Repeated Measures Anova Using Proc Mixed

Compare Region for Basal Only

The Mixed Procedure

Model Information

Data Set WORK.RATBRAIN

Dependent Variable activate

Covariance Structure Unstructured

Subject Effect animal

Estimation Method REML

Residual Variance Method None

Fixed Effects SE Method Model-Based

Degrees of Freedom Method Between-Within

Class Level Information

Class Levels Values

region 3 1 2 3

animal 5 R100797 R100997 R110597

R111097 R111397

Dimensions

Covariance Parameters 6

Columns in X 4

Columns in Z 0

Subjects 5

Max Obs Per Subject 3

Number of Observations

Number of Observations Read 15

Number of Observations Used 15

Number of Observations Not Used 0

Iteration History

Iteration Evaluations -2 Res Log Like Criterion

0 1 128.65412382

1 1 114.63447657 0.00000000

Convergence criteria met.

Estimated R Matrix for animal R100797

Row Col1 Col2 Col3

1 2842.82 1736.67 1225.30

2 1736.67 1202.34 964.95

3 1225.30 964.95 1276.56

Estimated R Correlation Matrix

for animal R100797

Row Col1 Col2 Col3

1 1.0000 0.9394 0.6432

2 0.9394 1.0000 0.7789

3 0.6432 0.7789 1.0000

Covariance Parameter Estimates

Standard Z

Cov Parm Subject Estimate Error Value Pr Z

UN(1,1) animal 2842.82 2010.18 1.41 0.0786

UN(2,1) animal 1736.67 1268.27 1.37 0.1709

UN(2,2) animal 1202.34 850.18 1.41 0.0786

UN(3,1) animal 1225.30 1132.52 1.08 0.2793

UN(3,2) animal 964.95 785.17 1.23 0.2191

UN(3,3) animal 1276.56 902.66 1.41 0.0786

Fit Statistics

-2 Res Log Likelihood 114.6

AIC (smaller is better) 126.6

AICC (smaller is better) 143.4

BIC (smaller is better) 124.3

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq

5 14.02 0.0155

Solution for Fixed Effects

Standard

Effect region Estimate Error DF t Value Pr > |t|

Intercept 212.29 15.9785 4 13.29 0.0002

region 1 216.21 18.2690 4 11.83 0.0003

region 2 25.4500 10.4786 4 2.43 0.0721

region 3 0 . . . .

Type 3 Tests of Fixed Effects

Num Den

Effect DF DF F Value Pr > F

region 2 4 185.13 0.0001

We now change the model, using Type=CS for the structure of the R matrix.

repeated region / subject = animal type=cs r rcorr;

Partial output for this model is shown below. The AIC is a bit smaller, BIC is basically unchanged.

Estimated R Correlation Matrix

for animal R100797

Row Col1 Col2 Col3

1 1.0000 0.7379 0.7379

2 0.7379 1.0000 0.7379

3 0.7379 0.7379 1.0000

Covariance Parameter Estimates

Standard Z

Cov Parm Subject Estimate Error Value Pr Z

CS animal 1308.97 1038.07 1.26 0.2073

Residual 464.93 232.47 2.00 0.0228

Fit Statistics

-2 Res Log Likelihood 121.6

AIC (smaller is better) 125.6

AICC (smaller is better) 126.9

BIC (smaller is better) 124.8

We could also consider other structures, such as Compound symmetry heterogeneous (type=CSH), which is not shown here.

Two Repeated Factors (Treat and Region) Analysis Using Proc GLM

Fitting models with more than one repeated measures factor is not as simple as when there is only a single repeated measures factor.

We begin this portion of the analysis by fitting a traditional repeated measures ANOVA model, with two within-subjects factors, for the ratbrain dataset in Wide Form, using Proc GLM.

The Proc GLM syntax sets up the analysis as a multivariate dependent variable problem. We let SAS know that there are two repeated measures factors, Treat and Region, and we specify their levels, in the repeated statement.

title "Proc GLM Repeated Measures ANOVA";

proc glm data=ratbrain_wide;

model Carb_BST Carb_LS Carb_VDB

Basal_BST Basal_LS Basal_VDB = / nouni;

repeated treat 2, region 3 / summary;

run; quit;

The Proc GLM output is shown below. Notice that SAS first fits a MANOVA (multivariate ANOVA) model, which is roughly equivalent to fitting a repeated measures model using Proc Mixed, with an Unstructured R matrix. We will look at the Wilk's Lambda statistics for the multivariate analysis results.

Proc GLM Repeated Measures ANOVA

The GLM Procedure

Number of Observations Read 5

Number of Observations Used 5

Repeated Measures Level Information

Dependent Variable Carb_BST Carb_LS Carb_VDB Basal_BST Basal_LS Basal_VDB

Level of treat 1 1 1 2 2 2

Level of region 1 2 3 1 2 3

MANOVA Test Criteria and Exact F Statistics

for the Hypothesis of no treat Effect

H = Type III SSCP Matrix for treat

E = Error SSCP Matrix

S=1 M=-0.5 N=1

Statistic Value F Value Num DF Den DF Pr > F

Wilks' Lambda 0.10128044 35.49 1 4 0.0040

Pillai's Trace 0.89871956 35.49 1 4 0.0040

Hotelling-Lawley Trace 8.87357462 35.49 1 4 0.0040

Roy's Greatest Root 8.87357462 35.49 1 4 0.0040

MANOVA Test Criteria and Exact F Statistics

for the Hypothesis of no region Effect

H = Type III SSCP Matrix for region

E = Error SSCP Matrix

S=1 M=0 N=0.5

Statistic Value F Value Num DF Den DF Pr > F

Wilks' Lambda 0.00180922 827.58 2 3 ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download