Introduction
Introduction
You can get the datasets and SAS commands for this workshop on the web page:
The lab examples come from the book, “Linear Mixed Models: A Practical Guide Using Statistical Software”, by Brady T. West, Kathleen B. Welch, and Andrzej T. Gałecki, Chapman & Hall/CRC, 2006 (referred to as WWG). The model and figure numbers in the labs correspond to those in WWG.
Go to
to get more detailed analysis for each example in the book, along with the appropriate syntax in SAS (using SAS 9.1.3), SPSS, Stata, R, and HLM software programs for each analysis presented in the book.
In this workshop we will be using four actual datasets plus the hypothetical banana data.
1. rat_pup.dat This data set was provided by Jose Pinheiro and Doug Bates in their book, “Mixed-Effects Models in S and S-Plus” (2000). In the analysis of this data set we compare the birth weights of 322 rat pups in 27 litters (litter size varies from 2 to 18 pups) born to mothers treated with High, Low, and Control doses of a chemical.
2. classroom.csv This data set was from a study by of Instructional Improvement (SII; Hill, Rowan, and Ball, 2004) and includes data from 1190 first grade students from 312 classrooms in 107 schools.
3. ratbrain.dat This data set was reported by Douglas et. al., 2004 and shows measurement of the effects of two different chemicals on three different regions of the brain in five adult male rats.
4. autism.csv This data set was derived from a study of 156 children with Autism Spectrum Disorder (Oti, Anderson, Lord, 2006). Measurements were made at five basic ages for each child: 2, 3, 5, 9, and 13 years. Not all children were measured at all time points.
Lab Example 2
Two-Level Clustered Data
The Rat Pup Data
(Chapter 3 in WWG)
This is a two-level clustered data set, in which the clusters are litters, and individual pups are the units of analysis. This analysis is displayed in more detail than the analyses for later examples.
In this data set, 30 pregnant rats were treated with one of three doses of a drug (High, Low, or Control), and the birth weights of the rat pups that were born were measured. There were originally 10 litters for each of the drug doses. However, 3 litters in the high dose did not survive, resulting in 27 litters. There were 322 rat pups in the final study.
The table below shows a portion of the Rat Pup data. We have Level 1 covariates (e.g. Sex) that are specific for each rat pup and Level 2 covariates (e.g. Treatment and Litsize) that are constant for all rat pups within a given litter. The response variable, Weight, varies for each rat pup.
Portion of the Rat Pup Data Set
|LITTER |TREATMENT |LITSIZE |PUP_ID |WEIGHT |SEX |
|1 |Control |12 |2 |7.40 |Male |
|1 |Control |12 |3 |7.15 |Male |
|1 |Control |12 |4 |7.24 |Male |
|1 |Control |12 |5 |7.10 |Male |
|1 |Control |12 |6 |6.04 |Male |
|1 |Control |12 |7 |6.98 |Male |
|1 |Control |12 |8 |7.05 |Male |
|1 |Control |12 |9 |6.95 |Female |
|… | | | | | |
|11 |Low |16 |132 |5.65 |Male |
|11 |Low |16 |133 |5.78 |Male |
|… | | | | | |
|21 |High |14 |258 |5.09 |Male |
|21 |High |14 |259 |5.57 |Male |
Data Exploration
We first examine the data using descriptive statistics and take a graphical look at the data using Boxplots.
data ratpup;
infile "rat_pup.dat" firstobs=2 dlm="09"X;
input pup_id weight sex $litter litsize treatment $;
if treatment = "High" then treat = 1;
if treatment = "Low" then treat = 2;
if treatment = "Control" then treat = 3;
run;
proc format;
value trtfmt 1 = "High"
2 = "Low"
3 = "Control";
run;
In the data step we set up the new variable, Treat, so that the highest value (=3) is coded for Control, so that Control will be the “reference level” for Treat.
title "Summary statistics for weight by treatment and sex";
proc means data=ratpup maxdec=2;
class treat sex;
var weight;
format treat trtfmt.;
run;
|Analysis Variable : weight |
|Treat |Sex |N Obs |N |Mean |Std Dev |Minimum |Maximum |
| |Male |33 |33 |5.92 |0.69 |5.01 |7.70 |
|Low |Female |65 |65 |5.84 |0.45 |4.75 |7.73 |
| |Male |61 |61 |6.03 |0.38 |5.25 |7.13 |
|Control |Female |54 |54 |6.12 |0.69 |3.68 |7.57 |
| |Male |77 |77 |6.47 |0.75 |4.57 |8.33 |
There is a tendency for the mean Weight to be lower for the High and Low treatment groups, compared to the Control. We also see that the mean for males is somewhat higher than for females within each level of Treat.
We now look at boxplots of Weight for each combination of treatment and sex.
[pic]
We now graphically examine the relationship between Litter Size and Weight within each Treatment. We first create a new variable called RANKLIT that ranks the litters by size.
/*Create Ranklit*/
proc sort data=ratpup;
by litsize litter;
run;
data ratpup2;
set ratpup;
by litsize litter;
if first.litter then ranklit+1;
label ranklit="New Litter ID (Ordered by Size)";
run;
There appears to be a relationship between litter size and birthweight, so that larger litters tend to have smaller pups, within each treatment and sex, except for Low Males.
[pic]
[pic]
Models
We consider several competing models for the Rat Pup Data. We use a top-down model fitting strategy, where we first set up a “loaded” model that includes all the candidate fixed effects, then we decide on an appropriate covariance structure for the random effects, and finally we reduce the fixed effects portion of the model to arrive at our final model.
The table below shows a summary of each of the models we fit. You can use the model numbers to keep track of what we're doing as we move through the examples.
| | |General |Model |
| |Term / Variable |Notation | |
| | |
|Label |Null |Alternative |
| |(H0) |(HA) |
|1 |1 |14 |
|1 |2 |17 |
|1 |3 |21 |
|2 |1 |16 |
|2 |2 |16 |
|2 |3 |19 |
|… | | |
Note that there are three records per subject, corresponding to the three time points of data collection, and that the response variable has values which vary over time. There is also an explicit variable indicating the time points of the repeated measures.
There may be differing numbers of observations for each subject due to missing data in a repeated measures study, or perhaps due to loss to follow-up in a longitudinal study. Missing data is not a problem in a LMM setting, as long as the data can be considered to be MAR, i.e., Missing at Random.
If you have a data set that has a “horizontal” structure, in which the repeated measurements on the same subject are contained in different variables across the same row, you will need to restructure the data. This can be done using a data step, as shown in the example below:
The SAS commands below rearrange the Rat Brain data set so it is in long form appropriate for analysis using SAS Proc Mixed.
data ratbrain(keep=animal treat region activate);
set ratbrain_wide;
array origvar (2,3) Basal_BST Basal_LS Basal_VDB
Carb_BST Carb_LS Carb_VDB;
do treatment = 1 to 2;
do region = 1 to 3;
activate = origvar(treat,region);
output;
end;
end;
run;
The printout of the Rat Brain data set in the long form shows that we now have 6 observations for each rat, one for each combination of the two treatments (Basal and Carbachol), and three brain regions (BST, LS, and VDB). The dependent variable, ACTIVATE, is now present once on each row of data.
Long Data Set
Obs animal treat region activate
1 R111097 1 1 366.19
2 R111097 1 2 199.31
3 R111097 1 3 187.11
4 R111097 2 1 371.71
5 R111097 2 2 302.02
6 R111097 2 3 449.70
7 R111397 1 1 375.58
8 R111397 1 2 204.85
9 R111397 1 3 179.38
10 R111397 2 1 492.58
11 R111397 2 2 355.74
12 R111397 2 3 459.58
13 R100797 1 1 458.16
14 R100797 1 2 245.04
15 R100797 1 3 237.42
16 R100797 2 1 664.72
17 R100797 2 2 587.10
18 R100797 2 3 726.96
19 R100997 1 1 479.81
20 R100997 1 2 261.19
21 R100997 1 3 195.51
22 R100997 2 1 515.29
23 R100997 2 2 437.56
24 R100997 2 3 604.29
25 R110597 1 1 462.79
26 R110597 1 2 278.33
27 R110597 1 3 262.05
28 R110597 2 1 589.25
29 R110597 2 2 493.93
30 R110597 2 3 621.07
Analysis of the Rat Brain Data Using Proc Mixed
We will first examine using different ways to choose covariance structures for a simple repeated measures design. For the rat brain data, nucleotide activation was measured in three brain regions, with two treatments (Carbachol and Basal).
Analysis comparing regions for Basal (control) treatment using Proc Mixed
We begin the analysis using Proc Mixed by comparing a set of different models, using a marginal model approach (with no random effects), and different covariance structures for R. In the first model, we use an UN structured covariance structure for R. Note that because we are fitting a marginal model, there is no random statement.
title "Repeated Measures Anova Using Proc Mixed";
title2 "Compare Region for Basal Only";
proc mixed data=ratbrain covtest;
where treat = 1;
class region animal;
model activate=region;
repeated /subject=animal type=un r rcorr;
run;
Repeated Measures Anova Using Proc Mixed
Compare Region for Basal Only
The Mixed Procedure
Model Information
Data Set WORK.RATBRAIN
Dependent Variable activate
Covariance Structure Unstructured
Subject Effect animal
Estimation Method REML
Residual Variance Method None
Fixed Effects SE Method Model-Based
Degrees of Freedom Method Between-Within
Class Level Information
Class Levels Values
region 3 1 2 3
animal 5 R100797 R100997 R110597
R111097 R111397
Dimensions
Covariance Parameters 6
Columns in X 4
Columns in Z 0
Subjects 5
Max Obs Per Subject 3
Number of Observations
Number of Observations Read 15
Number of Observations Used 15
Number of Observations Not Used 0
Iteration History
Iteration Evaluations -2 Res Log Like Criterion
0 1 128.65412382
1 1 114.63447657 0.00000000
Convergence criteria met.
Estimated R Matrix for animal R100797
Row Col1 Col2 Col3
1 2842.82 1736.67 1225.30
2 1736.67 1202.34 964.95
3 1225.30 964.95 1276.56
Estimated R Correlation Matrix
for animal R100797
Row Col1 Col2 Col3
1 1.0000 0.9394 0.6432
2 0.9394 1.0000 0.7789
3 0.6432 0.7789 1.0000
Covariance Parameter Estimates
Standard Z
Cov Parm Subject Estimate Error Value Pr Z
UN(1,1) animal 2842.82 2010.18 1.41 0.0786
UN(2,1) animal 1736.67 1268.27 1.37 0.1709
UN(2,2) animal 1202.34 850.18 1.41 0.0786
UN(3,1) animal 1225.30 1132.52 1.08 0.2793
UN(3,2) animal 964.95 785.17 1.23 0.2191
UN(3,3) animal 1276.56 902.66 1.41 0.0786
Fit Statistics
-2 Res Log Likelihood 114.6
AIC (smaller is better) 126.6
AICC (smaller is better) 143.4
BIC (smaller is better) 124.3
Null Model Likelihood Ratio Test
DF Chi-Square Pr > ChiSq
5 14.02 0.0155
Solution for Fixed Effects
Standard
Effect region Estimate Error DF t Value Pr > |t|
Intercept 212.29 15.9785 4 13.29 0.0002
region 1 216.21 18.2690 4 11.83 0.0003
region 2 25.4500 10.4786 4 2.43 0.0721
region 3 0 . . . .
Type 3 Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
region 2 4 185.13 0.0001
We now change the model, using Type=CS for the structure of the R matrix.
repeated region / subject = animal type=cs r rcorr;
Partial output for this model is shown below. The AIC is a bit smaller, BIC is basically unchanged.
Estimated R Correlation Matrix
for animal R100797
Row Col1 Col2 Col3
1 1.0000 0.7379 0.7379
2 0.7379 1.0000 0.7379
3 0.7379 0.7379 1.0000
Covariance Parameter Estimates
Standard Z
Cov Parm Subject Estimate Error Value Pr Z
CS animal 1308.97 1038.07 1.26 0.2073
Residual 464.93 232.47 2.00 0.0228
Fit Statistics
-2 Res Log Likelihood 121.6
AIC (smaller is better) 125.6
AICC (smaller is better) 126.9
BIC (smaller is better) 124.8
We could also consider other structures, such as Compound symmetry heterogeneous (type=CSH), which is not shown here.
Two Repeated Factors (Treat and Region) Analysis Using Proc GLM
Fitting models with more than one repeated measures factor is not as simple as when there is only a single repeated measures factor.
We begin this portion of the analysis by fitting a traditional repeated measures ANOVA model, with two within-subjects factors, for the ratbrain dataset in Wide Form, using Proc GLM.
The Proc GLM syntax sets up the analysis as a multivariate dependent variable problem. We let SAS know that there are two repeated measures factors, Treat and Region, and we specify their levels, in the repeated statement.
title "Proc GLM Repeated Measures ANOVA";
proc glm data=ratbrain_wide;
model Carb_BST Carb_LS Carb_VDB
Basal_BST Basal_LS Basal_VDB = / nouni;
repeated treat 2, region 3 / summary;
run; quit;
The Proc GLM output is shown below. Notice that SAS first fits a MANOVA (multivariate ANOVA) model, which is roughly equivalent to fitting a repeated measures model using Proc Mixed, with an Unstructured R matrix. We will look at the Wilk's Lambda statistics for the multivariate analysis results.
Proc GLM Repeated Measures ANOVA
The GLM Procedure
Number of Observations Read 5
Number of Observations Used 5
Repeated Measures Level Information
Dependent Variable Carb_BST Carb_LS Carb_VDB Basal_BST Basal_LS Basal_VDB
Level of treat 1 1 1 2 2 2
Level of region 1 2 3 1 2 3
MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of no treat Effect
H = Type III SSCP Matrix for treat
E = Error SSCP Matrix
S=1 M=-0.5 N=1
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.10128044 35.49 1 4 0.0040
Pillai's Trace 0.89871956 35.49 1 4 0.0040
Hotelling-Lawley Trace 8.87357462 35.49 1 4 0.0040
Roy's Greatest Root 8.87357462 35.49 1 4 0.0040
MANOVA Test Criteria and Exact F Statistics
for the Hypothesis of no region Effect
H = Type III SSCP Matrix for region
E = Error SSCP Matrix
S=1 M=0 N=0.5
Statistic Value F Value Num DF Den DF Pr > F
Wilks' Lambda 0.00180922 827.58 2 3 ................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
Related searches
- introduction to financial management pdf
- letter of introduction sample
- argumentative essay introduction examples
- how to start an essay introduction examples
- introduction to finance
- introduction to philosophy textbook
- introduction to philosophy pdf download
- introduction to philosophy ebook
- introduction to marketing student notes
- introduction to marketing notes
- introduction to information systems pdf
- introduction paragraph examples for essays