Computer Lab 4: Useful Graphics for Multilevel Modeling



C. J. Anderson
Document modified March 21, 2020

Table of Contents

1 Set Up
  1.1 The following packages will be used
  1.2 Read in the data
  1.3 Create some variables by re-coding
    1.3.1 Dummy code gender
    1.3.2 Dummy code grade
    1.3.3 Center math around school mean math
    1.3.4 Make id a factor (not really necessary)
    1.3.5 Put some variables on the same scale
2 Draw random sample of data
3 Lattice plot
4 Produce lattice (xyplot) plots for hours TV
5 Produce lattice (xyplot) plots for hours computer games
6 Regressions for all schools in the same plot
7 Plot mean science by hours watching TV
8 Plot mean science by hours playing computer games
9 Overlay school-specific regressions of science on hours TV
10 Overlay school-specific regressions of science on hours playing computer games
11 Plot science by type of community
12 R^2 and meta R^2: simple model
13 R^2 and meta R^2: more complex model
14 R^2 and meta R^2 for categorical TV and computer games
15 Comparison
16 Easiest regression diagnostics to obtain
17 A panel of diagnostic plots
18 Influence diagnostics using HLMdiag functions
19 Not required for homework but informative: zeta plots
20 Not required: profile confidence intervals
21 Not required: plot EBLUPs and QQ plots of EBLUPs
Graphics are powerful tools; however, producing them can be challenging. This document is a semi-tutorial that guides you through the steps to create graphics that are particularly useful in multilevel modeling. It is set up so that if you want to produce a particular graphic in the future, you can find an example here, copy the code, and modify it to suit your analysis. In this computer lab, you will produce various graphics designed for exploratory analysis, along with some diagnostic plots. I assume that you already know basic R.

1 Set Up

1.1 The following packages will be used

The use of each package in this document is given next to the package name. If you need additional information on these, check online for the package documentation.

library(lme4)     # Fitting HLM models
library(lmerTest) # Satterthwaite df and t-tests
library(lattice)  # Lattice/panel plots
library(ggplot2)  # Some nice graphics
library(stringi)  # Install before HLMdiag or HLMdiag won't load properly
library(HLMdiag)  # Some diagnostic graphics for HLM
library(texreg)   # Nice tabular output
library(optimx)   # In case you need to change the optimizer

1.2 Read in the data

Note that the data are basically the same as in Lab 3, but all variables are present in this data set.
You need to change the path to wherever your data live.

lab4 <- read.table("D:/Dropbox/edps587/Lab_and_homework/Computer lab4/lab4_data.txt",
                   header=TRUE)

Check the data:

names(lab4)
##  [1] "idschool"           "idstudent"          "grade"
##  [4] "gender"             "science"            "math"
##  [7] "genshortages"       "hoursTV"            "hourscomputergames"
## [10] "typecommunity"      "shortages"          "isolate"
## [13] "rural"              "suburban"           "urban"
## [16] "short0"             "short1"             "short2"
## [19] "short3"

head(lab4)
##   idschool idstudent grade gender science  math genshortages hoursTV
## 1       10    100101     3   girl   146.7 137.0     a_little       3
## 2       10    100103     3   girl   148.8 145.3     a_little       2
## 3       10    100107     3   girl   150.0 152.3     a_little       4
## 4       10    100108     3   girl   146.9 144.3     a_little       3
## 5       10    100109     3    boy   144.3 140.3     a_little       3
## 6       10    100110     3    boy   156.5 159.2     a_little       2
##   hourscomputergames typecommunity shortages isolate rural suburban urban
## 1                  1             2         1       0     1        0     0
## 2                  3             2         1       0     1        0     0
## 3                  2             2         1       0     1        0     0
## 4                  1             2         1       0     1        0     0
## 5                  1             2         1       0     1        0     0
## 6                  1             2         1       0     1        0     0
##   short0 short1 short2 short3
## 1      0      1      0      0
## 2      0      1      0      0
## 3      0      1      0      0
## 4      0      1      0      0
## 5      0      1      0      0
## 6      0      1      0      0

1.3 Create some variables by re-coding

1.3.1 Dummy code gender:

lab4$boy <- ifelse(lab4$gender=="boy", 1, 0)
lab4$boy[is.na(lab4$boy)] <- 1

1.3.2 Dummy code grade:

lab4$third <- ifelse(lab4$grade==3, 1, 0)

1.3.3 Center math around school mean math

school.mean <- as.data.frame(aggregate(math~idschool, data=lab4, "mean"))
names(school.mean) <- c('idschool', 'school.mean')
lab4 <- merge(lab4, school.mean, by=c('idschool'))
lab4$centered.math <- lab4$math - lab4$school.mean

1.3.4 Make id a factor (not really necessary)

lab4$idschool <- as.factor(lab4$idschool)

1.3.5 Put some variables on the same scale (and check); we'll just do it now.

lab4$centered.math <- lab4$centered.math/sd(lab4$centered.math)
lab4$school.mean <- lab4$school.mean/sd(lab4$school.mean)
sd(lab4$centered.math)
## [1] 1
sd(lab4$school.mean)
## [1] 1

We are done with all the set up.
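The group-mean centering above can also be collapsed into one step with base R's ave(). This is a hedged alternative, not the lab's required approach; df, grp, and score below are hypothetical stand-ins for lab4, idschool, and math, and the data are simulated so the sketch is self-contained:

```r
# Minimal sketch: group-mean centering with ave() on simulated data.
# 'df', 'grp', and 'score' are hypothetical stand-ins for lab4, idschool, math.
set.seed(1)
df <- data.frame(grp = rep(1:3, each = 4),
                 score = rnorm(12, mean = 50, sd = 10))
df$grp.mean <- ave(df$score, df$grp)     # mean of score within each grp
df$centered <- df$score - df$grp.mean    # group-mean centered score
# Centered scores sum to (numerically) zero within every group:
tapply(df$centered, df$grp, sum)
```

If you adopt this, a single line such as lab4$centered.math <- lab4$math - ave(lab4$math, lab4$idschool) replaces the aggregate/merge pair, though the merged school.mean column is still needed later as a predictor.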
2 Draw random sample of data

Since this is a very large data set, it is sometimes nice to have a smaller random sample to work with. The following code does this: the first line determines which schools are randomly sampled (without replacement), and the second line extracts the data for the sampled schools.

groups <- sample(unique(lab4$idschool), 25)   # sample school ids directly
subset <- lab4[lab4$idschool %in% groups, ]

3 Lattice plot

The first graph is a lattice plot for looking at students within schools. We'll use the random sample and plot science by math scores (otherwise we'd have 146 plots).

(a) With points for students and a linear regression for each school

The basic command here is xyplot(). The plot will have both points ('p') and linear regression lines ('r'). The layout option below specifies that the panel should have 5 rows and 5 columns of plots; without it, xyplot sometimes defaults to odd arrangements.

xyplot(science ~ math | idschool, data=subset,
       col.line=c('black','blue'),
       type=c('p','r'),
       main='Variability in science ~ math relationship',
       xlab="Math Scores", ylab="Science Scores",
       layout=c(5,5))

A lattice plot where points are joined is illustrated in the lecture notes on longitudinal data. You do not need to do that here, because it would be very hard to see what's going on. To join points, use the option 'l' ("ell") for line; that is, replace the type option with type=c('p','l').

(b) Smooth

Do the same plot as in part (a), but use a smooth (loess) regression rather than a linear regression line. To do this, change the type of plot to

type=c('p','smooth')

4 Produce lattice (xyplot) plots for hours TV

Using the sub-sample of the data, draw a lattice plot of science by hours of watching TV (as a numeric variable). You can copy and edit the example given in part 3.
Do this for the following 2 cases:

(a) Points and linear regression lines

(b) Smooth regressions: points and smooth lines

5 Produce lattice (xyplot) plots for hours computer games

Using the sub-sample of the data, draw a lattice plot of science by hours of playing computer games (as a numeric variable). You can copy and edit the example given in part 4. Do this for the following 2 cases:

(a) Points and linear regression lines

(b) Smooth (loess) regressions: points and smooth lines

6 Regressions for all schools in the same plot

There are (at least) two ways of doing this: an easy way and a somewhat harder way. I prefer the look of the harder way; the plots it produces appear in most of the lectures, and the R code that goes with them is on the course web-site. The easier way, using ggplot, is below.

(a)

ggplot(lab4, aes(x=math, y=science, col=idschool)) +
  geom_smooth(method="lm", se=FALSE) +
  ggtitle('Regressions all in one plot') +
  theme(legend.position="none")

(b) Repeat part (a) but use the random sub-sample.

Perhaps it is more informative with less data; that is, it's easier to see what's going on.

7 Plot mean science by hours watching TV

First compute the means:

(y <- aggregate(science~hoursTV, data=lab4, "mean"))
##   hoursTV  science
## 1       1 147.8550
## 2       2 150.4784
## 3       3 151.9487
## 4       4 152.0076
## 5       5 147.2271

and then plot the mean science scores by hours watching TV:

plot(y[,1], y[,2], type='b',
     ylab='Mean Science', xlab='Hours Watching TV',
     main='Marginal of Science x HoursTV')

8 Plot mean science by hours playing computer games

You can revise the code from item 7 to do this.

9 Overlay school-specific regressions of science on hours TV

You will do this using linear regressions and then smooth curves.

(a) Linear regression

ggplot(lab4, aes(x=hoursTV, y=science, col=idschool)) +
  geom_smooth(method="lm", se=FALSE) +
  theme(legend.position="none")

(b) For smooth curves, change the method:

geom_smooth(method="loess", se=FALSE)

10 Overlay school-specific regressions of science on hours playing computer games

Repeat item 9 parts (a) and (b), except use hours playing computer games.

11 Plot science by type of community

Since type of community is a nominal (discrete) variable, we will add it to graphs by drawing a distinct line for each type of community.

(a) Using linear regression

First change type of community to a factor, and then graph:

lab4$community <- as.factor(lab4$typecommunity)
ggplot(lab4, aes(x=math, y=science, col=community)) +
  geom_smooth(method='lm', se=FALSE)

(b) Using smooth curves

For smooth curves, change the geom_smooth() option as indicated below:

ggplot(lab4, aes(x=math, y=science, col=community)) +
  geom_smooth(method='loess', se=FALSE)

(c) Check frequencies

It may be the case that some communities are very small, so when viewing the graphs we may want to place less weight on them.

table(lab4$typecommunity)
##
##    1    2    3    4
##   70 1042 2483 3502

Recall that "1" is isolate, "2" is rural, "3" is suburban, and "4" is urban.

An even better check: "discretize" the science score into 10 groups as follows.

lab4$group <- cut(lab4$science, 10)
table(lab4$group, lab4$typecommunity)
##
##                1   2   3    4
##   (103,112]    1   0   1    2
##   (112,120]    0   1   2    5
##   (120,128]    1   5  17   39
##   (128,136]    1  47 109  261
##   (136,144]    5 161 390  708
##   (144,152]   11 338 817 1134
##   (152,161]   35 351 770  939
##   (161,169]   14 117 289  315
##   (169,177]    1  11  46   52
##   (177,185]    1  11  42   47

This table indicates that certain combinations of community and science score have no data, or very little. The same could be done for math.

12 R^2 and meta R^2: simple model

We will start with a very simple model: science ~ math.

We need some basic information about the data, and we need to set up objects to hold the results:

N <- length(unique(lab4$idschool))   # number of schools
nj <- table(lab4$idschool)           # number of students per school
nj <- as.data.frame(nj)
names(nj) <- c("idschool", "nj")
ssmodel <- 0                         # initialize SS model
sstotal <- 0                         # initialize SS total
R2 <- matrix(99, nrow=N, ncol=2)     # for saving R2s

We can now fit regressions for each school and save the
R^2 and the model and (corrected) total sums of squares for each school:

index <- 1                                   # initialize
for (i in unique(lab4$idschool)) {
  sub <- lab4[which(lab4$idschool==i), ]     # data for one school
  model0 <- lm(science ~ math, data=sub)     # fit the model
  a <- anova(model0)                         # save anova table
  ssmodel <- ssmodel + a[1,2]                # add to SS model
  sstotal <- sstotal + sum(a[,2])            # sum of all SS
  # save the R2
  R2[index, 1:2] <- as.numeric(cbind(i, summary(model0)$r.squared))
  index <- index + 1                         # update index
}

We also need to compute the meta R^2 and do a bit of clean-up before graphing the results:

R2meta <- ssmodel/sstotal
R2 <- as.data.frame(R2)
names(R2) <- c("idschool", "R2")
R2.mod1 <- merge(R2, nj, by=c("idschool"))

Notice that the object holding the R^2s is called "R2.mod1". For each different model, I will want a different object name holding that model's per-school R^2s.

And at last, a graph of R^2 by sample size:

plot(R2.mod1$nj, R2.mod1$R2, type='p', ylim=c(0,1),
     ylab="R squared",
     xlab="n_j (number of observations per school)",
     main="Science ~ math")
abline(h=R2meta, col='blue')              # adds line for R2meta
text(70, 0.95, 'R2meta=.36', col="blue")  # text for R2meta

13 R^2 and meta R^2: more complex model

Compute and plot the R-squares and meta R^2 from the regression of science on math, hours watching TV, and hours playing computer games. Treat both hours watching TV and hours playing computer games as NUMERIC variables. So that we can later compare models, call the R^2 object for this model

R2.mod2

14 R^2 and meta R^2 for categorical TV and computer games

Compute and plot the R-squares and meta R^2 from the regressions of science on math, hours watching TV, and hours playing computer games. Treat hours watching TV and hours playing computer games as CATEGORICAL variables.
So that we can compare R^2s from the various models, call the object for this model

R2.mod3

## Warning in anova.lm(model0): ANOVA F-tests on an essentially perfect fit
## are unreliable

15 Comparison

To compare the R-squares from the model in item 13 (numeric hours) with those from item 14 (categorical hours), plot the R-squares against each other and draw in a reference line:

plot(R2.mod2$R2, R2.mod3$R2, cex=1.4, ylim=c(0,1), xlim=c(0,1),
     ylab='Hours as Discrete', xlab='Hours as Numeric',
     main='Hours Discrete vs Numeric')
abline(a=0, b=1, col='blue')   # adds a reference line

16 Easiest regression diagnostics to obtain

We need a model to work with. We'll use the model in which both hours TV and hours computer games are treated as factors (categorical variables), which is definitely warranted based on the previous graphs.

lab4$hrTV <- as.factor(lab4$hoursTV)             # factor versions used below
lab4$hrCG <- as.factor(lab4$hourscomputergames)

modelxx <- lmer(science ~ 1 + centered.math + school.mean + hrTV + hrCG
                + (1 + centered.math | idschool),
                data=lab4, REML=FALSE,
                control=lmerControl(optimizer="Nelder_Mead"))

(a) Plot of residuals

This is a plot of standardized (Pearson) residuals by the conditional fitted values of science:

plot(modelxx, xlab='Fitted Conditional', ylab='Pearson Residuals')

Why do you suppose there is a straight line of points in the graph?

(b) Use HLMdiag to get conditional residuals

The conditional residuals are computed using the fixed plus estimated random effects.
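In other words, the conditional residual is y - (X*beta-hat + Z*u-hat), where the u-hat are the EBLUPs. A minimal sketch on a toy lmer fit (simulated data; toy, g, x, y, and fit are all hypothetical names, not part of the lab) shows that predict() on a fitted merMod already includes the EBLUPs, so the "by hand" conditional residuals match residuals():

```r
# Sketch: conditional residuals = y - (fixed + estimated random effects).
# Simulated data; all object names here are hypothetical.
library(lme4)
set.seed(2)
toy <- data.frame(g = factor(rep(1:10, each = 20)), x = rnorm(200))
toy$y <- 2 + 1.5*toy$x + rnorm(10, sd = 2)[toy$g] + rnorm(200)
fit <- lmer(y ~ x + (1 | g), data = toy)
cond.fit <- predict(fit)        # X%*%beta + Z%*%u: EBLUPs included by default
cond.res <- toy$y - cond.fit    # conditional residuals "by hand"
all.equal(unname(cond.res), unname(residuals(fit)))   # TRUE
```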
We'll look at these using a dotplot.

# Get y - (X*gamma + Z*U), where U is estimated by empirical Bayes
res1 <- HLMresid(modelxx, level=1, type="EB", standardize=TRUE)
head(res1)   # look at what is here
##          1          2          3          4          5          6
##  3.3748251  1.0883016 -2.2698068 -0.4434138 -0.8822511  1.3532460

Now that we have the residuals we want, let's take a look at them:

dotplot(ranef(modelxx, condVar=TRUE), lattice.options=list(layout=c(1,2)))
## $idschool

17 A panel of diagnostic plots

(similar to what SAS does):

# Want 4 plots in a 2x2 array on one page
par(mfrow=c(2,2))

# Graph 1: scatter plot
res1 <- HLMresid(modelxx, level=1, type="EB", standardize=TRUE)
fit <- fitted(modelxx)    # get conditional fitted values
plot(fit, res1, xlab='Conditional Fitted Values',
     ylab='Pearson Std Residuals', main='Conditional Residuals')

# Graph 2: QQ plot and a reference line
qqnorm(res1)                   # draws plot
abline(a=0, b=9, col='blue')   # reference line

# Graph 3: histogram with overlaid normal curve
h <- hist(res1, breaks=15, density=20)        # draw histogram
xfit <- seq(-40, 40, length=50)               # sets range & number of quantiles
yfit <- dnorm(xfit, mean=0, sd=7.177)         # should be normal
yfit <- yfit*diff(h$mids[1:2])*length(res1)   # scale using bin width (mid-points)
lines(xfit, yfit, col='darkblue', lwd=2)      # draws the normal curve

# Graph 4: information about the model
plot.new()                        # a plot with nothing in it
text(.5, 0.8, 'Modelxx')          # potentially useful text
text(.5, 0.5, 'Deviance=48,356.5')
text(.5, 0.2, 'AIC=48,386.5')

18 Influence diagnostics using HLMdiag functions

(a) Cook's distances

cook <- cooks.distance(modelxx, group="idschool")
dotplot_diag(x=cook, cutoff="internal", name="cooks.distance",
             ylab="Cook's distance", xlab="School")

(b) MDFFITS

mdfit <- mdffits(modelxx, group="idschool")
dotplot_diag(x=mdfit, cutoff="internal", name="mdffits",
             ylab="MDFFITS", xlab="School")

19 Not required for homework but informative

For more on the interpretation of these plots, see the lme4 documentation or, better yet, Bates et al. (2015), "Fitting linear mixed-effects models using lme4,"
Journal of Statistical Software.

Zeta plot

The blue lines in this and the next plot should be straight lines.

p1 <- profile(lmer(science ~ 1 + centered.math + school.mean + hrTV + hrCG
                   + (1 + centered.math | idschool), lab4, REML=FALSE))
## Warning in FUN(X[[i]], ...): non-monotonic profile for .sig02

xyplot(p1, aspect=0.7)

Panel

xyplot(p1, aspect=0.7, absVal=TRUE)

20 Not required: profile confidence intervals

Hmmm, is there a problem here?

confint(p1, level=.99)
## Warning in confint.thpr(p1, level = 0.99): bad spline fit for .sig02:
## falling back to linear interpolation
##                     0.5 %     99.5 %
## .sig01         1.53387241  2.3344245
## .sig02        -0.05128605  0.8739596
## .sig03         0.46712430  1.1273210
## .sigma         7.00835180  7.3246400
## (Intercept)   -8.49642794 26.4606505
## centered.math  4.75133021  5.3363461
## school.mean    3.71846419  4.7720857
## hrTV2         -0.62622860  1.2220820
## hrTV3         -0.36287584  1.4585286
## hrTV4         -0.11074220  1.9299122
## hrTV5         -1.20189341  0.7586826
## hrCG2         -0.31731560  0.7393753
## hrCG3         -0.59903847  0.7817812
## hrCG4         -1.40183779  0.7543930
## hrCG5         -2.42348047 -0.4377490

21 Not required: plot EBLUPs

Step 1: extract the random effects

require(lme4)
modelxx <- lmer(science ~ 1 + centered.math + school.mean + hrTV + hrCG
                + (1 + centered.math | idschool),
                data=lab4, REML=FALSE,
                control=lmerControl(optimizer="Nelder_Mead"))
ranU.mer <- ranef(modelxx)
ranU <- as.data.frame(ranU.mer)

Step 2: put the intercepts and slopes in separate objects

U0j <- ranU[which(ranU$term=="(Intercept)"), ]
U1j <- ranU[which(ranU$term=="centered.math"), ]

Step 3: sort by the estimated random effect, which is stored in "condval"

U0j <- U0j[order(U0j$condval), ]
U1j <- U1j[order(U1j$condval), ]

Step 4: add integers for the schools so that they can be plotted in order

U0j$xgrp <- seq(1:146)
U1j$xgrp <- seq(1:146)

Step 5: draw the graph for U0j

For the standard errors, I drew arrows but with no arrowheads... a hack solution.

plot(U0j$xgrp, U0j$condval, type="p",
     main="Final Model U0j +/- 1 std error",
     xlab="Schools (sorted by U0j)", ylab="EBLUP of U0j", col="blue")
arrows(U0j$xgrp, U0j$condval-U0j$condsd,
       U0j$xgrp, U0j$condval+U0j$condsd, length=0)

Now for
U1j: again arrows with no arrowheads for the standard errors... a hack solution.

plot(U1j$xgrp, U1j$condval, type="p",
     main="Final Model U1j +/- 1 std error",
     xlab="Schools (sorted by U1j)", ylab="EBLUP of U1j", col="blue")
arrows(U1j$xgrp, U1j$condval-U1j$condsd,
       U1j$xgrp, U1j$condval+U1j$condsd, length=0)

QQ plots of EBLUPs

qqdata <- qqnorm(U0j$condval)
plot(qqdata$x, qqdata$y, type="p", pch=19, col="red",
     main="Normal QQ plot of U0j with 95% Confidence Bands",
     xlab="Theoretical Value", ylab="Estimated U0j (EBLUPs)", ylim=c(-5,5))
arrows(qqdata$x, qqdata$y - 1.96*U0j$condsd,
       qqdata$x, qqdata$y + 1.96*U0j$condsd, length=0, col="gray")
qqline(U0j$condval, col="steelblue")

And the random slopes for centered.math:

qqdata <- qqnorm(U1j$condval)
plot(qqdata$x, qqdata$y, type="p", pch=19, col="red",
     main="Normal QQ plot of U1j with 95% Confidence Bands",
     xlab="Theoretical Value", ylab="Estimated U1j (EBLUPs)", ylim=c(-5,5))
arrows(qqdata$x, qqdata$y - 1.96*U1j$condsd,
       qqdata$x, qqdata$y + 1.96*U1j$condsd, length=0, col="gray")
qqline(U1j$condval, col="steelblue")
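The two QQ-plot blocks differ only in which set of EBLUPs and standard errors they use, so they can be wrapped in a small helper. This is a sketch only; qq.eblup, eblup, and se are hypothetical names (not part of the lab), and the quick check below uses simulated values so it runs on its own:

```r
# Hypothetical helper wrapping the QQ-plot-with-error-bars idiom used above.
qq.eblup <- function(eblup, se, main = "Normal QQ plot of EBLUPs") {
  q <- qqnorm(eblup, plot.it = FALSE)        # theoretical vs observed quantiles
  plot(q$x, q$y, pch = 19, col = "red", main = main,
       xlab = "Theoretical Value", ylab = "EBLUP")
  arrows(q$x, q$y - 1.96*se, q$x, q$y + 1.96*se,
         length = 0, col = "gray")           # 95% bands, no arrowheads
  qqline(eblup, col = "steelblue")
  invisible(q)
}

# Quick check on simulated values (146 schools, as in lab4):
set.seed(3)
u <- rnorm(146, sd = 2)
q <- qq.eblup(u, se = rep(0.5, 146))
```

With the objects above, qq.eblup(U0j$condval, U0j$condsd) would reproduce the first QQ plot (up to axis labels), and the same call with U1j the second.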