The ANOVA - UC Davis Plants



R Notes 2011 LAB 5 Topics covered: General overviewLog transformationPower transformation> setwd("G:/Courses/A205/R/Lab5")Transformation in RTrans=counts^3 #(SAS: Counts**3; Raises Counts to the power of 3)Trans=counts^1/9 #(SAS: Counts**(1/9); takes the ninth root of Counts)Trans=log(counts) #(SAS: Log(Counts); natural logarithm of Counts)Trans=Log10(Counts) #(SAS: Log10(Counts); base-10 logarithm of Counts)Trans=sin(counts) #(SAS: Sin(Counts); Calculates the sine of Counts)Trans =asin #(SAS Arsin(Counts); Calculates the inverse sine)# Etc…Log transformation# In this experiment, the effect of vitamin supplements on weight gain is# being investigated in three animal species (mice, chickens, and sheep). # The experiment is designed as an RCBD with one replication (i.e. animal)# per block*treatment combination. The six treatment levels are MC (mouse# control), MV (mouse + vitamin), CC (chicken control), CV (chicken +# vitamin), SC (sheep control), and SV (sheep + vitamin). The response# variable is the weight of the animal at the end of the experiment. > lab5a<-read.table("Lab5a.txt", header=T)> head(lab5a, 3) Trtmt Block Weight1 MC 1 0.182 MV 1 0.323 CC 1 2.00> str(lab5a)'data.frame':24 obs. of 3 variables: $ Trtmt : Factor w/ 6 levels "CC","CV","MC",..: 3 4 1 2 5 6 3 4 1 2 ... $ Block : int 1 1 1 1 1 1 2 2 2 2 ... $ Weight: num 0.18 0.32 2 2.5 108 127 0.3 0.4 3 3.3 ...> lab5a$Block<-as.factor(lab5a$Block)> str(lab5a)'data.frame':24 obs. of 3 variables: $ Trtmt : Factor w/ 6 levels "CC","CV","MC",..: 3 4 1 2 5 6 3 4 1 2 ... $ Block : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 2 2 2 2 ... $ Weight: num 0.18 0.32 2 2.5 108 127 0.3 0.4 3 3.3 ...The ANOVA> anova(lm(Weight~Trtmt+Block, lab5a))Analysis of Variance TableResponse: Weight Df Sum Sq Mean Sq F value Pr(>F) Trtmt 5 108714 21742.7 174.4326 9.768e-13 ***Block 3 984 328.0 2.6314 0.08807 . Residuals 15 1870 124.6 ---Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Test for normality of residuals> shapiro.test(resid(lm(Weight~Trtmt+Block, lab5a)))Shapiro-Wilk normality testdata: resid(lm(Weight ~ Trtmt + Block, lab5a)) W = 0.9536, p-value = 0.3236Test for homogeneity of variance among treatments> anova(lm((resid(lm(Weight~Trtmt, lab5a))^2)~Trtmt, lab5a))Analysis of Variance TableResponse: (resid(lm(Weight ~ Trtmt, lab5a))^2) Df Sum Sq Mean Sq F value Pr(>F) Trtmt 5 699888 139978 2.5063 0.06859 .Residuals 18 1005322 55851 ---Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 # Levene's Test is NS, but one can clearly see that it is borderline.# The res vs. pred plot will illustrate this.Test for nonadditivity> anova(lm(Weight~Trtmt+Block+pred2, lab5a))Analysis of Variance TableResponse: Weight Df Sum Sq Mean Sq F value Pr(>F) Trtmt 5 108714 21742.7 6506.417 < 2.2e-16 ***Block 3 984 328.0 98.153 1.222e-09 ***pred2 1 1823 1822.9 545.507 1.301e-12 ***Residuals 14 47 3.3 ---Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 SIGNIFICANT NON-ADDITIVE EFFECT! MUST TRANSFORM DATA!Status: We violated our assumption of additivity,and Levene's Test for Treatment is almost significant.> par(mfrow=c(2,2))> plot(lm(Weight~Trtmt+Block, lab5a))# And take a look at the means, standard deviations, and variances:> data.frame(cbind(Mean=with(lab5a, tapply(Weight, Trtmt, mean)), StDev=round(with(lab5a, tapply(Weight, Trtmt, sd)),2), Var=round(with(lab5a, tapply(Weight, Trtmt, var)),2))) Mean StDev VarCC 2.4 0.59 0.35CV 2.9 0.46 0.21MC 0.3 0.11 0.01MV 0.4 0.06 0.00SC 137.0 23.37 546.00SV 151.0 20.12 404.67# Between mice and sheep, the mean increases by a factor of about 400, the# standard deviation increases by a factor of about 270, and the variance# increases by a factor of about 73,000!# The situation we face is this:# 1. Significant Tukey Test for Nonadditivity# 2. The standard deviation scales with the mean# 3. The Res vs. Pred plot is smiling tauntingly at you# The best transformation under these conditions is a LOG transformation> lab5a$Badweight<-lab5a$Weight> lab5a$Weight<-log(lab5a$Badweight,10)> head(lab5a, 4) Trtmt Block Weight Badweight1 MC 1 -0.7447275 0.182 MV 1 -0.4948500 0.323 CC 1 0.3010300 2.004 CV 1 0.3979400 2.50# You can copy and paste the following lines:anova(lm(Weight~Trtmt+Block, lab5a))shapiro.test(resid(lm(Weight~Trtmt+Block, lab5a)))anova(lm((resid(lm(Weight~Trtmt, lab5a))^2)~Trtmt, lab5a))pred2<- predict(lm(Weight~Trtmt+Block, lab5a))^2anova(lm(Weight~Trtmt+Block+pred2, lab5a))par(mfrow=c(2,2))plot(lm(Weight~Trtmt+Block, lab5a))> anova(lm(Weight~Trtmt+Block, lab5a))Analysis of Variance TableResponse: Weight Df Sum Sq Mean Sq F value Pr(>F) Trtmt 5 28.6323 5.7265 1859.571 < 2.2e-16 ***Block 3 0.1205 0.0402 13.043 0.0001861 ***Residuals 15 0.0462 0.0031 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > shapiro.test(resid(lm(Weight~Trtmt+Block, lab5a))) Shapiro-Wilk normality testdata: resid(lm(Weight ~ Trtmt + Block, lab5a)) W = 0.966, p-value = 0.5694> anova(lm((resid(lm(Weight~Trtmt, lab5a))^2)~Trtmt, lab5a))Analysis of Variance TableResponse: (resid(lm(Weight ~ Trtmt, lab5a))^2) Df Sum Sq Mean Sq F value Pr(>F)Trtmt 5 0.00079507 1.5901e-04 1.7765 0.1686Residuals 18 0.00161117 8.9509e-05 > pred2<- predict(lm(Weight~Trtmt+Block, lab5a))^2> anova(lm(Weight~Trtmt+Block+pred2, lab5a))Analysis of Variance TableResponse: Weight Df Sum Sq Mean Sq F value Pr(>F) Trtmt 5 28.6323 5.7265 1950.9256 < 2.2e-16 ***Block 3 0.1205 0.0402 13.6838 0.0001906 ***pred2 1 0.0051 0.0051 1.7369 0.2086908 Residuals 14 0.0411 0.0029 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 > par(mfrow=c(2,2))> plot(lm(Weight~Trtmt+Block, lab5a))# At this point then, you may make conclusions about differences among# treatments, etc. But be careful how you state your conclusions because you# are making them based on transformed data. It is also customary to use the# detransformed means in your final conclusions.Power transformation# This experiment is a generic CRD with six treatments and five replications# per treatment.> lab5b<-read.table("Lab5b.txt", header=T)> head(lab5b, 3) Trtmt Response1 A 2202 B 963 C 62> str(lab5b)'data.frame':30 obs. of 2 variables: $ Trtmt : Factor w/ 6 levels "A","B","C","D",..: 1 2 3 4 5 6 1 2 3 4 ... $ Response: int 220 96 62 378 197 77 200 213 75 323 ...# Copy and paste:anova(lm(Response~Trtmt, lab5b))shapiro.test(resid(lm(Response~Trtmt, lab5b)))anova(lm((resid(lm(Response~Trtmt, lab5b))^2)~Trtmt, lab5b))par(mfrow=c(2,2))plot(lm(Response~Trtmt, lab5b)) Note: There is no Tukey 1-df Test for Nonadditivity because this is a complete randomized design.The ANOVA> anova(lm(Response~Trtmt, lab5b))Analysis of Variance TableResponse: Response Df Sum Sq Mean Sq F value Pr(>F) Trtmt 5 143273 28654.6 13.437 2.641e-06 ***Residuals 24 51180 2132.5 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Test for normality of residuals> shapiro.test(resid(lm(Response~Trtmt, lab5b)))Shapiro-Wilk normality testdata: resid(lm(Response ~ Trtmt, lab5b)) W = 0.9827, p-value = 0.891Test for homogeneity of variance among treatments> anova(lm((resid(lm(Response~Trtmt, lab5b))^2)~Trtmt, lab5b))Analysis of Variance TableResponse: (resid(lm(Response ~ Trtmt, lab5b))^2) Df Sum Sq Mean Sq F value Pr(>F) Trtmt 5 75259223 15051845 2.8184 0.03857 *Residuals 24 128173145 5340548 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 DANGER DANGER!!! Significant Levene's Test! Must transform data!# The significant Levene's Test is reflected in the Res*Pred plot above. The# funnel shape of the data indicates that the magnitude of the residuals is# increasing as the mean increases. # This is verified by the table of means and standard deviations found below# the Levene’s Test: > lab5b_summarytab<-data.frame(cbind(Mean=with(lab5b, tapply(Response, Trtmt, mean)), StDev=round(with(lab5b, tapply(Response, Trtmt, sd)),2), Var=round(with(lab5b, tapply(Response, Trtmt, var)),2)))> lab5b_summarytab Mean StDev VarA 237.8 48.57 2359.2B 151.2 41.71 1739.7C 82.2 13.50 182.2D 274.2 78.78 6205.7E 153.0 43.16 1862.5F 99.8 21.11 445.7# In this situation, a power transformation will likely restore the data; but# what is the appropriate power to use? There is a slick procedure for# finding this information, and it involves performing a regression of the # logarithms of the variances vs. the logarithms of the means of the original# data. The code:> summary(lm(log(lab5b_summarytab$Var, 10)~log(lab5b_summarytab$Mean, 10)))Call:lm(formula = log(lab5b_summarytab$Var, 10) ~ log(lab5b_summarytab$Mean, 10))Residuals: 1 2 3 4 5 6 -0.22597 0.14940 -0.14727 0.03438 0.16575 0.02371 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -2.5353 0.8463 -2.996 0.04010 * log(lab5b_summarytab$Mean, 10) 2.5814 0.3864 6.680 0.00261 **---Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.1763 on 4 degrees of freedomMultiple R-squared: 0.9177,Adjusted R-squared: 0.8972 F-statistic: 44.63 on 1 and 4 DF, p-value: 0.002610# Locate the slope of the regression. In this case, slope = 2.5814. Now# calculate the appropriate power of the transformation, where Power = 1 –# (b/2):> Power<-1-(2.5814/2)> Power[1] -0.2907# To use this magic number, we transform the Response variable:> lab5b$Badresponse<-lab5b$Response> lab5b$Response<-lab5b$Badresponse^Power> head(lab5b) Trtmt Response Badresponse1 A 0.2084768 2202 B 0.2653101 963 C 0.3012671 624 D 0.1781243 3785 E 0.2152775 1976 F 0.2828767 77Copy and paste:anova(lm(Response~Trtmt, lab5b))shapiro.test(resid(lm(Response~Trtmt, lab5b)))anova(lm((resid(lm(Response~Trtmt, lab5b))^2)~Trtmt, lab5b))par(mfrow=c(2,2))plot(lm(Response~Trtmt, lab5b))The ANOVA> anova(lm(Response~Trtmt, lab5b))Analysis of Variance TableResponse: Response Df Sum Sq Mean Sq F value Pr(>F) Trtmt 5 0.0251167 0.0050233 17.774 2.233e-07 ***Residuals 24 0.0067831 0.0002826 ---Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Test for normality of residuals> shapiro.test(resid(lm(Response~Trtmt, lab5b)))Shapiro-Wilk normality testdata: resid(lm(Response ~ Trtmt, lab5b)) W = 0.9641, p-value = 0.3916Test for homogeneity of variance among treatments> anova(lm((resid(lm(Response~Trtmt, lab5b))^2)~Trtmt, lab5b))Analysis of Variance TableResponse: (resid(lm(Response ~ Trtmt, lab5b))^2) Df Sum Sq Mean Sq F value Pr(>F)Trtmt 5 1.6754e-07 3.3508e-08 0.5105 0.7655Residuals 24 1.5752e-06 6.5631e-08 > par(mfrow=c(2,2))> plot(lm(Response~Trtmt, lab5b)) ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download