Stat 421, Fall 2008
Fritz Scholz
Homework 7 Solutions
Problem 1: Write a function Satterthwaite(xdat,ydat){....} that takes an x-sample xdat and a y-sample ydat, computes the two-sample t-statistic txy (we called it t*(X,Y) on slide 107) without the assumption of equal variances in the two sampled populations, computes the degrees of freedom fhat by the Satterthwaite method, and computes the p-value pval corresponding to the observed value txy. The output should be a list with the values txy, fhat, and pval.
Compare your results with those of t.test(xdat,ydat,var.equal=F) when the two samples are given as
xdat=c(-0.7,-1.6,1.1,-0.2,0.4,-1.4,1.6,0.2,-0.3,0.3)
ydat=c(-0.4,-2.7,2.6,-1.6,2.8,1.5,-2.2,3.2,-0.1,2.2,-1.9, 0.9)
This means that you should not use t.test as a shortcut in your function Satterthwaite. Give your function and the results of your comparison.
After having convinced yourself that t.test does produce the correct p-values (according to slide 108 in Stat421NormalPopulation.pdf), write another function
test.Satterthwaite(Nsim=10000,sigx=1,sigy=2,m=10,n=20){...}
that generates in a loop, Nsim times, independent samples of size m and n from normal populations with means 0 and standard deviations sigx and sigy, respectively, and uses t.test to compute the p-value for each such pair of samples (Use out=t.test(xdat,ydat,var.equal=F) and out$p.value to get the p-value). This function should then also produce a histogram of these Nsim p-values, alternating colors in the bars and using breaks=seq(0,1,.025) and probability=T. During the development of this function use Nsim=1000, but ultimately use the default arguments as shown above.
Show the histogram (with appropriate annotations) and give the code. The histogram should look approximately uniform, since p-values (computed from continuous null distributions) should have a U(0,1) distribution when the data are generated according to the null distribution. Does this distribution of p-values suggest that these p-values will lead to a more or less correct type I error probability alpha, if we reject the hypothesis of equal means whenever an observed p-value is less than or equal to alpha, whatever alpha is chosen?
Solution: The code is given below.
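For reference, the statistic and the Satterthwaite degrees of freedom computed by the function below are (stated here in the notation of the code, with $s_x^2$ = sx2 and $s_y^2$ = sy2 the sample variances):

$$t^*(X,Y)=\frac{\bar{Y}-\bar{X}}{\sqrt{s_x^2/m+s_y^2/n}},\qquad \hat{f}=\frac{\left(s_x^2/m+s_y^2/n\right)^2}{\dfrac{(s_x^2/m)^2}{m-1}+\dfrac{(s_y^2/n)^2}{n-1}},$$

with two-sided p-value $2\,P(T_{\hat f}\le -|t^*|)$, where $T_{\hat f}$ has a t-distribution with $\hat f$ degrees of freedom.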
Satterthwaite = function(xdat, ydat){
# sample sizes and sample variances
m = length(xdat)
n = length(ydat)
sx2 = var(xdat)
sy2 = var(ydat)
# Satterthwaite approximate degrees of freedom
fhat = (sx2/m + sy2/n)^2/((sx2/m)^2/(m-1) + (sy2/n)^2/(n-1))
# two-sample t-statistic without the equal-variance assumption
txy = (mean(ydat) - mean(xdat))/sqrt(sx2/m + sy2/n)
# two-sided p-value from the t-distribution with fhat degrees of freedom
pval = 2*pt(-abs(txy), fhat)
out = list(pval = pval, txy = txy, fhat = fhat)
out
}
The comparison on the given samples yields
> Satterthwaite(xdat,ydat)
$pval
[1] 0.5535037
$txy
[1] 0.6050084
$fhat
[1] 16.29990
> t.test(ydat,xdat,var.equal=F)
Welch Two Sample t-test
data: ydat and xdat
t = 0.605, df = 16.3, p-value = 0.5535
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-1.045287 1.881954
sample estimates:
mean of x mean of y
0.3583333 -0.0600000
The results agree within the displayed precision.
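A sharper check than comparing printed digits is to compare the components directly (a small sketch; t.test(ydat,xdat,...) is used here so that the sign of its t-statistic matches txy):
out1 = Satterthwaite(xdat, ydat)
out2 = t.test(ydat, xdat, var.equal = FALSE)
all.equal(as.numeric(out2$statistic), out1$txy)   # should be TRUE: same t-statistic
all.equal(as.numeric(out2$parameter), out1$fhat)  # should be TRUE: same degrees of freedom
all.equal(out2$p.value, out1$pval)                # should be TRUE: same p-value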
The simulation function is given below, followed by the resulting plot.
test.Satterthwaite = function(Nsim=10000, sigx=1, sigy=2, m=10, n=20)
{
pval = rep(0, Nsim)
for(i in 1:Nsim){
# independent normal samples, both with mean 0 (null hypothesis of equal means)
xdat = rnorm(m, 0, sigx)
ydat = rnorm(n, 0, sigy)
# Satterthwaite-Welch two-sample t-test p-value
out = t.test(xdat, ydat, var.equal = FALSE)
pval[i] = out$p.value
}
# histogram of the simulated p-values on the density scale
hist(pval, xlab="P-value of Satterthwaite-Welch t-test", breaks=seq(0,1,.025),
col=c("blue","orange"), probability=TRUE, main="")
# dashed line at height 1 = density of the U(0,1) distribution
segments(0, 1, 1, 1, lty=2)
}
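The plot shown below was produced with the default arguments, e.g. as follows (the seed is arbitrary, chosen only for reproducibility; the exact histogram will differ from run to run):
set.seed(421)  # arbitrary seed, not part of the assignment
test.Satterthwaite()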
[Figure: histogram of the Nsim = 10000 simulated p-values, with the U(0,1) density (dashed line at height 1) superimposed]
Since this distribution indeed looks approximately like that of a U(0,1) random variable, we will reject about 100*alpha% of the time if we reject the hypothesis of equal means whenever the observed p-value is at most alpha. Thus these p-values lead to an approximately correct type I error probability, whatever alpha is chosen.
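One way to corroborate this beyond the visual impression is to check empirical rejection rates at a few values of alpha (a quick sketch, not part of the assignment; the p-values are recomputed directly since test.Satterthwaite does not return them):
# 10000 p-values under the null hypothesis of equal means (default parameters)
pval = replicate(10000, t.test(rnorm(10,0,1), rnorm(20,0,2), var.equal=FALSE)$p.value)
# empirical rejection rates; these should be close to the nominal levels
mean(pval <= .01); mean(pval <= .05); mean(pval <= .10)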
abline(v=crit.alpha,col="red",lwd=2)
text(1.3*crit.alpha,max(out.hist$counts),substitute(beta==ax,
list(ax=format(signif(power,4)))),
adj=0)
}
ad.test.simulation = function(Nsim=10000, n=100, alpha=.05)
{
library(nortest)  # provides ad.test, the Anderson-Darling test for normality
sim.out.norm = rep(0, Nsim)
sim.out.unif = rep(0, Nsim)
for(i in 1:Nsim){
# AD statistic for a normal sample (null) and a uniform sample (alternative)
out.n = ad.test(rnorm(n))
out.u = ad.test(runif(n))
sim.out.norm[i] = out.n$statistic
sim.out.unif[i] = out.u$statistic
}
# overall range of the simulated statistics (m is not used further below)
m = min(sim.out.norm, sim.out.unif)
M = max(sim.out.unif, sim.out.norm)
par(mfrow=c(2,1))  # stack the two histograms on common breaks
out.hist = hist(sim.out.norm, breaks=seq(0,1.1*M,.1), col=c("blue","orange"),
xlab="AD-Statistic for Normal Samples",
main=paste("Sample size n =",n," , Nsim =",Nsim))
# critical value: (1-alpha)-quantile of the simulated null distribution
crit.alpha = quantile(sim.out.norm, 1-alpha)
abline(v=crit.alpha, col="red", lwd=2)
text(1.02*crit.alpha, .95*max(out.hist$counts),
substitute(alpha==ax, list(ax=alpha)), adj=0)
out.hist = hist(sim.out.unif, breaks=seq(0,1.1*M,.1), col=c("blue","orange"),
xlab="AD-Statistic for Uniform Samples", main="")
# estimated power: proportion of uniform-sample statistics at or beyond crit.alpha
power = mean(sim.out.unif >= crit.alpha)
abline(v=crit.alpha, col="red", lwd=2)
# annotate the estimated power (labeled with the symbol beta in the plot)
text(1.3*crit.alpha, max(out.hist$counts),
substitute(beta==ax, list(ax=format(signif(power,4)))), adj=0)
}
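A typical call uses the default arguments (if the nortest package is not yet installed, install it once beforehand):
# install.packages("nortest")  # only needed once
ad.test.simulation(Nsim=10000, n=100, alpha=.05)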
Problem 3: This is essentially the same as problem 2, except that in place of a uniform sample xu = runif(n) you will use xu = rnorm(n) followed by xu[abs(xu) <= .2] = 0, i.e., a standard normal sample in which all values with absolute value at most .2 are set to zero.
Solution: A value generated this way is distributed like the random variable V defined by V = Z when |Z| > .2 and V = 0 when |Z| <= .2, where Z is a standard normal random variable with CDF Φ. The CDF of V follows directly: for x < -.2 we have P(V ≤ x) = P(Z ≤ x) = Φ(x), and for x > .2 we have
P(V ≤ x) = P(V < -.2) + P(V = 0) + P(.2 < V ≤ x)
= P(Z < -.2) + P(-.2 ≤ Z ≤ .2) + P(.2 < Z ≤ x) = Φ(x).
For -.2 ≤ x < 0 we have P(V ≤ x) = P(V < -.2) = P(Z < -.2) = Φ(-.2),
and for 0 ≤ x ≤ .2 we have
P(V ≤ x) = P(V < -.2) + P(V = 0) = P(Z < -.2) + P(-.2 ≤ Z ≤ .2) = Φ(.2).
In particular, the CDF of V agrees with Φ outside [-.2,.2] and has a jump of size Φ(.2) - Φ(-.2) ≈ .1585 at zero.
The plot of this CDF is given below, followed by the empirical CDF of a sample of size n=10000 from this distribution. The discrepancy between this kinked CDF and the standard normal CDF (superimposed) is local, near zero (within the interval [-.2,.2]), not in the tails. Thus the KS-statistic should be sensitive to it, while the AD-statistic will be less effective: the discrepancy does not extend over a wide range and thus contributes little to the integral, and it does not fall in the tails, where the AD weighting is strongest.
[Figure: CDF of V, flat on [-.2,0) and on [0,.2] with a jump at 0, with the standard normal CDF superimposed]
[Figure: empirical CDF of a sample of size n = 10000 from this distribution, with the standard normal CDF superimposed]
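A short sketch that reproduces the two plots (not part of the required solution; the function name Fv and the color/line type of the superimposed normal CDF are arbitrary choices):
# CDF of V, written piecewise in terms of the standard normal CDF pnorm
Fv = function(x){
ifelse(x < -.2, pnorm(x),
ifelse(x <  0 , pnorm(-.2),
ifelse(x <= .2, pnorm(.2), pnorm(x))))
}
x = seq(-3, 3, .001)
plot(x, Fv(x), type="l", ylab="CDF")      # kinked CDF; the jump at 0 is drawn as a near-vertical segment
curve(pnorm, add=TRUE, col="red", lty=2)  # standard normal CDF superimposed
# empirical CDF of a sample of size 10000 from this distribution
xu = rnorm(10000)
xu[abs(xu) <= .2] = 0
plot(ecdf(xu), main="")
curve(pnorm, add=TRUE, col="red", lty=2)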