
Stat 421, Fall 2008

Fritz Scholz

Homework 7 Solutions

Problem 1: Write a function Satterthwaite(xdat,ydat){....} that takes an x-sample xdat and a y-sample ydat and computes the two-sample t-statistic txy (we called it t*(X,Y) on slide 107) without assuming equal variances in the two sampled populations, computes the degrees of freedom fhat by the Satterthwaite method, and computes the p-value pval corresponding to the observed value txy. The output should be a list with the values txy, fhat, and pval.

Compare your results with those of t.test(xdat,ydat,var.equal=F) when the two samples are given as

xdat=c(-0.7,-1.6,1.1,-0.2,0.4,-1.4,1.6,0.2,-0.3,0.3)

ydat=c(-0.4,-2.7,2.6,-1.6,2.8,1.5,-2.2,3.2,-0.1,2.2,-1.9, 0.9)

This means that you should not use t.test as a shortcut in your function Satterthwaite. Give your function and the results of your comparison.

After having convinced yourself that t.test does produce the correct p-values (according to slide 108 in Stat421NormalPopulation.pdf), write another function

test.Satterthwaite(Nsim=10000,sigx=1,sigy=2,m=10,n=20){...}

that generates in a loop, Nsim times, independent samples of size m and n from normal populations with means 0 and standard deviations sigx and sigy, respectively, and uses t.test to compute the p-value for each such pair of samples (Use out=t.test(xdat,ydat,var.equal=F) and out$p.value to get the p-value). This function should then also produce a histogram of these Nsim p-values, alternating colors in the bars and using breaks=seq(0,1,.025) and probability=T. During the development of this function use Nsim=1000, but ultimately use the default arguments as shown above.

Show the histogram (with appropriate annotations) and give the code. The histogram should look approximately uniform, since p-values computed from continuous null distributions have a U(0,1) distribution when the data are generated under the null distribution. Does this distribution of p-values suggest that rejecting the hypothesis of equal means whenever an observed p-value is at most alpha will yield a more or less correct type I error probability alpha, whatever alpha is chosen?

Solution: The code

Satterthwaite = function(xdat, ydat){
  m = length(xdat)
  n = length(ydat)
  sx2 = var(xdat)   # sample variances
  sy2 = var(ydat)
  # Satterthwaite degrees of freedom
  fhat = (sx2/m + sy2/n)^2 / ((sx2/m)^2/(m-1) + (sy2/n)^2/(n-1))
  # two-sample t-statistic without the equal-variance assumption
  txy = (mean(ydat) - mean(xdat)) / sqrt(sx2/m + sy2/n)
  # two-sided p-value from the t distribution with fhat degrees of freedom
  pval = 2 * pt(-abs(txy), fhat)
  list(pval = pval, txy = txy, fhat = fhat)
}

The comparison on the given samples yields

> Satterthwaite(xdat,ydat)
$pval
[1] 0.5535037

$txy
[1] 0.6050084

$fhat
[1] 16.29990

> t.test(ydat,xdat,var.equal=F)

        Welch Two Sample t-test

data:  ydat and xdat
t = 0.605, df = 16.3, p-value = 0.5535
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.045287  1.881954
sample estimates:
mean of x mean of y
0.3583333 -0.0600000

The results agree with those of t.test to the displayed precision.

The simulation function is given below, followed by the resulting plot.

test.Satterthwaite = function(Nsim=10000, sigx=1, sigy=2, m=10, n=20)
{
  pval = rep(0, Nsim)
  for(i in 1:Nsim){
    xdat = rnorm(m, 0, sigx)
    ydat = rnorm(n, 0, sigy)
    out = t.test(xdat, ydat, var.equal = FALSE)
    pval[i] = out$p.value
  }
  hist(pval, xlab = "P-value of Satterthwaite-Welch t-test",
       breaks = seq(0, 1, .025), col = c("blue", "orange"),
       probability = TRUE, main = "")
  segments(0, 1, 1, 1, lty = 2)   # reference line: the U(0,1) density
}

[Figure: histogram of the Nsim = 10000 simulated p-values, with the U(0,1) density shown as a dashed line]

Since this distribution indeed looks approximately like that of a U(0,1) random variable, we will reject about 100·alpha% of the time when rejecting the hypothesis whenever the p-value is ≤ alpha, i.e., the achieved type I error probability is approximately correct, whatever alpha is chosen.
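The uniformity claim can also be checked numerically: if the p-values are U(0,1) under the null, the empirical rejection rate mean(pval <= alpha) should be close to alpha for every alpha. A minimal sketch (the seed and the reduced Nsim are arbitrary choices, not part of the assignment):

```r
# Empirical type I error of the Welch/Satterthwaite test under the null
# (equal means), with the same sigx/sigy, m/n as in the assignment.
set.seed(421)          # arbitrary seed, for reproducibility
Nsim <- 2000           # reduced from 10000 to keep the check quick
pval <- replicate(Nsim,
                  t.test(rnorm(10, 0, 1), rnorm(20, 0, 2),
                         var.equal = FALSE)$p.value)
# If pval is approximately U(0,1), these rates should be close to alpha:
for (alpha in c(0.01, 0.05, 0.10))
  cat("alpha =", alpha, "  empirical rejection rate =",
      mean(pval <= alpha), "\n")
```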

The function ad.test.simulation below simulates the null distribution of the Anderson-Darling statistic (ad.test, from the nortest package) for normal samples and estimates the power against uniform samples:

ad.test.simulation = function(Nsim=10000, n=100, alpha=.05)
{
  # requires library(nortest) for ad.test
  sim.out.norm = rep(0, Nsim)
  sim.out.unif = rep(0, Nsim)
  for(i in 1:Nsim){
    out.n = ad.test(rnorm(n))
    out.u = ad.test(runif(n))
    sim.out.norm[i] = out.n$statistic
    sim.out.unif[i] = out.u$statistic
  }
  m = min(sim.out.norm, sim.out.unif)
  M = max(sim.out.unif, sim.out.norm)
  par(mfrow = c(2, 1))
  out.hist = hist(sim.out.norm, breaks = seq(0, 1.1*M, .1),
                  col = c("blue", "orange"),
                  xlab = "AD-Statistic for Normal Samples",
                  main = paste("Sample size n =", n, " , Nsim =", Nsim))
  crit.alpha = quantile(sim.out.norm, 1 - alpha)   # simulated critical value
  abline(v = crit.alpha, col = "red", lwd = 2)
  text(1.02*crit.alpha, .95*max(out.hist$counts),
       substitute(alpha == ax, list(ax = alpha)), adj = 0)
  out.hist = hist(sim.out.unif, breaks = seq(0, 1.1*M, .1),
                  col = c("blue", "orange"),
                  xlab = "AD-Statistic for Uniform Samples", main = "")
  power = mean(sim.out.unif >= crit.alpha)   # estimated power against U(0,1)
  abline(v = crit.alpha, col = "red", lwd = 2)
  text(1.3*crit.alpha, max(out.hist$counts),
       substitute(beta == ax, list(ax = format(signif(power, 4)))), adj = 0)
}
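Condensed to its essentials, the power estimation in ad.test.simulation amounts to the following sketch (assuming the nortest package, which provides ad.test, is installed; Nsim is reduced here to keep it quick):

```r
library(nortest)   # provides ad.test, the Anderson-Darling normality test
set.seed(7)        # arbitrary seed
Nsim <- 500; n <- 100; alpha <- .05
ad.norm <- replicate(Nsim, ad.test(rnorm(n))$statistic)  # null distribution
ad.unif <- replicate(Nsim, ad.test(runif(n))$statistic)  # alternative
crit <- quantile(ad.norm, 1 - alpha)   # simulated critical value
power <- mean(ad.unif >= crit)         # estimated power against U(0,1)
power                                  # should be high for n = 100
```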

Problem 3: This is essentially the same as problem 2, except that in place of a uniform sample

xu = runif(n)

you will use

xu = rnorm(n)

followed by

xu[abs(xu) < .2] = 0

i.e., all sample values in the interval (-.2, .2) are set to zero.

Solution: Let V denote a random variable with this distribution: V = Z when |Z| > .2 and V = 0 when |Z| ≤ .2, where Z is standard normal. For x < -.2 we have P(V ≤ x) = P(Z ≤ x) = Φ(x). For x > .2 we have

P(V ≤ x) = P(V < -.2) + P(V = 0) + P(.2 < V ≤ x)

= P(Z < -.2) + P(-.2 ≤ Z ≤ .2) + P(.2 < Z ≤ x) = Φ(x)

For -.2 ≤ x < 0 we have P(V ≤ x) = P(V < -.2) = Φ(-.2),

and for 0 ≤ x ≤ .2 we have

P(V ≤ x) = P(V < -.2) + P(V = 0) = P(Z < -.2) + P(-.2 ≤ Z ≤ .2) = Φ(.2)

The plot of this CDF is given below, followed by the empirical CDF of a sample of size n = 10000 from this CDF. The discrepancy between this kinked CDF and the superimposed standard normal CDF is local, confined to the interval [-.2, .2] near zero, not in the tails. Thus the KS-statistic should be sensitive to it, while the AD-statistic will be less effective: the discrepancy does not extend over a wide range, so it contributes little in the integration, and it does not lie in the tails, where the AD weighting is heaviest.

[Figure: the kinked CDF of V with the standard normal CDF superimposed]

[Figure: empirical CDF of a sample of size n = 10000 from this CDF]
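The kinked CDF can also be verified by simulation; the sketch below compares the empirical CDF of simulated V values against the piecewise formula derived above (the sample size and seed are arbitrary choices):

```r
set.seed(1)                       # arbitrary seed
z <- rnorm(2e5)
v <- ifelse(abs(z) <= .2, 0, z)   # the truncation applied in Problem 3
# Piecewise CDF derived above: Phi(x) outside [-.2, .2],
# Phi(-.2) on [-.2, 0), and Phi(.2) on [0, .2]
F.V <- function(x) ifelse(x < -.2 | x > .2, pnorm(x),
                          ifelse(x >= 0, pnorm(.2), pnorm(-.2)))
xs  <- c(-1, -.3, -.1, 0, .1, .3, 1)
emp <- sapply(xs, function(x) mean(v <= x))
cbind(x = xs, empirical = emp, theoretical = F.V(xs))
```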
