
Stat 421, Fall 2008

Fritz Scholz

Homework 7 Solutions

Problem 1: Write a function Satterthwaite(xdat,ydat){....} that takes an x-sample xdat and a y-sample ydat and computes the two-sample t-statistic txy (we called it t*(X,Y) on slide 107) without assuming equal variances in the two sampled populations, computes the degrees of freedom fhat by the Satterthwaite method, and computes the p-value pval corresponding to the observed value txy. The output should be a list with the values txy, fhat, and pval.

Compare your results with those of t.test(xdat,ydat,var.equal=F) when the two samples are given as

xdat=c(-0.7,-1.6,1.1,-0.2,0.4,-1.4,1.6,0.2,-0.3,0.3)

ydat=c(-0.4,-2.7,2.6,-1.6,2.8,1.5,-2.2,3.2,-0.1,2.2,-1.9, 0.9)

This means that you should not use t.test as a shortcut in your function Satterthwaite. Give your function and the results of your comparison.

After having convinced yourself that t.test does produce the correct p-values (according to slide 108 in Stat421NormalPopulation.pdf), write another function

test.Satterthwaite(Nsim=10000,sigx=1,sigy=2,m=10,n=20){...}

that generates in a loop, Nsim times, independent samples of size m and n from normal populations with means 0 and standard deviations sigx and sigy, respectively, and uses t.test to compute the p-value for each such pair of samples (Use out=t.test(xdat,ydat,var.equal=F) and out$p.value to get the p-value). This function should then also produce a histogram of these Nsim p-values, alternating colors in the bars and using breaks=seq(0,1,.025) and probability=T. During the development of this function use Nsim=1000, but ultimately use the default arguments as shown above.

Show the histogram (with appropriate annotations) and give the code. The histogram should look approximately uniform, since p-values computed from continuous null distributions have a U(0,1) distribution when the data are generated under the null distribution. Does this distribution of p-values suggest that rejecting the hypothesis of equal means whenever an observed p-value is at most alpha will yield a more or less correct type I error probability alpha, whatever alpha is chosen?

Solution: The code

Satterthwaite = function(xdat, ydat){
  m = length(xdat)
  n = length(ydat)
  sx2 = var(xdat)   # sample variances
  sy2 = var(ydat)
  # Satterthwaite degrees of freedom
  fhat = (sx2/m + sy2/n)^2 / ((sx2/m)^2/(m-1) + (sy2/n)^2/(n-1))
  # two-sample t-statistic without the equal-variance assumption
  txy = (mean(ydat) - mean(xdat)) / sqrt(sx2/m + sy2/n)
  # two-sided p-value from the t distribution with fhat degrees of freedom
  pval = 2 * pt(-abs(txy), fhat)
  list(pval = pval, txy = txy, fhat = fhat)
}

The comparison on the given samples yields

> Satterthwaite(xdat,ydat)
$pval
[1] 0.5535037

$txy
[1] 0.6050084

$fhat
[1] 16.29990

> t.test(ydat,xdat,var.equal=F)

        Welch Two Sample t-test

data:  ydat and xdat
t = 0.605, df = 16.3, p-value = 0.5535
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.045287  1.881954
sample estimates:
mean of x mean of y
0.3583333 -0.0600000

The results agree with those of t.test to the displayed precision.

The simulation function is given below, followed by the resulting plot.

test.Satterthwaite = function(Nsim=10000, sigx=1, sigy=2, m=10, n=20)
{
  pval = rep(0, Nsim)
  for(i in 1:Nsim){
    xdat = rnorm(m, 0, sigx)
    ydat = rnorm(n, 0, sigy)
    out = t.test(xdat, ydat, var.equal = FALSE)
    pval[i] = out$p.value
  }
  hist(pval, xlab = "P-value of Satterthwaite-Welch t-test",
       breaks = seq(0, 1, .025), col = c("blue", "orange"),
       probability = TRUE, main = "")
  segments(0, 1, 1, 1, lty = 2)   # reference line: the U(0,1) density
}

[Figure: histogram of the Nsim = 10000 simulated p-values, with the U(0,1) density shown as a dashed line]

Since this distribution indeed looks approximately like that of a U(0,1) random variable, we will reject about 100·alpha% of the time when rejecting the hypothesis whenever the p-value is ≤ alpha, i.e., the achieved type I error probability is approximately correct, whatever alpha is chosen.
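The uniformity claim can also be checked numerically: if the p-values are U(0,1) under the null, the empirical rejection rate mean(pval <= alpha) should be close to alpha for every alpha. A minimal sketch (the seed and the reduced Nsim are arbitrary choices, not part of the assignment):

```r
# Empirical type I error of the Welch/Satterthwaite test under the null
# (equal means), with the same sigx/sigy, m/n as in the assignment.
set.seed(421)          # arbitrary seed, for reproducibility
Nsim <- 2000           # reduced from 10000 to keep the check quick
pval <- replicate(Nsim,
                  t.test(rnorm(10, 0, 1), rnorm(20, 0, 2),
                         var.equal = FALSE)$p.value)
# If pval is approximately U(0,1), these rates should be close to alpha:
for (alpha in c(0.01, 0.05, 0.10))
  cat("alpha =", alpha, "  empirical rejection rate =",
      mean(pval <= alpha), "\n")
```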

The function ad.test.simulation below simulates the null distribution of the Anderson-Darling statistic (ad.test, from the nortest package) for normal samples and estimates the power against uniform samples:

ad.test.simulation = function(Nsim=10000, n=100, alpha=.05)
{
  # requires library(nortest) for ad.test
  sim.out.norm = rep(0, Nsim)
  sim.out.unif = rep(0, Nsim)
  for(i in 1:Nsim){
    out.n = ad.test(rnorm(n))
    out.u = ad.test(runif(n))
    sim.out.norm[i] = out.n$statistic
    sim.out.unif[i] = out.u$statistic
  }
  m = min(sim.out.norm, sim.out.unif)
  M = max(sim.out.unif, sim.out.norm)
  par(mfrow = c(2, 1))
  out.hist = hist(sim.out.norm, breaks = seq(0, 1.1*M, .1),
                  col = c("blue", "orange"),
                  xlab = "AD-Statistic for Normal Samples",
                  main = paste("Sample size n =", n, " , Nsim =", Nsim))
  crit.alpha = quantile(sim.out.norm, 1 - alpha)   # simulated critical value
  abline(v = crit.alpha, col = "red", lwd = 2)
  text(1.02*crit.alpha, .95*max(out.hist$counts),
       substitute(alpha == ax, list(ax = alpha)), adj = 0)
  out.hist = hist(sim.out.unif, breaks = seq(0, 1.1*M, .1),
                  col = c("blue", "orange"),
                  xlab = "AD-Statistic for Uniform Samples", main = "")
  power = mean(sim.out.unif >= crit.alpha)   # estimated power against U(0,1)
  abline(v = crit.alpha, col = "red", lwd = 2)
  text(1.3*crit.alpha, max(out.hist$counts),
       substitute(beta == ax, list(ax = format(signif(power, 4)))), adj = 0)
}
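Condensed to its essentials, the power estimation in ad.test.simulation amounts to the following sketch (assuming the nortest package, which provides ad.test, is installed; Nsim is reduced here to keep it quick):

```r
library(nortest)   # provides ad.test, the Anderson-Darling normality test
set.seed(7)        # arbitrary seed
Nsim <- 500; n <- 100; alpha <- .05
ad.norm <- replicate(Nsim, ad.test(rnorm(n))$statistic)  # null distribution
ad.unif <- replicate(Nsim, ad.test(runif(n))$statistic)  # alternative
crit <- quantile(ad.norm, 1 - alpha)   # simulated critical value
power <- mean(ad.unif >= crit)         # estimated power against U(0,1)
power                                  # should be high for n = 100
```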

Problem 3: This is essentially the same as problem 2, except that in place of a uniform sample

xu = runif(n)

you will use

xu = rnorm(n)

followed by

xu[abs(xu) < .2] = 0

i.e., all sample values in the interval (-.2, .2) are set to zero.

Solution: Let V denote a random variable with this distribution: V = Z when |Z| > .2 and V = 0 when |Z| ≤ .2, where Z is standard normal. For x < -.2 we have P(V ≤ x) = P(Z ≤ x) = Φ(x). For x > .2 we have

P(V ≤ x) = P(V < -.2) + P(V = 0) + P(.2 < V ≤ x)

= P(Z < -.2) + P(-.2 ≤ Z ≤ .2) + P(.2 < Z ≤ x) = Φ(x)

For -.2 ≤ x < 0 we have P(V ≤ x) = P(V < -.2) = Φ(-.2),

and for 0 ≤ x ≤ .2 we have

P(V ≤ x) = P(V < -.2) + P(V = 0) = P(Z < -.2) + P(-.2 ≤ Z ≤ .2) = Φ(.2)

The plot of this CDF is given below, followed by the empirical CDF of a sample of size n = 10000 from this CDF. The discrepancy between this kinked CDF and the superimposed standard normal CDF is local, confined to the interval [-.2, .2] near zero, not in the tails. Thus the KS-statistic should be sensitive to it, while the AD-statistic will be less effective: the discrepancy does not extend over a wide range, so it contributes little in the integration, and it does not lie in the tails, where the AD weighting is heaviest.

[Figure: the kinked CDF of V with the standard normal CDF superimposed]

[Figure: empirical CDF of a sample of size n = 10000 from this CDF]
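The kinked CDF can also be verified by simulation; the sketch below compares the empirical CDF of simulated V values against the piecewise formula derived above (the sample size and seed are arbitrary choices):

```r
set.seed(1)                       # arbitrary seed
z <- rnorm(2e5)
v <- ifelse(abs(z) <= .2, 0, z)   # the truncation applied in Problem 3
# Piecewise CDF derived above: Phi(x) outside [-.2, .2],
# Phi(-.2) on [-.2, 0), and Phi(.2) on [0, .2]
F.V <- function(x) ifelse(x < -.2 | x > .2, pnorm(x),
                          ifelse(x >= 0, pnorm(.2), pnorm(-.2)))
xs  <- c(-1, -.3, -.1, 0, .1, .3, 1)
emp <- sapply(xs, function(x) mean(v <= x))
cbind(x = xs, empirical = emp, theoretical = F.V(xs))
```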
