Homework 1, Statistics 921, Fall 2007

This homework is due Wednesday, September 26th at the beginning of class.

1. Mean and variance of the Mann-Whitney test statistic. Suppose that [pic] of [pic] subjects are randomly assigned to receive an active treatment and the remaining [pic] subjects to receive a control. Suppose the additive treatment effect model holds, [pic]. Consider the Mann-Whitney test statistic for [pic],

[pic]

(That is, the number of (treated, control) pairs of units in which the treated unit has the higher value of [pic], counting one half for each tie.) Assume that the outcome [pic] is such that there are no ties among [pic].

Show that under [pic], we have [pic] and [pic] where the expectation and variance are taken over the distribution of random assignments.
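As an informal sanity check (not a proof), the claim can be verified by brute force in a tiny experiment. The R sketch below enumerates every possible assignment for some made-up outcomes and compares the exact randomization mean and variance of the statistic with m(N-m)/2 and m(N-m)(N+1)/12, the standard null moments of the Mann-Whitney statistic when m of N units are treated and there are no ties. The symbols m and N, the function name mann_whitney_u and the outcome values are assumptions made for this illustration only.

```r
# Mann-Whitney statistic: number of (treated, control) pairs in which the
# treated unit has the higher value, counting one half for each tie.
# (Hypothetical helper name, introduced for this illustration.)
mann_whitney_u <- function(treated, control) {
  diffs <- outer(treated, control, "-")
  sum(diffs > 0) + 0.5 * sum(diffs == 0)
}

# Made-up outcomes with no ties; under the null these are fixed and only
# the assignment is random.
r <- c(3.1, 0.4, 2.2, 5.0, 1.7)
N <- length(r)
m <- 2

# Enumerate all choose(N, m) equally likely assignments.
assignments <- combn(N, m)
u_values <- apply(assignments, 2, function(idx) mann_whitney_u(r[idx], r[-idx]))

# Exact mean and variance over the randomization distribution.
mean_u <- mean(u_values)
var_u <- mean((u_values - mean_u)^2)

# Compare with the closed-form moments.
c(mean_u = mean_u, formula_mean = m * (N - m) / 2)
c(var_u = var_u, formula_var = m * (N - m) * (N + 1) / 12)
```

Note that the variance is computed by dividing by the number of assignments, not by one less, because the enumeration covers the whole randomization distribution rather than a sample from it.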

2. R function for computing the Mann-Whitney test. The distribution of [pic] under [pic] is well approximated by a normal distribution when [pic] and [pic] are both greater than ten:

[pic] (1.1)

We will prove this fact in question 5 (extra credit).

Write an R function that takes as input the treated and control units' outcomes and the value of [pic] to be tested in a one-sided test of [pic] versus the alternative [pic] in the additive treatment effect model, and that outputs a p-value for the test based on the approximate distribution of [pic] under [pic] given in (1.1).
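One possible shape for such a function is sketched below. It is a sketch under stated assumptions, not a reference solution: it assumes the alternative is that the effect is greater than the hypothesized value, that (1.1) is the usual standardization of the statistic by its null mean mn/2 and null variance mn(m+n+1)/12 (with m treated and n control units), and that no continuity or tie correction is wanted. The names mw_pvalue and tau0 are hypothetical.

```r
# One-sided p-value for H0: effect = tau0 versus effect > tau0 under the
# additive model, using the normal approximation to the Mann-Whitney
# statistic. The direction of the alternative is an assumption.
mw_pvalue <- function(treated, control, tau0 = 0) {
  m <- length(treated)
  n <- length(control)

  # Under H0, subtracting tau0 from the treated outcomes recovers the
  # outcomes the treated units would have had under control.
  adjusted <- treated - tau0

  # Mann-Whitney statistic, counting one half for ties.
  diffs <- outer(adjusted, control, "-")
  u <- sum(diffs > 0) + 0.5 * sum(diffs == 0)

  # Standardize by the null mean and variance (no tie correction).
  z <- (u - m * n / 2) / sqrt(m * n * (m + n + 1) / 12)

  # Large values of the statistic favor the alternative.
  pnorm(z, lower.tail = FALSE)
}

# Toy usage with made-up numbers (far too small for the approximation
# to be trusted; shown only to illustrate the call).
mw_pvalue(c(5, 6, 7), c(1, 2, 3))
mw_pvalue(c(5, 6, 7), c(1, 2, 3), tau0 = 4)
```

With exact=FALSE and correct=FALSE, R's built-in wilcox.test uses a similar normal approximation and can serve as a cross-check, although it adjusts the variance when ties are present.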

3. Simulation study comparison of Mann-Whitney test vs. the t-test. In this problem, we will compare the level and power of the Mann-Whitney test based on the approximation (1.1) (using the R function that you created in question 2) vs. the pooled t-test for the additive treatment effect model [pic].

Consider a randomized experiment with [pic] subjects, 25 of whom will be assigned to the treatment group. We will consider two situations: (1) there is no treatment effect, [pic]; (2) there is a positive treatment effect of [pic].

First, consider the situation with [pic]

(a) Simulate [pic] from a standard normal distribution (use the rnorm function in R). Then randomly assign 25 subjects to treatment. Test [pic] vs. [pic] at the 0.05 level using the pooled t-test and the Mann-Whitney test. Repeat this 2000 times (using a loop). Record the proportion of times each test rejects [pic] at the 0.05 level.

(b) Repeat part (a) except this time simulate [pic] from a t distribution with 3 degrees of freedom (use the rt function in R). The t distribution with 3 degrees of freedom has heavier tails than the normal distribution.

(c) Repeat part (a) except this time simulate [pic] from a Cauchy distribution (which is a t distribution with 1 degree of freedom). The Cauchy distribution has very heavy tails; in fact, the mean of the Cauchy distribution does not exist.

(d) Repeat (a)-(c), but now consider the situation in which there is a positive treatment effect of [pic]. We still test [pic] vs. [pic]; in this set of simulations, we are comparing the power of the pooled t-test vs. the Mann-Whitney test.

(e) Summarize what you have learned from the simulation study about the actual level of the pooled t-test and the Mann-Whitney test for a nominal 0.05 level test, and about the power of the pooled t-test vs. the Mann-Whitney test when [pic] for the distributions considered.
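The simulation loop described in parts (a)-(d) could be organized along the following lines. This is a minimal sketch, not a reference solution. The total number of subjects (50) and the effect size (0.5) used here are placeholder values chosen for illustration, since the assignment's own values are not reproduced in this text; the helper names mw_pvalue and simulate_rejection are hypothetical, and the one-sided alternative is assumed to be "greater".

```r
set.seed(921)  # arbitrary seed, for reproducibility only

# Normal-approximation Mann-Whitney p-value for H0: no effect versus a
# positive effect (assumed direction; no tie or continuity correction).
mw_pvalue <- function(treated, control) {
  m <- length(treated)
  n <- length(control)
  diffs <- outer(treated, control, "-")
  u <- sum(diffs > 0) + 0.5 * sum(diffs == 0)
  z <- (u - m * n / 2) / sqrt(m * n * (m + n + 1) / 12)
  pnorm(z, lower.tail = FALSE)
}

# rdist: function(k) returning k simulated control outcomes.
# tau: true additive effect (0 gives the level, > 0 gives the power).
# n_subjects = 50 is a placeholder, not a value taken from the assignment.
simulate_rejection <- function(rdist, tau, n_subjects = 50, n_treated = 25,
                               n_reps = 2000, alpha = 0.05) {
  reject <- matrix(FALSE, nrow = n_reps, ncol = 2,
                   dimnames = list(NULL, c("t_test", "mann_whitney")))
  for (i in seq_len(n_reps)) {
    r_control <- rdist(n_subjects)               # outcomes under control
    treated_idx <- sample(n_subjects, n_treated) # random assignment
    treated <- r_control[treated_idx] + tau      # additive effect
    control <- r_control[-treated_idx]
    p_t <- t.test(treated, control, alternative = "greater",
                  var.equal = TRUE)$p.value      # pooled t-test
    reject[i, ] <- c(p_t, mw_pvalue(treated, control)) <= alpha
  }
  colMeans(reject)  # proportion of rejections for each test
}

level_normal <- simulate_rejection(rnorm, tau = 0)
power_t3 <- simulate_rejection(function(k) rt(k, df = 3), tau = 0.5)
level_normal
power_t3
```

The Cauchy case can be run the same way by passing rcauchy, or function(k) rt(k, df = 1), as rdist.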

4. In this problem, we will examine data from an experiment to test whether massive injection of silver iodide into cumulus clouds can lead to increased rainfall (Data from J. Simpson, A. Olsen and J. Eden, “A Bayesian Analysis of a Multiplicative Treatment Effect in Weather Modification,” Technometrics 17 (1975): 161-166). The experiment was carried out in southern Florida in 1968. The days in a period of 52 days were randomly assigned to two groups: 26 of the days were assigned to have a target cloud seeded that day and 26 of the days were assigned to have a target cloud left unseeded as a control. An airplane flew through the cloud in both cases; the experimenters and the pilot were themselves unaware of whether on any particular day the seeding mechanism in the plane was loaded or not (that is, they were blind to the treatment). Precipitation was measured as the total rain volume falling from the cloud base following the airplane seeding run, as measured by radar. The data for the treatment and the control group are as follows:

treatment=c(2745.6, 1697.1, 1656.4, 978, 703.4, 489.1, 430, 334.1, 302.8, 274.7, 274.7, 255, 242.5, 200.7, 198.6, 129.6, 119, 118.3, 115.3, 92.4, 40.6, 32.7, 31.4, 17.5, 7.7, 4.1)

control=c(1202.6, 830.1, 372.4, 345.5, 321.2, 244.3, 163, 147.8, 95, 87, 81.2, 68.5, 47.3, 41.1, 36.6, 29, 28.6, 26.3, 26, 24.4, 21.4, 17.3, 11.5, 4.9, 4.9, 1.0)

(a) Make box plots of the treated and control outcomes, and argue that the additive treatment effect model is not reasonable.

(b) Make box plots of the logarithm of rainfall and argue that the multiplicative treatment effect model, [pic], is reasonable.

(c) Use the R function you wrote in question 2 to test the null hypothesis that the multiplicative treatment effect [pic] equals 1 versus the alternative [pic].

(d) Show that the p-value for testing the multiplicative treatment effect [pic] versus the alternative [pic] is a monotonically increasing function of [pic].

(e) Use the result in part 4(d) and the R function you wrote in question 2 to find a 95% one-sided confidence interval (of the form [pic]) for the multiplicative treatment effect [pic].

(f) Find a two-sided 95% confidence interval for [pic].

(g) There is likely to be serial correlation in rainfall; that is, if [pic] are the rainfalls that would occur on days 1,…,52 of the experiment, respectively, if there were no cloud seeding, then [pic] and [pic] are likely to be correlated. Is the randomization inference in parts (c)-(f) still valid in the presence of this serial correlation, i.e., will the confidence intervals in parts (e)-(f) still contain the true [pic] at least 95% of the time in repeated randomized assignments (assuming that the multiplicative treatment effect model is true)? Explain your answer briefly.
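The test inversion behind parts (d)-(e) can be sketched as follows. This is an illustration under assumptions, not a reference solution: it assumes the multiplicative model means that the treated outcome is the control outcome times the effect, that the alternative is that the effect is greater than the hypothesized value, and that the one-sided interval is a lower bound. The name mw_pvalue_mult, the symbol delta0 and the grid limits are hypothetical choices. The variance used ignores ties, which is only approximate for these data.

```r
treatment <- c(2745.6, 1697.1, 1656.4, 978, 703.4, 489.1, 430, 334.1, 302.8,
               274.7, 274.7, 255, 242.5, 200.7, 198.6, 129.6, 119, 118.3,
               115.3, 92.4, 40.6, 32.7, 31.4, 17.5, 7.7, 4.1)
control <- c(1202.6, 830.1, 372.4, 345.5, 321.2, 244.3, 163, 147.8, 95, 87,
             81.2, 68.5, 47.3, 41.1, 36.6, 29, 28.6, 26.3, 26, 24.4, 21.4,
             17.3, 11.5, 4.9, 4.9, 1.0)

# One-sided p-value for H0: multiplicative effect = delta0 versus
# effect > delta0 (assumed direction). Dividing the treated outcomes by
# delta0 plays the role that subtracting the effect plays in the
# additive model.
mw_pvalue_mult <- function(treated, control, delta0) {
  m <- length(treated)
  n <- length(control)
  diffs <- outer(treated / delta0, control, "-")
  u <- sum(diffs > 0) + 0.5 * sum(diffs == 0)
  z <- (u - m * n / 2) / sqrt(m * n * (m + n + 1) / 12)
  pnorm(z, lower.tail = FALSE)
}

# Evaluate the p-value on a grid of hypothesized effects (grid limits
# and spacing are arbitrary choices).
grid <- seq(0.5, 5, by = 0.01)
pvals <- sapply(grid, function(d0) mw_pvalue_mult(treatment, control, d0))

# The p-value is nondecreasing in delta0, so the values not rejected at
# the 0.05 level form an interval; its left end is the lower bound, up
# to the grid spacing.
lower_bound <- min(grid[pvals > 0.05])
lower_bound

plot(grid, pvals, type = "l", xlab = "hypothesized effect", ylab = "p-value")
abline(h = 0.05, lty = 2)
```

A grid search is used here for transparency; a root finder such as uniroot applied to the p-value minus 0.05 is less suitable because the p-value is a step function of the hypothesized effect.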

5. Extra credit problem: Large experiment distribution of the Mann-Whitney statistic. In this problem, we will prove (1.1) concerning the null hypothesis distribution of the Mann-Whitney statistic in “large” experiments (it suffices that [pic] and [pic] both be greater than ten for the approximation to work well). Note that (1.1) does not follow immediately from the ordinary central limit theorem because, although [pic] is a sum of random variables, the random variables in the sum are not independent.

Suppose that a sequence of randomized experiments is performed in which [pic] and [pic], and suppose that [pic] is true for all of the experiments and that there are no ties among [pic] for any of the experiments. Let [pic] refer to the Mann-Whitney test statistic for the experiment with [pic] units.

(a) Consider the Wilcoxon rank sum test statistic, [pic] (i.e., the sum of the ranks of the treated units), where the unit with the smallest [pic] has rank 1 and the unit with the largest [pic] has rank [pic].

Show that

[pic].

(b) Show that the distribution of [pic] is the same as the distribution of the random variable [pic] that is generated as follows: draw [pic] iid uniform (0,1) random variables, rank [pic], and then set [pic].

We shall approximate [pic] by [pic], where [pic] is the smallest integer greater than or equal to [pic]. Note that [pic] should be fairly close to [pic]. The advantage of considering [pic] is that it is a sum of iid random variables, so the Lindeberg-Feller central limit theorem can be applied to it.

(c) Show using the Lindeberg-Feller central limit theorem that [pic] as [pic].

(d) Show that if two sequences of random variables [pic] and [pic] have the properties that (i) [pic] (where [pic] is a random variable) as [pic] and (ii) [pic] as [pic], then [pic].

(e) Show that [pic] as [pic].

(f) Based on (a)-(e), conclude that [pic] as [pic].
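For intuition only (this is not part of the proof), the relationship between the two statistics and the quality of the normal approximation can be explored numerically. The R sketch below checks, over repeated random assignments, the standard identity that the rank sum of the treated units equals the Mann-Whitney statistic plus m(m+1)/2 when m units are treated and there are no ties, and then looks at the simulated distribution of the standardized Mann-Whitney statistic. The symbols N and m, the sizes chosen and the number of replications are assumptions made for this illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

N <- 40   # illustrative number of units
m <- 20   # illustrative number of treated units
r <- rnorm(N)        # fixed outcomes under the null; ties have probability 0
ranks <- rank(r)     # rank 1 = smallest outcome, rank N = largest

n_reps <- 5000
z <- numeric(n_reps)
identity_holds <- logical(n_reps)

for (i in seq_len(n_reps)) {
  idx <- sample(N, m)                     # random assignment
  w <- sum(ranks[idx])                    # Wilcoxon rank sum
  u <- sum(outer(r[idx], r[-idx], ">"))   # Mann-Whitney statistic (no ties)
  identity_holds[i] <- (w == u + m * (m + 1) / 2)
  # Standardize by the null mean and variance of the statistic.
  z[i] <- (u - m * (N - m) / 2) / sqrt(m * (N - m) * (N + 1) / 12)
}

all(identity_holds)
c(mean = mean(z), sd = sd(z))

# Visual comparison with the standard normal distribution.
qqnorm(z)
abline(0, 1)
```

If the approximation is good, the simulated mean and standard deviation should be close to 0 and 1 and the points in the normal quantile plot should lie near the line.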
