AMS572.01 Midterm Exam Fall, 2005

Instructions: This is a closed-book exam. Anyone who cheats on the exam shall receive a grade of F. Please provide complete solutions for full credit. Good luck!

1. In a study of hypnotic suggestion, 10 male volunteers were randomly allocated to an experimental group and a control group. Each subject participated in a two-phase experimental session. In the first phase, respiration was measured while the subject was awake and at rest. In the second phase, the subject was told to imagine that he was performing muscular work, and respiration was measured again. For subjects in the experimental group, hypnosis was induced between the first and second phases; thus, the suggestion to imagine muscular work was “hypnotic suggestion” for experimental subjects and “waking suggestion” for control subjects. The accompanying table shows the measurements of total ventilation (liters of air per minute per square meter of body area) for all 10 subjects.

|Experimental Group | | |Control Group | | |
|Subject |Rest |Work |Subject |Rest |Work |
|1 |6 |6 |6 |6 |5 |
|2 |7 |9 |7 |5 |5 |
|3 |5 |8 |8 |5 |5 |
|4 |7 |12 |9 |6 |6 |
|5 |6 |7 |10 |5 |4 |

(a) Use suitable tests to investigate the differences between the responses of the experimental and control groups (Use α =.05. Please state the assumption(s) of the test.)

(b) Please write up the entire SAS program necessary to answer the questions raised in (a). Please include the data step as well as tests of the relevant assumptions.

/*Problem #1*/

data one;

input ID group rest work;

diff=work-rest;

datalines;

1 1 6 6

2 1 7 9

3 1 5 8

4 1 7 12

5 1 6 7

6 2 6 5

7 2 5 5

8 2 5 5

9 2 6 6

10 2 5 4

;

run;

proc univariate data=one normal;

class group;

var diff;

title 'Check for normality and test for one population mean, Q1';

run;

proc ttest data=one;

class group;

var diff;

title 'Independent samples t-test, Q1';

run;

proc npar1way wilcoxon data=one;

class group;

var diff;

exact;

title 'Nonparametric test for two-mean comparisons, Q1';

run;

Check for normality and test for one population mean, Q1

15:49 Friday, October 28, 2005

The UNIVARIATE Procedure

Variable: diff

group = 1

Moments

N 5 Sum Weights 5

Mean 2.2 Sum Observations 11

Std Deviation 1.92353841 Variance 3.7

Skewness 0.59012866 Kurtosis -0.0219138

Uncorrected SS 39 Corrected SS 14.8

Coeff Variation 87.4335639 Std Error Mean 0.86023253

Basic Statistical Measures

Location Variability

Mean 2.200000 Std Deviation 1.92354

Median 2.000000 Variance 3.70000

Mode . Range 5.00000

Interquartile Range 2.00000

Tests for Location: Mu0=0

Test -Statistic- -----p Value------

Student's t t 2.557448 Pr > |t| 0.0628

Sign M 2 Pr >= |M| 0.1250

Signed Rank S 5 Pr >= |S| 0.1250

Tests for Normality

Test --Statistic--- -----p Value------

Shapiro-Wilk W 0.978716 Pr < W 0.9276

Kolmogorov-Smirnov D 0.141405 Pr > D >0.1500

Cramer-von Mises W-Sq 0.022452 Pr > W-Sq >0.2500

Anderson-Darling A-Sq 0.166107 Pr > A-Sq >0.2500

Quantiles (Definition 5)

Quantile Estimate

100% Max 5

99% 5

95% 5

90% 5

75% Q3 3

50% Median 2

25% Q1 1

10% 0

5% 0

1% 0

0% Min 0

Extreme Observations

----Lowest---- ----Highest---

Value Obs Value Obs

0 1 0 1

1 5 1 5

2 2 2 2

3 3 3 3

5 4 5 4

Check for normality and test for one population mean, Q1

15:49 Friday, October 28, 2005

The UNIVARIATE Procedure

Variable: diff

group = 2

Moments

N 5 Sum Weights 5

Mean -0.4 Sum Observations -2

Std Deviation 0.54772256 Variance 0.3

Skewness -0.6085806 Kurtosis -3.3333333

Uncorrected SS 2 Corrected SS 1.2

Coeff Variation -136.93064 Std Error Mean 0.24494897

Basic Statistical Measures

Location Variability

Mean -0.40000 Std Deviation 0.54772

Median 0.00000 Variance 0.30000

Mode 0.00000 Range 1.00000

Interquartile Range 1.00000

Tests for Location: Mu0=0

Test -Statistic- -----p Value------

Student's t t -1.63299 Pr > |t| 0.1778

Sign M -1 Pr >= |M| 0.5000

Signed Rank S -1.5 Pr >= |S| 0.5000

Tests for Normality

Test --Statistic--- -----p Value------

Shapiro-Wilk W 0.684029 Pr < W 0.0065

Kolmogorov-Smirnov D 0.367396 Pr > D 0.0245

Cramer-von Mises W-Sq 0.138317 Pr > W-Sq 0.0229

Anderson-Darling A-Sq 0.799546 Pr > A-Sq 0.0140

(Note: Every normality test above rejects normality for the control group (for example, Shapiro-Wilk p = 0.0065), so this problem is better solved with the nonparametric method. On the exam, where you cannot perform the normality test, it is fine to do just the parametric test. Wei)

Quantiles (Definition 5)

Quantile Estimate

100% Max 0

99% 0

95% 0

90% 0

75% Q3 0

50% Median 0

25% Q1 -1

10% -1

5% -1

1% -1

0% Min -1

Extreme Observations

----Lowest---- ----Highest---

Value Obs Value Obs

-1 10 -1 6

-1 6 -1 10

0 9 0 7

0 8 0 8

0 7 0 9

Independent samples t-test, Q1

15:49 Friday, October 28, 2005

The TTEST Procedure

Statistics

Variable  group       N  Lower CL Mean  Mean  Upper CL Mean  Lower CL Std Dev  Std Dev  Upper CL Std Dev  Std Err

diff      1           5  -0.188         2.2   4.5884         1.1525            1.9235   5.5274            0.8602

diff      2           5  -1.08          -0.4  0.2801         0.3282            0.5477   1.5739            0.2449

diff      Diff (1-2)     0.5374         2.6   4.6626         0.9552            1.4142   2.7093            0.8944

T-Tests

Variable Method Variances DF t Value Pr > |t|

diff Pooled Equal 8 2.91 0.0197

diff Satterthwaite Unequal 4.64 2.91 0.0366

(Note: On the exam, for the separate-variance t-test, you can use the conservative df = min(n1 - 1, n2 - 1), where n1 and n2 are the two sample sizes. For the given problem, df = min(5 - 1, 5 - 1) = 4. Wei)

Equality of Variances

Variable Method Num DF Den DF F Value Pr > F

diff Folded F 4 4 12.33 0.0321
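The t-test figures in this listing can be cross-checked by hand. The following is a minimal Python sketch, not part of the original SAS solution; the function name `two_sample_summary` is made up for illustration, and the differences are computed from the rest and work columns of the data step.

```python
from math import sqrt
from statistics import mean, variance

# work - rest differences, taken from the data step for Problem 1
experimental = [0, 2, 3, 5, 1]   # group 1
control = [-1, 0, 0, 0, -1]      # group 2


def two_sample_summary(x, y):
    """Return pooled t, unequal-variance t, Satterthwaite df, and folded F."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x), variance(y)          # sample variances (n - 1 divisor)
    diff = mean(x) - mean(y)

    # pooled-variance t statistic
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / sqrt(pooled_var * (1 / n1 + 1 / n2))

    # separate-variance t statistic with Satterthwaite degrees of freedom
    se2 = v1 / n1 + v2 / n2
    t_unequal = diff / sqrt(se2)
    df_satt = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

    # folded F: larger sample variance over the smaller one
    folded_f = max(v1, v2) / min(v1, v2)
    return t_pooled, t_unequal, df_satt, folded_f


t_pooled, t_unequal, df_satt, folded_f = two_sample_summary(experimental, control)
print(round(t_pooled, 2), round(t_unequal, 2), round(df_satt, 2), round(folded_f, 2))
```

With equal sample sizes the pooled and separate-variance t statistics coincide, which is why the listing reports 2.91 for both; only the degrees of freedom differ.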

Nonparametric test for two-mean comparisons, Q1

15:49 Friday, October 28, 2005

The NPAR1WAY Procedure

Wilcoxon Scores (Rank Sums) for Variable diff

Classified by Variable group

Sum of Expected Std Dev Mean

group N Scores Under H0 Under H0 Score

---------------------------------------------------------------------

1 5 38.50 27.50 4.624812 7.70

2 5 16.50 27.50 4.624812 3.30

Average scores were used for ties.

Wilcoxon Two-Sample Test

Statistic (S) 38.5000

Normal Approximation

Z 2.2704

One-Sided Pr > Z 0.0116

Two-Sided Pr > |Z| 0.0232

t Approximation

One-Sided Pr > Z 0.0247

Two-Sided Pr > |Z| 0.0493

Exact Test

One-Sided Pr >= S 0.0159

Two-Sided Pr >= |S - Mean| 0.0317

Z includes a continuity correction of 0.5.

Kruskal-Wallis Test

Chi-Square 5.6571

DF 1

Pr > Chi-Square 0.0174
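The rank sum and the exact p-values in the NPAR1WAY listing can be reproduced by brute force, since there are only C(10, 5) = 252 ways to assign five of the ten ranks to group 1. The sketch below is an illustrative check in Python, not the algorithm SAS uses internally; the helper name `midranks` is hypothetical.

```python
from itertools import combinations

# work - rest differences, taken from the data step for Problem 1
experimental = [0, 2, 3, 5, 1]   # group 1
control = [-1, 0, 0, 0, -1]      # group 2


def midranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(ordered):
        first = ordered.index(v) + 1          # 1-based rank of the first occurrence
        count = ordered.count(v)
        rank_of[v] = first + (count - 1) / 2  # average rank across the tie
    return [rank_of[v] for v in values]


pooled = experimental + control
ranks = midranks(pooled)
n1 = len(experimental)

s_obs = sum(ranks[:n1])                 # rank sum for group 1
expected = n1 * (len(pooled) + 1) / 2   # mean of the rank sum under H0

# enumerate every assignment of n1 ranks to group 1 (permutation distribution)
sums = [sum(c) for c in combinations(ranks, n1)]
one_sided = sum(s >= s_obs for s in sums) / len(sums)
two_sided = sum(abs(s - expected) >= abs(s_obs - expected) for s in sums) / len(sums)

print(s_obs, expected, round(one_sided, 4), round(two_sided, 4))
```

The enumeration gives 4/252 and 8/252, which agree with the one-sided and two-sided exact p-values reported in the listing.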

2. Thanksgiving was coming up and Harvey's Turkey Farm was doing a land-office business. Harvey sold 100 gobblers to Nedicks for their famous Turkey-dogs. Nedicks found that 90 of Harvey's turkeys were in reality peacocks.

(a) Estimate the proportion of peacocks at Harvey's Turkey Farm and find a 95% confidence interval for the true proportion of turkeys that Harvey owns.

(b) How large a random sample should we select from Harvey's Farm to guarantee the length of the 95% confidence interval to be no more than 0.06? (Note: please first derive the general formula for sample size calculation based on the length of the CI for inference on one population proportion, large sample situation. Please give the formula for the two cases: (i) we have an estimate of the proportion and (ii) we do not have an estimate of the proportion to be estimated. (iii) Finally, please plug in the numerical values and obtain the sample size for this particular problem.)

Solution:

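(a) This is a large-sample CI for one population proportion.

The estimated proportion of peacocks is 90/100 = 0.90, so the estimated proportion of turkeys is $\hat{p} = 10/100 = 0.10$. We have α = 0.05, $z_{\alpha/2} = z_{0.025} = 1.96$. The 95% CI for the true proportion of turkeys is

$\hat{p} \pm z_{0.025}\sqrt{\hat{p}(1-\hat{p})/n}$ = $0.10 \pm 1.96\sqrt{(0.10)(0.90)/100}$ = $0.10 \pm 0.06$ or (0.04, 0.16)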
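The interval in (a) can be checked numerically. This is a minimal Python sketch of the large-sample (Wald) interval, assuming 10 turkeys among the 100 birds sold; the function name `wald_ci` is chosen here for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(successes, n, conf=0.95):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # upper alpha/2 normal quantile
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# 100 birds sold, 90 were peacocks, so 10 were turkeys
lo, hi = wald_ci(10, 100)
print(round(lo, 2), round(hi, 2))
```

Rounded to two decimals, the limits agree with the interval (0.04, 0.16) stated in the solution.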

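(b) This is a sample-size calculation for the estimation of one population proportion.

The general formula is derived as follows (also refer to your lecture notes for the derivations of the pivotal quantity, the CI, etc.):

(i) The pivotal quantity for the inference on π is

$Z = \frac{\hat{p} - \pi}{\sqrt{\hat{p}(1-\hat{p})/n}}$, which is approximately $N(0, 1)$ for large n. The 100(1-α)% symmetrical CI for π is derived from

$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha$, which yields $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. Therefore the length of the CI is: $L = 2 z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. Solving for the sample size, we have $n = \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{E^2}$, where E = L/2 is referred to as the maximum error.

(Note: E is also directly defined as $E = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$)

(ii) When we have no estimate of the proportion, since simple calculus shows that $\hat{p}(1-\hat{p}) \le 1/4$, a conservative estimate is $n = \frac{z_{\alpha/2}^2}{4E^2}$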

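In the given problem, we have E = 0.03 and α = 0.05.

(i). $\hat{p} = 0.10$. $n = \frac{(1.96)^2 (0.10)(0.90)}{(0.03)^2} = 384.16$, so n = 385; (ii). $n = \frac{(1.96)^2}{4(0.03)^2} = 1067.1$, so n = 1068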
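The two sample-size cases can be sketched in a few lines of Python. This is an illustrative check rather than part of the original solution; the function name `sample_size` and its parameters are assumptions, and the result is always rounded up so that the interval length does not exceed the target.

```python
from math import ceil
from statistics import NormalDist


def sample_size(max_error, conf=0.95, p_guess=None):
    """Smallest n with z * sqrt(p(1 - p) / n) <= max_error.

    With no estimate of the proportion (p_guess=None), use the
    conservative bound p(1 - p) <= 1/4.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    pq = 0.25 if p_guess is None else p_guess * (1 - p_guess)
    return ceil(z ** 2 * pq / max_error ** 2)


# CI length of at most 0.06 means a maximum error E of 0.03
n_with_estimate = sample_size(0.03, p_guess=0.10)   # case (i)
n_conservative = sample_size(0.03)                  # case (ii)
print(n_with_estimate, n_conservative)
```

The conservative case needs almost three times as many birds because p(1 - p) is 0.25 at its maximum but only 0.09 at the estimate 0.10.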

3. An expert witness in a paternity suit testifies that the length (in days) of pregnancy (that is, the time from impregnation to the delivery of the child) is approximately normally distributed with parameter [pic] and [pic]. The defendant in the suit is able to prove that he was out of the country during a period that began 290 days before the birth of the child and ended 240 days before the birth. If the defendant was, in fact, the father of the child, what is the probability that the mother could have had the very long or very short pregnancy indicated by the testimony?

Solution: let [pic]~[pic] and [pic]~[pic]

[pic](the woman had a very long or very short pregnancy)

[pic]

[pic]
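The probability asked for is P(X > 290) + P(X < 240), where X is the length of the pregnancy: the defendant can be the father only if conception fell outside his absence. A minimal Python sketch of that calculation follows. The values mu = 270 and sigma = 10 in the call are assumed purely for illustration; substitute the parameters given in the testimony. The function name is hypothetical.

```python
from statistics import NormalDist


def prob_extreme_pregnancy(mu, sigma, short=240, long=290):
    """P(X < short) + P(X > long) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(short) + (1 - dist.cdf(long))


# mu and sigma here are hypothetical illustration values, not taken from the exam
p = prob_extreme_pregnancy(mu=270, sigma=10)
print(round(p, 4))
```

Under these assumed parameters the two tails correspond to z = -3 and z = 2, so nearly all of the probability comes from the very long pregnancy.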

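4A (for AMS majors). Suppose we have two independent random samples from two normal populations: $X_1, \ldots, X_{n_1} \overset{iid}{\sim} N(\mu_1, \sigma^2)$, and $Y_1, \ldots, Y_{n_2} \overset{iid}{\sim} N(\mu_2, \sigma^2)$. Please derive the pooled-variance t-test using the pivotal quantity method. Please make sure that you include the following key steps.

a) Please derive the distribution of $\bar{X} - \bar{Y}$

b) Please derive the distribution of $W = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{\sigma^2}$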

c) Please derive the distribution of the pooled-variance t statistic (the pivotal quantity).

d) Please derive the rejection region for a 2-sided test at the significance level of α.

e) Please illustrate using the pdf plot how to calculate the p-value for a 2-sided test.

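Solution: (Please refer to your lecture notes for the entire derivation.) Here is a simple outline of the derivation.

a) We start with the point estimator for the parameter of interest $\mu_1 - \mu_2$: $\bar{X} - \bar{Y}$. Its distribution is $\bar{X} - \bar{Y} \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$, using the mgf for $N(\mu, \sigma^2)$, which is $M(t) = \exp(\mu t + \sigma^2 t^2/2)$, and the independence properties of the random samples. From this we have $Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sigma\sqrt{1/n_1 + 1/n_2}} \sim N(0, 1)$. Unfortunately, Z cannot serve as the pivotal quantity because σ is unknown.

b) We next look for a way to get rid of the unknown σ, following an approach similar to the construction of Student's t-statistic. We find that $W = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{\sigma^2} \sim \chi^2_{n_1+n_2-2}$, using the mgf for $\chi^2_k$, which is $M(t) = (1-2t)^{-k/2}$ for $t < 1/2$, and the independence properties of the random samples.

c) Then, from the theorem on sampling from the normal population and the independence properties of the random samples, Z and W are independent, and therefore, by the definition of the t-distribution, we obtain our pivotal quantity: $T = \frac{Z}{\sqrt{W/(n_1+n_2-2)}} = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p\sqrt{1/n_1 + 1/n_2}} \sim t_{n_1+n_2-2}$, where $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$ is the pooled sample variance.

d) The rejection region is derived from $P(|T_0| \ge c \mid H_0: \mu_1 = \mu_2) = \alpha$, where $T_0 = \frac{\bar{X} - \bar{Y}}{S_p\sqrt{1/n_1 + 1/n_2}}$ is the test statistic, thus $c = t_{n_1+n_2-2,\,\alpha/2}$; that is, we reject $H_0$ when $|t_0| \ge t_{n_1+n_2-2,\,\alpha/2}$.

e) The p-value is twice the tail area bounded by the observed test statistic $|t_0|$, that is, $2P(T \ge |t_0|)$ with $T \sim t_{n_1+n_2-2}$. I will not show the pdf plot here although you should.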

4B (for non-AMS majors). A new method of making concrete blocks has been proposed. To test whether or not the new method increases the compressive strength, five sample blocks are made by each method. The compressive strengths in 10 pounds per square inch are listed here:

|New Method |15 14 13 15 16 |

|Old Method |13 15 13 12 14 |

(a). Please construct a 95% confidence interval for the mean difference between the compressive strengths of the two methods.

(b). At the significance level .05, can you conclude that the new method increases the compressive strength?

(c). What assumptions do you need for the inference in parts (a) and (b)?

(d). Please write up the entire SAS program necessary to answer the questions raised in (a), (b) and (c). Please include the data step as well as tests of the relevant assumptions.

Solution: Inference on two population means. Two small and independent samples.

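New method: $\bar{x}_1 = 14.6,\ s_1^2 = 1.3,\ n_1 = 5$

Old method: $\bar{x}_2 = 13.4,\ s_2^2 = 1.3,\ n_2 = 5$

Under the normality assumption, we test if the two population variances are equal: $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_a: \sigma_1^2 \ne \sigma_2^2$.

Test statistic is

$F_0 = s_1^2 / s_2^2 = 1.3/1.3 = 1$, $F_{4,4,0.025} = 9.60$ and $F_{4,4,0.975} = 1/9.60 = 0.1042$.

Since F0 is between 0.1042 and 9.60, we cannot reject H0. Therefore it is reasonable to assume that $\sigma_1^2 = \sigma_2^2$.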

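a) 95% C.I. for the difference $\mu_1 - \mu_2$ is

$(\bar{x}_1 - \bar{x}_2) \pm t_{8,0.025}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 1.2 \pm 2.306\,(1.140)\sqrt{\frac{2}{5}} = 1.2 \pm 1.66$

where $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = 1.3$, so $s_p = 1.140$

Therefore 95% C.I. is [-0.46, 2.86].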

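b) Using the pooled-variance t-test with hypotheses $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 > \mu_2$,

$t_0 = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}} = \frac{1.2}{0.7211} = 1.664$

$t_0 = 1.664 < t_{8,0.05} = 1.860$. Thus, we cannot reject H0. There isn't enough evidence to conclude that the new method increases the strength.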

c) (1) Both populations are normally distributed

(2) $\sigma_1^2 = \sigma_2^2$ (the two population variances are equal)

(d) /*Problem #4B*/

data four;

input method strength;

datalines;

1 15

1 14

1 13

1 15

1 16

2 13

2 15

2 13

2 12

2 14

;

run;

proc univariate data=four normal;

class method;

var strength;

title 'Check for normality';

run;

proc ttest data=four;

class method;

var strength;

title 'Independent samples t-test';

run;

proc npar1way data=four wilcoxon;

class method;

var strength;

title 'Nonparametric test for two-mean comparisons';

run;
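The hand calculations for Problem 4B, parts (a) and (b), can be cross-checked with a short Python sketch. It is not a substitute for the SAS program above. The critical values 2.306 and 1.860 are the tabulated t quantiles with 8 degrees of freedom and are entered by hand, because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, variance

new_method = [15, 14, 13, 15, 16]
old_method = [13, 15, 13, 12, 14]

n1, n2 = len(new_method), len(old_method)
v1, v2 = variance(new_method), variance(old_method)   # sample variances
diff = mean(new_method) - mean(old_method)

# F statistic for the equal-variance check
f0 = v1 / v2

# pooled variance and standard error of the difference in means
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
std_err = sqrt(pooled_var * (1 / n1 + 1 / n2))

# tabulated critical values for df = 8 (entered by hand, from a t table)
T_CRIT_TWO_SIDED = 2.306   # upper 0.025 point
T_CRIT_ONE_SIDED = 1.860   # upper 0.05 point

ci_low = diff - T_CRIT_TWO_SIDED * std_err
ci_high = diff + T_CRIT_TWO_SIDED * std_err
t0 = diff / std_err
reject_h0 = t0 > T_CRIT_ONE_SIDED

print(round(f0, 2), round(ci_low, 2), round(ci_high, 2), round(t0, 3), reject_h0)
```

The interval contains zero and the test statistic falls short of the one-sided critical value, which matches the conclusion that the data do not show an increase in strength.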
