AMS572 - Stony Brook



AMS572.01 Final Exam, Fall 2013

Name ___________________________________ ID ______________________ Signature ________________________

Instructions: This is a closed-book exam. Anyone who cheats on the exam shall receive a grade of F. Please provide complete solutions for full credit. The exam goes from 11:15am to 1:45pm. (*Extended time at the DSS as required.*) Calculators are allowed. Please use the given statistical tables. Good luck!

1. Reed Auto periodically has a special week-long sale. As part of the advertising campaign, Reed runs one or more television commercials during the weekend preceding the sale. Data from a sample of 5 previous sales are shown below.

Number of TV Ads       1   3   2   1   3
Number of Cars Sold   14  24  18  17  27

(a) Find the least squares regression line.
(b) Test at α=0.05 whether there is a significant linear relationship between these two variables.
(c) What percentage of variation in the number of cars sold is explained by the number of TV ads?
(d) Please write up the entire SAS code necessary to answer questions (a), (b), (c) above. In addition, please write up a SAS program to compute the sample correlation coefficient between the two variables and to test whether the corresponding population correlation is zero or not.

Solution: This is a simple linear regression problem.

(a) $n=5$, $\bar{x}=2$, $\bar{y}=20$
$S_{xy}=\sum xy-n\bar{x}\bar{y}=220-5\cdot 2\cdot 20=20$
$S_{xx}=\sum x^2-n\bar{x}^2=24-5\cdot 2^2=4$
$S_{yy}=\sum y^2-n\bar{y}^2=2114-5\cdot 20^2=114$
$\hat{\beta}_1=S_{xy}/S_{xx}=20/4=5$
$\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}=20-5\cdot 2=10$
The fitted least squares regression line is: $\hat{y}=10+5x$
(*Dear students, many of you forgot to add the hat to Y; then I realized that we had such a typo in the homework solutions. So you are forgiven and no point was taken.
But please do remember the hat in the future.)

The estimate of σ based on the mean square error is:
$\hat{\sigma}=\sqrt{MSE}=\sqrt{\frac{SSE}{n-2}}=\sqrt{\frac{SST-SSR}{n-2}}=\sqrt{\frac{S_{yy}-\hat{\beta}_1^2S_{xx}}{n-2}}=\sqrt{\frac{114-5^2\cdot 4}{5-2}}=2.16$

(b) The hypotheses are: $H_0:\beta_1=0$ versus $H_a:\beta_1\neq 0$
Test statistic:
$t_0=\frac{\hat{\beta}_1-0}{SE(\hat{\beta}_1)}=\frac{\hat{\beta}_1}{\hat{\sigma}/\sqrt{S_{xx}}}=\frac{5}{2.16/\sqrt{4}}=4.63>t_{3,0.025}=3.182$
Therefore we reject the null hypothesis at α=0.05 and conclude that there is a significant linear relationship between these two variables.

(c) $R^2=\frac{S_{xy}^2}{S_{xx}S_{yy}}=\frac{20^2}{4\cdot 114}=0.877$
Therefore 87.7% of the variation in the number of cars sold is explained by the number of TV ads.

(d)
data carsell;
input x y;
datalines;
1 14
3 24
2 18
1 17
3 27
;
run;
proc reg data = carsell;
model y = x;
run;
proc corr data = carsell;
var x y;
run;

2. A firm wishes to compare four programs for training workers to perform a certain manual task. Twenty new employees are randomly assigned to the training programs, with 5 in each program. At the end of the training period, a test is conducted to see how quickly trainees can perform the task. The number of times the task is performed per minute is recorded for each trainee, with the following results:

Observation  Program 1  Program 2  Program 3  Program 4
1                9         10         12          9
2               12          6         14          8
3               14          9         11         11
4               11          9         13          7
5               13         10         11          8

(a) Using the hypothetical data provided above, test at α=0.05 whether the four training programs are equally effective. What assumptions are necessary for your test?
(b) Please write up the entire SAS code necessary to answer question (a) above.
(c) Please compare training programs 3 and 4 using the usual pooled-variance t-test at the significance level α=0.05.
(d) At α=0.05, please compare training programs 3 and 4 using an optimal test, that is, the best test you can find based on the given data. This test should be better than the pooled-variance t-test in part (c).
(e) Please derive your optimal test in part (d) using the pivotal quantity method under the same assumptions mentioned in part (a), and for the following general setting, at the significance level α.

Observation  Program 1  Program 2  ⋯  Program k
1            X11        X21        ⋯  Xk1
2            X12        X22        ⋯  Xk2
⋮            ⋮          ⋮          ⋮  ⋮
n            X1n        X2n        ⋯  Xkn

(f) (extra credit) Please derive your optimal test in part (d) using the likelihood ratio test method under the same assumptions and general setting given in (e). Prove whether the tests in (e) and (f) are equivalent.

Solution:
(a) This is a one-way ANOVA problem with 4 independent samples. We need to perform an ANOVA F-test. The first assumption is that all four populations are normal. The second is that all four population variances are unknown but equal.
$H_0:\mu_1=\mu_2=\mu_3=\mu_4$
$H_a$: the above is not true

Analysis of Variance
Source             SS      d.f.   MS      F
Training Program   54.95    3     18.32   7.04
Error              41.6    16      2.6
Total              96.55   19

Since $F_0=7.04>F_{3,16,0.05}=3.24$, we reject the null hypothesis and conclude that the four training programs are not equally effective.

(b)
data training;
input program speed @@;
datalines;
1 9 1 12 1 14 1 11 1 13
2 10 2 6 2 9 2 9 2 10
3 12 3 14 3 11 3 13 3 11
4 9 4 8 4 11 4 7 4 8
;
run;
proc anova data = training;
class program;
model speed = program;
run;

(c) By the ANOVA assumption, we assume that both populations are normal, and the population variances are unknown but equal ($\sigma_3^2=\sigma_4^2$).
Now we perform the pooled-variance t-test to test whether the two population means are equal.
$H_0:\mu_3=\mu_4$ versus $H_a:\mu_3\neq\mu_4$
Test statistic: $t_0=\frac{\bar{X}_3-\bar{X}_4}{S_p\sqrt{1/n_3+1/n_4}}=\frac{12.2-8.6}{\sqrt{2}\sqrt{1/5+1/5}}\approx 4.02$, with 8 degrees of freedom (p-value = 0.00384, 2-sided)
∴ We reject the null hypothesis at α=0.05, and conclude that there is evidence of a difference in mean speed between these two training programs.

(d), (e) Derivation of the pooled-variance t-test (2-sided test) using the pivotal quantity approach
Suppose we have k independent random samples, each of size n, from k normal populations with unknown but equal population variances: $X_{i1},X_{i2},\ldots,X_{in}\sim N(\mu_i,\sigma^2)$, $i=1,\ldots,k$. Here is a simple outline of the derivation of the test of $H_0:\mu_i=\mu_j$ versus $H_a:\mu_i\neq\mu_j$, where $1\le i\neq j\le k$, using the pivotal quantity approach.

[1].
We start with the point estimator for the parameter of interest $\mu_i-\mu_j$, namely $\bar{X}_i-\bar{X}_j$. Its distribution is $N(\mu_i-\mu_j,\ 2\sigma^2/n)$, using the mgf of $N(\mu,\sigma^2)$, which is $M(t)=\exp(\mu t+\sigma^2t^2/2)$, and the independence properties of the random samples. From this we have $Z=\frac{\bar{X}_i-\bar{X}_j-(\mu_i-\mu_j)}{\sigma\sqrt{2/n}}\sim N(0,1)$. Unfortunately, Z cannot serve as the pivotal quantity because σ is unknown.

[2]. We next look for a way to get rid of the unknown σ, following an approach similar to the construction of the pooled-variance t-statistic. We found that $W=\sum_{l=1}^{k}\frac{(n-1)S_l^2}{\sigma^2}\sim\chi^2_{k(n-1)}$, where $S_l^2$ is the sample variance of the l-th sample, using the mgf of $\chi^2_\nu$, which is $M(t)=(1-2t)^{-\nu/2}$, and the independence properties of the random samples.

[3]. Then we found, from the theorem of sampling from the normal population and the independence properties of the random samples, that Z and W are independent. Therefore, by the definition of the t-distribution, we have obtained our pivotal quantity: $T=\frac{Z}{\sqrt{W/[k(n-1)]}}=\frac{\bar{X}_i-\bar{X}_j-(\mu_i-\mu_j)}{S\sqrt{2/n}}\sim t_{k(n-1)}$, where $S^2=\frac{1}{k}\sum_{l=1}^{k}S_l^2=MSE$ is the pooled sample variance from all k samples.

[4]. The rejection region is derived from $P(|T_0|\ge c\mid H_0)=\alpha$, where $T_0=\frac{\bar{X}_i-\bar{X}_j}{S\sqrt{2/n}}\sim t_{k(n-1)}$ under $H_0$. Thus $c=t_{k(n-1),\alpha/2}$. Therefore at the significance level α, we reject $H_0$ in favor of $H_a$ iff $|T_0|\ge t_{k(n-1),\alpha/2}$.

For the given problem, we have: $H_0:\mu_3=\mu_4$ versus $H_a:\mu_3\neq\mu_4$
Test statistic: $t_0=\frac{12.2-8.6}{\sqrt{2.6}\sqrt{2/5}}\approx 3.53$, with 16 degrees of freedom (p-value = 0.00278, 2-sided; this p-value is indeed smaller than that of the pooled-variance t-test in part (c), because this t-test uses the variance estimate from all four samples and therefore has the largest degrees of freedom possible.)
∴ We reject the null hypothesis at α=0.05, and conclude that there is evidence of a difference in mean speed between these two training programs.

(f) Derivation of the pooled-variance t-test (2-sided test) using the likelihood ratio test approach
We have k independent random samples, each of size n, from k normal populations with equal but unknown variances. Now we derive the likelihood ratio test for $H_0:\mu_i=\mu_j$ vs $H_a:\mu_i\neq\mu_j$. Without loss of generality, and for the sake of simplicity, we set i=1, j=2 for the derivation of the likelihood ratio test.
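Let $\mu_1=\mu_2=\mu$; then
$\omega=\{-\infty<\mu_1=\mu_2=\mu,\ \mu_3,\mu_4,\ldots,\mu_k<+\infty,\ 0<\sigma^2<+\infty\}$,
$\Omega=\{-\infty<\mu_1,\mu_2,\mu_3,\mu_4,\ldots,\mu_k<+\infty,\ 0<\sigma^2<+\infty\}$

$L(\omega)=L(\mu,\mu_3,\mu_4,\ldots,\mu_k,\sigma^2)=\left(\frac{1}{2\pi\sigma^2}\right)^{kn/2}\exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{m=1}^{n}(x_{1m}-\mu)^2+\sum_{m=1}^{n}(x_{2m}-\mu)^2+\sum_{l=3}^{k}\sum_{m=1}^{n}(x_{lm}-\mu_l)^2\right]\right\}$,
and there are k parameters.

$\ln L(\omega)=-\frac{kn}{2}\ln(2\pi\sigma^2)-\frac{1}{2\sigma^2}\left[\sum_{m=1}^{n}(x_{1m}-\mu)^2+\sum_{m=1}^{n}(x_{2m}-\mu)^2+\sum_{l=3}^{k}\sum_{m=1}^{n}(x_{lm}-\mu_l)^2\right]$
Since it contains k parameters, we take the partial derivatives with respect to $\mu,\mu_3,\mu_4,\ldots,\mu_k$ and $\sigma^2$ and set them equal to 0. Then we have:
$\hat{\mu}=\frac{\bar{X}_1+\bar{X}_2}{2}$
$\hat{\mu}_{l,\omega}=\bar{X}_l,\ l=3,\ldots,k$
$\hat{\sigma}_\omega^2=\frac{1}{kn}\left[\sum_{m=1}^{n}(x_{1m}-\hat{\mu})^2+\sum_{m=1}^{n}(x_{2m}-\hat{\mu})^2+\sum_{l=3}^{k}\sum_{m=1}^{n}(x_{lm}-\hat{\mu}_{l,\omega})^2\right]$

$L(\Omega)=L(\mu_1,\mu_2,\mu_3,\mu_4,\ldots,\mu_k,\sigma^2)=\left(\frac{1}{2\pi\sigma^2}\right)^{kn/2}\exp\left\{-\frac{1}{2\sigma^2}\sum_{l=1}^{k}\sum_{m=1}^{n}(x_{lm}-\mu_l)^2\right\}$,
and there are k+1 parameters.

$\ln L(\Omega)=-\frac{kn}{2}\ln(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{l=1}^{k}\sum_{m=1}^{n}(x_{lm}-\mu_l)^2$
We take the partial derivatives with respect to $\mu_1,\mu_2,\mu_3,\mu_4,\ldots,\mu_k$ and $\sigma^2$ and set them all equal to 0. Then we have:
$\hat{\mu}_{l,\Omega}=\bar{X}_l,\ l=1,2,3,\ldots,k$
$\hat{\sigma}_\Omega^2=\frac{1}{kn}\sum_{l=1}^{k}\sum_{m=1}^{n}(x_{lm}-\hat{\mu}_{l,\Omega})^2=\frac{n-1}{n}MSE=\frac{n-1}{n}S^2$ (where $S^2$ is the pooled sample variance from all k samples, as defined in the pivotal quantity derivation)

At this point we have estimated all the parameters. Then, after some cancellations and simplifications, we have:
$\lambda=\frac{L(\hat{\omega})}{L(\hat{\Omega})}=\frac{\left(\frac{1}{2\pi\hat{\sigma}_\omega^2}\right)^{kn/2}}{\left(\frac{1}{2\pi\hat{\sigma}_\Omega^2}\right)^{kn/2}}=\left(\frac{\hat{\sigma}_\Omega^2}{\hat{\sigma}_\omega^2}\right)^{kn/2}=\left[\frac{\sum_{l=1}^{k}\sum_{m=1}^{n}(x_{lm}-\hat{\mu}_{l,\Omega})^2}{\sum_{m=1}^{n}(x_{1m}-\hat{\mu})^2+\sum_{m=1}^{n}(x_{2m}-\hat{\mu})^2+\sum_{l=3}^{k}\sum_{m=1}^{n}(x_{lm}-\hat{\mu}_{l,\omega})^2}\right]^{kn/2}=\left[1+\frac{t_0^2}{k(n-1)}\right]^{-kn/2}$
where $t_0=\frac{\bar{X}_1-\bar{X}_2}{S\sqrt{2/n}}$ is the test statistic of the t-test derived in part (e). The last step uses $\sum_{m=1}^{n}(x_{1m}-\hat{\mu})^2+\sum_{m=1}^{n}(x_{2m}-\hat{\mu})^2=\sum_{m=1}^{n}(x_{1m}-\bar{X}_1)^2+\sum_{m=1}^{n}(x_{2m}-\bar{X}_2)^2+\frac{n}{2}(\bar{X}_1-\bar{X}_2)^2$. Therefore, $\lambda\le\lambda^*$ is equivalent to $|t_0|\ge c$. Thus at the significance level α, we reject the null hypothesis in favor of the alternative when $|t_0|\ge c=t_{k(n-1),\alpha/2}$. This test is identical to the test we derived in part (e).

3. In order to test the accuracy of speedometers purchased from a subcontractor, the purchasing department of an automaker orders a test of a sample of speedometers at a controlled speed of 55 mph. At this speed, it is estimated that the variance of the readings is 1.
(a) How many speedometers need to be tested to have a 95% power to detect a bias of 0.5 mph or greater using a 0.01 level test?
(b) A sample of the size determined in (a) has a mean of 55.2 and standard deviation of 0.8.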
Can you conclude that the speedometers have a bias?
(c) Calculate the power of the test if 50 speedometers are tested and the actual bias is 0.5 mph. Assume a population standard deviation of 0.8.

Solution:
(a) $n=\left[\frac{(z_\alpha+z_\beta)\sigma}{\delta}\right]^2=\left[\frac{(2.326+1.645)(1)}{0.5}\right]^2\approx 63.1$. (*Note, if $\sigma=0.8$, $n\approx 40.4$.)
Hence, 64 speedometers need to be tested. (*Note, only 41 speedometers are needed if $\sigma=0.8$.)

(b) $\bar{x}=55.2$, $s=0.8$, $n=64$. $z_0=\frac{\bar{x}-55}{s/\sqrt{n}}=\frac{55.2-55}{0.8/\sqrt{64}}=2.0$. (*Note: this is the large-sample z-test by the central limit theorem, which is suitable even if the population distribution is not normal.)
Since $z_0=2.0$ is below the critical value at the 0.01 level (2.326 one-sided, 2.576 two-sided), we cannot conclude that the speedometers have a bias.
(**Note: Here you can also use the t-test, but remember to mention that the t-test is suitable if we assume the population distribution is normal!)

(c)

4. You are an epidemiologist for the US Department of Health and Human Services. You are studying the prevalence of a certain disease in two states (MA and CA). In MA, 74 of 1500 people surveyed were diseased and in CA, 129 of 1500 were diseased.
(a) At the significance level of 0.05, can you conclude that the prevalence rates are different?
(b) Can you test the hypotheses mentioned in (a) using another test?
(c) Are the two tests in parts (a) and (b) equivalent or not? Please justify your claim in a general setting, that is, suppose we have X1 diseased subjects among a total of n1 people surveyed in MA, and X2 diseased subjects among a total of n2 people surveyed in CA.
Furthermore, the significance level is α.

Solution:
(a)
      Diseased       Not-Diseased    Total
MA    a (74), P1     b (1500-74)     1500
CA    c (129), P2    d (1500-129)    1500

$H_0:P_1=P_2$
$H_a:P_1\neq P_2$
$\hat{P}=\frac{74+129}{1500+1500}\approx 0.0677$
$Z_0=\frac{\hat{p}_1-\hat{p}_2-0}{\sqrt{\hat{P}(1-\hat{P})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}=\frac{0.0493-0.086}{\sqrt{0.0677(1-0.0677)\left(\frac{1}{1500}+\frac{1}{1500}\right)}}\approx -3.998$
$|Z_0|\approx 3.998>Z_{0.025}=1.96$
We reject $H_0$ at α=0.05 and conclude that, based on the given data, the prevalence of this disease is different between CA and MA.

(b) & (c) Now we denote the probabilities of the four table cells as follows:

      Diseased       Not-Diseased
MA    p11 (=p1)      p12 (=1-p1)
CA    p21 (=p2)      p22 (=1-p2)

The original hypotheses of equal population proportions (versus not equal):
$H_0:p_1=p_2$
$H_a:p_1\neq p_2$
are equivalent to the hypotheses for the homogeneity test for a two-way contingency table:
$H_0:p_{11}=p_{21},\ p_{12}=p_{22}$
$H_a$: the above is not true
The (large sample) test statistic is:
$\chi_0^2=\frac{[a-\hat{P}(a+b)]^2}{\hat{P}(a+b)}+\frac{[b-(1-\hat{P})(a+b)]^2}{(1-\hat{P})(a+b)}+\frac{[c-\hat{P}(c+d)]^2}{\hat{P}(c+d)}+\frac{[d-(1-\hat{P})(c+d)]^2}{(1-\hat{P})(c+d)}$
$=\frac{\left[a-\frac{(a+c)(a+b)}{N}\right]^2}{\frac{(a+c)(a+b)}{N}}+\frac{\left[b-\frac{(b+d)(a+b)}{N}\right]^2}{\frac{(b+d)(a+b)}{N}}+\frac{\left[c-\frac{(a+c)(c+d)}{N}\right]^2}{\frac{(a+c)(c+d)}{N}}+\frac{\left[d-\frac{(b+d)(c+d)}{N}\right]^2}{\frac{(b+d)(c+d)}{N}}=Z_0^2$
where $N=a+b+c+d$, $\hat{P}=\frac{a+c}{a+b+c+d}$, and $\chi_0^2\sim\chi^2_{(2-1)(2-1)}=\chi^2_1$ under $H_0$.
At the significance level α, we reject the null hypothesis iff $\chi_0^2>\chi^2_{1,\alpha,upper}$, which is equivalent to rejecting the null hypothesis iff $|Z_0|>Z_{\alpha/2}$. This is because $\alpha=P(|Z_0|>Z_{\alpha/2})=P(Z_0^2>Z_{\alpha/2}^2)=P(\chi_0^2>Z_{\alpha/2}^2)$, and therefore $\chi^2_{1,\alpha,upper}=Z_{\alpha/2}^2$. Thus we claim that these two tests are entirely equivalent. So we have done part (c).
Now back to part (b): we can simply plug in the values (or you can do part (c) first and then use its general result to perform part (b); either way is OK with me) and obtain the chi-square statistic as follows:
$\chi_0^2=Z_0^2=(-3.998)^2\approx 16$
Since $\chi_0^2=16>\chi^2_{1,0.05,upper}=3.84$ (it is indeed 1.96 squared), we reject the null hypothesis and conclude that the prevalence rates are different between MA and CA.

5. We have two independent samples from the normal populations $N(\mu_1,\sigma^2)$ and $N(\mu_2,\sigma^2)$, where $\sigma^2$ is common to both populations (the variance is unknown), and n1=n2=n. For the hypothesis of $H_0:\mu_1=\mu_2$ versus $H_a:\mu_1>\mu_2$, please derive the general formula for power calculation for the pooled-variance t-test based on an effect size of EFF at the significance level of α.
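Recall - Definition: Effect size = EFF = $\frac{\mu_1-\mu_2}{\sigma}$ (e.g. Eff=1)
With a sample size of 26 per group, α = 0.05, and an estimated effect size ranging from 1 to 1.5, please calculate the power of your pooled-variance t-test.

Solution:
T.S.: $T_0=\frac{\bar{X}_1-\bar{X}_2}{S_p\sqrt{1/n_1+1/n_2}}=\frac{\bar{X}_1-\bar{X}_2}{S_p\sqrt{2/n}}\sim t_{2n-2}$ under $H_0$
At the significance level α (here α=0.05), reject $H_0$ in favor of $H_a$ iff $T_0\ge t_{2n-2,\alpha}$
Power = 1-β = P(reject $H_0$ | $H_a$) = $P(T_0\ge t_{2n-2,\alpha}\mid H_a)$ = $P\left(\frac{\bar{X}_1-\bar{X}_2-(\mu_1-\mu_2)}{S_p\sqrt{2/n}}\ge t_{2n-2,\alpha}-\frac{\mu_1-\mu_2}{S_p\sqrt{2/n}}\ \Big|\ H_a\right)$ ≈ $P\left(T\ge t_{2n-2,\alpha}-EFF\sqrt{n/2}\ \Big|\ H_a\right)$ (Effect size = $\frac{\mu_1-\mu_2}{\sigma}$, approximating $S_p$ by σ)

With n = 26, α = 0.05, Eff = 1 to 1.5, the power is calculated as follows:
Power (Eff = 1) = $P(T\ge t_{50,0.05}-1\cdot\sqrt{26/2}\mid H_a)$ = $P(T\ge 1.676-3.606\mid H_a)$ = $P(T\ge -1.93\mid H_a)$
By our t-table, we estimate that the above power is between 95% and 97.5%.
(In fact, if you check with R, the above power is about 97%.)
Power (Eff = 1.5) = $P(T\ge t_{50,0.05}-1.5\cdot\sqrt{26/2}\mid H_a)$ = $P(T\ge 1.676-5.409\mid H_a)$ = $P(T\ge -3.733\mid H_a)$
By our t-table, we estimate that the above power is greater than 99.95%.
(In fact, if you check with R, the above power is about 99.98%.)
Note: the T statistic above follows a t-distribution with 50 (=26+26-2) degrees of freedom.
Therefore we conclude that the power ranges from at least 95% (about 97%) to more than 99.95% for an effect size ranging from 1 to 1.5.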
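
The table lookups for the power in Problem 5 can be cross-checked numerically. The sketch below is only an illustration in Python, which is not a tool used in the exam: it evaluates P(T ≥ t50,0.05 − Eff·√(n/2)) with T following a central t-distribution with 50 degrees of freedom, exactly as the solution approximates it. The helper names `t_pdf`, `t_cdf` and `approx_power` are hypothetical, the t CDF is obtained by Simpson's rule rather than from a statistics library, and the critical value 1.676 is taken from the solution's t-table. An exact power calculation would use the noncentral t-distribution instead.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density on [0, |x|] by Simpson's rule
    (steps must be even) and using the symmetry of the density about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def approx_power(eff, n, t_crit):
    """Approximate power P(T >= t_crit - eff*sqrt(n/2)) with T ~ t(2n-2),
    the central-t approximation used in the solution (not the exact power)."""
    df = 2 * n - 2
    return 1.0 - t_cdf(t_crit - eff * math.sqrt(n / 2), df)

T_CRIT = 1.676  # t_{50,0.05}, as read from the t-table in the solution

power_1 = approx_power(1.0, 26, T_CRIT)
power_15 = approx_power(1.5, 26, T_CRIT)
print(f"Eff = 1.0: power ~ {power_1:.4f}")
print(f"Eff = 1.5: power ~ {power_15:.5f}")
```

Both values should fall inside the t-table brackets quoted in the solution (between 95% and 97.5% for Eff = 1, above 99.95% for Eff = 1.5).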
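
The normal-theory numbers in Problems 3 and 4 can be reproduced with Python's standard library. This is a hedged sketch, not part of the exam solution: the function names are hypothetical, and the sample-size and power functions assume a one-sided z-test, which is the reading that reproduces the stated answers of 64 and 41 speedometers. The solution to Problem 3(c) does not appear in this copy, so the power value computed here is only what that one-sided assumption gives, not the instructor's answer.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def sample_size(alpha, power, sigma, delta):
    """Smallest n for a one-sided level-alpha z-test (an assumption) to have
    the requested power against a bias of delta."""
    z_sum = Z.inv_cdf(1 - alpha) + Z.inv_cdf(power)
    return math.ceil((z_sum * sigma / delta) ** 2)

def z_test_power(alpha, n, sigma, delta):
    """Power of the one-sided level-alpha z-test when the true bias is delta."""
    return Z.cdf(-Z.inv_cdf(1 - alpha) + delta * math.sqrt(n) / sigma)

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Problem 3(a): sigma = 1, and the variant with sigma = 0.8
n_sigma_1 = sample_size(0.01, 0.95, 1.0, 0.5)
n_sigma_08 = sample_size(0.01, 0.95, 0.8, 0.5)

# Problem 3(b): large-sample z statistic with n = 64
z_3b = (55.2 - 55) / (0.8 / math.sqrt(64))

# Problem 3(c): power with n = 50, sigma = 0.8, bias 0.5 (one-sided assumption)
power_3c = z_test_power(0.01, 50, 0.8, 0.5)

# Problem 4: MA vs CA prevalence; the chi-square statistic is z squared
z_4 = two_proportion_z(74, 1500, 129, 1500)
chi2_4 = z_4 ** 2

print(n_sigma_1, n_sigma_08, round(z_3b, 3), round(power_3c, 4))
print(round(z_4, 3), round(chi2_4, 2))
```

Squaring the z statistic in the last step mirrors the equivalence argued in Problem 4(c): for a 2×2 table the homogeneity chi-square statistic equals the square of the pooled two-proportion z statistic.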

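
The hand calculations in Problems 1 and 2 can also be recomputed directly from the data. The following Python sketch is an illustration only (the exam itself uses SAS for this); all function and variable names are hypothetical. It recomputes the regression estimates, the ANOVA sums of squares and F statistic, and the two t statistics comparing programs 3 and 4: one with the two-sample pooled variance (8 d.f.) and one with the MSE from all four samples (16 d.f.).

```python
import math

def mean(values):
    return sum(values) / len(values)

def simple_regression(x, y):
    """Least squares fit of y on x; returns (b0, b1, sigma_hat, t0, r2)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxy = sum(a * b for a, b in zip(x, y)) - n * xbar * ybar
    sxx = sum(a * a for a in x) - n * xbar ** 2
    syy = sum(b * b for b in y) - n * ybar ** 2
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sigma_hat = math.sqrt((syy - b1 ** 2 * sxx) / (n - 2))  # sqrt(MSE)
    t0 = b1 / (sigma_hat / math.sqrt(sxx))                  # test of beta1 = 0
    r2 = sxy ** 2 / (sxx * syy)
    return b0, b1, sigma_hat, t0, r2

def one_way_anova(groups):
    """Returns (SS between, SS error, MSE, F) for a list of samples."""
    observations = [v for g in groups for v in g]
    grand = mean(observations)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((v - mean(g)) ** 2 for g in groups for v in g)
    mse = ss_error / (len(observations) - len(groups))
    f0 = (ss_between / (len(groups) - 1)) / mse
    return ss_between, ss_error, mse, f0

def two_sample_t(g1, g2, variance):
    """t statistic for mean(g1) - mean(g2) given a common-variance estimate."""
    return (mean(g1) - mean(g2)) / math.sqrt(variance * (1 / len(g1) + 1 / len(g2)))

# Problem 1: number of TV ads and number of cars sold
ads = [1, 3, 2, 1, 3]
cars = [14, 24, 18, 17, 27]
b0, b1, sigma_hat, t_slope, r2 = simple_regression(ads, cars)

# Problem 2: task speed under training programs 1 to 4
programs = [
    [9, 12, 14, 11, 13],
    [10, 6, 9, 9, 10],
    [12, 14, 11, 13, 11],
    [9, 8, 11, 7, 8],
]
ss_between, ss_error, mse, f0 = one_way_anova(programs)

p3, p4 = programs[2], programs[3]
pooled_var = (sum((v - mean(p3)) ** 2 for v in p3)
              + sum((v - mean(p4)) ** 2 for v in p4)) / (len(p3) + len(p4) - 2)
t_pooled = two_sample_t(p3, p4, pooled_var)  # part (c), 8 d.f.
t_mse = two_sample_t(p3, p4, mse)            # part (d), 16 d.f.

print(b0, b1, round(sigma_hat, 2), round(t_slope, 2), round(r2, 3))
print(round(ss_between, 2), round(ss_error, 1), round(mse, 1), round(f0, 2))
print(round(t_pooled, 2), round(t_mse, 2))
```

The statistic based on the MSE is smaller in absolute value than the two-sample pooled statistic here, yet its p-value is smaller because it is referred to a t-distribution with 16 rather than 8 degrees of freedom.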