10 Hypothesis testing - University of Arizona

10 Hypothesis testing

10.1 Introduction

In this chapter we will study hypothesis testing for a population parameter. There are other types of hypothesis testing in statistics.

We will always have two hypotheses: the null hypothesis, H0, and the alternative hypothesis, Ha. Depending on the data, we will either reject the null hypothesis and accept the alternative hypothesis or not reject the null hypothesis. We start with an example.

Example: In an election between two candidates it takes 50% or more of the votes to win. McSally is one of the candidates. We think she is going to lose and want to test this hypothesis. Let p be the fraction of the voters that will vote for McSally. Our hypotheses are

$$H_0 : p = 0.5 \tag{1}$$

$$H_a : p < 0.5 \tag{2}$$

Note that we take the null hypothesis to be that p = 0.5. We take a poll of 15 people and see how many say they will vote for her. How do we decide between the two hypotheses?

Example: A drug company has a drug, call it A, for lowering blood pressure. They have just developed a new drug, B, that they think is better. They want to test it to decide if they should quit selling drug A and start selling drug B. Let $\mu_A$ be the average amount a patient's blood pressure is lowered by drug A, and let $\mu_B$ be the average amount a patient's blood pressure is lowered by drug B. Our hypotheses are

$$H_0 : \mu_A = \mu_B \tag{3}$$

$$H_a : \mu_A > \mu_B \tag{4}$$

Again, note that the null hypothesis involves an equality.

Hypothesis testing is like a court trial (Wikipedia). The null hypothesis is that the defendant is not guilty. The alternative is that he or she is guilty. The evidence (data) must reach a certain level (beyond a reasonable doubt) for us to reject the null hypothesis in favor of the alternative.

10.2 Elements of a statistical test

Our hypothesis test involves the following elements:

1. Null hypothesis

2. Alternative hypothesis

3. A test statistic

4. Rejection region

We will always take the null hypothesis to be of the form

$$H_0 : \theta = \theta_0 \tag{5}$$

where $\theta_0$ is a known number. The alternative hypothesis can be of three forms. The two-sided alternative is

$$H_a : \theta \neq \theta_0 \tag{6}$$

There are two possible one-sided alternatives:

$$H_a : \theta > \theta_0 \tag{7}$$

or

$$H_a : \theta < \theta_0 \tag{8}$$

The test statistic is (like all statistics) a function of the random sample. The rejection region is the set of values of the test statistic for which we reject the null hypothesis and so conclude that the alternative hypothesis holds. If the test statistic does not fall in the rejection region, we do not reject the null hypothesis. However, we do not accept the null hypothesis. We just conclude that our data does not support the conclusion that the null hypothesis is false.

Example: Return to the election example. We have already stated the hypotheses. We poll n people and let $Y_n$ be the number of them that say they will vote for McSally. The test statistic is $Y_n$. If $Y_n$ is small enough we should reject H0 and accept Ha. So the rejection region should be of the form $Y_n \le k$. What should k be?

Example: Return to the drug example. We take a group of patients, randomly divide them into two groups, and give one group drug A and the other group drug B. We let $\bar{Y}_A$ be the average reduction in the blood pressure in group A, and $\bar{Y}_B$ the average reduction in the blood pressure in group B. Our test statistic is $\bar{Y}_A - \bar{Y}_B$. If it is significantly bigger than 0 we should reject H0 and accept Ha. So the rejection region should be of the form $\bar{Y}_A - \bar{Y}_B \ge k$. If H0 is true, then there is still some probability that $\bar{Y}_A - \bar{Y}_B$ will be positive. So we should not take the rejection region to be just $\bar{Y}_A - \bar{Y}_B > 0$. Obviously k should be positive, but how large should it be?

There is always some chance that the random sample we get is "atypical" and so the conclusion we draw based on it is wrong. There are two possible types of errors.

Definition 1. If H0 is true and we reject it, this is called a type I error. We let $\alpha$ be the probability of a type I error. $\alpha$ is called the level of the test. If Ha is true and we accept H0, this is called a type II error. We let $\beta$ be the probability of a type II error.

Note that if H0 is true, then we know the value of $\theta$. So we can compute the probability the test statistic falls in the rejection region, i.e., we can compute $\alpha$. It will just be a number. Of course it depends on the rejection region. But if we know that Ha is true, then we know something about $\theta$, but we don't know the actual value of $\theta$. So when we compute $\beta$ the probability will depend on $\theta$. So $\beta$ is a function of $\theta$.

Example: Return to the election example. Suppose we sample 15 people and we take the rejection region to be $Y \le 2$. What is $\alpha$?

$$\alpha = P(Y \le 2 \mid p = 0.5) = \sum_{y=0}^{2} \binom{15}{y} (0.5)^y (1 - 0.5)^{15-y} = 0.00369 \tag{9}$$

The value of $\beta$ depends on p. Suppose p = 0.3. Then we have

$$\beta = P(Y > 2 \mid p = 0.3) = \sum_{y=3}^{15} \binom{15}{y} (0.3)^y (1 - 0.3)^{15-y} = 0.873 \tag{10}$$

This is not good. We are almost certain to make a type II error even if p is 0.3. To make $\beta$ smaller we need to make the rejection region bigger. If we do this, then $\alpha$ will increase. So there is a trade-off between $\alpha$ and $\beta$. A small rejection region makes $\alpha$ smaller but $\beta$ larger. A larger rejection region makes $\alpha$ larger but $\beta$ smaller. To do better overall we need to make the sample size larger. More on this later.
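The two error probabilities in the election example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names `binom_cdf` and `error_probabilities` are hypothetical helpers introduced here for illustration, and the alternative value p = 0.3 is the one used in the example above.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

def error_probabilities(k, n, p_null, p_alt):
    """Return (alpha, beta) for the rejection region Y <= k."""
    alpha = binom_cdf(k, n, p_null)       # reject H0 although p = p_null
    beta = 1.0 - binom_cdf(k, n, p_alt)   # do not reject although p = p_alt
    return alpha, beta

if __name__ == "__main__":
    # Enlarging the rejection region (larger k) raises alpha and lowers beta.
    for k in range(2, 7):
        alpha, beta = error_probabilities(k, n=15, p_null=0.5, p_alt=0.3)
        print(f"k = {k}: alpha = {alpha:.5f}, beta = {beta:.3f}")
```

For k = 2 this gives the values in equations (9) and (10); the rows for larger k show the trade-off, with the type I error probability growing as the type II error probability shrinks.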

Which is worse, a type I or a type II error? That depends very much on the particular problem. In the election example, concluding she will win when she will not is comparable to concluding she will lose when she will in fact win. But consider this example. When a drug company starts testing a new drug they may start with tests just to see if the drug has harmful side effects. To be extreme, suppose they want to test if the drug is actually fatal. Let p be the probability that the drug kills a patient. Take

$$H_0 : p = 0 \tag{11}$$

$$H_a : p > 0 \tag{12}$$

Suppose Ha is true and we mistakenly accept H0 (a type II error) and conclude the drug is safe. This is really bad. The company will kill people. On the other hand, if H0 is true and we mistakenly reject H0 (a type I error), then we will mistakenly conclude the drug is dangerous when it is in fact safe. So we will probably abandon the drug and the company may lose all the money it might have made from the new drug.

10.3 Common large sample tests

Review: one population mean, one population proportion, difference of two population means, difference of two population proportions.

Suppose our hypothesis involves a population mean $\mu$. (So $\theta$ is $\mu$.) We have a point estimator for $\mu$, namely the sample mean $\bar{Y}$. We could use $\bar{Y}$ as the test statistic. The mean of $\bar{Y}$ is $\mu$ and its variance is $\sigma^2/n$. If the sample size is large, then the CLT says that

$$\frac{\bar{Y} - \mu}{\sqrt{\sigma^2/n}} \tag{13}$$

is approximately a standard normal. Note that this involves the unknown parameter $\mu$, so this is not a valid statistic.

Now suppose our hypotheses are

$$H_0 : \mu = \mu_0 \tag{14}$$

$$H_a : \mu > \mu_0 \tag{15}$$

where $\mu_0$ is known. We define

$$Z = \frac{\bar{Y} - \mu_0}{\sqrt{\sigma^2/n}} \tag{16}$$

Note that we used $\mu_0$, not $\mu$. So Z is a valid statistic. It does not depend on the unknown $\mu$. If the null hypothesis is true, then the distribution of Z is approximately standard normal. We should reject H0 if $\bar{Y}$ is significantly larger than $\mu_0$, i.e., if Z is significantly larger than 0. We can finally quantify what "significantly larger" should mean since we know the distribution of Z. If H0 is true the values of Z are usually between -2 and 2 and so a reasonable choice for the rejection region would be to reject H0 if Z > 2. Note that if the null hypothesis is false, then Z does not have a standard normal distribution since the mean $\mu$ is not $\mu_0$.
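The claim that Z usually falls between -2 and 2 when H0 is true can be illustrated by simulation. The sketch below makes up its own setting: the population is assumed to be exponential with mean 1 (so $\mu_0 = 1$ and $\sigma = 1$), the sample size is 100, and the helper `simulate_z` is a hypothetical name. None of these choices come from the text; any population with a finite variance should behave similarly for large n.

```python
import random
from math import sqrt

def simulate_z(n, mu0, sigma, draw, reps, seed=0):
    """Simulate Z = (Ybar - mu0) / sqrt(sigma^2 / n) for `reps` samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    zs = []
    for _ in range(reps):
        ybar = sum(draw(rng) for _ in range(n)) / n
        zs.append((ybar - mu0) / sqrt(sigma**2 / n))
    return zs

# Assumed population: exponential with rate 1, so mean 1 and sigma 1 (H0 is true).
zs = simulate_z(n=100, mu0=1.0, sigma=1.0,
                draw=lambda rng: rng.expovariate(1.0), reps=2000)

inside = sum(-2 < z < 2 for z in zs) / len(zs)
above = sum(z > 2 for z in zs) / len(zs)
print(f"fraction with -2 < Z < 2: {inside:.3f}")
print(f"fraction with Z > 2:      {above:.3f}")
```

The first fraction should be close to the standard normal value of about 0.95, and the second is the simulated type I error probability of the rule "reject H0 if Z > 2".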

The rejection region is of the form $Z > z_c$. The probability of a type I error is the probability that $Z > z_c$ when the null hypothesis is true. This is just $P(Z > z_c)$. So if we have a desired value of $\alpha$, this determines the cutoff $z_c$. It should just be $z_\alpha$ where $P(Z > z_\alpha) = \alpha$. Note that the rejection region $Z > z_\alpha$ is the same as

$$\bar{Y} > \mu_0 + \frac{\sigma}{\sqrt{n}}\, z_\alpha \tag{17}$$
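The equivalence of the two forms of the rejection region can be spelled out in one short chain. It uses only the definition of Z in (16), the identity $\sqrt{\sigma^2/n} = \sigma/\sqrt{n}$, and the fact that $\sigma/\sqrt{n}$ is positive, so multiplying by it does not flip the inequality:

```latex
\begin{align*}
Z > z_\alpha
  &\iff \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} > z_\alpha \\
  &\iff \bar{Y} - \mu_0 > \frac{\sigma}{\sqrt{n}}\, z_\alpha \\
  &\iff \bar{Y} > \mu_0 + \frac{\sigma}{\sqrt{n}}\, z_\alpha .
\end{align*}
```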

Example: An assembly line makes widgets. They claim that the number of defective widgets per day is on average 15. We suspect the number is higher than this. We randomly pick 36 days and see how many defective widgets were made on each of those 36 days. The sample mean is 17.0 and the sample variance is 9.0. Test the company's claim with significance level $\alpha = 0.05$.

Let $\mu$ be the average number of defective widgets per day. We take our hypotheses to be

$$H_0 : \mu = 15 \tag{18}$$

$$H_a : \mu > 15 \tag{19}$$

Our test statistic is

$$Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} \tag{20}$$
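As a rough sketch of how the numbers in this example could be plugged into the statistic, the code below evaluates Z and compares it with $z_\alpha$. It assumes that the unknown $\sigma$ is approximated by the sample standard deviation $\sqrt{9.0} = 3.0$, which is a common large-sample substitution but is an assumption here. The function name `upper_tail_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_z_test(ybar, mu0, sigma, n, alpha):
    """One-sided large-sample test of H0: mu = mu0 against Ha: mu > mu0."""
    z = (ybar - mu0) / (sigma / sqrt(n))
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # cutoff with P(Z > z_alpha) = alpha
    return z, z_alpha, z > z_alpha

# Numbers from the widget example. Assumption: sigma is replaced by the
# sample standard deviation sqrt(9.0) = 3.0.
z, z_alpha, reject = upper_tail_z_test(
    ybar=17.0, mu0=15.0, sigma=sqrt(9.0), n=36, alpha=0.05
)
print(f"Z = {z:.2f}, z_alpha = {z_alpha:.3f}, reject H0: {reject}")
```

Under that assumption the statistic is Z = (17 - 15)/(3/6) = 4, well above the cutoff of roughly 1.645, so the test would reject H0 at level 0.05.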
