Chapter 6: Confidence Intervals and Hypothesis Testing


Using Z for the CI and test of the population mean

Learning goals for this chapter:

Understand what inference is and why it is needed.
Know that all inference techniques give us information about the population parameter.
Explain what a confidence interval is and when it is needed.
Calculate a confidence interval for the population mean when the population standard deviation is known.
Know the assumptions that must be met for doing inference for the population mean.
Calculate the needed sample size if you have a pre-determined margin of error.
Know how to write hypotheses, calculate a test statistic and P-value, and write conclusions in terms of the story.
Draw Normal curve pictures to match the hypothesis test.
Understand the logic of hypothesis testing and when a hypothesis test is needed.
Use the confidence interval to perform a two-sided hypothesis test.
Explain sampling variability and the difference between the population mean and the sample mean.
Explain the difference between the population standard deviation and the sample standard deviation.
Know which technique is most appropriate for a story: confidence interval, hypothesis test, or simple summary statistics.

When we collect data from our sample, we can calculate sample statistics. However, usually we are interested in what is true for the whole population, not just for the sample. (Remember that a census is very hard and expensive to do well.) Why can't we just accept our sample mean or sample proportion as the official mean or proportion for the population? Every time we take a new sample and compute the statistics x̄ and p̂ (sample mean and sample proportion), we get a different answer due to sampling variability.

The two most common types of formal statistical inference:

Confidence Intervals: when we want to estimate a population parameter

Significance Tests: when we want to assess the evidence provided by the data in favor of some claim about the population (yes/no question about the population)

Confidence Intervals allow us to estimate the population mean or population proportion.


The true mean or proportion for the population exists and is a fixed number, but we just don't know what it is. Using our sample statistic, we can create a "net" to give us an estimate of where to expect the population parameter to be.

Confidence interval = net
Population parameter = invisible, stationary butterfly

We don't know exactly where the butterfly is, but from our sample, we have a pretty good estimate of the location.

If we just take a single sample, our single confidence interval "net" may or may not include the population parameter. However, if we take many samples of the same size and create a confidence interval from each sample statistic, over the long run 95% of our confidence intervals will contain the true population parameter (if we are using a 95% confidence level).
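The long-run claim above can be checked with a small simulation. The sketch below is only an illustration: it assumes a Normal population with made-up values (mean 50, standard deviation 10), uses the chapter's interval x̄ ± z* σ/√n for a population mean with known σ, and counts how often the interval "net" catches the true mean. The function name `coverage` and all numbers are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, conf=0.95, trials=2000, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    margin = z_star * sigma / n ** 0.5  # same margin for every sample (sigma known)
    hits = 0
    for _ in range(trials):
        # Draw one simple random sample of size n from the assumed population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        # Does this sample's "net" catch the fixed population mean?
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials


# Hypothetical population: mean 50, standard deviation 10, samples of size 25
print(coverage(mu=50, sigma=10, n=25))
```

The printed fraction should land close to 0.95, though not exactly on it, because the simulation itself is subject to sampling variability.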

We don't need to take a lot of random samples to recreate the sampling distribution with the population mean at its center. All we need is one Simple Random Sample of size n. Because of what we know about the sample mean distribution, we can use that one sample mean's confidence interval to infer what the population mean really is. If you increase the sample size (n), you decrease the size of your "net" (or your margin of error).


If you increase your confidence level (C), then you increase the size of your "net" (or your margin of error).

A smaller "net" is good because it gives you more information. It is a smaller range for where to expect your true population parameter.

Freeman applet: Go to course website, Freeman link, statistical applets, confidence interval.

Confidence intervals look like:

estimate ± margin of error

Confidence Interval for a Population Mean, μ:

$$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$$

Where z* is the value on the standard normal curve with area C between −z* and z*. (Table D at the back of the book also contains more z* values on the bottom row.)
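As a rough sketch of where z* comes from: if the central area is C, then the area to the left of z* is (1 + C)/2, so z* is that quantile of the standard normal curve. The helper name `z_star` is an assumption for illustration; `NormalDist` is from Python's standard library.

```python
from statistics import NormalDist


def z_star(conf):
    """Critical value with central area `conf` under the standard normal curve."""
    # Area to the left of z* is the central area plus one tail: C + (1 - C)/2
    return NormalDist().inv_cdf((1 + conf) / 2)


for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.0%}  z* = {z_star(c):.3f}")
```

Rounded to three decimals, these should agree with the tabulated values 1.645, 1.960, and 2.576.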

z*   1.645   1.960   2.576
C    90%     95%     99%

Remember from Ch. 5 that the mean and standard deviation for a sample mean are:

$$\mu_{\bar{x}} = \mu$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Also remember that if X is normally distributed then x̄ will be too, and if n is large, the sample mean will be approximately normally distributed even if X is not normally distributed (Central Limit Theorem).

What if your margin of error is too large? Here are ways to reduce it:

Increase the sample size (bigger n)
Use a lower level of confidence (smaller C)
Reduce σ

Sample Size, n, for Desired Margin of Error, m:

$$n = \left( \frac{z^* \sigma}{m} \right)^2$$

Note that it is the sample size, n, that influences the margin of error. The population size has nothing to do with it.
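The effect of n and C on the margin of error, and the sample-size formula, can be sketched as follows. The function names and the value σ = 15 are hypothetical, chosen only for illustration. The required sample size is rounded up to the next whole number so that the margin of error does not exceed m.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, conf=0.95):
    """Margin of error z* * sigma / sqrt(n) for a mean with known sigma."""
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    return z_star * sigma / math.sqrt(n)


def required_n(sigma, m, conf=0.95):
    """Smallest whole n whose margin of error is at most m."""
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    return math.ceil((z_star * sigma / m) ** 2)  # always round up


# Hypothetical population standard deviation of 15
for n in (25, 100, 400):
    print(n, round(margin_of_error(15, n), 2))  # quadrupling n halves the margin

for conf in (0.90, 0.95, 0.99):
    print(conf, round(margin_of_error(15, 100, conf), 2))  # higher C, wider net

print(required_n(15, m=2))
```

Note that the population size never appears as an input, which matches the point above: only n, σ, and the confidence level enter the calculation.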


Be careful!!!! You can only use the formula x̄ ± z* σ/√n under certain circumstances:

Data must be an SRS from the population. Do not use if the sampling is anything more complicated than an SRS.
Data must be collected correctly (no bias). The margin of error covers only random sampling errors. Undercoverage and nonresponse are not covered.
Outliers can have a big effect on the confidence interval. (This makes sense because we use the mean and standard deviation to get a CI.)
You must know the standard deviation of the population, σ.

Example: A questionnaire of drinking habits was given to a random sample of fraternity members, and each student was asked to report the # of beers he had drunk in the past month. The sample of 30 students resulted in an average of 22 beers with a population standard deviation of 9 beers.

a) Give a 90% confidence interval for the mean number of beers drunk by fraternity members in the past month.

b) Is it true that 90% of the fraternity members each month drink the number of beers that lie in the interval you found in part (a)? Explain your answer.

No, this is the confidence interval for the population mean, not for individual population members. If we take many samples of 30 fraternity members and make a confidence interval from each sample, 90% of these confidence intervals will contain the true population mean # of beers drunk in a month by fraternity members.

c) What is the margin of error for the 90% confidence interval?

d) How many students should you sample if you want a margin of error of 1 for a 90% confidence interval?
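One way to check your work on parts (a), (c), and (d) is the short calculation below. It is a sketch that plugs the numbers from the example (x̄ = 22, σ = 9, n = 30, C = 90%) into the confidence interval and sample-size formulas from this chapter; the variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

x_bar, sigma, n, conf = 22, 9, 30, 0.90

z_star = NormalDist().inv_cdf((1 + conf) / 2)   # about 1.645 for C = 90%
margin = z_star * sigma / math.sqrt(n)          # part (c): margin of error
ci = (x_bar - margin, x_bar + margin)           # part (a): 90% confidence interval

# Part (d): sample size for a margin of error of 1, rounded up
n_needed = math.ceil((z_star * sigma / 1) ** 2)

print(f"margin of error: {margin:.2f}")
print(f"90% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"students needed for m = 1: {n_needed}")
```

The margin of error comes out to about 2.70 beers, so the interval is roughly 19.30 to 24.70, and a margin of error of 1 would need 220 students.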


Hypothesis Testing

To do a significance test, you need 2 hypotheses:

H0, Null Hypothesis: the statement being tested, usually phrased as "no effect" or "no difference".

Ha, Alternative Hypothesis: the statement we hope or suspect is true instead of H0.

Hypotheses always refer to some population or model, not to a particular outcome.

Hypotheses can be one-sided or two-sided.

One-sided hypothesis: covers just part of the range for your parameter

H0: μ = 10      OR      H0: μ = 10
Ha: μ > 10              Ha: μ < 10

Two-sided hypothesis: covers the whole possible range for your parameter

H0: μ = 10
Ha: μ ≠ 10

Even though Ha is what we hope or believe to be true, our test gives evidence for or against H0 only.

We never prove H0 true; we can only state whether we have enough evidence to reject H0 (which is evidence in favor of Ha, but not proof that Ha is true) or that we don't have enough evidence to reject H0.

A test statistic measures compatibility between the H0 and the data.

P-value: the probability, computed assuming that H0 is true, that the test statistic would take a value as extreme or more extreme than that actually observed due to random fluctuation. It is a measure of how unusual your sample results are.

The smaller the P-value, the stronger the evidence against H0 provided by the data.

Calculate the P-value by using the sampling distribution of the test statistic (only the normal distribution for Chapter 6).


Compare the P-value to a significance level, α. If the P-value ≤ α, we can reject H0. If you can reject H0, your results are significant. If you do not reject H0, your results are not significant.

The 4 steps common to all tests of significance:

1. State the null hypothesis H0 and the alternative hypothesis Ha.
2. Calculate the value of the test statistic (z-score in Chapter 6).
3. Draw a picture of what Ha looks like, and find the P-value (use the normal table in Chapter 6).
4. State your conclusion about the data in a sentence, using the P-value and/or comparing the P-value to a significance level for your evidence.

Z-Test for a Population Mean

To test the hypothesis H0: μ = μ0 based on an SRS of size n from a population with unknown mean μ and known standard deviation σ, compute the test statistic:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

The P-values for a test of H0 against:

Ha: μ > μ0 is P(Z ≥ z)
Ha: μ < μ0 is P(Z ≤ z)
Ha: μ ≠ μ0 is 2 P(Z ≥ |z|)


These P-values are exact if the population is Normally distributed, and are approximately correct for large n in other cases.

Examples:

1. Last year the government made a claim that the average income of the American people was $33,950. However, a sample of 50 people taken recently showed an average income of $34,076 with a population standard deviation of $324. Conduct a significance test to see if the true population mean is more than the government's claim. Use α = 0.01.

2. Suppose that the cellulose content of alfalfa hay in the population has a standard deviation of 8 mg. A sample of 15 cuttings has a mean cellulose content of 145 mg. A previous study claimed that the mean cellulose content was 140 mg. Perform a hypothesis test to determine if the mean cellulose content is different from 140 mg if α = 0.05.
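The test statistic and P-value steps for both examples can be sketched in a few lines. This is an illustrative helper, not part of the original notes: the function name `z_test` and the labels "greater", "less", and "two-sided" are assumptions. It uses the standard normal distribution in place of the normal table.

```python
import math
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alternative):
    """Return (z, P-value) for a z-test of H0: mu = mu0 with known sigma.

    alternative is "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":
        p = 1 - std_normal.cdf(z)             # P(Z >= z)
    elif alternative == "less":
        p = std_normal.cdf(z)                 # P(Z <= z)
    elif alternative == "two-sided":
        p = 2 * (1 - std_normal.cdf(abs(z)))  # 2 P(Z >= |z|)
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p


# Example 1: H0: mu = 33950 vs Ha: mu > 33950, alpha = 0.01
z1, p1 = z_test(34076, 33950, 324, 50, "greater")
print(f"income:  z = {z1:.2f}, P-value = {p1:.4f}, reject H0: {p1 <= 0.01}")

# Example 2: H0: mu = 140 vs Ha: mu != 140, alpha = 0.05
z2, p2 = z_test(145, 140, 8, 15, "two-sided")
print(f"alfalfa: z = {z2:.2f}, P-value = {p2:.4f}, reject H0: {p2 <= 0.05}")
```

For the income example, z is about 2.75 with a P-value of about 0.003, which is below 0.01, so H0 is rejected. For the alfalfa example, z is about 2.42 with a two-sided P-value of about 0.0155, which is below 0.05, so H0 is rejected there as well. You still need to write each conclusion in terms of the story.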

Using confidence intervals to do hypothesis tests

You can use a CI to do a HT only if 2 conditions are met:

Your alternative hypothesis has a ≠ (is two-sided).
Your confidence level and your significance level add to 100% (e.g., an α of 0.05 + a confidence level of 95% = 100%).

You check to see if your null hypothesis could be true at the same time your confidence interval is true: if the null value μ0 falls outside the confidence interval, reject H0; if it falls inside, do not reject H0.
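As a sketch of this check, the code below builds the confidence interval whose level is 100% minus α and tests whether the null value lies inside it. The function name `two_sided_test_via_ci` is hypothetical; the numbers are from the alfalfa example (x̄ = 145, μ0 = 140, σ = 8, n = 15, α = 0.05).

```python
import math
from statistics import NormalDist


def two_sided_test_via_ci(x_bar, mu0, sigma, n, alpha):
    """Two-sided z-test done with a (1 - alpha) confidence interval."""
    conf = 1 - alpha                                 # levels must add to 100%
    z_star = NormalDist().inv_cdf((1 + conf) / 2)
    margin = z_star * sigma / math.sqrt(n)
    lo, hi = x_bar - margin, x_bar + margin
    reject = not (lo <= mu0 <= hi)                   # mu0 outside the net => reject H0
    return (lo, hi), reject


(lo, hi), reject = two_sided_test_via_ci(145, 140, 8, 15, alpha=0.05)
print(f"95% CI: ({lo:.2f}, {hi:.2f}), reject H0: {reject}")
```

The 95% interval is roughly 140.95 to 149.05. Since 140 lies outside it, H0: μ = 140 is rejected at α = 0.05, the same decision the two-sided z-test gives for these data.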
