Spring 1998



Handout 7

Chapter 7: Statistical Intervals and

Chapter 8: Tests of Hypotheses Based on a Single Sample

A point estimate of a population characteristic is a single number that is based on sample data and represents a plausible value of the characteristic.

The best estimator, the minimum variance unbiased estimator (MVUE), is the unbiased statistic with the smallest standard deviation.

Since the point estimate is a single number, it does not provide information about the precision and reliability of estimation.

A confidence interval for a population characteristic (parameter) is an interval of plausible values for the characteristic. It is constructed so that, with a chosen degree of confidence, the value of the characteristic will be captured inside the interval. The confidence level, 1-α, associated with a confidence interval estimate is the success rate of the method used to construct the interval.

If we repeatedly sample from a population and calculate a confidence interval each time with the data available, then over the long run the proportion of the confidence intervals that actually contain the true value of the population characteristic will be 100(1-α)% (95%, 90%, or 99% for α=0.05, 0.10, or 0.01, respectively).
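The long-run interpretation can be illustrated with a small simulation. The sketch below is only an illustration: the population (normal with mean 50 and standard deviation 10), the sample size, the number of repetitions, and the function name `coverage` are all assumptions chosen for the demonstration, not values from the handout.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu=50.0, sigma=10.0, n=30, conf=0.95, trials=2000, seed=1):
    """Fraction of simulated z-intervals (sigma known) that capture mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # critical value z_{alpha/2}
    half_width = z * sigma / n ** 0.5              # same for every sample since sigma is known
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials

rate = coverage()
print(f"observed coverage over repeated samples: {rate:.3f}")
```

The observed proportion should be close to, but not exactly, the nominal 0.95, because only a finite number of samples is drawn.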

The general form of a confidence interval:

(point estimate for a specified statistic) ± (critical value) × (standard error of the point estimate).

What are the point estimators for the parameters μ, σ², and p? _____________

The Empirical Rule tells us that about 95% of all values of $\bar{x}$ will be within 1.96 standard deviations of the mean.

• 1-α when you compute a 95% confidence interval is 0.95

• α when you compute a 95% confidence interval is 0.05

• $z_{\alpha/2}$ when you compute a 95% confidence interval is 1.96

A hypothesis is a claim or statement either about the value of a single population characteristic or about the values of several population characteristics.

Statistical testing involves two complementary hypotheses:

H0: null hypothesis

Ha : alternative hypothesis

Both of them are based on population characteristics.

One-sided (One-tailed) test:

Lower tailed: H0: population characteristic ≥ claimed constant value

(Left-sided) Ha: population characteristic < claimed constant value

Upper tailed: H0: population characteristic ≤ claimed constant value

(Right-sided) Ha: population characteristic > claimed constant value

Two-sided (Two-tailed) test: H0: population characteristic = claimed constant value

Ha: population characteristic ≠ claimed constant value

Example 1: Identify the population characteristics (parameter) and then H0 and Ha for the following.

• The burning rate of propellant is an important product characteristic. Specifications require that the mean burning rate must be 50 cm/s.

• The sugar content of the syrup in canned peaches is normally distributed and the variance is thought to exceed 18 mg².

• Consider the defective circuit data. Test the claim that the fraction of defective units produced is less than 0.05.

• Pizza Hut, after test-marketing a new product called Bigfoot Pizza, concluded that the introduction of the Bigfoot nationwide would increase their average sales by more than their usual 14.

• A television manufacturer claims that at least 90% of its sets will need no service during the first three years of operation.

• Water samples are taken from water used for cooling as it is being discharged from a power plant into a river. It has been determined that as long as the mean temperature of the discharged water is at most 150°F, there will be no negative effects on the river's ecosystem.

Hypothesis testing involves two complementary actions or choices, reject H0 and fail to reject H0. A major concern in hypothesis testing is controlling the incidence of the two kinds of errors:

α = P(Type I error) = P(reject H0 when it is true)

β = P(Type II error) = P(fail to reject H0 when it is false)

1-β = 1-P(Type II error) = P(reject H0 when it is false)

α is also called the significance level and 1-β the power of the test.

The following table summarizes the decisions:

Acts

|Events |Fail to reject H0 |Reject H0 |

|H0 is true |Correct Decision |Type I error |

|H0 is false |Type II error |Correct Decision |

Test statistic: A function of the sample data on which the decision will be based.

Rejection region: The set of all test statistic values for which H0 will be rejected.

Example 2: The following hypotheses are to be tested: $H_0: \mu \le 100$ versus $H_a: \mu > 100$. Assume that the population standard deviation is σ=28 and the sample size is n=100. The following decision rule applies:

Fail to reject $H_0$ if $\bar{x} \le 104$

Reject $H_0$ if $\bar{x} > 104$

(a) Compute the type I error probability, α, when μ=100 and the type II error probability, β, when μ=110.

α=0.0764 and β=0.0162

(b) Change the sample size from 100 to 200. Recompute the type I error probability, α, when μ=100 and the type II error probability, β, when μ=110.

α=0.0217 and β=0.0012

(c) Are α and β in part (b) larger than α and β in part (a)?

How do we improve α and β?
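The error probabilities in Example 2 can be checked numerically. The sketch below assumes an upper-tailed rule that rejects H0 when the sample mean exceeds 104; that cutoff is not printed legibly in the handout and is inferred here because it reproduces the stated answers. The function name `error_probs` is made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

def error_probs(n, cutoff=104.0, mu0=100.0, mu_alt=110.0, sigma=28.0):
    """Return (alpha, beta) for the rule 'reject H0 if xbar > cutoff'.

    cutoff=104 is an assumption inferred from the handout's answers.
    """
    se = sigma / sqrt(n)                           # standard deviation of xbar
    alpha = 1 - NormalDist(mu0, se).cdf(cutoff)    # P(reject | mu = mu0)
    beta = NormalDist(mu_alt, se).cdf(cutoff)      # P(fail to reject | mu = mu_alt)
    return alpha, beta

alpha_100, beta_100 = error_probs(100)
alpha_200, beta_200 = error_probs(200)
print(f"n=100: alpha={alpha_100:.4f}, beta={beta_100:.4f}")
print(f"n=200: alpha={alpha_200:.4f}, beta={beta_200:.4f}")
```

Small differences from the handout's values in the last decimal place come from rounding z to two decimals when a printed normal table is used. With the cutoff held fixed, increasing n shrinks both error probabilities.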

Example 3 (Exercise 8.9): Two different companies have applied to provide cable TV in a certain region

p=proportion of all potential subscribers who favor the first company over the second one.

Test $H_0: p = 0.5$ versus $H_a: p \ne 0.5$ based on a random sample of 25 individuals

X: the number in the sample who favor the first company over the second one

a) Which of the following is the possible rejection region?

R1={x: x ≤ 7 or x ≥ 18}

R2={x: x ≤ 8}

R3={x: x ≥ 17}

b) What is the probability distribution of X?

c) What is the probability of type I error?

d) What is the probability of type II error when p=0.3?

e) What is the power when p=0.3?

f) What would you conclude if 6 out of 25 individuals favored company 1?

g) What would you conclude if 6 out of 25 individuals favored company 2?
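Parts (c) through (f) of Example 3 can be worked with exact binomial probabilities. The sketch below assumes the null value p=0.5 and the two-sided region {x ≤ 7 or x ≥ 18}; both are read from the garbled statement of the exercise and should be checked against the textbook. The helper `binom_pmf` is defined here and is not a library function.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 25
# Assumed two-sided rejection region: x <= 7 or x >= 18
reject = [x for x in range(n + 1) if x <= 7 or x >= 18]

alpha = sum(binom_pmf(x, n, 0.5) for x in reject)          # P(reject | p = 0.5)
beta = sum(binom_pmf(x, n, 0.3)
           for x in range(n + 1) if x not in reject)       # P(fail to reject | p = 0.3)
power = 1 - beta

print(f"alpha = {alpha:.4f}, beta(0.3) = {beta:.4f}, power(0.3) = {power:.4f}")
print("x = 6 falls in the rejection region:", 6 in reject)
```

Under these assumptions, an observed count of 6 lies in the rejection region, so H0 would be rejected in part (f).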

P-value: The probability that the test statistic will take on a value that is at least as extreme as the observed value of the statistic when the null hypothesis is true. It is the smallest level of significance at which the null hypothesis would be rejected. It is customary to call the test statistic (and the data) significant when the null hypothesis is rejected.

Population characteristics: Population mean, μ

μ0 is the claimed constant.

$\bar{x}$ is the sample mean.

$\sigma_{\bar{x}} = \sigma/\sqrt{n}$ and $s_{\bar{x}} = s/\sqrt{n}$ are the population and sample standard deviations of $\bar{x}$, respectively.

|Assumption |Normal population and σ is known |Unknown population and σ is unknown with a large sample size (n > 40) by CLT |Normal population and σ is unknown with a small sample size (n ≤ 40) |

|Test statistic |$z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ |$z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ |$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$ |

Decision can be made in one of the three ways:

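a. Let z* or t* be the computed test statistic values. Reject H0 when P-value ≤ α and fail to reject H0 when P-value > α.

| |if test statistic is z |if test statistic is t |

|Lower tailed test |P-value = P(z < z*) |P-value = P(t < t*) |

|Upper tailed test |P-value = P(z > z*) |P-value = P(t > t*) |

|Two tailed test |P-value = 2P(z > |z*|) = 2P(z < -|z*|) |P-value = 2P(t > |t*|) = 2P(t < -|t*|) |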

b. Rejection region for level α test:

| |if test statistic is z |if test statistic is t |

|Lower tailed test |$z \le -z_{\alpha}$ |$t \le -t_{\alpha;n-1}$ |

|Upper tailed test |$z \ge z_{\alpha}$ |$t \ge t_{\alpha;n-1}$ |

|Two tailed test |$|z| \ge z_{\alpha/2}$ |$|t| \ge t_{\alpha/2;n-1}$ |

c. Confidence interval for the population mean, μ

|Normal population, σ is known |Unknown population, σ is unknown, large sample (n > 40) by CLT |Normal population, σ is unknown, small sample (n ≤ 40) |

|$\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ |$\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$ |$\bar{x} \pm t_{\alpha/2;n-1}\,s/\sqrt{n}$ |

A large sample upper confidence bound for μ is $\mu < \bar{x} + z_{\alpha}\,s/\sqrt{n}$

A large sample lower confidence bound for μ is $\mu > \bar{x} - z_{\alpha}\,s/\sqrt{n}$

Choosing the sample size: Given the desired confidence level and interval width, we can determine the necessary sample size. Let X1, X2, ..., Xn be a random sample from a normal population with unknown population mean μ and known population standard deviation σ. The width of the interval is $w = 2 z_{\alpha/2}\,\sigma/\sqrt{n}$.

The bound on the error of estimation is $B = z_{\alpha/2}\,\sigma/\sqrt{n}$.

That is, $\bar{x}$ will be within $B$ of μ.

The sample size required to estimate a population mean μ to within $B$ with 100(1-α)% confidence is $n = (z_{\alpha/2}\,\sigma/B)^2 = (2 z_{\alpha/2}\,\sigma/w)^2$, rounded up to the next whole number.

Example 4: Each of the following is a confidence interval for the true average amount of time spent by patients using a physical therapy device: (10.90, 25.44), (13.58, 22.76). Both intervals are computed from the same sample data, the only difference being the confidence level.

(a) What is the value of the sample mean time spent by the patients using the physical therapy device?

(b) The confidence level for one of these intervals is 95% and for the other is 99%. Which of the intervals has the 95% confidence level and why?

Example 5: Suppose we want to estimate the average number of violent acts on TV per hour for a specific network. Data were collected by viewing a random selection of 50 prime time hours, and an average of 11.7 violent acts was recorded. Suppose it is known that σ=5 and the population distribution is normal.

The 95% CI for μ is (10.3141 , 13.0859)

The 95% confidence interval for μ if 100 prime time hours had been viewed, with the same mean and variance obtained, is (10.72 , 12.68)

The 90% CI for μ is (10.5368 , 12.8632)

The width of the 90% confidence interval for μ is 2.3264

The bound on the error of estimation of the 90% confidence interval for μ is 1.1632
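The Example 5 answers can be reproduced with a short script. This is a minimal sketch using the known-σ interval (sample mean plus or minus a standard normal critical value times σ/√n); the function name `z_interval` is made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf):
    """Two-sided confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # critical value z_{alpha/2}
    bound = z * sigma / sqrt(n)                    # bound on the error of estimation
    return xbar - bound, xbar + bound

ci95_n50 = z_interval(11.7, 5, 50, 0.95)
ci95_n100 = z_interval(11.7, 5, 100, 0.95)
ci90_n50 = z_interval(11.7, 5, 50, 0.90)
width90 = ci90_n50[1] - ci90_n50[0]

for name, ci in [("95%, n=50", ci95_n50), ("95%, n=100", ci95_n100), ("90%, n=50", ci90_n50)]:
    print(f"{name}: ({ci[0]:.4f}, {ci[1]:.4f})")
print(f"width of the 90% interval: {width90:.4f}")
```

The script uses unrounded critical values, so results can differ from the handout in the fourth decimal place, where table values such as 1.645 were used.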

Example 6: Investigators would like to estimate the average taxable income of apartment dwellers to within $500, using a 95% CI for the normally distributed data. Suppose that the previous studies show that standard deviation is $8000. How many people should they study? (Answer: 984)
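The answer to Example 6 follows from the sample size formula for a mean: square the quantity (critical value × σ / B) and round up. A minimal sketch, with a made-up function name:

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, bound, conf=0.95):
    """Smallest n so that xbar is within `bound` of mu at the given confidence."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # critical value z_{alpha/2}
    return ceil((z * sigma / bound) ** 2)          # always round up

n_needed = sample_size_for_mean(sigma=8000, bound=500, conf=0.95)
print("required sample size:", n_needed)
```

Rounding up rather than to the nearest integer guarantees that the bound is not exceeded.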

Example 7: The brightness of a television picture tube can be evaluated by measuring the amount of current required to achieve a particular brightness. An engineer has designed a tube that he believes will require 300 microamps of current to produce the desired brightness level. A sample of 10 tubes results in the average of 317.2 and the standard deviation 15.7. Data follows the normal distribution.

a) Using 95% confidence interval, did he achieve the desired brightness?

b) Hypothesize the belief and test to see if he achieved the desired brightness?

Specify the hypothesis

Show test statistics=3.46

Either compute the P-value (=between 0.002 and 0.01) or by looking at the rejection region, make a decision (Fail to reject H0/Reject H0)

Write the conclusion
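Example 7 can be checked as follows. The Python standard library has no t distribution, so the sketch takes the critical value 2.262 (upper 0.025 point of the t distribution with 9 degrees of freedom) from a printed t table as an input. It also assumes a two-sided alternative, since the engineer's belief is that the mean equals 300.

```python
from math import sqrt

n, xbar, s, mu0 = 10, 317.2, 15.7, 300.0
t_crit = 2.262          # t table value for area 0.025 and 9 df (assumed input)

se = s / sqrt(n)                        # estimated standard error of xbar
t_stat = (xbar - mu0) / se              # one-sample t statistic
ci = (xbar - t_crit * se, xbar + t_crit * se)
reject_h0 = abs(t_stat) >= t_crit       # two-sided test at level 0.05

print(f"t = {t_stat:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print("reject H0 at the 0.05 level:", reject_h0)
```

The interval and the test lead to the same decision: 300 lies outside the 95% interval exactly when the two-sided test rejects at the 0.05 level.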

Example 8: I want to see how long on average, it takes Drano to unclog a sink. In a recent commercial, the stated claim was that it takes on average, 15 minutes. I wanted to see if that claim was true, so I tested Drano on 64 randomly selected sinks. I found that it took an average of 18 minutes with standard deviation of 2.5 minutes. Was their claim false?

99% CI for μ is (17.1953 , 18.8047)

90% CI for μ is (17.4859 , 18.5141)

How would you answer this using the hypothesis testing?

Specify the hypothesis

Show test statistics=9.6

Either compute the P-value (=0) or by looking at the rejection region, make a decision (Fail to reject H0/Reject H0)

Write the conclusion

Would my answer be different if I tested Drano on 25 randomly selected sinks and I found that it took an average of 18 minutes with standard deviation of 2.5 minutes?

99% CI for μ is (16.6015 , 19.3985)

90% CI for μ is (17.1445 , 18.8555)

How would you answer this using the hypothesis testing?

Specify the hypothesis

Show test statistics= 6

Either compute the P-value (≈ 0) or by looking at the rejection region, make a decision (Fail to reject H0/Reject H0)

Write the conclusion

Example 10: Determine the confidence level for each of the following large sample one-sided confidence bounds.

(a) Upper bound: $\bar{x} + 0.93\,s/\sqrt{n}$ (Answer: 0.8238)

(b) Lower bound: $\bar{x} - 1.75\,s/\sqrt{n}$ (Answer: 0.9599)

Would your answer be different in small samples?

Example 11: An aptitude test has been used to test the ability of fourth graders to reason quantitatively. The test is supposed to be calibrated so that the scores are normally distributed with a mean of 50 and standard deviation of 10. It is suspected that the mean score is now higher than 50, although σ remains the same. The suspicion may be tested based on a sample of students who have been exposed to a certain amount of computer-assisted learning. If the test is administered to a random sample of 500 fourth graders and the sample mean is found to be 51.07, is the suspicion confirmed? Does your decision and conclusion change if the significance level is 0.01 or 0.05?

Specify the hypothesis

Show test statistics=2.39

Either compute the P-value or by looking at the rejection region, make a decision (Fail to reject H0/Reject H0)

Write the conclusion

Example 12: The average breaking strength of yarn used in manufacturing drapery material is required to be at least 100 psi. Past experience indicated that the standard deviation of breaking strength is 2 psi. A random sample of nine specimens is tested, and the average breaking strength is found to be 98 psi.

a. Should the fiber be judged acceptable?

Specify the hypothesis

Show test statistics=-3

Either compute the P-value (=0.0013) or by looking at the rejection region, make a decision (Fail to reject H0/Reject H0)

Write the conclusion

Does your decision and conclusion change if the significance level is 0.01 or 0.05?

b. Find a 95% two sided C.I on the true mean breaking strength.

(96.6933 , 99.3067)
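The z tests in Examples 11 and 12 can be verified with a small helper. This is a sketch for the known-σ case; the function name `z_test` and the tail labels are made up for the illustration, and the tails are chosen from the wording of each example (mean "higher than 50" is upper tailed; strength falling below the required 100 psi is lower tailed).

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, tail):
    """One-sample z test; tail is 'lower', 'upper', or 'two'."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if tail == "lower":
        p_value = cdf(z)
    elif tail == "upper":
        p_value = 1 - cdf(z)
    else:
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

z11, p11 = z_test(51.07, 50, 10, 500, "upper")   # Example 11
z12, p12 = z_test(98, 100, 2, 9, "lower")        # Example 12

print(f"Example 11: z = {z11:.2f}, P-value = {p11:.4f}")
print(f"Example 12: z = {z12:.2f}, P-value = {p12:.4f}")
```

Comparing each P-value with 0.05 and with 0.01 answers the question of whether the decision depends on the significance level.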

A General Large Sample Confidence Interval

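When the estimator $\hat{\theta}$ satisfies the following properties,

a. The estimator has approximately a normal distribution

b. It is at least approximately unbiased

c. The standard deviation of the estimator, $\sigma_{\hat{\theta}}$, is known

the confidence interval for θ can be constructed as $\hat{\theta} \pm z_{\alpha/2}\,\sigma_{\hat{\theta}}$, where $z_{\alpha/2}$ is the standard normal critical value with $P(Z > z_{\alpha/2}) = \alpha/2$.

Example: a large sample confidence interval for the parameter λ of a Poisson distribution is $\bar{x} \pm z_{\alpha/2}\sqrt{\bar{x}/n}$, where $\bar{x}$ estimates both the mean and the variance of the distribution, which are each equal to λ.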

Population characteristics: Population proportion, p

p0 is the claimed constant and $\sigma_{\hat{p}} = \sqrt{p_0(1-p_0)/n}$ is the standard deviation of $\hat{p}$ when H0 is true.

Test statistic: $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$.

Check that np0 ≥ 10 and n(1-p0) ≥ 10 before using this test statistic.

Decision can be made in one of the three ways:

| |a. Let z be the computed test statistic value. Reject H0 when P-value ≤ α and fail to reject H0 when P-value > α. |b. Rejection region for level α test: reject H0 if |

|Lower tailed test |P-value = $P(Z \le z)$ |$z \le -z_{\alpha}$ |

|Upper tailed test |P-value = $P(Z \ge z)$ |$z \ge z_{\alpha}$ |

|Two-tailed test |P-value = $2P(Z \ge |z|)$ |$|z| \ge z_{\alpha/2}$ |

c. If n is sufficiently large, a 100(1-α)% large sample confidence interval for p is $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$, where $\hat{q} = 1 - \hat{p}$.

Check that $n\hat{p} \ge 10$ and $n\hat{q} \ge 10$ to see whether the sample is large enough to use this confidence interval. Otherwise, formula (7.10) in your textbook can be used without checking the sample size; it is valid for both large and small samples.

There is a table on page 341 of your textbook (336 for 5th edition) which gives you the formulas for computing type II error probabilities with fixed type I error probability and computing sample size when you know the probability of both errors.

Small sample tests will be discussed in class.

Choosing the sample size: Given the desired confidence level and interval width, we can determine the necessary sample size. The bound on the error of estimation is $B = z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$, with $\hat{q} = 1 - \hat{p}$. That is, $\hat{p}$ will be within $B$ of p. The sample size required to estimate a population proportion p to within an amount $B$ with 100(1-α)% confidence is $n = z_{\alpha/2}^2\,\hat{p}\hat{q}/B^2$. The same formula can be written using the interval width $w = 2B$: then $n = 4 z_{\alpha/2}^2\,\hat{p}\hat{q}/w^2$.

The conservative sample size is found by setting $\hat{p} = \hat{q} = 0.5$.

What is different in one-sided confidence intervals? Discussion

Example 13: We are interested in proportion of all students enrolled in Stat211 who listen to country music. Using our class as random sample from Stat211 students, we see that ___________ out of ___________ listen to country music. Estimate the true proportion of all Stat211 students that listen to country music using 90% confidence interval.

What parameter are we estimating?_______________

Example 14: Scripps News service reported that 4% of the members of the American Bar Association (ABA) are African American. Suppose that this figure is based on a random sample of 400 ABA members.

(a) Is the sample size large enough to justify the use of the large-sample confidence interval for a population proportion?

(b) Construct and interpret a 90% confidence interval for the true proportion of all ABA members who are African American. (Answer: (0.0239 , 0.0561))

Example 15: I want to estimate the proportion of freshmen Aggies who will drop out before graduation. How many Aggies should I include in my study in order to estimate p within 0.05 with 95% confidence? (Answer: 385)
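The answers to Examples 14 and 15 can be reproduced with the large-sample formulas for a proportion. A minimal sketch; the function names are made up for the illustration, and Example 15 uses the conservative guess of 0.5 for the unknown proportion.

```python
from math import ceil, sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, conf):
    """Large-sample confidence interval for a population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    bound = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - bound, p_hat + bound

def sample_size_for_proportion(bound, conf=0.95, p_guess=0.5):
    """Smallest n so that p_hat is within `bound` of p; p_guess=0.5 is conservative."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / bound ** 2)

# Example 14: 4% of a sample of 400; large-sample check first
n14, p14 = 400, 0.04
large_enough = n14 * p14 >= 10 and n14 * (1 - p14) >= 10
ci14 = proportion_interval(p14, n14, 0.90)

# Example 15: estimate p to within 0.05 with 95% confidence
n15 = sample_size_for_proportion(0.05, 0.95)

print("large-sample condition met:", large_enough)
print(f"90% CI: ({ci14[0]:.4f}, {ci14[1]:.4f})")
print("sample size for Example 15:", n15)
```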

Example 16: Drug testing of job applicants is becoming increasingly common. The Associated Press reported that 12.1% of those tested in California tested positive. Suppose that this figure had been based on a sample of size 600, with 73 testing positive. Does this support a claim that more than 10% of job applicants in California test positive for drug use?

Specify the hypothesis

Show test statistics=1.77

Either compute the P-value (=0.0384) or by looking at the rejection region, make a decision (Fail to reject H0/Reject H0)

Write the conclusion

Does your decision and conclusion change if the significance level is 0.01 or 0.05?

Example 17: Let p denote the probability that a coin will land heads side up. The coin is tossed 50,000 times and 25,250 heads result. Using a significance level of 0.05, would you reject the assertion that the coin is fair?

Specify the hypothesis

Show test statistics=2.24

Either compute the P-value (=0.025) or by looking at the rejection region, make a decision (Fail to reject H0/Reject H0)

Write the conclusion

95% confidence interval for p is (0.5006 , 0.5094)

99% confidence interval for p is (0.4992 , 0.5108)

Does your decision and conclusion change if the significance level is 0.01?

Example 18: A random sample of 50 suspension helmets used by motorcycle riders and automobile race-car drivers was subjected to an impact test, and on 18 of these helmets some damage was observed. Do the data support the claim that the true proportion of helmets of this type that would show damage from this test is less than 0.50, using α=0.05? Does your decision and conclusion change if the significance level is 0.01?

Specify the hypothesis

Show test statistics=-1.98

Either compute the P-value (=0.0239) or by looking at the rejection region, make a decision (Fail to reject H0/Reject H0)

Write the conclusion
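The test statistics and P-values quoted in Examples 16 through 18 can be checked with one helper. This is a sketch of the large-sample z test for a proportion; the function name and tail labels are made up, and the tail for each example is chosen from its wording ("more than 10%" is upper tailed, "fair coin" is two tailed, "less than 0.50" is lower tailed).

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(x, n, p0, tail):
    """Large-sample z test for a proportion; tail is 'lower', 'upper', or 'two'."""
    p_hat = x / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # standard error uses the null value p0
    cdf = NormalDist().cdf
    if tail == "lower":
        p_value = cdf(z)
    elif tail == "upper":
        p_value = 1 - cdf(z)
    else:
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

z16, p16 = proportion_z_test(73, 600, 0.10, "upper")
z17, p17 = proportion_z_test(25250, 50000, 0.50, "two")
z18, p18 = proportion_z_test(18, 50, 0.50, "lower")

for label, z, p in [("Example 16", z16, p16), ("Example 17", z17, p17), ("Example 18", z18, p18)]:
    print(f"{label}: z = {z:.2f}, P-value = {p:.4f}, "
          f"reject at 0.05: {p <= 0.05}, reject at 0.01: {p <= 0.01}")
```

In all three examples the P-value falls between 0.01 and 0.05, so the decision changes when the significance level is moved from 0.05 to 0.01.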

A Prediction Interval for a Single Future Value:

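Let X1, X2, ..., Xn be a random sample from a normal population distribution, and suppose we wish to predict the value of Xn+1, a single future observation. A 100(1-α)% prediction interval for Xn+1 is $\bar{x} \pm t_{\alpha/2;n-1}\,s\sqrt{1 + 1/n}$, where $t_{\alpha/2;n-1}$ is the t critical value with n-1 degrees of freedom.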

Example 19: What is the 99% prediction interval for the weight change of an individual student from the population distribution in example 9? (Answer: (-4.3992 , 5.2992))

Tolerance Intervals: Let k be a number between 0 and 100. A tolerance interval for capturing at least k% of the values in a normal distribution with a confidence level 100(1-α)% has the form $\bar{x} \pm (\text{tolerance critical value}) \cdot s$

Table A.6 gives the tolerance critical values for k=90, 95, 99 and α=0.05, 0.01 in one and two-sided intervals.

Example 20: Use example 9 and calculate an interval that includes at least 95% of the student weight changes in the population distribution using a confidence level of 99%. (Answer: (-5.355 , 6.255))

Population characteristics: Population Variance, σ², or Standard Deviation, σ:

The population of interest is normal, so that X1, X2, ..., Xn constitutes a random sample from a normal distribution with parameters μ and σ².

$\sigma_0^2$ and $\sigma_0$ are the claimed constants for the population variance and the standard deviation, respectively.

$s^2$ is the sample variance.

Test statistic: $\chi^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$

Decision can be made in one of the two ways:

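(a) Let $\chi^2$ be the computed test statistic value. Reject H0 for level α test if

|Lower tailed test |Upper tailed test |Two-tailed test |

|$\chi^2 \le \chi^2_{1-\alpha;n-1}$ |$\chi^2 \ge \chi^2_{\alpha;n-1}$ |$\chi^2 \le \chi^2_{1-\alpha/2;n-1}$ or $\chi^2 \ge \chi^2_{\alpha/2;n-1}$ |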

There is also P-value method which will be discussed in class.

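(b) The population of interest is normal, so that X1, X2, ..., Xn constitutes a random sample from a normal distribution with parameters μ and σ². Then the random variable

$\dfrac{(n-1)S^2}{\sigma^2}$ has a chi-squared ($\chi^2$) probability distribution with n-1 degrees of freedom. A 100(1-α)% confidence interval for σ² is $\left( \dfrac{(n-1)s^2}{\chi^2_{\alpha/2;n-1}},\ \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2;n-1}} \right)$, where $\chi^2_{\alpha/2;n-1}$ and $\chi^2_{1-\alpha/2;n-1}$ are the chi-squared critical values with upper-tail areas α/2 and 1-α/2.

The details of the chi-squared ($\chi^2$) probability distribution will be discussed in class and the table of critical values (Table A.7) will be demonstrated.

A 100(1-α)% upper confidence bound for σ² is $(n-1)s^2/\chi^2_{1-\alpha;n-1}$.

A 100(1-α)% lower confidence bound for σ² is $(n-1)s^2/\chi^2_{\alpha;n-1}$.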

Example 21: Determine the following:

(a) The 95th percentile for the chi-squared distribution with n=20.

(b) The 5th percentile for the chi-squared distribution with n=20.

(c) P(10.117 ≤ $\chi^2$ ≤ 30.143) where $\chi^2$ is a chi-squared r.v. with n=20.

(d) P([pic]35.478) where [pic] is a chi-squared r.v. with n=22.

Example 22: An automatic filling machine is used to fill bottles with liquid detergent. A random sample of 20 bottles results in a sample variance of fill volume of 0.0153 (fluid ounces)². If the variance of fill volume exceeds 0.01 (fluid ounces)², an unacceptable proportion of bottles will be underfilled and overfilled. Is there evidence in the sample data to suggest that the manufacturer has a problem with underfilled and overfilled bottles? Use a significance level of 0.05 and assume that fill volume has a normal distribution to answer the question. Does your decision and conclusion change if the significance level is 0.01?

Specify the hypothesis

Show test statistics=29.07

By looking at the rejection region, make a decision (Fail to reject H0/Reject H0)

Write the conclusion

95% confidence interval for the variance of the fill volume is (0.0089,0.0326)

99% confidence interval for the variance of the fill volume?

Example 23 (Exercise 7.46, the 6th edition and Exercise 7.44, the 5th edition):

a) Is it plausible to assume that the data come from a normal population distribution?

Variable n Mean Median TrMean StDev SE Mean

turbidity 15 25.313 25.800 25.438 1.579 0.408

Variable Minimum Maximum Q1 Q3

turbidity 21.700 27.300 24.100 26.700

b) Calculate a 95% CI for the population standard deviation of turbidity.

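95% CI for σ is $\left( \sqrt{(n-1)s^2/\chi^2_{0.025;14}},\ \sqrt{(n-1)s^2/\chi^2_{0.975;14}} \right)$ = (1.16 , 2.49)

where $\chi^2_{0.025;14}$=26.119 and $\chi^2_{0.975;14}$=5.629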
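The interval in Example 23(b) can be recomputed from the summary statistics. The standard library has no chi-squared quantile function, so this sketch takes the two critical values quoted above (26.119 and 5.629, for 14 degrees of freedom) as inputs; the variable names are made up for the illustration.

```python
from math import sqrt

n, s = 15, 1.579                       # sample size and StDev from the summary output
chi2_upper, chi2_lower = 26.119, 5.629 # chi-squared critical values quoted above (14 df)

ss = (n - 1) * s ** 2                  # (n-1)s^2
var_ci = (ss / chi2_upper, ss / chi2_lower)     # interval for the variance
sd_ci = tuple(sqrt(v) for v in var_ci)          # square roots give the interval for sigma

print(f"95% CI for the variance: ({var_ci[0]:.3f}, {var_ci[1]:.3f})")
print(f"95% CI for the standard deviation: ({sd_ci[0]:.2f}, {sd_ci[1]:.2f})")
```

Note that the larger critical value gives the lower endpoint, because it appears in the denominator.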

Discussion on finding the confidence interval for the linear combination of the population means

Example 24 (Exercise 7.53, the 6th edition; Exercise 7.51, the 5th edition): Four different groves of fruit trees are selected for experimentation. The first three groves are sprayed with pesticides and the fourth is treated with ladybugs. We would like to measure the difference in true average yields between treatment with pesticides and treatment with ladybugs. Compute the 95% CI for $\theta = \frac{1}{3}(\mu_1 + \mu_2 + \mu_3) - \mu_4$ where $\mu_i$ is the ith true average yield.

|Treatment |$n_i$ |$\bar{x}_i$ |$s_i$ |

|1 (pesticide) |100 |10.5 |1.5 |

|2 (pesticide) |90 |10.0 |1.3 |

|3 (pesticide) |100 |10.1 |1.8 |

|4 (ladybugs) |120 |10.7 |1.6 |

$\hat{\theta} = \frac{1}{3}(\bar{x}_1 + \bar{x}_2 + \bar{x}_3) - \bar{x}_4$=-0.5

$V(\hat{\theta}) = \frac{1}{9}\left( \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2} + \dfrac{\sigma_3^2}{n_3} \right) + \dfrac{\sigma_4^2}{n_4}$

Estimated $V(\hat{\theta})$=0.0295

95% CI for θ is $\hat{\theta} \pm z_{0.025}\sqrt{\hat{V}(\hat{\theta})}$=(-0.8366 , -0.1634)
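The Example 24 computation can be sketched as a linear combination of independent sample means. The reading of the table columns as sample size, sample mean, and sample standard deviation, and the contrast (average of the three pesticide means minus the ladybug mean), are assumptions inferred from the printed answers, which they reproduce.

```python
from math import sqrt
from statistics import NormalDist

# (n_i, xbar_i, s_i) for each grove; column meanings are an assumed reading of the table
groves = [(100, 10.5, 1.5), (90, 10.0, 1.3), (100, 10.1, 1.8), (120, 10.7, 1.6)]
coeffs = [1 / 3, 1 / 3, 1 / 3, -1]     # average of pesticide groves minus ladybug grove

theta_hat = sum(c * xbar for c, (_, xbar, _) in zip(coeffs, groves))
# Variance of a linear combination of independent means: sum of c_i^2 * s_i^2 / n_i
var_hat = sum(c ** 2 * s ** 2 / n for c, (n, _, s) in zip(coeffs, groves))

z = NormalDist().inv_cdf(0.975)
bound = z * sqrt(var_hat)
ci = (theta_hat - bound, theta_hat + bound)

print(f"theta_hat = {theta_hat:.4f}, estimated variance = {var_hat:.4f}")
print(f"95% CI: ({ci[0]:.4f}, {ci[1]:.4f})")
```

Because the whole interval lies below zero, the data suggest a lower average yield under pesticide treatment than under ladybug treatment.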

ADDITIONAL INFORMATION

Confidence Interval for a Population Mean, μ

(1) Let X1, X2, ..., Xn be a random sample from a normal population with unknown population mean μ and known population standard deviation σ. Then a 100(1-α)% confidence interval for μ is $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$, where $z_{\alpha/2}$ is the standard normal critical value with $P(Z > z_{\alpha/2}) = \alpha/2$.

Thus, in 95% of all possible samples, μ will be captured in the following calculated confidence interval: $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$

(2) Large Sample Confidence Interval for μ: Let X1, X2, ..., Xn be a random sample from a population distribution with mean μ and standard deviation σ. For a large sample size n, the CLT implies that $\bar{X}$ has approximately a normal distribution for any population distribution. The value of the population standard deviation σ may not be known; instead, the value of the sample standard deviation s may be known. If n is sufficiently large (n>40), a 100(1-α)% large sample confidence interval for μ is $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$, where $z_{\alpha/2}$ is the standard normal critical value with $P(Z > z_{\alpha/2}) = \alpha/2$.

Thus, in 95% of all possible samples, μ will be captured in the following calculated confidence interval: $\bar{x} \pm 1.96\,s/\sqrt{n}$

(3) Small Sample Confidence Interval for μ: When the sample size is small (n≤40), we have to make specific assumptions to find the confidence intervals.

Assumption: The population of interest is normal, so that X1, X2, ..., Xn constitutes a random sample from a normal distribution with both μ and σ unknown.

When $\bar{X}$ is the sample mean of a random sample of size n from a normal distribution with mean μ, the random variable $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has a probability distribution called a t-distribution with n-1 degrees of freedom (Properties of t-distribution: discussion and t-distribution, Table A.5).

A 100(1-α)% confidence interval for μ is $\bar{x} \pm t_{\alpha/2;n-1}\,s/\sqrt{n}$, where $t_{\alpha/2;n-1}$ is the t critical value with n-1 degrees of freedom.

Thus, in 95% of all possible samples, μ will be captured in the following calculated confidence interval: $\bar{x} \pm t_{0.025;n-1}\,s/\sqrt{n}$
