SECTION 6 - Purdue University



Chapter 6 Sections 6.1, 6.2, 6.3

Statistical inference is the next topic we will cover in this course. We have been preparing for this by 1) describing and analyzing data (graphs and plots, descriptive statistics), 2) discussing ways to find and/or generate data (studies, samples, experiments), and 3) studying sampling distributions. We are now ready for statistical inference.

The purpose of statistical inference is to draw conclusions about population parameters based on data which came from a random sample or from a randomized experiment.

Data (statistics) are used to infer population parameters.

We will learn in this chapter the two most prominent types of statistical inference,

• confidence intervals for estimating the value of a population mean and

• tests of significance which weigh the evidence for a claim concerning the population mean.

Section 6.1

Estimating the population mean, µ, with a stated confidence:

A confidence interval for the population mean, µ, includes a point estimate and a margin of error.

• The point estimate is a single statistic calculated from a random sample of units. For example, Xbar, the sample mean, is a point estimate of µ, the population mean. However, sample means fluctuate, so we need to adjust our point estimate by adding and subtracting a margin of error, thus creating a confidence interval.

The following general formula may be used when the population sigma is known:

___ % Confidence Interval: Xbar ± margin of error, where

Margin of error = Z* ( σ / sqrt(n) )

For the Normal Distribution the following values for Z* apply:

For a 90% Confidence Interval Z* = 1.645

For a 95% Confidence Interval Z* = 1.960

For a 99% Confidence Interval Z* = 2.576

Example:

You want to estimate the population mean SAT Math score for the high school seniors in California. You give a test to a simple random sample of 500 high school seniors in CA. The mean score for your sample is Xbar = 461. The population standard deviation is known to be σ = 100.

For the SAT Math scores, a 95% confidence interval for µ would be:

461 ± 1.960 ( 100 / sqrt(500) ) = 461 ± 8.77 = (452.23, 469.77)

This confidence interval was calculated using a procedure which gives a “correct” interval (one that contains the population mean) 95% of the times it is used.
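As a check, the interval can be computed with a short script using only the standard library; the helper name ci_mean_known_sigma is ours, not from the notes:

```python
from math import sqrt

def ci_mean_known_sigma(xbar, sigma, n, z_star):
    """Confidence interval for mu when sigma is known: xbar +/- z* * sigma / sqrt(n)."""
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin

# SAT Math example: Xbar = 461, sigma = 100, n = 500, 95% confidence
low, high = ci_mean_known_sigma(461, 100, 500, 1.960)
print(round(low, 2), round(high, 2))  # 452.23 469.77
```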

Another example:

1. Suppose we wish to estimate µ, the population mean driving time between Lafayette and Indianapolis. We select a SRS of n = 25 drivers. The observed sample mean is Xbar = 1.10 hours. Let’s assume that we know the population standard deviation of X is σ = 0.5 hours.

A 95% confidence interval for µ would be:

1.10 ± 1.960 ( 0.5 / sqrt(25) ) = 1.10 ± 0.196 = (0.904, 1.296)

Again, the procedure gives a “correct” interval 95% of the times it is used.

Suppose we selected another sample of 25 driving times and obtained Xbar = 1.00 hours. If we calculate a 95% confidence interval for µ based on this sample mean, we get (0.804, 1.196), which is a different interval estimate.

Which interval is correct? We don’t know.

If we repeatedly selected SRSs of 25 drivers and, for each SRS, calculated a 95% confidence interval for µ, the population mean, then in the long run 95% of the intervals would contain the true value of µ. They will all be different, but 95% of them will include the true mean and will therefore be considered correct; the other 5% will not include the true mean. We have no way to know whether a given confidence interval is “correct” or “incorrect”.
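Both driving-time intervals can be reproduced with a few lines, assuming σ = 0.5 hours (the value consistent with the interval (0.804, 1.196) above):

```python
from math import sqrt

# Driving-time example: n = 25 drivers, sigma assumed to be 0.5 hours
z_star, sigma, n = 1.960, 0.5, 25
for xbar in (1.10, 1.00):
    m = z_star * sigma / sqrt(n)   # margin of error = 0.196
    print(f"Xbar = {xbar}: ({xbar - m:.3f}, {xbar + m:.3f})")
```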

Confidence level vs Width of Confidence Interval:

Suppose X, Bob’s golf scores, has a normal distribution with unknown population mean, but we believe the population standard deviation is σ = 3. A SRS of n = 16 units is selected and a sample mean of Xbar = 77 is observed.

a. Calculate a 90% confidence interval for µ. Use Z* = 1.645

b. Calculate a 95% confidence interval for µ. Use Z* = 1.960

c. Calculate a 99% confidence interval for µ. Use Z* = 2.576

As you can see from these calculations, raising the confidence level requires a larger Z* value, which increases the margin of error and produces a wider confidence interval. There is a trade-off between the precision of our estimate and the confidence we have in the result.

Higher confidence level requires a wider interval.
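For parts a–c above, the three intervals can be computed as follows (a short sketch using only the standard library):

```python
from math import sqrt

# Bob's golf scores: Xbar = 77, sigma = 3, n = 16, so standard error = 3/4 = 0.75
xbar, sigma, n = 77, 3, 16
for level, z_star in (("90%", 1.645), ("95%", 1.960), ("99%", 2.576)):
    m = z_star * sigma / sqrt(n)
    print(f"{level}: ({xbar - m:.2f}, {xbar + m:.2f}), width = {2*m:.2f}")
```

Each higher confidence level uses a larger Z*, so the printed widths increase.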

The margin of error also depends on sample size.

A larger sample size will result in a smaller margin of error. In fact, quadrupling the sample size will cut the margin of error in half.

Calculating the sample size for a desired margin of error:

The confidence interval for a population mean will have a specified margin of error m when the sample size is

n = ( Z* σ / m )^2

Example:

1. You are planning a survey of starting salaries for recent liberal arts major graduates from your college. From a pilot study you estimate that the population standard deviation is about $8000. What sample size do you need to develop a 95% confidence interval with a margin of error of $500 maximum?

Note: Always round a sample size with any decimals up to the next whole number. Never drop the decimals and round down.
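The salary-survey calculation can be sketched as follows (the function name sample_size is illustrative, not from the notes):

```python
from math import ceil

def sample_size(z_star, sigma, m):
    """Smallest n giving margin of error at most m: n = (z* sigma / m)^2, rounded up."""
    return ceil((z_star * sigma / m) ** 2)

# Salary survey: sigma about $8000, margin at most $500, 95% confidence
# (1.96 * 8000 / 500)^2 = 31.36^2 = 983.45, which rounds up to 984
print(sample_size(1.960, 8000, 500))  # 984
```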

Some Cautions:

• The above formulas do not correct the data for any unknown bias. Consequently, if the data are biased, then ANY inferences based on those data are also biased. This includes biases arising from nonresponse, undercoverage, response error, or hidden bias in experiments.

• Because the sample mean is not resistant, confidence intervals are not resistant to outliers.

• When the population being sampled is not normally distributed, the sample size needs to be at least 30 in order for the sample mean to be approximately normally distributed. This is the Central Limit Theorem. Always plot the data to check normality.

• Typically we do not know the population standard deviation, σ. When σ is not known we will use the t procedures, which will be introduced in Chapter 7.

Interpretation Of A Confidence Interval:

Any value in a confidence interval is considered a possible value for µ, including the end points. Any value not included in the confidence interval is considered an unlikely value for µ.

Section 6.2 TESTS OF SIGNIFICANCE (HYPOTHESIS TESTING)

The second type of statistical inference is a significance test which assesses evidence provided by data regarding some claim about the population mean.

Based on a random sample from the population, we want to determine whether the population mean has changed upward, downward, or in either direction. Because the null hypothesis represents the established or accepted mean value, we use the data to determine, statistically, whether we can reject the null hypothesis in favor of the alternative hypothesis.

The four steps for a Test of Significance/Hypothesis Tests:

Step 1. State the Null and Alternative Hypotheses:

Null Hypothesis H0: The statement being tested in a statistical test is called the null hypothesis. The test is designed to assess the strength of the evidence against the null hypothesis. Usually the null hypothesis is a statement of “no effect”, “no difference”, or “status quo”.

H0: µ = µ0

Alternative Hypothesis Ha: The claim about the population mean that we are trying to find evidence for. Choose one of the following hypotheses.

Ha: µ > µ0 (one-sided, right)

Ha: µ < µ0 (one-sided, left)

Ha: µ ≠ µ0 (two-sided)

Step 2. Find the test statistic:

If µ0 is the value of the population mean µ specified by the null hypothesis, the one-sample z statistic is

z = (Xbar − µ0) / ( σ / sqrt(n) )

Step 3. Calculate the p-value.

For one-sided left tests: the P value = P(Z ≤ z), the area in the left tail.

For one-sided right tests: the P value = P(Z ≥ z), the area in the right tail.

For two-sided tests: the P value = 2P(Z ≥ |z|), the combined area in the right and left tails.

Step 4. State conclusions in terms of the problem. The value of α, usually stated in the problem, defines how much evidence we require to reject H0. Compare the p-value to the α level.

If p-value ≤ α, then reject H0. Strong evidence exists against H0.

If p-value > α, then fail to reject H0. Insufficient evidence exists against H0.

Generally the value chosen for α is one of the following three:

• α = .01 strictest burden of proof of these three values

• α = .05

• α = .10 easiest burden of proof of these three values.

Your conclusion should be in this form:

If the p-value ≤ α, we say that we have sufficient evidence to reject the null hypothesis in favor of the alternative hypothesis, using the words of the original problem.

If the p-value > α we say that we do not have sufficient evidence to reject the null hypothesis, using the words of the original problem.

Even though Ha is what we hope or believe to be true, our test gives evidence for or against H0 only. We never prove H0 true; we can only state whether we have enough evidence to reject H0 (which is evidence in favor of Ha, but not proof that Ha is true) or that we don’t have enough evidence to reject H0.
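The four steps can be sketched as a small function; NormalDist from the standard library supplies the normal CDF, and the name z_test and its 'less'/'greater'/'two-sided' labels are our own conventions, not from the notes:

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alternative):
    """One-sample z statistic and P value; alternative is 'less', 'greater', or 'two-sided'."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    phi = NormalDist().cdf
    if alternative == "less":
        p = phi(z)                  # left-tail area
    elif alternative == "greater":
        p = 1 - phi(z)              # right-tail area
    else:
        p = 2 * (1 - phi(abs(z)))   # both tails
    return z, p

# Step 4's decision rule: reject H0 when p <= alpha.
# Bob's golf scores (Example 1 below): Xbar = 75, mu0 = 77, sigma = 3, n = 9
z, p = z_test(75, 77, 3, 9, "less")
print(round(z, 2), round(p, 4))  # -2.0 0.0228
```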

Example 1, A one sided hypothesis test:

Bob’s golf scores are historically normally distributed with µ = 77 strokes and σ = 3 strokes. Bob has recently made two “improvements” to his game, and he thinks his scores should be lower.

Bob has played 9 rounds since these improvements. His scores are:

77 73 74 78 78 75 75 74 71 Sample Mean = 75

Do these data provide sufficient evidence to conclude that Bob’s population mean is reduced, i.e., µ < 77?

Null Hypothesis: Ho: µ = 77 Status Quo or Established Norm

Alternate Hypothesis: Ha: µ < 77 Improvement

Since we know that Bob’s golf scores are normally distributed, the sampling distribution of the sample mean of 9 rounds must also be normally distributed.

The standard deviation of Xbar is σ / sqrt(n) = 3 / sqrt(9) = 1 stroke.

The logic of the hypothesis test:

If H0 is true, then Xbar has a normal distribution with mean 77 and standard deviation 1.

If H0 is false and Ha is true, then Xbar has a normal distribution with some mean µ < 77 and standard deviation 1.

• Values of Xbar close to 77 would tend to support H0, and values that are much lower than 77 would provide evidence against H0 and in favor of Ha.

• From the sample of Bob’s last 9 scores we get a sample mean of Xbar = 75. Can we conclude that we should reject H0 in favor of Ha?

• We need to calculate the P-value. Assuming that H0 is true, we calculate the probability P(Xbar ≤ 75) = P(Z ≤ (75 − 77)/1) = P(Z ≤ −2) = .0228.

• The calculation says that if H0 is true, the probability that Xbar would be ≤ 75 solely due to random chance is .0228, or 2.28%.

• For this example let us use α = .05.

• Because the probability of obtaining a sample mean of 75 or lower is less than α, we reject H0 and conclude it is more likely that µ < 77.

Example 2.

What if Bob had only obtained the first 5 scores? In this case, we get a sample mean of Xbar = 76 and a standard deviation of Xbar of 3 / sqrt(5) ≈ 1.342. Then the P value = P(Z ≤ (76 − 77)/1.342) = P(Z ≤ −0.75) = .2266, which is much greater than α, meaning that an Xbar value of 76 or lower could occur by chance alone 22.66% of the time when H0 is true. This is not strong evidence that the population mean has changed.

• So we would fail to reject H0.

Example 3, A two sided hypothesis test:

Bob has a driver’s license that gives his weight as 190 pounds. Bob’s license is coming up for renewal. Let’s test whether Bob’s weight is different from 190 pounds using a test of significance. Let’s assume that Bob’s weight is approximately normally distributed with a population standard deviation of 3 pounds. Bob’s last four weekly weights are:

193 194 192 191 Sample Mean = 192.5 = Xbar

We ask if these data provide sufficient evidence to say that Bob’s weight has changed. (Two-sided hypothesis wording, because a direction is not implied.)

Null Hypothesis: Ho: µ = 190 No change

Alternate Hypothesis: Ha: µ ≠ 190 Changed. Two-sided hypothesis

Again, the parameter value specified in the null hypothesis usually represents no change, or the status quo. The suspected change in the parameter value is stated by the alternative hypothesis.

Calculate Z: (192.5-190) / (3/sqrt(4)) = 1.67

Calculate P Value: Tail probability = .0475

For two-sided tests we must double the tail probability.

P Value = 2 ( .0475) = .0950 which is the probability in both tails.

Reach a conclusion using α = .05: Since the P value is greater than α, we have insufficient evidence to reject H0. We lack sufficient evidence to say Bob’s weight has changed; it could still be 190. The above sample mean could have occurred by random chance with probability .0950, or 9.50% of the time.

What if α had been .10 instead of .05? We would then have sufficient evidence to reject H0 and say Bob’s weight has changed.
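Example 3 can be recomputed directly; without table rounding the two-sided P value comes out near 0.0956 (the notes’ .0950 comes from rounding z to 1.67 before using the table):

```python
from math import sqrt
from statistics import NormalDist

# Bob's weight: Xbar = 192.5, mu0 = 190, sigma = 3, n = 4
z = (192.5 - 190) / (3 / sqrt(4))    # 2.5 / 1.5, about 1.67
p = 2 * (1 - NormalDist().cdf(z))    # two-sided P value, about 0.0956
for alpha in (0.05, 0.10):
    print(alpha, "reject H0" if p <= alpha else "fail to reject H0")
```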

Example 4:

1. A shipment of machined parts has a critical dimension that is normally distributed with mean 12 centimeters and standard deviation 0.1 centimeters. The acceptance sampling team suspects that the dimension is less than 12 centimeters. They take a simple random sample of 25 of these parts and obtain a mean of 11.99. Is the acceptance sampling team correct in its assertion? Use an α level of 0.01.
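A sketch of one way to work this exercise: z = (11.99 − 12)/(0.1/sqrt(25)) = −0.5, so the one-sided P value is P(Z ≤ −0.5) ≈ .3085, far above α = 0.01:

```python
from math import sqrt
from statistics import NormalDist

# H0: mu = 12 vs Ha: mu < 12, with sigma = 0.1, n = 25, Xbar = 11.99
z = (11.99 - 12) / (0.1 / sqrt(25))   # -0.01 / 0.02 = -0.5
p = NormalDist().cdf(z)               # left-tail P value, about 0.3085
print("reject H0" if p <= 0.01 else "fail to reject H0")  # fail to reject H0
```

So the data do not support the team’s suspicion at the α = 0.01 level.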

Confidence Intervals and Two-Sided Tests:

A level α two-sided significance test rejects the null hypothesis H0: µ = µ0 exactly when the value µ0 falls outside a level 1 − α confidence interval for µ.

Example using the weight on Bob’s drivers license:

Bob’s data: 193 194 192 191 Sample Mean = 192.5

It was assumed that σ = 3 for the population of Bob’s weights.

Calculate 95% Confidence Interval:

192.5 ± 1.960 ( 3 / sqrt(4) )

(189.56 , 195.44)

In Example 3, a two-sided hypothesis test using α = .05 failed to reject H0, meaning that Bob’s weight could still be 190. You can see that this result is consistent with the 95% confidence interval above, since 190 is included in the confidence interval.

If we repeated this example and calculated a 90% confidence interval we would get

(190.03, 194.97)

If α = .10, the hypothesis test would reject H0, meaning that Bob’s weight is different from 190. You can see that this result is consistent with the 90% confidence interval, since 190 is not included in the interval.

A two-sided hypothesis test will reject H0 when a confidence interval does not include µ0, provided that α and the confidence level are equivalent:

A 99% confidence level is equivalent to α = .01

A 95% confidence level is equivalent to α= .05

A 90% confidence level is equivalent to α= .10
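The equivalence can be checked numerically with Bob’s weight data (Xbar = 192.5, σ = 3, n = 4); this sketch recomputes both intervals and the matching test decisions:

```python
from math import sqrt

xbar, mu0, sigma, n = 192.5, 190, 3, 4
for level, z_star, alpha in (("95%", 1.960, 0.05), ("90%", 1.645, 0.10)):
    m = z_star * sigma / sqrt(n)
    low, high = xbar - m, xbar + m
    inside = low <= mu0 <= high   # mu0 in the CI <=> fail to reject H0 at alpha
    print(f"{level} CI: ({low:.2f}, {high:.2f}) ->",
          "fail to reject H0" if inside else "reject H0", f"at alpha = {alpha}")
```

The 95% interval contains 190 (fail to reject at α = .05), while the 90% interval does not (reject at α = .10), matching the test results above.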

Section 6.3

Use and Abuse of Tests:

Choosing a Level of Significance:

If we want to make a decision based on our test, we choose a level of significance in advance. We do not have to do this, however, if we are only interested in describing the strength of our evidence. If we do choose a level of significance in advance, we must choose α by asking how much evidence is required to reject H0. The choice of α depends on the type of study we are doing. If a value for α is not given, use α = .05.

Some Cautions about Statistical tests:

• As with CI’s, badly designed surveys or experiments often produce invalid results. Formal statistical inference cannot correct basic flaws in data collection.

• As with CI’s, tests of significance are based on laws of probability. Random sampling or random assignment of subjects to treatments ensures that these laws apply.

• Statistical significance is not the same thing as practical significance.

• There is no sharp border between “significant” and “non-significant”, only increasingly strong evidence as the P-value gets smaller.

• It is possible that a non-significant result is due to the sample size being too small. Larger sample sizes are capable of detecting smaller shifts.
