CHAPTER 9—POINT AND INTERVAL ESTIMATION



STATISTICS 301—APPLIED STATISTICS, Statistics for Engineers and Scientists, Walpole, Myers, Myers, and Ye, Prentice Hall

Goal: In this section we will investigate the concept of “Estimation” in which our goal is to use sample information (assumed to be a random sample from the population of interest) to arrive at a reasonable guess of a population parameter. Estimation is done in two ways—point estimation (or single value) and interval estimation (an interval or range of likely values).

INTERVAL ESTIMATION (aka CONFIDENCE INTERVALS)

The advantage of point estimation and point estimates is their simplicity—a single number. However, this simplicity has a price. Consider the following.

In a follow-up to the Dean’s request about the proportion of MU undergrads who plan to attend graduate school, he checks with another faculty member who also collects data. This faculty member reports to the Dean that 78.45% of the students he has asked plan to attend graduate school. Whom should the Dean believe, since the results differ: me (recall my estimate was p̂ = 0.602) or the other faculty member? In other words, what DON’T point estimates tell you about the estimate and sample?

This is the downside of point estimates: they give no sense of how large the sample is or how variable the estimate is. We know that the spread of every sampling distribution depends on the sample size: smaller samples yield sampling distributions with larger spread, and larger samples yield less variable sampling distributions.

What would the SE of my estimate of p, the proportion of MU students going to grad school, be?

SE(0.602) = √[(0.602)(1 − 0.602)/123] = 0.044

What would the SE of the other faculty member’s estimate of p be? What do you need to know? Suppose n = 30.

SE(0.7845) = √[(0.7845)(1 − 0.7845)/30] = 0.075

The SE of our estimate is a little over ½ of the other (0.044/0.075 ≈ 0.59)!

Hence, if we are only given the estimate, the accuracy of point estimates is not evident!
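The two standard errors above can be checked with a few lines of Python. This is only a sketch: the function name se_proportion is ours, and n = 30 for the second sample is the supposition made above, not a reported value.

```python
from math import sqrt

def se_proportion(p_hat: float, n: int) -> float:
    """Estimated standard error of a sample proportion: sqrt(p_hat(1 - p_hat)/n)."""
    return sqrt(p_hat * (1 - p_hat) / n)

# My estimate (n = 123) versus the other faculty member's (n = 30 is supposed).
se_mine = se_proportion(0.602, 123)
se_other = se_proportion(0.7845, 30)

print(f"SE(0.602)  = {se_mine:.3f}")
print(f"SE(0.7845) = {se_other:.3f}")
print(f"ratio      = {se_mine / se_other:.2f}")
```

The point estimates alone (0.602 and 0.7845) carry none of this information; the sample sizes are needed to see which estimate is more precise.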

Just as we can obtain point estimates for every population parameter we have discussed thus far, we can also obtain Confidence Intervals for these parameters. However, we will only give the Confidence Interval (CI), or interval estimate, for the population mean and proportion (and later for the difference between two means and two proportions).

Defn: Most interval estimates for parameters are of the form:

Point Estimate of Parameter ± Multiplier * SE(Point Estimate)

PE ± Multiplier * SE(PE)

or [ PE - M * SE(PE), PE + M * SE(PE) ]

where the Multiplier is an upper percentile point from the sampling distribution of the point estimator used.

Hence to form an interval estimate or confidence interval for a parameter we need:

1. A point estimate of the parameter,

2. the distribution of the point estimator,

3. and an estimate of the Standard Error of the point estimate.
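The general form PE ± Multiplier * SE(PE) translates directly into code. The sketch below is illustrative only: the function name interval_estimate and the numbers passed to it are hypothetical, chosen just to show the arithmetic.

```python
def interval_estimate(pe: float, multiplier: float, se: float) -> tuple[float, float]:
    """Return [PE - M*SE(PE), PE + M*SE(PE)] as a (lower, upper) pair."""
    margin = multiplier * se
    return pe - margin, pe + margin

# Hypothetical inputs: point estimate 10.0, multiplier 2.0, standard error 0.5.
lower, upper = interval_estimate(10.0, 2.0, 0.5)
print(lower, upper)
```

Every interval in this chapter is an instance of this one function; only the point estimate, the multiplier, and the standard error change.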

CONFIDENCE INTERVAL FOR POPULATION MEAN (μpopln or μ)

Using our basic confidence interval form for the population mean we know that:

1. Our point estimate of μ is x̄.

2. The SE of x̄ is σ/√n (≈ s/√n if σ were unknown, which is the “usual” case!).

3. Lastly, the distribution of x̄ depends on several things:

a. If the population is Normal and σ is known, then the Multiplier is a z value.

b. If n is large, then x̄ is approximately Normal and our Multiplier is a z value.

c. And if n is small, σ is unknown, and the population is Normal, the Multiplier is a t value with n – 1 degrees of freedom.

Thm: If X1, X2, …, Xn are a random sample from a population with mean = μ, variance = σ², then a (1 − α)100% confidence interval for μ is:

i. x̄ ± z(α/2) · σ/√n if the population is Normally distributed and σ is known

ii. x̄ ± z(α/2) · s/√n if n is large (n > 30)

iii. x̄ ± t(α/2, n − 1) · s/√n if n is small, σ is unknown, & the population is Normal.
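The z multipliers in cases i and ii are upper α/2 percentile points of the standard Normal distribution. As a sketch (the helper name z_multiplier is ours), they can be obtained from Python's standard library instead of a table. The t multiplier in case iii is not in the standard library, so a t table or statistical software is still needed for it.

```python
from statistics import NormalDist

def z_multiplier(confidence: float) -> float:
    """Upper alpha/2 point of the standard Normal, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z(alpha/2) = {z_multiplier(level):.3f}")
```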

EXAMPLE #1

Recall our Milky Way candy data in which we found that the average weight of the 40 candy bars was 59.97 grams with a standard deviation of 1.92 grams. Find a 95% confidence interval for the mean weight of all Milky Way candy bars.

Parameter: μ = mean weight of all Milky Way candy bars

Point Estimate: x̄ =

Standard Error of our Point Estimate: σ/√n, but since σ is unknown we use s/√n =

α value: Since 95% = (1 − α)100%, α =

Multiplier: Since n is large, our multiplier is z(α/2) =

x̄ ± z(α/2) · s/√n =

Our 95% confidence interval for the mean weight of all Milky Way candy bars is
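One way to check your work on this example is the short sketch below. It uses only the summary numbers given in the problem (n = 40, average 59.97, standard deviation 1.92) and the large-sample z form; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 40, 59.97, 1.92        # summary statistics from the example
confidence = 0.95
alpha = 1 - confidence

se = s / sqrt(n)                               # estimated SE of the sample mean
z = NormalDist().inv_cdf(1 - alpha / 2)        # multiplier z(alpha/2)
margin = z * se
lower, upper = xbar - margin, xbar + margin

print(f"SE = {se:.4f}, z = {z:.3f}, margin = {margin:.3f}")
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```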

Example #2: Exercise 9.6 from WMMY 8th page 286

A random sample of 50 college students yields a sample average height of 174.5 cm and a standard deviation of 6.9 cm. Obtain a 98% CI for the mean height of college students.

CONFIDENCE INTERVAL INTERPRETATION

We just found that a 95% CI for μ = mean weight of all Milky Way candy bars was (59.4, 60.6).

Now some True/False questions.

T F a. The probability that μ is in the CI is 95%.

T F b. The probability that x̄ is in the CI is 95%.

T F c. The probability that μ is in the CI is either 0% or 100%.

T F d. 95% of all such CI’s contain μ.

T F e. We can conclude that μ is closer to the center of the CI than the ends.

T F f. 95% of all candy bars weigh between (59.4, 60.6 gms).

Before we answer these T/F, here are some more questions:

Is μ a constant or does it vary?

Is σ² known or unknown?

Based on our RS of n = 40, what is the distribution of x̄ and what does it look like?

If we took a different sample of Milky Way candy bars, would we get the same 95% CI?

Would μ change?

Would x̄ change?

Would s change?

Would z(α/2) change?

[Figure: confidence intervals from repeated samples]

1. For each CI, is μ in the interval? So what’s the probability μ is in any ONE CI?

2. What % of ALL CI’s contain μ?

3. What would the population distribution look like?
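The repeated-sampling questions above can be explored with a small simulation. This is a sketch under stated assumptions: we pretend the population is Normal with μ = 60 and σ = 1.9 (hypothetical values chosen to resemble the candy data; the true μ and σ are unknown), draw many samples of n = 40, and count how often the 95% interval x̄ ± z(α/2) · s/√n contains μ.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def coverage_rate(mu, sigma, n, confidence, reps, seed=301):
    """Fraction of simulated large-sample z intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)                 # changes from sample to sample
        se = stdev(sample) / sqrt(n)        # so does s, and hence the SE
        if xbar - z * se <= mu <= xbar + z * se:
            hits += 1                       # this particular CI contains mu
    return hits / reps

# Hypothetical population: Normal(mu = 60, sigma = 1.9).
rate = coverage_rate(mu=60.0, sigma=1.9, n=40, confidence=0.95, reps=2000)
print(f"Proportion of intervals containing mu: {rate:.3f}")
```

Each individual interval either contains μ or it does not; the "95%" describes the long-run proportion of intervals that do.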

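We just found that a 95% CI for μ = mean weight of all Milky Way candy bars was (59.4, 60.6 gms).

True/False Answers:

a. False: The probability that μ is in the CI is 95%. (μ is a constant, so this particular interval either contains it or it does not.)

b. False: The probability that x̄ is in the CI is 95%. (x̄ is the center of the interval, so it is always in the CI.)

c. True: The probability that μ is in the CI is either 0% or 100%.

d. True: 95% of all such CI’s contain μ.

e. False: We can conclude that μ is closer to the center of the CI than the ends. (The CI does not tell us where within the interval μ lies.)

f. False: 95% of all candy bars weigh between (59.4, 60.6 gms). (The CI is a statement about μ, not about individual candy bars.)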

[pic]

What is(are) the population(s)?

What is(are) the parameter(s)?

1995:

2006:

CONFIDENCE INTERVAL FOR POPULATION PROPORTION, DIFFERENCE BETWEEN TWO MEANS, & DIFFERENCE BETWEEN TWO PROPORTIONS

We present the CI’s forms for the above three different cases in the following theorems, then present several examples.

Thm: If X1, X2, …, Xn are a random sample from a population with proportion p, then a (1 − α)100% confidence interval for p is

p̂ ± z(α/2) · √(p̂(1 − p̂)/n) if np > 5 and n(1 − p) > 5 OR np̂ > 5 and n(1 − p̂) > 5.

Thm: Let x̄1 and s1 and x̄2 and s2 be the sample average and sample standard deviation, respectively, of two independent random samples of sizes n1 and n2, respectively, from two populations with means μ1 and μ2; then a (1 − α)100% confidence interval for (μ1 − μ2) is

(x̄1 − x̄2) ± t(α/2, df) · √(s1²/n1 + s2²/n2).

Thm: If p̂1 and p̂2 are sample proportions from two independent random samples of size n1 and n2 from two populations with proportions p1 and p2, then a (1 − α)100% confidence interval for (p1 − p2) is

(p̂1 − p̂2) ± z(α/2) · √(p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2) if n1p̂1 > 5, n1(1 − p̂1) > 5, n2p̂2 > 5, and n2(1 − p̂2) > 5.

EXAMPLE #1 Underweight Milky Way Candy Bars

Let’s let p be the proportion of “vending-sized” Milky Way candy bars that are below the stated Net Weight of 58.1 grams.

|Candy Wgt |

|62.2 |59.7 |60.7 |58.6 |57.1 |60.2 |62.1 |60.3 |

|59.6 |61.6 |59.1 |61.5 |61.5 |60.5 |59.6 |60.7 |

|60.4 |64.5 |61.3 |59.9 |64.6 |61.3 |58.3 |60.0 |

|59.7 |57.4 |57.2 |58.6 |61.6 |59.2 |57.1 |58.2 |

|62.4 |56.0 |58.4 |61.9 |59.5 |59.7 |58.7 |57.7 |

We find that 6 of the 40 candy bars weighed less than 58.1 grams. Our point estimate of the proportion of underweight Milky Ways is p̂ = 6/40 = 0.15. Let’s also obtain a 95% CI for p. Checking to ensure our sample size is large enough:

1. n(p̂) = 40(0.15) = 6 > 5

AND

2. n(1 − p̂) = 40(1 − 0.15) = 34 > 5!

So our 95% CI for p is

0.15 ± 1.96 · √(0.15(1 − 0.15)/40) = 0.15 ± 0.111 = (0.039, 0.261)

We can then conclude, with a very high degree of confidence (95%!), that between 4% and 26% of Milky Way candy bars are underweight.
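The count, the sample-size check, and the interval can all be reproduced from the raw weights. The sketch below uses the 40 weights as they appear in the SAS DATALINES later in these notes; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

weights = [
    62.2, 59.7, 60.7, 58.6, 57.1, 60.2, 62.1, 60.3,
    59.6, 61.6, 59.1, 61.5, 61.5, 60.5, 59.6, 60.7,
    60.4, 64.5, 61.3, 59.9, 64.6, 61.3, 58.3, 60.0,
    59.7, 57.4, 57.2, 58.6, 61.6, 59.2, 57.1, 58.2,
    62.4, 56.0, 58.4, 61.9, 59.5, 59.7, 58.7, 57.7,
]
stated_net_weight = 58.1

n = len(weights)
underweight = sum(w < stated_net_weight for w in weights)
p_hat = underweight / n

# Large-sample check: both counts should exceed 5.
large_enough = n * p_hat > 5 and n * (1 - p_hat) > 5

z = NormalDist().inv_cdf(0.975)                  # z(alpha/2) for 95%
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"{underweight} of {n} underweight, p_hat = {p_hat:.2f}, check passed: {large_enough}")
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```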

Do you believe MW’s claim that no candy bar that is less than 58.1 gm goes out of the assembly line? Why?

Example #2: Phone Battery Data

Lithium Ion Batteries: 9.75 10.17 11.77 11.77 11.87 11.90 12.12 12.15 12.18 12.24 12.51 12.9 13.15 13.16 13.61 13.63 13.63 13.66 13.75 13.81 13.86 14.15 14.2 14.25 14.42 14.57 14.84 14.92 14.93 14.95 15.63 15.78 15.9 16.06 16.25 16.42 16.43 16.46 16.82 17.04 17.08 17.58 17.65 17.85 17.9

Nickel Metal Hydride: 10.08 11.98 12.19 12.36 12.37 12.4 12.45 12.46 12.54 12.59 12.68 12.85 12.88 13.06 13.07 13.18 13.18 13.35 13.38 13.47 13.48 13.5 13.51 13.52 13.67 13.83 13.85 13.86 13.9 14.02 14.05 14.07 14.15 14.19 14.49 14.53 14.59 14.61 14.81 14.85 14.99 15.01 15.04 15.07 15.10 15.22 15.28 15.3 15.38 15.53 15.54 15.59 15.72

nLI = 45, x̄LI = 14.348, sLI = 2.0693, se(x̄LI) = 2.0693/√45 = 0.3085

nNIMH = 53, x̄NIMH = 13.826, sNIMH = 1.1819, se(x̄NIMH) = 1.1819/√53 = 0.1623

(1 − α)100% CI for μ1 − μ2 is (x̄1 − x̄2) ± t(α/2, df) · √(s1²/n1 + s2²/n2).

Obtain a 90% CI for μNIMH − μLI, using SE(x̄NIMH − x̄LI) = √(0.3085² + 0.1623²) = 0.349 and t(0.05, 68) = 1.668:

(13.826 – 14.348) ± t(0.05, df) · 0.349

⇒ (13.826 – 14.348) ± t(0.05, 68) · 0.349

⇒ −0.522 ± 1.668 · 0.349

⇒ −0.522 ± 0.582

⇒ [ −1.104, 0.060 ]
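As a check on the arithmetic, the sketch below recomputes the standard error of the difference and the approximate (Satterthwaite) degrees of freedom from the summary statistics. Two assumptions are ours: the df formula is the usual unequal-variance approximation (its result agrees with the 67.4 reported in the SAS output), and the multiplier 1.668 is typed in from a t table for the upper 0.05 point with 68 df, since Python's standard library has no t quantile function.

```python
from math import sqrt

# Summary statistics from the example.
n_li, mean_li, s_li = 45, 14.348, 2.0693
n_nimh, mean_nimh, s_nimh = 53, 13.826, 1.1819

v_li = s_li**2 / n_li          # squared SE of the Lithium Ion mean
v_nimh = s_nimh**2 / n_nimh    # squared SE of the NiMH mean

se_diff = sqrt(v_li + v_nimh)
# Satterthwaite approximation to the degrees of freedom (assumed formula).
df = (v_li + v_nimh) ** 2 / (v_li**2 / (n_li - 1) + v_nimh**2 / (n_nimh - 1))

t_mult = 1.668                 # assumed table value: t(0.05, 68)
diff = mean_nimh - mean_li
lower, upper = diff - t_mult * se_diff, diff + t_mult * se_diff

print(f"SE of difference = {se_diff:.4f}, df = {df:.1f}")
print(f"90% CI for mu_NIMH - mu_LI: [{lower:.3f}, {upper:.3f}]")
```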

Interpretation?

Example #3: Hair Color & Pain Threshold Data

Light Blonde: 62 60 71 55 48

Dark Brunette: 32 39 51 30 35

nLB = 5, x̄LB = 59.2, sLB = 8.5264, se(x̄LB) = 8.5264/√5 = 3.8131

nDB = 5, x̄DB = 37.4, sDB = 8.3247, se(x̄DB) = 8.3247/√5 = 3.7229

(1 − α)100% CI for μ1 − μ2 is (x̄1 − x̄2) ± t(α/2, df) · √(s1²/n1 + s2²/n2).

SE(x̄LB − x̄DB) = √(8.5264²/5 + 8.3247²/5) = 5.3292

Obtain a 99% CI for μLB − μDB: (59.2 – 37.4) ± t(0.005, df) · 5.3292

⇒ (59.2 – 37.4) ± t(0.005, 8) · 5.3292

⇒ 21.8 ± 3.355 · 5.3292

⇒ 21.8 ± 17.8795

⇒ [ 3.92, 39.68 ]
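The same calculation can be run from the raw pain-threshold scores. In this sketch the multiplier 3.355 is the table value t(0.005, 8) used above (Python's standard library has no t quantile function), and the degrees of freedom are computed with the Satterthwaite approximation, which is our assumption about where df = 8 comes from.

```python
from math import sqrt
from statistics import mean, stdev

light_blonde = [62, 60, 71, 55, 48]
dark_brunette = [32, 39, 51, 30, 35]

def squared_se(sample):
    """Squared standard error of the sample mean: s^2 / n."""
    return stdev(sample) ** 2 / len(sample)

v_lb, v_db = squared_se(light_blonde), squared_se(dark_brunette)
se_diff = sqrt(v_lb + v_db)
# Satterthwaite approximation to the degrees of freedom (assumed formula).
df = (v_lb + v_db) ** 2 / (
    v_lb**2 / (len(light_blonde) - 1) + v_db**2 / (len(dark_brunette) - 1)
)

t_mult = 3.355                       # table value t(0.005, 8) from the notes
diff = mean(light_blonde) - mean(dark_brunette)
lower, upper = diff - t_mult * se_diff, diff + t_mult * se_diff

print(f"difference = {diff:.1f}, SE = {se_diff:.4f}, df = {df:.2f}")
print(f"99% CI for mu_LB - mu_DB: [{lower:.2f}, {upper:.2f}]")
```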

Interpretation?

Example #4: Example 9.6 from WMMY 8th page 289

Compare the gas mileage of two car types (compact and sub-compact). We have two independent RS’s with summary information:

nSC = 75, x̄SC = 42, sSC = 8

nC = 50, x̄C = 36, sC = 6

Obtain a 96% CI for μC − μSC.
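A sketch for checking your answer follows. It assumes that, because both samples are large (75 and 50), a z multiplier can stand in for the t multiplier; with samples this size the two are nearly identical. Variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

n_sc, mean_sc, s_sc = 75, 42.0, 8.0    # sub-compact
n_c, mean_c, s_c = 50, 36.0, 6.0       # compact

confidence = 0.96
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # assumed: z in place of t

diff = mean_c - mean_sc
se_diff = sqrt(s_c**2 / n_c + s_sc**2 / n_sc)
lower, upper = diff - z * se_diff, diff + z * se_diff

print(f"z = {z:.3f}, SE = {se_diff:.4f}")
print(f"96% CI for mu_C - mu_SC: ({lower:.2f}, {upper:.2f})")
```

Since the whole interval lies below zero, the data suggest the compact cars have the lower mean mileage.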

Example #5: Exercise 9.65 from WMMY 8th page 305

Compare the proportion of females and males with a certain minor blood disorder. We have independent RS’s of size 1,000 and found 275 females with the disorder and 250 males with the disorder.

Obtain a 95% confidence interval for the difference in proportions.
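The two-proportion interval can be checked the same way. The sketch below takes the difference as females minus males (our choice of direction; the exercise statement above does not fix it) and verifies the sample-size conditions first.

```python
from math import sqrt
from statistics import NormalDist

n_f, x_f = 1000, 275     # females: sample size, number with the disorder
n_m, x_m = 1000, 250     # males

p_f, p_m = x_f / n_f, x_m / n_m

# Sample-size conditions from the theorem: all four counts must exceed 5.
conditions_ok = min(x_f, n_f - x_f, x_m, n_m - x_m) > 5

z = NormalDist().inv_cdf(0.975)                  # z(alpha/2) for 95%
se_diff = sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)
diff = p_f - p_m
lower, upper = diff - z * se_diff, diff + z * se_diff

print(f"difference = {diff:.3f}, SE = {se_diff:.4f}")
print(f"95% CI for p_F - p_M: ({lower:.4f}, {upper:.4f})")
```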

Example #6:

Example #7:

Confidence Interval for Difference of Two Means

Non-Independent Samples—What’s the Effect?

(9.44 WMMY 8th) A taxi company is trying to decide whether to purchase Brand A or Brand B tires for its fleet of taxis. A tire from each brand is assigned at random to the rear wheels of 8 taxis and the following distances, in km, recorded until a tire had only 1/8” of tread remaining.

|Taxi |1 |2 |3 |4 |5 |6 |7 |8 |n |Average |ST Dev |

|Brand A |34,400 |45,500 |36,700 |32,000 |48,400 |32,800 |38,100 |30,100 |8 |37,250 |6546.7549 |

|Brand B |36,700 |46,800 |37,700 |31,100 |47,800 |36,400 |38,900 |31,500 |8 |38,362.5 |6181.0627 |

Are these two samples independent?

Now let’s calculate SE(x̄A − x̄B) assuming independent samples. Recall that SE(x̄A − x̄B) = √(sA²/nA + sB²/nB).

The problem is that since the samples are NOT independent, our SE(x̄A − x̄B) could either over-estimate or under-estimate the true standard error!

Confidence Intervals for Difference of Two Means

Paired Data Case

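Thm: Assuming a sample of “n” paired observations (x1i, y2i), a (1 − α)100% CI for μ1 − μ2 is

d̄ ± t(α/2, n − 1) · sd/√n

where di = (x1i − y2i), and d̄ and sd are the sample average and sample standard deviation of the differences di.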

|Taxi |1 |2 |3 |4 |5 |6 |7 |8 |n |Average |ST Dev |

|Brand A |34,400 |45,500 |36,700 |32,000 |48,400 |32,800 |38,100 |30,100 |8 |37,250 |6546.7549 |

|Brand B |36,700 |46,800 |37,700 |31,100 |47,800 |36,400 |38,900 |31,500 |8 |38,362.5 |6181.0627 |

|Difference |-2,300 |-1,300 |-1,000 |900 |600 |-3,600 |-800 |-1,400 |8 |-1112.5 |1454.4881 |

Hence, for our data, a 95% CI for the difference in mileage for the two Brands of tires (μA − μB) is:
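The sketch below works the paired interval from the raw taxi data and, for comparison, computes the standard error that the independent-samples formula would have given. The multiplier 2.365 is an assumed table value for t(0.025, 7), typed in because Python's standard library has no t quantile function; the variable names are ours.

```python
from math import sqrt
from statistics import mean, stdev

brand_a = [34400, 45500, 36700, 32000, 48400, 32800, 38100, 30100]
brand_b = [36700, 46800, 37700, 31100, 47800, 36400, 38900, 31500]
n = len(brand_a)

# Paired analysis: work with the per-taxi differences d_i = A_i - B_i.
diffs = [a - b for a, b in zip(brand_a, brand_b)]
d_bar = mean(diffs)
se_paired = stdev(diffs) / sqrt(n)

# What the independent-samples formula would report for the same data.
se_independent = sqrt(stdev(brand_a) ** 2 / n + stdev(brand_b) ** 2 / n)

t_mult = 2.365                       # assumed table value: t(0.025, 7)
lower, upper = d_bar - t_mult * se_paired, d_bar + t_mult * se_paired

print(f"mean difference = {d_bar}, paired SE = {se_paired:.2f}")
print(f"independent-samples SE = {se_independent:.2f}")
print(f"95% CI for mu_A - mu_B: ({lower:.1f}, {upper:.1f})")
```

For these data the independent-samples formula gives a standard error roughly six times larger than the paired one, because taxi-to-taxi variation is common to both tires on a taxi and cancels in the differences.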

NOTES AND COMMENTS ON CONFIDENCE INTERVALS

1. While n > 30 will work well in most instances, larger sample sizes are needed if the population is known to be severely skewed in either direction. If the population is symmetric or approximately so, then CI’s for the mean (μ) based on samples of size 30 are adequate.

2. INTERPRETATION OF CI’S: A 95% CI would be interpreted as follows:

We are 95% confident that the parameter of interest falls somewhere within the stated interval.

Notice that we do NOT say, “The probability is 95% that the parameter of interest falls somewhere within the stated interval” since this is not true. Hence avoid using the term “probability” in the interpretation of CI’s.

3. The Degree of Confidence of the CI is a statement about how sure or confident we are in our CI. The higher the degree of confidence, the more certain we are with our statement; the lower the degree of confidence the less sure we are. While higher confidence in general is better, the sacrifice is a wider CI and hence more possible values for the parameter. One usually uses 90%, 95%, or 99% in most cases.

What would a 100% confidence interval be? How informative is it?
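The trade-off between confidence and width in note 3 can be seen numerically. This sketch reuses the Milky Way summary statistics (n = 40, s = 1.92) and the large-sample z multiplier; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

n, s = 40, 1.92                     # Milky Way summary statistics
se = s / sqrt(n)

margins = {}
for confidence in (0.90, 0.95, 0.99, 0.999):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margins[confidence] = z * se    # half-width of the interval
    print(f"{confidence:.1%} confidence: multiplier {z:.3f}, half-width {margins[confidence]:.3f}")
```

As the confidence level approaches 100%, the multiplier grows without bound, which is a hint toward the answer to the question above.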

4. A CI is a statement about the value of a population parameter. It does NOT say anything about what percent or proportion of the population falls in the interval. Hence for a 95% CI, you can NOT conclude “95% of the population falls within the CI.” Rather, it is an interval in which the population parameter is likely to lie.

5. The Margin of Error of a confidence interval is the ½ width of the confidence interval. So for our candy bar example, since the 95% confidence interval was [ 59.97 ± 0.595 ], the Margin of Error would be 0.595.

The Margin of Error provides some evidence of how large an “error” is involved with our estimate, that is, how far our estimate may be from the true parameter.

6. If the degree of confidence is not stated, it’s assumed to be 95%. So if a Margin of Error is given with no indication of the degree of confidence, assume it is 95%.

USING SAS TO OBTAIN CONFIDENCE INTERVALS

Recall we found that a 95% CI for μ = mean weight of all Milky Way candy bars was (59.4, 60.6).

SAS CI for a Single Mean

OPTIONS LS=110 PS=60 PAGENO=1 NODATE FORMDLIM='+';

TITLE 'CI.SAS';

TITLE2 'EXAMPLE OF ONE AND TWO SAMPLE CI OF MEANS IN SAS';

TITLE3 'MILKY WAY WGT DATA FROM CLASS';

DATA MWDATA;

INPUT MW_WGT @@; DATALINES;

62.2 59.7 60.7 58.6 57.1 60.2 62.1 60.3

59.6 61.6 59.1 61.5 61.5 60.5 59.6 60.7

60.4 64.5 61.3 59.9 64.6 61.3 58.3 60.0

59.7 57.4 57.2 58.6 61.6 59.2 57.1 58.2

62.4 56.0 58.4 61.9 59.5 59.7 58.7 57.7

;

PROC TTEST DATA= MWDATA ALPHA=0.05;

VAR MW_WGT;

RUN;

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

CI.SAS 1

EXAMPLE OF ONE AND TWO SAMPLE CI OF MEANS IN SAS

MILKY WAY WGT DATA FROM CLASS

The TTEST Procedure

Statistics

Lower CL Upper CL Lower CL Upper CL

Variable N Mean Mean Mean Std Dev Std Dev Std Dev Std Err Minimum Maximum

MW_WGT 40 59.351 59.965 60.579 1.573 1.9203 2.4657 0.3036 56 64.6

T-Tests

Variable DF t Value Pr > |t|

MW_WGT 39 197.50 <.0001

Variable Method Variances DF t Value Pr > |t|

Time Pooled Equal 96 -1.56 0.1214

Time Satterthwaite Unequal 67.4 -1.50 0.1387

Equality of Variances

Variable Method Num DF Den DF F Value Pr > F

Time Folded F 44 52 3.07 0.0001

SAS CI of the Difference of Two Means—Paired Data

DATA PAIRED;

TITLE3 'PAIRED TIRE DATA';

INPUT BRANDA BRANDB;

DIFF = BRANDA-BRANDB;

DATALINES;

34400 36700

45500 46800

36700 37700

32000 31100

48400 47800

32800 36400

38100 38900

30100 31500

;

PROC TTEST DATA=PAIRED;

VAR DIFF;

RUN;

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

CI.SAS 3

EXAMPLE OF ONE AND TWO SAMPLE CI OF MEANS IN SAS

PAIRED TIRE DATA

The TTEST Procedure

Statistics

Lower CL Upper CL Lower CL Upper CL

Variable N Mean Mean Mean Std Dev Std Dev Std Dev Std Err Minimum Maximum

DIFF 8 -2328 -1113 103.48 961.67 1454.5 2960.3 514.24 -3600 900

T-Tests

Variable DF t Value Pr > |t|

DIFF 7 -2.16 0.0673

Approximate 95% Margin of Error for proportions is 1/√n …

So MoE is

[pic]
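The 1/√n rule of thumb follows because p̂(1 − p̂) is largest at p̂ = 0.5, so the 95% margin of error 1.96 · √(p̂(1 − p̂)/n) can never exceed 1.96(0.5)/√n = 0.98/√n ≈ 1/√n. The sketch below checks this numerically; n = 1000 is a hypothetical poll size chosen only for illustration.

```python
from math import sqrt

def exact_margin(p_hat: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for a proportion: z * sqrt(p_hat(1 - p_hat)/n)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

def approx_margin(n: int) -> float:
    """Rule-of-thumb 95% margin of error for a proportion: 1/sqrt(n)."""
    return 1 / sqrt(n)

n = 1000                                   # hypothetical sample size
grid = [i / 100 for i in range(1, 100)]    # p_hat = 0.01, 0.02, ..., 0.99
worst_case = max(exact_margin(p, n) for p in grid)

print(f"rule of thumb:        {approx_margin(n):.4f}")
print(f"worst case (p = 0.5): {worst_case:.4f}")
```

The rule of thumb is slightly conservative at p̂ = 0.5 and increasingly so as p̂ moves toward 0 or 1.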
