CHAPTER 9—POINT AND INTERVAL ESTIMATION
STATISTICS 301—APPLIED STATISTICS, Statistics for Engineers and Scientists, Walpole, Myers, Myers, and Ye, Prentice Hall
Goal: In this section we will investigate the concept of “Estimation” in which our goal is to use sample information (assumed to be a random sample from the population of interest) to arrive at a reasonable guess of a population parameter. Estimation is done in two ways—point estimation (or single value) and interval estimation (an interval or range of likely values).
INTERVAL ESTIMATION (aka CONFIDENCE INTERVALS)
The advantage of point estimation and point estimates is their simplicity—a single number. However, this simplicity has a price. Consider the following.
In a follow-up to the Dean’s request about the proportion of MU undergrads who plan to attend graduate school, he checks with another faculty member who also collects data. This faculty member reports to the Dean that 78.45% of the students he has asked plan to attend graduate school. Whom should the Dean believe, given that the two results differ: me (recall my estimate was $\hat{p} = 0.602$) or the other faculty member? In other words, what DON’T point estimates tell you about the estimate and the sample?
This is the downside of point estimates: they give no sense of how large the sample is or how variable the estimate is. The spread of every sampling distribution depends on the sample size: smaller samples yield sampling distributions with larger spread, and larger samples yield less variable ones.
What would the SE of my estimate of p, the proportion of MU students going to grad school, be?
SE(0.602) = √( (0.602)(1 − 0.602)/123 ) = 0.044
What would the SE of the other faculty member’s estimate of p be? What do you need to know? Suppose n = 30.
SE(0.7845) = √( (0.7845)(1 − 0.7845)/30 ) = 0.075
The SE of our estimate is only about 60% of the other!
Hence, if we are only given the estimate, the accuracy of a point estimate is not evident!
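The two standard errors can be checked with a few lines of Python. This is only a sketch: the function name `se_proportion` is illustrative, and n = 30 for the second sample is the supposed value from the text, not a reported one.

```python
import math

def se_proportion(p_hat: float, n: int) -> float:
    """Estimated standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)

se_mine = se_proportion(0.602, 123)    # my sample, n = 123
se_other = se_proportion(0.7845, 30)   # other faculty member, n = 30 (supposed)

print(f"SE(0.602)  = {se_mine:.3f}")
print(f"SE(0.7845) = {se_other:.3f}")
print(f"ratio      = {se_mine / se_other:.2f}")
```

The ratio printed on the last line shows how much tighter the estimate from the larger sample is, which is exactly the information a bare point estimate hides.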
Just as we can obtain point estimates for every population parameter we have discussed thus far, we can also obtain Confidence Intervals for these parameters. However, we will give the Confidence Interval (CI), or interval estimate, only for the population mean and proportion (and later for the difference between two means and two proportions).
Defn: Most interval estimates for parameters are of the form:
Point Estimate of Parameter ± Multiplier * SE(Point Estimate)
PE ± Multiplier * SE(PE)
or [ PE - M * SE(PE), PE + M * SE(PE) ]
where the Multiplier is an upper percentile point from the sampling distribution of the point estimator used.
Hence to form an interval estimate or confidence interval for a parameter we need:
1. A point estimate of the parameter,
2. the distribution of the point estimator,
3. and an estimate of the Standard Error of the point estimate.
CONFIDENCE INTERVAL FOR POPULATION MEAN (μpopln or μ)
Using our basic confidence interval form for the population mean we know that:
1. Our point estimate of μ is $\bar{x}$.
2. The SE of $\bar{x}$ is $\sigma/\sqrt{n}$ ($\approx s/\sqrt{n}$ if σ is unknown, which is the “usual” case!).
3. Lastly, the distribution of $\bar{x}$ depends on several things:
a. If the population is Normal and σ is known, then the Multiplier is a z value.
b. If n is large, then $\bar{x}$ is approximately Normal and our Multiplier is a z value.
c. And if n is small, σ is unknown, and the population is Normal, the Multiplier is a t value with n – 1 degrees of freedom.
Thm: If X1, X2, …, Xn are a random sample from a population with mean μ and variance σ², then a (1 − α)100% confidence interval for μ is:
i. $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ if the population is Normally distributed and σ is known
ii. $\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$ if n is large (n > 30)
iii. $\bar{x} \pm t_{\alpha/2,\,n-1}\,s/\sqrt{n}$ if n is small, σ is unknown, & the population is Normal.
EXAMPLE #1
Recall our Milky Way candy data in which we found that the average weight of the 40 candy bars was 59.97 grams with a standard deviation of 1.92 grams. Find a 95% confidence interval for the mean weight of all Milky Way candy bars.
Parameter: μ = mean weight of all Milky Way candy bars
Point Estimate: $\bar{x}$ = 59.97 grams
Standard Error of our Point Estimate: $\sigma/\sqrt{n}$, but since σ is unknown we use $s/\sqrt{n} = 1.92/\sqrt{40}$ = 0.3036
α value: Since 95% = (1 − α)100%, α = 0.05
Multiplier: Since n is large, our multiplier is z(α/2) = z(0.025) = 1.96
$\bar{x} \pm z_{\alpha/2}\,s/\sqrt{n}$
$59.97 \pm 1.96(0.3036) = 59.97 \pm 0.595$
Our 95% confidence interval for the mean weight of all Milky Way candy bars is (59.4, 60.6) grams.
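The large-sample interval for this example can be sketched in Python using only the summary statistics quoted above (n = 40, average 59.97, standard deviation 1.92). The function name `z_interval_mean` is an illustrative choice; the z multiplier comes from the standard library's `NormalDist`.

```python
import math
from statistics import NormalDist

def z_interval_mean(xbar: float, s: float, n: int, conf: float):
    """Large-sample CI for a mean: xbar +/- z(alpha/2) * s / sqrt(n)."""
    alpha = 1.0 - conf
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 point of N(0, 1)
    se = s / math.sqrt(n)                        # estimated standard error
    moe = z * se                                 # margin of error
    return xbar - moe, xbar + moe, moe

lo, hi, moe = z_interval_mean(xbar=59.97, s=1.92, n=40, conf=0.95)
print(f"95% CI: ({lo:.1f}, {hi:.1f}), margin of error {moe:.3f}")
```

Changing `conf` to 0.90 or 0.99 shows directly how the interval narrows or widens with the degree of confidence.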
Example #2: Exercise 9.6 from WMMY 8th page 286
A random sample of 50 college students yields a sample average hgt of 174.5 cm and a standard deviation of 6.9 cm. Obtain a 98% CI for the mean hgt of college students.
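Point estimate: $\bar{x}$ = 174.5 cm; standard error: $s/\sqrt{n} = 6.9/\sqrt{50}$ = 0.976 cm
Multiplier: α = 0.02 and n is large, so z(α/2) = z(0.01) = 2.326
$174.5 \pm 2.326(0.976) = 174.5 \pm 2.27$, so the 98% CI is (172.23, 176.77) cm.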
CONFIDENCE INTERVAL INTERPRETATION
We just found that a 95% CI for μ = mean weight of all Milky Way candy bars is (59.4, 60.6).
Now some True/False questions.
T F a. The probability that μ is in the CI is 95%.
T F b. The probability that $\bar{x}$ is in the CI is 95%.
T F c. The probability that μ is in the CI is either 0% or 100%.
T F d. 95% of all such CI’s contain μ.
T F e. We can conclude that μ is closer to the center of the CI than the ends.
T F f. 95% of all candy bars weigh between 59.4 and 60.6 gms.
Before we answer these T/F, here are some more questions:
Is μ a constant or does it vary?
[pic]
Is σ² known or unknown?
[pic]
Based on our RS of n = 40, what is the distribution of $\bar{x}$ and what does it look like?
[pic]
If we took a different sample of Milky Way candy bars, would we get the same 95% CI?
Would μ change?
Would $\bar{x}$ change?
Would s change?
Would z(α/2) change?
[pic]
1. For each CI, is μ in the interval? So what’s the probability μ is in any ONE CI?
2. What % of ALL CI’s contain μ?
3. What would the population distribution look like?
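These questions can be explored with a small simulation: draw many random samples from one fixed population, build a 95% CI from each, and count how often the interval contains the true mean. The sketch below is hypothetical throughout: the population is assumed Normal with mean 60.0 and standard deviation 1.9 (values chosen only to resemble the candy data), and the function name, seed, and number of repetitions are arbitrary.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def coverage(mu: float, sigma: float, n: int, conf: float, reps: int, seed: int = 301) -> float:
    """Fraction of simulated large-sample CIs that contain the true mean mu."""
    rng = random.Random(seed)                     # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z(alpha/2)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)                       # changes from sample to sample
        moe = z * stdev(sample) / math.sqrt(n)    # so does s, and hence the interval
        if xbar - moe <= mu <= xbar + moe:        # mu itself never changes
            hits += 1
    return hits / reps

rate = coverage(mu=60.0, sigma=1.9, n=40, conf=0.95, reps=2000)
print(f"Proportion of intervals containing mu: {rate:.3f}")
```

Each individual interval either contains μ or it does not; the 95% describes the long-run proportion of intervals that do, which is what the printed rate estimates.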
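We just found that a 95% CI for μ = mean weight of all Milky Way candy bars was (59.4, 60.6) gms.
True/False Answers:
F a. The probability that μ is in the CI is 95%. (μ is a constant and this interval is already computed, so nothing random remains.)
F b. The probability that $\bar{x}$ is in the CI is 95%. ($\bar{x}$ is the center of the interval, so it is always inside.)
T c. The probability that μ is in the CI is either 0% or 100%.
T d. 95% of all such CI’s contain μ.
F e. We can conclude that μ is closer to the center of the CI than the ends. (The interval says nothing about where inside it μ lies.)
F f. 95% of all candy bars weigh between 59.4 and 60.6 gms. (The CI is a statement about μ, not about individual candy bars.)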
[pic]
What is(are) the population(s)?
What is(are) the parameter(s)?
1995:
2006:
CONFIDENCE INTERVAL FOR POPULATION PROPORTION, DIFFERENCE BETWEEN TWO MEANS, & DIFFERENCE BETWEEN TWO PROPORTIONS
We present the CI’s forms for the above three different cases in the following theorems, then present several examples.
Thm: If X1, X2, …, Xn are a random sample from a population with proportion p, then
a (1 − α)100% confidence interval for p is
$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ if np > 5 and n(1 − p) > 5 OR n$\hat{p}$ > 5 and n(1 − $\hat{p}$) > 5.
Thm: Let $\bar{x}_1$ and s1 and $\bar{x}_2$ and s2 be the sample average and sample standard deviation, respectively, of two independent random samples of sizes n1 and n2, respectively, from two populations with means μ1 and μ2. Then a (1 − α)100% confidence interval for (μ1 − μ2) is
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{s_1^2/n_1 + s_2^2/n_2}$.
Thm: If $\hat{p}_1$ and $\hat{p}_2$ are sample proportions from two independent random samples of size n1 and n2 from two populations with proportions p1 and p2, then a (1 − α)100% confidence interval for (p1 − p2) is
$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$ if n1$\hat{p}_1$ > 5, n1(1 − $\hat{p}_1$) > 5, n2$\hat{p}_2$ > 5, and n2(1 − $\hat{p}_2$) > 5.
EXAMPLE #1 Underweight Milky Way Candy Bars
Let’s let p be the proportion of “vending-sized” Milky Way candy bars that are below the stated Net Weight of 58.1 grams.
|Candy Wgt (grams) |
|62.2 |59.7 |60.7 |58.6 |57.1 |60.2 |62.1 |60.3 |
|59.6 |61.6 |59.1 |61.5 |61.5 |60.5 |59.6 |60.7 |
|60.4 |64.5 |61.3 |59.9 |64.6 |61.3 |58.3 |60.0 |
|59.7 |57.4 |57.2 |58.6 |61.6 |59.2 |57.1 |58.2 |
|62.4 |56.0 |58.4 |61.9 |59.5 |59.7 |58.7 |57.7 |
We find that 6 of the 40 candy bars weighed less than 58.1 grams. Our point estimate of the proportion of underweight Milky Ways is $\hat{p} = 6/40 = 0.15$. Let’s also obtain a 95% CI for p. Checking to ensure our sample size is large enough:
1. n$\hat{p}$ = 40(0.15) = 6 > 5
AND
2. n(1 − $\hat{p}$) = 40(1 − 0.15) = 34 > 5!
So our 95% CI for p is
$0.15 \pm 1.96\sqrt{0.15(0.85)/40} = 0.15 \pm 0.111$, or (0.039, 0.261).
We can then conclude, with a very high degree of confidence (95%!), that between 4% and 26% of Milky Way candy bars are underweight.
Do you believe MW’s claim that no candy bar weighing less than 58.1 gm leaves the assembly line? Why?
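The count, the sample-size check, and the interval for p can be reproduced from the raw weights. The sketch below uses the 40 class weights as they appear in the SAS program later in these notes; variable names are illustrative, and the large-sample z interval is the one stated in the theorem for a single proportion.

```python
import math
from statistics import NormalDist

# The 40 Milky Way weights (grams) from the class data set
weights = [
    62.2, 59.7, 60.7, 58.6, 57.1, 60.2, 62.1, 60.3,
    59.6, 61.6, 59.1, 61.5, 61.5, 60.5, 59.6, 60.7,
    60.4, 64.5, 61.3, 59.9, 64.6, 61.3, 58.3, 60.0,
    59.7, 57.4, 57.2, 58.6, 61.6, 59.2, 57.1, 58.2,
    62.4, 56.0, 58.4, 61.9, 59.5, 59.7, 58.7, 57.7,
]
stated_weight = 58.1

n = len(weights)
x = sum(w < stated_weight for w in weights)   # number of underweight bars
p_hat = x / n

# Large-sample condition from the theorem: both counts must exceed 5
large_enough = n * p_hat > 5 and n * (1 - p_hat) > 5

z = NormalDist().inv_cdf(0.975)               # z(0.025) for a 95% CI
se = math.sqrt(p_hat * (1 - p_hat) / n)
lo, hi = p_hat - z * se, p_hat + z * se

print(f"x = {x}, n = {n}, p_hat = {p_hat:.2f}, condition met: {large_enough}")
print(f"95% CI for p: ({lo:.3f}, {hi:.3f})")
```

Since the entire interval lies above 0, the data are hard to reconcile with a claim that no underweight bars leave the line.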
Example #2: Phone Battery Data
Lithium Ion Batteries: 9.75 10.17 11.77 11.77 11.87 11.90 12.12 12.15 12.18 12.24 12.51 12.9 13.15 13.16 13.61 13.63 13.63 13.66 13.75 13.81 13.86 14.15 14.2 14.25 14.42 14.57 14.84 14.92 14.93 14.95 15.63 15.78 15.9 16.06 16.25 16.42 16.43 16.46 16.82 17.04 17.08 17.58 17.65 17.85 17.9
Nickel Metal Hydride: 10.08 11.98 12.19 12.36 12.37 12.4 12.45 12.46 12.54 12.59 12.68 12.85 12.88 13.06 13.07 13.18 13.18 13.35 13.38 13.47 13.48 13.5 13.51 13.52 13.67 13.83 13.85 13.86 13.9 14.02 14.05 14.07 14.15 14.19 14.49 14.53 14.59 14.61 14.81 14.85 14.99 15.01 15.04 15.07 15.10 15.22 15.28 15.3 15.38 15.53 15.54 15.59 15.72
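nLI = 45, $\bar{x}_{LI}$ = 14.348, sLI = 2.0693, se($\bar{x}_{LI}$) = 2.0693/√45 = 0.3085
nNIMH = 53, $\bar{x}_{NIMH}$ = 13.826, sNIMH = 1.1819, se($\bar{x}_{NIMH}$) = 1.1819/√53 = 0.1623
(1 − α)100% CI for μ1 − μ2 is $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{s_1^2/n_1 + s_2^2/n_2}$, where
$df = \dfrac{(s_1^2/n_1 + s_2^2/n_2)^2}{(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)}$
Here se($\bar{x}_{NIMH} - \bar{x}_{LI}$) = √(0.3085² + 0.1623²) = 0.3486 and df ≈ 67.4, which we round down to 67.
Obtain a 90% CI for μNIMH − μLI: (13.826 – 14.348) ± t(0.05, df)*0.3486
⇒ (13.826 – 14.348) ± t(0.05, 67)*0.3486
⇒ -0.522 ± 1.668*0.3486
⇒ -0.522 ± 0.581
⇒ [ -1.103, 0.059 ]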
Interpretation?
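A sketch of the two-sample computation from the summary statistics above. It uses the unpooled standard error with the Satterthwaite degrees of freedom, which is the "Satterthwaite / Unequal" line that SAS reports for these data. The function name `welch_interval` is hypothetical, and the multiplier 1.668 is t(0.05, 67) read from a t table, since the Python standard library has no t quantile function.

```python
import math

def welch_interval(xbar1, s1, n1, xbar2, s2, n2, t_mult):
    """CI for mu1 - mu2 from two independent samples with unequal variances.

    t_mult must be supplied by the user (e.g. from a t table) for the
    degrees of freedom returned here.
    """
    v1, v2 = s1**2 / n1, s2**2 / n2            # squared SEs of each sample mean
    se = math.sqrt(v1 + v2)                    # SE of the difference
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # Satterthwaite df
    diff = xbar1 - xbar2
    return diff, se, df, (diff - t_mult * se, diff + t_mult * se)

# NiMH minus Li-ion, 90% CI; 1.668 = t(0.05, 67) taken from a t table
diff, se, df, (lo, hi) = welch_interval(13.826, 1.1819, 53,
                                        14.348, 2.0693, 45, t_mult=1.668)
print(f"diff = {diff:.3f}, SE = {se:.4f}, df = {df:.1f}")
print(f"90% CI: [{lo:.3f}, {hi:.3f}]")
```

Because the interval contains 0, these data do not rule out equal mean times for the two battery types at the 90% confidence level.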
Example #3: Hair Color & Pain Threshold Data
Light Blonde: 62 60 71 55 48
Dark Brunette: 32 39 51 30 35
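nLB = 5, $\bar{x}_{LB}$ = 59.2, sLB = 8.5264, se($\bar{x}_{LB}$) = 8.5264/√5 = 3.8131
nDB = 5, $\bar{x}_{DB}$ = 37.4, sDB = 8.3247, se($\bar{x}_{DB}$) = 8.3247/√5 = 3.7229
(1 − α)100% CI for μ1 − μ2 is $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,df}\sqrt{s_1^2/n_1 + s_2^2/n_2}$.
Here se($\bar{x}_{LB} - \bar{x}_{DB}$) = √(3.8131² + 3.7229²) = 5.3291 and df ≈ 8.
Obtain a 99% CI for μLB − μDB: (59.2 – 37.4) ± t(0.005, df)*5.3291
⇒ (59.2 – 37.4) ± t(0.005, 8)*5.3291
⇒ 21.8 ± 3.355*5.3291
⇒ 21.8 ± 17.879
⇒ [ 3.92, 39.68 ]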
Interpretation?
Example #4: Example 9.6 from WMMY 8th page 289
Compare the gas mileage of two car types (compact and sub-compact). We have two independent RS’s with summary information:
nSC = 75, $\bar{x}_{SC}$ = 42, sSC = 8
nC = 50, $\bar{x}_{C}$ = 36, sC = 6
Obtain a 96% CI for μC − μSC.
Example #5: Exercise 9.65 from WMMY 8th page 305
Compare the proportion of females and males with a certain minor blood disorder. We have independent RS’s of size 1,000 and found 275 females with the disorder and 250 males with the disorder.
Obtain a 95% confidence interval for the difference in proportions.
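One way to set up this calculation, following the two-proportion theorem given earlier. This is a sketch only: the function name `two_proportion_interval` is illustrative, and the difference is taken as females minus males.

```python
import math
from statistics import NormalDist

def two_proportion_interval(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95):
    """Large-sample CI for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    # Large-sample condition: all four expected counts must exceed 5
    ok = min(n1 * p1, n1 * (1 - p1), n2 * p2, n2 * (1 - p2)) > 5
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se, ok

# 275 of 1,000 females and 250 of 1,000 males have the disorder
lo, hi, ok = two_proportion_interval(275, 1000, 250, 1000)
print(f"condition met: {ok}; 95% CI for pF - pM: ({lo:.4f}, {hi:.4f})")
```

An interval that includes 0 means the data are consistent with no difference between the two proportions.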
Example #6:
Example #7:
Confidence Interval for Difference of Two Means
Non-Independent Samples—What’s the Effect?
(9.44 WMMY 8th) A taxi company is trying to decide whether to purchase Brand A or Brand B tires for its fleet of taxis. A tire from each brand is assigned at random to the rear wheels of 8 taxis and the following distances, in km, recorded until a tire had only 1/8” of tread remaining.
|Taxi |1 |2 |3 |4 |5 |6 |7 |8 |n |Average |St Dev |
|Brand A |34,400 |45,500 |36,700 |32,000 |48,400 |32,800 |38,100 |30,100 |8 |37,250 |6546.7549 |
|Brand B |36,700 |46,800 |37,700 |31,100 |47,800 |36,400 |38,900 |31,500 |8 |38,362.5 |6181.0627 |
Are these two samples independent?
Now let’s calculate the $\text{SE}(\bar{x}_A - \bar{x}_B)$ assuming independent samples. Recall that $\text{SE}(\bar{x}_A - \bar{x}_B) = \sqrt{s_A^2/n_A + s_B^2/n_B}$.
The problem is that since the samples are NOT independent, our $\text{SE}(\bar{x}_A - \bar{x}_B)$ could either over-estimate or under-estimate the true standard error!
Confidence Intervals for Difference of Two Means
Paired Data Case
Thm: Assuming a sample of “n” paired observations (x1i, y2i), a (1 − α)100% CI for μ1 − μ2 is
$\bar{d} \pm t_{\alpha/2,\,n-1}\, s_d/\sqrt{n}$
where di = (x1i − y2i), and $\bar{d}$ and $s_d$ are the sample mean and sample standard deviation of the n differences.
|Taxi |1 |2 |3 |4 |5 |6 |7 |8 |n |Average |St Dev |
|Brand A |34,400 |45,500 |36,700 |32,000 |48,400 |32,800 |38,100 |30,100 |8 |37,250 |6546.7549 |
|Brand B |36,700 |46,800 |37,700 |31,100 |47,800 |36,400 |38,900 |31,500 |8 |38,362.5 |6181.0627 |
|Difference |-2,300 |-1,300 |-1,000 |900 |600 |-3,600 |-800 |-1,400 |8 |-1112.5 |1454.4881 |
Hence, for our data, a 95% CI for the difference in mileage for the two Brands of tires (μA − μB) is:
$-1112.5 \pm 2.365(1454.4881/\sqrt{8}) = -1112.5 \pm 1216.2$, or about (−2328.7, 103.7) km.
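The paired calculation, and the effect of wrongly treating the samples as independent, can be sketched from the raw tire data. Variable names are illustrative, and the multiplier 2.365 is t(0.025, 7) taken from a t table rather than computed.

```python
import math
from statistics import mean, stdev

brand_a = [34400, 45500, 36700, 32000, 48400, 32800, 38100, 30100]
brand_b = [36700, 46800, 37700, 31100, 47800, 36400, 38900, 31500]

# Paired analysis: work with the per-taxi differences A - B
diffs = [a - b for a, b in zip(brand_a, brand_b)]
n = len(diffs)
d_bar = mean(diffs)
s_d = stdev(diffs)
se_paired = s_d / math.sqrt(n)

# For comparison only: the SE we would get by (wrongly) assuming independence
se_indep = math.sqrt(stdev(brand_a) ** 2 / n + stdev(brand_b) ** 2 / n)

t_mult = 2.365                      # t(0.025, 7) from a t table
lo, hi = d_bar - t_mult * se_paired, d_bar + t_mult * se_paired

print(f"mean A = {mean(brand_a)}, mean B = {mean(brand_b)}, d_bar = {d_bar}")
print(f"SE paired = {se_paired:.2f}, SE assuming independence = {se_indep:.2f}")
print(f"95% CI for muA - muB: ({lo:.1f}, {hi:.1f})")
```

For these data the independent-samples SE is several times larger than the paired SE, because most of the variation is taxi-to-taxi and pairing removes it.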
NOTES AND COMMENTS ON CONFIDENCE INTERVALS
1. While n > 30 works well in most instances, larger sample sizes are needed if the population is known to be severely skewed, in either direction. If the population is symmetric or approximately so, then CI’s for the mean (μ) based on samples of size 30 are adequate.
2. INTERPRETATION OF CI’S: A 95% CI would be interpreted as follows:
We are 95% confident that the parameter of interest falls somewhere within the stated interval.
Notice that we do NOT say, “The probability is 95% that the parameter of interest falls somewhere within the stated interval” since this is not true. Hence avoid using the term “probability” in the interpretation of CI’s.
3. The Degree of Confidence of the CI is a statement about how sure or confident we are in our CI. The higher the degree of confidence, the more certain we are with our statement; the lower the degree of confidence the less sure we are. While higher confidence in general is better, the sacrifice is a wider CI and hence more possible values for the parameter. One usually uses 90%, 95%, or 99% in most cases.
What would a 100% confidence interval be? How informative is it?
[pic]
4. A CI is a statement about the value of a population parameter. It does NOT say anything about what percent or proportion of the population falls in the interval. Hence for a 95% CI, you can NOT conclude “95% of the population falls within the CI.” Rather, it is an interval in which the population parameter is likely to lie.
5. The Margin of Error of a confidence interval is half the width of the confidence interval. So for our candy bar example, since the 95% confidence interval was [ 59.97 ± 0.595 ], the Margin of Error would be 0.595.
The Margin of Error gives a sense of how large an “error” is involved with our estimate, that is, how far our estimate is likely to be from the true parameter.
6. If the degree of confidence is not stated, it’s assumed to be 95%. So if a Margin of Error is given with no indication of the degree of confidence, assume it is 95%.
USING SAS TO OBTAIN CONFIDENCE INTERVALS
Recall that the 95% CI we found for μ = mean weight of all Milky Way candy bars was (59.4, 60.6).
SAS CI for a Single Mean
OPTIONS LS=110 PS=60 PAGENO=1 NODATE FORMDLIM='+';
TITLE 'CI.SAS';
TITLE2 'EXAMPLE OF ONE AND TWO SAMPLE CI OF MEANS IN SAS';
TITLE3 'MILKY WAY WGT DATA FROM CLASS';
DATA MWDATA;
INPUT MW_WGT @@; DATALINES;
62.2 59.7 60.7 58.6 57.1 60.2 62.1 60.3
59.6 61.6 59.1 61.5 61.5 60.5 59.6 60.7
60.4 64.5 61.3 59.9 64.6 61.3 58.3 60.0
59.7 57.4 57.2 58.6 61.6 59.2 57.1 58.2
62.4 56.0 58.4 61.9 59.5 59.7 58.7 57.7
;
PROC TTEST DATA= MWDATA ALPHA=0.05;
VAR MW_WGT;
RUN;
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CI.SAS 1
EXAMPLE OF ONE AND TWO SAMPLE CI OF MEANS IN SAS
MILKY WAY WGT DATA FROM CLASS
The TTEST Procedure
Statistics
Lower CL Upper CL Lower CL Upper CL
Variable N Mean Mean Mean Std Dev Std Dev Std Dev Std Err Minimum Maximum
MW_WGT 40 59.351 59.965 60.579 1.573 1.9203 2.4657 0.3036 56 64.6
T-Tests
Variable DF t Value Pr > |t|
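MW_WGT 39 197.50 <.0001
[SAS program and summary statistics for the two-sample battery comparison (Example #2) are missing here; only the test tables below survive.]
T-Tests
Variable Method Variances DF t Value Pr > |t|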
Time Pooled Equal 96 -1.56 0.1214
Time Satterthwaite Unequal 67.4 -1.50 0.1387
Equality of Variances
Variable Method Num DF Den DF F Value Pr > F
Time Folded F 44 52 3.07 0.0001
SAS CI of the Difference of Two Means—Paired Data
DATA PAIRED;
TITLE3 'PAIRED TIRE DATA';
INPUT BRANDA BRANDB;
DIFF = BRANDA-BRANDB;
DATALINES;
34400 36700
45500 46800
36700 37700
32000 31100
48400 47800
32800 36400
38100 38900
30100 31500
;
PROC TTEST DATA=PAIRED;
VAR DIFF;
RUN;
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
CI.SAS 3
EXAMPLE OF ONE AND TWO SAMPLE CI OF MEANS IN SAS
PAIRED TIRE DATA
The TTEST Procedure
Statistics
Lower CL Upper CL Lower CL Upper CL
Variable N Mean Mean Mean Std Dev Std Dev Std Dev Std Err Minimum Maximum
DIFF 8 -2328 -1113 103.48 961.67 1454.5 2960.3 514.24 -3600 900
T-Tests
Variable DF t Value Pr > |t|
DIFF 7 -2.16 0.0673
Approximate 95% Margin of Error for proportions is 1/√n …
So MoE is
[pic]
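The 1/√n rule of thumb can be compared with the worst-case margin of error from the proportion CI, which occurs at $\hat{p}$ = 0.5 where $\hat{p}(1-\hat{p})$ is largest. The sketch below makes that comparison; the function names and the sample sizes shown are illustrative choices, not values from these notes.

```python
import math
from statistics import NormalDist

def moe_worst_case(n: int, conf: float = 0.95) -> float:
    """Margin of error z(alpha/2) * sqrt(p(1-p)/n) evaluated at p = 0.5."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * math.sqrt(0.25 / n)

def moe_rule_of_thumb(n: int) -> float:
    """Approximate 95% margin of error for a proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

for n in (100, 400, 1000):   # hypothetical sample sizes
    print(f"n = {n:5d}: worst case = {moe_worst_case(n):.4f}, "
          f"1/sqrt(n) = {moe_rule_of_thumb(n):.4f}")
```

Since 1.96 × 0.5 = 0.98, the rule of thumb is always slightly conservative: it overstates the worst-case 95% margin of error by about 2%.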