CHAPTER 8—SAMPLING DISTRIBUTIONS




STATISTICS 301—APPLIED STATISTICS, Statistics for Engineers and Scientists, Walpole, Myers, Myers, and Ye, Prentice Hall

Goal: We next investigate some sample statistics (e.g., the Sample Average X̄ and the Sample Proportion p̂) as Random Variables and further investigate the distributions of these random variables. Lastly we use this information to “ESTIMATE” a population parameter.

Consider the first few paragraphs from an article in the New York Times.

Trends: Halloween, for Skinnier Skeletons

By JOHN O'NEIL

Published: October 21, 2003

Trick-or-treaters can be satisfied with something other than candy, say researchers who conducted an experiment last Halloween that arose from concern about the nation's obesity epidemic.

In the study, test households in five Connecticut neighborhoods offered children two bowls: one with lollipops or fruit candy and one containing small, inexpensive Halloween toys, like plastic bugs that glow in the dark.

Of the 283 children from 3 to 14 whose reactions were tallied, 148 chose candy and 135 toys; the difference was statistically insignificant, the researchers reported in The Journal of Nutrition Education and Behavior in the July-August issue.

Consider the following questions:

1. What is the population?

2. What is the parameter?

3. What is the sample?

4. What is the statistic?

5. Can we say anything about the theoretical “distribution” of the statistic?

So our goal is two-fold: Determine the distribution of statistics and then use this information to provide information about a population parameter.
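As an aside, the article's "statistically insignificant" claim can be checked with a quick sketch. The z test below is an illustration under a 50/50 null, not necessarily the test the researchers ran:

```python
import math

# Halloween study: 148 of 283 children chose candy over toys.
n, x = 283, 148
p_hat = x / n                      # sample proportion choosing candy

# Normal-approximation z test of H0: p = 0.5 (an illustration;
# the article does not say which test the researchers used).
se0 = math.sqrt(0.5 * 0.5 / n)     # standard error under H0
z = (p_hat - 0.5) / se0

# Two-sided p-value via the standard normal CDF.
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p_value:.2f}")
```

A p-value this large is consistent with the reported conclusion that the 148-to-135 split could easily be chance.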

CHAPTER 9—ESTIMATION


Goal: In this section we will investigate the concept of “Estimation” in which our goal is to use sample information (assumed to be a random sample from the population of interest) to arrive at a reasonable guess of a population parameter. Estimation is done in two ways—point estimation (or single value) and interval estimation (an interval or range of likely values). We will tackle POINT ESTIMATION at this point and leave INTERVAL ESTIMATION to the next chapter.

Background

We assume that we are interested in one (or more) population parameter(s), such as the mean (μ), the variance (σ²), the standard deviation (σ), the median (M), or the proportion (p).

In every case we assume the parameter is unknown and the population is so large that it precludes our measuring every element of the population to obtain the parameter. So we take a random sample from the population, of size n, and calculate the appropriate sample statistic.

POINT ESTIMATION

Below we provide a table with each of the population parameters we have discussed along with the corresponding point estimate or sample statistic of that parameter.

|Population Parameter |Estimator |Estimate |
|μ |X̄ |x̄ |
|σ² |S² |s² |
|σ |S |s |
|p |p̂ = X/n |p̂ = x/n |

EXAMPLES

Miami Undergraduates and Graduate School Plans

Miami’s Graduate Dean contacts me about the percentage of students who plan on pursuing a graduate degree. Using the information that I have collected on STA 301 students, I tell the Dean of Miami’s Graduate School that of the 123 students I have surveyed 74 indicated they plan to attend graduate school. Hence we would estimate the proportion of MU undergrads that plan to attend graduate school as p̂ = 74/123 ≈ 0.60.

Underweight Milky Way Candy Bars

Recall that we found 20 of 85 candy bars weighed less than Milky Way’s claim that none should weigh less than 58.1 gms. Hence we would estimate the proportion of underweight Milky Ways to be p̂ = 20/85 ≈ 0.235, or nearly one quarter!

62.2 59.6 60.4 59.7 62.4 57.1 61.5 64.6 61.6 59.5 59.7 54.1

61.6 64.5 57.4 56.0 60.2 60.5 61.3 59.2 59.7 60.7 59.1 57.4

61.3 57.2 58.4 62.1 59.6 58.3 57.1 58.7 58.6 61.5 59.9 58.4

58.6 61.9 60.3 60.7 60.0 58.2 57.7 53.2 58.7 61.9 58.8 60.0

60.5 61.3 60.1 59.4 60.0 60.5 63.4 59.5 59.1 60.6 57.5 60.9

59.1 57.2 58.4 58.5 60.1 57.5 57.9 55.3 59.7 60.6 57.7 60.8

57.7 58.1 56.8 58.1 57.0 58.5 58.3 59.4 59.8 59.7 59.7 60.5 56.1
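The listed weights can be summarized directly; a sketch (variable names are mine, and the strict count below may come out a little off the 20 quoted above if any value was garbled in transcription):

```python
# Milky Way weights (grams) for the 85 sampled bars, transcribed
# from the table in the notes.
weights = [
    62.2, 59.6, 60.4, 59.7, 62.4, 57.1, 61.5, 64.6, 61.6, 59.5, 59.7, 54.1,
    61.6, 64.5, 57.4, 56.0, 60.2, 60.5, 61.3, 59.2, 59.7, 60.7, 59.1, 57.4,
    61.3, 57.2, 58.4, 62.1, 59.6, 58.3, 57.1, 58.7, 58.6, 61.5, 59.9, 58.4,
    58.6, 61.9, 60.3, 60.7, 60.0, 58.2, 57.7, 53.2, 58.7, 61.9, 58.8, 60.0,
    60.5, 61.3, 60.1, 59.4, 60.0, 60.5, 63.4, 59.5, 59.1, 60.6, 57.5, 60.9,
    59.1, 57.2, 58.4, 58.5, 60.1, 57.5, 57.9, 55.3, 59.7, 60.6, 57.7, 60.8,
    57.7, 58.1, 56.8, 58.1, 57.0, 58.5, 58.3, 59.4, 59.8, 59.7, 59.7, 60.5, 56.1,
]

n = len(weights)
under = sum(1 for w in weights if w < 58.1)   # bars below the 58.1 g claim
p_hat = under / n                             # sample proportion underweight
xbar = sum(weights) / n                       # sample average weight
s2 = sum((w - xbar) ** 2 for w in weights) / (n - 1)   # sample variance

print(f"n = {n}, underweight = {under}, p_hat = {p_hat:.3f}")
print(f"xbar = {xbar:.2f} g, s = {s2 ** 0.5:.2f} g")
```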

“High School Mathematics Course-Taking by Gender and Ethnicity,” American Educational Research Journal, Fall 1998, Volume 35, Number 3, pages 497-514.

In this example one of the goals was to determine the mean number of Carnegie units earned by high-school graduates in seven categories of mathematics courses. Among all students in the study (the sample size was 21,784), the average number of math CU’s earned was x̄ = 3.11. Thus we would estimate the mean number of math CU’s earned by all high-school graduates to be 3.11.

While not part of the question or purpose of the study, the authors also report the sample standard deviation of the number of math CU’s earned, s = 0.93. This is an estimate of the population standard deviation (σ) of the number of math CU’s earned by HS students.

METHODS OF ESTIMATION

GOAL: Given a random sample (i.e., X1, X2, …, Xn) from a population with mean = μpopln and variance = σ²popln,

HOW DO WE MANIPULATE THE X’s SO THAT WE ARRIVE AT A “GOOD” GUESS OF THE POPULATION PARAMETER OF INTEREST? That is:

For example, if our goal is to estimate the population mean, μpopln, what function of the X’s do we use to arrive at our ESTIMATOR of μpopln?

Or what function of the X’s do we use to arrive at our ESTIMATOR of σ²popln?

DIFFERENT METHODS OF ESTIMATION

There are many different methods of estimation and all yield the function of the X’s that is used to estimate a population parameter. The three most commonly used methods are:

1. Method of Least Squares Estimation (LSE): you’ll see this used in the regression and design of experiments courses (STA 463 and 466, respectively).

2. Method of Maximum Likelihood Estimation (MLE): this method will be seen in STA 401 and other more advanced statistical theory classes.

3. Method of Moments: to be seen in STA 401 and infrequently otherwise.

We mention these methods in passing; for the remainder of this section we will assume that the function and the resulting statistic are simply given to us.

PROPERTIES OF ESTIMATORS

We started this section with the goal of obtaining “good” guesses or estimates of population parameters. There are two main criteria that determine “good” estimators and estimates and we illustrate these criteria in the following pictures.

In the following “target” picture, think of the bulls-eye as the “true” or actual population parameter value and the “hits” on the target as observations from the distribution of a statistic used to estimate the parameter. In the second picture, an approximation to the distributions of various statistics, all estimators of the same parameter, is given.

|[pic] |[pic] |

In both pictures, the idea of Unbiased and Best Estimators is illustrated and we give the following definitions:

1. Unbiased Estimators are estimators whose expected value equals the parameter of interest, so that E [ Estimator ] = parameter.

2. Best Estimators are those that have the smallest variance among a set of estimators.

Summary of Estimators

1. Since E[ X̄ ] = μpopln, X̄ is an unbiased estimator of the popln mean, μpopln.

2. Since E[ p̂ ] = p, p̂ is an unbiased estimator of the popln proportion, p.

In general, if we let θ̂ denote a sample statistic, we will show that:

1. θ̂ is a Random Variable and hence

2. θ̂ has a distribution consisting of i) Center = Mean of the RV = μθ̂ = E[ θ̂ ],

ii) Spread = Variance of RV = σ²θ̂ = E[ (θ̂ − μθ̂)² ], and

iii) Shape = Family or Kind of RV.

Recall that Families or Kinds of RV will fall into one of the distributions that we have encountered already, namely: Binomial (discrete), or Normal, Chi-Square, T, and F (continuous).

Finally recall some properties of functions of random variables, namely, sums of linear combinations of random variables.

If X and Y are two INDEPENDENT random variables with means μX and μY and variances σ²X and σ²Y, respectively, and if a, b, and c are constants, then we know that:

1. The Mean of W (= aX + bY + c) = μW = E[ W ] = aμX + bμY + c ALWAYS!!!

2. The Variance of W = V(aX + bY + c) = σ²W = a²σ²X + b²σ²Y ONLY IF X AND Y INDEP!!!

3. The Mean of W (= X ± Y) = μW = E[ W ] = E[ X ± Y ] = μX ± μY ALWAYS!!!

4. The Variance of W (= X ± Y) = σ²W = V[ W ] = V[ X ± Y ] = σ²X + σ²Y ONLY IF X & Y INDEP!
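These rules can be checked by simulation; a sketch with arbitrarily chosen constants and independent Normal X and Y (my choices, any independent pair works):

```python
import random

random.seed(1)

# Check E[aX + bY + c] = a*mu_X + b*mu_Y + c and, for INDEPENDENT X and Y,
# Var(aX + bY + c) = a^2*var_X + b^2*var_Y by simulation.
a, b, c = 2.0, -3.0, 5.0
mu_x, sd_x = 10.0, 2.0     # X ~ Normal(10, 2^2)
mu_y, sd_y = 4.0, 1.0      # Y ~ Normal(4, 1^2), drawn independently of X

ws = []
for _ in range(200_000):
    x = random.gauss(mu_x, sd_x)
    y = random.gauss(mu_y, sd_y)
    ws.append(a * x + b * y + c)

mean_w = sum(ws) / len(ws)
var_w = sum((w - mean_w) ** 2 for w in ws) / (len(ws) - 1)

# Theory: mean = 2*10 - 3*4 + 5 = 13, variance = 4*4 + 9*1 = 25.
print(f"mean of W = {mean_w:.2f} (theory: {a * mu_x + b * mu_y + c})")
print(f"var of W  = {var_w:.2f} (theory: {a**2 * sd_x**2 + b**2 * sd_y**2})")
```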

Now consider the following experiment. Take a random sample of size n from a population of measurements.

[pic]

Some facts!

1. Every sample statistic is a function of X1, X2, …, Xn. Later we’ll briefly discuss this.

2. The values X1, X2, …, Xn are random variables. Why?

3. The random variables X1, X2, …, Xn are independent random variables. Why?

4. Since the X1, X2, …, Xn are random variables, they have a distribution (Center, Spread, and Shape). It’s easy to “see” that the distribution of X1 is identical to the population distribution and hence:

Center of X1: μpopln

Spread of X1: σ²popln

Shape of X1: Popln Shape

5. Less easy to see, but still true, is that each of the remaining X’s (X2, …, Xn) has this same distribution as well. This results in the following crucial theorem.

Thm: If X1, X2, …, Xn are a random sample from a population with mean = μpopln, variance = σ²popln, and some shape, then the Xi’s are said to be independent and identically distributed. That is, the Xi’s are i.i.d. (μpopln, σ²popln) with the same shape as the population.

NOTATION AND TERMINOLOGY

We will distinguish between ESTIMATORS and ESTIMATES. Recall that the X’s, the random sample, are random variables. They are random quantities! The observed values of these X’s, the x’s, are the values that the random variables take on and are constants.

Whereas the X’s are random variables and hence have distributions, the x’s are constants and DO NOT HAVE A DISTRIBUTION!

Hence we have the following definitions:

Defn: An ESTIMATOR is a function of the random sample values, X’s, and is a random variable with a distribution.

Defn: An ESTIMATE is a value that the estimator can take on and hence is a single number.

COMMON ESTIMATORS (aka SAMPLE STATISTICS)

Below are the most commonly used ESTIMATORS and ESTIMATES of population parameters of most interest.

|Population Parameter |Estimator (aka Sample Statistic) |Estimate |
|μ |X̄ |x̄ |
|σ² |S² |s² |
|σ |S |s |
|p |p̂ = X/n |p̂ = x/n |
|Median |M |m |
|Range |R = Max X – Min X |r = Max x – Min x |

Sampling Distributions of Sample Statistics

[pic]


Defn: The Sampling Distribution of a Statistic is the distribution of the statistic (the random variable) that would result if every possible random sample of the same size were obtained from the population.

Recall that Distributions have CENTERS, SPREADS, and SHAPES; hence a SAMPLING DISTRIBUTION will have a CENTER, a SPREAD, and a SHAPE.

Typically only two things determine the distribution of a sample statistic:

1. sample size and

2. distribution of the population from which the sample came.

The sampling distribution for any of the sample statistics that we have seen can be determined. Thus we can find the sampling distribution of:

• The sample average, X̄:  CENTER =   SPREAD =   SHAPE =

• The sample variance, S².

• The sample standard deviation, S.

• The sample proportion, p̂.

In each of these cases, the sampling distribution of the sample statistic depends upon

1. The sample size

and

2. The distribution of the population.

In this class we will only investigate the sampling distribution of the sample average and sample proportion in depth.

NOTES AND COMMENTS ABOUT SAMPLING DISTRIBUTIONS

1. Since a Sampling Distribution is a distribution, it has a Center, Spread, and Shape. While we had many ways of describing Center and Spread in general, we will use the MEAN to define the Center and VARIANCE or STANDARD DEVIATION to define the Spread for Sampling Distributions.

2. The Standard Deviation (SD for short) of the Sampling Distribution of a statistic is also known as the Standard Error (SE for short) of the Statistic.

Interpretation: The Standard Error of a Statistic is a measure of how variable the statistic is.

Example

Suppose a research paper reports that a sample of 56 parents of 7th graders in a particular school system was taken. It reports that the parents’ incomes had a sample average of $45,000 with a standard deviation of $12,000, and that the Standard Error of the Average was $1,600. Pictorially:

Here’s what this means.

1. The $12,000 standard deviation is a measure of how variable the collective 56 incomes were. This is a pretty large spread.

2. The $1,600 standard error of the average is interpreted as follows: if we were to take repeated random samples of 56 parents, measure their incomes, and calculate the sample average in each case, then the standard deviation of all the average incomes we calculated would be approximately $1,600.

The Standard Error of the (sample) Average is also more commonly referred to as the Standard Error of the Mean. Again, poor choice of words, but you’ll see it routinely used. Below is an example of Minitab descriptive statistics output that uses these terms.

Descriptive Statistics: Score

Variable N Mean Median TrMean StDev SE Mean

Score 121 90.339 92.000 91.138 6.947 0.632

Variable Minimum Maximum Q1 Q3

Score 36.000 99.000 88.000 94.000
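The SE Mean column in the output above is simply StDev divided by the square root of N, which we can verify directly:

```python
import math

# From the Minitab output: N = 121, StDev = 6.947, SE Mean = 0.632.
n, stdev = 121, 6.947
se_mean = stdev / math.sqrt(n)   # SE of the average = s / sqrt(n)
print(f"SE Mean = {se_mean:.3f}")
```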

HOW ARE SAMPLING DISTRIBUTIONS DETERMINED?

Sampling distributions are a statement about the distribution of every possible value of a statistic. That is, suppose we wanted to determine the sampling distribution of the sample average. It might seem that we would have to take every sample of size n from the population, observe each sample average, and then summarize these values. That is:

[pic]

The most surprising fact of all is that determining the sampling distribution of any statistic requires ONLY ONE RANDOM SAMPLE and, contrary to first thoughts, NOT MANY RANDOM SAMPLES!!!!

In fact, using statistical theory, we can determine what the distribution of any statistic will be based on two things:

1. The size of the random sample only, and

2. Facts and characteristics OR assumptions about the population.

Lastly, we note that we will say or assume NOTHING ABOUT THE POPLN SIZE!!!

SAMPLING DISTRIBUTION OF THE SAMPLE AVERAGE—X̄

Background/Assumptions—We will assume the following:

1. We have a population of measurements with a mean, μpopln, and standard deviation, σpopln.

2. We have a random sample of size n ( X1, X2, …, Xn ) from this population.

Facts—Here’s what we know:

1. Defn: The Sample Average is X̄ = ( X1 + X2 + … + Xn ) / n.

2. μXi = E[ Xi ] = μpopln

3. σ²Xi = Var[ Xi ] = σ²popln

4. The Xi’s are independent random variables.

Facts concerning Expectation Function.

5. E[ aX + bY + c ] = aμX + bμY + c

6. Variance of [ aX + bY + c ] = a²σ²X + b²σ²Y if X and Y are independent.

Using these facts:

Thm: If X1, X2, …, Xn are a RS from a population with mean, μpopln, and variance, σ²popln, then

1. μX̄ = E[ X̄ ] = μpopln and

2. σ²X̄ = Var[ X̄ ] = σ²popln / n.

Pf: 1. E[ X̄ ] =

2. Var[ X̄ ] =
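The theorem can also be seen by simulation; a sketch using a uniform population (my choice of population, since the result holds for any shape):

```python
import random

random.seed(2)

# Simulate the theorem: for a random sample of size n from a population
# with mean mu and variance sigma^2, E[X-bar] = mu and Var(X-bar) = sigma^2/n.
# Population here is Uniform(0, 10): mu = 5, sigma^2 = 100/12.
mu, sigma2, n = 5.0, 100.0 / 12.0, 25

xbars = []
for _ in range(100_000):
    sample = [random.uniform(0, 10) for _ in range(n)]
    xbars.append(sum(sample) / n)

mean_xbar = sum(xbars) / len(xbars)
var_xbar = sum((v - mean_xbar) ** 2 for v in xbars) / (len(xbars) - 1)

print(f"E[X-bar]   = {mean_xbar:.3f} (theory: {mu})")
print(f"Var(X-bar) = {var_xbar:.4f} (theory: {sigma2 / n:.4f})")
```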

Example #1

Suppose we take a random sample of 10 from a population that has a mean of 4 and a variance of 5. We now know that:

n =

μpopln =

σ²popln =

X̄ has a mean of 4 and a variance of σ²popln / n = 5/10 = ½ = 0.5.

Implications

1. The rather small variance of X̄, only 0.5 (the standard deviation of X̄, or standard error of X̄, is likewise small at √0.5 ≈ 0.71), implies that the random variable X̄ has a very small spread and hence its values are very tightly grouped together.

The important implication is that no matter what value of X̄ we obtain in our sample, it is not very far from the center or mean of X̄, which is μpopln. As a result we can conclude that our actual sample value x̄ will probably be very close to the population mean, μpopln = 4.

2. The other important point to take from the theorem is that it says nothing about the shape of the population. The result is true FOR ANY POPULATION OF ANY SHAPE!

3. What is the SE of X̄?

Could you draw a picture of the distribution of X̄ for this example? What’s missing?

Shape

One missing detail about the distribution of X̄ that is not addressed in the theorem is the shape of the distribution. Recall that the shape of the distribution is equivalent to stating the “kind” of random variable or distribution. To this end we have the following three results, which we combine into one theorem on the Shape of X̄.

Thm: If X1, X2, …, Xn are a RS from a population with mean = μpopln and variance = σ²popln, then

1. If the population is Normally distributed, then X̄ is also Normally distributed.

2. If the population is Normal and σ²popln is unknown, then ( X̄ − μpopln ) / ( S/√n ) ~ T(n−1).

3. If n is large, then X̄ is approximately Normally distributed!

Implications

1. The first result is important since we note that if we sample from a Normal population, then the sample average is exactly Normally distributed, regardless of the sample size.

As a follow-up, we also know that: If the population is approx Normal, then X̄ ≈ Normal regardless of sample size.

2. In the second result we see our first encounter with another of the continuous distributions, the T distribution.

3. The last result of the Theorem is the most important theorem in statistics and is known as the CENTRAL LIMIT THEOREM. We restate it here to emphasize its importance!

CENTRAL LIMIT THEOREM: Let X̄ be the sample average based on a large random sample; then X̄ ≈ Normal!

What makes this result so important is that it says nothing about the shape of the population—NOTHING! The only requirement is that we have a “large” sample size.

But then we need to ask: what’s large? It turns out that n > 30 will yield acceptable results in most instances. However, if the population is known to be terribly skewed (in either direction), then a larger sample size will be required; how large depends on how skewed the population is.
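A simulation sketch of the Central Limit Theorem, using a heavily skewed exponential population (my choice) with n = 40:

```python
import random

random.seed(3)

# CLT sketch: sample averages from a heavily skewed (exponential)
# population look approximately Normal once n is reasonably large.
n, reps = 40, 50_000
xbars = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(reps)]

mean_xbar = sum(xbars) / reps
sd_xbar = (sum((v - mean_xbar) ** 2 for v in xbars) / (reps - 1)) ** 0.5

# For Exponential(1): mu = 1, sigma = 1, so X-bar is approx Normal(1, 1/sqrt(40)).
within_1se = sum(1 for v in xbars if abs(v - 1.0) <= 1.0 / n ** 0.5) / reps
print(f"mean = {mean_xbar:.3f}, sd = {sd_xbar:.3f} (theory sd: {1 / n ** 0.5:.3f})")
print(f"fraction within 1 SE of mu: {within_1se:.3f} (Normal gives about 0.683)")
```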

Example #2

Suppose X̄ is the sample average lifetime (in hours) based on a random sample of n = 16 light bulbs from a population of light bulbs that is approximately Normally distributed with a mean of 800 hours and a standard deviation of 40 hours.

What would a picture of the population of lifetimes look like?

What is the probability that a randomly chosen light bulb from the population lasts between 760 and 840 hours?

What is the distribution of X̄ for this example? What does the picture look like? What is the SE of X̄?

What is the probability that our sample average is between 760 and 840 hours?
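Both probabilities follow from standardizing; a sketch of the two calculations:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 800.0, 40.0, 16
se = sigma / math.sqrt(n)        # SE of X-bar = 40/4 = 10 hours

# One randomly chosen bulb: 760 and 840 are 1 sigma below and above the mean.
p_bulb = normal_cdf((840 - mu) / sigma) - normal_cdf((760 - mu) / sigma)

# The sample average of 16 bulbs: 760 and 840 are 4 SEs from the mean.
p_xbar = normal_cdf((840 - mu) / se) - normal_cdf((760 - mu) / se)

print(f"P(760 < X < 840)     = {p_bulb:.4f}")
print(f"P(760 < X-bar < 840) = {p_xbar:.5f}")
```

The contrast is the whole point: the single bulb has only about a 68% chance of landing in that interval, while the average of 16 bulbs is virtually certain to.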

Example #3

Suppose we take a random sample of 40 college students and obtain their cell phone usage records for the last month and determine the # of minutes each student used that month. What are the chances that our sample average is within ¼ a population standard deviation of the population mean time spent on a cell phone by college students?

What do we know about X̄?

What does it mean to say “our sample average is within ¼ a population standard deviation of the population mean?”

What are the chances that our sample average is within ¼ a population standard deviation of the population mean time spent on a cell phone by college students?

As a final question, if an event has an 88% chance of happening, would you guess that it WOULD or WOULD NOT happen? Why?

Now suppose we actually take our sample of 40 college students and observe the sample average x̄. What could you conclude about the mean time that college students spend on their cell phones?
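Notice that the “within a quarter of a population SD” probability depends only on n, whatever σpopln happens to be, since |X̄ − μ| ≤ 0.25σ is the same event as |Z| ≤ 0.25√n. A sketch:

```python
import math

# For n = 40: |X-bar - mu| <= 0.25*sigma  is equivalent to
# |Z| <= 0.25*sigma / (sigma/sqrt(n)) = 0.25*sqrt(n), so sigma cancels out.
n = 40
z = 0.25 * math.sqrt(n)
prob = math.erf(z / math.sqrt(2))   # P(|Z| <= z) = 2*Phi(z) - 1 = erf(z/sqrt(2))
print(f"z = {z:.2f}, P(|X-bar - mu| <= 0.25*sigma) = {prob:.3f}")
```

This is where the roughly 88% chance quoted above comes from.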

NOTES AND COMMENTS

1. The sample size need only be greater than 30 for this result to hold. IT DOES NOT MATTER HOW LARGE OR SMALL THE POPULATION IS.

2. For the Sample Average to be approximately Normal, we only need the sample size to be bigger than 30. We do not need to make any assumption about the shape of the population!

Example #4

Suppose we take a random sample of 150 students and measure the IQ of each student. The population from which we selected the 150 students has a mean IQ of 115 and a standard deviation of 18, i.e., μpopln = 115 and σpopln = 18. Then, since the sample size of 150 is > 30, we know the distribution of the average IQ in our sample is approximately Normal with a mean of 115 and a standard deviation of 18/√150 ≈ 1.47.

What does the picture of the distribution of X̄ look like? What is the SE of X̄?

If you were to choose one value, at random, from this population of X̄ values, where would it be likely to lie? Why?

SAMPLING DISTRIBUTION OF THE SAMPLE PROPORTION—p̂

Background/Assumptions—We will assume the following:

1. We have a population and in the population the proportion of items with a given characteristic of interest is p.

2. We have a random sample of size n ( X1, X2, …, Xn ) from this population, where each of the Xi is 1 if the ith observation has the characteristic and 0 if not.

Facts—Here’s what we know:

1. Recall that the Binomial Experiment was an experiment with n trials, success or failure at each trial, a constant probability of success at each trial, and independent outcomes at each trial.

2. Recall that a Binomial random variable was the number of successes in a Binomial Experiment.

3. Recall that if X was Binomial with parameters n and p, then μX = np and σ²X = np(1−p).

4. Note that if we let X be the sum of the Xi’s, then we have a Binomial Experiment and we have a Binomial random variable. Hence

X = Sum of the Xi’s = Bin (n, p).

5. Finally note that the Sample Proportion becomes

p̂ = X/n = ( X1 + X2 + … + Xn ) / n.

Thm: If X1, X2, …, Xn are a random sample from a population with proportion p, then

The Mean of p̂ = E[ p̂ ] = p  &  The Variance of p̂ = Var[ p̂ ] = p(1−p)/n

Pf: 1. The Mean of p̂ = E[ p̂ ] =

2. The Variance of p̂ = Var[ p̂ ] =

Unfortunately this theorem does not say anything about the “shape” of the distribution of p̂. However, the following theorem does!

Thm: If X1, X2, …, Xn are a random sample from a population with proportion p, then

p̂ ≈ Normal ( p, p(1−p)/n )

if np > 5 AND n(1−p) > 5, i.e., # successes in sample > 5 AND # failures in sample > 5.

Implication

If the sample size is “large,” then the Sampling Distribution of the Sample Proportion p̂ will be approximately Normal with mean equal to the proportion in the population, p, and standard deviation equal to √( p(1−p)/n ).

What does the distribution of p̂ look like?

Example #1

A random sample of 1,108 adults yielded 78% who said they believe there is a heaven. Letting pH be the proportion of adults who believe in heaven, p̂H = 0.78 is that 78%.

What can we say about the distribution of p̂H in this example? What do we need to check?

What is the probability that p̂H is within 3% of pH?
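A sketch of this calculation, with the usual substitution of p̂H for the unknown pH inside the SE formula (that substitution is an assumption, since pH itself is unknown):

```python
import math

# Heaven survey: n = 1108, p-hat = 0.78. Check the Normal-approximation
# conditions, then estimate P(p-hat within 0.03 of p), plugging p-hat in
# for the unknown p in the SE (an assumption noted above).
n, p_hat = 1108, 0.78
successes, failures = n * p_hat, n * (1 - p_hat)
assert successes > 5 and failures > 5     # Normal approximation is OK here

se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated SE of p-hat
z = 0.03 / se
prob = math.erf(z / math.sqrt(2))         # P(|Z| <= z)
print(f"SE = {se:.4f}, P(|p-hat - p| <= 0.03) = {prob:.3f}")
```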

Example #2

Consider the survey from class and whether a student drinks or not. Let pND be the proportion of MU students who do not drink alcohol.

Our class: The results from our class of n = __ who filled out the survey indicated that only __ do not drink. Letting p̂ND be the sample proportion of students who do not drink out of __, the distribution of p̂ND is:

Can you draw a picture of it?

All other STA 301 students: Now suppose we use all the other STA 301 students that I have survey information on. I have information on 124 students (the six from this class are NOT part of the 124). Of these 124, 19 indicated they do not drink. What is the distribution of this sample proportion, p̂ND?

Can you draw a picture of it?

How do these two distributions compare? What are the differences between them?

Why do they differ?

Interesting fact about the SE of p̂

Recall that the Standard Error of a statistic is nothing more than the standard deviation of the statistic. Also recall that the standard deviation of the sample proportion is given by √( p(1−p)/n ).

Consider the numerator of this expression and plot this function versus p.

[pic]

Hence we can conclude that the maximum that p(1−p) can be is 0.25. So without knowing what the value of the population proportion actually is, we can argue that the standard deviation of p̂ is less than or equal to √( 0.25/n ) = 1/(2√n).

Hence a conservative estimate of the SD or SE of any p̂ is 1/(2√n).
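A quick numerical check of the p(1−p) ≤ 0.25 bound and the resulting conservative SE (the sample size n = 400 below is just an illustrative choice):

```python
# p(1-p) is maximized at p = 1/2, where it equals 0.25, so the SE of the
# sample proportion is at most sqrt(0.25/n) = 1/(2*sqrt(n)) whatever p is.
n = 400
values = [(p / 1000) * (1 - p / 1000) for p in range(1001)]  # grid over [0, 1]
max_val = max(values)

conservative_se = (max_val / n) ** 0.5   # = 1/(2*sqrt(400)) = 0.025
print(f"max of p(1-p) = {max_val}, conservative SE for n = {n}: {conservative_se}")
```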

So if n is large, we can conclude that p̂ ≈

ASSESSING THE NORMALITY OF A POPULATION

Goal: As a quick aside, we consider whether a population is Normally distributed or not. The importance of this question should be obvious, since many of the techniques we have seen are based on the premise that the population is Normal.

Normality was required in most of the inferences about a single population mean or the difference of two means. It is also required for inferences about a population variance and in the Regression and Analysis of Variance methods that we will see.

The basic premise of checking Normality of a population is that a random sample should exhibit the same characteristics as the population. In particular, if the population is Normal, then the sample should have “Normal” characteristics. In fact, the sample should have a “bell-shape.”

Illustration: Consider the following histograms of data. The first two data sets are the soda pop two-liter contents data and the Lithium Ion cell phone battery data. Based on the histograms would you be willing to conclude the population is Normal or approximately Normal? How about the last one?

[pic] [pic]

[pic]

METHODS OF ASSESSING THE NORMALITY OF A POPULATION

We will consider three methods of determining whether a population can be considered to be Normally distributed. Two of the methods are graphical in nature and the last is a hypothesis test.

In reality I typically use the graphical methods when the sample size is large (over 50 and usually in the hundreds) and the testing method when the sample is smallish (less than 30).

The three methods are:

1. A histogram or stem-and-leaf plot of the data: if the population is Normal, then the histogram should also look Normal, or bell-shaped.

2. A Normal Probability Plot is a plot that will yield a straight line if the data came from a Normal population.

3. A test in which Ho: The data came from a Normal population and HA: The data came from a Non-Normal population.

The following examples will illustrate all three methods.
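Method 2 can be boiled down to a single number: the correlation between the sorted data and the corresponding Normal quantiles (the straighter the probability plot, the closer this is to 1). A stdlib sketch of the idea; this is the reasoning behind the plot, not the exact statistic Minitab computes:

```python
import random
from statistics import NormalDist, fmean

random.seed(4)

def normal_plot_corr(data):
    """Correlation between sorted data and expected Normal quantiles.

    Values near 1 support Normality; noticeably smaller values suggest
    skewness or heavy tails. This mirrors a Normal probability plot.
    """
    n = len(data)
    xs = sorted(data)
    # Expected Normal quantiles at plotting positions (i - 0.5)/n.
    qs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mq = fmean(xs), fmean(qs)
    sxq = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    return sxq / (sxx * sqq) ** 0.5

normal_data = [random.gauss(0, 1) for _ in range(200)]       # Normal population
skewed_data = [random.expovariate(1.0) for _ in range(200)]  # skewed population

print(f"Normal sample: r = {normal_plot_corr(normal_data):.3f}")
print(f"Skewed sample: r = {normal_plot_corr(skewed_data):.3f}")
```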

EXAMPLE #1: Consider the Heart Rates (based on the 30sec measurement) from the class survey for STA 261 sections in Winter ’98, Fall ’99, Fall ’00, & Fall ’02. Below are the graphical descriptive statistics and the Normal Probability Plot for these data.

[pic]

[pic]

EXAMPLE #2: This data set is the parking ticket data from the STA 261 Fall 2002 class.

[pic]

[pic]

EXAMPLE #3: This data set is the My Age data from the STA 261 Fall 2002 class.

[pic]

[pic]

EXAMPLE #4: This data is pop data.

[pic]

[pic]

EXAMPLE #5: This data is the Lithium Ion data.

[pic]

[pic]

EXAMPLE #6: This data is the last data set.

[pic]

[pic]

SUMMARY OF SAMPLING DISTRIBUTIONS AND ESTIMATION

Defn: A Sampling Distribution is the distribution of an estimator or sample statistic (i.e., a RV), such as X̄, S², S, p̂, or M.

SAMPLE AVERAGE

Defn: X̄ = ( X1 + X2 + … + Xn ) / n is the sample average of a RS of size n from a popln with mean, μpopln, and variance, σ²popln.

Distribution of X̄

Mean of X̄ = μX̄ = E[ X̄ ] = μpopln

Variance of X̄ = σ²X̄ = σ²popln / n

Shape: X̄ will be exactly or approximately Normally distributed if the popln is exactly or approximately Normally distributed, regardless of the sample size, n

X̄ will be approximately Normally distributed when n ≥ 30

( X̄ − μpopln ) / ( σpopln/√n ) is Z if σpopln is known and the popln is Normal

( X̄ − μpopln ) / ( S/√n ) is T(n−1) if σpopln is unknown and the popln is Normal.

SAMPLE PROPORTION

Defn: p̂ = X/n, where X = number in random sample with characteristic ~ Bin( n, p ), where p is the proportion with the characteristic in the population.

Distribution of p̂

Mean of p̂ = μp̂ = E[ p̂ ] = p = population proportion with characteristic

Variance of p̂ = σ²p̂ = p(1−p)/n

Shape of p̂: exactly, Pr{ p̂ = x/n } = Pr{ X = x } = C(n, x) pˣ(1−p)ⁿ⁻ˣ, or

approximately Normal if np ≥ 5 & n(1−p) ≥ 5

[Floating text boxes that accompany the figures above: a population diagram (Distn of the popln: μpopln, σ²popln, Shape); a family tree of random variables (All Random Variables → Discrete: Binomial; Continuous: Normal, Chi-Square, T, F); a random-sample diagram (X1 = first sample value, X2 = second sample value, …, Xn = nth sample value, with repeated random samples RS1, RS2, RS3, … yielding statistics S1, S2, S3, …); a table pairing each Parameter of Interest (μpopln, Medianpopln, σ²popln, σpopln, p) with its Corresponding Statistic (X̄, M, S², S, p̂); the plot label p(1−p); and a Statistical Inference diagram: What does the sample statistic (e.g., X̄) tell us about the population parameter (e.g., μ)? The Foundation of (Theory behind) Statistical Inference: Probability, Random Variables, Families of RV’s, and SAMPLING DISTRIBUTIONS.]