Sampling Distributions




___________________________________________

1) Revisit the difference between a statistic and a parameter.

2) Discuss factors that determine whether an estimate of a parameter is ‘good’ or ‘bad’.

3) Define a ‘sampling distribution’ and discuss its properties.

4) Answer the following burning question: Why do we take relatively large samples of data?

How can I estimate the number of siblings that people in this class have?

___________________________________________

Take a sample and calculate:

a) mean

b) median

c) mode

d) (High Score + Low Score) / 2

How do I know which of these options is the best?

1) Working with a known population

Take a sample from a population with known parameters and calculate different stats (e.g., mean, median, mode, [High + Low] / 2) and compare them with the population parameter.

2) Repeated Samples method

Take repeated samples from a population with known parameters and see how the distributions of the different statistics compare with the population parameter.
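The repeated-samples method can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the `population` of sibling counts is made up, and the helper names `midrange` and `compare_statistics` are hypothetical. The sketch draws many samples and reports, for each statistic, the mean and standard deviation of its values across samples.

```python
import random
import statistics

def midrange(xs):
    """(High Score + Low Score) / 2."""
    return (max(xs) + min(xs)) / 2

def compare_statistics(population, n, reps, seed=0):
    """Draw `reps` samples of size n (without replacement) and summarize
    how each statistic is distributed across those samples."""
    rng = random.Random(seed)
    estimators = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,  # Python 3.8+: first mode wins on ties
        "midrange": midrange,
    }
    values = {name: [] for name in estimators}
    for _ in range(reps):
        sample = rng.sample(population, n)
        for name, estimator in estimators.items():
            values[name].append(estimator(sample))
    # (center, spread) of each statistic's simulated sampling distribution
    return {name: (statistics.mean(v), statistics.pstdev(v))
            for name, v in values.items()}

# Hypothetical population of sibling counts for 100 people (an assumption).
population = [0] * 15 + [1] * 35 + [2] * 25 + [3] * 15 + [4] * 7 + [5] * 3
results = compare_statistics(population, n=10, reps=2000)

print("population mean:", statistics.mean(population))
for name, (center, spread) in results.items():
    print(f"{name:9s} center = {center:.2f}  spread = {spread:.2f}")
```

A statistic whose center sits near the population mean and whose spread is small is the better candidate for estimating it.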

What's a Sampling Distribution?

___________________________________________

Sampling Distribution - the probability distribution of a statistic (e.g., the mean), calculated from repeated samples of n measurements

We are going to model sampling distributions as continuous RVs (eventually).

Why is this appropriate?

Because the mean does not have to be a possible outcome of the experiment.

What does this buy us?

We know how to calculate the area under the normal curve for continuous RVs. That will allow us to calculate the probability of obtaining a sample with a given statistic (e.g., mean) from a population.

How are we going to do this?

Patience, my child. All will be revealed.

Constructing a sampling distribution

___________________________________________

3 5 7 9 11

μ = 7

σ = 2.8 ≈ 3

How many unique samples could we draw from this population (without replacement) if n = 2?

Answer: (5 choose 2) or 10.

|Mean |4 |5 |6 |7 |8 |9 |10 |
|P | | | | | | | |

μ_M = 7

σ_M = √3.0 = 1.7 ≈ 2
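The construction above can be checked by brute force. This is a minimal sketch that enumerates every sample of n = 2 drawn without replacement from the population 3, 5, 7, 9, 11 and tabulates the sample means. The variable names are illustrative. Exact fractions are used so that no rounding creeps in. The enumeration gives a variance of exactly 3 for the sample means, so σ_M = √3 ≈ 1.73.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import sqrt

population = [3, 5, 7, 9, 11]
n = 2

# Every unique sample of size n, drawn without replacement.
samples = list(combinations(population, n))
means = [Fraction(sum(s), n) for s in samples]

# Probability of each possible sample mean.
counts = Counter(means)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mu_M = sum(m * p for m, p in dist.items())
var_M = sum((m - mu_M) ** 2 * p for m, p in dist.items())
sigma_M = sqrt(var_M)

print(len(samples), "samples")
for m, p in dist.items():
    print(f"mean = {m}  P = {float(p):.1f}")
print("mu_M =", mu_M, " var_M =", var_M, " sigma_M =", round(sigma_M, 2))
```

The ten sample means are 4, 5, 6, 6, 7, 7, 8, 8, 9, 10, so the probabilities for the means 4 through 10 are .1, .1, .2, .2, .2, .1, .1.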

Important things about this example

___________________________________________

1) A sampling distribution can be constructed by taking repeated samples from the population.

2) This information can be used to determine how well the sample statistic matches the population.

3) Note in this case that the mean of the sampling distribution was equal to the mean of the population, and that the standard deviation of the sampling distribution was smaller than that of the population.

4) I still haven’t told you what makes for a good statistic.

Properties of a good estimator

________________________________________

Point Estimator - rule or formula that tells us how to use the sample data to calculate a single number that can be used as an estimate of the population parameter (really just another word for statistic).

A good point estimator (statistic) is:

(a) unbiased

The mean of the sampling distribution equals the population parameter being estimated (e.g., the mean of the population).

(b) minimum variability

The standard deviation of the sampling distribution is called the Standard Error. Low variability is sometimes referred to as reliability.

Can we control bias?

What if the mean of the sampling distribution is too high/low?

Can we control variability?

a) Choose random samples

b) Choose large samples
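To see why large samples help, here is a small simulation sketch. The population (the integers 1 to 100) and the function name `empirical_standard_error` are assumptions made for illustration. For each sample size it draws many random samples and measures how much the sample means vary.

```python
import random
import statistics

def empirical_standard_error(population, n, reps=2000, seed=1):
    """Standard deviation of the sample mean across `reps` random samples."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(population, k=n))
             for _ in range(reps)]
    return statistics.pstdev(means)

# Hypothetical population: the integers 1..100 (an assumption).
population = list(range(1, 101))

errors = {n: empirical_standard_error(population, n) for n in (5, 20, 80)}
for n, se in errors.items():
    print(f"n = {n:2d}  standard error of the mean is about {se:.2f}")
```

The spread of the sample means shrinks as n grows, which is the sense in which larger samples give a more reliable estimator.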

If we can only have one, which one do we want?

So, you want to construct a sampling distribution…

________________________________________

Not so fast, Skippy. Can you envision a problem that might prevent you from constructing a sampling distribution?

Let’s construct a sampling distribution for n=5 for this class:

a) How many observations would be in the sampling distribution?

b) What about samples of 20 at AC?
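The number of distinct samples is a "choose" count, just as in the (5 choose 2) example. A quick sketch with Python's `math.comb` shows how fast it grows. Both population sizes below are made-up placeholders, since the notes do not give the class size or the size of AC.

```python
from math import comb

class_size = 30     # hypothetical class size (an assumption)
campus_size = 2000  # hypothetical campus size (an assumption)

# Number of distinct samples without replacement = N choose n.
samples_in_class = comb(class_size, 5)
samples_on_campus = comb(campus_size, 20)

print("samples of 5 from the class:", samples_in_class)
print("digits in the count of samples of 20 from campus:",
      len(str(samples_on_campus)))
```

Even a class of 30 yields 142,506 possible samples of 5, and the campus count is far too large to enumerate one sample at a time.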

Can computer technology save us?

Restricted samples:

Unrestricted samples:

________________________________________

Is this the end?

Is class dismissed until the final?

Is there no way to save the semester?

Our hero

________________________________________

Central Limit Theorem - When n, the number of observations in a sample taken from a population, is sufficiently large (n ≥ 30), the sampling distribution of M (the mean of the sample) will be approximately normal.

Further, the larger n gets, the more closely the sampling distribution will approximate a normal distribution.

Finally,

a) μ_M = μ and

b) σ_M = σ / √n and

c) z = (M - μ_M) / σ_M

= (M - μ) / (σ / √n)
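Claims (a) and (b) can be checked by simulation. The sketch below is an illustration under stated assumptions: it uses an exponential population with rate 1 (so μ = 1 and σ = 1), chosen only because it is clearly not normal, and the function name `simulate_sample_means` is hypothetical.

```python
import random
import statistics
from math import sqrt

def simulate_sample_means(n, reps, seed=42):
    """Means of `reps` random samples of size n from an exponential
    population with rate 1 (population mean 1, population SD 1)."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

mu, sigma, n = 1.0, 1.0, 36
sample_means = simulate_sample_means(n, reps=5000)

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

print(f"mean of sample means = {mean_of_means:.3f} (mu = {mu})")
print(f"SD of sample means   = {sd_of_means:.3f} "
      f"(sigma / sqrt(n) = {sigma / sqrt(n):.3f})")
```

The simulated mean and standard deviation of M should land close to μ and σ / √n, even though the population itself is strongly skewed.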

Coin Flipping Example

_______________________________________

The outcome of a coin flip is distributed uniformly: 50% heads, 50% tails.

Let’s see the CLT in action:

Flip a coin once and tell me the # of heads.

Flip a coin twice and tell me the # of heads.

Flip a coin 10 times and tell me the # of heads.

Flip a coin 30 times and tell me the # of heads.
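Instead of actually flipping, the exact distribution of the number of heads can be computed and drawn as a rough text histogram. This sketch assumes a fair coin. The function name `heads_distribution` is illustrative.

```python
from math import comb

def heads_distribution(n_flips):
    """P(k heads) for k = 0..n_flips, assuming a fair coin."""
    return [comb(n_flips, k) / 2 ** n_flips for k in range(n_flips + 1)]

for n_flips in (1, 2, 10, 30):
    print(f"\n{n_flips} flip(s)")
    for k, p in enumerate(heads_distribution(n_flips)):
        # One '#' per percentage point of probability.
        print(f"{k:2d} heads {'#' * round(p * 100)}")
```

With 1 flip the histogram is flat. By 10 and 30 flips it takes on the familiar bell shape centered on half the number of flips.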

Using the CLT: Rush Example

________________________________________

You are deciding whether or not to rush a fraternity (it’s a special Stats Honor Fraternity) and, because you are the type of person who would rush a Stats Honor Fraternity, you want to know what the average intelligence level of the frat is. You ask Eric Stratton, the Rush Chairman (he seemed real glad to meet you), what the average GPA in the house is. He says, “μ = 3.5 and σ = .6”. You randomly poll 36 fraternity members and find that the mean of the sample is 3.4. What do you conclude?

P(z ≤ [M - μ] / [σ/√n])

P(z ≤ [3.4 - 3.5] / [.6/√36])

P(z ≤ [-.1 / .1])

P(z ≤ -1) = Area(Tail -1.0) = .1587

Would you alter your conclusion if the mean of the sample was 3.2? How?

P(z ≤ [M - μ] / [σ/√n])

P(z ≤ [3.2 - 3.5] / [.6/√36])

P(z ≤ [-.3 / .1])

P(z ≤ -3) = Area(Tail -3.0) = .0013
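Both calculations can be reproduced without a z table using `statistics.NormalDist` from the Python standard library. This is a minimal sketch. The function name `prob_sample_mean_at_most` is made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_at_most(sample_mean, mu, sigma, n):
    """Return (z, P(Z <= z)) for a sample mean, using sigma_M = sigma / sqrt(n)."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu) / standard_error
    return z, NormalDist().cdf(z)

z1, p1 = prob_sample_mean_at_most(3.4, mu=3.5, sigma=0.6, n=36)
z2, p2 = prob_sample_mean_at_most(3.2, mu=3.5, sigma=0.6, n=36)

print(f"M = 3.4: z = {z1:.1f}, tail area = {p1:.4f}")  # z = -1.0, .1587
print(f"M = 3.2: z = {z2:.1f}, tail area = {p2:.4f}")  # z = -3.0, .0013
```

A sample mean of 3.4 or lower would turn up about 16% of the time if the Rush Chairman's numbers were right. A mean of 3.2 or lower would turn up about 0.13% of the time.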

More Chips Ahoy

___________________________________________

Remember a few weeks ago, you and Biff were trying to figure out the probability that ONE Chips Ahoy cookie, which is supposed to have 23 chips, could have as few as 17 chips. Let's say you re-conduct the experiment, but you're smarter now, so rather than examine 1 cookie, you collect a sample of 49 cookies (I imagine you got sick after eating the stimuli). The mean number of chips in your sample was 20, and the standard deviation was 17.5 chips. Do you have just cause for a legal action against Chips Ahoy? In other words, what is the probability that your sample of cookies was drawn from a population with μ = 23?
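One way to set this problem up, as a sketch rather than the official solution: treat the sample standard deviation of 17.5 as a stand-in for σ (an assumption, since the problem gives no population σ) and ask how likely a sample mean of 20 or lower is when μ = 23.

```python
from math import sqrt
from statistics import NormalDist

mu = 23          # advertised chips per cookie
sample_mean = 20
sd = 17.5        # sample SD, used here in place of sigma (an assumption)
n = 49

standard_error = sd / sqrt(n)            # 17.5 / 7 = 2.5
z = (sample_mean - mu) / standard_error  # -3 / 2.5 = -1.2
tail_area = NormalDist().cdf(z)          # P(Z <= -1.2)

print(f"sigma_M = {standard_error}, z = {z:.1f}, tail area = {tail_area:.4f}")
```

Under these assumptions a sample mean this low comes up roughly 11.5% of the time, which is weak grounds for calling a lawyer.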

Central Limit Theorem with Proportions

________________________________________

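μ_p = p (here p on the right is the population proportion)

σ_p = √(μ_p(1 - μ_p) / n)

z = (p - μ_p) / σ_p = (p - μ_p) / √(μ_p(1 - μ_p) / n) (here p is the sample proportion)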

Applying the CLT with proportions: Blood Example

________________________________________

Nine percent of the U.S. Population has Type B blood. What is the probability that at least 12.5% of a random sample of 400 people will have Type B blood?

P(p ≥ .125) = P(z ≥ [.125 - .09] / σ_p)

σ_p = √(μ_p(1 - μ_p) / n)

= √((.09)(.91) / 400)

= .014

P(z ≥ [.125 - .09] / .014)

= P(z ≥ 2.5)

Area (Tail: 2.5) = .0062
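The same calculation can be done in Python, which also shows how much the rounding of σ_p to .014 matters. This is a sketch using the standard library's `NormalDist`. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

population_p = 0.09  # proportion with Type B blood
sample_p = 0.125
n = 400

sigma_p = sqrt(population_p * (1 - population_p) / n)

# Unrounded standard error.
z = (sample_p - population_p) / sigma_p
upper_tail = 1 - NormalDist().cdf(z)

# Standard error rounded to .014, as in the hand calculation.
z_rounded = (sample_p - population_p) / 0.014
upper_tail_rounded = 1 - NormalDist().cdf(z_rounded)

print(f"sigma_p = {sigma_p:.4f}")
print(f"unrounded: z = {z:.3f}, tail = {upper_tail:.4f}")
print(f"rounded:   z = {z_rounded:.1f}, tail = {upper_tail_rounded:.4f}")
```

With the unrounded standard error, z comes out a little under 2.5 (about 2.45), so the tail area is slightly larger than the table value of .0062. Either way the event is rare.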

CLT with proportions: Christmas Example

________________________________________

Sixty percent of the U.S. Population believes that Christmas presents should be opened on Christmas morning, as opposed to Christmas Eve. What is the probability that 65 people out of a random sample of 125 will agree that Christmas morning is the appropriate time to open presents?
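A possible setup for this problem, offered as a sketch: the question is read here as "65 or fewer out of 125" (an interpretation, since the probability of exactly one value is not what the normal approximation gives), and the sample proportion is compared with the population value of .60.

```python
from math import sqrt
from statistics import NormalDist

population_p = 0.60
n = 125
sample_p = 65 / n  # .52

sigma_p = sqrt(population_p * (1 - population_p) / n)
z = (sample_p - population_p) / sigma_p

# Assumed reading of the question: P(sample proportion <= .52).
lower_tail = NormalDist().cdf(z)

print(f"p = {sample_p:.2f}, sigma_p = {sigma_p:.4f}, "
      f"z = {z:.2f}, tail = {lower_tail:.3f}")
```

Under that reading, z is about -1.83 and the tail area is roughly .034.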

Why do we sample?

________________________________________

1) To ensure an unbiased estimator (i.e., random sample).

2) To decrease the variability of our estimator (i.e., increase its reliability).

3) To enable us to use the Central Limit Theorem as a way of modeling chance variation in our sample.
