The normal distribution - UC Davis Plants



Measures of Central Tendency and Dispersion [ST&D p. 16-27]

Individual values of a population are designated $Y_i$, i = 1, ..., N, where N = population size.

Individual values of a sample are also denoted $Y_i$, i = 1, ..., n, where n = sample size.

Greek letters are used for population parameters (µ = pop. mean; σ² = pop. variance).

Mean or average (measure of central tendency)

• Pop. mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} Y_i$

• Sample mean: $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$

Variance (measure of dispersion of the individuals about the mean)

• Pop. variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (Y_i - \mu)^2$

• Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$

The quantities $(Y_i - \bar{Y})$ are called deviations.

To express these measures of dispersion in the original units of observation:

• Pop. standard deviation: $\sigma = \sqrt{\sigma^2}$

• Sample standard deviation: $s = \sqrt{s^2}$

To express the standard deviation in units of the mean (or %):

• Pop. coeff. of variation: $CV = \frac{\sigma}{\mu} \times 100\%$

• Sample coeff. of variation: $CV = \frac{s}{\bar{Y}} \times 100\%$
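As a quick check of these formulas, the sketch below computes the sample mean, deviations, variance, standard deviation, and coefficient of variation for the 14 malt extract values from ST&D p. 30 (the data listed in the Q-Q plot section), and compares them against Python's standard `statistics` module.

```python
import math
import statistics

# The 14 malt extract values (ST&D p. 30)
malt = [77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2, 75.4,
        76.0, 76.0, 73.9, 77.4, 76.6, 77.3]

n = len(malt)
y_bar = sum(malt) / n                            # sample mean
deviations = [y - y_bar for y in malt]           # deviations (Y_i - Ybar)
s2 = sum(d * d for d in deviations) / (n - 1)    # sample variance: divide by n - 1
s = math.sqrt(s2)                                # sample standard deviation
cv = s / y_bar * 100                             # sample coefficient of variation (%)

# The standard library agrees: variance/stdev use n - 1, pvariance/pstdev use N
assert math.isclose(y_bar, statistics.mean(malt))
assert math.isclose(s2, statistics.variance(malt))
assert math.isclose(s, statistics.stdev(malt))

print(f"mean = {y_bar:.3f}, s^2 = {s2:.4f}, s = {s:.3f}, CV = {cv:.2f}%")
```

The mean (75.943) and s (1.227) match the intercept and slope used for the normal line of the Q-Q plot. For a complete population, `statistics.pvariance` and `statistics.pstdev` apply the N divisor instead.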

Visualization of central tendency and dispersion using boxplots


• Review ST&D p. 58 (Estimation and inference) and p. 53 (3.8 Distribution of means)

Measures of dispersion of sample means

An important population parameter is the variance of the mean, $\sigma^2_{\bar{Y}}$.

If you repeatedly sample a population, taking samples of size n each time, the variance of those sample means is what we call the variance of the mean.

It relates very simply to the population variance:

Variance of the mean: $\sigma^2_{\bar{Y}} = \frac{\sigma^2}{n}$

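We can estimate $\sigma^2_{\bar{Y}}$ for a population by taking r independent, random samples of size n from that population, calculating the sample means $\bar{Y}_1, \ldots, \bar{Y}_r$, and then calculating the variance of those sample means:

$s^2_{\bar{Y}} = \frac{1}{r-1}\sum_{j=1}^{r} \left(\bar{Y}_j - \bar{\bar{Y}}\right)^2$, where $\bar{\bar{Y}}$ is the mean of the r sample means.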
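This estimation procedure can be illustrated by simulation. The minimal sketch below assumes a hypothetical normal population with µ = 17.2 and σ = 6 (values borrowed from the finch example), draws r = 5000 samples of size n = 20 with `random.gauss`, and compares the variance of the sample means with the theoretical σ²/n = 1.8.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical normal population (finch values: mu = 17.2 g, sigma^2 = 36 g^2)
MU, SIGMA = 17.2, 6.0
n = 20      # size of each sample
r = 5000    # number of independent random samples

# Take r samples of size n and keep each sample mean
sample_means = []
for _ in range(r):
    sample = [random.gauss(MU, SIGMA) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# Variance of the r sample means (divisor r - 1)
s2_ybar = statistics.variance(sample_means)

print(f"estimated variance of the mean: {s2_ybar:.3f}")
print(f"theoretical sigma^2 / n:        {SIGMA ** 2 / n:.3f}")
```

Increasing r makes the estimate settle closer to σ²/n; increasing n shrinks the variance of the mean itself.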

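The square root of $\sigma^2_{\bar{Y}}$ is called the standard error (or standard deviation of a mean).

Standard error: $\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}}$, estimated from a sample by $s_{\bar{Y}} = \frac{s}{\sqrt{n}}$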

• As with the standard deviation, this is a quantity in the original units of observation.

• The SE is important in determining confidence intervals and the power of tests.
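A short sketch of the standard error estimated from a single sample, $s_{\bar{Y}} = s/\sqrt{n}$, using the malt extract values from ST&D p. 30 (listed in the Q-Q plot section). Like s, the result is in the original units of observation.

```python
import math
import statistics

# The 14 malt extract values (ST&D p. 30)
malt = [77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2, 75.4,
        76.0, 76.0, 73.9, 77.4, 76.6, 77.3]

n = len(malt)
s = statistics.stdev(malt)     # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)          # estimated standard error of the mean

print(f"n = {n}, s = {s:.3f}, SE = {se:.3f}")  # n = 14, s = 1.227, SE = 0.328
```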

The Normal distribution (~N)

If you measure a quantitative trait, most of the measurements will cluster near the population mean (µ), and as you consider values further and further from µ, individuals exhibiting those values become rarer.

• Some basic characteristics of this kind of distribution are:

1) The maximum value occurs at µ;

2) The dispersion is symmetric about µ (i.e. the mean, median, and mode of the population are equal); and

3) The “tails” asymptotically approach zero.

A distribution which meets these basic criteria is known as a normal distribution.

• The following conditions tend to result in a normal distribution:

1) There are many factors which contribute to the observed value of the trait;

2) These many factors act independently of one another; and

3) The individual effects of the factors are additive and of comparable magnitude.
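These three conditions can be mimicked in a small simulation. The sketch assumes a hypothetical trait built from 50 independent factors, each adding a uniform random effect on [0, 1); the resulting trait values should be approximately normal, with roughly 68% of them within one SD of the mean.

```python
import random
import statistics

random.seed(2)

N_FACTORS = 50          # hypothetical number of contributing factors
N_INDIVIDUALS = 10_000

def trait_value():
    # Each factor adds an independent effect of comparable size (uniform on [0, 1))
    return sum(random.random() for _ in range(N_FACTORS))

values = [trait_value() for _ in range(N_INDIVIDUALS)]
mu = statistics.fmean(values)
sd = statistics.pstdev(values)

# For a normal distribution about 68.27% of values lie within one SD of the mean
within_1sd = sum(abs(v - mu) <= sd for v in values) / N_INDIVIDUALS
print(f"mean = {mu:.2f}, sd = {sd:.2f}, within 1 SD = {within_1sd:.3f}")
```

Each single factor is uniform (flat, not bell-shaped), yet their sum is close to normal, which is the point of the conditions listed above.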

• Many biological and ecological variables are approximately normally distributed.

• The bell-shaped normal distribution is also known as a Gaussian curve, named after Carl Friedrich Gauss, who worked out the formal mathematics:

• $Z(Y) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(Y-\mu)^2}{2\sigma^2}}$, where Z(Y) is the height of the curve at a given observed value Y.

• The location and shape are uniquely determined by only two parameters, µ and σ².

• If we set µ = 0 and σ² = 1, we obtain a standard normal curve [N(0,1)].

• By varying the value of µ, one can center Z(Y) anywhere on the x-axis.

• By varying σ², one can freely adjust the width of the central hump.
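The formula for Z(Y) translates directly into code. In this sketch the function name `normal_height` is hypothetical; it checks the properties listed above (maximum at µ, symmetry about µ, a lower and wider hump as σ² grows) and compares against `statistics.NormalDist.pdf`, which is parameterized by σ rather than σ².

```python
import math
from statistics import NormalDist

def normal_height(y, mu, sigma2):
    """Height Z(Y) of the normal curve with mean mu and variance sigma2 at value y."""
    sigma = math.sqrt(sigma2)
    return math.exp(-((y - mu) ** 2) / (2 * sigma2)) / (sigma * math.sqrt(2 * math.pi))

# Standard normal N(0, 1): the maximum is at mu = 0 and equals 1 / sqrt(2*pi)
assert math.isclose(normal_height(0, 0, 1), 1 / math.sqrt(2 * math.pi))

# Symmetry about mu
assert math.isclose(normal_height(1.5, 0, 1), normal_height(-1.5, 0, 1))

# Agrees with the standard library (NormalDist takes sigma = 6, not sigma^2 = 36)
assert math.isclose(normal_height(20.0, 17.2, 36), NormalDist(17.2, 6).pdf(20.0))

# A larger sigma^2 lowers the peak (and widens the hump)
print(normal_height(0, 0, 1), normal_height(0, 0, 4))
```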


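To convert any ~N into a standard N curve, use $Z = \frac{Y - \mu}{\sigma}$:

• Subtracting µ centers the curve at 0.

• Dividing by σ puts the variation in units of σ.

The result is the standard N curve, with µ = 0 and σ = 1.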
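A minimal sketch of this location and scale transformation, assuming the finch parameters from Q1 (µ = 17.2 g, σ = 6 g); `standardize` is a hypothetical helper name. After standardizing a large simulated sample, the scores should have a mean near 0 and an SD near 1.

```python
import random
import statistics

random.seed(3)

def standardize(y, mu, sigma):
    """Normal score Z = (Y - mu) / sigma for a value from N(mu, sigma^2)."""
    return (y - mu) / sigma

MU, SIGMA = 17.2, 6.0   # finch-weight parameters from Q1 (sigma^2 = 36 g^2)
ys = [random.gauss(MU, SIGMA) for _ in range(20_000)]
zs = [standardize(y, MU, SIGMA) for y in ys]   # "- mu" centers, "/ sigma" rescales

print(f"mean of Z = {statistics.fmean(zs):.3f}, SD of Z = {statistics.stdev(zs):.3f}")
print(round(standardize(22.0, MU, SIGMA), 3))  # the 22 g finch from Q1: 0.8
```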

The following % of items lie within the indicated limits:

µ ± σ contains 68.27% of the items

µ ± 2σ contains 95.45% of the items

µ ± 3σ contains 99.73% of the items

Conversely:

50% of the items fall between µ ± 0.674σ

95% of the items fall between µ ± 1.960σ

99% of the items fall between µ ± 2.576σ
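The percentages and multipliers above can be reproduced with `statistics.NormalDist` from the Python standard library (3.8+), which provides `cdf` and `inv_cdf`. Because any ~N can be standardized, computing them on N(0, 1) is enough.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# Proportion of items within mu ± k*sigma (identical for every ~N after standardizing)
for k in (1, 2, 3):
    inside = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"mu ± {k} sigma contains {inside:.2%} of the items")

# Conversely: the multiplier z such that mu ± z*sigma encloses a central proportion p
for p in (0.50, 0.95, 0.99):
    z = std_normal.inv_cdf(0.5 + p / 2)  # leaves (1 - p) / 2 in each tail
    print(f"{p:.0%} of the items fall between mu ± {z:.3f} sigma")
```

This prints 68.27%, 95.45%, and 99.73% for the first loop and 0.674, 1.960, and 2.576 for the second.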

Q1: From a ~N population of finches with mean weight µ = 17.2 g and variance σ² = 36 g², what is the probability of randomly selecting an individual finch weighing more than 22 g?

Solution: To answer this, first convert the value 22 g to its corresponding normal score:

$Z = \frac{Y - \mu}{\sigma} = \frac{22 - 17.2}{6} = 0.8$

Table A14: 21.19% of the area lies to the right of Z = 0.8. Therefore, 22 g is not an unusual weight for a finch in this population (it is less than 1 SD from the mean).

Q2: From the same population, what is the probability of randomly selecting a sample of 20 finches with an average weight of more than 22 g?

This question is asking for the probability of selecting a sample of a certain average value.

For a sample of size n = 20, the appropriate distribution to consider is the normal distribution of sample means, with µ = 17.2 g and $\sigma_{\bar{Y}} = \frac{\sigma}{\sqrt{n}} = \frac{6}{\sqrt{20}} = 1.34$ g.

With this in mind, we proceed as before:

$Z = \frac{\bar{Y} - \mu}{\sigma_{\bar{Y}}} = \frac{22 - 17.2}{1.34} = 3.58$

Table A14: only 0.02% of the area lies to the right of Z = 3.58 (only a 0.02% chance).

22 g is an extremely unusual mean weight for a sample of twenty finches in this population (it is >3 SE from the mean!).
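Both finch questions can be checked numerically. This sketch uses `statistics.NormalDist().cdf` in place of Table A14; the only difference between Q1 and Q2 is the denominator used to standardize.

```python
import math
from statistics import NormalDist

MU = 17.2                   # mean finch weight (g)
SIGMA2 = 36.0               # variance (g^2)
SIGMA = math.sqrt(SIGMA2)   # standard deviation, 6 g
Y = 22.0                    # weight of interest (g)
n = 20                      # sample size in Q2

# Q1: a single individual, standardized with sigma
z1 = (Y - MU) / SIGMA
p1 = 1 - NormalDist().cdf(z1)   # area to the right of z1

# Q2: the mean of n individuals, standardized with the standard error sigma / sqrt(n)
se = SIGMA / math.sqrt(n)
z2 = (Y - MU) / se
p2 = 1 - NormalDist().cdf(z2)

print(f"Q1: Z = {z1:.2f}, P = {p1:.4f}")   # Q1: Z = 0.80, P = 0.2119
print(f"Q2: Z = {z2:.2f}, P = {p2:.4f}")   # Q2: Z = 3.58, P = 0.0002
```

In Q2 the denominator is the standard error σ/√n ≈ 1.34 g, not the variance of the mean σ²/n = 1.8 g²; dividing by 1.8 would understate Z.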

One final word about the wide applicability of the normal distribution:

Use of the normal distribution table (page 612, Appendix A4)

For any value of Z, the table reports the area under the curve to the right of Z.

This area to the right of Z is the theoretical probability of randomly picking an individual from N(0,1) whose value is greater than Z.
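Printed tables can be approximated in code. In this sketch `area_right_of` is a hypothetical helper that returns P(Z > z) = 1 − Φ(z) from `statistics.NormalDist`, the same right-tail area the table reports; values may differ from the printed table in the last digit because of rounding.

```python
from statistics import NormalDist

def area_right_of(z):
    """Area under N(0, 1) to the right of z, i.e. P(Z > z)."""
    return 1 - NormalDist().cdf(z)

# A small slice of a right-tail table
for z in (0.0, 0.5, 0.8, 1.0, 1.96, 2.0, 3.0):
    print(f"{z:4.2f}  {area_right_of(z):.4f}")
```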

Normal probability plot (Q-Q plot) ST&D p. 566

14 malt extract values: 77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2, 75.4, 76.0, 76.0, 73.9, 77.4, 76.6, 77.3 (ST&D p. 30, Lab 1); n = 14.

Divide the ~N curve into 14 intervals of equal area.

Normal line: y = a + bx, with slope b = s = 1.227 and intercept a = $\bar{Y}$ = 75.943.

Graphic tool for assessing normality
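A sketch of the normal scores behind this Q-Q plot. It assumes the i-th ordered malt value is paired with the N(0, 1) quantile at cumulative area (i − 0.5)/n, the midpoint of the i-th of 14 equal-area intervals; SAS and other software may use slightly different plotting positions. Points lying close to the normal line y = a + bx (a = Ȳ, b = s) suggest approximate normality.

```python
import statistics
from statistics import NormalDist

malt = [77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2, 75.4,
        76.0, 76.0, 73.9, 77.4, 76.6, 77.3]
n = len(malt)
ordered = sorted(malt)

# Assumed plotting position: split N(0, 1) into n slices of equal area 1/n and
# take the quantile at the middle of slice i, i.e. at cumulative area (i - 0.5) / n
std_normal = NormalDist()
normal_scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Normal line y = a + b*x with intercept a = sample mean and slope b = sample SD
a = statistics.mean(malt)
b = statistics.stdev(malt)

print(" score  observed  normal line")
for x, y in zip(normal_scores, ordered):
    print(f"{x:6.3f}  {y:8.1f}  {a + b * x:11.2f}")
```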


Grading

• Homework 25 %

• In-class quiz 5 % (Jan. 29, 9:00 a.m.)

• First exam 35 % (Feb. 12 and due Feb. 17, 9:00 a.m.)

• Second exam 35 % (March 12 and due March 17, 5:00 p.m.)


[Figure: Box Plots, annotated with the median, mean, interquartile (IQ) range, 1.5 IQ range, and outliers marked 0 or *]

[Figure: normal distribution of a quantitative trait, frequency of observation plotted against observed value, peaking at µ]

[Figure: location and scale transformation (when µ ≠ 0 and/or σ ≠ 1), Z = (Y − µ)/σ. Subtracting µ shifts N(1,1) to N(0,1); dividing by σ rescales N(0,2) to N(0,1).]

[Figure: areas under the normal curve: 68.27% within µ ± σ, 95.45% within µ ± 2σ, 99.73% within µ ± 3σ]

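[Figure: area under the finch-weight curve (µ = 17.2) to the right of Y = 22.0, corresponding to the area under N(0,1) to the right of Z = 0.8]

Question: What is this area, P(Y ≥ 22)?

Answer: P(Y ≥ 22) = P(Z ≥ 0.8) = 0.2119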

The central limit theorem states that, as sample size increases, the distribution of sample means drawn from a population of any distribution (with finite variance) will approach a normal distribution with mean µ and variance σ²/n.
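The central limit theorem can be seen in a small simulation. This sketch assumes a hypothetical, strongly skewed exponential population with mean 1 and variance 1, and draws 4000 samples at each of n = 1, 5, and 30. The variance of the sample means should track σ²/n, and the share of means within one SE of µ should move toward the normal value of about 0.68 as n grows.

```python
import random
import statistics

random.seed(4)

# Hypothetical, strongly right-skewed population: exponential with mean 1 and variance 1
MU, SIGMA2 = 1.0, 1.0
R = 4000  # number of samples drawn for each sample size

results = {}
for n in (1, 5, 30):
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(R)]
    se = (SIGMA2 / n) ** 0.5
    # Share of sample means within one SE of mu (about 0.6827 if the means are normal)
    within = sum(abs(m - MU) <= se for m in means) / R
    results[n] = (statistics.fmean(means), statistics.variance(means), within)
    print(f"n = {n:2d}: mean = {results[n][0]:.3f}, variance = {results[n][1]:.4f} "
          f"(sigma^2/n = {SIGMA2 / n:.4f}), within 1 SE = {within:.3f}")
```

With n = 1 the share is about 0.86 (exactly 1 − e⁻² ≈ 0.865 for this population), far from the normal 0.6827; by n = 30 it is close to 0.68.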

$P(0.42 \le Z \le 1.61) = P(Z \ge 0.42) - P(Z \ge 1.61) = 0.3372 - 0.0537 = 0.2835$

From Table:

$P(Z \ge 1.17) = 0.121$ (the probability listed inside the table)

If asked:

$P(Z \le 1.17) = 1 - P(Z \ge 1.17) = 0.879$

$P(|Z| \ge 1.05) = 2 \times P(Z \ge 1.05) = 2 \times 0.1469 = 0.2938$

$P(-1.61 \le Z \le 0.42) = P(Z \ge -1.61) - P(Z \ge 0.42) = [1 - P(Z \ge 1.61)] - P(Z \ge 0.42) = [1 - 0.0537] - 0.3372 = 0.9463 - 0.3372 = 0.6091$
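These worked examples can be double-checked in code. The sketch uses a hypothetical helper `right(z)` = P(Z ≥ z), built on `statistics.NormalDist`, and applies the same complement and symmetry rules; results agree with the four-decimal table up to rounding in the last digit (for example 0.2937 here versus 0.2938 from 2 × 0.1469).

```python
from statistics import NormalDist

def right(z):
    """P(Z >= z) for standard normal Z: the area the table reports."""
    return 1 - NormalDist().cdf(z)

# P(0.42 <= Z <= 1.61) = P(Z >= 0.42) - P(Z >= 1.61)
p_a = right(0.42) - right(1.61)
# P(Z <= 1.17) = 1 - P(Z >= 1.17)
p_b = 1 - right(1.17)
# P(|Z| >= 1.05) = 2 * P(Z >= 1.05), by symmetry
p_c = 2 * right(1.05)
# P(-1.61 <= Z <= 0.42) = [1 - P(Z >= 1.61)] - P(Z >= 0.42), by symmetry
p_d = (1 - right(1.61)) - right(0.42)

print(f"{p_a:.4f} {p_b:.4f} {p_c:.4f} {p_d:.4f}")
```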


Shapiro-Wilk test for ~N

Correlation coefficient between the data and the normal scores.

W = 1: perfect ~N

W = 0.8: ~N?

SAS: PROC UNIVARIATE NORMAL;

The output reports W and its probability (Pr).
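A sketch of the idea described above: the correlation between the ordered data and their normal scores, here for the malt extract values. It assumes normal scores at cumulative areas (i − 0.5)/n and a hand-written `pearson_r` helper (a hypothetical name). This plain correlation is not the W statistic SAS prints, which uses the Shapiro-Wilk coefficients; its square is closely related to the Shapiro-Francia variant of the test.

```python
import statistics
from statistics import NormalDist

malt = [77.7, 76.0, 76.9, 74.6, 74.7, 76.5, 74.2, 75.4,
        76.0, 76.0, 73.9, 77.4, 76.6, 77.3]
n = len(malt)
ordered = sorted(malt)

# Normal scores at the assumed plotting positions (i - 0.5) / n
scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

r = pearson_r(ordered, scores)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")  # values near 1 are consistent with ~N
```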
