Ch2.1 Numerical Summary Measures of Center for Data

---------------------------------------------------------------------------------------------------------

Topics:

• Measures of Center: mean, median

(Note: Here we cover the summary measures for DATA only. We will cover the measures for DISTRIBUTIONS in the 4th and 5th weeks, after we introduce the concept of a probability distribution.)

---------------------------------------------------------------------------------------------------------

I: Measures of Center for Data:

(1) Mean

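• Mean $\bar{x}$ of n observations $x_1, x_2, \ldots, x_n$ is (here the subscript does not contain information on the magnitude of a data point)

$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$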

Ex. Sue wanted to study the systolic blood pressure (BP), x, of the NCSU freshmen; 7 freshmen were randomly selected and their BP values are

121, 110, 114, 100, 103, 130, 130 (Note: $x_1 = 121$, $x_2 = 110$, ..., $x_7 = 130$)

The sample mean of BP is

$\bar{x} = \frac{121 + 110 + 114 + 100 + 103 + 130 + 130}{7} = \frac{808}{7} \approx 115.4$

We report the mean with one more digit than the original data.

Note: The R function to calculate the mean is mean()

> bp <- c(121, 110, 114, 100, 103, 130, 130)

> mean(bp)
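As a cross-check, the definition of the mean (sum of the observations divided by n) can be applied step by step in R. This is only a sketch; the variable names n, total and xbar are chosen here for illustration.

```r
# Systolic blood pressure values from the example above
bp <- c(121, 110, 114, 100, 103, 130, 130)

n <- length(bp)      # number of observations: 7
total <- sum(bp)     # sum of the observations: 808
xbar <- total / n    # mean computed directly from the definition

xbar                 # agrees with mean(bp)
round(xbar, 1)       # 115.4, one more digit than the original data
```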

(2) Median

• Median is the middle value of the data: the same number of data points lie above it and below it.

• To get the median $\tilde{x}$ of the n observations in the sample:

1) Sort the data, from the smallest to the largest

2) If n is odd, then $\tilde{x}$ = the middle value, i.e.,

$\tilde{x}$ = the $\frac{n+1}{2}$th data point

If n is even, then $\tilde{x}$ = the average of the middle two values, i.e.,

$\tilde{x}$ = $\frac{1}{2}$ [ the $\frac{n}{2}$th data point + the $\left(\frac{n}{2}+1\right)$th data point ]

Ex. In the BP example, there are 7 observations: 121, 110, 114, 100, 103, 130, 130. The sample median is:

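The sorted data is: 100 103 110 114 121 130 130. Since n = 7 is odd, the median is the 4th data point, so $\tilde{x} = 114$.

Ex. Suppose the BP example has one more data point, 105. Then the sample median becomes:

The sorted data becomes: 100 103 105 110 114 121 130 130. Since n = 8 is even, the median is the average of the 4th and 5th data points, so $\tilde{x} = \frac{1}{2}(110 + 114) = 112$.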

Note: The R function to calculate the median is median()

> median(bp)
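The sort-and-pick procedure for the median can also be written out by hand. The function sample_median below is a hypothetical helper introduced only for this sketch (it is not part of base R); in practice median() does the same job.

```r
# Hypothetical helper: median via the sort-and-pick procedure
sample_median <- function(x) {
  s <- sort(x)                       # step 1: sort from smallest to largest
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # n odd: the middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2    # n even: average of the middle two values
  }
}

bp <- c(121, 110, 114, 100, 103, 130, 130)
sample_median(bp)             # 114
sample_median(c(bp, 105))     # 112
```

Both results should agree with median(bp) and median(c(bp, 105)).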

Comment (1) : Mean vs. Median

1. Mean is sensitive to outliers (extreme values), while median is less affected by outliers.

Ex. Data set 1: {1, 2, 3}. Data set 2: {1, 2, 99}.

The mean of data set 1: 2

The median of data set 1: 2

The mean of data set 2: 34

The median of data set 2: 2
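The two data sets can be checked quickly in R; the names set1 and set2 are used here only for illustration.

```r
set1 <- c(1, 2, 3)
set2 <- c(1, 2, 99)    # same as set1, except the largest value is an outlier

mean(set1)      # 2
median(set1)    # 2
mean(set2)      # 34, pulled up by the outlier
median(set2)    # 2, unchanged
```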

2. Mean is the balance point of the data.

A balance point is the point such that

sum of the distances of the points above the mean = sum of the distances of the points below the mean

Ex. A sample consists of 5 data points 1, 2, 3, 10, 14.

The mean $\bar{x}$ = 6

|Data points above $\bar{x}$ |Data points below $\bar{x}$ |

|10, 14 |1, 2, 3 |

|Total distance to $\bar{x}$ = 4 + 8 = 12 |Total distance to $\bar{x}$ = 5 + 4 + 3 = 12 |
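The balance-point property in this example can be verified numerically. A minimal sketch, with the variable names above and below chosen only for illustration:

```r
x <- c(1, 2, 3, 10, 14)
xbar <- mean(x)                       # 6

above <- sum(x[x > xbar] - xbar)      # distances of the points above the mean: 4 + 8 = 12
below <- sum(xbar - x[x < xbar])      # distances of the points below the mean: 5 + 4 + 3 = 12

above == below                        # TRUE
sum(x - xbar)                         # 0: the deviations from the mean cancel out
```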

Median is the midpoint of the distribution.

That is, half of the data points lie above the median and half lie below it.

3. The relationship between mean and median depends on the shape of the distribution

a. For a symmetric distribution, mean $\approx$ median

[Figure: symmetric distribution]

b. For a positively-skewed distribution, mean > median

[Figure: positively-skewed distribution]

c. For a negatively-skewed distribution, mean < median

[Figure: negatively-skewed distribution]

→ In other words, from the relationship between the mean and the median, we can guess the shape of the distribution.
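The three cases can be illustrated with small made-up data sets. The values below are hypothetical and chosen only so that the arithmetic is easy to follow; they do not come from the course data.

```r
# Hypothetical data sets, for illustration only
symmetric <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 2, 2, 3, 3, 4, 20)        # long tail toward large values
left_skewed <- c(1, 17, 18, 18, 19, 19, 20)    # long tail toward small values

mean(symmetric);    median(symmetric)      # 3 and 3:   mean = median
mean(right_skewed); median(right_skewed)   # 5 and 3:   mean > median
mean(left_skewed);  median(left_skewed)    # 16 and 18: mean < median
```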

Comment (2) : Change of Unit

Mean and median have the same unit as the data, so their values change when the measuring unit changes.

Original data: $x_1, x_2, \ldots, x_n$, transformation: y = ax + b.

New data: $y_1, y_2, \ldots, y_n$, where $y_i = a x_i + b$.

When the unit of measure changes from $x$ to $y$, then

The new mean $\bar{y} = a\bar{x} + b$

The new median $\tilde{y} = a\tilde{x} + b$

Ex. The temperatures in Raleigh in the next 6 days are predicted to be 43, 39, 33, 39, 45 and 48 in Fahrenheit. What are the mean and median of these temperatures? What are the mean ($\bar{y}$) and median ($\tilde{y}$) if we switch to Centigrade? Note that $y = \frac{5}{9}(x - 32)$, i.e., $a = 5/9$ and $b = -160/9$.

Mean = 247/6 = 41.2 (F)

Median = (39 + 43)/2 = 41 (F)

New mean in C = (41.17 - 32)*5/9 = 5.1 (C)

New median in C = (41 - 32)*5/9 = 5.0 (C)
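The change-of-unit rule can be checked in R by converting each temperature first and then summarizing, versus summarizing first and then converting. This sketch assumes the usual conversion C = (F - 32)*5/9; the helper to_c is hypothetical and defined only for this illustration.

```r
temp_f <- c(43, 39, 33, 39, 45, 48)       # predicted temperatures in Fahrenheit

# Hypothetical helper: Fahrenheit to Centigrade, y = (5/9) * (x - 32)
to_c <- function(f) (f - 32) * 5 / 9

temp_c <- to_c(temp_f)                    # convert every data point

mean(temp_c)                # mean of the converted data, about 5.09
to_c(mean(temp_f))          # converting the Fahrenheit mean gives the same value

median(temp_c)              # median of the converted data, 5
to_c(median(temp_f))        # converting the Fahrenheit median gives the same value
```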
