Measures of Location




Averages

Averages can be tricky.

Consider:

| |Year 1 |Year 2 |Year 3 |Year 4 |Year 5 |
|Rate of Return |0.07 |0.10 |0.12 |0.30 |0.15 |

What is the average rate of return over the five year period?

Arithmetic average = .148

Correct average = .145321

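(The correct average is the value which, when compounded for 5 years, gives the same result as the observed rates. In other words, it is the solution $\bar{r}$ of the equation

$$(1+\bar{r})^5 = (1+r_1)(1+r_2)(1+r_3)(1+r_4)(1+r_5)$$

that is, the geometric mean of the growth factors $1+r_i$, minus 1.)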
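The two averages for the rate-of-return table can be checked with a short sketch. Python is used here only for illustration (the notes themselves work in EXCEL), and the helper names `arithmetic_average` and `compound_average_rate` are hypothetical. The second helper solves the compounding equation directly by taking the fifth root of the product of the growth factors.

```python
import math

# Annual rates of return from the table above (Years 1-5).
rates = [0.07, 0.10, 0.12, 0.30, 0.15]

def arithmetic_average(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

def compound_average_rate(rates):
    """Constant rate r_bar with (1 + r_bar)**n equal to the product of (1 + r_i).

    This is the geometric mean of the growth factors (1 + r_i), minus 1.
    """
    growth = math.prod(1 + r for r in rates)  # total growth over all years
    return growth ** (1 / len(rates)) - 1

print(f"Arithmetic average = {arithmetic_average(rates):.6f}")
print(f"Correct average    = {compound_average_rate(rates):.6f}")
```

The arithmetic average overstates the rate because compounding at 0.148 for five years would produce more growth than the observed sequence of returns actually did.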

Consider:

Dallas and Fort Worth are approximately 30 miles apart. On a round trip from Dallas to Fort Worth and back, you average 30 mph on the first leg, from Dallas to Fort Worth. How fast do you have to travel on the return leg, from Fort Worth to Dallas, to average 60 mph for the round trip?

Usual answer: 90 mph

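Correct answer: it is impossible. Averaging 60 mph over the 60-mile round trip means finishing it in 1 hour, but the first 30-mile leg at 30 mph already uses the full hour.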

The usual answers to both problems above are common errors.
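The speed problem can be made concrete with a small sketch (function names are hypothetical). Average speed is total distance divided by total time, so for two legs of equal length it is the harmonic mean of the leg speeds, not their arithmetic mean.

```python
import math

def round_trip_average_speed(distance, speed_out, speed_back):
    """Average speed for two legs of equal length: total distance / total time.

    For equal distances this is the harmonic mean of the two leg speeds,
    not their arithmetic mean.
    """
    total_time = distance / speed_out + distance / speed_back
    return 2 * distance / total_time

def required_return_speed(distance, speed_out, target_average):
    """Return-leg speed needed to hit target_average, or math.inf if no finite speed works."""
    time_budget = 2 * distance / target_average   # hours allowed for the whole trip
    time_left = time_budget - distance / speed_out
    if time_left <= 0:
        return math.inf
    return distance / time_left

print(round(round_trip_average_speed(30, 30, 90), 6))  # 45.0, not 60
print(required_return_speed(30, 30, 60))               # inf
```

Driving the return leg at the "usual answer" of 90 mph gives a round-trip average of only 45 mph.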


The Arithmetic Average

The arithmetic average of a set of values is the sum of the values divided by the number of values.

If $x_1, x_2, \ldots, x_n$ represent the n numerical values from a random sample, then the formula for the sample mean is:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

To find the average (when I use this term subsequently, I mean the arithmetic average) in EXCEL, one uses the function “average”. It is used just like the “median” function.

Specifically, one types “=average(range of data)”. For the data on steel thickness, you would have something that looks like the following:

Closing the parentheses gives the average of the data, 354.55.

Computation of the Arithmetic Mean from Grouped Data

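If we do not have the raw data but only the frequency distribution of the data, the sample mean is approximated by:

$$\bar{x} \approx \frac{1}{n}\sum_{i=1}^{k} f_i m_i, \qquad n = \sum_{i=1}^{k} f_i$$

where $m_i$ is the midpoint and $f_i$ the frequency of bin i, for k bins.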

EXCEL does not compute this formula directly. To compute this in EXCEL for the steel thickness data, one can use the following procedure:

|Interval Lower |Interval Upper |Midpoint m(i) |Freq f(i) |f(i)*m(i) |
|341.5 |344.5 |343 |1 |343 |
|344.5 |347.5 |346 |3 |1038 |
|347.5 |350.5 |349 |8 |2792 |
|350.5 |353.5 |352 |8 |2816 |
|353.5 |356.5 |355 |20 |7100 |
|356.5 |359.5 |358 |13 |4654 |
|359.5 |362.5 |361 |5 |1805 |
|362.5 |365.5 |364 |2 |728 |
|Sum | | |60 |21276 |
|Average | | | |354.6 |
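The spreadsheet procedure in the table can be sketched in Python as follows. The list `bins` re-enters the intervals and frequencies from the table, and `grouped_mean` is a hypothetical helper name; it multiplies each bin midpoint by its frequency, sums, and divides by the total count.

```python
# Steel thickness frequency table from above: (lower, upper, frequency) for each bin.
bins = [
    (341.5, 344.5, 1),
    (344.5, 347.5, 3),
    (347.5, 350.5, 8),
    (350.5, 353.5, 8),
    (353.5, 356.5, 20),
    (356.5, 359.5, 13),
    (359.5, 362.5, 5),
    (362.5, 365.5, 2),
]

def grouped_mean(bins):
    """Approximate the sample mean from a frequency table using bin midpoints m(i)."""
    n = sum(f for _, _, f in bins)                           # total frequency
    total = sum(f * (lo + hi) / 2 for lo, hi, f in bins)     # sum of f(i)*m(i)
    return total / n

print(round(grouped_mean(bins), 1))  # 354.6
```

The grouped result (354.6) differs slightly from the raw-data average (354.55) because every observation is replaced by its bin midpoint.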

If one defines the proportion of observations in a bin as

$$p_i = \frac{f_i}{n}$$

then the formula for the mean from grouped data (and also the formula for a discrete probability distribution) is:

$$\bar{x} \approx \sum_{i=1}^{k} p_i m_i$$

Using the above, it is then possible to generalize the definition of the mean for data from a continuous distribution with probability density function f(x) as:

$$\mu = \int_{-\infty}^{\infty} x\, f(x)\, dx$$

Computation with the Average

Consider two groups of people: 50 people in Group 1 with an average hourly wage of $15.00, and 100 people in Group 2 with an average hourly wage of $17.00. Can I find the mean of the pooled group of 150 people?

The average of the pooled group is just the total hourly wages of all 150 people divided by 150. Using the formula for the arithmetic average, one can show that:

$$\sum_{i=1}^{n} x_i = n\bar{x}$$

Therefore the sum of the hourly wages in the first group is 50 x 15 = $750.

The sum of the hourly wages in the second group is 100 x 17 = $1,700. Finally, the mean of the pooled group is:

pooled average = (750 + 1700)/(50 + 100) = $16.33

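This can be written in formula terms as:

$$\bar{x}_{\text{pooled}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$

This is a special case of the formula for multiple groups:

$$\bar{x}_{\text{pooled}} = \frac{\sum_{j=1}^{g} n_j\bar{x}_j}{\sum_{j=1}^{g} n_j}$$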
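The multiple-group formula translates directly into code. This is a minimal sketch with a hypothetical helper name `pooled_mean`; it takes the group sizes $n_j$ and group means $\bar{x}_j$ and rebuilds the grand total before dividing.

```python
def pooled_mean(sizes, means):
    """Mean of the combined groups: sum of n_j * mean_j divided by sum of n_j."""
    total = sum(n * m for n, m in zip(sizes, means))  # total of all values, n_j * mean_j summed
    return total / sum(sizes)

print(round(pooled_mean([50, 100], [15.00, 17.00]), 2))  # 16.33
```

Note that the pooled mean is not the simple average of the group means, (15 + 17)/2 = 16, unless the groups have equal sizes.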

Consider the following example which we discussed previously in connection with the median:

| |Group 1 |Group 2 |Change |
| |5 |4 |-1 |
| |10 |12 |2 |
| |15 |18 |3 |
| |20 |19 |-1 |
| |25 |23 |-2 |
|Average |15 |15.2 |0.2 |

Notice that the change in the means is the same as the mean of the changes.
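A quick sketch using Python's standard `statistics` module verifies this for the table above and contrasts it with the median, for which the two quantities differ on the same data.

```python
from statistics import mean, median

group1 = [5, 10, 15, 20, 25]
group2 = [4, 12, 18, 19, 23]
changes = [b - a for a, b in zip(group1, group2)]  # [-1, 2, 3, -1, -2]

diff_of_means = mean(group2) - mean(group1)
mean_of_changes = mean(changes)
print(round(diff_of_means, 6), round(mean_of_changes, 6))  # 0.2 0.2

# For medians the two quantities need not agree.
print(median(group2) - median(group1), median(changes))    # 3 -1
```

The mean has this property because it is linear: the sum of the differences equals the difference of the sums. The median is not linear, so no such identity holds.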

Summary

|Criterion |Median |Mean |
|Ease of understanding |High |Reasonable |
|Computation |Moderate |Easy |
|Effect of outliers |None |High |
|Use in further computation |None |Easy |
|Accuracy for inference to the population for a fixed sample of size n |About 25% worse than the mean (standard error about 25% larger for normally distributed data) |Baseline |
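The accuracy row can be illustrated with a Monte Carlo sketch. It assumes a normal population (the roughly 25% figure is specific to normal data; the large-sample ratio of standard errors is $\sqrt{\pi/2} \approx 1.253$), and the sample size, repetition count, seed, and helper name `standard_errors` are arbitrary choices for illustration.

```python
import random
import statistics

def standard_errors(n=25, reps=10_000, seed=1):
    """Monte Carlo standard errors of the sample mean and sample median.

    Draws `reps` samples of size n from a standard normal population and
    returns the standard deviation of the resulting means and medians.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.stdev(means), statistics.stdev(medians)

se_mean, se_median = standard_errors()
print(f"SE(mean)   = {se_mean:.3f}")
print(f"SE(median) = {se_median:.3f}")
print(f"ratio      = {se_median / se_mean:.2f}")
```

For n = 25 the ratio should come out a little below the large-sample value of about 1.25.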

Simpson’s Paradox

Consider the following data found in the file “meandemo.xls”:

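|Rank |Males |Male Average |Females |Female Average |
|Prof |35 |60,000 |5 |65,000 |
|Assoc Prof |25 |50,000 |20 |55,000 |
|Asst Prof |15 |40,000 |15 |45,000 |
|Average | |52,667 | |52,500 |

At every rank the female average exceeds the male average, yet the overall male average is higher, because the men are concentrated in the highest-paid rank (35 of 75 are full professors) while the women are concentrated in the lower ranks (35 of 40 are associate or assistant professors).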
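The overall averages in the salary table are pooled (weighted) means, which is where the reversal comes from. A minimal sketch, re-entering the table values; `weighted_average` is a hypothetical helper name.

```python
# (rank, number of males, male average, number of females, female average)
salaries = [
    ("Prof",       35, 60_000,  5, 65_000),
    ("Assoc Prof", 25, 50_000, 20, 55_000),
    ("Asst Prof",  15, 40_000, 15, 45_000),
]

def weighted_average(counts, averages):
    """Pooled mean: each group average weighted by its count."""
    return sum(c * a for c, a in zip(counts, averages)) / sum(counts)

male = weighted_average([r[1] for r in salaries], [r[2] for r in salaries])
female = weighted_average([r[3] for r in salaries], [r[4] for r in salaries])

for rank, _, m_avg, _, f_avg in salaries:
    print(f"{rank:<11} female - male = {f_avg - m_avg:+,}")
print(f"Overall     male = {male:,.0f}, female = {female:,.0f}")
```

Within each rank the difference is +5,000 in favor of women, but the weights (the head counts) differ so much between the sexes that the pooled comparison reverses.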

Or the following data also found in the file “meandemo.xls”:

|Group |Time 1 Values |Time 1 Median |Time 2 Values |Time 2 Median |Median Change |
|Group 1 |30, 35, 48 |35 |31, 32, 75 |32 |-3 |
|Group 2 |14, 85, 98 |85 |60, 83, 85 |83 |-2 |
|Group 3 |60, 63, 65 |63 |61, 62, 98 |62 |-1 |
|All Groups | |60 | |62 |2 |

Within every group the median falls from Time 1 to Time 2, yet the median of all nine values rises from 60 to 62.

Measures of Scale

The simplest way to measure scale is to find the average distance of each data point from the measure of location (in our case the arithmetic mean). Symbolically this can be written:

$$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})$$

Because some deviations are positive and some are negative, they cancel, and this average is always exactly zero. This can be corrected in one of two ways:

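1) Use the absolute value to compute the mean absolute deviation (MAD), which in formula terms is:

$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$$

or 2) Use the square of the deviations, which in formula terms gives the sample variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

and the sample standard deviation,

$$s = \sqrt{s^2}$$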
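The three scale computations can be sketched side by side. Because the raw steel thickness data are not listed in these notes, the sketch uses the Group 1 values (5, 10, 15, 20, 25) from the earlier table; `mean_deviation_summaries` is a hypothetical helper name. Python's `statistics.stdev` also uses the n - 1 divisor, like EXCEL's “stdev”.

```python
import math
import statistics

def mean_deviation_summaries(x):
    """Return (average deviation, MAD, sample variance, sample standard deviation)."""
    n = len(x)
    xbar = sum(x) / n
    deviations = [xi - xbar for xi in x]
    avg_deviation = sum(deviations) / n            # always 0, apart from rounding
    mad = sum(abs(d) for d in deviations) / n      # mean absolute deviation
    s2 = sum(d * d for d in deviations) / (n - 1)  # sample variance, n - 1 divisor
    return avg_deviation, mad, s2, math.sqrt(s2)

data = [5, 10, 15, 20, 25]  # Group 1 values from the earlier table
print(mean_deviation_summaries(data))
print(statistics.stdev(data))  # matches the last entry above
```

Here the deviations are -10, -5, 0, 5, 10, so the plain average deviation is 0, the MAD is 30/5 = 6, and the sample variance is 250/4 = 62.5.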

In EXCEL, the function “stdev” uses the above formula (with the n - 1 divisor) to compute the sample standard deviation. For the steel thickness data, you would type “=stdev(range)” as shown below:

This yields the value s = 4.492549.

EXCEL does not automatically compute the standard deviation if the data is grouped. The computing formula to use in this case is given by:

$$s^2 = \frac{\sum_{i=1}^{k} f_i m_i^2 - \left(\sum_{i=1}^{k} f_i m_i\right)^2 / n}{n-1}$$

and then taking the square root.

The necessary terms can be computed in EXCEL as shown in the following table for the steel data:

|Interval Lower |Interval Upper |Midpoint m(i) |Freq f(i) |f(i)*m(i) |f(i)*m(i)*m(i) |
|341.5 |344.5 |343 |1 |343 |117,649 |
|344.5 |347.5 |346 |3 |1,038 |359,148 |
|347.5 |350.5 |349 |8 |2,792 |974,408 |
|350.5 |353.5 |352 |8 |2,816 |991,232 |
|353.5 |356.5 |355 |20 |7,100 |2,520,500 |
|356.5 |359.5 |358 |13 |4,654 |1,666,132 |
|359.5 |362.5 |361 |5 |1,805 |651,605 |
|362.5 |365.5 |364 |2 |728 |264,992 |
|Sum | | |60 |21,276 |7,545,666 |

which yields an estimate of s = 4.5031.

If only the proportion of observations in each bin is available, then the following approximate formula may be used:

$$s \approx \sqrt{\sum_{i=1}^{k} p_i m_i^2 - \left(\sum_{i=1}^{k} p_i m_i\right)^2}$$

which in this case yields the value of s = 4.465423.
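Both grouped-data formulas can be sketched using the midpoints and frequencies from the table; `grouped_sd` and `proportion_sd` are hypothetical helper names. The only difference is the divisor: the frequency version divides by n - 1, while the proportion version effectively divides by n, which is why it comes out slightly smaller.

```python
import math

# Midpoints m(i) and frequencies f(i) from the steel thickness table above.
midpoints = [343, 346, 349, 352, 355, 358, 361, 364]
freqs = [1, 3, 8, 8, 20, 13, 5, 2]

def grouped_sd(midpoints, freqs):
    """Sample standard deviation from a frequency table (n - 1 divisor)."""
    n = sum(freqs)
    s1 = sum(f * m for m, f in zip(midpoints, freqs))      # sum of f(i)*m(i)
    s2 = sum(f * m * m for m, f in zip(midpoints, freqs))  # sum of f(i)*m(i)^2
    return math.sqrt((s2 - s1 * s1 / n) / (n - 1))

def proportion_sd(midpoints, props):
    """Approximate standard deviation when only bin proportions p(i) are known."""
    mean = sum(p * m for m, p in zip(midpoints, props))
    second_moment = sum(p * m * m for m, p in zip(midpoints, props))
    return math.sqrt(second_moment - mean * mean)

n = sum(freqs)
props = [f / n for f in freqs]
print(f"{grouped_sd(midpoints, freqs):.4f}")     # 4.5031
print(f"{proportion_sd(midpoints, props):.6f}")  # 4.465423
```

The "sum of squares minus square of sum" form is convenient by hand or in a spreadsheet, but it subtracts two large, nearly equal numbers; for very large values, computing the squared deviations from the mean directly is numerically safer.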

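The standard deviation for data following a theoretical distribution with probability density function f(x) can also be defined as:

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\, dx$$

and,

$$\sigma = \sqrt{\sigma^2}$$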
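The integral definitions of the mean and standard deviation for a density f(x) can be approximated numerically. This sketch uses the midpoint rule and a hypothetical example density, f(x) = 2x on [0, 1], chosen because its moments are easy to check by hand (μ = 2/3 and σ² = 1/2 - 4/9 = 1/18). The helper `density_moments` and its `steps` parameter are illustrative assumptions.

```python
import math

def density_moments(f, a, b, steps=100_000):
    """Approximate mu = integral of x f(x) dx and sigma = sqrt(integral of (x - mu)^2 f(x) dx)
    over [a, b] with the midpoint rule (f is assumed to be zero outside [a, b])."""
    h = (b - a) / steps
    xs = [a + (i + 0.5) * h for i in range(steps)]  # midpoints of the sub-intervals
    mu = sum(x * f(x) for x in xs) * h
    var = sum((x - mu) ** 2 * f(x) for x in xs) * h
    return mu, math.sqrt(var)

# Hypothetical example density: f(x) = 2x on [0, 1].
mu, sigma = density_moments(lambda x: 2 * x, 0.0, 1.0)
print(f"{mu:.6f} {sigma:.6f}")  # 0.666667 0.235702
```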

Further Uses of the Mean and Standard Deviation

The Mound Rule:

For data that are “mound” shaped (roughly normal), approximately the following percentages of the data lie in each region:

|Percent of Data |Region |
|68% |mean +/- one standard deviation |
|95% |mean +/- two standard deviations |
|99.7% |mean +/- three standard deviations |

For the steel thickness data (which is mound shaped) the exact results are:

|Region |Lower |Upper |% of Data |
|mean +/- 1 sd |350.1 |359.0 |73.0% |
|mean +/- 2 sd |345.6 |363.5 |96.7% |
|mean +/- 3 sd |341.1 |368.0 |100.0% |
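A table like the one above can be produced for any data set by counting the observations inside mean +/- k standard deviations. The raw steel data are not listed here, so this sketch applies a hypothetical helper `fraction_within` to simulated mound-shaped data (normal, with mean and standard deviation chosen arbitrarily to resemble the steel data); the printed percentages should land near 68%, 95%, and 99.7%.

```python
import random
import statistics

def fraction_within(data, k):
    """Fraction of observations in mean +/- k*s (s uses the n - 1 divisor, like EXCEL's stdev)."""
    m = statistics.fmean(data)
    s = statistics.stdev(data)
    return sum(m - k * s <= x <= m + k * s for x in data) / len(data)

# Simulated mound-shaped data (hypothetical; NOT the steel thickness measurements).
rng = random.Random(42)
sample = [rng.gauss(355, 4.5) for _ in range(20_000)]
for k in (1, 2, 3):
    print(f"mean +/- {k} sd: {fraction_within(sample, k):.1%}")
```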

Chebyshev’s Inequality

For any distribution, at least $100(1 - 1/k^2)\%$ of the data must lie in the region mean +/- k standard deviations.

Specifically, for k=2, at least 75% of the data must lie in the range mean +/- 2 standard deviations.

For k=3, at least 88.9% of the data must lie in the range mean +/- 3 standard deviations.
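Chebyshev's guarantee is much weaker than the mound rule because it must hold for every distribution. This sketch compares the bound with the exact normal-distribution percentages, using Python's standard `statistics.NormalDist`; `chebyshev_lower_bound` is a hypothetical helper name.

```python
from statistics import NormalDist

def chebyshev_lower_bound(k):
    """Minimum fraction of ANY distribution within mean +/- k sd (informative only for k > 1)."""
    return 1 - 1 / k**2

normal = NormalDist()  # standard normal, the model behind the mound rule
for k in (2, 3):
    exact = normal.cdf(k) - normal.cdf(-k)
    print(f"k={k}: Chebyshev >= {chebyshev_lower_bound(k):.1%}, normal = {exact:.1%}")
```

For k = 2 the bound guarantees 75% while the normal distribution actually has about 95.4%; for k = 3 the figures are 88.9% and 99.7%.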

Measures of Relative Position

|Class |Mean |Standard Deviation |
|Monday |85 |6 |
|Wednesday |90 |8 |

A student from the Monday night class takes the Wednesday exam and scores 92. To what score in the Monday night class does this score correspond?

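Define:

$$t = \frac{x_{\text{Wednesday}} - \bar{x}_{\text{Wednesday}}}{s_{\text{Wednesday}}}$$

and

$$x_{\text{Monday}} = \bar{x}_{\text{Monday}} + t\, s_{\text{Monday}}$$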

For the example,

$t = (92 - 90)/8 = 0.25$

$x_{\text{Monday}} = 85 + 0.25 \times 6 = 86.5$
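The two-step conversion (standardize in one class, then unstandardize in the other) can be sketched as a single function; `equivalent_score` is a hypothetical name.

```python
def equivalent_score(x, from_mean, from_sd, to_mean, to_sd):
    """Map score x to the score with the same standardized value t in another class."""
    t = (x - from_mean) / from_sd  # how many standard deviations above the mean
    return to_mean + t * to_sd

print(equivalent_score(92, 90, 8, 85, 6))  # 86.5
```

A score at a class's mean always maps to the other class's mean, since t = 0 in that case.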
