Chapter 13




STATISTICS

13.1 Visual Displays of Data

13.2 Measures of Central Tendency

13.3 Measures of Dispersion

13.6 Regression and Correlation (if time permits)

13.1 Visual Displays of Data

DEFINITIONS:

The distribution of a variable provides the possible values that a variable can take on and how often these possible values occur. The distribution of a variable shows the pattern of variation of the variable.

Frequency Distributions

When a data set includes many repeated items, it can be organized into a frequency distribution, which lists the distinct data values (x) along with their frequencies ( f ). The frequency designates the number of times the corresponding item occurred in the data set.

Relative frequency of each distinct item is the fraction, or percentage, of the data set represented by the item. If n denotes the total number of items, and a given item, x, occurred f times, then the relative frequency of x is f/n.

Example 1

The 25 members of a psychology class were polled as to the number of siblings in their individual families. Construct a frequency distribution and a relative frequency distribution for their responses, which are shown here.

2, 3, 1, 3, 3, 5, 2, 3, 3, 1, 1, 4, 2, 4, 2, 5, 4, 3, 6, 5, 1, 6, 2, 2, 2

Solution
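
One way to tally the responses is sketched below in Python. The function name `frequency_table` is an illustrative choice, not part of the original notes; it lists each distinct value x with its frequency f and relative frequency f/n.

```python
from collections import Counter

# Sibling counts reported by the 25 students in Example 1.
responses = [2, 3, 1, 3, 3, 5, 2, 3, 3, 1, 1, 4, 2,
             4, 2, 5, 4, 3, 6, 5, 1, 6, 2, 2, 2]

def frequency_table(data):
    """Return rows (x, f, f/n) sorted by the distinct value x."""
    n = len(data)
    counts = Counter(data)  # maps each distinct value to its frequency
    return [(x, f, f / n) for x, f in sorted(counts.items())]

table = frequency_table(responses)
for x, f, rel in table:
    # Columns: value, frequency, relative frequency as a percentage.
    print(f"{x}  {f:2d}  {rel:.0%}")
```

The frequencies should add up to n = 25 and the relative frequencies to 1, which is a quick check on any hand tally.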

Let's Do It! Frequency Distributions

Quiz Scores in an Economics Class

The following data are quiz scores for the members of an economics class.

[pic]

Use the given data to do the following:

(a) Construct a frequency and relative frequency distribution table.

(b) Construct a histogram.

(c) Construct a frequency polygon.

Grouped Frequency Distributions

Data sets containing large numbers of items are often arranged into groups, or classes. All data items are assigned to their appropriate classes, and then a grouped frequency distribution can be set up and a graph displayed.

[pic]
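
The grouping step can be sketched in Python as follows. This is a minimal illustration that assumes whole-number data and classes of the form 10-19, 20-29, and so on; the function name `grouped_frequency` and the sample values are made up for the sketch and are not data from these notes.

```python
def grouped_frequency(data, first_lower, width, num_classes):
    """Count the items falling in each class [lower, lower + width - 1].

    Assumes whole-number data, so the class limits are integers.
    """
    n = len(data)
    rows = []
    for k in range(num_classes):
        lower = first_lower + k * width
        upper = lower + width - 1
        f = sum(1 for x in data if lower <= x <= upper)
        rows.append((lower, upper, f, f / n))  # limits, frequency, relative frequency
    return rows

# Made-up illustrative values, not the data from any example in these notes.
sample = [12, 18, 24, 25, 31, 36, 36, 47, 52, 58, 63, 71]
rows = grouped_frequency(sample, first_lower=10, width=10, num_classes=7)
for lower, upper, f, rel in rows:
    print(f"{lower}-{upper}  {f}  {rel:.1%}")
```

Every data item should land in exactly one class, so the class frequencies must sum to the number of items.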

Example 2

Forty students, selected randomly in the school cafeteria on a Monday morning, were asked to estimate the number of hours they had spent studying in the past week (including both in-class and out-of-class time). Their responses are recorded here.

[pic]

Tabulate a grouped frequency distribution and a grouped relative frequency distribution, and construct a histogram and a frequency polygon. Use 7 classes with a uniform width of 10 hours, and use a lower limit of 10 hours for the first class.

Solution

Let’s Do it! Heights of Baseball Players

The heights (in inches) of the 54 starting players in a baseball tournament were as follows.

[pic]

Use five classes with a uniform class width of 5 inches, and use a lower limit of 45 inches for the first class.

(a) Construct grouped frequency and relative frequency distributions in a table similar to the one in the example above. (In each case, follow the suggested guidelines for class limits and class width.)

(b) Construct a histogram.

(c) Construct a frequency polygon.

Stem-and-Leaf Displays

The stem-and-leaf display conveys at a glance the same pictorial impressions that a histogram would convey without the need for constructing the drawing. It also preserves the exact data values.

Example 3

Present the study times data of Example 2 in a stem-and-leaf display.

[pic]

Solution

The tens digits, to the left of the vertical line, are the “stems,” while the corresponding ones digits are the “leaves.” We have entered all items from the first row of the original data, from left to right, then the items from the second row, and so on through the fourth row.

[pic]

Notice that the stem-and-leaf display of Example 3 conveys at a glance the same pictorial impressions that a histogram would convey without the need for constructing the drawing. It also preserves the exact data values.
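
The construction described in the solution (tens digit as stem, ones digit as leaf, leaves entered in data order) can be sketched in Python. The function name `stem_and_leaf` and the sample values are illustrative assumptions; the study-time data itself is not reproduced here.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each tens-digit stem to its ones-digit leaves, in data order."""
    stems = defaultdict(list)
    for x in data:
        stem, leaf = divmod(x, 10)  # 36 -> stem 3, leaf 6
        stems[stem].append(leaf)
    return dict(sorted(stems.items()))

# Made-up two-digit values for illustration only.
sample = [36, 18, 24, 52, 31, 25, 47, 36, 12]
display = stem_and_leaf(sample)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

This sketch simply omits any stem that has no leaves; a hand-drawn display would usually still show the empty stem.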

Let’s Do It!

Construct the stem and leaf representation of the data.

[pic]

Pie Chart

A graphical alternative to the bar graph is the circle graph, or pie chart, which uses a circle to represent the total of all the categories and divides the circle into sectors, or wedges (like pieces of pie), whose sizes show the relative magnitudes of the categories.

The angle around the entire circle measures 360°. For example, a category representing 20% of the whole should correspond to a sector whose central angle is

20% of 360° = 0.20 × 360° = 72°

Example 4

Nola Akala found that, during her first semester of college, her expenses fell into categories as shown in the table. Present this information in a circle graph.

Solution

The central angle of the sector for Food is 0.30 × 360° = 108°, and for Rent it is 0.25 × 360° = 90°.

Calculate the other four angles similarly.
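
The angle calculation is the same for every category, so it can be sketched as a one-line Python function. The name `central_angle` is illustrative; only the percentages stated in the text (20%, 30% for Food, 25% for Rent) are used, since the remaining categories are given in the table.

```python
def central_angle(percent):
    """Central angle in degrees for a category that is `percent` % of the whole."""
    return percent * 360 / 100

# Percentages taken from the text: the 20% illustration, then Food and Rent.
for label, pct in [("20% category", 20), ("Food", 30), ("Rent", 25)]:
    print(f"{label}: {central_angle(pct)} degrees")
```

Because the percentages of all categories total 100, the central angles should total 360°.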

A circle graph shows, at a glance, the relative magnitudes of the various categories.

End of section 13.1. Start your online homework on MyMathLab.

13.2 Measures of Central Tendency

It would be desirable to have a single number to serve as a kind of representative value for the whole set of numbers—that is, some value around which all the numbers in the set tend to cluster, a kind of “middle” number or a measure of central tendency.

Three such measures are discussed in this section.

1) Mean 2) Median 3) Mode

Mean

The mean is the most common measure of central tendency. The mean of a set of data items is found by adding up all the items and then dividing the sum by the number of items.

[pic]

Example

Mean Number of Children per Household

Suppose that the number of children in a simple random sample of 10 households is as follows: 2, 3, 0, 2, 1, 0, 3, 0, 1, 4

(a) Calculate the sample mean number of children per household.

Solution

a) The sample mean number of children per household is given by:

[pic]
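
Written out, the calculation for these ten households looks like the following. The symbol \bar{x} for the sample mean is the usual notation and is assumed here, since the original formula image is not available.

```latex
\bar{x} = \frac{\sum x}{n}
        = \frac{2 + 3 + 0 + 2 + 1 + 0 + 3 + 0 + 1 + 4}{10}
        = \frac{16}{10}
        = 1.6 \text{ children per household}
```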


Let's Do it!

Last year’s annual sales for eight different flower shops are given below. Find the mean annual sales for the eight shops.

[pic]

Median

DEFINITION:

The median of a set of n observations, ordered from smallest to largest, is a value such that half of the observations are less than or equal to that value and half the observations are greater than or equal to that value.

If the number of observations is odd, the median is the middle observation.

If the number of observations is even, the median is any number between the two middle observations, including either of the two middle observations.

To be consistent, we will define the median as the mean or average of the two middle observations.

Location of the median: (n+1)/2, where n is the number of observations.

Example

The ages of twenty subjects are given as follows. Find the median age.

32 37 39 40 41 41 41 42 42 43 44 45 45 45 46 47 47 49 50 51

Solution

Calculating (n+1)/2 we get (20+1)/2 = 10.5. So the two middle observations are the 10th and 11th observations, namely 43 and 44. The median is the mean of these two middle observations, (43+44)/2=43.5 years.
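
The location rule can be sketched in Python. This is a minimal illustration of the (n+1)/2 rule with the convention adopted above (for even n, take the mean of the two middle observations); the function name `median` is our own choice.

```python
def median(values):
    """Median using the (n + 1) / 2 location rule."""
    ordered = sorted(values)
    n = len(ordered)
    middle = (n + 1) // 2            # 1-based position, rounded down
    if n % 2 == 1:
        return ordered[middle - 1]   # odd n: the single middle observation
    # Even n: mean of the observations on either side of position (n + 1) / 2.
    return (ordered[middle - 1] + ordered[middle]) / 2

ages = [32, 37, 39, 40, 41, 41, 41, 42, 42, 43,
        44, 45, 45, 45, 46, 47, 47, 49, 50, 51]
print(median(ages))
```

For the twenty ages, the function averages the 10th and 11th ordered observations, matching the hand calculation.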

32 37 39 40 41 41 41 42 42 43 44 45 45 45 46 47 47 49 50 51 [pic]

Let's Do It! 1

Median Number of Children per Household

Find the median number of children in a household from this sample of 10 households, that is, find the median of

Number of Children: 2 3 0 1 4 0 3 0 1 2

(a) Order the observations from smallest to largest:

(b) Median = ______________

Another Measure—The Mode

DEFINITION:

The mode of a set of observations is the most frequently occurring value; it is the value having the highest frequency among the observations.

The mode of the values: { 0, 0, 0, 0, 1, 1, 2, 2, 3, 4 } is 0

For { 0, 0, 0, 1, 1, 2, 2, 2, 3, 4 } there are two modes, 0 and 2 (the data set is bimodal).

What would be the mode for { 0, 1, 2, 4, 5, 8 } ?

For {0, 0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4, 5 } ?
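
For checking work, Python's standard `statistics` module provides `multimode`, which returns every value tied for the highest frequency. The sketch below applies it to the first two sets above; the variable names are illustrative.

```python
from statistics import multimode

first = [0, 0, 0, 0, 1, 1, 2, 2, 3, 4]
second = [0, 0, 0, 1, 1, 2, 2, 2, 3, 4]

# multimode returns a list, so a bimodal set yields two entries.
print(multimode(first))
print(multimode(second))
```

When every value occurs equally often, `multimode` returns all of the values; whether such a set is described as having no mode is a matter of convention.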

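Example

Suppose the categories of a variable are coded as follows:

1 = White

2 = Asian

3 = African-American

4 = Hispanic

5 = American Indian

6 = No category listed

If White is the most frequently occurring category, then the mode would be the value 1.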

Central Tendency in Frequency Table

Weighted Mean (mean of frequency table)

The weighted mean of a group of (weighted) items is the sum of all products of items times weighting factors, divided by the sum of all weighting factors. The weighted mean formula is commonly used to find the mean for a frequency distribution. In this case, the weighting factors are the frequencies.

[pic]

Median of frequency table

[pic]
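
Both calculations can be sketched in Python. The sketch assumes the frequency table is given as (x, f) pairs and that the median sits at position (Σf + 1)/2, found by accumulating frequencies; the function names are illustrative. The table used is the one obtained by tallying the sibling data of Example 1 in Section 13.1.

```python
def weighted_mean(table):
    """Mean of a frequency table given as (x, f) pairs: sum(x * f) / sum(f)."""
    total = sum(f for _, f in table)
    return sum(x * f for x, f in table) / total

def table_median(table):
    """Median of a frequency table, located at position (sum(f) + 1) / 2."""
    rows = sorted(table)
    n = sum(f for _, f in rows)

    def value_at(position):
        # Walk the cumulative frequencies until they reach the 1-based position.
        cumulative = 0
        for x, f in rows:
            cumulative += f
            if cumulative >= position:
                return x

    if n % 2 == 1:
        return value_at((n + 1) // 2)
    # Even n: mean of the two middle observations.
    return (value_at(n // 2) + value_at(n // 2 + 1)) / 2

# Frequency table tallied from the sibling data of Example 1 (n = 25).
siblings = [(1, 4), (2, 7), (3, 6), (4, 3), (5, 3), (6, 2)]
print(weighted_mean(siblings), table_median(siblings))
```

With 25 observations the median is the 13th; the cumulative frequencies 4, 11, 17 first reach 13 at x = 3.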

Example

Find the mean, median and modal salary for a small company that pays annual salaries to its employees as shown in the frequency distribution in the margin.

Solution

Create a third column that weights the x’s by multiplying each x by its frequency. Then divide the sum of that column by the total number of observations, which is the sum of the frequencies.

Mean Salary =[pic]

To find the median salary, we must first find its position [pic]. This indicates that the median is the 23rd observation, so Median = $18,500.

To find the mode, we locate the x value with the highest frequency. That is, x = $18,500.

Let’s Do It!

Find the medians for the following distributions.

[pic]

[pic]

End of section 13.2. Start your online homework on MyMathLab.

13.3 Measures of Dispersion

The two data sets below, List 1 and List 2, have the same mean, median, and mode, but the values obviously differ in another respect: the variation, or spread, of the values.

The values in List 1 are much more tightly clustered around the center value of 60. The values in List 2 are much more dispersed or spread out.

List 1: 55, 56, 57, 58, 59, 60, 60, 60, 61, 62, 63, 64, 65

mean = median = mode = 60

[Dot plot of List 1 on an axis from 35 to 85: the values cluster between 55 and 65, with three stacked at 60.]

List 2: 35, 40, 45, 50, 55, 60, 60, 60, 65, 70, 75, 80, 85

mean = median = mode = 60

[Dot plot of List 2 on an axis from 35 to 85: the values spread across the whole axis, with three stacked at 60.]

What is needed here is some measure of the dispersion, or spread, of the data.

Two of the most common measures of dispersion, the range and the standard deviation, are discussed in this section.

Range

The range is the simplest measure of variability or spread.

Range is just the difference between the largest value and the smallest value.

[pic]

Let’s Do It!

Find the range of list 1 and list 2 above. Determine which list has a larger spread.

Standard Deviation

The standard deviation is a measure of the spread of the observations from the mean.

Think of the standard deviation as an “average distance of the observations from the mean.”

Example 5.9 Standard Deviation—What Is It?

[pic]

Mean: (0+5+7)/3= 4

Deviations: -4, 1, 3

Squared Deviations: 16, 1, 9

[pic]

[pic]

[pic]
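
The step-by-step process can be sketched in Python. This sketch assumes the data are 0, 5, 7 (as implied by the mean calculation above) and that the sample standard deviation is intended, which divides the sum of squared deviations by n - 1; the function name `sample_std_dev` is illustrative.

```python
from math import sqrt

def sample_std_dev(data):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]    # distance of each value from the mean
    squared = [d ** 2 for d in deviations]   # squaring removes the signs
    variance = sum(squared) / (n - 1)        # assumption: sample formula with n - 1
    return sqrt(variance)

# Data assumed from the mean calculation (0 + 5 + 7) / 3 = 4.
s = sample_std_dev([0, 5, 7])
print(round(s, 3))
```

Here the squared deviations 16, 1, and 9 sum to 26, so the sample variance is 26/2 = 13 and s is the square root of 13, a little over 3.6.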

Let’s Do It!

Find the standard deviation of the sample by using the step-by-step process

7, 9, 18, 22, 27, 29, 32, 40

What is the mean of the data?

Complete the table of deviations and their squares below

[pic]

Compute the standard deviation s of the data.

Central tendency and dispersion (or “spread tendency”) are different and independent aspects of a set of data. Which one is more critical can depend on the specific situation.

Consider a situation involving target shooting (also illustrated at the side). The five hits on the top target are, on average, very close to the bull's-eye, but the large dispersion (spread) implies that improvement will require much effort. On the other hand, the bottom target exhibits a poorer average, but the smaller dispersion means that improvement will require only a minor adjustment of the gun sights. (In general, consistent errors can be dealt with and corrected more easily than more dispersed errors.)

Coefficient of Variation

Look again at the top target pictured above.

The dispersion, or spread, among the five bullet holes may not be especially impressive if the shots were fired from 100 yards, but would be much more so at, say, 300 yards. There is another measure, the coefficient of variation, which takes this distinction into account. It is not strictly a measure of dispersion, as it combines central tendency and dispersion. It expresses the standard deviation as a percentage of the mean.

Often this is a more meaningful measure than a straight measure of dispersion, especially when comparing distributions whose means are appreciably different.

[pic]

Example

Compare the dispersions in the two samples A and B.

A: 12, 13, 16, 18, 18, 20 B: 125, 131, 144, 158, 168, 193

Use your calculator to find the mean and standard deviation for the two samples A and B.

[pic]

From the calculated values, we see that sample B has a larger standard deviation than sample A. But sample A actually has the larger relative dispersion (coefficient of variation). The dispersion within sample A is larger as a percentage of the sample mean.
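
The comparison can be checked with a short Python sketch that uses the standard `statistics` module. It assumes the sample standard deviation (`stdev`, which divides by n - 1) and expresses it as a percentage of the mean; the function name `coefficient_of_variation` is illustrative.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation as a percentage of the sample mean."""
    return stdev(data) / mean(data) * 100

A = [12, 13, 16, 18, 18, 20]
B = [125, 131, 144, 158, 168, 193]
for name, sample in [("A", A), ("B", B)]:
    # Columns: sample name, mean, standard deviation, coefficient of variation (%).
    print(name, round(mean(sample), 2), round(stdev(sample), 2),
          round(coefficient_of_variation(sample), 1))
```

Sample B has the larger standard deviation (about 25.3 against about 3.1), yet its coefficient of variation is the smaller of the two (roughly 16.5% against 19.3%).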

Let’s Do it!

Two brands of car batteries, both carrying 6-year warranties, were sampled and tested under controlled conditions. Five of each brand failed after the numbers of months shown here.

Brand A: 75, 65, 70, 64, 71

Brand B: 69, 70, 62, 72, 60

(a) Calculate the mean for each brand.

Brand A mean= Brand B mean=

(b) Calculate the standard deviation for each brand.

Brand A standard deviation s=

Brand B standard deviation s=

(c) Which brand battery apparently lasts longer?

(d) Calculate the coefficient of variation for both brands.

(e) Which brand of battery has the more consistent lifetime? Explain why.

End of section 13.3. Start your online homework on MyMathLab.

TI Quick Steps

Obtaining Summary Measures

Step 1 Clear data.

Step 2 Enter data to be summarized.

Step 3 Obtain the summary measures for the data in L1.

Summary measures are obtained by requesting the 1-Var Stats from under the STAT CALC menu list. The sequence of buttons is as follows:

The 1-Var Stats are now displayed in the window. Notice that both the sample standard deviation [pic] and the mean [pic] are provided.
