Math 3307




Module 1: Representations of Data

Descriptive Statistics

Central Tendency

Spread

Fractiles

Rates of change

Z score

Representations

Dot diagrams

Charts

Stem and Leaf Plots

Box and Whisker Plots

Scatter Plots

Descriptive Statistics

Descriptive statistics take raw data and present it in a way that highlights the important material WITHOUT drawing inferences or generalizations for the viewer. Inferential (or predictive) statistics provide a means to make a judgment, a prediction, or an inference about a situation.

For example, taking the 2010 census data, noting that the population of the United States is now 307,006,550 people, and reporting this is descriptive. Even noting that in 2000 the population was 281,421,906 and that the new number is 1.091 times the 2000 population is descriptive. Taking these numbers, plotting a linear regression line, and predicting the population of the USA in 2015 is NOT descriptive; it is making an estimate, a prediction. Making generalizations is NOT descriptive either. If you come to a generalization about a situation, you are out of the area of descriptive statistics.

Problem DS1

Which of the following conclusions may be obtained from the following data by purely descriptive methods and which require generalizations?

A student in my Spring Pre-calculus class took 4 consecutive daily quizzes and got the following scores: 3, 8, 10, and 12.

a. On only 1 day did he get fewer than 5 right.

b. The student’s number correct increased on each successive quiz.

c. The student got better at guessing what I was going to ask each day.

d. On the last day the student copied his answers from his neighbor.

Problem DS2

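Smith and Jones are hairdressers. On a recent day, Smith cut the hair of 4 male clients and 2 female clients, while Jones cut the hair of 3 males and 3 females. Which of the following conclusions can be obtained by descriptive methods and which require generalizations?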

a.) The amount of time it takes Smith and Jones to do a haircut is approximately the same.

b.) Smith always cuts hair on more males than females.

c.) The two always have the same number of clients per day.

d.) Over a week, Smith averages 6 clients a day.

Problem DS3

Which of the following conclusions can be obtained by descriptive methods and which require generalizations?

Driving the same model of car, 5 different drivers averaged 15.5, 14.7, 16.0, 15.5 and 14.8 mpg.

a.) None of the drivers averaged more than 16 mpg.

b.) The second driver must have driven on rural roads.

c.) 15.5 is the average mpg most often achieved.

d.) The third driver drove faster than the other 4.

Typical types of summaries of data

Measures of Central Tendency – these are the numbers that describe what is normal, usual, and in the middle or the center. These terms are very loose and need firming up mathematically, of course.

The most popular measure of “centeredness” is the Mean (sometimes called the average).

The mean of n numbers is the sum of the numbers divided by n. If you are working with a data set of measurements $x_1, x_2, \ldots, x_n$, the mean is denoted $\bar{x}$ (“x-bar”): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

There are some very cogent reasons for its popularity:

It can always be calculated and it’s easy to calculate.

It is unique: there is only ONE mean for a data set.

It uses EVERY data point; nothing is eliminated.

It doesn’t depend on chance or luck.

There are some equally important reasons to take the mean with a grain of salt:

It is heavily affected by outliers!

Recall the data on the number of pets owned by the 3307 population!
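A small Python sketch of the mean, using made-up pet counts (hypothetical data, not the actual class survey), shows how one extreme value pulls the mean away from the bulk of the data. The function name `mean` is only for illustration; Python’s standard `statistics.mean` does the same job.

```python
def mean(values):
    """Sum of the measurements divided by how many there are."""
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)

# Hypothetical pet counts for a small group (not the real class survey).
pets = [0, 1, 1, 2, 2, 3]
pets_with_outlier = pets + [30]   # add one student who owns 30 pets

print(mean(pets))                # 1.5
print(mean(pets_with_outlier))   # 39/7, about 5.57
```

One extra value more than triples the mean, even though six of the seven measurements are 3 or less.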

Problem CT1

An elevator in PGH is designed to carry a maximum load of 3,200 pounds. If it is loaded with 18 people with a mean weight of 166 pounds, is it in any danger of being overloaded?

Weighted Mean

Sometimes each data point is not “equal” in weight, meaning some have more importance than others. For example, in my Math 3379 class there are 4 papers; the first 3 are 10% of the grade each and the fourth is a term paper worth 20% of the grade. In order to calculate a student’s average on this 50% of the course grade, I would add the first 3 grades to TWICE the term paper grade and divide by 5. Note that you use “proportional” multiplication to even things up!
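The paper-grade example can be sketched in Python. The grades below are hypothetical; the weights 10, 10, 10, 20 mirror the percentages in the paragraph, and `weighted_mean` is an illustrative helper name, not a library function. Because the weights are proportional, the result matches the shortcut of adding the first 3 grades to twice the term paper grade and dividing by 5.

```python
def weighted_mean(values, weights):
    """Sum of weight * value, divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical grades on papers 1-3 and the term paper.
grades = [80, 90, 70, 85]
weights = [10, 10, 10, 20]   # percent of the course grade for each paper

print(weighted_mean(grades, weights))   # 82.0
print((80 + 90 + 70 + 2 * 85) / 5)      # 82.0, the "twice the term paper" shortcut
```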

Problem CT2

Having received a bonus of $20,000 for accepting early retirement, a company’s sales representative invested $6,000 in a bond paying 3.75%, $10,000 in a mutual fund paying 3.96%, and $4,000 in a CD paying 3.25%. Find the weighted mean of these percentages.

Problem CT3

A lecturer counts the final exam in a course 4 times as much as each of the 3 small exams during the semester. Which of the following students has the higher average?

| |Test 1 |Test 2 |Test 3 |Final |

|Mikey |72 |80 |65 |82 |

|Lizbeth |81 |87 |75 |78 |

Problem CT4

A home appliance store has the following inventory:

|Refrigerator |# in stock |Size (cu ft) |Price ($) |

|A |18 |15 |416 |

|B |12 |21 |549 |

|C |9 |19 |649 |

|D |14 |21 |716 |

|E |25 |24 |799 |

A. What is the average size of these refrigerators?

B. What will the average income per unit be if they sell them all?

C. What is the average price for a refrigerator?

Another measure of central tendency is the Median:

To find the median, first arrange the data in order by size. If there is an odd number of data points, the median is the value in the middle. If the number of data points is even, the median is the mean of the 2 middle data points.

The formula for finding the location of the median for n data points is 0.5(n + 1). The process is to order the data and then find the measurement at that location.
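Here is a minimal Python sketch of the location rule 0.5(n + 1), assuming positions are counted from 1 as in the text. When the location ends in .5 (even n), it averages the two neighboring measurements. The function name `median` is hypothetical; `statistics.median` in Python’s standard library gives the same results.

```python
import math

def median(values):
    """Median found at location 0.5*(n + 1) in the sorted data (1-based)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    location = 0.5 * (n + 1)                # a whole number when n is odd, ends in .5 when n is even
    below = data[math.floor(location) - 1]  # convert 1-based position to 0-based index
    above = data[math.ceil(location) - 1]
    return (below + above) / 2              # same value twice when n is odd

print(median([7, 1, 3]))      # sorted 1, 3, 7 -> location 2 -> 3.0
print(median([4, 1, 3, 2]))   # sorted 1, 2, 3, 4 -> location 2.5 -> 2.5
```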

Problem CT5

In golf the holes are rated for a recommended number of strokes needed to sink the golf ball into the hole. A score of par means the golfer used the recommended number, a birdie is one fewer than recommended, a bogey is one more than the recommended number, an eagle is 2 fewer strokes.

At a recent televised tournament, 7 golfers had the following scores, listed alphabetically by last name: par, birdie, par, par, birdie, bogey, and eagle.

What was the median score?

Problem CT6

Find the median location for

A. n = 19

B. n = 52

The final measure of central tendency is the Mode. This is the number that occurs most frequently in a data set.
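A short Python sketch, assuming the data fit in a list, counts each value with `collections.Counter` and returns every value tied for the highest count, since a data set can have more than one mode. The sample lists and the helper name `modes` are made up for illustration.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (there may be ties)."""
    counts = Counter(values)          # value -> number of occurrences
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(modes([2, 3, 3, 5, 7]))   # [3]
print(modes([1, 1, 2, 2, 9]))   # [1, 2]: two modes
```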

Problem CT7

What is the mode for the data in Problem CT5?

The Mode is the measurement in the data set that occurs most often.

Problem CT8

Which of the following bars shows the mode in this histogram?

[pic]

Relationships among Mean, Median, and Mode:

Problem CT 9

|x |STTR (skewed to the right) |STTL (skewed to the left) |Symm (symmetric) |

|1 |1 |1 |1 |

|2 |2 |2 |2 |

|3 |4 |3 |3 |

|4 |5 |4 |4 |

|5 |4 |5 |5 |

|6 |3 |6 |5 |

|7 |2 |8 |4 |

|8 |2 |5 |3 |

|9 |1 |4 |2 |

|10 |1 |3 |1 |

Calculate mean, median, and mode for these 3 charts. Mark on the x-axis where each goes.

[pic]

[pic]

[pic]

Summarize your results with a mnemonic device.

Which measure is more sensitive to outliers: the mean or the median?

What does it mean to say “more sensitive”?

Discuss this idea using the salaries of baseball players.

Problem CT 10

The data shown in the table are the median prices of existing homes in the USA from 1981 through 1986. If the average prices of existing homes were calculated for each of these years, how do you think these values would compare to the median prices shown?

Would the average price be higher, lower, or the same?

|Year |Median price ($) |

|1981 |66,460 |

|1982 |67,800 |

|1983 |70,300 |

|1984 |72,400 |

|1985 |75,500 |

|1986 |80,300 |

Problem CT 11

| |Trial 1 |Trial 2 |Trial 3 |Trial 4 |Trial 5 |

|Car A |27.9 |30.4 |30.6 |31.4 |31.7 |

|Car B |31.2 |28.7 |31.3 |28.7 |31.3 |

|Car C |28.6 |29.1 |28.5 |32.1 |29.7 |

Above is mileage data from 3 compact cars on 5 trials each. Each car was manufactured by a different car company.

If the manufacturers of Car A want to advertise how fuel efficient their car is, what statistics might they use to substantiate their claim?

If the manufacturers of Car B want to advertise how fuel efficient their car is, what statistics might they use to substantiate their claim?

What about the maker of Car C?

Measures of Variability

A measure of variability is a number that describes the spread or the variety of measurements in a data set.

The range of a data set is equal to the largest measurement minus the smallest measurement.

The sample variance is calculated with the following formula for n data points:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

First calculate the sample mean, $\bar{x}$. Then subtract the mean from each measurement individually and square the answer. Add up all the squares and divide by n − 1.

The standard deviation s for a set of data is the square root of the variance: $s = \sqrt{s^2}$.
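The three steps above translate directly into Python. This is a sketch with hypothetical data, and the helper names are illustrative. The range is included for comparison; `statistics.variance` and `statistics.stdev` in the standard library use the same n − 1 divisor.

```python
import math

def sample_variance(values):
    """s^2: squared deviations from the mean, summed, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 measurements")
    x_bar = sum(values) / n                        # step 1: sample mean
    squares = [(x - x_bar) ** 2 for x in values]   # step 2: subtract the mean, square
    return sum(squares) / (n - 1)                  # step 3: add up, divide by n - 1

def sample_std_dev(values):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements, mean 5
print(max(data) - min(data))      # range: 7
print(sample_variance(data))      # 32/7, about 4.571
print(sample_std_dev(data))       # about 2.138
```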

Problem MV 1

Calculate the mean for each sample below. Calculate the variance for each sample.

Discuss the information available in the variance.

[pic]

[pic]

Problem MV 2

Here is a data set: (8, 2, 2, 7, 4, 6, 5, 3, 4)

Describe this data set using mean, median, mode, range, and standard deviation.

Problem MV 3

Three sets of data are shown below. How many data points are in each set? What is the mean for each set (do this WITHOUT a calculator!)? Rank the sets from the most variable to the least variable and tell why you made those choices (again: calculator free).

Hint: use the formula for variance to help you reason it out!

[pic]

[pic]

[pic]

[pic]

Problem MV 4

Consider the following 2 samples:

Sample A: 10, 0, 1, 9, 10, 0

Sample B: 0, 5, 10, 5, 5, 5

Describe these data sets using mean, median, mode, range, and variance. Which statistics are the same and which are different? Which data set is more variable, and why? Which is the better measure of variability: range or variance?

Grouped Data for Variance calculations

If f is the frequency of a data measurement, then the following formula calculates the variance for the data:

$$s^2 = \frac{\sum f\,(x - \bar{x})^2}{n - 1}, \qquad n = \sum f, \qquad \bar{x} = \frac{\sum f\,x}{n}$$
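A sketch of the grouped calculation, assuming the grouped formula divides by n − 1 with n equal to the total frequency, so that it agrees with the ungrouped sample variance. The value and frequency lists are hypothetical, and `grouped_mean_and_variance` is an illustrative name. Writing each value out f times and using the ungrouped formula gives the same answer, which is a handy check.

```python
def grouped_mean_and_variance(values, freqs):
    """Mean and sample variance when values[i] occurs freqs[i] times."""
    n = sum(freqs)                                   # total number of measurements
    if n < 2:
        raise ValueError("need at least 2 measurements in total")
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    squares = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs))
    return mean, squares / (n - 1)

# Hypothetical distribution: value 1 once, value 2 twice, value 3 once,
# i.e. the same data as the plain list 1, 2, 2, 3.
values = [1, 2, 3]
freqs = [1, 2, 1]
print(grouped_mean_and_variance(values, freqs))   # mean 2.0, variance 2/3
```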

Problem MV 5

The data in the following table are for the inner diameters of some tubes manufactured by a machine. This table is called a “distribution” because it gives the values and their frequency. Find the mean diameter and the variance for the tubes.

|Inner diameter (in) |Frequency |

|2.0 |2 |

|2.2 |4 |

|2.3 |6 |

|2.8 |3 |

|3.0 |5 |

Problem MV 7

The following table is a distribution of the top speeds in mph at which 30 racers were clocked in an auto race. Find the mean and variance for the race.

|Top speed (mph) |Number of racers |

|145 |9 |

|150 |8 |

|160 |11 |

|170 |2 |

Fractiles and Percentages

A fractile ranking tells you what fraction of the measurements lie below a given measurement and what fraction lie above it.

Suppose your child comes home to tell you that she’s in the 90th percentile of her class on a particular test. This means that 90% of the children have the same score as she does or a lower one, and 10% have higher scores. You do need to be a little careful with these measurements of relative ranking, though. It could be that 91% of the children failed the test and 9% passed. In this scenario, of course, being in the 90th percentile isn’t much to brag about. You need absolute measures AND relative measures to evaluate a situation involving fractiles.

Deciles divide the measurements into 10ths and quartiles divide the measurements into quarters. The median is both a decile and a quartile ranking: it is the 5th decile and the 2nd quartile (Q2).

Let’s look at quartiles:

Q1 is the median of all measurements less than the median of the data set.

Q3 is the median of all measurements greater than the median of the data set.
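The “median of each half” rule above can be sketched in Python as follows. With an odd number of measurements, the median itself is left out of both halves. Be aware that calculators and spreadsheets use several different quartile conventions, so their Q1 and Q3 can differ slightly from this method. The function names are illustrative.

```python
def _median_of_sorted(data):
    """Median of a list that is already sorted."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half rule."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 measurements")
    lower = data[: n // 2]          # measurements below the median's position
    upper = data[(n + 1) // 2 :]    # measurements above the median's position
    return _median_of_sorted(lower), _median_of_sorted(data), _median_of_sorted(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7]))     # (2, 4, 6)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.5, 4.5, 6.5)
```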

Problem FP 1

The 21 meetings of the West U Orchid Breeders club had the following attendances:

22, 24, 23, 24, 27, 25, 24, 19, 24, 26, 28, 32, 21, 24, 25, 23, 26, 25, 18, 24

Find all 3 measures of central tendency, Q1, Q3, and the standard deviation for the data set.

Problem FP 2

Find the positions of the median, Q1, and Q3 for

A. n = 32

B. n = 35

Problem FP 3

The following numbers are weekly lumber production (in million board feet) for a company in Oregon. Find the first quartile and the 90th percentile for the data.

390 406 447 410 370 338 410 320 359 392 315 480

Percentage change in a measurement:

The percent change in a measurement is often of interest to managers, doctors, and teachers. It is used as a measure of efficacy.

The calculation is

$$\text{percent change} = \frac{\text{new value} - \text{old value}}{\text{old value}} \times 100\%$$

Suppose you have a student who was reading poorly – 15 words a minute. You train the student using your favorite method and test him again to find him reading 27 words a minute. The percent change is

$\frac{27 - 15}{15} \times 100\% = \frac{12}{15} \times 100\%$, which is 80%.

You would then report an 80% improvement in speed.
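The same calculation as a tiny Python function (the name `percent_change` is hypothetical). A negative result indicates a decrease, and the function refuses an old value of 0, where percent change is undefined.

```python
def percent_change(old, new):
    """Percent change from old to new: (new - old) / old * 100."""
    if old == 0:
        raise ValueError("percent change is undefined when the old value is 0")
    return (new - old) / old * 100

print(percent_change(15, 27))    # reading speed: about 80, an 80% improvement
print(percent_change(100, 50))   # -50.0: a negative result is a decrease
```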

Problem PC 1

You’ve been looking at a sweater in the store but it costs $135 and that’s too much. BUT one day you go and check and it’s been marked down to $65…what is the percent change?

Problem PC2

A student has been working with a tutor on his math skills. His weekly quiz average was a 65% when he started with the help program.

His quizzes are 30 points each. During the program his weekly grades are

20, 23, 21, 28, 27, 29

What is the percent change in his average? Would you say that the tutoring helped?

Z score

Z scores are used on data collected from populations that have a normal distribution for the property under scrutiny. A z-score tells you how far a particular measurement is from the mean, measured in standard deviations. A z-score is calculated with the following formula:

$$z = \frac{x - \mu}{\sigma}$$

where x is a particular data measurement, μ is the mean of the population, and σ is its standard deviation. Note that standard deviation is the square root of variance.
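A one-line Python version of the formula, with hypothetical numbers (a measurement of 85 from a population with mean 70 and standard deviation 10). A positive z means the measurement is above the mean; a negative z means it is below. The name `z_score` is illustrative.

```python
def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

print(z_score(85, 70, 10))   # 1.5: one and a half standard deviations above
print(z_score(62, 70, 10))   # -0.8: below the mean
```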

Sketch a normal distribution here:

The Empirical Rule for normally distributed data:

Approximately 68.3 percent of the observations will fall within one standard deviation of the mean (μ ± σ). Approximately 95.4 percent of the observations will fall within 2 standard deviations of the mean (μ ± 2σ). Approximately 99.7 percent of the observations will fall within 3 standard deviations of the mean (μ ± 3σ).

A rough estimate of the range is the mean ± 3 standard deviations. Why is this true?
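Where do 68.3, 95.4, and 99.7 come from? For a normal distribution, the share of observations within k standard deviations of the mean equals erf(k/√2). This short Python check uses `math.erf` from the standard library to reproduce the three percentages; `fraction_within` is an illustrative name.

```python
import math

def fraction_within(k):
    """Share of a normal population lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sd: {100 * fraction_within(k):.1f}%")
# within 1 sd: 68.3%
# within 2 sd: 95.4%
# within 3 sd: 99.7%
```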

ZS Problem 1

If you have 2 students applying for entrance to a G&T program and you have room for only one, which one will you pick based on the following test information?

Gina got a 78 on a test with an average of 72 and a standard deviation of 5.

Mike got an 87 on a test with an average of 85 and standard deviation 1.5.

Who is the stronger student and how do you know?

ZS Problem 2

Given the following distribution

|Measurement |number |

|1 |0 |

|2 |3 |

|3 |1 |

|4 |5 |

|5 |2 |

|6 |7 |

|7 |5 |

|8 |6 |

|9 |3 |

|10 |0 |

|11 |1 |

|12 |0 |

|13 |2 |

Discuss the measures of central tendency

• mean

• median

• mode

the measures of variability

• range

• variance

• standard deviation

and give the z score for the measurement 7.

Verify the Empirical Rule by making a dot or bar chart of the data and marking off the points that lie 1, 2, and 3 standard deviations (s, 2s, 3s) from the mean.

ZS Problem 3

The mean salary of the employees at a high school in Missouri is $28,500 with a standard deviation of $2,100.

Discuss the Empirical Rule and who might fit where on a bar chart of employee salaries.

The state announces a flat raise of $500 per employee for the next year. Find the mean and standard deviation of the new salaries.

Who will benefit the most in a percentage change analysis?

ZS Problem 4

Given that the mean is 9.0 and the standard deviation is 1.4, find how many of the 2,000 data points should lie within 1, 2, and 3 standard deviations of the mean. Then count how many actually ARE within these bounds.

|Value |Frequency |

|0 |1 |

|1 |2 |

|2 |4 |

|3 |8 |

|4 |20 |

|5 |35 |

|6 |60 |

|7 |120 |

|8 |250 |

|9 |500 |

|10 |1000 |

ZS Problem 5

For 50 days, the number of vehicles using a particular road was tracked by a city engineer. She found that the mean was 385 and the standard deviation was 15 vehicles. Suppose you are interested in opening a franchise shop along the road and you know you need traffic between 340 and 430 cars per day to be successful. How many days have this much traffic? Is this a good location or a marginal location?

|Marital status |Number |Percent |

|Single |41.8 |22.6 |

|Married |113.3 |61.1 |

|Widowed |13.9 |7.5 |

|Divorced |16.3 |8.8 |

There are a couple of ways to display this information graphically. One is a histogram or bar chart and another is a pie chart.

Pie chart

Histogram


Why was it important to use the percentages and not the raw counts?

Charts Problem 1

Here are some 2000 Census data: the percent of the U.S. population living in each state. Note that the list is not in strictly descending order. The data were in descending order for the 1990 census; when I cut out the intervening years, some states that now have lower percentages than in 1990 ended up “out of order.”

How would you display this data in a small box in the middle of a report? You want to avoid visual distortion, but you also want to avoid a histogram with 51 bars!

|Rank |State |Population, April 1, 2000 |% of U.S. total |

| |United States |281,421,906 |100.00 |

|  |  |  |  |

|1  |California |33,871,648 |12.04 |

|2  |Texas |20,851,820 |7.41 |

|3  |New York |18,976,457 |6.74 |

|4  |Florida |15,982,378 |5.68 |

|5  |Illinois |12,419,293 |4.41 |

|6  |Pennsylvania |12,281,054 |4.36 |

|7  |Ohio |11,353,140 |4.03 |

|8  |Michigan |9,938,444 |3.53 |

|9  |Georgia |8,186,453 |2.91 |

|10  |North Carolina |8,049,313 |2.86 |

|11  |New Jersey |8,414,350 |2.99 |

|12  |Virginia |7,078,515 |2.52 |

|13  |Washington |5,894,121 |2.09 |

|14  |Arizona |5,130,632 |1.82 |

|15  |Massachusetts |6,349,097 |2.26 |

|16  |Indiana |6,080,485 |2.16 |

|17  |Tennessee |5,689,283 |2.02 |

|18  |Missouri |5,595,211 |1.99 |

|19  |Maryland |5,296,486 |1.88 |

|20  |Wisconsin |5,363,675 |1.91 |

|21  |Minnesota |4,919,479 |1.75 |

|22  |Colorado |4,301,261 |1.53 |

|23  |Alabama |4,447,100 |1.58 |

|24  |South Carolina |4,012,012 |1.43 |

|25  |Louisiana |4,468,976 |1.59 |

|26  |Kentucky |4,041,769 |1.44 |

|27  |Oregon |3,421,399 |1.22 |

|28  |Oklahoma |3,450,654 |1.23 |

|29  |Connecticut |3,405,565 |1.21 |

|30  |Iowa |2,926,324 |1.04 |

|31  |Mississippi |2,844,658 |1.01 |

|32  |Arkansas |2,673,400 |0.95 |

|33  |Kansas |2,688,418 |0.96 |

|34  |Utah |2,233,169 |0.79 |

|35  |Nevada |1,998,257 |0.71 |

|36  |New Mexico |1,819,046 |0.65 |

|37  |West Virginia |1,808,344 |0.64 |

|38  |Nebraska |1,711,263 |0.61 |

|39  |Idaho |1,293,953 |0.46 |

|40  |New Hampshire |1,235,786 |0.44 |

|41  |Maine |1,274,923 |0.45 |

|42  |Hawaii |1,211,537 |0.43 |

|43  |Rhode Island |1,048,319 |0.37 |

|44  |Montana |902,195 |0.32 |

|45  |Delaware |783,600 |0.28 |

|46  |South Dakota |754,844 |0.27 |

|47  |Alaska |626,932 |0.22 |

|48  |North Dakota |642,200 |0.23 |

|49  |Vermont |608,827 |0.22 |

|50  |District of Columbia |572,059 |0.20 |

|51  |Wyoming |493,782 |0.18 |


|  |Puerto Rico |3,808,610 |1.35 |

Charts Problem 2

United States: Age Distribution

[pic]

When drawn as a “population pyramid,” age distribution can hint at patterns of growth. A top-heavy pyramid, like the one for Grant County, North Dakota, suggests negative population growth that might be due to any number of factors, including high death rates, low birth rates, and increased emigration from the area.

A bottom-heavy pyramid, like the one drawn for Orange County, Florida, suggests high birthrates, falling or stable death rates, and the potential for rapid population growth.

But most areas fall somewhere between these two extremes and have a population pyramid that resembles a square, indicating slow and sustained growth with the birth rate exceeding the death rate, though not by a great margin.

Discuss this representation of ages from the census 10 years ago. What kind of difficulties did the authors overcome with this particular version of a histogram?

What kinds of ancillary information can be drawn from this data?

Charts Problem 3

Although there have been advances in medical technology and donation, the demand for organ, eye and tissue donation still vastly exceeds the number of donors. More than 100,000 men, women and children currently need life-saving organ transplants.

• Every 10 minutes another name is added to the national organ transplant waiting list.

• An average of 18 people die each day from the lack of available organs for transplant.

• In 2009, there were 8,021 deceased organ donors and 6,610 living organ donors resulting in 28,465 organ transplants.

• Last year, more than 42,000 grafts were made available for transplant by eye banks within the United States.

• According to research, 98% of all adults have heard about organ donation and 86% have heard of tissue donation.

• 90% of Americans say they support donation, but only 30% know the essential steps to take to be a donor.

Statistics

110,541 Patients Waiting*

60,758 Multicultural Patients*

1,785 Pediatric Patients*

28,663 Organ Transplants Performed in 2010

14,502 Organ Donors in 2010

|Waiting list candidates as of 5pm 6/13/11 |

|All* |111,671 |

|Kidney |89,060 |

|Pancreas |1,369 |

|Kidney/Pancreas |2,191 |

|Liver |16,291 |

|Intestine |266 |

|Heart |3,178 |

|Lung |1,770 |

|Heart/Lung |66 |

|*The total for all candidates is less than the sum of the rows because some candidates are waiting for multiple organs. |

|Transplants performed January - March 2011 |

|Total |6,709 |

|Deceased Donor |5,276 |

|Living Donor |1,433 |

|Based on OPTN data as of 06/03/2011 |

|Donors recovered January - March 2011 |

|Total |3,346 |

|Deceased Donor |1,921 |

|Living Donor |1,425 |

|Based on OPTN data as of 06/03/2011 |

Let’s try to think of a more compelling way to present this data. How would you arrange this information in a more visual style?

Presentation:

Charts Problem 4

Fifty-four candidates entering an astronaut training program were given a psychological profile test measuring bravery. NASA grouped the data to make it more compact.

Note that the scores are grouped into units of the SAME length. Why is this important?

Would you present this as a pie chart?

A dot diagram?

A bar chart or histogram?

|Score (points) |Number of candidates |

|60 – 79 |8 |

|80 – 99 |16 |

|100 – 119 |18 |

|120 – 139 |8 |

|140 – 159 |6 |

What do you think about the extreme values on the results?

C. Stem and Leaf Plots

An improvement on dot diagrams, stem and leaf plots work well on data with many different measurements. They are fairly low-tech and can be done quickly in a meeting or on the fly. I find them exceptionally useful in small classes (n < 50) for a quick grade analysis.

The stems are the tens digits and the leaves are the units digits of each measurement. It can be useful to put the leaves in order, too.
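A minimal Python sketch of this layout, assuming the measurements are non-negative whole numbers (stem = everything but the last digit, leaf = the units digit). Stems with no leaves are still printed so gaps in the data stay visible. The scores and the helper name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return a stem-and-leaf plot as text, leaves sorted within each stem."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)          # e.g. 75 -> stem 7, leaf 5
        rows[stem].append(str(leaf))
    lines = []
    for stem in range(min(rows), max(rows) + 1):   # include empty stems
        lines.append(f"{stem:>3} | {''.join(rows[stem])}")
    return "\n".join(lines)

scores = [67, 72, 75, 75, 81, 84, 88, 90, 93, 59]   # hypothetical exam scores
print(stem_and_leaf(scores))
```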

Here are the final exam scores from one of my classes:

[pic]

Turn the page sideways (clockwise)…note the resemblance to a dot diagram! What does this tell you about my class?

Note that in each case, there was somebody pretty close to the next level.

What grade is “BELOW”?

Sometimes, if the data are unusually condensed, you might split the stems, making more rows rather than fewer rows.

Here are some quiz grades out of 130 points:

112 114 114 116 118 119 120 121 122 123 124 125 125 126 127 127 129

The best data presentation is to show 110 – 114, 115 – 119, 120 – 124, 125 – 129 rather than just 2 stems with LOOOOONG leaf lines:

[pic]

Note that the stems are now both a hundreds and a tens digit!

SL Problem 1

A hotel has 85 rooms. In February of last year they had the following rental statistics:

75 79 37 57 60 64 35 73 62 81 43 72 78 54 69 75 78 49 59 80 58 76 52 49 42 62 81 77

Produce a stem and leaf plot of this data.

SL Problem 2

The following are the weights, in ounces, of 30 one-pound bags. Display and analyze the data.

15.6 15.9 16.2 16.0 15.6 15.9 16.0 15.6 15.6 16.0 15.6 15.9 16.2 15.6 16.2

16.0 15.8 15.9 16.2 15.8 15.8 16.2 16.2 16.0 16.2 15.9 16.2 15.8 16.2 16.0

SL Problem 3

Decide which representation you’d like to use with this data to show the age of the presidents at inauguration.

Consider doing a time plot*, too. Are we electing younger people than earlier in our history?

How could you present the categorical data? Party affiliation, home state, religion…

*a chronological presentation with time on the x axis.

Presidents

Find information about U.S. presidents, including party affiliation, term in office, age at inauguration, age at death, and more.

|DALE (years) |Number of countries |

|25 |1 |

|30 |9 |

|35 |11 |

|40 |11 |

|45 |10 |

|50 |40 |

|55 |37 |

|60 |33 |

|65 |20 |

|70 |15 |

|75 |4 |

|Total |191 |

|Sierra Leone: 29.5 |

|Japan: 73.8 |

The UN calculated the Disability Adjusted Life Expectancy (DALE) for citizens of the 191 member countries in 1999. The table above summarizes their findings.

Five number summary:

Max

Q3

Median

Q1

Min

IQR: interquartile range (upper quartile − lower quartile, Q3 − Q1)

Upper fence: 1.5 IQR above Q3

Lower fence: 1.5 IQR below Q1

[Don’t actually draw the fences in your box plot.]

Box: a rectangle from Q1 to Q3, with a line at the median; any width works fine.

Whiskers: lines from the box to the most extreme values inside the fences, each topped with a short perpendicular “stop.”

Mark outliers (values outside the fences) with asterisks.
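The box plot recipe above can be sketched in Python. It uses the same “median of each half” quartile rule as the definitions of Q1 and Q3 earlier, and the data are hypothetical; other quartile conventions may shift the fences slightly. The helper names are illustrative.

```python
def _median(sorted_data):
    """Median of a list that is already sorted."""
    n = len(sorted_data)
    mid = n // 2
    return sorted_data[mid] if n % 2 else (sorted_data[mid - 1] + sorted_data[mid]) / 2

def box_plot_summary(values):
    """Five-number summary, IQR, 1.5*IQR fences, whisker ends, and outliers."""
    data = sorted(values)
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 measurements")
    q1 = _median(data[: n // 2])
    q3 = _median(data[(n + 1) // 2 :])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "min": data[0], "Q1": q1, "median": _median(data), "Q3": q3, "max": data[-1],
        "IQR": iqr,
        "fences": (lower_fence, upper_fence),      # computed, not drawn
        "whiskers": (inside[0], inside[-1]),       # most extreme values inside the fences
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

summary = box_plot_summary([3, 5, 6, 7, 8, 9, 10, 11, 30])   # hypothetical data
print(summary["Q1"], summary["median"], summary["Q3"])      # 5.5 8 10.5
print(summary["outliers"])                                   # [30]
```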

Vertical display:

Horizontal display under histogram:

BW01

Here are some pre-lesson grades (10 points) plotted with a “double stem”: each tens category is broken into 2 parts, leaves 0 – 4 and leaves 5 – 9.

3 | 57
4 | 0023
4 | 5666899
5 | 234
5 | 56789
6 | 1224
6 | 9
7 | 23
7 | 8
8 | 1

Recreate the first 4 measurements from either end:

Find Q1 and Q3 and the Median

Find the “fences”

Do a horizontal box and whisker plot. Are there any outliers in this data?

BW02

Comparing groups with box and whisker plots.

A student designed an experiment to test the efficiency of 4 coffee containers from different manufacturers by pouring coffee at 180°F into each container and then measuring the temperature difference after 30 minutes. She did the experiment 5 times, using different cups of the same type each time (she didn’t reuse any of the cups). So she used 20 cups total, 5 from each manufacturer.

The five-number summaries of the temperature differences, along with each IQR, are in the table below.

| |Min |Q1 |Median |Q3 |Max |IQR |

|Cup 1 |6°F |6 |8.33 |14.25 |18.5 |8.25 |

|Cup 2 |0°F |1 |2 |4.5 |7 |3.5 |

|Cup 3 |9°F |11.5 |14.25 |21.75 |24.5 |10.25 |

|Cup 4 |6°F |6.50 |8.50 |14.25 |17.5 |7.75 |

Using VERTICAL box and whisker diagrams and a vertical axis of temperature change, compare the data. Which cup has the best heat retention property?

Scatter Plots, Time Plots, and Line Plots

Plotting: Problem 1

The Bureau of Labor Statistics tracks the buying power of our currency by using a fixed basket of goods and services. It prices the items and records how much the same items cost over time. The base period is a chosen stretch of the time series; the average cost of the basket during that period is set at $100. The base period for the following data is 1982 – 1984, so the basket cost about $100 on average during that time.

1970 $39

1973 $44

1976 $57

1979 $73

1982 $97

1985 $108

1988 $118

Plot the data and see if you can discuss both trend and rate of change of the purchasing power of $100.

Plotting: Problem 2

Deaths from cancer:

1940 120 per 100,000 people

1945 137

1950 140

1955 147

1960 148

1965 153

1970 160

1975 170

1980 182

1985 190

1990 201

Plot the data and think critically about whether cancer is getting much, much more prevalent or if there’s something else going on socially, too!

Plotting: Problem 3

Sometimes 2 different views can each provide information for decisions.

Here are 20 measurements, taken over 20 hours IN ORDER. They measure the tension on a wire grid behind an electronic display. If the tension is too high or too low, the display quits working for safety reasons.

265.5 297.0 269.6 283.3 304.8 280.4 283.5 257.4 317.5 327.4 264.7 307.7

310.0 343.3 328.1 342.6 338.8 340.1 374.6 336.1

Make a stem plot using 2 digits for the stem PLUS make a time plot. What items of interest to the managers do you see in EACH display? Describe the distributions and what the management might need to do.

Scatter plots

Here is some data taken after an airport opened near a neighborhood. The first column is the number of weeks since the airport opened and the second column is the sound frequency range to which the person’s hearing will respond.

|Weeks |Range |

|47 |15.1 |

|56 |14.1 |

|116 |13.2 |

|178 |12.7 |

|19 |14.6 |

|75 |13.8 |

|160 |11.9 |

|31 |14.8 |

|12 |15.3 |

|164 |12.6 |

|43 |14.7 |

|74 |14.0 |

|x (weeks) |Predicted range (from the regression line) |

|47 |14.4775 |

|56 |14.32 |

|116 |13.27 |

|178 |12.185 |

|19 |14.9675 |

|75 |13.9875 |

|160 |12.5 |

|31 |14.7575 |

|12 |15.09 |

|164 |12.43 |

|43 |14.5475 |

|74 |14.005 |

We’ll graph these with the vertical axis going from 11 to 16 and the horizontal axis going from 0 to 200.

Linear regression line: y = −0.0175x + 15.3

If you plot this line THROUGH your scatter plot, it will run APPROXIMATELY in the direction the data points are going. (The points on the line are the predicted values in the “Formula” table.)

Naturally the data points will be off the plotted line. How far off they are is summarized in a statistic called r, the correlation coefficient. An r of 0 is very bad: there is no linear pattern, and your data is basically a cloud. An r of 1 or −1 is PERFECT: every point is on the line.

The r for this data is approximately −0.95: the line has a negative slope and the points are quite close to the line.
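The slope, intercept, and r can be checked with the standard least-squares formulas. This Python sketch uses the twelve (weeks, range) pairs from the table; `least_squares` is an illustrative helper name. For these points it gives a slope of about −0.0175, an intercept near 15.32 (which the notes round to 15.3), and r ≈ −0.95.

```python
import math

# The twelve (weeks, hearing range) pairs from the table above.
weeks = [47, 56, 116, 178, 19, 75, 160, 31, 12, 164, 43, 74]
hearing_range = [15.1, 14.1, 13.2, 12.7, 14.6, 13.8, 11.9, 14.8, 15.3, 12.6, 14.7, 14.0]

def least_squares(xs, ys):
    """Slope, intercept, and correlation r for the least-squares line y = intercept + slope*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar       # the line passes through (x_bar, y_bar)
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

slope, intercept, r = least_squares(weeks, hearing_range)
print(f"y = {slope:.4f}x + {intercept:.2f},  r = {r:.2f}")
```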

[pic]

[pic]
