Stats - DePaul University



ISP 121, Winter, 2007

Section 201 (TTh, 10:10 – 11:40)

Section 202 (TTh, 11:50 – 1:20)

Activity 5: Descriptive Statistics

Answer the questions given here in a Word document. E-mail the document to me at the end of class.

1.  From the QRC data site, download and open the file AgeAtInauguration.xls, which lists the age at inauguration of every US President to date.  We want to summarize this data in a number of ways.  Along the way I will remind you of the Excel commands to produce the relevant summaries.  Remember that you really don't need to memorize the names of Excel functions.  They can always be accessed through the paste function button.

a. Who was the oldest president at inauguration? Who was the youngest?  (Hey! I always thought John F. Kennedy was the youngest president.  Can you figure out what's going on here?)

b. Calculate the mean (average) age at inauguration. (To calculate the average of a data series, use the command =AVERAGE(......).)

c. Calculate the median age at inauguration.  (To calculate the median of a data series, use the command =MEDIAN(......).)
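If you want to check your Excel answers for b. and c. another way, here is a minimal Python sketch of what =AVERAGE and =MEDIAN compute. The list of ages is a made-up placeholder, not the data in AgeAtInauguration.xls; paste in the real column to use it.

```python
from statistics import mean, median

# Hypothetical ages, for illustration only (NOT the real AgeAtInauguration.xls data)
ages = [50, 52, 55, 57, 61, 64, 69]

# Like Excel's =AVERAGE(...): the sum of the values divided by how many there are
avg_age = mean(ages)

# Like Excel's =MEDIAN(...): the middle value once the data is sorted
# (the average of the two middle values when the count is even)
mid_age = median(ages)

print(f"mean = {avg_age:.2f}, median = {mid_age}")
```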

d. How does our current president, George W. Bush, compare to the average?

e. How did the previous president, Bill Clinton, compare to the average?

f. Just as we did in d. and e., it is often interesting to know where a particular data point lies relative to the rest of the dataset. (You may know this idea all too well from standardized tests.)  A useful tool for locating a data point within a dataset is the percent rank, which tells you approximately what fraction of the data lies below that data point.  Excel reports it as a decimal, so 0.25 means 25%.  The syntax for this command is =PERCENTRANK(dataseries, datavalue).  Using this command, find the percentage of presidents whose age at inauguration was younger than Bill Clinton's.
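To see where the "approximately" comes from, here is a minimal Python sketch of percent rank. It assumes Excel's rule for a value that appears in the data is (number of values strictly below it) / (n - 1); it does not reproduce Excel's interpolation for values missing from the data or its digit-truncation argument. The function name `percent_rank` and the ages are hypothetical.

```python
def percent_rank(data, value):
    """Hypothetical helper approximating Excel's PERCENTRANK for a value in `data`.

    Assumption: returns (count of values strictly below `value`) / (n - 1),
    which is Excel's rule as we understand it for values present in the data.
    """
    if value not in data:
        raise ValueError("this sketch only handles values that appear in the data")
    below = sum(1 for x in data if x < value)
    return below / (len(data) - 1)

# Hypothetical ages, for illustration only
ages = [50, 52, 55, 57, 61, 64, 69]
print(round(percent_rank(ages, 55), 3))  # 2 values below 55, n - 1 = 6
```

Dividing by n - 1 rather than n means the smallest value gets 0 and the largest gets 1, which is why the result is only an approximate "percent below".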

g. Find the percentage of presidents who were older than George H. W. Bush (that is the elder one) when inaugurated.

h. What is the percent rank of the median?

i. Recently we have had two of the oldest presidents (Ronald Reagan, the oldest in history, and George H. W. Bush), but we have also had two of the youngest (John F. Kennedy and Bill Clinton).  Using this data, investigate whether presidents inaugurated since 1950 are, on average, older or younger than the presidents inaugurated before 1950.  Briefly explain your methodology.
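One possible methodology, sketched in Python: split the presidents into two groups by inauguration year and compare the group means (and group sizes). This assumes you have, or add, an inauguration-year column next to the ages; the (year, age) pairs below are made-up placeholders, not real presidents.

```python
from statistics import mean

# Hypothetical (inauguration year, age) pairs, for illustration only
inaugurations = [(1901, 50), (1925, 55), (1945, 60), (1955, 48), (1975, 58), (1995, 62)]

# Split the data at 1950, then compare the two group means
before = [age for year, age in inaugurations if year < 1950]
since = [age for year, age in inaugurations if year >= 1950]

print(f"before 1950: mean {mean(before):.1f} (n={len(before)})")
print(f"since 1950:  mean {mean(since):.1f} (n={len(since)})")
```

Reporting the group sizes alongside the means matters: the post-1950 group is much smaller, so a single very old or very young president moves its mean more.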

2.

a. Can it happen in a dataset that almost every data point is above the average?  Explain why or why not.  If it can, make up an example.

b. Can it happen in a dataset that almost every data point is above the median?  Explain why or why not.  If it can, make up an example.
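If you want to test your own made-up examples for a. and b., this small Python helper counts how many values lie strictly above the mean and strictly above the median. The function name `count_above` and the sample list are hypothetical placeholders; try your own lists in place of the symmetric one shown.

```python
from statistics import mean, median

def count_above(data):
    """Hypothetical helper: (# values above the mean, # values above the median)."""
    m, md = mean(data), median(data)
    above_mean = sum(1 for x in data if x > m)
    above_median = sum(1 for x in data if x > md)
    return above_mean, above_median

# Placeholder dataset; replace it with your own examples
print(count_above([1, 2, 3, 4, 5]))  # → (2, 2)
```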

3. From the QRC site, download and open the file ChicagoBulls1996-97.xls which contains the salaries of the Chicago Bulls players at the start of the 1996-97 season. 

a. Calculate the mean and median salary and include them in your Word document.

b. Suppose Michael Jordan had been paid 60 million dollars instead of 30 million.  What would the mean have been in that situation?  What would the median have been in that situation?  (If you used Excel for part a, all you have to do is type 60000000 in place of 30140000 and everything will update automatically.)

c. Suppose Michael Jordan had been paid 500 million dollars instead of 30 million.  What would the mean have been in that situation?  What would the median have been in that situation?

Because of the property demonstrated in b and c, the median is called a resistant measure: it is not very sensitive to extreme values (outliers).  For data with outliers or a lopsided distribution, the median is generally a more realistic measure of the center of a dataset, but it is not always the most useful one.  If the distribution of the data is relatively symmetric, the mean and the median will be close to each other.
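The resistance property can be seen in a few lines of Python that mirror parts b and c. Only Jordan's 30,140,000 figure comes from the handout; the other salaries are made-up placeholders, not the real 1996-97 Bulls payroll, and `mean_and_median` is a hypothetical helper name.

```python
from statistics import mean, median

# Made-up placeholder salaries for the rest of the roster (NOT the real data)
other_salaries = [4_000_000, 3_000_000, 2_500_000, 1_000_000, 500_000]

def mean_and_median(jordan_salary):
    """Mean and median of the payroll with Jordan's salary set to jordan_salary."""
    payroll = [jordan_salary] + other_salaries
    return mean(payroll), median(payroll)

for jordan in (30_140_000, 60_000_000, 500_000_000):
    m, md = mean_and_median(jordan)
    # The mean grows with the top salary; the median depends only on the middle values
    print(f"Jordan at {jordan:,}: mean {m:,.0f}, median {md:,.0f}")
```

Raising the top salary drags the mean upward by one sixth of the increase each time, while the median stays put because the two middle salaries never change.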

[Image: Old Faithful geyser, Yellowstone National Park]

Copied from .

4. Download and open the file OldFaithful.xls, which contains data on the Old Faithful geyser in Yellowstone National Park (pictured above).  When this data was collected, the geyser erupted roughly once an hour with some consistency, hence its name.  (It is now erupting about every 1.5 hours.) The file contains data on the length of each eruption and the interval between eruptions.

a. What is the mean interval between eruptions?

b. Give the five number summary (min, first quartile, median, third quartile, max) for the interval between eruptions.
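As a cross-check on Excel's MIN, QUARTILE, MEDIAN and MAX, here is a minimal Python sketch of the five-number summary. It uses `statistics.quantiles` with `method="inclusive"`, which, as far as we know, applies the same linear interpolation as Excel's QUARTILE; other software may define quartiles slightly differently. The interval values are hypothetical placeholders.

```python
from statistics import quantiles

def five_number_summary(data):
    """Hypothetical helper returning (min, Q1, median, Q3, max)."""
    # "inclusive" treats the min and max as the 0th and 100th percentiles
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

# Hypothetical intervals between eruptions (minutes), for illustration only
intervals = [51, 55, 60, 74, 78, 80, 82, 85]
print(five_number_summary(intervals))  # → (51, 58.75, 76.0, 80.5, 85)
```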

c. Make a frequency distribution of the interval data by using Excel's Histogram tool.  Here is how: go to Tools -> Data Analysis -> Histogram and click OK. (If Data Analysis does not appear on this menu, go to Tools -> Add-Ins, select Analysis ToolPak, and click OK. When you go back to Tools, Data Analysis should now be showing.) (If this doesn't work on your machine, use the Chart Wizard to create a column or bar chart instead.) You will get a window that looks like this:

[Screenshot: Histogram dialog box]

Fill it in so that it looks like this:

[Screenshot: Histogram dialog box, filled in]

Then click OK. Fix the graph up a bit (delete the legend, add a title), then paste it into your Word document.
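To understand the frequency table the Histogram tool produces, here is a minimal Python sketch of the tallying. It assumes Excel's convention that each bin value is an inclusive upper limit (a value x goes in the first bin b with x <= b) and that anything above the last bin lands in a "More" row. The function name `frequency_table`, the intervals and the bins are all hypothetical.

```python
from bisect import bisect_left

def frequency_table(data, bins):
    """Hypothetical helper: count values per bin, bins being inclusive upper limits."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)  # one extra slot for the "More" row
    for x in data:
        # bisect_left finds the first bin >= x, i.e. the bin whose upper limit holds x
        counts[bisect_left(bins, x)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))

# Hypothetical intervals (minutes) and bins, for illustration only
intervals = [51, 55, 60, 74, 78, 80, 82, 85, 112]
for label, count in frequency_table(intervals, [50, 60, 70, 80, 90, 100, 110]):
    print(f"{label:>5}  {'*' * count}")  # a crude text histogram
```

Note that a value equal to a bin limit (60 or 80 here) is counted in that bin, not the next one.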

d. Describe the distribution of the intervals between eruptions.

e. Do the standard measures (means, medians, standard deviations, quartiles) adequately describe this data?

f. What advice would you give visitors to Yellowstone National Park about Old Faithful based on the data you looked at here?

5. The histogram you created in the previous step can be improved upon.

a. Go back into the Old Faithful spreadsheet, re-select Tools -> Data Analysis -> Histogram, and click OK. Enter the proper data range in the Input Range box. For now, leave the Bin Range box blank. Make sure the Chart Output box is checked, and click OK. To make your chart really look like a histogram, double-click one of the bars on the chart, go to the Options tab, and set the gap width to 0. While you are there, delete the “Frequency” legend.

b. To get control of the bins, you have to set them up in a column. In column D of your original sheet, type 40 in cell D8, 45 in cell D9, 50 in cell D10, and so on up to 110. (There is an easy way of doing this if you use a formula.) Repeat the process of creating the histogram: enter the proper data range, check the Chart Output box, and this time enter the cell range of your 40–110 values in the Bin Range box (it should be D8:D22). Make the histogram, set the gap width to 0, and delete the legend. Paste this copy of the histogram into your Word document.
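The bin column in b. is just an arithmetic sequence, which explains why it should end at D22. This Python sketch builds the 40, 45, ..., 110 values and pairs each with its cell; one way to get the same column in Excel, offered as a suggestion rather than the handout's intended trick, is to enter =D8+5 in D9 and fill it down.

```python
# The bin values 40, 45, ..., 110 (range stops before 111, so 110 is included)
bins = list(range(40, 111, 5))

# Pair each bin value with its cell, starting at D8
cells = [f"D{row}" for row in range(8, 8 + len(bins))]
for cell, value in zip(cells, bins):
    print(cell, value)

print(f"Bin Range: {cells[0]}:{cells[-1]}")  # → Bin Range: D8:D22
```

There are (110 - 40) / 5 + 1 = 15 values, so starting in row 8 they fill rows 8 through 22.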
