M316 – Kaplan



Elementary Statistical Methods

Kaplan – Spring 2004

Project Expectations, Assignments and Due Date

The purpose of these projects is for students to become involved in “doing statistics” and to learn to write using statistical language. In completing these assignments, you will generate, organize and analyze data and then use your analysis to draw conclusions.

You and your partner will complete four mini-projects during the semester. While both students will be responsible for the collection, organization, analysis and conclusions of each of the four projects, each student will be responsible for writing 2 of the 4 projects. You will earn points for the statistics work (which will be the same for both students) and points for the writing of the 2 projects that you wrote.

In your written report, you should explain your project’s objectives, how you obtained your data, the inferences you draw from your data and any reservations you have about your conclusions. The report should be fair, honest and interesting – the kind of report you would find informative and enjoy reading.

Any relevant data that you use should be included as an appendix to your report. This appendix may be handwritten as long as it is clear, clean and readable. Data that are used in the body of your report should be presented clearly and effectively in tables or graphs.

The body of your report must be typed and should use clear, concise, persuasive prose. It should be long enough to make the points you are trying to make, but not so long that the reader becomes bored. Not counting appendices, tables and graphs, I would be surprised if your report has fewer than 3 or more than 5 pages.

The four categories for the projects are: displays, regression, confidence intervals, and hypothesis testing. Below are due dates and suggested topics from which you may choose. If you wish to investigate another topic in a category, you must propose your topic in writing to the instructor at least two weeks before the project is due.

Project Suggestions:

1. Creating displays – Due Friday, February 6

a. Go to a local grocery store and collect data for at least 60 breakfast cereals: cereal name, grams of sugar per serving and shelf location (bottom, middle, top). Group the data by shelf location and use boxplots to compare the sugar content by shelf location.

b. Go to a local grocery store or drug store and collect data for at least 60 shampoos: shampoo name, price per ounce and shelf location (bottom, middle, top). Group the data by shelf location and use boxplots to compare the price per ounce by shelf location.

c. Use a computer to simulate 1,000 flips of a fair coin. Record the fraction of the flips that were heads after 10, 100 and 1,000 flips. Repeat this experiment 100 times and then use three histograms (one for each number of flips) to summarize your results.
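The handout does not name any software for suggestion 1c. As one possible starting point, the simulation could be sketched in Python as follows. The function names, the seed and the use of Python's standard `random` module are assumptions made for illustration, not course requirements.

```python
import random

def heads_fractions(checkpoints=(10, 100, 1000), rng=None):
    """Flip a fair coin max(checkpoints) times and return the
    fraction of heads observed at each checkpoint."""
    rng = rng or random.Random()
    heads = 0
    fractions = {}
    for flip in range(1, max(checkpoints) + 1):
        heads += rng.random() < 0.5   # True counts as 1 head
        if flip in checkpoints:
            fractions[flip] = heads / flip
    return fractions

def run_experiment(repeats=100, checkpoints=(10, 100, 1000), seed=2004):
    """Repeat the coin-flip experiment and collect, for each checkpoint,
    the list of heads fractions (one entry per repetition)."""
    rng = random.Random(seed)         # fixed seed (arbitrary) for repeatability
    results = {n: [] for n in checkpoints}
    for _ in range(repeats):
        for n, fraction in heads_fractions(checkpoints, rng).items():
            results[n].append(fraction)
    return results

results = run_experiment()
# results[10], results[100] and results[1000] each hold 100 fractions,
# one list per histogram.
```

Each of the three lists can then be passed to whatever graphing tool you prefer to draw the three histograms.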

2. Simple regression – Due Friday, February 27

a. For each of the 50 states, calculate Bill Clinton’s percentage of the total votes cast for the Democratic and Republican presidential candidates in 1992. Do not include votes for other (independent, third party) candidates. Do the same for the 1996 election. Is there a statistical relationship between these two sets of data? Are there any apparent outliers or anomalies?

b. Select an automobile model and year (at least three years old) that is of interest to you -- for example, a 1993 Saab 900S convertible. Now find at least 30 of those cars that are for sale (either from dealers or private owners) and record the odometer mileage (x) and the asking price (y). As best you can, keep the cars as similar as possible. For example, ignore color, but do not mix together 4-cylinder and 6-cylinder cars or manual and automatic transmissions. Estimate the equation y = a + bx + e and summarize your results.

c. Pick a date and approximate time of day (for example 10:00 am on April 1) for scheduling non-stop flights from an airport near you to at least a dozen large U.S. cities. Determine the cost of a coach seat on each of these flights and the distance covered by each flight. Use your data to estimate a simple linear regression model with ticket cost as the dependent variable and distance as the explanatory variable. Are there any outliers?
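For the regression suggestions, the least-squares estimates of a and b in y = a + bx + e can be computed by hand or by machine. The Python sketch below uses the usual formulas (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The function names are hypothetical and the sample numbers are invented purely to show the calling pattern; they are not real car listings.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residuals(x, y, a, b):
    """Observed y minus fitted y; unusually large values flag possible outliers."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Invented numbers for illustration only
# (mileage in thousands of miles, asking price in dollars).
mileage = [40, 55, 62, 78, 90, 105]
price = [9500, 8800, 8400, 7300, 6900, 5800]

a, b = fit_line(mileage, price)
errors = residuals(mileage, price, a, b)
```

A plot of the residuals against x is one simple way to look for the outliers and anomalies that the suggestions ask about.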

3. Confidence Intervals – Due Wednesday, April 7

a. Estimate the average number of hours that students at this school sleep each day, including both nighttime sleep and daytime naps. Also estimate the percentage who have been up all night without sleeping at least once during the current semester.

b. Estimate and compare the average words per sentence in People, Time and New Republic magazines.

c. Estimate the percentage of the seniors at this college who regularly read a daily newspaper, the percentage who can name the two U.S. senators from their home state, the percentage who are registered to vote and the percentage who would certainly vote if a presidential election were held today.

d. Estimate the percentage of female seniors at this University who expect to be married within five years of graduation and the percentage who expect to have children within five years of graduation. Estimate the average number of biological children that female seniors at this University expect to have in their lifetime.
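The estimates in this category are normally reported with a margin of error. As a rough sketch only, the Python functions below compute large-sample (normal approximation) confidence intervals for a percentage and for an average. The function names are hypothetical, and the method taught in the course text may differ, for example by using t critical values for means from small samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def mean_ci_large_sample(values, confidence=0.95):
    """Large-sample interval: sample mean +/- z * s / sqrt(n).
    Assumption: n is large enough that z is an acceptable stand-in for t."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(values) / sqrt(len(values))
    centre = mean(values)
    return centre - margin, centre + margin

# Example call: 50 "yes" answers out of 100 people surveyed.
low, high = proportion_ci(50, 100)
```

Both intervals assume a random sample, so how the respondents (or sentences) were chosen deserves as much attention in the report as the arithmetic.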

4. Hypothesis testing – Due Friday, April 30

a. Conduct a taste test of either Coke versus Pepsi or Diet Coke versus Diet Pepsi. Survey at least 50 randomly selected students who identify themselves beforehand as cola drinkers with a definite preference for one of the brands of cola you are testing. Give each student a cup of each cola, coded in a way known only to you. Calculate the fraction of your sample whose choice in the taste test matches the brand identified beforehand as their favorite. (Do not tell your subjects that this is a test of their ability to identify their favorite brand; tell them it is a test of which tastes better.) Determine the two-sided p-value for a test of the null hypothesis that there is a 0.5 probability that a cola drinker will choose his or her favorite brand.

b. Find five avid basketball players and ask each of them to shoot 100 free throws. Do not tell them the purpose of this exercise, which is to determine if a missed free throw is equally likely to bounce to the same or opposite side as their shooting hand. Use your data for each of these players to calculate the two-sided p-value for testing the null hypothesis that a missed free throw by this player is equally likely to bounce to either side.

c. Ask 50 female students these four questions: Among female students at this college, is your height above average or below average? Is your weight above average or below average? Is your intelligence above average or below average? Is your physical attractiveness above average or below average? Ask 50 male students the same questions (in comparison to male students at this college). Try to design a survey procedure that will ensure candid answers. For each gender and each question, test the null hypothesis that p = 0.5.

d. College students are said to experience the Frosh 15 – an average weight gain of 15 pounds during their first year at college. Test this folklore by asking at least 100 randomly selected students how much weight they gained or lost during their first year in college. Determine the two-sided p-value for testing the null hypothesis that the population mean is a 15-pound gain, and also determine a 95 percent confidence interval for the population mean.
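Suggestions a, b and c each test a null hypothesis of p = 0.5. One way to obtain a two-sided p-value for such a test is the exact binomial calculation sketched below in Python: it adds up the probabilities of every outcome at least as far from n/2 as the observed count. The function name is hypothetical, and the course may instead expect a normal-approximation z test. Suggestion d concerns a mean and needs a t procedure, which this sketch does not cover.

```python
from math import comb

def binomial_two_sided_p(successes, n):
    """Exact two-sided p-value for H0: p = 0.5, given `successes` out of n trials.

    Under H0 the binomial distribution is symmetric about n/2, so the
    p-value is the total probability of all counts at least as far
    from n/2 as the observed count."""
    distance = abs(successes - n / 2)
    extreme = sum(comb(n, k) for k in range(n + 1)
                  if abs(k - n / 2) >= distance)
    return extreme / 2 ** n

# Example call: 8 of 10 tasters pick their stated favorite.
p_value = binomial_two_sided_p(8, 10)
```

A small p-value is evidence against the claim that the outcome is a 50-50 proposition; it does not by itself say how far from 0.5 the true probability is.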

Grading Rubric for Project 1

Statistics: 40 points divided into 4 parts of 10 points each. The following outline gives general guidelines for the assignment of partial credit.

A. Generating Data

a. 5 points – it is clear that data were collected, but the reader had questions about how the data were collected and/or what data were collected

b. 8 points – data were collected in an organized manner, reasonable for the task, and it is clear to the reader both what data were collected and how they were collected.

c. 10 points – data were collected in a manner that employed some type of randomization

NOTE: In all subsequent projects, it is expected that the ideas about generating data contained in chapters 7 and 8 of the text will be used and discussed for 8-point credit.

B. Organizing the Data

a. 5 points – A graph or table of the data exists

b. 8 points – The graph allows the reader to easily compare the three (or more) distributions.

c. 10 points – The graphs or tables show an additional piece of information helpful to understanding the data.

C. Analysis

a. 2 points each for correct discussion of shape, center and spread

b. 2 points for using the appropriate set of measures

c. 2 points for discussing correctly the rationale behind the choice of summary statistics

D. Conclusions

a. 3 points for statement of hypotheses

b. 4 points for a list of the statistics used in the comparisons

c. 3 points for comparative writing that incorporated the data and/or summary statistics

Writing (20 points total): In general, 2 points were deducted for each “issue.”

• Data is a plural noun.

• There is a difference between the assignment and the objective. In this case, the objective was to compare the distribution of three or more sets of data.

• Integrate graphics and tables in writing; put raw data in appendix.

• Please use the spellchecker and proofread.

• One goal of the writing assignments is to learn to integrate data into the narrative and to write data-driven conclusions.

• Do not use binders that make it difficult for me to write on the pages or keep the report together.

• Be clear about who should get credit for the writing points.
