Group Projects on Estimation and Hypothesis Testing



Projects on Estimation and Hypothesis Testing

General Guidelines

This assignment is due Wednesday, December 7th at the beginning of class. Papers that are late or not stapled will not be accepted. Leave time for unforeseen emergencies.

Standard Operating Procedures (SOP) for Excel Printouts

1. Turn gridlines on and turn labels on. Your resulting printout should then be gridded, and each row and column should appear with a numerical or letter label. You can find these options on the File/Page Setup menu by clicking on the Sheet tab that appears.

2. Adjust the column widths of your work so that the columns are wide enough to contain the data, yet fit well on the page. When printing, look at both landscape and portrait orientation for your printout, and choose the one that fits best. Your problem must fit on a single page. You can make your data fit on one page by checking the “fit to 1 page wide by 1 page high” box. If you do this, make sure that you haven’t made so many columns (or rows) that the printing is too tiny to read!

3. Print two copies of your sheet: one with the numeric values, and one with the formulae that generated them. You can switch between these views by using the control-~ key combination. Note that you’ll probably need to resize your column widths when you display formulas. Failure to do so might cut some information off.

4. Clearly label the meanings of the data that you enter. If the population mean for the number of rabbits caught is 35 per week, indicate this (e.g., “population mean”, “mean number caught (pop)”, etc.). Make sure that it’s clear where your answer is. Although my templates are more detailed than you will need for this work, they can give you some ideas for organization.

5. Any graphs in your work should be properly identified and labeled. This includes title, axis labels, and (if more than one series is in the graph) a sensible legend.

Project #1: The Name Game

Go to the website . Scroll to the bottom of the page and select the top 1000 baby names for people born in the same decade that you were born. Find your own first name on this list. (If your name is not on the list, you cannot do this particular project.) Read the information given above the data, as it is relevant to your work.

Your tasks are as follows:

1. Define your “cohort” as the set of all Americans of your gender who were born in the same decade as you were. Estimate the fraction of your cohort who share your first name based on the data at the website. (Note that this is an estimate, since not all of the people in your cohort had social security numbers, so not all of your cohort are included in the database.) Build the 99% confidence interval for the fraction of your cohort who share your first name.

2. At the 2% level of significance, determine if there is evidence that your name is less popular in the 1990s cohort than it was in your own cohort.

3. The website includes the top 10 names for all years from 1880 until 2004. Assume that this list is correct for the entire US population for those years. Generate a simple random sample of 25 years (be sure to do so appropriately; the RANDBETWEEN function could be useful here) and note the number of letters in the most popular girl’s name for each of those 25 years. Test the null hypothesis that the average number of letters in the most popular girl’s name during the period 1880 to 2004 is 4.5 letters long. Use a 10% level of significance.

The report should be professional in appearance, done with a word processor and Excel, and should include:

• A cover page including your name, your section, and the name of the project .

• The text of each of the tasks above, along with your answer for that task. This writeup of your answer to the task should include

o all relevant data raw data (such as the years and names selected for task 3).

o a check, where appropriate, of the assumptions needed to apply a statistical method. (You should conduct the statistical analysis even if the assumptions are not met, but should comment on this in your conclusions.)

o your calculations, shown on an Excel spreadsheet, with correct notation used. (You may write “x-bar” for sample mean, “sigma-sub-p” for the standard error of the proportion, and so on.) You may use my spreadsheets in your project whenever the hypothesis test you are conducting involves two populations. You may not use my spreadsheets in your project for confidence intervals or for hypothesis tests involving a single population. Follow the on the first page for submission guidelines for Excel work. Note, in particular, that you will be giving me two copies of your spreadsheet: the numerical view and the formula view.

o your final conclusions, interpreting the results of your spreadsheet. Your conclusions should refer to the particular task at hand. Don’t just tell me, “We reject the null hypothesis.” What was the null, and what does it mean to reject it?

o A clear explanation of how you generated your simple random sample. Note that “making up some random years” is not satisfactory!

It’s up to you whether you print your spreadsheets on a separate sheet or imbed them in your Word document, but it’s probably easier for you to print them on separate sheets.

Project #2: Tornado!

Visit the website . Use the site to gather the data on all tornadoes of intensity F2 or higher that occurred in Oklahoma during the 25 year period from January 1, 1969 to December 31, 2003. (The site should return information on all 548 such tornados, including the number of people killed and injured in each of these storms.) You can move the data from this site to Excel by highlighting all of the data and doing a cut and paste command. While you will be using this data in your analysis below, you need not submit the raw data on all 548 storms with your report.

Your tasks are as follows:

1. Take a simple random sample of 50 storms selected from this population. Use the data from these 50 storms to build a 95% confidence interval for the average number of casualties per storm. (A person is a casualty if he or she is either injured or killed by the storm.) The RANDBETWEEN function in Excel can be useful in choosing this sample; don’t just “make up random numbers”!

2. Use all of the population data (all 548 storms) to determine the actual average number of casualties per storm from this population. Compare the result of this calculation with the confidence interval you constructed, and discuss your findings. If your confidence interval failed, suggest likely reason.

3. Using the same sample of 50 storms obtained above, conduct the appropriate hypothesis test to see if there is evidence that someone was hurt or killed in at least 40% of Oklahoma tornados of intensity F2 or higher during the period from 1969 to 2003. Use a 5% level of significance.

4. Suppose I wished to use the data from the 35 year period 1969 to 2003 to predict the number of casualties from tornadoes of strength F2 or higher in 2005. The results, even with a large sample size, are almost certain to show considerable inaccuracy. Discuss why this is so.

The report should be professional in appearance, done with a word processor and Excel, and should include:

• A cover page including your name, your section, and the name of the project .

• The text of each of the tasks above, along with your answer for that task. This writeup of your answer to the task should include

o all relevant data raw data (such as the information on the 50 storms that constitute your sample). You need not include the population data on all 548 storms.

o a check, where appropriate, of the assumptions needed to apply a statistical method. (You should conduct the statistical analysis even if the assumptions are not met, but should comment on this in your conclusions.)

o your calculations, shown on an Excel spreadsheet, with correct notation used. (You may write “x-bar” for sample mean, “sigma-sub-p” for the standard error of the proportion, and so on.) You may not use my spreadsheets for this work. Follow the on the first page for submission guidelines for Excel work. Note, in particular, that you will be giving me two copies of your spreadsheet: the numerical view and the formula view.

o your final conclusions, interpreting the results of your spreadsheet. Your conclusions should refer to the particular task at hand. Don’t just tell me, “We reject the null hypothesis.” What was the null, and what does it mean to reject it?

o A clear explanation of how you generated your simple random sample.

It’s up to you whether you print your spreadsheets on a separate sheet or imbed them in your Word document, but it’s probably easier for you to print them on separate sheets.

Project #3: The Bible, Chapter and Verse

Get a copy of the Holy Bible, either the Revised Standard Version or the King James Version.

Your tasks are as follows:

1. Obtain a simple random sample of 50 Bible chapters, and note the number of verses in each of these chapters. (You may find Excel’s RANDBETWEEN function to be useful here.)

2. It may be tempting to create the simple random sample by randomly selecting a book of the Bible, then randomly selecting a chapter from that book, and repeating this process 50 times. This does not result in a simple random sample. Explain clearly why not.

3. Use this sample to estimate the average length (in verses) of chapters in the Bible. Your margin of error for this confidence interval should be +3 verses. What is the confidence level for the interval that you have constructed?

4. At the 10% level of significance, do you reject the null hypothesis that the average length of a Bible chapter is 21 verses? Base your conclusion on the 50 chapters in your simple random sample.

The report should be professional in appearance, done with a word processor and Excel, and should include:

• A cover page including your name, your section, and the name of the project .

• The text of each of the tasks above, along with your answer for that task. This writeup of your answer to the task should include

o all relevant data raw data (such as the information on the 50 chapters that constitute your sample).

o a check, where appropriate, of the assumptions needed to apply a statistical method. (You should conduct the statistical analysis even if the assumptions are not met, but should comment on this in your conclusions.)

o your calculations, shown on an Excel spreadsheet, with correct notation used. (You may write “x-bar” for sample mean, “sigma-sub-p” for the standard error of the proportion, and so on.) You may not use my spreadsheets for this work. Follow the on the first page for submission guidelines for Excel work. Note, in particular, that you will be giving me two copies of your spreadsheet: the numerical view and the formula view.

o your final conclusions, interpreting the results of your spreadsheet. Your conclusions should refer to the particular task at hand. Don’t just tell me, “We reject the null hypothesis.” What was the null, and what does it mean to reject it?

o A clear explanation of how you generated your simple random sample. This is crucial for this project.

It’s up to you whether you print your spreadsheets on a separate sheet or imbed them in your Word document, but it’s probably easier for you to print them on separate sheets.

Project #4: Counting Cards

We have a deck of 10 playing cards, numbered 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. (That is, there is one card with each number.) The cards are shuffled, then the cards are dealt face up onto the table, one at a time. When the first card is revealed, the player says “One”. When the second card is revealed, the player says “Two”. The player continues in this way until the last card is revealed and the player says “Ten”. If at any time the number on the card is the same as the number said by the player, the card is called a “match”. Here’s an example:

Cards dealt (in order): 5 1 2 3 8 4 10 6 9 7

This deal would give a match on “9”, since the ninth card revealed was a 9. It would not give a match on any other number. The first card would not be a match, for example, since the number spoken was “One”, while the number on the card was 5.

The file on the website, CARDS.XLS, contains the results for 500 repetitions of this process. Each row of the file contains one shuffle-and-deal. The first column records what repetition it is. The next 10 columns give the order in which the cards are dealt. The last column records the number of observed matches among the 10 cards.

Your tasks are as follows:

1. Expand the file to include one more column. For any given row, column M should contain a “1” if there was at least one match in that row, and a “0” if there was no match in the row. So, for example, the eleventh row of the file contains the deal

8 |2 |5 |6 |1 |3 |9 |7 |4 |10 | |

This deal contains two matches (the numbers 2 and 10), and so column M should contain a “1” (at least one match). You can easily write formulas for computing the values in this column by using the IF statement in Excel. (If you don’t know how to do this, choose Help from the Excel menu bar and ask about “=IF”.)

2. Suppose we define a “win” as a deal that includes at least one match. Using all 500 deals in the file, test the null hypothesis that a deal results in a win at least half of the time. Use the 1% level of significance.

3. For this task only, assume that the standard deviation in the number of matches in a deal is 1. Use this to determine the appropriate sample size if we wish to test the null hypothesis that the average number of matches in a deal is 1. We want to conduct this test at the 5% level of significance with a margin of error of at most 0.1.

4. Using this sample size, conduct the hypothesis test for the null hypothesis that the average number of matches in a deal is 1. For this task do not assume that the population standard deviation is 1. Take your sample from the beginning of the file, using as many rows as you need. That is, if you decide on a sample size of 205, use the first 205 deals appearing in the file as your sample.

The report should be professional in appearance, done with a word processor and Excel, and should include:

• A cover page including your name, your section, and the name of the project .

• The text of each of the tasks above, along with your answer for that task. This writeup of your answer to the task should include

o a check, where appropriate, of the assumptions needed to apply a statistical method. (You should conduct the statistical analysis even if the assumptions are not met, but should comment on this in your conclusions.)

o your calculations, shown on an Excel spreadsheet, with correct notation used. (You may write “x-bar” for sample mean, “sigma-sub-p” for the standard error of the proportion, and so on.) You may not use my spreadsheets for this work. Follow the on the first page for submission guidelines for Excel work. Note, in particular, that you will be giving me two copies of your spreadsheet: the numerical view and the formula view.

o your final conclusions, interpreting the results of your spreadsheet. Your conclusions should refer to the particular task at hand. Don’t just tell me, “We reject the null hypothesis.” What was the null, and what does it mean to reject it?

It’s up to you whether you print your spreadsheets on a separate sheet or imbed them in your Word document, but it’s probably easier for you to print them on separate sheets.

-----------------------

IMPORTANT!!!

This project is to be done by you individually, without assistance from anyone except Dr. Stevens. This includes but is not limited to any discussion of the project with classmates, other students, faculty, and family. While you are allowed to use any textbooks or library books that you wish, any discussion or sharing of your methods, spreadsheets, data, or results with other students constitutes an honor violation and will be dealt with as such. You should know that I have successfully pursued such violations in the past.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download