


Chapter 11 ATE: Inference for Distributions of Categorical Data

Alternate Examples and Activities

[Page 677]

Alternate Activity: A fair die?

Before beginning this activity, have students make a 6-sided die out of paper, clay, wood, or any other material. Have each student roll their die 60 times and record how often each number shows up on top. Do the results give convincing evidence that the die isn’t fair? Finish the activity by modifying the instructions for “The candy man can” Activity in the student text, using $60(1/6) = 10$ as the expected count for each possible outcome.

[Page 679]

Alternate Example: A fair die?

Jenny made a six-sided die in her ceramics class and rolled it 60 times to test if each side was equally likely to show up on top.

Problem: Assuming that her die is fair, calculate the expected count for each outcome.

Solution: If the die is fair, each of the six sides has a $1/6$ probability of ending up on top. This means each expected count is $60(1/6) = 10$.

[Page 680]

Alternate Example: A fair die?

Here are the results of Jenny’s 60 rolls of her ceramic die and the expected counts.

|Outcome |Observed |Expected |

|1 |13 |10 |

|2 |11 |10 |

|3 |6 |10 |

|4 |12 |10 |

|5 |10 |10 |

|6 |8 |10 |

|Total |60 |60 |

Problem: Calculate the value of the chi-square statistic.

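Solution: $\chi^2 = \frac{(13-10)^2}{10} + \frac{(11-10)^2}{10} + \frac{(6-10)^2}{10} + \frac{(12-10)^2}{10} + \frac{(10-10)^2}{10} + \frac{(8-10)^2}{10} = 0.9 + 0.1 + 1.6 + 0.4 + 0 + 0.4 = 3.4$.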
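The arithmetic in this solution can be checked with a short script. This is a minimal Python sketch; the function name `chi_square_statistic` is an illustrative choice and does not come from the student text.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Jenny's 60 rolls (outcomes 1 through 6) and the expected counts for a fair die.
observed = [13, 11, 6, 12, 10, 8]
expected = [10] * 6

# One component per outcome, so students can see which outcomes contribute most.
components = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = chi_square_statistic(observed, expected)

print(components)
print(round(statistic, 2))
```

Printing the individual components makes it easy to see that outcome 3 (only 6 rolls) contributes the most to the statistic.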

[Page 683]

Alternate Example: A fair die?

When Jenny rolled her ceramic die 60 times and calculated the chi-square statistic, she got $\chi^2$ = 3.4.

Problem: Using the appropriate degrees of freedom, calculate the P-value. What conclusion can you make about Jenny’s die?

Solution: Since there are six possible outcomes when rolling her die, the degrees of freedom = 6 – 1 = 5. Using Table C, the P-value is greater than 0.25 since the $\chi^2$ statistic is smaller than the lowest critical value in the df = 5 row. Using technology, $\chi^2$cdf(3.4, 1000, 5) = 0.64. Since the P-value is quite large, we do not have convincing evidence that her die is unfair. However, this doesn’t prove that her die is fair.
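For classes without a graphing calculator, the upper-tail area can also be computed directly. The sketch below is one possible approach using only Python's standard library; the function name `chi_square_upper_tail` is hypothetical, and the method (starting from the closed forms for df = 1 and df = 2 and stepping up two degrees of freedom at a time) is a choice made here, not the method the calculator uses.

```python
import math

def chi_square_upper_tail(x, df):
    """P(X >= x) for a chi-square random variable with integer df >= 1.

    Uses the closed forms for df = 1 and df = 2, then the recurrence
    Q(df + 2) = Q(df) + (x/2)^(df/2) * exp(-x/2) / Gamma(df/2 + 1).
    """
    half = x / 2
    if df % 2 == 1:
        tail, current = math.erfc(math.sqrt(half)), 1
    else:
        tail, current = math.exp(-half), 2
    while current < df:
        tail += half ** (current / 2) * math.exp(-half) / math.gamma(current / 2 + 1)
        current += 2
    return tail

# Jenny's die: chi-square statistic 3.4 with 6 - 1 = 5 degrees of freedom.
p_value = chi_square_upper_tail(3.4, 5)
print(round(p_value, 2))
```

The calculator command integrates from 3.4 up to a large upper bound (1000), which gives essentially the same area as the full upper tail computed here.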

[Page 686]

Alternate Example: Landline surveys

According to the 2000 Census, of all US residents age 20 and older, 19.1% are in their 20’s, 21.5% are in their 30’s, 21.1% are in their 40’s, 15.5% are in their 50’s, and 22.8% are 60 and older. The table below shows the age distribution for a sample of US residents age 20 and older. Members of the sample were chosen by randomly dialing landline telephone numbers.

|Category |Count |

|20-29 |141 |

|30-39 |186 |

|40-49 |224 |

|50-59 |211 |

|60+ |286 |

|Total |1048 |

Do these data provide convincing evidence that the age distribution of people who answer landline telephone surveys is not the same as the age distribution of all US residents?

State: We want to perform a test of the following hypotheses using $\alpha$ = 0.05:

$H_0$: The age distribution of people who answer landline telephone surveys is the same as the age distribution of all US residents.

$H_a$: The age distribution of people who answer landline telephone surveys is not the same as the age distribution of all US residents.

Plan: If conditions are met, we will perform a chi-square goodness-of-fit test.

• Random The data came from a random sample of US residents who answer landline telephone surveys.

• Large Sample Size The expected counts are 1048(0.191) = 200.2, 1048(0.215) = 225.3, 1048(0.211) = 221.1, 1048(0.155) = 162.4, 1048(0.228) = 238.9. All expected counts are at least 5.

• Independent Because we are sampling without replacement, there must be at least 10(1048) = 10,480 U.S. residents who answer landline telephone surveys. This is reasonable to assume.

Do:

• Test Statistic $\chi^2 = \frac{(141-200.2)^2}{200.2} + \frac{(186-225.3)^2}{225.3} + \frac{(224-221.1)^2}{221.1} + \frac{(211-162.4)^2}{162.4} + \frac{(286-238.9)^2}{238.9} = 48.2$

• P-value Using 5 – 1 = 4 degrees of freedom, P-value = $\chi^2$cdf(48.2, 1000, 4) $\approx$ 0.

Conclude: Because the P-value is less than $\alpha$ = 0.05, we reject $H_0$. We have convincing evidence that the age distribution of people who answer landline telephone surveys is not the same as the age distribution of all US residents.
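The whole goodness-of-fit calculation for this example can be sketched in a few lines of Python. The variable names are illustrative, and the P-value line relies on the fact that, for 4 degrees of freedom specifically, the upper-tail area has the closed form $e^{-x/2}(1 + x/2)$; a different number of categories would need a different formula or a statistics library.

```python
import math

# Claimed age distribution (2000 Census) and the sample counts from the table.
census_proportions = [0.191, 0.215, 0.211, 0.155, 0.228]
observed = [141, 186, 224, 211, 286]
n = sum(observed)  # 1048

# Expected count for each age group: sample size times the claimed proportion.
expected = [n * p for p in census_proportions]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # 5 - 1 = 4

# Upper-tail area for df = 4 only: exp(-x/2) * (1 + x/2).
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

print([round(e, 1) for e in expected])
print(round(statistic, 1), df, p_value)
```

Because the script keeps the expected counts unrounded, the statistic may differ slightly in later decimal places from a hand calculation that uses the rounded counts.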

[Page 688]

Alternate Example: Birthdays and hockey

In his book Outliers, Malcolm Gladwell suggests that a hockey player’s birth month has a big influence on his chance to make it to the highest levels of the game. Specifically, since January 1 is the cut-off date for youth leagues in Canada (where many NHL players come from), players born in January will be competing against players up to 12 months younger. The older players tend to be bigger, stronger, and more coordinated and hence get more playing time, more coaching, and have a better chance of being successful. To see if this is true, a random sample of 80 National Hockey League players from the 2009-2010 season was selected and their birthdays were recorded. Overall, 32 were born in the first quarter of the year, 20 in the second quarter, 16 in the third quarter, and 12 in the fourth quarter. Do these data provide convincing evidence that the birthdays of NHL players are not uniformly distributed throughout the year?

State: We want to perform a test of the following hypotheses using $\alpha$ = 0.05:

$H_0$: The birthdays of NHL hockey players are equally likely to occur in each quarter of the year.

$H_a$: The birthdays of NHL hockey players are not equally likely to occur in each quarter of the year.

Plan: If conditions are met, we will perform a chi-square goodness-of-fit test.

• Random The data came from a random sample of NHL players.

• Large Sample Size If birthdays are equally likely to be in each quarter of the year, then the expected counts are all $80(1/4) = 20$. These counts are all at least 5.

• Independent Because we are sampling without replacement, there must be at least 10(80) = 800 NHL hockey players. In the 2009-2010 season, there were 879 NHL players, so this condition is met.

Do:

• Test Statistic $\chi^2 = \frac{(32-20)^2}{20} + \frac{(20-20)^2}{20} + \frac{(16-20)^2}{20} + \frac{(12-20)^2}{20} = 7.2 + 0 + 0.8 + 3.2 = 11.2$

• P-value Using 4 – 1 = 3 degrees of freedom, P-value = $\chi^2$cdf(11.2, 1000, 3) = 0.011.

Conclude: Because the P-value is less than $\alpha$ = 0.05, we reject $H_0$. We have convincing evidence that the birthdays of NHL players are not uniformly distributed throughout the year.

[Page 690]

Alternate Example: Birthdays and hockey

In the previous Alternate Example, we concluded that the birthdays of NHL players were not uniformly distributed throughout the year. However, Gladwell’s claim wasn’t just that the distribution wasn’t uniform—he specifically claimed that NHL players are more likely to be born early in the year. Comparing the observed and expected counts, it seems like he was correct. There were 12 more players born in the first quarter than expected, while there were 4 fewer than expected in the third quarter and 8 fewer than expected in the fourth quarter.
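A follow-up analysis like this one amounts to listing observed minus expected for each category, along with each category's contribution to the chi-square statistic. The following Python sketch shows one way to tabulate it; the variable names are illustrative.

```python
quarters = ["Q1", "Q2", "Q3", "Q4"]
observed = [32, 20, 16, 12]

# Under a uniform distribution, each quarter expects 80 / 4 = 20 players.
expected = [sum(observed) / len(observed)] * len(observed)

# Signed difference shows the direction; the component shows the size.
differences = [o - e for o, e in zip(observed, expected)]
components = [d ** 2 / e for d, e in zip(differences, expected)]

for quarter, d, c in zip(quarters, differences, components):
    print(quarter, d, round(c, 1))
```

The first quarter has both the largest positive difference and the largest component, which matches the direction of Gladwell's claim.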

[Page 696]

Alternate Example: Saint-John’s-wort and depression

An article in the Journal of the American Medical Association (April 10, 2002, vol. 287, no. 14) reports the results of a study designed to see if the herb Saint-John's-wort is effective in treating moderately severe cases of depression. The study involved 338 subjects who were being treated for major depression. The subjects were randomly assigned to receive one of three treatments: St. John's wort (an herb), Zoloft (a prescription drug), or a placebo for an 8-week period. The table below summarizes the results of the experiment.

| |St. John’s Wort |Zoloft |Placebo |Total |

|Full Response |27 |27 |37 |91 |

|Partial Response |16 |26 |13 |55 |

|No Response |70 |56 |66 |192 |

|Total |113 |109 |116 |338 |

Problem:

(a) Calculate the conditional distribution (in proportions) of the type of response for each treatment.

(b) Make an appropriate graph for comparing the conditional distributions in part (a).

(c) Compare the distributions of response for each treatment.

Solution:

(a) For the Saint-John’s-wort treatment:

Full: 27/113 = 0.239 Partial: 16/113 = 0.142 No: 70/113 = 0.619

For the Zoloft treatment:

Full: 27/109 = 0.248 Partial: 26/109 = 0.239 No: 56/109 = 0.514

For the Placebo treatment:

Full: 37/116 = 0.319 Partial: 13/116 = 0.112 No: 66/116 = 0.569

(b)

[Figure: graph comparing the conditional distributions of response type for the three treatments]

(c) Surprisingly, a higher proportion of subjects receiving the placebo had a full response compared to subjects receiving Saint-John’s-wort or Zoloft. Overall, a higher proportion of Zoloft users had at least some response, followed by placebo users and then Saint-John’s-wort users.
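The conditional distributions in part (a) and the "at least some response" comparison in part (c) can be reproduced with a short Python sketch. The dictionary layout and variable names are choices made for this illustration.

```python
# Observed counts from the two-way table: response type within each treatment.
counts = {
    "St. John's wort": {"Full": 27, "Partial": 16, "No": 70},
    "Zoloft": {"Full": 27, "Partial": 26, "No": 56},
    "Placebo": {"Full": 37, "Partial": 13, "No": 66},
}

# Divide each count by its treatment (column) total.
conditional = {}
for treatment, responses in counts.items():
    total = sum(responses.values())
    conditional[treatment] = {r: c / total for r, c in responses.items()}

for treatment, dist in conditional.items():
    some_response = dist["Full"] + dist["Partial"]
    print(treatment, {r: round(p, 3) for r, p in dist.items()}, round(some_response, 3))
```

Ordering the treatments by the last printed value gives Zoloft, then placebo, then Saint-John's-wort, in agreement with part (c).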

[Page 700]

Alternate Example: Saint-John’s-wort and depression

Here is a summary of the results of the experiment comparing the effects of St. John’s wort, Zoloft, and a placebo.

| |Saint-John’s-wort |Zoloft |Placebo |Total |

|Full Response |27 |27 |37 |91 |

|Partial Response |16 |26 |13 |55 |

|No Response |70 |56 |66 |192 |

|Total |113 |109 |116 |338 |

Problem: Calculate the expected counts for the three treatments, assuming that all three treatments are equally effective.

Solution: Since 91/338 = 26.9% of all patients had a full response, we expect 26.9% of patients in each treatment group to have a full response.

St. John’s wort: $\frac{91}{338}(113) = 30.4$

Zoloft: $\frac{91}{338}(109) = 29.3$

Placebo: $\frac{91}{338}(116) = 31.2$

Similarly we expect 55/338 = 16.3% of patients in each treatment group to have a partial response. This gives expected counts of St. John’s wort: 18.4, Zoloft: 17.7, and Placebo: 18.9.

Finally, we expect 192/338 = 56.8% of patients in each treatment group to have no response. This gives expected counts of St. John’s wort: 64.2, Zoloft: 61.9, and Placebo: 65.9.
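All nine expected counts follow the same rule: row total times column total, divided by the table total. A minimal Python sketch of that rule, with illustrative variable names:

```python
# Rows: Full, Partial, No response. Columns: St. John's wort, Zoloft, Placebo.
observed = [
    [27, 27, 37],
    [16, 26, 13],
    [70, 56, 66],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count = (row total)(column total) / (table total).
expected = [[r * c / grand_total for c in column_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])
```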

[Page 702]

Alternate Example: Saint-John’s-wort and depression

Here is a summary of the results of the experiment comparing the effects of St. John’s wort, Zoloft, and a placebo. Expected counts are listed in parentheses below the observed counts.

| |St. John’s Wort |Zoloft |Placebo |Total |

|Full Response |27 |27 |37 |91 |

| |(30.4) |(29.3) |(31.2) | |

|Partial Response |16 |26 |13 |55 |

| |(18.4) |(17.7) |(18.9) | |

|No Response |70 |56 |66 |192 |

| |(64.2) |(61.9) |(65.9) | |

|Total |113 |109 |116 |338 |

Problem: Calculate the chi-square statistic. Show work.

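Solution: $\chi^2 = \frac{(27-30.4)^2}{30.4} + \frac{(27-29.3)^2}{29.3} + \cdots + \frac{(66-65.9)^2}{65.9} = 8.72$ (computed from unrounded expected counts)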
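Students who add up the nine terms by hand with the rounded expected counts from the table will get a slightly different answer than 8.72. The Python sketch below compares the two calculations; it suggests that the value 8.72 comes from carrying the expected counts unrounded, as technology does. The function names are illustrative.

```python
observed = [
    [27, 27, 37],  # Full response
    [16, 26, 13],  # Partial response
    [70, 56, 66],  # No response
]

def expected_counts(table):
    """Expected count for each cell: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in column_totals] for r in row_totals]

def chi_square(table, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

exact = expected_counts(observed)
rounded = [[round(e, 1) for e in row] for row in exact]  # as printed in the table

stat_exact = chi_square(observed, exact)
stat_rounded = chi_square(observed, rounded)
print(round(stat_exact, 2), round(stat_rounded, 2))
```

The difference is small and does not change the conclusion, but it is worth mentioning so that students are not alarmed when their hand calculation is off in the second decimal place.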

[Page 702]

Alternate Activity: Does background music influence what customers buy?

After calculating $\chi^2$ = 18.28, ask your students how they can tell if this is a big or small value. Since they do not know what degrees of freedom to use, you might suggest simulating the randomization distribution of $\chi^2$ in this context, assuming that the music does not influence what customers buy. Using 243 index cards, label 99 “French”, 31 “Italian” and 113 “Other” to represent the actual purchases. Then, shuffle the cards and divide them at random into three piles: one pile of 84 to represent the 84 purchases made with no music playing, 75 to represent the purchases when French music was playing, and 84 to represent the purchases when Italian music was playing. Then, calculate the observed number of bottles of each type in each treatment group and calculate the $\chi^2$ value for the simulated experiment. Repeat this process over and over, graphing each $\chi^2$ value on a dotplot. Then, students should be able to see that $\chi^2$ = 18.28 would be very unlikely to occur simply due to the chance variation in random assignment.
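If shuffling index cards many times is impractical, the same simulation can be run on a computer. The following Python sketch mirrors the card procedure described above; the function names, the fixed seed, and the choice of 2000 repetitions are all assumptions made for this illustration.

```python
import random

def chi_square_two_way(table):
    """Chi-square statistic for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

def simulate_once(rng):
    """Shuffle the 243 'cards' and deal them into piles of 84, 75, and 84."""
    cards = ["French"] * 99 + ["Italian"] * 31 + ["Other"] * 113
    rng.shuffle(cards)
    piles = [cards[:84], cards[84:159], cards[159:]]  # none, French, Italian music
    table = [[pile.count(wine) for pile in piles]
             for wine in ("French", "Italian", "Other")]
    return chi_square_two_way(table)

rng = random.Random(11)  # fixed seed so the sketch is repeatable
simulated = [simulate_once(rng) for _ in range(2000)]

# Proportion of simulated experiments with a statistic at least as large as 18.28.
proportion_as_large = sum(s >= 18.28 for s in simulated) / len(simulated)
print(proportion_as_large)
```

A dotplot or histogram of `simulated` plays the role of the class dotplot; the printed proportion estimates how often chance alone produces a value as large as 18.28.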

[Page 704]

Alternate Example: Saint-John’s-wort and depression

Earlier we started a significance test of

$H_0$: There is no difference in the distribution of responses for patients with moderately severe cases of depression when taking Saint-John’s-wort, Zoloft, or a placebo.

$H_a$: There is a difference in the distribution of responses for patients with moderately severe cases of depression when taking Saint-John’s-wort, Zoloft, or a placebo.

The value of $\chi^2$ was 8.72.

Problem:

(a) Verify that the conditions for this test are satisfied.

(b) Calculate the P-value for this test.

(c) Interpret the P-value in context.

(d) What is your conclusion?

Solution:

(a) Random The treatments were randomly assigned.

Large sample size The expected counts (30.4, 29.3, 31.2, 18.4, 17.7, 18.9, 64.2, 61.9, 65.9) are all at least 5.

Independent Knowing the response of one subject should not provide any additional information about the response of any other patient.

(b) df = (3 – 1)(3 – 1) = 4, P-value = $\chi^2$cdf(8.72, 1000, 4) = 0.0685.

(c) Assuming that the treatments are equally effective, the probability of observing a difference in the distributions of responses among the three treatment groups as large or larger than the one in the study is about 0.07.

(d) Since the P-value is greater than $\alpha$ = 0.05, we fail to reject the null hypothesis. We do not have convincing evidence that there is a difference in the distribution of responses for patients with moderately severe cases of depression when taking Saint-John’s-wort, Zoloft, or a placebo.

[Page 706]

Alternate Example: Superpowers

In Chapter 1, an Alternate Example examined the distribution of superpower preference for a random sample of 200 children (ages 9-17) from the United Kingdom who filled out a CensusAtSchool survey. Do American children have the same preferences? To find out, a random sample of 215 U.S. children (ages 9-17) was selected from those who filled out a CensusAtSchool survey (censusatschool/). Here are the results from both samples:

| |UK |US |Total |

|Fly |54 |45 |99 |

|Freeze Time |52 |44 |96 |

|Invisibility |30 |37 |67 |

|Super Strength |20 |23 |43 |

|Telepathy |44 |66 |110 |

|Total |200 |215 |415 |

Problem:

(a) Construct an appropriate graph to compare the distribution of superpower preference for U.K. and U.S. children.

(b) Do these data provide convincing evidence that the distribution of superpower preference differs among U.S. and U.K. children?

Solution:

(a) The bar graph compares the conditional distributions (in proportions) of superpower preference for the children in each country’s sample.

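[Figure: bar graph of the conditional distributions of superpower preference for the U.K. and U.S. samples]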

(b)

State: We want to perform a test of the following hypotheses using $\alpha$ = 0.05:

$H_0$: There is no difference in the distribution of superpower preference for U.S. and U.K. children.

$H_a$: There is a difference in the distribution of superpower preference for U.S. and U.K. children.

Plan: If conditions are met, we will perform a chi-square test for homogeneity.

• Random The data are from separate random samples of U.K. and U.S. children.

• Large Sample Size The expected counts (listed below) are all at least 5.

|Expected counts |UK |US |Total |

|Fly |47.7 |51.3 |99 |

|Freeze Time |46.3 |49.7 |96 |

|Invisibility |32.3 |34.7 |67 |

|Super Strength |20.7 |22.3 |43 |

|Telepathy |53.0 |57.0 |110 |

|Total |200 |215 |415 |

• Independent The samples were independently selected. Also, since there are more than 10(200) = 2000 children in the UK and more than 10(215) = 2150 children in the US, both samples are less than 10% of their populations.

Do:

• Test Statistic $\chi^2 = \frac{(54-47.7)^2}{47.7} + \frac{(45-51.3)^2}{51.3} + \cdots + \frac{(66-57.0)^2}{57.0} = 6.29$ (computed from unrounded expected counts)

• P-value Using (5 – 1)(2 – 1) = 4 degrees of freedom, P-value = $\chi^2$cdf(6.29, 1000, 4) = 0.1784.

Conclude: Because the P-value is greater than $\alpha$ = 0.05, we fail to reject $H_0$. We do not have convincing evidence that there is a difference in the distribution of superpower preference for U.S. and U.K. schoolchildren.
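The "Do" step for a two-way table can be packaged as a small routine that returns the statistic together with the degrees of freedom, (rows – 1)(columns – 1). The Python sketch below is one way to do this; both function names are hypothetical, and the P-value helper only handles an even number of degrees of freedom, which happens to cover this example.

```python
import math

def two_way_chi_square(table):
    """Return (statistic, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, df

def upper_tail_even_df(x, df):
    """P(X >= x) for chi-square with even df: exp(-x/2) * sum of (x/2)^k / k!."""
    if df % 2 != 0:
        raise ValueError("this closed form only covers even degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

# Rows: Fly, Freeze Time, Invisibility, Super Strength, Telepathy. Columns: UK, US.
superpowers = [[54, 45], [52, 44], [30, 37], [20, 23], [44, 66]]

statistic, df = two_way_chi_square(superpowers)
p_value = upper_tail_even_df(statistic, df)
print(round(statistic, 2), df, round(p_value, 4))
```

Computing the degrees of freedom from the table's dimensions, rather than typing them in, guards against entering a mismatched value in the calculator command.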

[Page 710]

Alternate Example: Ibuprofen or acetaminophen?

In a study reported by the Annals of Emergency Medicine (March 2009), researchers conducted a randomized, double-blind clinical trial to compare the effects of ibuprofen and acetaminophen plus codeine as a pain reliever for children recovering from arm fractures. There were many response variables recorded, including the presence of any adverse effect, such as nausea, dizziness, and drowsiness. Here are the results:

| |Ibuprofen |Acetaminophen plus Codeine |Total |

|Adverse effects |36 |57 |93 |

|No adverse effects |86 |55 |141 |

|Total |122 |112 |234 |

Problem:

(a) Explain why it was important to investigate this question with a randomized, double-blind clinical trial.

(b) Is the difference between the two groups statistically significant? Conduct an appropriate chi-square test to find out.

Solution:

(a) It is important that the treatments in an experiment be randomly assigned so that the two treatment groups are roughly equivalent at the beginning of the study. The effects of lurking variables should be balanced out between the two groups because of the randomization. It is also important that both the patients and those administering the drugs and measuring the response do not know who is receiving which treatment. This will keep the expectations the same for both groups of patients and not favor one treatment over the other.

(b) State: We want to perform a test of the following hypotheses using $\alpha$ = 0.05:

$H_0$: There is no difference in the proportions of patients like these who suffer adverse effects when taking ibuprofen or acetaminophen plus codeine.

$H_a$: There is a difference in the proportions of patients like these who suffer adverse effects when taking ibuprofen or acetaminophen plus codeine.

Plan: If conditions are met, we will perform a chi-square test for homogeneity.

• Random The treatments were assigned at random.

• Large Sample Size The expected counts (listed below) are all at least 5.

|Expected counts |Ibuprofen |Acetaminophen plus Codeine |Total |

|Adverse effects |48.5 |44.5 |93 |

|No adverse effects |73.5 |67.5 |141 |

|Total |122 |112 |234 |

• Independent Knowing if one subject had an adverse effect shouldn’t give any additional information about the responses of other subjects, so the observations can be considered independent.

Do:

• Test Statistic $\chi^2 = \frac{(36-48.5)^2}{48.5} + \frac{(57-44.5)^2}{44.5} + \frac{(86-73.5)^2}{73.5} + \frac{(55-67.5)^2}{67.5} = 11.15$ (computed from unrounded expected counts)

• P-value Using (2 – 1)(2 – 1) = 1 degree of freedom, P-value = $\chi^2$cdf(11.15, 1000, 1) = 0.0008.

Conclude: Because the P-value is less than $\alpha$ = 0.05, we reject $H_0$. We have convincing evidence that there is a difference in the proportions of patients like these who suffer adverse effects when taking ibuprofen or acetaminophen plus codeine.

[Page 711]

Alternate Example: Ibuprofen or acetaminophen?

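Since we are comparing the proportion of subjects with adverse effects for just two treatments, we can also use a two-sample z test for the following hypotheses, where $p_1$ and $p_2$ are the true proportions of patients like these who would suffer adverse effects when taking ibuprofen and acetaminophen plus codeine, respectively:

$H_0$: $p_1 - p_2 = 0$

$H_a$: $p_1 - p_2 \neq 0$

Using technology, z = –3.339 and P-value = 0.0008. The P-value is exactly the same as the P-value from the chi-square test and $z^2 = (-3.339)^2 = 11.15 = \chi^2$.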
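The connection between the two tests can be verified numerically. The Python sketch below computes the pooled two-sample z statistic from the counts in the table and squares it; the variable names are illustrative, and the two-sided Normal P-value is obtained from the complementary error function in the standard library rather than from a calculator.

```python
import math

adverse = [36, 57]        # ibuprofen, acetaminophen plus codeine
group_sizes = [122, 112]

p1 = adverse[0] / group_sizes[0]
p2 = adverse[1] / group_sizes[1]

# Pooled proportion of adverse effects, used because H0 says the proportions are equal.
pooled = sum(adverse) / sum(group_sizes)
standard_error = math.sqrt(
    pooled * (1 - pooled) * (1 / group_sizes[0] + 1 / group_sizes[1])
)
z = (p1 - p2) / standard_error

# Two-sided P-value from the standard Normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 3), round(z ** 2, 2), round(p_value, 4))
```

The squared z statistic agrees with the chi-square statistic from the 2-by-2 table, which is why the two P-values match.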

[Page 713]

Alternate Example: Allergies

In an Alternate Example from Chapter 5, we investigated the relationship between gender and having allergies for a random sample of 40 students who completed a CensusAtSchool survey. Here is a two-way table that summarizes the data:

| |Female |Male |Total |

|Allergies |10 |8 |18 |

|No Allergies |13 |9 |22 |

|Total |23 |17 |40 |

Problem:

(a) In Chapter 5, we concluded that the events “female” and “allergies” were not independent for the members of the sample. Calculate the appropriate conditional distributions to verify that this is true.

(b) Make a well-labeled graph that compares the conditional distributions in part (a).

(c) Write a few sentences describing the relationship between the two variables.

Solution:

(a) For females, 10/23 = 0.435 had allergies while 13/23 = 0.565 did not. For males, 8/17 = 0.471 had allergies while 9/17 = 0.529 did not. Since P(allergies | female) $\neq$ P(allergies | male), the events “female” and “allergies” are not independent.

(b)

[Figure: graph comparing the conditional distributions of allergy status for females and males]

(c) The proportion of males who have allergies is slightly higher than the proportion of females who have allergies, suggesting that there might be a relationship between gender and having allergies. However, the proportions are so close that there may not be a relationship between these variables in the population, and the difference we see may be due to sampling variability.

[Page 715]

Alternate Example: Allergies

The null hypothesis for the previous alternate example is that there is no association between gender and allergies in the population of U.S. high school students who filled out the CensusAtSchool survey. Here is a two-way table that summarizes the data:

| |Female |Male |Total |

|Allergies |10 |8 |18 |

|No Allergies |13 |9 |22 |

|Total |23 |17 |40 |

If the null hypothesis is true, then the same proportion of males and females should have allergies. Since 18/40 = 45% of the overall sample had allergies, this means we should expect that 45% of the females and 45% of the males should have allergies. Likewise, we expect that 22/40 = 55% of each gender will not have allergies.

For the females:

Allergies: $23(0.45) = 10.35$ No allergies: $23(0.55) = 12.65$

For the males:

Allergies: $17(0.45) = 7.65$ No allergies: $17(0.55) = 9.35$

[Page 717]

Alternate Example: Allergies

Here is a complete table of observed counts and expected counts (in parentheses).

|Observed (expected) |Female |Male |Total |

|Allergies |10 |8 |18 |

| |(10.35) |(7.65) | |

|No Allergies |13 |9 |22 |

| |(12.65) |(9.35) | |

|Total |23 |17 |40 |

Do the data provide convincing evidence of an association between gender and having allergies for U.S. high school students who filled out the CensusAtSchool survey?

State: We want to perform a test of the following hypotheses using $\alpha$ = 0.05:

$H_0$: There is no association between gender and having allergies in the population of U.S. high school students who filled out the CensusAtSchool survey.

$H_a$: There is an association between gender and having allergies in the population of U.S. high school students who filled out the CensusAtSchool survey.

Plan: If conditions are met, we will perform a chi-square test for association/independence.

• Random The sample was randomly selected.

• Large Sample Size The expected counts (see table above) are all at least 5.

• Independent Knowing the responses of one student shouldn’t tell us anything about the responses of other students. Also, there are more than 10(40) = 400 high school students in the U.S. who filled out the CensusAtSchool survey.

Do:

• Test Statistic $\chi^2 = \frac{(10-10.35)^2}{10.35} + \frac{(8-7.65)^2}{7.65} + \frac{(13-12.65)^2}{12.65} + \frac{(9-9.35)^2}{9.35} = 0.051$

• P-value Using (2 – 1)(2 – 1) = 1 degree of freedom, P-value = $\chi^2$cdf(0.051, 1000, 1) = 0.821.

Conclude: Because the P-value is much greater than $\alpha$ = 0.05, we fail to reject $H_0$. We do not have convincing evidence that there is an association between gender and having allergies in the population of U.S. high school students who filled out the CensusAtSchool survey.

[Page 719]

Alternate Example: Online social networking

An article in the Arizona Daily Star (April 9, 2009) included the following table:

| |18-24 |25-34 |35-44 |45-54 |55-64 |65+ |Total |

|Use Online Social Networks |137 |126 |61 |38 |15 |9 |386 |

|Do Not Use Online Social Networks |46 |95 |143 |160 |130 |124 |698 |

|Total |183 |221 |204 |198 |145 |133 |1084 |

Suppose that you decide to analyze these data using a chi-square test. However, without any additional information about how the data were collected, it isn’t possible to know which chi-square test is appropriate.

Problem:

(a) Explain how you know that a goodness-of-fit test is not appropriate for analyzing these data.

(b) Describe how these data could have been collected so that a test for homogeneity is appropriate.

(c) Describe how these data could have been collected so that a test for association/independence is appropriate.

Solution:

(a) Since there are either two variables or two or more populations, a goodness-of-fit test is not appropriate. Goodness-of-fit tests are only appropriate when analyzing the distribution of one variable in one population.

(b) To make a test for homogeneity appropriate, we would need to take 6 independent random samples, one from each age category, and then ask each person whether or not they use online social networks. Or, we could take 2 independent random samples, one of online social network users and one of people who do not use online social networks, and ask each member of each sample how old they are.

(c) To make a test for association/independence appropriate, we would take one random sample from the population and ask each member about their age and whether or not they use online social networks. This seems like the most reasonable method to collect the data, so a test of association/independence is probably the best choice. But, we can’t know for sure unless we know how the data were collected.
