AP Stats. Ch. 11 Review WS - MR. BRINKHUS' WEBSITE

Tell whether each situation below is best suited for (a) a chi-square goodness-of-fit test, (b) a chi-square test for homogeneity, or (c) a chi-square test for independence/association.

1. Two populations, two SRSs taken; one categorical variable; compare the distribution of the categorical variable for each population. In this case use a chi-square test for homogeneity. Example: you suspect that the color distribution of Skittles has changed over the years. You buy a large bag of Skittles and get hold of a large bag of Skittles that is five years old. The null hypothesis is that the color distribution for currently produced Skittles is the same as the color distribution for Skittles produced in 2008. The alternative hypothesis is that the two distributions are NOT the same.

2. One population, one SRS taken; one categorical variable; compare the distribution of the categorical variable against a hypothesized distribution. In this case use a chi-square goodness-of-fit test. Example: recall the Skittles activity, where the company claims that it makes 25% red Skittles, 20% orange Skittles, etc. We test that claim by taking a single SRS of Skittles. The null hypothesis is that the color distribution of Skittles is the same as the distribution the company claims, and the alternative hypothesis is that they are not the same.

3. One population, one SRS taken; two categorical variables; determine whether there is an association between the two categorical variables. In this case use a chi-square test for association/independence. Example: we suspect that there is an association between sex and age among employees at American Apparel. We take a large SRS of all employees at A.A. and categorize each employee by age and sex. This forms our two-way table. The null hypothesis is that there is NO association between age and sex. The alternative hypothesis is that there is an association.

Alternatively, the null hypothesis can be that age is independent of sex, while the alternative hypothesis is that age and sex are dependent.

4. One population, one census taken; two categorical variables; determine whether there is an association between the two categorical variables. Same as #3, except that when conducting the test we do NOT have to check the 10% rule (independence) or the random condition, since this is NOT a sample; it is a census. The only condition that must be verified is the Large Sample Size condition (all expected counts must be 5 or more).
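The decision rule in items 1-4 depends only on how many samples were taken and how many categorical variables were recorded, so it can be sketched as a small function. This is a hypothetical helper written for illustration (`choose_chi_square_test` is not from any library); it assumes a census counts as a single sample, as in item 4, and it does not check any conditions.

```python
def choose_chi_square_test(num_samples: int, num_variables: int) -> str:
    """Rough rule of thumb for picking a chi-square procedure.

    num_samples:   number of separate SRSs (use 1 for a single SRS or a census)
    num_variables: number of categorical variables recorded on each individual
    """
    if num_variables == 1 and num_samples == 1:
        return "goodness of fit"           # one sample vs. a hypothesized distribution
    if num_variables == 1 and num_samples >= 2:
        return "homogeneity"               # one variable compared across populations
    if num_variables == 2 and num_samples == 1:
        return "association/independence"  # two variables on one sample (or census)
    raise ValueError("design does not match one of the three chi-square tests")


print(choose_chi_square_test(2, 1))  # homogeneity (item 1)
print(choose_chi_square_test(1, 1))  # goodness of fit (item 2)
print(choose_chi_square_test(1, 2))  # association/independence (items 3 and 4)
```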

5. An article in a newspaper included the following table:

Age (years)                        18-24  25-34  35-44  45-54  55-64    65+  Total
Use online social networks           137    126     61     38     15      9    386
Do not use online social networks     46     95    143    160    130    124    698
Total                                183    221    204    198    145    133   1084

Suppose that you decide to analyze these data using a chi-square test. However, without any additional information about how the data were collected, it isn't possible to know which chi-square test is appropriate.

(a) Explain how you know that a goodness-of-fit test is not appropriate for analyzing these data.

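A goodness-of-fit test compares the distribution of ONE categorical variable to a hypothesized distribution. We can't use it here because we have TWO categorical variables (age and use of social networks).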

(b) Describe how these data could have been collected so that a test for homogeneity is appropriate.

A test for homogeneity requires separate SRSs from multiple populations. Therefore, take 6 separate SRSs, one from each age group, and ask the subjects in each SRS whether they use online social networks.

(c) Describe how these data could have been collected so that a test for association/independence is appropriate.

A test for association/independence requires a single large sample from a single population. In this case we would take a single large SRS and classify each person by age group and by whether or not they use online social networks.

6. In some countries, people believe that blood type has a strong impact on personality. For example, Type B blood is thought to be associated with passion and creativity. A statistics student at a large U.S. university decides to test this theory. Reasoning that people involved in the arts should be passionate and creative, she takes an SRS of students majoring in arts at her university and asks them for their blood type. Here are her results (observed number of arts majors with each blood type):

Blood type: Type A  Type B  Type AB  Type O
Observed:       50      23       10      67

Assume the distribution of blood type among all U.S. residents is as follows: Type A: 42%; Type B: 10%; Type AB: 4%; Type O: 44%.

(a) The student wants to carry out a test of significance to see if the distribution of blood types among arts majors or minors is different from the U.S. distribution. State the null and alternative hypotheses for this test.

H0: The distribution of blood type among students majoring in arts at the university is the same as the stated U.S. blood type distribution.

OR $p_A = 0.42$, $p_B = 0.10$, $p_{AB} = 0.04$, $p_O = 0.44$, where $p_i$ = the proportion of arts students at the university with blood type $i$.

Ha: The distribution of blood type among students majoring in arts at the university is different from the stated U.S. blood type distribution. OR At least one of the $p_i$'s is incorrect.

(b) Find the expected counts for each blood type under the assumption that the null hypothesis is true.

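With n = 50 + 23 + 10 + 67 = 150, each expected count is 150 × (hypothesized proportion):

Type A: 150(0.42) = 63
Type B: 150(0.10) = 15
Type AB: 150(0.04) = 6
Type O: 150(0.44) = 66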
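The arithmetic in (b) can be checked with a short Python sketch; the variable names are illustrative, and the proportions are the U.S. percentages given in the problem.

```python
# Observed counts of arts majors by blood type (from the table above)
observed = {"A": 50, "B": 23, "AB": 10, "O": 67}
# Hypothesized U.S. distribution under H0
claimed = {"A": 0.42, "B": 0.10, "AB": 0.04, "O": 0.44}

n = sum(observed.values())  # sample size
# Expected count for each blood type = n * hypothesized proportion
expected = {blood_type: n * p for blood_type, p in claimed.items()}

for blood_type, e in expected.items():
    print(f"Type {blood_type}: {e:g}")
```

All four expected counts are at least 5, which is what the Large Sample Size condition in (c) requires.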

(c) Discuss whether the conditions for this test have been met.

Random: "she takes an SRS of students."
Large Sample Size: All expected counts are at least 5 (63, 15, 6, 66).
Independent: The blood type of one student is independent of the blood type of another student. We can assume that there are more than 10(150) = 1500 students majoring in arts at this university, so the 10% condition holds.

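(d) Find the value of the test statistic and the P-value of the test, and make the appropriate conclusion. Use $\alpha = 0.05$.

$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(50 - 63)^2}{63} + \frac{(23 - 15)^2}{15} + \frac{(10 - 6)^2}{6} + \frac{(67 - 66)^2}{66} \approx 9.63$

df = 4 - 1 = 3

P-value = $P(\chi^2 \geq 9.63) = \chi^2\text{cdf}(9.63, \infty, 3) \approx 0.022$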

Assuming H0 is true (the distribution of blood type among students majoring in arts at the university is the same as the stated U.S. blood type distribution), there is a 0.022 probability of getting a $\chi^2$ value of 9.63 or more purely by chance. This provides strong evidence against H0, and the result is statistically significant at the $\alpha = 0.05$ level. Therefore, we reject H0 and conclude that the distribution of blood type among students majoring in arts at the university is different from the stated U.S. blood type distribution.
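The statistic and P-value in (d) can be reproduced without a graphing calculator. The sketch below uses only Python's standard library; `chi2_sf` is a hypothetical helper written here (not a library function) that computes the upper-tail chi-square area for a positive integer number of degrees of freedom from the closed form of the regularized incomplete gamma function. It plays the role of the calculator's chi-square cdf command.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for a chi-square variable with positive integer df.

    Computes Q(df/2, x/2), the regularized upper incomplete gamma function,
    starting from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df)
    and stepping up with Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < df / 2.0:
        # y**s * exp(-y) / Gamma(s + 1), computed on the log scale for stability
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


observed = [50, 23, 10, 67]          # arts majors with blood types A, B, AB, O
claimed = [0.42, 0.10, 0.04, 0.44]   # U.S. distribution assumed under H0
n = sum(observed)
expected = [n * p for p in claimed]

# Goodness-of-fit statistic: sum of (O - E)^2 / E over the four blood types
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1               # number of categories minus 1
p_value = chi2_sf(chi_sq, df)

print(f"chi-square = {chi_sq:.2f}, df = {df}, P-value = {p_value:.3f}")
```

If SciPy is installed, `scipy.stats.chisquare(observed, expected)` should return the same statistic and P-value; the hand-written helper just keeps the example dependency-free.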

(e) Based on your answer to (d), which error is it possible that you have made, Type I or Type II? Describe that error in the context of the problem.

Type I error: rejecting H0 when H0 is actually true. Here, we would conclude that the distribution of blood types for arts majors/minors is different from the U.S. distribution when in fact it is not different.

(f) Use the components of the chi-square statistic to perform a follow-up analysis on the impact of blood type on personality.

The largest component of $\chi^2$ is $(23 - 15)^2/15 \approx 4.267$, for Type B, because the number of arts students with Type B blood (23) was higher than expected (15). This supports the claim that arts students are more likely to have Type B blood.
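The follow-up analysis in (f) amounts to listing each category's contribution $(O - E)^2/E$ and noting whether the observed count was above or below its expected count. A minimal Python sketch, with illustrative names:

```python
observed = {"A": 50, "B": 23, "AB": 10, "O": 67}    # arts majors by blood type
claimed = {"A": 0.42, "B": 0.10, "AB": 0.04, "O": 0.44}
n = sum(observed.values())

# Contribution (O - E)^2 / E of each blood type to the chi-square statistic
components = {}
for blood_type, o in observed.items():
    e = n * claimed[blood_type]
    components[blood_type] = (o - e) ** 2 / e
    direction = "more" if o > e else "fewer"
    print(f"Type {blood_type}: O = {o}, E = {e:g}, "
          f"component = {components[blood_type]:.3f} ({direction} than expected)")

largest = max(components, key=components.get)
print("Largest component: Type", largest)
```

Printing every component, not just the largest, makes it easy to see which other blood types were above or below their expected counts.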

7) Have older cell-phone users caught on to texting? Simple random samples of people under 30 and over 30 were asked whether they made more phone calls or sent more text messages on their cell phone on a typical day. Here are the results:

                  Under 30  Over 30
More texts             146       32
More phone calls        34       98

Give the name of the appropriate chi-square procedure and state the null and alternative hypotheses for the test that will address this question.

Since two SRSs were taken (one from each age group), a chi-square test for homogeneity should be used.

H0: the distribution of preferred communication for the under 30 population is the same as the distribution of preferred communication for the over 30 population.

Ha: the distribution of preferred communication for the under 30 population is different from the distribution of preferred communication for the over 30 population.

8) A person studying fathers' involvement in their children's education interviews a simple random sample of fathers of school-age children. One question concerns regularly scheduled parent-teacher conferences.

                                   Attended all  Attended some  Attended none  Total
Fathers in two-parent families              109            132            203    444
Fathers in single-parent families            15             10             13     38
Non-resident fathers                         11              5             82     98
Total                                       135            147            298    580

(a) Use appropriate graphical techniques to illustrate the relationship between these two variables.

Logic dictates that the explanatory variable in this case is the row variable (family type). I suspect that a father who does not live with the child may be less involved than a father from a single-parent family. Therefore, place the explanatory variable along the x-axis and make a segmented (stacked) bar graph with one bar for each of the three family types. For each row, divide each cell count by the row total. For example, the first row becomes 109/444, 132/444, and 203/444. Convert these into percentages so that each bar totals 100%. Do this for all three rows and see whether, visually, there seems to be an association between the row and column variables. This association will be confirmed with numerical evidence in (b).
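Each bar in the graph described in (a) is one row's conditional distribution of attendance. The minimal Python sketch below computes those row percentages from the table counts; drawing the bars (for example with a plotting library) is left out, and the names are illustrative.

```python
# Counts from the table: rows = family type, columns = attended all / some / none
table = {
    "Two-parent": [109, 132, 203],
    "Single-parent": [15, 10, 13],
    "Non-resident": [11, 5, 82],
}
columns = ["All", "Some", "None"]

# Conditional distribution within each family type:
# divide each cell by its row total and convert to a percentage.
row_percents = {}
for family, counts in table.items():
    row_total = sum(counts)
    row_percents[family] = [100 * c / row_total for c in counts]
    summary = ", ".join(f"{col} {p:.1f}%" for col, p in zip(columns, row_percents[family]))
    print(f"{family} (n = {row_total}): {summary}")
```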

(b) Based on your graphical analysis, discuss the relationship between family type and how often fathers attended parent-teacher conferences. Support your conclusions with the appropriate statistical test.

State: H0: there is no association between family type and how often fathers attend parent-teacher conferences

Ha: there is an association between family type and how often fathers attend parent-teacher conferences

Plan: chi-square test for association.
Random: "simple random sample of fathers."
Large Sample Size: you must compute the expected count for EACH cell using the formula (row total × column total) / grand total, and show that each expected count is 5 or more. This tells us that the sampling distribution of the $\chi^2$ statistic is approximately a chi-square distribution (NOT NORMAL!).
Independence: One father's response should have no effect on another father's response. We can safely assume that there are more than 10(580) = 5800 fathers of school-age children. NOTE: if multiple SRSs had been taken, then the 10% rule MUST HOLD FOR EACH AND EVERY POPULATION/SAMPLE IN THE STUDY!!!
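The Large Sample Size check can be sketched in Python: build the table of expected counts from the row and column totals with (row total × column total) / grand total, then confirm that every entry is at least 5. Variable names are illustrative.

```python
observed = [
    [109, 132, 203],   # two-parent families: attended all, some, none
    [15, 10, 13],      # single-parent families
    [11, 5, 82],       # non-resident fathers
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count for each cell = (row total x column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print(" ".join(f"{e:8.2f}" for e in row))

# Large Sample Size condition: every expected count must be at least 5
print("All expected counts >= 5:", all(e >= 5 for row in expected for e in row))
```

The smallest expected count (single-parent fathers who attended all conferences) is about 8.84, so the condition is met.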

Do: $\chi^2 = 54.7742$, df = (3 - 1)(3 - 1) = 4, P-value = $3.622 \times 10^{-11}$.

Conclude: Assuming H0 is true (there is no association between family type and how often fathers attend parent-teacher conferences), there is a probability close to 0 of getting a $\chi^2$ value of 54.7742 or more purely by chance. This provides very strong evidence against H0, and the result is statistically significant at the $\alpha = 0.05$ level. Therefore, we reject H0 and conclude that there is an association between family type and how often fathers attend parent-teacher conferences.
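The Do step can be reproduced in Python. The sketch below recomputes each expected count, sums (O - E)^2 / E over all nine cells, and uses df = (rows - 1)(columns - 1). As before, `chi2_sf` is a hypothetical stdlib-only helper standing in for the calculator's chi-square cdf; if SciPy is available, `scipy.stats.chi2_contingency(observed)` should give the same statistic, df, and P-value (its Yates correction only applies when df = 1).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for a chi-square variable with positive integer df,
    via the regularized upper incomplete gamma function Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while s < df / 2.0:
        # Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


observed = [
    [109, 132, 203],   # two-parent families: attended all, some, none
    [15, 10, 13],      # single-parent families
    [11, 5, 82],       # non-resident fathers
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (O - E)^2 / E over every cell, with E = row total * column total / grand total
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (rows - 1)(columns - 1)
p_value = chi2_sf(chi_sq, df)

print(f"chi-square = {chi_sq:.4f}, df = {df}, P-value = {p_value:.3g}")
```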

9) A study was conducted to determine where moose are found in a region containing a large burned area. A map of the study area was partitioned into the following four habitat types: (1) inside the burned area, not near the edge of the burned area; (2) inside the burned area, near the edge; (3) outside the burned area, near the edge; and (4) outside the burned area, not near the edge. The figure below shows these four habitat types.

The proportion of total acreage in each of the habitat types was determined for the study area. Using an aerial survey, moose locations were observed and classified into one of the four habitat types. The results are given in the table below.

Habitat Type  Proportion of Total Acreage  Number of Moose Observed
1                                   0.340                        25
2                                   0.101                        22
3                                   0.104                        30
4                                   0.455                        40
Total                               1.000                       117

(a) The researchers who are conducting the study expect the number of moose observed in a habitat type to be proportional to the amount of acreage of that type of habitat. Are the data consistent with this expectation? Conduct an appropriate statistical test to support your conclusion. Assume the conditions for inference are met.

State: H0: the moose distribution across the 4 different habitats is proportional to the amount of acreage in those habitats.
Ha: the moose distribution across the 4 different habitats is NOT proportional to the amount of acreage in those habitats.

Plan: chi-square goodness-of-fit test (1 population, 1 sample, 1 categorical variable). We were told the conditions for inference are met.

Do: I used the chi-square goodness-of-fit test on my calculator, and my results were $\chi^2 = 43.689$, df = 3, P-value = $1.76 \times 10^{-9}$.

Conclude: Assuming H0 is true (the moose distribution across the 4 different habitats is proportional to the amount of acreage in those habitats), there is a probability close to 0 of getting a $\chi^2$ value of 43.689 or more purely by chance. This provides very strong evidence against H0, and the result is statistically significant at the $\alpha = 0.05$ level. Therefore, we reject H0 and conclude that the moose distribution across the 4 different habitats is NOT proportional to the amount of acreage in those habitats. They probably don't like barbequed vegetation.
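The calculator results in the Do step can be reproduced with a short goodness-of-fit sketch. The expected counts are 117 times each acreage proportion; `chi2_sf` is the same hypothetical stdlib-only helper used for the upper-tail chi-square area, not a library function.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail area P(X >= x) for a chi-square variable with positive integer df,
    via the regularized upper incomplete gamma function Q(df/2, x/2)."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)               # Q(1, y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))    # Q(1/2, y)
    while s < df / 2.0:
        # Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1)
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


proportions = [0.340, 0.101, 0.104, 0.455]   # share of total acreage, habitats 1-4
observed = [25, 22, 30, 40]                  # moose observed in habitats 1-4
n = sum(observed)
expected = [n * p for p in proportions]      # expected counts if moose follow acreage

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi_sq, df)

print(f"chi-square = {chi_sq:.3f}, df = {df}, P-value = {p_value:.3g}")
```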

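(b) Relative to the proportion of total acreage, which habitat types did the moose seem to prefer? Explain.

This has to do with the chi-square components. The largest chi-square component, $(O - E)^2/E = 26.132$, corresponds to habitat 3. Since the observed number of moose in habitat 3 was 30 and the expected number was only 117(0.104) = 12.168, far more moose were found in this habitat than its acreage alone would predict. Habitat 2 also had more moose than expected (22 observed vs. 117(0.101) = 11.817 expected, a component of about 8.775), so the moose seemed to prefer the two habitats near the edge of the burned area.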
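Answering (b) means comparing each habitat's observed count with its expected count and looking at the size of each component. A minimal Python sketch with illustrative names; it flags every habitat whose observed count exceeds the count its acreage share predicts.

```python
proportions = [0.340, 0.101, 0.104, 0.455]   # share of total acreage, habitats 1-4
observed = [25, 22, 30, 40]                  # moose observed in habitats 1-4
n = sum(observed)

components = []
preferred = []   # habitats with more moose than their acreage share predicts
for habitat, (p, o) in enumerate(zip(proportions, observed), start=1):
    e = n * p
    comp = (o - e) ** 2 / e
    components.append(comp)
    if o > e:
        preferred.append(habitat)
    print(f"Habitat {habitat}: O = {o}, E = {e:.3f}, (O - E)^2/E = {comp:.3f}")

print("More moose than expected in habitats:", preferred)
```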
