Chapter 3: Describing Relationships (first spread)



Chapter 1.1: Analyzing Categorical Data

Qualitative (Categorical) Data:

- either inherently categorical: sex, race, etc.

- or created by grouping values of a quantitative variable together: adult/child

To analyze categorical data, we use counts or percents of individuals that fall into various categories. The values of a categorical variable are labels for the categories, such as male or female.

Frequency Tables and Relative Frequency Tables: a frequency table lists the count of individuals in each category, and a relative frequency table lists the percent. Either table can also serve as the starting point for a graphical display.
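As a rough illustration of how counts turn into percents, here is a minimal Python sketch. The responses are made-up data, and the helper name `frequency_table` is only an assumption for this example, not something from the textbook.

```python
from collections import Counter

# Hypothetical responses to "What is your favorite season?" (made-up data).
responses = ["summer", "winter", "summer", "fall", "spring",
             "summer", "fall", "winter", "summer", "spring"]

def frequency_table(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

table = frequency_table(responses)

# Print one row per category: label, count (frequency), percent (relative frequency).
for category, (count, percent) in sorted(table.items()):
    print(f"{category:<8}{count:>3}{percent:>7.1f}%")
```

The counts column is the frequency table and the percent column is the relative frequency table for the same data.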

Types of Displays for categorical data: Bar graph and pie chart

Bar Graph (chart): individual bars with space between each bar

Pie Chart: displays how the population is divided using percentages.

Example: What Personal Media Do You Own?

Here are the percents of 15-18 year olds who own each of the following personal media devices, according to the Kaiser Family Foundation:

|Device |Percent who Own |
|Cell Phone |85% |
|MP3 Player |83% |
|Handheld Video Game Player |41% |
|Laptop |38% |
|Portable CD/Tape Player |20% |

Questions:

1. Make a well-labeled bar graph to display the data and describe what you see. (You may use software, or draw the graph by hand, take a picture, and attach it to a new document.)

2. Would it be appropriate to make a pie chart for these data? Why or why not?

Graphs: Good and Bad

Question: Deceptive Graph

3. The DIRECTV ad to the right has multiple problems. How many can you point out?

Two-Way Tables: Present two categorical variables, one with labels in the column headings and one with labels in the row headings. The table provides a way to analyze the relationships between the two variables.

Marginal Distributions: Appear in the margins and are the totals or percents by row or by column. The counts tell you how many individuals fall into a given category, and the percents tell you what percentage of all individuals fall into that category. Often, percents are more meaningful than counts. Note: you may observe roundoff error when calculating percents.

Marginal distributions do not describe the relationship between the two variables. To determine that relationship and its strength, we must compare percents within the categories of one variable (see conditional distributions below).
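A minimal Python sketch of the marginal-distribution calculation follows. The two-way table here is made-up data (pet preference by grade level), and the function name `marginal_distribution` is an assumption chosen for this example.

```python
# Hypothetical two-way table of counts (made-up data):
# rows = preferred pet, columns = grade level.
counts = {
    "Dog":  {"Freshman": 30, "Senior": 20},
    "Cat":  {"Freshman": 15, "Senior": 25},
    "Fish": {"Freshman": 5,  "Senior": 5},
}

def marginal_distribution(table):
    """Percent of ALL individuals that fall in each row category."""
    # Row totals are the counts that appear in the right-hand margin.
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    # The grand total is the bottom-right corner of the table.
    grand_total = sum(row_totals.values())
    return {row: 100 * total / grand_total for row, total in row_totals.items()}

marginal = marginal_distribution(counts)
for pet, percent in marginal.items():
    print(f"{pet}: {percent:.1f}%")
```

Notice that the grade-level columns are ignored once the row totals are formed, which is why a marginal distribution cannot show how the two variables are related.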

Super Powers

A sample of 200 children from the United Kingdom ages 9-17 was selected from the CensusAtSchool website. The gender of each student was recorded along with which super power they would most like to have: invisibility, super strength, telepathy (ability to read minds), ability to fly, or ability to freeze time. Here are the results:

| |Female |Male |Total |
|Invisibility |17 |13 | |
|Super Strength |3 |17 | |
|Telepathy |39 |5 | |
|Fly |36 |18 | |
|Freeze Time |20 |32 | |
|Total | | | |

Questions:

4. Use the data in the two-way table to calculate the marginal distribution (in percents) of superpower preferences. (copy and fill in the chart)

5. Make a graph to display the marginal distributions. Describe what you see.

Conditional Distributions of a variable describe the values of that variable among individuals who have a specific value of another variable. There is a separate conditional distribution for each value of the other variable.
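The definition above can be sketched in Python as follows. The table is made-up data (pet preference by grade level), and the function name `conditional_distribution` is an assumption for this example. The key point is that the denominator is the column total, not the grand total.

```python
# Hypothetical two-way table of counts (made-up data):
# rows = preferred pet, columns = grade level.
counts = {
    "Dog":  {"Freshman": 30, "Senior": 20},
    "Cat":  {"Freshman": 15, "Senior": 25},
    "Fish": {"Freshman": 5,  "Senior": 5},
}

def conditional_distribution(table, column):
    """Percent in each row category among individuals in ONE column."""
    # Only individuals in the chosen column count toward the denominator.
    column_total = sum(cells[column] for cells in table.values())
    return {row: 100 * cells[column] / column_total
            for row, cells in table.items()}

# One conditional distribution for each value of the other variable.
freshman = conditional_distribution(counts, "Freshman")
senior = conditional_distribution(counts, "Senior")

for pet in counts:
    print(f"{pet}: Freshman {freshman[pet]:.1f}%, Senior {senior[pet]:.1f}%")
```

Comparing the two columns of percents row by row is what reveals whether the variables are related.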

6. Calculate the conditional distribution of superpower preference for females and for males, using the table above. (The filled-in cell below only shows how to present your work; its numbers are not taken from this table.)

| |Female |Male |
|Invisibility |96/2367 = 4.1% | |
|Super Strength | | |
|Telepathy | | |
|Fly | | |
|Freeze Time | | |

7. Copy the gold box on page 18, "How to Organize a Statistical Problem: A Four-Step Process":

State:

Plan:

Do:

Conclude:

8. Based on the survey data, can we conclude that boys and girls differ in their preference of superpower? Give appropriate evidence to support your answer. (For help, see the example on page 18.)

Chapter 1.1: Analyzing Categorical Data (continued)

Side-by-side bar graph: very useful for comparing conditional distributions. Which variable will be the x-axis label?

Segmented bar graphs: not the preferred graph because comparisons between segments can be difficult to make. (Not required for AP Stats, but seen a great deal in publications.)

Association (or relationship): when specific values of one variable tend to occur in common with specific values of the other. Name two variables that might be associated with each other.

• Don’t use the word correlation as it is reserved for describing the strength of a linear relationship between 2 quantitative variables.

• To state a definitive association, formal inference is required (chi-square test)

• Be very cautious when stating that two categorical variables have an association because there may be a lurking variable in the background. Think of the lurking variable behind each of these pairs: ice cream sales and number of drownings, drinking wine and better heart health, etc.

Answer the Titanic data questions at the top of page 20 (#9 and #10).

Simpson’s Paradox: an apparent association between two categorical variables is reversed when we take a third variable into account.

Page 20: Do Helicopters Save Lives?

11. At first, what appears to be the relationship between means of transport and death rates?

12. Once you look at the data broken out by the injury degree of seriousness, what do you notice?

13. Look at survival rates for each transport method for both serious and less serious accidents. What do you notice? Summarize your thoughts below.

Note: Simpson’s paradox is not an AP topic, but it is important to discuss because it demonstrates the impact that lurking variables may have on an apparent relationship. In the above example, the seriousness of the accident was a lurking variable. The more serious the accident, the more likely it is that a helicopter is called AND that death occurs.
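The reversal described in the note can be checked numerically. The sketch below uses made-up counts, not the data from page 20, and the function names are assumptions for this example. It compares death rates by transport method overall and then within each level of seriousness.

```python
# Hypothetical counts (made-up data, NOT the textbook's):
# each entry is (deaths, patients) for one transport method and severity.
data = {
    "Helicopter": {"Serious": (60, 150), "Less serious": (5, 50)},
    "Road":       {"Serious": (50, 100), "Less serious": (135, 900)},
}

def death_rate(deaths, patients):
    """Death rate as a percent of patients."""
    return 100 * deaths / patients

def overall_rate(groups):
    """Death rate after combining all severity levels for one transport method."""
    deaths = sum(d for d, _ in groups.values())
    patients = sum(p for _, p in groups.values())
    return death_rate(deaths, patients)

# Rates ignoring severity (the "at first" view).
overall = {method: overall_rate(groups) for method, groups in data.items()}

# Rates within each severity level (the lurking variable taken into account).
by_severity = {method: {sev: death_rate(*cell) for sev, cell in groups.items()}
               for method, groups in data.items()}

print("Overall:", overall)
print("By severity:", by_severity)
```

With these made-up counts, the helicopter has the higher death rate overall but the lower death rate within each severity level, because most helicopter patients come from the serious accidents.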
