September 14, 2005



Relations in Categorical Data Name ________________________________

The following table represents Census Bureau data (as of 2000, in thousands of persons) on the years of school completed by Americans of different ages. Many people under age 25 have not completed their education, so they are left out of the table. Both variables, age and education, are grouped into categories.

| | |AGE GROUP | | |
|EDUCATION |25 to 34 |35 to 54 |55+ |TOTAL |
|Did not complete high school |4,474 |9,155 |14,224 | |
|Completed high school |11,546 |26,481 |20,060 | |
|1-3 years of college |10,700 |22,618 |11,127 | |
|4+ years of college |11,066 |23,183 |10,596 | |
|TOTAL | | | | |

MARGINAL DISTRIBUTIONS

• Row totals give the distribution of education level among all people 25 years of age or older; column totals give the distribution of age.

• The distributions of education and age alone are called marginal distributions because they appear at the right and bottom margins of the two-way table.
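
As a rough illustration of how the margins are formed, the following Python sketch sums each row and each column of the table and then expresses each total as a percent of the grand total. The names `COUNTS`, `row_totals`, `column_totals`, and `as_percents` are made up for this example and are not part of the worksheet.

```python
# Counts are in thousands of persons, copied from the table above.
AGE_GROUPS = ["25 to 34", "35 to 54", "55+"]
COUNTS = {
    "Did not complete high school": [4474, 9155, 14224],
    "Completed high school": [11546, 26481, 20060],
    "1-3 years of college": [10700, 22618, 11127],
    "4+ years of college": [11066, 23183, 10596],
}


def row_totals(counts):
    """Total for each education level (the right margin)."""
    return {level: sum(row) for level, row in counts.items()}


def column_totals(counts, age_groups):
    """Total for each age group (the bottom margin)."""
    return {
        group: sum(row[i] for row in counts.values())
        for i, group in enumerate(age_groups)
    }


def as_percents(totals):
    """Express each total as a percent of the grand total."""
    grand_total = sum(totals.values())
    return {key: 100 * value / grand_total for key, value in totals.items()}


# Marginal distributions of education and of age, in percents.
education_margin = as_percents(row_totals(COUNTS))
age_margin = as_percents(column_totals(COUNTS, AGE_GROUPS))

for level, pct in education_margin.items():
    print(f"{level}: {pct:.1f}%")
```

Each marginal distribution should add up to 100%, apart from rounding, which is a quick check on the arithmetic.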

1. Complete the row and column totals in the table above.

2. Find the percent of people 25 years of age or older that have at least 4 years of college.

3. Find the remaining marginal distributions of education level (in percents).

No high school: High school: 1-3 years college:

4. What percent of people aged 25 to 34 have completed at least 4 years of college?

5. Find the percent of people who have completed at least 4 years of college in each of the other age groups.

35 to 54: 55+:
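
Questions 4 and 5 divide one cell by its column total rather than by the grand total. A minimal Python sketch of that calculation is given here; the function name `percent_in_level_by_age` is an assumption chosen for this example.

```python
# Counts are in thousands of persons, copied from the table above.
AGE_GROUPS = ["25 to 34", "35 to 54", "55+"]
COUNTS = {
    "Did not complete high school": [4474, 9155, 14224],
    "Completed high school": [11546, 26481, 20060],
    "1-3 years of college": [10700, 22618, 11127],
    "4+ years of college": [11066, 23183, 10596],
}


def percent_in_level_by_age(counts, age_groups, level):
    """Percent of each age group (column) that falls in one education level."""
    result = {}
    for i, group in enumerate(age_groups):
        # The denominator is the column total, not the grand total.
        column_total = sum(row[i] for row in counts.values())
        result[group] = 100 * counts[level][i] / column_total
    return result


college = percent_in_level_by_age(COUNTS, AGE_GROUPS, "4+ years of college")
for group, pct in college.items():
    print(f"{group}: {pct:.1f}%")
```

Comparing the three percents shows whether college completion differs by age group, which the marginal distribution alone cannot reveal.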

CONDITIONAL DISTRIBUTIONS

1. Information about the 25 to 34 age group occupies the first column in the table. To find the complete distribution of education in this group, look only at that column. Compute each count as a percent of the column total of 37,786.

No high school: High School:

1-3 years college: 4+ years college:

2. The four percents together are the conditional distribution of education, given that a person is 25 to 34 years of age. The term conditional is used because the distribution refers only to people who satisfy the condition that they are 25 to 34 years old.

3. Create a segmented bar graph for the conditional distribution above.
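
One possible way to check the conditional distribution and rough out the segmented bar graph is sketched below in Python. The text-based bar, its width of 50 characters, and the segment symbols are all assumptions made for this example; a graph drawn by hand or with plotting software serves the same purpose.

```python
# First column of the table (ages 25 to 34), in thousands of persons.
COLUMN_25_TO_34 = {
    "Did not complete high school": 4474,
    "Completed high school": 11546,
    "1-3 years of college": 10700,
    "4+ years of college": 11066,
}


def conditional_distribution(column):
    """Each count as a percent of the column total."""
    total = sum(column.values())
    return {level: 100 * count / total for level, count in column.items()}


def segmented_bar(percents, width=50, symbols="#=+."):
    """Draw one bar whose segments are proportional to the percents."""
    pieces = []
    cumulative = 0.0
    start = 0
    for symbol, pct in zip(symbols, percents.values()):
        cumulative += pct
        # Use cumulative boundaries so the segments always fill the bar.
        end = round(width * cumulative / 100)
        pieces.append(symbol * (end - start))
        start = end
    return "".join(pieces)


distribution = conditional_distribution(COLUMN_25_TO_34)
for level, pct in distribution.items():
    print(f"{level}: {pct:.1f}%")
print(segmented_bar(distribution))
```

Repeating the calculation for the other two columns and stacking the bars makes the age groups easy to compare.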

SIMPSON’S PARADOX

To help consumers make informed decisions about health care, the government releases data about patient outcomes in hospitals. Here is a two-way table of the survival of patients after surgery at two hospitals. All patients undergoing surgery in a recent time period are included. “Survived” means that the patient lived at least 6 weeks following surgery.

| |Hospital A |Hospital B |
|Died |63 |16 |
|Survived |2037 |784 |
|TOTAL | | |

When patients are classified as in “good” or “poor” condition before the surgery, the following tables result.

GOOD CONDITION

| |Hospital A |Hospital B |
|Died |6 |8 |
|Survived |594 |592 |
|TOTAL | | |

POOR CONDITION

| |Hospital A |Hospital B |
|Died |57 |8 |
|Survived |1443 |192 |
|TOTAL | | |
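
The comparison behind the paradox can be sketched in Python by computing the percent of patients who died at each hospital, first for all patients and then within each condition group. This is only an illustrative sketch: the function names are invented here, and it assumes that the table with 57 and 8 deaths is the poor-condition group and the table with 6 and 8 deaths is the good-condition group.

```python
# (died, survived) counts copied from the tables above.
OVERALL = {"Hospital A": (63, 2037), "Hospital B": (16, 784)}

# ASSUMPTION: the table with the higher death rates is the poor-condition group.
BY_CONDITION = {
    "good": {"Hospital A": (6, 594), "Hospital B": (8, 592)},
    "poor": {"Hospital A": (57, 1443), "Hospital B": (8, 192)},
}


def death_rate(died, survived):
    """Percent of patients who died."""
    return 100 * died / (died + survived)


def death_rates(table):
    """Death rate for each hospital in one two-way table."""
    return {hospital: death_rate(*counts) for hospital, counts in table.items()}


overall_rates = death_rates(OVERALL)
rates_by_condition = {
    condition: death_rates(table) for condition, table in BY_CONDITION.items()
}

print("All patients:", {h: f"{r:.1f}%" for h, r in overall_rates.items()})
for condition, rates in rates_by_condition.items():
    print(f"{condition} condition:", {h: f"{r:.1f}%" for h, r in rates.items()})
```

Under that assumption, Hospital A has the higher death rate when all patients are combined but the lower death rate within each condition group. The reversal happens because Hospital A treats far more poor-condition patients.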
