Categorical Data Analysis

STA 4504 - 5503: Outline of Lecture Notes, © Alan Agresti

1. Introduction

• Methods for a response (dependent) variable Y whose measurement scale is a set of categories

• Explanatory variables may be categorical, continuous, or both

Example

Y = vote in election (Democrat, Republican, Independent)

x's: income, education, gender, race

Two types of categorical variables

Nominal - unordered categories

Ordinal - ordered categories

Examples

Ordinal

patient condition

(excellent, good, fair, poor)

government spending

(too high, about right, too low)

Nominal

transport to work

(car, bus, bicycle, walk, ...)

favorite music

(rock, classical, jazz, country, folk, pop)
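
The nominal/ordinal distinction can be illustrated with a minimal Python sketch using the standard-library `enum` module. The class names `PatientCondition` and `Transport`, the helper `nominal_is_orderable`, and the numeric codes are illustrative assumptions; only the category labels come from the examples above.

```python
from enum import Enum, IntEnum


class PatientCondition(IntEnum):
    """Ordinal: categories have a natural order (codes are arbitrary but order-preserving)."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4


class Transport(Enum):
    """Nominal: categories are labels only, with no ordering."""
    CAR = "car"
    BUS = "bus"
    BICYCLE = "bicycle"
    WALK = "walk"


# Ordinal categories can be compared and sorted.
ordered = sorted(
    [PatientCondition.GOOD, PatientCondition.POOR, PatientCondition.EXCELLENT]
)


def nominal_is_orderable() -> bool:
    """Return False if an order comparison between nominal categories is rejected."""
    try:
        Transport.CAR < Transport.BUS  # plain Enum members define no ordering
        return True
    except TypeError:
        return False
```

Sorting works for the ordinal variable, while the nominal variable supports only equality checks. The spacing between the ordinal codes carries no meaning; only their order does.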

We pay special attention to binary variables (success - failure), for which the nominal - ordinal distinction is unimportant.

Probability Distributions for Categorical Data

The binomial distribution (and its generalization, the multinomial distribution) plays the role for categorical responses that the normal distribution plays for continuous responses.

Binomial Distribution

• n Bernoulli trials - two possible outcomes for each (success, failure)

• π = P(success), 1 − π = P(failure) for each trial

• Y = number of successes out of n trials

• Trials are independent

Y has the binomial distribution

$$P(y) = \frac{n!}{y!\,(n-y)!}\,\pi^{y}(1-\pi)^{n-y}, \qquad y = 0, 1, 2, \ldots, n$$

where $y! = y(y-1)(y-2)\cdots(1)$, with $0! = 1$ (factorial).
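
As a minimal sketch, the binomial probability formula above can be evaluated in Python with the standard-library `math.comb`, which computes n!/(y!(n − y)!). The function name `binomial_pmf` and the parameter name `pi` (the success probability) are illustrative choices, not part of the notes.

```python
from math import comb


def binomial_pmf(y: int, n: int, pi: float) -> float:
    """Return P(Y = y) for Y ~ Binomial(n, pi)."""
    if not 0 <= y <= n:
        return 0.0  # outcomes outside 0..n have probability zero
    # comb(n, y) equals n! / (y! (n - y)!)
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)
```

Summing `binomial_pmf(y, n, pi)` over y = 0, ..., n should give 1 up to floating-point rounding, which is a quick sanity check on any implementation.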

Example

Vote (Democrat, Republican)

Suppose π = P(Democrat) = 0.50.

For a random sample of size n = 3, let y = number of Democratic votes.

$$P(y) = \frac{3!}{y!\,(3-y)!}\,(.5)^{y}(.5)^{3-y}$$

$$P(0) = \frac{3!}{0!\,3!}\,(.5)^{0}(.5)^{3} = (.5)^{3} = 0.125$$

$$P(1) = \frac{3!}{1!\,2!}\,(.5)^{1}(.5)^{2} = 3(.5)^{3} = 0.375$$

y      P(y)
0      0.125
1      0.375
2      0.375
3      0.125
Total  1.0
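
The distribution for this example can be checked numerically. The following is a small sketch, assuming only the example's values n = 3 and π = 0.5; the variable names are illustrative.

```python
from math import comb

n, pi = 3, 0.5  # values from the voting example

# P(y) = C(n, y) * pi^y * (1 - pi)^(n - y) for y = 0, ..., n
table = {y: comb(n, y) * pi**y * (1 - pi) ** (n - y) for y in range(n + 1)}

print("y      P(y)")
for y, p in table.items():
    print(f"{y}      {p:.3f}")
print(f"Total  {sum(table.values()):.1f}")
```

Because π = 0.5, the distribution is symmetric: P(3) equals P(0) and P(2) equals P(1), and the four probabilities sum to 1.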
