Stat 521 Notes 21

Discrete Response Models:

Reading: Wooldridge, Chapter 15.

I. Binary Response Models

In the next few lectures, we will consider models in which the response variable is discrete. We will use maximum likelihood to estimate these models. Initially, we look at the case where the outcome is binary: yes/no, participation/nonparticipation, employed/unemployed. After that we will look at more complicated cases where the outcome may take on a number of values, possibly ordered (high school dropout / high school / college) or categorical (employed / unemployed / out of the labor force).

As a first example, we will look at the decision to go to college using data from the National Longitudinal Survey of Youth (NLSY). The data set, which we used in Notes 1, consists of 930 observations on men between 28 and 36 years of age. We will consider modeling how the decision to go to college depends on IQ, mother’s education, father’s education, and race. 452 of the 930 men went to college (49%).

nlsdata=read.table("nls_2008.txt");

logwage=nlsdata$V1;

educ=nlsdata$V2;

exper=nlsdata$V3;

age=nlsdata$V4;

fatheduc=nlsdata$V5; # father's education in years

motheduc=nlsdata$V6; # mother's education in years

kww=nlsdata$V7; # a test score

iq=nlsdata$V8; # IQ

white=nlsdata$V9; # Dummy variable for being white

college=as.numeric(educ>12); # Dummy for going to college (assumed definition: more than 12 years of schooling)

Let $y_i = 1$ if man $i$ went to college and 0 otherwise. Let $x_i$ = (1, iq, father’s education, mother’s education, white), where the leading 1 allows for an intercept in $x_i'\beta$.

The simplest thing to do is to use a linear probability model:

$$y_i = x_i'\beta + \varepsilon_i,$$

where $\varepsilon_i$ is assumed to be uncorrelated with $x_i$ so that $E[y_i \mid x_i] = P(y_i = 1 \mid x_i) = x_i'\beta$.

# Linear Probability model

linprob.model=lm(college~iq+fatheduc+motheduc+white);

summary(linprob.model);

Call:

lm(formula = college ~ iq + fatheduc + motheduc + white)

Residuals:

Min 1Q Median 3Q Max

-0.98541 -0.38893 0.01068 0.35239 1.38182

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) -1.108766 0.096198 -11.526 < 2e-16 ***

iq 0.014014 0.001045 13.405 < 2e-16 ***

fatheduc 0.012004 0.003289 3.650 0.000277 ***

motheduc 0.018019 0.004205 4.286 2.01e-05 ***

white -0.113874 0.046672 -2.440 0.014879 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4296 on 925 degrees of freedom

Multiple R-squared: 0.2652, Adjusted R-squared: 0.262

F-statistic: 83.44 on 4 and 925 DF, p-value: < 2.2e-16

A problem with the linear probability model is that for some values of the covariates it may produce predicted values outside the interval [0, 1]. For example, a man with an IQ of 130 whose parents both got doctorates (20 years of education each; the calculation takes white = 0) is estimated to have a 1.31 “probability” of going to college:

-1.108766+130*.014014+20*.012004+20*.018019

[1] 1.313514

Such predictions would clearly be awkward, and in practice researchers adjust them to lie within the [0,1] interval, but that is ad hoc.
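A minimal sketch of such an adjustment, clipping the fitted values from the model above:

# Ad hoc adjustment: clip the LPM fitted values to [0,1] (illustration only)
phat.lpm=fitted(linprob.model);
phat.clipped=pmin(pmax(phat.lpm,0),1);
summary(phat.clipped);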

An alternative approach is to use a latent index model. The latent index $y_i^*$ is assumed to follow a linear model

$$y_i^* = x_i'\beta + \varepsilon_i. \qquad (1.1)$$

We do not observe $y_i^*$ but observe $y_i$, which is equal to the indicator that the latent index is positive:

$$y_i = 1\{y_i^* > 0\}.$$

Different assumptions about the distribution of $\varepsilon_i$ lead to different models.

Probit model:

The probit model assumes that the conditional distribution of $\varepsilon_i$ given $x_i$ is standard normal (i.e., normal with mean zero and variance one). The reason for fixing the variance is that we cannot distinguish between a data generating process with parameter $\beta$ and variance $\sigma^2$ and one with parameter $\beta/\sigma$ and variance 1. Thus, fixing the variance of $\varepsilon_i$ to be one is just a normalization and not a substantive assumption.
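To see the normalization concretely, here is a small check (with a made-up value of $x_i'\beta$) that scaling $\beta$ and $\sigma$ by the same constant leaves the implied probability unchanged:

# Normalization check: P(y=1|x)=Phi(x'beta/sigma) is unchanged when beta
# and sigma are scaled by the same constant (made-up value of x'beta)
xb=1.2; sigma=1;
pnorm(xb/sigma); # probability under (beta, sigma)
pnorm((3*xb)/(3*sigma)); # identical under (3*beta, 3*sigma)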

Under the probit model, the probability that $y_i$ is 1 is

$$P(y_i = 1 \mid x_i) = P(x_i'\beta + \varepsilon_i > 0 \mid x_i) = \Phi(x_i'\beta),$$

where $\Phi$ is the standard normal CDF.

The log likelihood is

$$\log L(\beta) = \sum_{i=1}^{n} \left[ y_i \log \Phi(x_i'\beta) + (1 - y_i) \log\left(1 - \Phi(x_i'\beta)\right) \right]$$

with first derivatives

$$\frac{\partial \log L}{\partial \beta} = \sum_{i=1}^{n} x_i \, \phi(x_i'\beta) \left[ \frac{y_i}{\Phi(x_i'\beta)} - \frac{1 - y_i}{1 - \Phi(x_i'\beta)} \right].$$

Solving for the maximum likelihood estimator requires numerical optimization. It is made slightly more complicated by the fact that even evaluating the log likelihood requires numerical methods, since there is no closed-form expression for the normal distribution function. R has an automatic way to fit probit models: the probit model is a special case of a generalized linear model, which can be fit using the glm command.
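As a sketch of the computation glm performs, the probit log likelihood can also be maximized directly with optim (using the variables read in above); the estimates should agree with the glm fit below up to numerical tolerance.

# Sketch: direct maximum likelihood for the probit using optim
X=cbind(1,iq,fatheduc,motheduc,white);
negloglik=function(beta){
xb=as.vector(X%*%beta);
# use the log CDF for numerical stability; P(y=0|x)=Phi(-x'beta)
-sum(college*pnorm(xb,log.p=TRUE)+(1-college)*pnorm(-xb,log.p=TRUE));
}
probit.mle=optim(rep(0,5),negloglik,method="BFGS");
probit.mle$par; # should be close to coef(probit.model)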

# Probit Model

y=college

probit.model=glm(y~iq+fatheduc+motheduc+white,family=binomial(link="probit"));

summary(probit.model);

Call:

glm(formula = y ~ iq + fatheduc + motheduc + white, family = binomial(link = "probit"))

Deviance Residuals:

Min 1Q Median 3Q Max

-2.3559 -0.9472 -0.2363 0.8727 3.4654

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -5.158770 0.376084 -13.717 < 2e-16 ***

iq 0.044431 0.003811 11.659 < 2e-16 ***

fatheduc 0.036253 0.010546 3.438 0.000587 ***

motheduc 0.060168 0.014110 4.264 2.01e-05 ***

white -0.318401 0.159044 -2.002 0.045288 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1288.5 on 929 degrees of freedom

Residual deviance: 1004.9 on 925 degrees of freedom

AIC: 1014.9

Number of Fisher Scoring iterations: 4

IQ, father’s education, mother’s education and white are all significantly associated with going to college.
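Unlike the linear probability model, the probit keeps predictions inside [0, 1]. For the earlier example (IQ of 130, 20 years of education for each parent, white = 0):

# Probit prediction for the earlier example stays inside [0,1]
pnorm(-5.158770+130*0.044431+20*0.036253+20*0.060168); # about 0.995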

The parameter estimates themselves are not all that meaningful. A more meaningful summary of the effect of a variable is the average derivative – the average across units of the derivative of $P(y_i = 1 \mid x_i)$ with respect to $x_i$. For the normal linear regression model, the average derivative is just $\beta$. For the probit model, the derivative for unit $i$ is

$$\frac{\partial}{\partial x_i} P(y_i = 1 \mid x_i) = \phi(x_i'\beta) \, \beta,$$

where $\phi$ is the standard normal probability density function. The average derivative is

$$\frac{1}{n} \sum_{i=1}^{n} \phi(x_i'\hat{\beta}) \, \hat{\beta}.$$

# Calculate average derivatives for Probit Model

x.tr.beta=predict.glm(probit.model); # This returns X'beta for each unit

phi.x.tr.beta=dnorm(x.tr.beta); # standard normal density at X'beta

# Create a matrix in which each column is phi.x.tr.beta and there are as

# many columns as coefficients in beta

tempmat=matrix(rep(phi.x.tr.beta,length(coef(probit.model))),ncol=length(coef(probit.model)));

# Create a matrix in which each row is beta and there are number of rows

# equal to number of units

tempmat2=matrix(rep(coef(probit.model),length(y)),byrow=TRUE,nrow=length(y));

# Calculate derivatives for each unit

derivmat=tempmat*tempmat2;

# Compute average derivatives

avgderivmat=apply(derivmat,2,mean);

> avgderivmat

[1] -1.58666259 0.01366561 0.01115014 0.01850571 -0.09792926

The average derivatives are:

|                    | Average Derivative | Standard Error |
| Intercept          | -1.587             | 0.092          |
| IQ                 | 0.014              | 0.001          |
| Father’s Education | 0.011              | 0.003          |
| Mother’s Education | 0.019              | 0.005          |
| White              | -0.098             | 0.058          |

A one point increase in IQ increases the probability that a person goes to college by 0.014, holding everything else fixed.

The standard errors are calculated using the bootstrap.

# Bootstrap standard errors for average derivatives

noboots=200;

set.seed(10);

avgderivboots=matrix(rep(0,noboots*5),ncol=5);

for(i in 1:noboots){

resamplevec=sample(1:length(college),replace=TRUE);

probit.model=glm(college[resamplevec]~iq[resamplevec]+fatheduc[resamplevec]+motheduc[resamplevec]+white[resamplevec],family=binomial(link="probit"));

# Calculate average derivatives for Probit Model

x.tr.beta=predict.glm(probit.model); # This returns X'beta for each unit

phi.x.tr.beta=dnorm(x.tr.beta); # standard normal density at X'beta

# Create a matrix in which each column is phi.x.tr.beta and there are as

# many columns as coefficients in beta

tempmat=matrix(rep(phi.x.tr.beta,length(coef(probit.model))),ncol=length(coef(probit.model)));

# Create a matrix in which each row is beta and there are number of rows

# equal to number of units

tempmat2=matrix(rep(coef(probit.model),length(y)),byrow=TRUE,nrow=length(y));

# Calculate derivatives for each unit

derivmat=tempmat*tempmat2;

# Compute average derivatives

avgderivmat=apply(derivmat,2,mean);

avgderivboots[i,]=avgderivmat;

}

se.avgderiv.boot=apply(avgderivboots,2,sd);

> se.avgderiv.boot

[1] 0.091795369 0.001179432 0.003398914 0.004747073 0.058200784

Logit Model:

The logit model assumes that the distribution of $\varepsilon_i$ in (1.1) is logistic:

$$P(\varepsilon_i \le t \mid x_i) = \frac{e^t}{1 + e^t}.$$

The logit model has

$$P(y_i = 1 \mid x_i) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}.$$

The model can be estimated by maximum likelihood using the glm command with link="logit".

# Logit Model

logit.model=glm(college~iq+fatheduc+motheduc+white,family=binomial(link="logit"));

summary(logit.model);

Call:

glm(formula = college ~ iq + fatheduc + motheduc + white, family = binomial(link = "logit"))

Deviance Residuals:

Min 1Q Median 3Q Max

-2.3172 -0.9200 -0.2677 0.8635 3.1273

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -8.906264 0.690648 -12.896 < 2e-16 ***

iq 0.076816 0.006842 11.228 < 2e-16 ***

fatheduc 0.061100 0.017804 3.432 0.000599 ***

motheduc 0.104628 0.024321 4.302 1.69e-05 ***

white -0.585277 0.275094 -2.128 0.033374 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1288.5 on 929 degrees of freedom

Residual deviance: 1001.5 on 925 degrees of freedom

AIC: 1011.5

Number of Fisher Scoring iterations: 4

The coefficients from the logistic regression have an interpretation in terms of odds ratios. The odds of $y_i$ being 1 conditional on $x_i$ are

$$\frac{P(y_i = 1 \mid x_i)}{P(y_i = 0 \mid x_i)} = \exp(x_i'\beta).$$

For the coefficient $\beta_k$ associated with the variable $x_{ik}$, $e^{\beta_k}$ is the ratio of the odds of $y_i = 1$ when $x_{ik}$ is increased by 1, holding the other components of $x_i$ fixed, to the odds of $y_i = 1$ at the original $x_i$.
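Exponentiating the fitted coefficients gives the estimated odds ratios:

# Odds ratios: multiplicative change in the odds of y=1 for a one-unit
# increase in each covariate
exp(coef(logit.model));

For example, exp(0.076816) is about 1.08, so a one point increase in IQ is estimated to multiply the odds of going to college by about 1.08.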

The average derivatives for the logit model are

$$\frac{1}{n} \sum_{i=1}^{n} \frac{\exp(x_i'\hat{\beta})}{\left(1 + \exp(x_i'\hat{\beta})\right)^2} \, \hat{\beta}.$$

# Calculate average derivatives for Logit Model

x.tr.beta=predict.glm(logit.model); # This returns X'beta for each unit (note: logit.model, not the probit fit)

xfunc=exp(x.tr.beta)/(1+exp(x.tr.beta))^2;

# Create a matrix in which each column is xfunc and there are as

# many columns as coefficients in beta

tempmat=matrix(rep(xfunc,length(coef(logit.model))),ncol=length(coef(logit.model)));

# Create a matrix in which each row is beta and there are number of rows

# equal to number of units

tempmat2=matrix(rep(coef(logit.model),length(y)),byrow=TRUE,nrow=length(y));

# Calculate derivatives for each unit

derivmat=tempmat*tempmat2;

# Compute average derivatives

avgderivmat=apply(derivmat,2,mean);

> avgderivmat

[1] -1.88362933 0.01624619 0.01292242 0.02212830 -0.12378316

Note that although the coefficient estimates of the probit and logit model are very different (and have different interpretations), the average derivatives are very close, suggesting that in practice it does not make much difference which model we use.

|                    | Average Derivative (Probit) | Average Derivative (Logit) |
| Intercept          | -1.587                      | -1.884                     |
| IQ                 | 0.014                       | 0.016                      |
| Father’s Education | 0.011                       | 0.013                      |
| Mother’s Education | 0.019                       | 0.022                      |
| White              | -0.098                      | -0.124                     |

II. Likelihood Ratio Tests

Consider the logit model. Suppose we would like to test the null hypothesis that neither father’s education nor mother’s education is associated with going to college conditional on IQ and race. In other words, we would like to test $H_0: \beta_{\text{fatheduc}} = \beta_{\text{motheduc}} = 0$.

When estimating a model by maximum likelihood, a general approach to testing a null hypothesis that sets certain parameters to a set value (such as zero) is the likelihood ratio test. The test statistic for the likelihood ratio test is

$$LR = 2 \left[ \log L(\hat{\beta}_{\text{unrestricted}}) - \log L(\hat{\beta}_{\text{restricted}}) \right].$$

Under the null hypothesis, $LR$ has a chi-squared distribution with degrees of freedom equal to the number of parameters being fixed under the null hypothesis. For the above null, the degrees of freedom is 2.

The residual deviance of a binary response model is equal to -2 times the maximized log likelihood.
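As a check, logLik extracts the maximized log likelihood from a fitted glm; for binary responses the residual deviance equals -2 times it, since the saturated model’s log likelihood is zero:

# Check: residual deviance equals -2 * maximized log likelihood
logLik(logit.model);
deviance(logit.model);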

# Maximum Likelihood under null or alternative

logit.model=glm(college~iq+fatheduc+motheduc+white,family=binomial(link="logit"));

summary(logit.model);

Call:

glm(formula = college ~ iq + fatheduc + motheduc + white, family = binomial(link = "logit"))

Deviance Residuals:

Min 1Q Median 3Q Max

-2.3172 -0.9200 -0.2677 0.8635 3.1273

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) -8.906264 0.690648 -12.896 < 2e-16 ***

iq 0.076816 0.006842 11.228 < 2e-16 ***

fatheduc 0.061100 0.017804 3.432 0.000599 ***

motheduc 0.104628 0.024321 4.302 1.69e-05 ***

white -0.585277 0.275094 -2.128 0.033374 *

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1288.5 on 929 degrees of freedom

Residual deviance: 1001.5 on 925 degrees of freedom

AIC: 1011.5

Number of Fisher Scoring iterations: 4

The residual deviance is 1001.5, so twice the maximized log likelihood is -1001.5.

# Maximum likelihood under null model

logit.model.null=glm(college~iq+white,family=binomial(link="logit"));

> logit.model.null

Call: glm(formula = college ~ iq + white, family = binomial(link = "logit"))

Coefficients:

(Intercept) iq white

-8.22890 0.08184 -0.18363

Degrees of Freedom: 929 Total (i.e. Null); 927 Residual

Null Deviance: 1289

Residual Deviance: 1058 AIC: 1064

The residual deviance is 1058, so twice the maximized log likelihood is -1058.

Thus,

$$LR = -1001.5 - (-1058) = 56.5.$$

The cutoff for rejecting the null hypothesis at the 0.05 significance level is

qchisq(.95,2)

[1] 5.991465

so we reject the null hypothesis. The p-value is

1-pchisq(56.5,2)

[1] 5.384582e-13

So there is strong evidence that at least one of father’s education and mother’s education is associated with going to college, holding fixed IQ and race.
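The same likelihood ratio test can be carried out in one step by comparing the nested fits with anova:

# Likelihood ratio test via anova on the nested model fits
anova(logit.model.null,logit.model,test="Chisq");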
