Stat 521 Notes 17
Discrete Response Models:
Reading: Wooldridge, Chapter 15.
I. Binary Response Models
In the next few lectures, we will consider models in which the response variable is discrete. We will use maximum likelihood to estimate these models. Initially, we look at the case where the outcome is binary: yes/no, participation/nonparticipation, employed/unemployed. After that we will look at more complicated cases where the outcome may take on a number of values, possibly ordered (high school dropout /
high school /college), or categorical (employed/unemployed/out-of-the-labor-force).
As a first example, we will look at the decision to go to college using data from the National Longitudinal Survey of Youth (NLSY). The data set, which we used in Notes 1, consists of 930 observations on men between 28 and 36 years old. We will consider modeling how the decision to go to college depends on IQ, mother's education, father's education and race. Of the 930 men, 452 (49%) went to college.
nlsdata=read.table("nls_2008.txt");
logwage=nlsdata$V1;
educ=nlsdata$V2;
exper=nlsdata$V3;
age=nlsdata$V4;
fatheduc=nlsdata$V5; # father's education in years
motheduc=nlsdata$V6; # mother's education in years
kww=nlsdata$V7; # a test score
iq=nlsdata$V8; # IQ
white=nlsdata$V9; # Dummy variable for being white
college=as.numeric(educ>=13); # 1 if man went to college, 0 otherwise (assumed construction: more than 12 years of education)
Let $y_i = 1$ if man $i$ went to college and 0 otherwise. Let $x_i = $ (iq, father's education, mother's education, white).
The simplest thing to do is to use a linear probability model:
$y_i = x_i'\beta + \varepsilon_i$,
where $\varepsilon_i$ is assumed to be uncorrelated with $x_i$, so that $E[y_i \mid x_i] = P(y_i = 1 \mid x_i) = x_i'\beta$.
# Linear Probability model
linprob.model=lm(college~iq+fatheduc+motheduc+white);
summary(linprob.model);
Call:
lm(formula = college ~ iq + fatheduc + motheduc + white)
Residuals:
Min 1Q Median 3Q Max
-0.98541 -0.38893 0.01068 0.35239 1.38182
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.108766 0.096198 -11.526 < 2e-16 ***
iq 0.014014 0.001045 13.405 < 2e-16 ***
fatheduc 0.012004 0.003289 3.650 0.000277 ***
motheduc 0.018019 0.004205 4.286 2.01e-05 ***
white -0.113874 0.046672 -2.440 0.014879 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.4296 on 925 degrees of freedom
Multiple R-squared: 0.2652, Adjusted R-squared: 0.262
F-statistic: 83.44 on 4 and 925 DF, p-value: < 2.2e-16
A problem with the linear probability model is that, for some values of the covariates, it may produce predicted values outside the interval [0, 1]. For example, a person with an IQ of 130 whose parents both got doctorates (20 years of education each, with white = 0) is estimated to have a 1.31 "probability" of going to college:
-1.108766+130*.014014+20*.012004+20*.018019
[1] 1.313514
Such predictions would clearly be awkward, and in practice researchers adjust them to lie within the [0,1] interval, but that is ad hoc.
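The same phenomenon is easy to reproduce on a toy data set. The following is a minimal sketch in Python (the rest of these notes use R; this block and its synthetic data are purely illustrative, not from the NLSY sample): ordinary least squares fit to a binary outcome happily produces fitted "probabilities" above 1.

```python
# Synthetic illustration: OLS on a binary outcome can produce
# fitted values outside [0, 1].
import numpy as np

x = np.array([0., 1., 2., 3., 4.])
y = np.array([0., 0., 1., 1., 1.])  # binary outcome

# OLS slope and intercept by hand (population cov/var, consistent pair)
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()

fitted = intercept + slope * x
print(fitted)  # the fitted value at x = 4 exceeds 1
```

Here the fitted line is 0 + 0.3x, so the "probability" at x = 4 is 1.2, outside [0, 1].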
An alternative approach is to use a latent index model. The latent index $y_i^*$ is assumed to follow a linear model
$y_i^* = x_i'\beta + \varepsilon_i$.   (1.1)
We do not observe $y_i^*$, but observe $y_i$, which is equal to the indicator that the latent index is positive:
$y_i = 1(y_i^* > 0)$.
Different assumptions about the distribution of $\varepsilon_i$ lead to different models.
Probit model:
The probit model assumes that the conditional distribution of $\varepsilon_i$ given $x_i$ is standard normal (i.e., normal with mean zero and variance one). The reason for fixing the variance is that we cannot distinguish between a data generating process with parameter $\beta$ and variance $\sigma^2$ and one with parameter $\beta/\sigma$ and variance 1. Thus, fixing the variance of $\varepsilon_i$ to be one is just a normalization and not a substantive assumption.
Under the probit model, the probability that $y_i$ is 1 is
$P(y_i = 1 \mid x_i) = \Phi(x_i'\beta)$,
where $\Phi$ is the standard normal CDF.
The log likelihood is
$\log L(\beta) = \sum_{i=1}^n \left[ y_i \log \Phi(x_i'\beta) + (1 - y_i) \log\left(1 - \Phi(x_i'\beta)\right) \right]$,
with first derivatives
$\frac{\partial \log L}{\partial \beta} = \sum_{i=1}^n \frac{\phi(x_i'\beta)\left[y_i - \Phi(x_i'\beta)\right]}{\Phi(x_i'\beta)\left[1 - \Phi(x_i'\beta)\right]}\, x_i$,
where $\phi$ is the standard normal density.
Solving for the maximum likelihood estimator requires numerical optimization. This is made slightly more complicated by the fact that even evaluating the log likelihood requires numerical methods, since there is no closed-form expression for the normal distribution function. R has an automatic way to fit probit models: the probit model is a special case of a generalized linear model, which can be fit using the glm command in R.
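To make the maximization concrete before turning to glm, here is a minimal sketch in Python (illustration only; the data are synthetic, not the NLSY sample, and all names are made up) that writes out the probit log likelihood above and maximizes it with a general-purpose optimizer:

```python
# Sketch: probit maximum likelihood on synthetic data via direct
# numerical maximization of the log likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
beta_true = np.array([0.5, -1.0])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)  # latent index rule

def neg_loglik(beta):
    # negative of the probit log likelihood
    p = np.clip(norm.cdf(X @ beta), 1e-10, 1 - 1e-10)  # guard against log(0)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

fit = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(fit.x)  # close to beta_true
```

Note the evaluation of $\Phi$ itself is numerical (norm.cdf), which is what makes the optimization slightly more involved than, say, least squares.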
# Probit Model
y=college
probit.model=glm(y~iq+fatheduc+motheduc+white,family=binomial(link="probit"));
summary(probit.model);
Call:
glm(formula = y ~ iq + fatheduc + motheduc + white, family = binomial(link = "probit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3559 -0.9472 -0.2363 0.8727 3.4654
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.158770 0.376084 -13.717 < 2e-16 ***
iq 0.044431 0.003811 11.659 < 2e-16 ***
fatheduc 0.036253 0.010546 3.438 0.000587 ***
motheduc 0.060168 0.014110 4.264 2.01e-05 ***
white -0.318401 0.159044 -2.002 0.045288 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1288.5 on 929 degrees of freedom
Residual deviance: 1004.9 on 925 degrees of freedom
AIC: 1014.9
Number of Fisher Scoring iterations: 4
IQ, father’s education, mother’s education and white are all significantly associated with going to college.
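Unlike the linear probability model, the probit maps any covariate values into [0, 1]. Using the estimated coefficients reported above, the same person who broke the linear probability model (IQ 130, both parents with 20 years of education, white = 0 as in the earlier calculation) gets a sensible predicted probability:

```python
# Predicted probit probability for the covariate values that gave the
# linear probability model a fitted value of 1.31 (coefficients are the
# probit estimates reported above; white = 0).
from scipy.stats import norm

index = -5.158770 + 130 * 0.044431 + 20 * 0.036253 + 20 * 0.060168
prob = norm.cdf(index)
print(round(prob, 3))  # about 0.995, inside [0, 1]
```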
The parameter estimates themselves are not all that meaningful. A more meaningful summary of the effect of a variable is the average derivative: the average across units of the derivative of $P(y_i = 1 \mid x_i)$ with respect to $x_i$. For the normal linear regression model, the average derivative is just $\beta$. For the probit model, the derivative for unit $i$ is
$\frac{\partial P(y_i = 1 \mid x_i)}{\partial x_i} = \phi(x_i'\beta)\,\beta$,
where $\phi$ is the standard normal probability density function. The average derivative is
$\frac{1}{n}\sum_{i=1}^n \phi(x_i'\hat{\beta})\,\hat{\beta}$.
# Calculate average derivatives for Probit Model
x.tr.beta=predict.glm(probit.model); # This returns X'beta for each unit
phi.x.tr.beta=dnorm(x.tr.beta); # standard normal density at X'beta
# Create a matrix in which each column is phi.x.tr.beta and there are as
# many columns as coefficients in beta
tempmat=matrix(rep(phi.x.tr.beta,length(coef(probit.model))),ncol=length(coef(probit.model)));
# Create a matrix in which each row is beta and there are number of rows
# equal to number of units
tempmat2=matrix(rep(coef(probit.model),length(y)),byrow=TRUE,nrow=length(y));
# Calculate derivatives for each unit
derivmat=tempmat*tempmat2;
# Compute average derivatives
avgderivmat=apply(derivmat,2,mean);
> avgderivmat
[1] -1.58666259 0.01366561 0.01115014 0.01850571 -0.09792926
The average derivatives are:
| |Average Derivative |Standard Error |
|Intercept |-1.587 |0.092 |
|IQ |0.014 |0.001 |
|Father’s Education |0.011 |0.003 |
|Mother’s Education |0.019 |0.005 |
|White |-0.098 |0.058 |
A one point increase in IQ increases the probability that a person goes to college by 0.014, holding everything else fixed.
The standard errors are calculated using the bootstrap.
# Bootstrap standard errors for average derivatives
noboots=200;
set.seed(10);
avgderivboots=matrix(rep(0,noboots*5),ncol=5);
for(i in 1:noboots){
resamplevec=sample(1:length(college),replace=TRUE);
probit.model=glm(college[resamplevec]~iq[resamplevec]+fatheduc[resamplevec]+motheduc[resamplevec]+white[resamplevec],family=binomial(link="probit"));
# Calculate average derivatives for Probit Model
x.tr.beta=predict.glm(probit.model); # This returns X'beta for each unit
phi.x.tr.beta=dnorm(x.tr.beta); # standard normal density at X'beta
# Create a matrix in which each column is phi.x.tr.beta and there are as
# many columns as coefficients in beta
tempmat=matrix(rep(phi.x.tr.beta,length(coef(probit.model))),ncol=length(coef(probit.model)));
# Create a matrix in which each row is beta and there are number of rows
# equal to number of units
tempmat2=matrix(rep(coef(probit.model),length(y)),byrow=TRUE,nrow=length(y));
# Calculate derivatives for each unit
derivmat=tempmat*tempmat2;
# Compute average derivatives
avgderivmat=apply(derivmat,2,mean);
avgderivboots[i,]=avgderivmat;
}
se.avgderiv.boot=apply(avgderivboots,2,sd);
> se.avgderiv.boot
[1] 0.091795369 0.001179432 0.003398914 0.004747073 0.058200784
Logit Model:
The logit model assumes that the distribution of $\varepsilon_i$ in (1.1) is logistic:
$F(\varepsilon) = \frac{e^{\varepsilon}}{1 + e^{\varepsilon}}$.
The logit model has
$P(y_i = 1 \mid x_i) = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$.
The model can be estimated by maximum likelihood using the glm command with link="logit".
# Logit Model
logit.model=glm(college~iq+fatheduc+motheduc+white,family=binomial(link="logit"));
summary(logit.model);
Call:
glm(formula = college ~ iq + fatheduc + motheduc + white, family = binomial(link = "logit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3172 -0.9200 -0.2677 0.8635 3.1273
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.906264 0.690648 -12.896 < 2e-16 ***
iq 0.076816 0.006842 11.228 < 2e-16 ***
fatheduc 0.061100 0.017804 3.432 0.000599 ***
motheduc 0.104628 0.024321 4.302 1.69e-05 ***
white -0.585277 0.275094 -2.128 0.033374 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1288.5 on 929 degrees of freedom
Residual deviance: 1001.5 on 925 degrees of freedom
AIC: 1011.5
Number of Fisher Scoring iterations: 4
The coefficients from the logistic regression have an interpretation in terms of odds ratios. The odds of $y_i$ being 1 conditional on $x_i$ are $\frac{P(y_i = 1 \mid x_i)}{P(y_i = 0 \mid x_i)} = \exp(x_i'\beta)$. For the coefficient $\beta_k$ associated with the variable $x_{ik}$, $e^{\beta_k}$ is the ratio of the odds of $y_i = 1$ when $x_{ik}$ is increased by 1 (holding the other components of $x_i$ fixed) to the odds of $y_i = 1$ at the original $x_i$.
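For example, using the estimated logit coefficient on iq from the output above, each additional IQ point multiplies the odds of going to college by about 1.08, i.e., an 8% increase in the odds:

```python
# Odds ratio implied by the estimated logit coefficient on iq (0.076816,
# from the glm output above).
import math

odds_ratio = math.exp(0.076816)
print(round(odds_ratio, 3))  # about 1.08
```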
The average derivatives for the logit model are
$\frac{1}{n}\sum_{i=1}^n \frac{\exp(x_i'\hat{\beta})}{\left(1 + \exp(x_i'\hat{\beta})\right)^2}\,\hat{\beta}$.
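The weight $\frac{\exp(z)}{(1 + \exp(z))^2}$ in the average derivative is the logistic density, which equals $F(z)(1 - F(z))$ for the logistic CDF $F$. A quick numerical check of this identity (a Python snippet for illustration; the data analysis itself stays in R):

```python
# Check: the logistic density exp(z)/(1+exp(z))^2 equals F(z)*(1-F(z)),
# where F is the logistic CDF.
import math

def logistic_cdf(z):
    return math.exp(z) / (1 + math.exp(z))

z = 0.7  # an arbitrary evaluation point
density = math.exp(z) / (1 + math.exp(z)) ** 2
print(abs(density - logistic_cdf(z) * (1 - logistic_cdf(z))) < 1e-12)  # True
```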
# Calculate average derivatives for Logit Model
x.tr.beta=predict.glm(logit.model); # This returns X'beta for each unit
xfunc=exp(x.tr.beta)/(1+exp(x.tr.beta))^2;
# Create a matrix in which each column is xfunc and there are as
# many columns as coefficients in beta
tempmat=matrix(rep(xfunc,length(coef(logit.model))),ncol=length(coef(logit.model)));
# Create a matrix in which each row is beta and there are number of rows
# equal to number of units
tempmat2=matrix(rep(coef(logit.model),length(y)),byrow=TRUE,nrow=length(y));
# Calculate derivatives for each unit
derivmat=tempmat*tempmat2;
# Compute average derivatives
avgderivmat=apply(derivmat,2,mean);
> avgderivmat
[1] -1.88362933 0.01624619 0.01292242 0.02212830 -0.12378316
Note that although the coefficient estimates of the probit and logit model are very different (and have different interpretations), the average derivatives are very close, suggesting that in practice it does not make much difference which model we use.
| |Average Derivative (Probit) |Average Derivative (Logit) |
|Intercept |-1.587 |-1.884 |
|IQ |0.014 |0.016 |
|Father’s Education |0.011 |0.013 |
|Mother’s Education |0.019 |0.022 |
|White |-0.098 |-0.124 |
II. Likelihood Ratio Tests
Consider the logit model. Suppose we would like to test the null hypothesis that neither father's education nor mother's education is associated with going to college conditional on IQ and race. In other words, we would like to test
$H_0: \beta_{fatheduc} = \beta_{motheduc} = 0$.
When estimating a model by maximum likelihood, a general approach to testing a null hypothesis that fixes certain parameters at set values (such as zero) is the likelihood ratio test. The test statistic for the likelihood ratio test is
$LR = 2\left(\log L_{unrestricted} - \log L_{restricted}\right)$.
Under the null hypothesis, $LR$ has a chi-squared distribution with degrees of freedom equal to the number of parameters being fixed under the null hypothesis. For the above null, the degrees of freedom is 2.
The residual deviance of a model is equal to -2*Maximized Log Likelihood.
# Maximum Likelihood under null or alternative
logit.model=glm(college~iq+fatheduc+motheduc+white,family=binomial(link="logit"));
summary(logit.model);
Call:
glm(formula = college ~ iq + fatheduc + motheduc + white, family = binomial(link = "logit"))
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3172 -0.9200 -0.2677 0.8635 3.1273
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.906264 0.690648 -12.896 < 2e-16 ***
iq 0.076816 0.006842 11.228 < 2e-16 ***
fatheduc 0.061100 0.017804 3.432 0.000599 ***
motheduc 0.104628 0.024321 4.302 1.69e-05 ***
white -0.585277 0.275094 -2.128 0.033374 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1288.5 on 929 degrees of freedom
Residual deviance: 1001.5 on 925 degrees of freedom
AIC: 1011.5
Number of Fisher Scoring iterations: 4
The residual deviance is 1001.5 so 2*maximized log likelihood is -1001.5.
# Maximum likelihood under null model
logit.model.null=glm(college~iq+white,family=binomial(link="logit"));
> logit.model.null
Call: glm(formula = college ~ iq + white, family = binomial(link = "logit"))
Coefficients:
(Intercept) iq white
-8.22890 0.08184 -0.18363
Degrees of Freedom: 929 Total (i.e. Null); 927 Residual
Null Deviance: 1289
Residual Deviance: 1058 AIC: 1064
The residual deviance is 1058, so 2*maximized log likelihood is -1058.
Thus,
$LR = 2\log L_{unrestricted} - 2\log L_{restricted} = -1001.5 - (-1058) = 56.5$.
The cutoff for rejecting the null hypothesis at the 0.05 level is
qchisq(.95,2)
[1] 5.991465
so we reject the null hypothesis. The p-value is
1-pchisq(56.5,2)
[1] 5.384582e-13
So there is strong evidence that at least one of father’s education and mother’s education is associated with going to college holding fixed iq and race.
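The p-value can also be verified outside R (a Python check for illustration): for 2 degrees of freedom, the chi-squared tail probability has the closed form $P(\chi^2_2 > c) = e^{-c/2}$, so the p-value for LR = 56.5 can be computed directly.

```python
# Verify the likelihood ratio test p-value: chi-squared(2) tail
# probability equals exp(-c/2).
import math
from scipy.stats import chi2

lr = 1058 - 1001.5          # the LR statistic computed above
p_value = chi2.sf(lr, df=2)
print(p_value)              # about 5.38e-13, matching 1 - pchisq(56.5, 2) in R
print(math.exp(-lr / 2))    # same value via the closed form
```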