Wharton Statistics Department



Stat 521 Notes 14

Models for Panel Data that Violate the Strict Exogeneity Assumption

Reading: Wooldridge, Chapter 11.1

I. Review of Fixed Effects Model

$y_{it} = x_{it}\beta + c_i + u_{it}, \quad i = 1,\dots,N, \ t = 1,\dots,T$ (1.1)

This is a structural/causal equation in the sense that $\beta$ is supposed to represent the causal effect of changes in $x_{it}$ holding everything else fixed, and $c_i + u_{it}$ represents the effect of all omitted variables on the outcome $y_{it}$. $c_i + u_{it}$ can be correlated with $x_{it}$. The key assumption of strict exogeneity in the fixed effects model is that the correlation between $x_{it}$ and $c_i + u_{it}$ arises only through the time invariant part $c_i$ of $c_i + u_{it}$.

Assumptions of Fixed Effects Model:

1. Assumption 1 (Independence Between Units): The vectors of individual outcomes $(y_{i1},\dots,y_{iT})$ and $(y_{j1},\dots,y_{jT})$ are independent for $i \neq j$.

2. Assumption 2 (Strict Exogeneity): $E(u_{it} \mid x_{i1},\dots,x_{iT}, c_i) = 0$. $u_{it}$ can be thought of as a time varying shock that is independent of the unobserved individual characteristic $c_i$ and the observed characteristics $x_{i1},\dots,x_{iT}$.

2A. Assumption 2A. The $u_{it}$'s are independent and identically distributed with constant variance $\sigma_u^2$. This assumption is made in the usual fixed effects inferences but can be relaxed by using robust standard errors.

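Fixed Effects Model Estimation: Let $D_{ij}$ be a dummy variable for unit $j$, i.e., $D_{ij} = 1$ if $i = j$ and $D_{ij} = 0$ otherwise.

Then we can write model (1.1) as

$y_{it} = x_{it}\beta + \sum_{j=1}^{N} c_j D_{ij} + u_{it}$

We can estimate $\beta$ by least squares regression of $y_{it}$ on $x_{it}, D_{i1},\dots,D_{iN}$, pooling together the data for $i = 1,\dots,N$, $t = 1,\dots,T$.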
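The dummy-variable regression described above can be checked on simulated data. The following is a minimal sketch, not part of the original notes: the panel dimensions, the true coefficient of 2, and all variable names are illustrative assumptions. It shows that least squares with one dummy per unit gives the same slope as regressing unit-demeaned outcomes on unit-demeaned regressors, while pooled least squares without the dummies is pulled away from the truth when the unit effect is correlated with the regressor.

```r
set.seed(1)
N <- 50; T <- 6                        # hypothetical panel dimensions
id <- rep(1:N, each = T)               # unit index for each row
c_i <- rnorm(N)                        # unobserved unit effects
x <- 0.8 * c_i[id] + rnorm(N * T)      # regressor correlated with the unit effect
y <- 2 * x + c_i[id] + rnorm(N * T)    # outcome; true slope is 2 (assumed)

# (a) Dummy-variable regression: factor(id) creates one dummy per unit
beta_lsdv <- unname(coef(lm(y ~ x + factor(id)))["x"])

# (b) Within regression: subtract each unit's mean from x and y
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
beta_within <- unname(coef(lm(y_dm ~ x_dm - 1))["x_dm"])

# (c) Pooled least squares that ignores the unit effects, for contrast
beta_pooled <- unname(coef(lm(y ~ x))["x"])

print(c(lsdv = beta_lsdv, within = beta_within, pooled = beta_pooled))
```

The first two numbers agree up to rounding error, which is why the dummy-variable regression is often computed by demeaning when the number of units is large.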

Example: Papke (1994, Journal of Public Economics, “Tax Policy and Urban Development”) studies the effect of urban enterprise zones (EZs) on economic outcomes such as unemployment claims. Urban enterprise zones encourage development in blighted neighborhoods by offering entrepreneurs and investors tax and regulatory relief if they start businesses in the area. Papke considers a panel of 22 areas in Indiana from 1980 to 1988. Indiana’s EZ program began in 1983. To qualify for consideration to be an EZ, an area must have an unemployment rate at least 1.5 times the average statewide unemployment rate, and a resident household poverty rate at least 25 percent above the U.S. poverty level. Areas in the panel became EZs at different time points and some areas did not become EZs at all.

One model Papke uses is a fixed effects model:

$y_{it} = \theta_t + \beta\, ez_{it} + c_i + u_{it}$

where $y_{it}$ is the log of unemployment claims in area i in year t, $\theta_t$ is the coefficient on the time dummy variable for year t, and $ez_{it}$ is a dummy variable for whether the ith area was an EZ in year t. The fixed effects $c_i$ take account of permanent differences across areas such as industrial composition and composition of the labor force.

ezdata=read.table("ezunem.raw",header=TRUE,sep=",");
area=ezdata$city;
year=ezdata$year;
luclms=ezdata$luclms; # Log of unemployment claims
lag_luclms=ezdata$lag_luclms; # Log of unemployment claims in previous year in area
ez=ezdata$ez; # Whether area is an urban enterprise zone in given year

# Fixed Effects Model: year dummies and one dummy per area
femodel=lm(luclms~ez+as.factor(year)+as.factor(area));
summary(femodel)

Call:
lm(formula = luclms ~ ez + as.factor(year) + as.factor(area))

Residuals:
     Min       1Q   Median       3Q      Max
-0.57618 -0.10837 -0.00977  0.11364  0.49623

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)          11.67615    0.08008 145.808  < 2e-16 ***
ez                   -0.10441    0.05542  -1.884 0.061291 .
as.factor(year)1981  -0.32163    0.06046  -5.320 3.30e-07 ***
as.factor(year)1982   0.13550    0.06046   2.241 0.026332 *
as.factor(year)1983  -0.21926    0.06046  -3.627 0.000381 ***
as.factor(year)1984  -0.57915    0.06232  -9.294  < 2e-16 ***
as.factor(year)1985  -0.59179    0.06550  -9.036 3.92e-16 ***
as.factor(year)1986  -0.62126    0.06550  -9.486  < 2e-16 ***
as.factor(year)1987  -0.88895    0.06550 -13.573  < 2e-16 ***
as.factor(year)1988  -1.22763    0.06550 -18.744  < 2e-16 ***
as.factor(area)2     -0.19348    0.09941  -1.946 0.053296 .
as.factor(area)3     -0.37894    0.09941  -3.812 0.000194 ***
as.factor(area)4     -0.54117    0.09941  -5.444 1.83e-07 ***
as.factor(area)5      0.01103    0.09472   0.116 0.907407
as.factor(area)6      0.55458    0.09452   5.867 2.32e-08 ***
as.factor(area)7      0.75007    0.09452   7.935 2.90e-13 ***
as.factor(area)8     -0.05876    0.09472  -0.620 0.535900
as.factor(area)9      0.35343    0.09472   3.731 0.000261 ***
as.factor(area)10     1.64501    0.09941  16.548  < 2e-16 ***
as.factor(area)11    -0.13032    0.09941  -1.311 0.191695
as.factor(area)12    -0.03498    0.09941  -0.352 0.725392
as.factor(area)13    -0.83257    0.09941  -8.375 2.15e-14 ***
as.factor(area)14    -0.87363    0.09472  -9.223  < 2e-16 ***
as.factor(area)15    -0.23542    0.09941  -2.368 0.019020 *
as.factor(area)16     0.43574    0.09941   4.383 2.06e-05 ***
as.factor(area)17    -0.44522    0.09452  -4.710 5.18e-06 ***
as.factor(area)18    -0.04289    0.09941  -0.431 0.666694
as.factor(area)19     0.09341    0.09941   0.940 0.348764
as.factor(area)20    -0.35098    0.09452  -3.713 0.000279 ***
as.factor(area)21     0.45779    0.09452   4.843 2.90e-06 ***
as.factor(area)22     0.21864    0.09941   2.199 0.029225 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2005 on 167 degrees of freedom
Multiple R-squared: 0.9332, Adjusted R-squared: 0.9212
F-statistic: 77.75 on 30 and 167 DF, p-value: < 2.2e-16

There is suggestive but inconclusive evidence that being designated an enterprise zone reduces log unemployment claims, p-value = 0.06; the estimated effect is that being designated an enterprise zone reduces unemployment claims by about 10%.
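Because the outcome is in logs, the "about 10%" reading comes from converting the ez coefficient to a percentage change. The short sketch below does that arithmetic using only the estimate, standard error, and residual degrees of freedom reported in the regression output above; the variable names are illustrative, and the t-based interval assumes the usual homoskedastic fixed effects inference.

```r
b_ez  <- -0.10441    # coefficient on ez from the regression output
se_ez <- 0.05542     # its standard error
df    <- 167         # residual degrees of freedom

# Exact percentage change implied by a log-scale coefficient
pct_effect <- 100 * (exp(b_ez) - 1)

# 95% confidence interval on the log scale, then converted to percent
ci_log <- b_ez + c(-1, 1) * qt(0.975, df) * se_ez
ci_pct <- 100 * (exp(ci_log) - 1)

print(round(pct_effect, 1))
print(round(ci_pct, 1))
```

The point estimate corresponds to a reduction of just under 10%, and the interval contains zero, in line with the p-value of 0.06.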

Comparison of fixed effects vs. random effects model: Hausman test.

Note: For the Hausman test, I did not describe it entirely correctly in Notes 13. We should only consider coefficients on variables that are time-varying, i.e., we should not consider the intercept, the coefficients on the fixed effects or the coefficients on any other time-constant variables. Let $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$ denote the fixed effects and random effects estimates of the coefficients on the time-varying variables. The Hausman test statistic is

$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})^T \left[ \widehat{\mathrm{Var}}(\hat{\beta}_{FE}) - \widehat{\mathrm{Var}}(\hat{\beta}_{RE}) \right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE})$

Under the null hypothesis that the effects $c_i$ are uncorrelated with the time-varying variables that are included in the model, $H$ is asymptotically distributed as $\chi^2_k$ where $k$ is the dimension of $\hat{\beta}_{FE}$.

# Hausman test of fixed effects model vs. random effects model
# Coefficients 2:10 are ez and the eight year dummies (the time-varying variables)
betahat.femodel=coef(femodel)[2:10];
vcov.femodel=vcov(femodel)[2:10,2:10];

# Random Effects Estimates and Covariance Matrix using the lme4 package
library(lme4);
remodel=lmer(luclms~ez+as.factor(year)+(1|area));
betahat.remodel=fixef(remodel)[2:10];
vcov.remodel=vcov(remodel)[2:10,2:10];

# Hausman test statistic
h=matrix(betahat.femodel-betahat.remodel,nrow=1)%*%solve(vcov.femodel-vcov.remodel)%*%matrix(betahat.femodel-betahat.remodel,ncol=1);

# Compute p-value
pval=1-pchisq(as.numeric(h),length(betahat.femodel));

> h
1 x 1 Matrix of class "dgeMatrix"
           [,1]
[1,] 0.07975095
> pval
[1] 1

Here, there is no evidence against the null hypothesis of uncorrelated effects under the maintained hypothesis of strict exogeneity.

II. Models for When Strict Exogeneity Fails

The time varying disturbance $u_{it}$ captures shocks to an area that do not represent permanent characteristics of the area, e.g., closure of a business in an area. The closure of a business in year t-1 is likely to still affect employment at year t and thus $u_{it}$ is likely to be correlated with $u_{i,t-1}$. Furthermore, because the designation of an area as an enterprise zone depends on previous unemployment, $ez_{it}$ is likely to be correlated with $u_{i,t-1}$. This means strict exogeneity fails: $u_{it}$ is correlated with $ez_{it}$ through their mutual correlation with $u_{i,t-1}$. We can address this problem by adding the lagged outcome $y_{i,t-1}$ to the model:

$y_{it} = \theta_t + \beta\, ez_{it} + \rho\, y_{i,t-1} + c_i + u_{it}$ (1.2)

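Here $x_{it} = (ez_{it}, y_{i,t-1})$. It is not plausible that $x_{it}$ is strictly exogenous in (1.2) because this would mean $E(u_{it} \mid x_{i1},\dots,x_{iT}, c_i) = 0$, but $x_{i,t+1}$, which contains $y_{it}$, is correlated with $u_{it}$. However, the following sequential moment restriction is plausible:

$E(u_{it} \mid x_{it}, x_{i,t-1},\dots,x_{i1}, c_i) = 0, \quad t = 1,\dots,T$ (1.3)

When assumption (1.3) holds, we say that the $x_{it}$ are sequentially exogenous conditional on the unobserved effect.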

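Given model (1.2), assumption (1.3) is equivalent to

$E(y_{it} \mid x_{it}, x_{i,t-1},\dots,x_{i1}, c_i) = E(y_{it} \mid x_{it}, c_i)$.

Suppose $y_{i,t-1}$ is part of $x_{it}$ so we can write $x_{it} = (z_{it}, y_{i,t-1})$.

Sequential exogeneity implies that after $x_{it}$ and $c_i$ have been controlled for, no past values of $x_{it}$ affect the expected value of $y_{it}$. Strict exogeneity requires that after $x_{it}$ and $c_i$ have been controlled for, no values of $x_{i1},\dots,x_{iT}$ other than $x_{it}$ affect the expected value of $y_{it}$.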

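When sequential exogeneity holds but not strict exogeneity, the fixed effects estimator is inconsistent. Generally, as the number of units grows with $T$ fixed,

$\mathrm{plim}\ \hat{\beta}_{FE} = \beta + \left[ T^{-1} \sum_{t=1}^{T} E(\ddot{x}_{it}^T \ddot{x}_{it}) \right]^{-1} \left[ T^{-1} \sum_{t=1}^{T} E(\ddot{x}_{it}^T u_{it}) \right]$,

where $\ddot{x}_{it} = x_{it} - \bar{x}_i$ and $\bar{x}_i = T^{-1} \sum_{t=1}^{T} x_{it}$. Under sequential exogeneity,

$E(\ddot{x}_{it}^T u_{it}) = -E(\bar{x}_i^T u_{it})$ because $E(x_{it}^T u_{it}) = 0$, and so $T^{-1} \sum_{t=1}^{T} E(\ddot{x}_{it}^T u_{it}) = -E(\bar{x}_i^T \bar{u}_i)$, where $\bar{u}_i = T^{-1} \sum_{t=1}^{T} u_{it}$.

When $x_{it}$ includes $y_{i,t-1}$, then $\bar{x}_i$ is correlated with $\bar{u}_i$, meaning $E(\bar{x}_i^T \bar{u}_i)$ will not be zero and the fixed effects estimator will be biased.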
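The bias from including a lagged outcome can be seen in a small simulation. This is a minimal sketch under assumed values, not taken from the notes: the model contains only a lagged outcome and a unit effect, the true coefficient on the lagged outcome is set to 0.5, and the panel has many units but few time periods. All names are illustrative.

```r
set.seed(2)
N <- 2000; T <- 5; rho <- 0.5            # hypothetical dimensions and true coefficient
c_i <- rnorm(N)                          # unit effects
y <- matrix(NA, N, T + 1)
y[, 1] <- c_i / (1 - rho) + rnorm(N)     # start each unit near its long-run mean
for (t in 2:(T + 1)) y[, t] <- rho * y[, t - 1] + c_i + rnorm(N)

ycur <- y[, 2:(T + 1)]                   # y_it for the T usable periods
ylag <- y[, 1:T]                         # y_{i,t-1}

# Fixed effects (within) estimator: subtract each unit's time average
ycur_dm <- ycur - rowMeans(ycur)
ylag_dm <- ylag - rowMeans(ylag)
rho_fe <- sum(ylag_dm * ycur_dm) / sum(ylag_dm^2)

print(rho_fe)
```

Even with 2000 units the fixed effects estimate stays well below 0.5, because the demeaned lagged outcome contains every period's shock through the unit average. Adding more units does not remove the gap; only a longer panel shrinks it.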

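We can obtain a consistent estimator of $(\beta, \rho)$ in (1.2) under sequential exogeneity as follows. First, we take first differences to eliminate the $c_i$:

$\Delta y_{it} = \Delta\theta_t + \beta\, \Delta ez_{it} + \rho\, \Delta y_{i,t-1} + \Delta u_{it}$, where $\Delta y_{it} = y_{it} - y_{i,t-1}$

We cannot estimate $(\beta, \rho)$ in the above equation by least squares since $\Delta y_{i,t-1} = y_{i,t-1} - y_{i,t-2}$ is correlated with $\Delta u_{it} = u_{it} - u_{i,t-1}$. But we can use $y_{i,t-2}$ as an instrumental variable for $\Delta y_{i,t-1}$ since $y_{i,t-2}$ is uncorrelated with $\Delta u_{it}$ under sequential exogeneity. In other words, we have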

[pic]

where [pic] denotes best linear expectation. We can estimate [pic] by two stage least squares:

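1. Regress $\Delta y_{i,t-1}$ on $y_{i,t-2}$, $\Delta ez_{it}$ and time dummies by least squares to find the fitted values $\widehat{\Delta y}_{i,t-1}$.

2. Regress $\Delta y_{it}$ on $\Delta ez_{it}$, $\widehat{\Delta y}_{i,t-1}$ and time dummies to estimate $(\beta, \rho)$.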

# Calculate lagged EZ values (areas are coded 1 to 22)
lagez=rep(NA,length(ez));
for(i in 1:22){
  for(t in 1981:1988){
    lagez[(area==i & year==t)]=ez[(area==i & year==t-1)];
  }
}

# Calculate second lag of Y
second_lag_y=rep(NA,length(luclms));
for(i in 1:22){
  for(t in 1982:1988){
    second_lag_y[(area==i & year==t)]=luclms[(area==i & year==t-2)];
  }
}

# Calculate lag Y - second lag of Y
first_diff_lag_y=lag_luclms-second_lag_y;
# Calculate EZ minus lagged EZ
first_diff_ez=ez-lagez;
# Subset of observations where we have a second lag of y
subset=!(is.na(second_lag_y));
# Number of observations used in both stages
nobs=sum(subset);

# First stage regression
fsreg=lm(first_diff_lag_y[subset]~second_lag_y[subset]+first_diff_ez[subset]+as.factor(year[subset]));
first_diff_lag_y_hat=predict(fsreg);

# Second stage regression
first_diff_y=luclms-lag_luclms;
ssreg=lm(first_diff_y[subset]~first_diff_ez[subset]+first_diff_lag_y_hat+as.factor(year[subset]),x=TRUE);

# Calculate correct standard errors for two stage least squares
modelmat.ssreg=ssreg$x;
# Modify the model matrix so that actual first_diff_lag_y replaces
# first_diff_lag_y_hat (column 3 of the model matrix)
mod.modelmat=modelmat.ssreg;
mod.modelmat[,3]=first_diff_lag_y[subset];
# Calculate sigmahatusq where u is error in structural equation
sigmahatusq=(1/nobs)*sum((first_diff_y[subset]-mod.modelmat%*%matrix(coef(ssreg),ncol=1))^2);
# Calculate variance of residuals in second stage regression (sigmahatesq)
sigmahatesq=deviance(ssreg)/nobs;
# Variance is variance from second stage regression times sigmahatusq/sigmahatesq
tsls.var=vcov(ssreg)*(sigmahatusq/sigmahatesq);

# CI for beta (effect of EZ)
betahat=coef(ssreg)[2];
se.betahat=sqrt(tsls.var[2,2]);
lci.betahat=betahat-1.96*se.betahat;
uci.betahat=betahat+1.96*se.betahat;

# CI for rho (effect of lagged y)
rhohat=coef(ssreg)[3];
se.rhohat=sqrt(tsls.var[3,3]);
lci.rhohat=rhohat-1.96*se.rhohat;
uci.rhohat=rhohat+1.96*se.rhohat;

> betahat
first_diff_ez[subset]
           -0.2613225
> lci.betahat
first_diff_ez[subset]
            -0.575849
> uci.betahat
first_diff_ez[subset]
           0.05320405
> rhohat
first_diff_lag_y_hat
           0.3553252
> lci.rhohat
first_diff_lag_y_hat
          -0.8193982
> uci.rhohat
first_diff_lag_y_hat
            1.530049

The point estimate for $\beta$ is -0.26, which would mean that being designated an enterprise zone reduces unemployment claims by approximately 26%, but the CI for $\beta$ is pretty wide (-0.58, 0.05) and does contain 0; there is not strong evidence that enterprise zones reduce unemployment claims under this model. We will study a way to improve the efficiency of our estimate of $\beta$ in the next class.
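The two-stage procedure can also be checked on simulated data where the true coefficient on the lagged outcome is known. Using the twice-lagged outcome as an instrument in the first-differenced equation is commonly called the Anderson-Hsiao estimator. The sketch below is a simplified illustration under assumed values, not the notes' analysis: it omits the enterprise zone regressor and the time dummies, sets the true coefficient to 0.5, and uses illustrative names throughout.

```r
set.seed(3)
N <- 5000; T <- 6; rho <- 0.5            # hypothetical dimensions and true coefficient
c_i <- rnorm(N)                          # unit effects
y <- matrix(NA, N, T)
y[, 1] <- c_i / (1 - rho) + rnorm(N)     # start each unit near its long-run mean
for (t in 2:T) y[, t] <- rho * y[, t - 1] + c_i + rnorm(N)

# Stack periods t = 3..T of the differenced equation
dy     <- as.vector(y[, 3:T]       - y[, 2:(T - 1)])   # y_t - y_{t-1}
dy_lag <- as.vector(y[, 2:(T - 1)] - y[, 1:(T - 2)])   # y_{t-1} - y_{t-2}
z      <- as.vector(y[, 1:(T - 2)])                    # instrument y_{t-2}

# Stage 1: fitted values of the differenced lag given the instrument
dy_lag_hat <- fitted(lm(dy_lag ~ z))
# Stage 2: regress the differenced outcome on the stage 1 fitted values
rho_iv <- unname(coef(lm(dy ~ dy_lag_hat))["dy_lag_hat"])

# Least squares on the differenced equation without an instrument, for contrast
rho_ols <- unname(coef(lm(dy ~ dy_lag))["dy_lag"])

print(c(iv = rho_iv, ols_differenced = rho_ols))
```

The instrumented estimate lands near 0.5, while least squares on the differenced equation is far off because the differenced lag shares the period t-1 shock with the differenced error. As in the notes' code, standard errors from the second-stage lm fit would need to be corrected before being used for inference.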

More discussion of strict exogeneity vs. sequential exogeneity assumption

The strict exogeneity assumption

$E(u_{it} \mid x_{i1},\dots,x_{iT}, c_i) = 0$

implies that there is no feedback between lagged dependent variables and future values of the explanatory variable.

The sequential exogeneity assumption

$E(u_{it} \mid x_{it}, x_{i,t-1},\dots,x_{i1}, c_i) = 0$

implies that current shocks are uncorrelated with past and current values of $x_{it}$ but allows for feedback effects from lagged dependent variables (or lagged errors) to current and future values of $x_{it}$.

Examples where sequential exogeneity is more plausible than strict exogeneity include:

(1) Rational expectation models of household and firm decisions. In a rational expectations model, it is assumed that a household or firm’s current choice of $x_{it}$ is optimal given its current information, hence current shocks are uncorrelated with past and current values of $x_{it}$, but a shock in period t can affect choices of $x_{i,t+1}, x_{i,t+2}, \dots$.

(2) Effect of children on female labor force participation decisions. Let $y_{it}$ be labor force participation and $x_{it}$ be number of children. Strict exogeneity would require that labor supply decisions have no effect on fertility decisions at any point in the life cycle, which is not realistic.
