STATA Program for Probit/Logit Models



STATA Program for OLS

cps87_or.do

* the data for this project is a small subsample;

* of full time (30 or more hours) male workers;

* aged 21-64 from the out going rotation;

* samples of the 1987 current population survey;

* this line defines the semicolon as the ;

* end of line delimiter;

# delimit ;

* set memory to 10 meg;

set memory 10m;

* write results to a log file;

* the replace option writes over old;

* log files;

log using cps87_or.log,replace;

* open stata data set;

use c:\bill\stata\cps87_or;

* list variables and labels in data set;

desc;

* generate new variables;

* lines 1-2 illustrate basic math functions;

* lines 3-4 illustrate logical operators;

* line 5 illustrates the OR statement;

* line 6 illustrates the AND statement;

* after you construct new variables, compress the data again;

gen age2=age*age;

gen earnwkl=ln(earnwke);

gen union=unionm==1;

gen topcode=earnwke==999;

gen nonwhite=((race==2)|(race==3));

gen big_ne=((region==1)&(smsa==1));

* label the data;

label var age2 "age squared";

label var earnwkl "log earnings per week";

label var topcode "=1 if earnwkl is topcoded";

label var union "1=in union, 0 otherwise";

label var nonwhite "1=nonwhite, 0=white" ;

label var big_ne "1= live in big smsa from northeast, 0=otherwise";

compress;

* get descriptive statistics;

sum;

* get detailed descriptive statistics for continuous variables;

sum earnwke, detail;

* get frequencies of discrete variables;

tabulate unionm;

tabulate race;

* get two-way table of frequencies;

tabulate region smsa, row column cell;

*run simple regression;

reg earnwkl age age2 educ nonwhite union;

* run regression adding smsa, region and race fixed-effects;

* the xi command constructs the dummies for you;

* the lowest numbered dummy is usually the;

* omitted category;

xi: reg earnwkl age age2 educ union i.race i.region i.smsa;

more;

* close log file;

log close;
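
The gen statements in the program above rely on a logical expression evaluating to 1 when true and 0 when false. A minimal Python sketch of the same recoding follows. The three rows are made-up values, not observations from cps87_or, and the function name is only illustrative.

```python
# Hypothetical rows; the codes follow the variable labels in the data set.
rows = [
    {"unionm": 1, "race": 1, "region": 1, "smsa": 1},
    {"unionm": 2, "race": 2, "region": 3, "smsa": 2},
    {"unionm": 2, "race": 3, "region": 1, "smsa": 3},
]

def add_dummies(row):
    """Mirror the gen statements: a true comparison becomes 1, a false one 0."""
    out = dict(row)
    out["union"] = int(row["unionm"] == 1)
    out["nonwhite"] = int(row["race"] == 2 or row["race"] == 3)   # OR
    out["big_ne"] = int(row["region"] == 1 and row["smsa"] == 1)  # AND
    return out

recoded = [add_dummies(r) for r in rows]
for r in recoded:
    print(r["union"], r["nonwhite"], r["big_ne"])
```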

STATA Results for OLS

cps87_or.log

------------------------------------------------------------------------------

log: c:\bill\stata\cps87_or.log

log type: text

opened on: 6 Nov 2004, 08:14:10

. * open stata data set;

. use c:\bill\stata\cps87_or;

. * list variables and labels in data set;

. desc;

Contains data from c:\bill\stata\cps87_or.dta

obs: 19,906

vars: 7 6 Nov 2004 08:11

size: 636,992 (93.9% of memory free)

------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
age             float  %9.0g                  age in years
race            float  %9.0g                  1=white, non-hisp, 2=place,
                                                n.h, 3=hisp
educ            float  %9.0g                  years of education
unionm          float  %9.0g                  1=union member, 2=otherwise
smsa            float  %9.0g                  1=live in 19 largest smsa,
                                                2=other smsa, 3=non smsa
region          float  %9.0g                  1=east, 2=midwest, 3=south,
                                                4=west
earnwke         float  %9.0g                  usual weekly earnings
------------------------------------------------------------------------------

Sorted by:

. * generate new variables;

. * lines 1-2 illustrate basic math functions;

. * lines 3-4 illustrate logical operators;

. * line 5 illustrates the OR statement;

. * line 6 illustrates the AND statement;

. * after you construct new variables, compress the data again;

. gen age2=age*age;

. gen earnwkl=ln(earnwke);

. gen union=unionm==1;

. gen topcode=earnwke==999;

. gen nonwhite=((race==2)|(race==3));

. gen big_ne=((region==1)&(smsa==1));

. * label the data;

. label var age2 "age squared";

. label var earnwkl "log earnings per week";

. label var topcode "=1 if earnwkl is topcoded";

. label var union "1=in union, 0 otherwise";

. label var nonwhite "1=nonwhite, 0=white" ;

. label var big_ne "1= live in big smsa from northeast, 0=otherwise";

. compress;

age was float now byte

race was float now byte

educ was float now byte

unionm was float now byte

smsa was float now byte

region was float now byte

earnwke was float now int

age2 was float now int

union was float now byte

topcode was float now byte

nonwhite was float now byte

big_ne was float now byte

. more;

. * get descriptive statistics;

. sum;

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

age | 19906 37.96619 11.15348 21 64

race | 19906 1.199136 .525493 1 3

educ | 19906 13.16126 2.795234 0 18

unionm | 19906 1.769065 .4214418 1 2

smsa | 19906 1.908369 .7955814 1 3

-------------+--------------------------------------------------------

region | 19906 2.462373 1.079514 1 4

earnwke | 19906 488.264 236.4713 60 999

age2 | 19906 1565.826 912.4383 441 4096

earnwkl | 19906 6.067307 .513047 4.094345 6.906755

union | 19906 .2309354 .4214418 0 1

-------------+--------------------------------------------------------

topcode | 19906 .0719381 .2583919 0 1

nonwhite | 19906 .1408118 .3478361 0 1

big_ne | 19906 .1409625 .3479916 0 1

. * get detailed descriptive statistics for continuous variables;

. sum earnwke, detail;

                    usual weekly earnings
-------------------------------------------------------------
      Percentiles      Smallest
 1%          128             60
 5%          178             60
10%          210             60       Obs               19906
25%          300             63       Sum of Wgt.       19906

50%          449                      Mean            488.264
                        Largest       Std. Dev.      236.4713
75%          615            999
90%          865            999       Variance        55918.7
95%          999            999       Skewness        .668646
99%          999            999       Kurtosis       2.632356

. more;

. * get frequencies of discrete variables;

. tabulate unionm;

1=union |

member, |

2=otherwise | Freq. Percent Cum.

------------+-----------------------------------

1 | 4,597 23.09 23.09

2 | 15,309 76.91 100.00

------------+-----------------------------------

Total | 19,906 100.00

. tabulate race;

1=white, |

non-hisp, |

2=place, |

n.h, 3=hisp | Freq. Percent Cum.

------------+-----------------------------------

1 | 17,103 85.92 85.92

2 | 1,642 8.25 94.17

3 | 1,161 5.83 100.00

------------+-----------------------------------

Total | 19,906 100.00

. more;

. * get two-way table of frequencies;

. tabulate region smsa, row column cell;

+-------------------+

| Key |

|-------------------|

| frequency |

| row percentage |

| column percentage |

| cell percentage |

+-------------------+

1=east, |

2=midwest, | 1=live in 19 largest smsa,

3=south, | 2=other smsa, 3=non smsa

4=west | 1 2 3 | Total

-----------+---------------------------------+----------

1 | 2,806 1,349 842 | 4,997

| 56.15 27.00 16.85 | 100.00

| 38.46 18.89 15.39 | 25.10

| 14.10 6.78 4.23 | 25.10

-----------+---------------------------------+----------

2 | 1,501 1,742 1,592 | 4,835

| 31.04 36.03 32.93 | 100.00

| 20.58 24.40 29.10 | 24.29

| 7.54 8.75 8.00 | 24.29

-----------+---------------------------------+----------

3 | 1,501 2,542 1,904 | 5,947

| 25.24 42.74 32.02 | 100.00

| 20.58 35.60 34.80 | 29.88

| 7.54 12.77 9.56 | 29.88

-----------+---------------------------------+----------

4 | 1,487 1,507 1,133 | 4,127

| 36.03 36.52 27.45 | 100.00

| 20.38 21.11 20.71 | 20.73

| 7.47 7.57 5.69 | 20.73

-----------+---------------------------------+----------

Total | 7,295 7,140 5,471 | 19,906

| 36.65 35.87 27.48 | 100.00

| 100.00 100.00 100.00 | 100.00

| 36.65 35.87 27.48 | 100.00

. more;

. *run simple regression;

. reg earnwkl age age2 educ nonwhite union;

Source | SS df MS Number of obs = 19906

-------------+------------------------------ F( 5, 19900) = 1775.70

Model | 1616.39963 5 323.279927 Prob > F = 0.0000

Residual | 3622.93905 19900 .182057239 R-squared = 0.3085

-------------+------------------------------ Adj R-squared = 0.3083

Total | 5239.33869 19905 .263217216 Root MSE = .42668

------------------------------------------------------------------------------

earnwkl | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age | .0679808 .0020033 33.93 0.000 .0640542 .0719075

age2 | -.0006778 .0000245 -27.69 0.000 -.0007258 -.0006299

educ | .069219 .0011256 61.50 0.000 .0670127 .0714252

nonwhite | -.1716133 .0089118 -19.26 0.000 -.1890812 -.1541453

union | .1301547 .0072923 17.85 0.000 .1158613 .1444481

_cons | 3.630805 .0394126 92.12 0.000 3.553553 3.708057

------------------------------------------------------------------------------

. more;

. * run regression adding smsa, region and race fixed-effects;

. * the xi command constructs the dummies for you;

. * the lowest numbered dummy is usually the;

. * omitted category;

. xi: reg earnwkl age age2 educ union i.race i.region i.smsa;

i.race _Irace_1-3 (naturally coded; _Irace_1 omitted)

i.region _Iregion_1-4 (naturally coded; _Iregion_1 omitted)

i.smsa _Ismsa_1-3 (naturally coded; _Ismsa_1 omitted)

Source | SS df MS Number of obs = 19906

-------------+------------------------------ F( 11, 19894) = 920.86

Model | 1767.66908 11 160.697189 Prob > F = 0.0000

Residual | 3471.66961 19894 .174508375 R-squared = 0.3374

-------------+------------------------------ Adj R-squared = 0.3370

Total | 5239.33869 19905 .263217216 Root MSE = .41774

------------------------------------------------------------------------------

earnwkl | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age | .070194 .0019645 35.73 0.000 .0663435 .0740446

age2 | -.0007052 .000024 -29.37 0.000 -.0007522 -.0006581

educ | .0643064 .0011285 56.98 0.000 .0620944 .0665184

union | .1131485 .007257 15.59 0.000 .0989241 .1273729

_Irace_2 | -.2329794 .0110958 -21.00 0.000 -.254728 -.2112308

_Irace_3 | -.1795253 .0134073 -13.39 0.000 -.2058047 -.1532458

_Iregion_2 | -.0088962 .0085926 -1.04 0.301 -.0257383 .007946

_Iregion_3 | -.0281747 .008443 -3.34 0.001 -.0447238 -.0116257

_Iregion_4 | .0318053 .0089802 3.54 0.000 .0142034 .0494071

_Ismsa_2 | -.1225607 .0072078 -17.00 0.000 -.1366886 -.1084328

_Ismsa_3 | -.2054124 .0078651 -26.12 0.000 -.2208287 -.1899961

_cons | 3.76812 .0391241 96.31 0.000 3.691434 3.844807

------------------------------------------------------------------------------

. more;

. * close log file;

. log close;

log: c:\bill\stata\cps87_or.log

log type: text

closed on: 6 Nov 2004, 08:14:19

------------------------------------------------------------------------------
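
The summary figures printed at the top of each regression table follow directly from the sums of squares in the same table. A short Python check, using the numbers from the first regression above (the variable names are mine), should reproduce the reported R-squared, adjusted R-squared, F statistic and root MSE up to rounding.

```python
# Values copied from the first regression table in the log above.
ss_model, df_model = 1616.39963, 5
ss_resid, df_resid = 3622.93905, 19900
ss_total = 5239.33869

ms_model = ss_model / df_model          # mean square, model
ms_resid = ss_resid / df_resid          # mean square, residual
ms_total = ss_total / (df_model + df_resid)

r_squared = ss_model / ss_total
adj_r_squared = 1 - ms_resid / ms_total
f_stat = ms_model / ms_resid
root_mse = ms_resid ** 0.5

print(round(r_squared, 4), round(adj_r_squared, 4),
      round(f_stat, 2), round(root_mse, 5))
```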

STATA Program for Probit/Logit Models

workplace.do

* the data for this program are a random sample;

* of 10k observations from the data used in;

* evans, farrelly and montgomery, aer, 1999;

* the data are indoor workers in the 1991 and 1993;

* national health interview survey. the survey;

* identifies whether the worker smoked and whether;

* the worker faces a workplace smoking ban;

* set semi colon as the end of line;

# delimit;

* ask it NOT to pause;

set more off;

* open log file;

log using c:\bill\jpsm\workplace1.log,replace;

* use the workplace data set;

use c:\bill\jpsm\workplace1;

* print out variable labels;

desc;

* get summary statistics;

sum;

* run a linear probability model for comparison purposes;

* estimate white standard errors to control for heteroskedasticity;

reg smoker age incomel male black hispanic

hsgrad somecol college worka, robust;

* run probit model;

probit smoker age incomel male black hispanic

hsgrad somecol college worka;

*predict probability of smoking;

predict pred_prob_smoke;

* get detailed descriptive data about predicted prob;

sum pred_prob, detail;

* predict binary outcome with 50% cutoff;

gen pred_smoke1=pred_prob_smoke>=.5;

label variable pred_smoke1 "predicted smoking, 50% cutoff";

* compare actual and predicted values;

tab smoker pred_smoke1, row col cell;

* ask for marginal effects/treatment effects;

mfx compute;

* the same type of estimates can be produced with;

* prchange. this command is however more flexible;

* in that you can change the reference individual;

prchange, help;

* get marginal effect/treatment effects for specific person;

* intended person: male, age 40, college educ, white, without workplace smoking ban;

* if a variable is not specified, its value is assumed to be;

* the sample mean. note that the command below does not set male or;

* college, so they stay at their sample means along with log income;

* add male=1 college=1 to x() to match the person described;

prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);

* using a wald test, test the null hypothesis that;

* all the education coefficients are zero;

test hsgrad somecol college;

* how to run the same test with a -2 log likelihood (likelihood-ratio) test;

* estimate the unrestricted model and save the estimates ;

* in urmodel;

probit smoker age incomel male black hispanic

hsgrad somecol college worka;

estimates store urmodel;

* estimate the restricted model. save results in rmodel;

probit smoker age incomel male black hispanic

worka;

estimates store rmodel;

lrtest urmodel rmodel;

* run logit model;

logit smoker age incomel male black hispanic

hsgrad somecol college worka;

* ask for marginal effects/treatment effects;

* logit model;

mfx compute;

log close;
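
The lrtest step compares the maximized log likelihoods of the unrestricted and restricted probits: LR = 2(lnL_u - lnL_r), with 3 degrees of freedom for the three education dummies dropped. A Python check using the two log likelihoods reported in the results log (the closed-form tail probability below is specific to 3 degrees of freedom, and the function name is mine):

```python
import math

lnl_unrestricted = -8761.7208  # probit with hsgrad, somecol, college
lnl_restricted = -9022.1031    # probit without the education dummies

lr_stat = 2 * (lnl_unrestricted - lnl_restricted)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_sf_df3(lr_stat)
print(round(lr_stat, 2), p_value)
```

The statistic should agree with the LR chi2(3) value that lrtest prints, and the p-value is effectively zero.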

STATA Results for Probit/Logit Models

workplace1.log

------------------------------------------------------------------------------

log: c:\bill\jpsm\workplace1.log

log type: text

opened on: 4 Nov 2004, 07:29:21

. * use the workplace data set;

. use c:\bill\jpsm\workplace1;

. * print out variable labels;

. desc;

Contains data from c:\bill\jpsm\workplace1.dta

obs: 16,258

vars: 10 28 Oct 2004 05:27

size: 325,160 (96.9% of memory free)

------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
smoker          byte   %9.0g                  is current smoking
worka           byte   %9.0g                  has workplace smoking bans
age             byte   %9.0g                  age in years
male            byte   %9.0g                  male
black           byte   %9.0g                  black
hispanic        byte   %9.0g                  hispanic
incomel         float  %9.0g                  log income
hsgrad          byte   %9.0g                  is hs graduate
somecol         byte   %9.0g                  has some college
college         float  %9.0g
------------------------------------------------------------------------------

Sorted by:

. * get summary statistics;

. sum;

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

smoker | 16258 .25163 .433963 0 1

worka | 16258 .6851396 .4644745 0 1

age | 16258 38.54742 11.96189 18 87

male | 16258 .3947595 .488814 0 1

black | 16258 .1119449 .3153083 0 1

-------------+--------------------------------------------------------

hispanic | 16258 .0607086 .2388023 0 1

incomel | 16258 10.42097 .7624525 6.214608 11.22524

hsgrad | 16258 .3355271 .4721889 0 1

somecol | 16258 .2685447 .4432161 0 1

college | 16258 .3293763 .4700012 0 1

. * run a linear probability model for comparison purposes;

. * estimate white standard errors to control for heteroskedasticity;

. reg smoker age incomel male black hispanic

> hsgrad somecol college worka, robust;

Regression with robust standard errors Number of obs = 16258

F( 9, 16248) = 99.26

Prob > F = 0.0000

R-squared = 0.0488

Root MSE = .42336

------------------------------------------------------------------------------

| Robust

smoker | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age | -.0004776 .0002806 -1.70 0.089 -.0010276 .0000725

incomel | -.0287361 .0047823 -6.01 0.000 -.03811 -.0193621

male | .0168615 .0069542 2.42 0.015 .0032305 .0304926

black | -.0356723 .0110203 -3.24 0.001 -.0572732 -.0140714

hispanic | -.070582 .0136691 -5.16 0.000 -.097375 -.043789

hsgrad | -.0661429 .0162279 -4.08 0.000 -.0979514 -.0343345

somecol | -.1312175 .0164726 -7.97 0.000 -.1635056 -.0989293

college | -.2406109 .0162568 -14.80 0.000 -.272476 -.2087459

worka | -.066076 .0074879 -8.82 0.000 -.080753 -.051399

_cons | .7530714 .0494255 15.24 0.000 .6561919 .8499509

------------------------------------------------------------------------------

. * run probit model;

. probit smoker age incomel male black hispanic

> hsgrad somecol college worka;

Iteration 0: log likelihood = -9171.443

Iteration 1: log likelihood = -8764.068

Iteration 2: log likelihood = -8761.7211

Iteration 3: log likelihood = -8761.7208

Probit estimates Number of obs = 16258

LR chi2(9) = 819.44

Prob > chi2 = 0.0000

Log likelihood = -8761.7208 Pseudo R2 = 0.0447

------------------------------------------------------------------------------

smoker | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age | -.0012684 .0009316 -1.36 0.173 -.0030943 .0005574

incomel | -.092812 .0151496 -6.13 0.000 -.1225047 -.0631193

male | .0533213 .0229297 2.33 0.020 .0083799 .0982627

black | -.1060518 .034918 -3.04 0.002 -.17449 -.0376137

hispanic | -.2281468 .0475128 -4.80 0.000 -.3212701 -.1350235

hsgrad | -.1748765 .0436392 -4.01 0.000 -.2604078 -.0893453

somecol | -.363869 .0451757 -8.05 0.000 -.4524118 -.2753262

college | -.7689528 .0466418 -16.49 0.000 -.860369 -.6775366

worka | -.2093287 .0231425 -9.05 0.000 -.2546873 -.1639702

_cons | .870543 .154056 5.65 0.000 .5685989 1.172487

------------------------------------------------------------------------------

. *predict probability of smoking;

. predict pred_prob_smoke;

(option p assumed; Pr(smoker))

. * get detailed descriptive data about predicted prob;

. sum pred_prob, detail;

                         Pr(smoker)
-------------------------------------------------------------
      Percentiles      Smallest
 1%     .0959301       .0615221
 5%     .1155022       .0622963
10%     .1237434       .0633929       Obs               16258
25%     .1620851       .0733495       Sum of Wgt.       16258

50%     .2569962                      Mean           .2516653
                        Largest       Std. Dev.      .0960007
75%     .3187975       .5619798
90%     .3795704       .5655878       Variance       .0092161
95%     .4039573       .5684112       Skewness       .1520254
99%     .4672697       .6203823       Kurtosis       2.149247

. * predict binary outcome with 50% cutoff;

. gen pred_smoke1=pred_prob_smoke>=.5;

. label variable pred_smoke1 "predicted smoking, 50% cutoff";

. * compare actual values;

. tab smoker pred_smoke1, row col cell;

+-------------------+

| Key |

|-------------------|

| frequency |

| row percentage |

| column percentage |

| cell percentage |

+-------------------+

| predicted smoking,

is current | 50% cutoff

smoking | 0 1 | Total

-----------+----------------------+----------

0 | 12,153 14 | 12,167

| 99.88 0.12 | 100.00

| 74.93 35.90 | 74.84

| 74.75 0.09 | 74.84

-----------+----------------------+----------

1 | 4,066 25 | 4,091

| 99.39 0.61 | 100.00

| 25.07 64.10 | 25.16

| 25.01 0.15 | 25.16

-----------+----------------------+----------

Total | 16,219 39 | 16,258

| 99.76 0.24 | 100.00

| 100.00 100.00 | 100.00

| 99.76 0.24 | 100.00

. * ask for marginal effects/treatment effects;

. mfx compute;

Marginal effects after probit

y = Pr(smoker) (predict)

= .24093439

------------------------------------------------------------------------------

variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

---------+--------------------------------------------------------------------

age | -.0003951 .00029 -1.36 0.173 -.000964 .000174 38.5474

incomel | -.0289139 .00472 -6.13 0.000 -.03816 -.019668 10.421

male*| .0166757 .0072 2.32 0.021 .002568 .030783 .39476

black*| -.0320621 .01023 -3.13 0.002 -.052111 -.012013 .111945

hispanic*| -.0658551 .01259 -5.23 0.000 -.090536 -.041174 .060709

hsgrad*| -.053335 .01302 -4.10 0.000 -.07885 -.02782 .335527

somecol*| -.1062358 .01228 -8.65 0.000 -.130308 -.082164 .268545

college*| -.2149199 .01146 -18.76 0.000 -.237378 -.192462 .329376

worka*| -.0668959 .00756 -8.84 0.000 -.08172 -.052072 .68514

------------------------------------------------------------------------------

(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * the same type of variables can be produced with;

. * prchange. this command is however more flexible;

. * in that you can change the reference individual;

. prchange, help;

probit: Changes in Predicted Probabilities for smoker

min->max 0->1 -+1/2 -+sd/2 MargEfct

age -0.0269 -0.0004 -0.0004 -0.0047 -0.0004

incomel -0.1589 -0.0361 -0.0289 -0.0220 -0.0289

male 0.0167 0.0167 0.0166 0.0081 0.0166

black -0.0321 -0.0321 -0.0330 -0.0104 -0.0330

hispanic -0.0659 -0.0659 -0.0710 -0.0170 -0.0711

hsgrad -0.0533 -0.0533 -0.0544 -0.0257 -0.0545

somecol -0.1062 -0.1062 -0.1130 -0.0502 -0.1134

college -0.2149 -0.2149 -0.2366 -0.1123 -0.2396

worka -0.0669 -0.0669 -0.0652 -0.0303 -0.0652

0 1

Pr(y|x) 0.7591 0.2409

age incomel male black hispanic hsgrad somecol

x= 38.5474 10.421 .39476 .111945 .060709 .335527 .268545

sd(x)= 11.9619 .762452 .488814 .315308 .238802 .472189 .443216

college worka

x= .329376 .68514

sd(x)= .470001 .464475

Pr(y|x): probability of observing each y for specified x values

Avg|Chg|: average of absolute value of the change across categories

Min->Max: change in predicted probability as x changes from its minimum to

its maximum

0->1: change in predicted probability as x changes from 0 to 1

-+1/2: change in predicted probability as x changes from 1/2 unit below

base value to 1/2 unit above

-+sd/2: change in predicted probability as x changes from 1/2 standard

dev below base to 1/2 standard dev above

MargEfct: the partial derivative of the predicted probability/rate with

respect to a given independent variable

. * get marginal effect/treatment effects for specific person;

. * male, age 40, college educ, white, without workplace smoking ban;

. * if a variable is not specified, its value is assumed to be;

. * the sample mean. in this case, the only variable i am not;

. * listing is mean log income;

. prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);

probit: Changes in Predicted Probabilities for smoker

min->max 0->1 -+1/2 -+sd/2 MargEfct

age -0.0323 -0.0005 -0.0005 -0.0056 -0.0005

incomel -0.1795 -0.0320 -0.0344 -0.0263 -0.0345

male 0.0198 0.0198 0.0198 0.0097 0.0198

black -0.0385 -0.0385 -0.0394 -0.0124 -0.0394

hispanic -0.0804 -0.0804 -0.0845 -0.0202 -0.0847

hsgrad -0.0625 -0.0625 -0.0648 -0.0306 -0.0649

somecol -0.1235 -0.1235 -0.1344 -0.0598 -0.1351

college -0.2644 -0.2644 -0.2795 -0.1335 -0.2854

worka -0.0742 -0.0742 -0.0776 -0.0361 -0.0777

0 1

Pr(y|x) 0.6479 0.3521

age incomel male black hispanic hsgrad somecol

x= 40 10.421 .39476 0 0 0 0

sd(x)= 11.9619 .762452 .488814 .315308 .238802 .472189 .443216

college worka

x= .329376 0

sd(x)= .470001 .464475

. * using a wald test, test the null hypothesis that;

. * all the education coefficients are zero;

. test hsgrad somecol college;

( 1) hsgrad = 0

( 2) somecol = 0

( 3) college = 0

chi2( 3) = 504.78

Prob > chi2 = 0.0000

. * how to run the same test with a -2 log likelihood (likelihood-ratio) test;

. * estimate the unrestricted model and save the estimates ;

. * in urmodel;

. probit smoker age incomel male black hispanic

> hsgrad somecol college worka;

Iteration 0: log likelihood = -9171.443

Iteration 1: log likelihood = -8764.068

Iteration 2: log likelihood = -8761.7211

Iteration 3: log likelihood = -8761.7208

Probit estimates Number of obs = 16258

LR chi2(9) = 819.44

Prob > chi2 = 0.0000

Log likelihood = -8761.7208 Pseudo R2 = 0.0447

------------------------------------------------------------------------------

smoker | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age | -.0012684 .0009316 -1.36 0.173 -.0030943 .0005574

incomel | -.092812 .0151496 -6.13 0.000 -.1225047 -.0631193

male | .0533213 .0229297 2.33 0.020 .0083799 .0982627

black | -.1060518 .034918 -3.04 0.002 -.17449 -.0376137

hispanic | -.2281468 .0475128 -4.80 0.000 -.3212701 -.1350235

hsgrad | -.1748765 .0436392 -4.01 0.000 -.2604078 -.0893453

somecol | -.363869 .0451757 -8.05 0.000 -.4524118 -.2753262

college | -.7689528 .0466418 -16.49 0.000 -.860369 -.6775366

worka | -.2093287 .0231425 -9.05 0.000 -.2546873 -.1639702

_cons | .870543 .154056 5.65 0.000 .5685989 1.172487

------------------------------------------------------------------------------

. estimates store urmodel;

. * estimate the restricted model. save results in rmodel;

. probit smoker age incomel male black hispanic

> worka;

Iteration 0: log likelihood = -9171.443

Iteration 1: log likelihood = -9022.2473

Iteration 2: log likelihood = -9022.1031

Probit estimates Number of obs = 16258

LR chi2(6) = 298.68

Prob > chi2 = 0.0000

Log likelihood = -9022.1031 Pseudo R2 = 0.0163

------------------------------------------------------------------------------

smoker | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age | .0003514 .0009163 0.38 0.701 -.0014445 .0021473

incomel | -.1802868 .0143242 -12.59 0.000 -.2083617 -.152212

male | -.0117546 .0223519 -0.53 0.599 -.0555635 .0320543

black | -.0650982 .0345516 -1.88 0.060 -.1328181 .0026217

hispanic | -.152071 .0465132 -3.27 0.001 -.2432351 -.0609069

worka | -.2501544 .0227794 -10.98 0.000 -.2948012 -.2055076

_cons | 1.37729 .1472574 9.35 0.000 1.08867 1.665909

------------------------------------------------------------------------------

. estimates store rmodel;

. lrtest urmodel rmodel;

likelihood-ratio test LR chi2(3) = 520.76

(Assumption: rmodel nested in urmodel) Prob > chi2 = 0.0000

. * run logit model;

. logit smoker age incomel male black hispanic

> hsgrad somecol college worka;

Iteration 0: log likelihood = -9171.443

Iteration 1: log likelihood = -8770.6512

Iteration 2: log likelihood = -8760.9282

Iteration 3: log likelihood = -8760.9112

Logit estimates Number of obs = 16258

LR chi2(9) = 821.06

Prob > chi2 = 0.0000

Log likelihood = -8760.9112 Pseudo R2 = 0.0448

------------------------------------------------------------------------------

smoker | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age | -.0026236 .0015594 -1.68 0.092 -.0056799 .0004327

incomel | -.1518663 .0251899 -6.03 0.000 -.2012376 -.102495

male | .0942472 .0390171 2.42 0.016 .0177751 .1707192

black | -.196468 .0598366 -3.28 0.001 -.3137456 -.0791904

hispanic | -.4024453 .0825043 -4.88 0.000 -.5641507 -.2407399

hsgrad | -.2906189 .0707661 -4.11 0.000 -.429318 -.1519199

somecol | -.6092455 .073822 -8.25 0.000 -.7539339 -.4645571

college | -1.325203 .0780572 -16.98 0.000 -1.478192 -1.172214

worka | -.3508271 .0389286 -9.01 0.000 -.4271257 -.2745285

_cons | 1.467936 .255991 5.73 0.000 .9662025 1.969669

------------------------------------------------------------------------------

. * ask for marginal effects/treatment effects;

. * logit model;

. mfx compute;

Marginal effects after logit

y = Pr(smoker) (predict)

= .23812502

------------------------------------------------------------------------------

variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

---------+--------------------------------------------------------------------

age | -.000476 .00028 -1.68 0.092 -.00103 .000078 38.5474

incomel | -.0275518 .00457 -6.03 0.000 -.0365 -.018604 10.421

male*| .0171866 .00715 2.40 0.016 .003174 .0312 .39476

black*| -.0342102 .00998 -3.43 0.001 -.053765 -.014655 .111945

hispanic*| -.0661959 .01217 -5.44 0.000 -.090044 -.042347 .060709

hsgrad*| -.0513887 .01219 -4.22 0.000 -.075278 -.0275 .335527

somecol*| -.102284 .01141 -8.97 0.000 -.124644 -.079924 .268545

college*| -.2120833 .0108 -19.64 0.000 -.233248 -.190919 .329376

worka*| -.0657566 .0075 -8.76 0.000 -.080464 -.05105 .68514

------------------------------------------------------------------------------

(*) dy/dx is for discrete change of dummy variable from 0 to 1

. log close;

log: c:\bill\jpsm\workplace1.log

log type: text

closed on: 4 Nov 2004, 07:30:16

------------------------------------------------------------------------------
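
For continuous covariates, the dy/dx column reported by mfx can be reproduced from the coefficients and the predicted probability at the means: the probit marginal effect is the standard normal density at the index times the coefficient, and the logit marginal effect is p(1-p) times the coefficient. A Python sketch using the age and incomel rows from the tables above (the starred dummy variables are discrete 0 to 1 changes and are not covered by this formula):

```python
from statistics import NormalDist

std_normal = NormalDist()

# Probit: dy/dx = phi(x'b) * b, with x'b recovered from Pr(smoker) at the means.
p_probit = 0.24093439
index_probit = std_normal.inv_cdf(p_probit)
density = std_normal.pdf(index_probit)
probit_me = {"age": density * -0.0012684, "incomel": density * -0.092812}

# Logit: dy/dx = p * (1 - p) * b.
p_logit = 0.23812502
scale = p_logit * (1 - p_logit)
logit_me = {"age": scale * -0.0026236, "incomel": scale * -0.1518663}

print(probit_me)
print(logit_me)
```

The results should match the mfx output for age and incomel to the printed precision.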

STATA Program for Odds Ratio in Logit Models

natal95.do

* this data set is a small 0.5% random sample;

* of observations from the 1995 natality detail;

* data. we will examine the impact of smoking;

* on birth weight. two large states, NY and CA, do not;

* record mothers smoking status. therefore, of the ;

* 4 million births in the US, only 3 million have all;

* the necessary data so there should be 3 million*.005;

* or roughly 15,000 obs;

* set semi colon as the end of line;

# delimit;

* ask it NOT to pause;

set more off;

* open log file;

log using c:\bill\jpsm\natal95.log,replace;

* use the natality detail data set;

use c:\bill\jpsm\natal95;

* print out variable labels;

desc;

* construct indicator for low birth weight;

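gen lowbw=birthw<

[Text missing from the source at this point: the cutoff value in the gen command above, the rest of natal95.do, and the start of the natal95.log output up to the variable listing produced by desc.]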

              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
birthw          int    %9.0g                  birth weight in grams
smoked          byte   %9.0g                  =1 if mom smoked during
                                                pregnancy
age             byte   %9.0g                  moms age at birth
married         byte   %9.0g                  =1 if married
race4           byte   %9.0g                  1=white,2=black,3=asian,4=other
educ5           byte   %9.0g                  1=0-8, 2=9-11, 3=12, 4=13-15,
                                                5=16+
visits          byte   %9.0g                  prenatal visits
------------------------------------------------------------------------------

Sorted by:

. * construct indicator for low birth weight;

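. gen lowbw=birthw<

[Text missing from the source at this point: the cutoff value in the gen command above and the log output up to the header of the logit coefficient table.]

------------------------------------------------------------------------------

lowbw | Coef. Std. Err. z P>|z| [95% Conf. Interval]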

-------------+----------------------------------------------------------------

smoked | .6740651 .0897869 7.51 0.000 .4980861 .8500441

age | .0080537 .006791 1.19 0.236 -.0052564 .0213638

married | -.3954044 .0882471 -4.48 0.000 -.5683654 -.2224433

_Ieduc5_2 | -.1949335 .1626502 -1.20 0.231 -.5137221 .1238551

_Ieduc5_3 | -.1925099 .1543239 -1.25 0.212 -.4949791 .1099594

_Ieduc5_4 | -.4057382 .1676759 -2.42 0.016 -.7343769 -.0770994

_Ieduc5_5 | -.3569715 .1780322 -2.01 0.045 -.7059081 -.0080349

_Irace4_2 | .7072894 .0875125 8.08 0.000 .5357681 .8788107

_Irace4_3 | .386623 .307062 1.26 0.208 -.2152075 .9884535

_Irace4_4 | .3095536 .2047899 1.51 0.131 -.0918271 .7109344

_cons | -2.755971 .2104916 -13.09 0.000 -3.168527 -2.343415

------------------------------------------------------------------------------

. * get marginal effects;

. mfx compute;

Marginal effects after logit

y = Pr(lowbw) (predict)

= .05465609

------------------------------------------------------------------------------

variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

---------+--------------------------------------------------------------------

smoked*| .0436744 .00706 6.18 0.000 .029834 .057514 .136683

age | .0004161 .00035 1.19 0.236 -.000271 .001104 26.6564

married*| -.0218806 .0052 -4.21 0.000 -.032074 -.011687 .683204

_Ieduc~2*| -.0095123 .00749 -1.27 0.204 -.024188 .005164 .165495

_Ieduc~3*| -.0096965 .00758 -1.28 0.201 -.024554 .005161 .345397

_Ieduc~4*| -.0190499 .00714 -2.67 0.008 -.033043 -.005057 .22319

_Ieduc~5*| -.0169077 .00771 -2.19 0.028 -.032027 -.001788 .216093

_Irace~2*| .0453844 .00675 6.72 0.000 .032148 .058621 .17168

_Irace~3*| .0236917 .02204 1.07 0.282 -.019506 .06689 .010401

_Irace~4*| .018225 .01363 1.34 0.181 -.008488 .044938 .031694

------------------------------------------------------------------------------

(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * run a logit but report the odds ratios instead;

. xi: logistic lowbw smoked age married i.educ5 i.race4;

i.educ5 _Ieduc5_1-5 (naturally coded; _Ieduc5_1 omitted)

i.race4 _Irace4_1-4 (naturally coded; _Irace4_1 omitted)

Logistic regression Number of obs = 14230

LR chi2(10) = 214.10

Prob > chi2 = 0.0000

Log likelihood = -3136.9912 Pseudo R2 = 0.0330

------------------------------------------------------------------------------

lowbw | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

smoked | 1.962198 .1761796 7.51 0.000 1.645569 2.33975

age | 1.008086 .0068459 1.19 0.236 .9947574 1.021594

married | .6734077 .0594262 -4.48 0.000 .5664506 .8005604

_Ieduc5_2 | .8228894 .1338431 -1.20 0.231 .5982646 1.131852

_Ieduc5_3 | .8248862 .1272996 -1.25 0.212 .6095837 1.116233

_Ieduc5_4 | .6664847 .1117534 -2.42 0.016 .4798043 .9257979

_Ieduc5_5 | .6997924 .1245856 -2.01 0.045 .4936601 .9919973

_Irace4_2 | 2.028485 .1775178 8.08 0.000 1.70876 2.408034

_Irace4_3 | 1.472001 .4519957 1.26 0.208 .8063741 2.687076

_Irace4_4 | 1.362817 .2790911 1.51 0.131 .9122628 2.035893

------------------------------------------------------------------------------

. log close;

log: c:\bill\jpsm\natal95.log

log type: text

closed on: 4 Nov 2004, 05:48:39

------------------------------------------------------------------------------
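
The odds ratios reported by logistic are the exponentiated logit coefficients, and for a continuous regressor the marginal effect that mfx reports at the means is b*p*(1-p), where p is the predicted probability printed at the top of the mfx output. The lines below are a minimal sketch that checks this arithmetic for age, using only numbers copied from the output above. The scalar names are illustrative and are not part of the original program.

```stata
#delimit ;
* odds ratio for age copied from the logistic output;
scalar or_age = 1.008086;
* logit coefficient implied by the odds ratio;
scalar b_age = ln(or_age);
* predicted probability at the means, copied from the mfx output;
scalar p = .05465609;
* marginal effect of a continuous regressor in a logit model;
scalar me_age = b_age*p*(1-p);
display "implied logit coefficient on age: " b_age;
display "implied marginal effect of age:   " me_age;
```

The second number should agree with the dy/dx entry for age in the mfx table up to rounding. The starred dummy variables are handled differently (a discrete 0 to 1 change), so this shortcut does not apply to them.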

STATA Program for Ordered Probit Models

sr_health_status.do

* the data for this example are adults, 18-64;

* who answered the cancer control supplement to;

* the 1994 national health interview survey;

* the key outcome is self reported health status;

* coded 1-5, poor, fair, good, very good, excellent;

* a key covariate is current smoking status and whether;

* one smoked 5 years ago;

# delimit;

set memory 20m;

set matsize 200;

set more off;

log using c:\bill\jpsm\sr_health_status.log,replace;

* load up stata data set;

use c:\bill\jpsm\sr_health_status;

* get contents of data file;

desc;

* get summary statistics;

sum;

* get tabulation of sr_health;

tab sr_health;

* run OLS models, just to look at the raw correlations in data;

reg sr_health male age educ famincl black othrace smoke smoke5;

* do ordered probit, self reported health status;

oprobit sr_health male age educ famincl black othrace smoke smoke5;

* get marginal effects, evaluated at y=5 (excellent);

mfx compute, predict(outcome(5));

* get marginal effects, evaluated at y=3 (good);

mfx compute, predict(outcome(3));

* use prchange, evaluate marginal effects for;

* a 40 year old white college graduate who never smoked;

* with average log income (male is also held at its mean);

prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);

log close;
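
After oprobit, predict can also return one predicted probability per outcome category, which is the quantity that the predict(outcome(#)) option of mfx differentiates. The sketch below is illustrative only: it assumes the auto data set that ships with Stata (not the health survey file), and the variable names p1-p5 and psum are made up for the example.

```stata
#delimit ;
* illustrative data shipped with stata, not the health survey file;
sysuse auto, clear;
* rep78 is an ordered 1-5 repair rating;
oprobit rep78 mpg weight;
* one predicted probability per outcome category;
predict p1 p2 p3 p4 p5;
* the five probabilities sum to one for every car;
gen psum = p1 + p2 + p3 + p4 + p5;
sum p1 p2 p3 p4 p5 psum;
```

Because the categories are exhaustive, a marginal effect that raises one probability has to lower others, which is why the mfx results for different outcomes have opposite signs.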

STATA Results for Ordered Probit Models

sr_health_status.log

------------------------------------------------------------------------------

log: c:\bill\iadb\sr_health_status.log

log type: text

opened on: 1 Nov 2004, 12:06:56

. * load up stata data set;

. use sr_health_status;

. * get contents of data file;

. desc;

Contains data from sr_health_status.dta

obs: 12,900

vars: 9 1 Nov 2004 11:51

size: 322,500 (98.5% of memory free)

------------------------------------------------------------------------------

storage display value

variable name type format label variable label

------------------------------------------------------------------------------

male byte %9.0g =1 if male

age byte %9.0g age in years

educ byte %9.0g years of education

smoke byte %9.0g current smoker

smoke5 byte %9.0g smoked in past 5 years

black float %9.0g =1 if respondent is black

othrace float %9.0g =1 if other race (white is ref)

sr_health float %9.0g 1-5 self reported health,

5=excel, 1=poor

famincl float %9.0g log family income

------------------------------------------------------------------------------

Sorted by:

. * get summary statistics;

. sum;

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

male | 12900 .438062 .4961681 0 1

age | 12900 39.84124 11.60603 21 64

educ | 12900 13.24016 2.73325 0 18

smoke | 12900 .2891473 .453384 0 1

smoke5 | 12900 .0813953 .2734519 0 1

-------------+--------------------------------------------------------

black | 12900 .1242636 .3298948 0 1

othrace | 12900 .0412403 .1988532 0 1

sr_health | 12900 3.888992 1.063713 1 5

famincl | 12900 10.21313 .95086 6.214608 11.22524

. * get tabulation of sr_health;

. tab sr_health;

1-5 self |

reported |

health, |

5=excel, |

1=poor | Freq. Percent Cum.

------------+-----------------------------------

1 | 342 2.65 2.65

2 | 991 7.68 10.33

3 | 3,068 23.78 34.12

4 | 3,855 29.88 64.00

5 | 4,644 36.00 100.00

------------+-----------------------------------

Total | 12,900 100.00

. * run OLS models, just to look at the raw correlations in data;

. reg sr_health male age educ famincl black othrace smoke smoke5;

Source | SS df MS Number of obs = 12900

-------------+------------------------------ F( 8, 12891) = 350.85

Model | 2609.62058 8 326.202572 Prob > F = 0.0000

Residual | 11985.4163 12891 .929750704 R-squared = 0.1788

-------------+------------------------------ Adj R-squared = 0.1783

Total | 14595.0369 12899 1.13148592 Root MSE = .96424

------------------------------------------------------------------------------

sr_health | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

male | .1033877 .0172399 6.00 0.000 .0695949 .1371804

age | -.0189687 .0007472 -25.39 0.000 -.0204333 -.0175041

educ | .074539 .0033897 21.99 0.000 .0678946 .0811833

famincl | .2299388 .0099542 23.10 0.000 .2104271 .2494504

black | -.2127016 .0265726 -8.00 0.000 -.2647878 -.1606153

othrace | -.2120907 .0429632 -4.94 0.000 -.2963049 -.1278765

smoke | -.1800193 .0196221 -9.17 0.000 -.2184815 -.1415572

smoke5 | -.1356116 .0317119 -4.28 0.000 -.1977716 -.0734515

_cons | 1.362405 .1005616 13.55 0.000 1.165289 1.55952

------------------------------------------------------------------------------

. * do ordered probit, self reported health status;

. oprobit sr_health male age educ famincl black othrace smoke smoke5;

Iteration 0: log likelihood = -17591.791

Iteration 1: log likelihood = -16403.785

Iteration 2: log likelihood = -16401.987

Iteration 3: log likelihood = -16401.987

Ordered probit estimates Number of obs = 12900

LR chi2(8) = 2379.61

Prob > chi2 = 0.0000

Log likelihood = -16401.987 Pseudo R2 = 0.0676

------------------------------------------------------------------------------

sr_health | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

male | .1281241 .0195747 6.55 0.000 .0897583 .1664899

age | -.0202308 .0008499 -23.80 0.000 -.0218966 -.018565

educ | .0827086 .0038547 21.46 0.000 .0751535 .0902637

famincl | .2398957 .0112206 21.38 0.000 .2179037 .2618878

black | -.221508 .029528 -7.50 0.000 -.2793818 -.1636341

othrace | -.2425083 .0480047 -5.05 0.000 -.3365958 -.1484208

smoke | -.2086096 .0219779 -9.49 0.000 -.2516855 -.1655337

smoke5 | -.1529619 .0357995 -4.27 0.000 -.2231277 -.0827961

-------------+----------------------------------------------------------------

_cut1 | .4858634 .113179 (Ancillary parameters)

_cut2 | 1.269036 .11282

_cut3 | 2.247251 .1138171

_cut4 | 3.094606 .1145781

------------------------------------------------------------------------------

. * get marginal effects, evaluated at y=5 (excellent);

. mfx compute, predict(outcome(5));

Marginal effects after oprobit

y = Pr(sr_health==5) (predict, outcome(5))

= .34103717

------------------------------------------------------------------------------

variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

---------+--------------------------------------------------------------------

male*| .0471251 .00722 6.53 0.000 .03298 .06127 .438062

age | -.0074214 .00031 -23.77 0.000 -.008033 -.00681 39.8412

educ | .0303405 .00142 21.42 0.000 .027565 .033116 13.2402

famincl | .0880025 .00412 21.37 0.000 .07993 .096075 10.2131

black*| -.0781411 .00996 -7.84 0.000 -.097665 -.058617 .124264

othrace*| -.0843227 .01567 -5.38 0.000 -.115043 -.053602 .04124

smoke*| -.0749785 .00773 -9.71 0.000 -.09012 -.059837 .289147

smoke5*| -.0545062 .01235 -4.41 0.000 -.078719 -.030294 .081395

------------------------------------------------------------------------------

(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * get marginal effects, evaluated at y=3 (good);

. mfx compute, predict(outcome(3));

Marginal effects after oprobit

y = Pr(sr_health==3) (predict, outcome(3))

= .25239744

------------------------------------------------------------------------------

variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

---------+--------------------------------------------------------------------

male*| -.0276959 .00425 -6.51 0.000 -.036029 -.019363 .438062

age | .0043717 .0002 21.81 0.000 .003979 .004765 39.8412

educ | -.0178727 .00089 -20.02 0.000 -.019623 -.016123 13.2402

famincl | -.0518395 .00261 -19.85 0.000 -.056959 -.04672 10.2131

black*| .0464219 .00599 7.75 0.000 .034675 .058169 .124264

othrace*| .0501493 .00934 5.37 0.000 .031834 .068464 .04124

smoke*| .0443735 .00464 9.56 0.000 .035272 .053476 .289147

smoke5*| .0323707 .00739 4.38 0.000 .017882 .04686 .081395

------------------------------------------------------------------------------

(*) dy/dx is for discrete change of dummy variable from 0 to 1

. * use prchange, evaluate marginal effects for;

. * a 40 year old white college graduate who never smoked;

. * with average log income (male is also held at its mean);

. prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);

oprobit: Changes in Predicted Probabilities for sr_health

male

Avg|Chg| 1 2 3 4

0->1 .0203868 -.0020257 -.00886671 -.02677558 -.01329902

5

0->1 .05096698

age

Avg|Chg| 1 2 3 4

Min->Max .13358317 .0184785 .06797072 .17686112 .07064757

-+1/2 .00321942 .00032518 .00141642 .00424452 .00206241

-+sd/2 .03728014 .00382077 .01648743 .04910323 .0237889

MargEfct .00321947 .00032515 .00141639 .00424462 .00206252

5

Min->Max -.33395794

-+1/2 -.00804856

-+sd/2 -.09320036

MargEfct -.00804868

educ

Avg|Chg| 1 2 3 4

Min->Max .21397413 -.10945692 -.19725057 -.22822781 .07974288

-+1/2 .01315829 -.00133136 -.00579271 -.01734608 -.00842556

-+sd/2 .03589903 -.0036753 -.01587057 -.04728749 -.02291423

MargEfct .01316202 -.0013293 -.00579057 -.01735309 -.00843208

5

Min->Max .45519245

-+1/2 .03289571

-+sd/2 .08974758

MargEfct .03290504

famincl

Avg|Chg| 1 2 3 4

Min->Max .16759798 -.05486112 -.13623201 -.22790183 .00276569

-+1/2 .03808549 -.00390581 -.01684746 -.05016185 -.02429861

-+sd/2 .03622223 -.0037093 -.01601486 -.04771243 -.02311897

MargEfct .03817633 -.00385563 -.0167955 -.05033251 -.02445719

5

Min->Max .41622926

-+1/2 .09521371

-+sd/2 .09055558

MargEfct .09544083

black

Avg|Chg| 1 2 3 4

0->1 .03467907 .00473166 .01835598 .04779626 .01581377

5

0->1 -.08669767

othrace

Avg|Chg| 1 2 3 4

0->1 .03787661 .00532324 .02040636 .05239134 .0165706

5

0->1 -.09469151

smoke

Avg|Chg| 1 2 3 4

0->1 .03270518 .00438228 .01712416 .04497364 .01528287

5

0->1 -.08176297

smoke5

Avg|Chg| 1 2 3 4

0->1 .02411037 .00299019 .012047 .03281575 .01242298

5

0->1 -.06027591

1 2 3 4 5

Pr(y|x) .00563112 .03431748 .17979275 .30986777 .47039089

male age educ famincl black othrace smoke

x= .438062 40 16 10.2131 0 0 0

sd(x)= .496168 11.606 2.73325 .95086 .329895 .198853 .453384

smoke5

x= 0

sd(x)= .273452

. log close;

log: c:\bill\iadb\sr_health_status.log

log type: text

closed on: 1 Nov 2004, 12:07:40

------------------------------------------------------------------------------
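
The Pr(y|x) row printed by prchange can be reproduced by hand from the oprobit coefficients and cutpoints: Pr(y=1) is the normal cdf evaluated at cut1 - xb, and Pr(y=5) is one minus the normal cdf at cut4 - xb. The following is a minimal sketch of that calculation at the x values listed at the bottom of the prchange output. The scalar names are illustrative, and normal() and normalden() are the current function names (an assumption about the reader's release; the 2004 release that produced these logs may need the older equivalents).

```stata
#delimit ;
* index xb at male=.438062, age=40, educ=16 and famincl=10.21313;
* black, othrace, smoke and smoke5 are zero so they drop out;
scalar xb = .1281241*.438062 - .0202308*40 + .0827086*16 + .2398957*10.21313;
* first and last cutpoints copied from the oprobit output;
scalar cut1 = .4858634;
scalar cut4 = 3.094606;
* probability of the lowest and highest categories;
scalar p1 = normal(cut1 - xb);
scalar p5 = 1 - normal(cut4 - xb);
* marginal effect of age on the probability of excellent health;
scalar me_age5 = -.0202308*normalden(cut4 - xb);
display "Pr(y=1|x) = " p1;
display "Pr(y=5|x) = " p5;
display "dPr(y=5)/d(age) = " me_age5;
```

The three numbers should match the Pr(y|x) entries for outcomes 1 and 5 and the MargEfct entry for age under outcome 5, up to rounding of the inputs.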

STATA Program for Count Data Models

drvisits.do

* drvisits.do;

* this program estimates poisson and negative binomial;

* count data models. the data include people aged 65+;

* from the 1987 nmes data set. dr visits are annual;

* this line defines the semicolon as the line delimiter;

# delimit ;

* set memory to 10 meg;

set memory 10m;

* open output file;

log using c:\bill\jpsm\drvisits.log,replace;

* open stata data set;

use c:\bill\jpsm\drvisits;

* generate new variables;

gen incomel=ln(income);

* get distribution of dr visits;

tabulate drvisits;

* get descriptive statistics;

sum;

* run poisson regression;

poisson drvisits age65 age70 age75 age80 chronic excel good fair female

black hispanic hs_drop hs_grad mcaid incomel;

* run neg binomial regression;

nbreg drvisits age65 age70 age75 age80 chronic excel good fair female

black hispanic hs_drop hs_grad mcaid incomel, dispersion(constant);

log close;
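
One side effect of gen incomel=ln(income) is worth noting: the log of zero is missing in Stata, so anyone with zero income gets a missing incomel and is dropped from both regressions. The log for this program reports 28 missing values generated, and the estimation samples shrink accordingly. A minimal sketch of the arithmetic, using only counts that appear in that log:

```stata
#delimit ;
* the log of zero evaluates to missing, shown as a period;
display ln(0);
* 5327 people in the file less 28 with zero income;
display 5327 - 28;
```

The second line gives the number of observations that the poisson and nbreg headers report.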

STATA Results for Count Data Models

drvisits.log

------------------------------------------------------------------------------

log: C:\bill\stata\drvisits.log

log type: text

opened on: 28 Oct 2004, 13:44:05

. * open stata data set;

. use drvisits;

. * generate new variables;

. gen incomel=ln(income);

(28 missing values generated)

. * get distribution of dr visits;

. tabulate drvisits;

annual doc |

visits | Freq. Percent Cum.

------------+-----------------------------------

0 | 915 17.18 17.18

1 | 601 11.28 28.46

2 | 533 10.01 38.46

3 | 503 9.44 47.91

4 | 450 8.45 56.35

5 | 391 7.34 63.69

6 | 319 5.99 69.68

7 | 258 4.84 74.53

8 | 216 4.05 78.58

9 | 192 3.60 82.19

10 | 147 2.76 84.94

11 | 123 2.31 87.25

12 | 99 1.86 89.11

13 | 81 1.52 90.63

14 | 80 1.50 92.13

15 | 66 1.24 93.37

16 | 56 1.05 94.42

17 | 56 1.05 95.48

18 | 34 0.64 96.11

19 | 26 0.49 96.60

20 | 17 0.32 96.92

21 | 21 0.39 97.32

22 | 20 0.38 97.69

23 | 11 0.21 97.90

24 | 15 0.28 98.18

25 | 4 0.08 98.25

26 | 12 0.23 98.48

27 | 9 0.17 98.65

28 | 6 0.11 98.76

29 | 4 0.08 98.84

30 | 5 0.09 98.93

31 | 6 0.11 99.04

32 | 2 0.04 99.08

33 | 2 0.04 99.12

34 | 3 0.06 99.17

35 | 2 0.04 99.21

36 | 2 0.04 99.25

37 | 4 0.08 99.32

38 | 2 0.04 99.36

39 | 5 0.09 99.46

40 | 2 0.04 99.49

41 | 1 0.02 99.51

42 | 4 0.08 99.59

43 | 2 0.04 99.62

44 | 2 0.04 99.66

47 | 1 0.02 99.68

48 | 2 0.04 99.72

49 | 1 0.02 99.74

50 | 1 0.02 99.76

51 | 1 0.02 99.77

53 | 2 0.04 99.81

55 | 1 0.02 99.83

56 | 1 0.02 99.85

58 | 2 0.04 99.89

61 | 1 0.02 99.91

63 | 1 0.02 99.92

65 | 1 0.02 99.94

66 | 1 0.02 99.96

68 | 1 0.02 99.98

89 | 1 0.02 100.00

------------+-----------------------------------

Total | 5,327 100.00

. * get descriptive statistics;

. sum;

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

drvisits | 5327 5.563732 6.676081 0 89

age65 | 5327 .3358363 .4723263 0 1

age70 | 5327 .2802703 .4491734 0 1

age75 | 5327 .2003004 .4002627 0 1

age80 | 5327 .1101934 .31316 0 1

-------------+--------------------------------------------------------

chronic | 5327 .6279332 .4834015 0 1

excel | 5327 .0749014 .263257 0 1

good | 5327 .3792003 .4852336 0 1

fair | 5327 .3305801 .4704662 0 1

hs_drop | 5327 .5029097 .5000385 0 1

-------------+--------------------------------------------------------

hs_grad | 5327 .2922846 .4548551 0 1

black | 5327 .1255866 .331414 0 1

hispanic | 5327 .0324761 .1772774 0 1

female | 5327 .5969589 .4905549 0 1

mcaid | 5327 .1019335 .3025893 0 1

-------------+--------------------------------------------------------

income | 5327 25381.78 28962.69 0 548224

incomel | 5299 9.754733 .8911269 2.639057 13.21444

. * run poisson regression;

. poisson drvisits age65 age70 age75 age80 chronic excel good fair female

> black hispanic hs_drop hs_grad mcaid incomel;

Iteration 0: log likelihood = -22275.374

Iteration 1: log likelihood = -22275.351

Iteration 2: log likelihood = -22275.351

Poisson regression Number of obs = 5299

LR chi2(15) = 3334.46

Prob > chi2 = 0.0000

Log likelihood = -22275.351 Pseudo R2 = 0.0696

------------------------------------------------------------------------------

drvisits | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age65 | .2144282 .026267 8.16 0.000 .1629458 .2659106

age70 | .286831 .0263077 10.90 0.000 .2352689 .3383931

age75 | .2801504 .0269802 10.38 0.000 .2272702 .3330307

age80 | .24314 .0292045 8.33 0.000 .1859001 .3003798

chronic | .4997173 .0137789 36.27 0.000 .4727111 .5267235

excel | -.7836622 .0305392 -25.66 0.000 -.8435178 -.7238065

good | -.4774853 .0159987 -29.85 0.000 -.5088422 -.4461284

fair | -.2578352 .0155473 -16.58 0.000 -.2883073 -.2273631

female | .0960976 .0123182 7.80 0.000 .0719543 .1202409

black | -.2838081 .0202163 -14.04 0.000 -.3234314 -.2441849

hispanic | -.2051023 .0368764 -5.56 0.000 -.2773788 -.1328258

hs_drop | -.2323802 .016066 -14.46 0.000 -.263869 -.2008914

hs_grad | -.1200559 .016517 -7.27 0.000 -.1524287 -.0876831

mcaid | .1535708 .0203414 7.55 0.000 .1137025 .1934392

incomel | .0211453 .0072946 2.90 0.004 .0068481 .0354425

_cons | 1.348084 .0804659 16.75 0.000 1.190374 1.505795

------------------------------------------------------------------------------

. * run neg binomial regression;

. nbreg drvisits age65 age70 age75 age80 chronic excel good fair female

> black hispanic hs_drop hs_grad mcaid incomel, dispersion(constant);

Fitting Poisson model:

Iteration 0: log likelihood = -22275.374

Iteration 1: log likelihood = -22275.351

Iteration 2: log likelihood = -22275.351

Fitting constant-only model:

Iteration 0: log likelihood = -17434.216

Iteration 1: log likelihood = -15076.44

Iteration 2: log likelihood = -14841.425

Iteration 3: log likelihood = -14840.935

Iteration 4: log likelihood = -14840.935

Fitting full model:

Iteration 0: log likelihood = -14840.935

Iteration 1: log likelihood = -14540.408

Iteration 2: log likelihood = -14519.799

Iteration 3: log likelihood = -14519.721

Iteration 4: log likelihood = -14519.721

Negative binomial (constant dispersion) Number of obs = 5299

LR chi2(15) = 642.43

Prob > chi2 = 0.0000

Log likelihood = -14519.721 Pseudo R2 = 0.0216

------------------------------------------------------------------------------

drvisits | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age65 | .1034281 .054664 1.89 0.058 -.0037113 .2105675

age70 | .2039634 .0546788 3.73 0.000 .0967949 .3111319

age75 | .2094928 .0560412 3.74 0.000 .0996541 .3193314

age80 | .2227169 .0605925 3.68 0.000 .1039579 .341476

chronic | .5091666 .0292189 17.43 0.000 .4518986 .5664347

excel | -.5272908 .0594584 -8.87 0.000 -.6438271 -.4107545

good | -.3422506 .0353507 -9.68 0.000 -.4115368 -.2729645

fair | -.1526385 .0351632 -4.34 0.000 -.2215571 -.0837198

female | .1321966 .0263028 5.03 0.000 .0806441 .183749

black | -.3300031 .0438969 -7.52 0.000 -.4160395 -.2439668

hispanic | -.1527763 .0763018 -2.00 0.045 -.3023251 -.0032275

hs_drop | -.1912903 .0344335 -5.56 0.000 -.2587787 -.1238018

hs_grad | -.0869843 .0354543 -2.45 0.014 -.1564733 -.0174952

mcaid | .1341325 .0442797 3.03 0.002 .0473459 .2209191

incomel | .0379834 .0155687 2.44 0.015 .0074693 .0684975

_cons | 1.11029 .17092 6.50 0.000 .7752924 1.445287

-------------+----------------------------------------------------------------

/lndelta | 1.65017 .0286445 1.594027 1.706312

-------------+----------------------------------------------------------------

delta | 5.207863 .1491766 4.923538 5.508607

------------------------------------------------------------------------------

Likelihood-ratio test of delta=0: chibar2(01) = 1.6e+04 Prob>=chibar2 = 0.000

. log close;

log: C:\bill\stata\drvisits.log

log type: text

closed on: 28 Oct 2004, 13:44:20

------------------------------------------------------------------------------
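
A few numbers in the count data output can be checked by hand. Poisson coefficients are changes in the log of expected visits, so exp(b) is an incidence rate ratio. The delta row is the exponential of /lndelta. The likelihood ratio statistic at the bottom is twice the gap between the negative binomial and Poisson log likelihoods. The sketch below uses only values copied from the log above, and the scalar names are illustrative.

```stata
#delimit ;
* incidence rate ratio for chronic from the poisson coefficient;
scalar irr_chronic = exp(.4997173);
display "incidence rate ratio for chronic: " irr_chronic;
* delta is the exponential of the reported lndelta;
scalar delta = exp(1.65017);
display "delta = " delta;
* likelihood ratio statistic, negative binomial versus poisson;
scalar lr = 2*(-14519.721 - (-22275.351));
display "LR statistic = " lr;
* raw variance to mean ratio of drvisits from the summary statistics;
scalar vmr = 6.676081^2/5.563732;
display "variance/mean of drvisits = " vmr;
```

People with a chronic condition are predicted to have roughly 65 percent more visits under the Poisson model. The raw variance of drvisits is about 8 times its mean, whereas a Poisson variable has variance equal to its mean, which is consistent with the very large test statistic against delta=0. With dispersion(constant) the model's conditional variance is (1+delta) times the conditional mean.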

STATA Program for Duration Data

Surv_data.do

* this data set has married males, aged 50-70;

* from the nhis multiple cause of death file;

* data is taken from the 1987-1990 nhis;

* surveys. all people are followed for;

* up to 60 months. max_mths is the maximum;

* number of months a person is followed and diedin5;

* indicates whether the person died;

* within five years (60 months);

* set end of line marker;

# delimit;

set more off;

* increase memory;

set memory 20m;

* write results to file;

log using c:\bill\jpsm\surv_data.log,replace;

* load up stata data set;

use c:\bill\jpsm\surv_data;

* get contents of data file;

desc;

* get summary statistics;

sum;

* define the duration data in the analysis;

stset max_mths, failure(diedin5);

* list the kaplan meier survivor function;

sts list;

* you can graph the functions as well;

* output the graphs to a file;

sts graph;

graph save c:\bill\jpsm\graph1.gph, replace;

* you can draw graphs for various subgroups;

* output the graphs to a file;

sts graph, by(educ);

graph save c:\bill\jpsm\graph2.gph, replace;

* run a duration model where the hazard varies across;

* people. first, ask stata to print out the raw;

* coefficients (nohr option), then do default;

* show weibull first, then exponential;

* first, construct dummies for the income and;

* education categories. in the regression statement;

* _Ie star includes all variables beginning with _Ie;

* and _Ii star includes all variables starting with;

* _Ii;

xi i.income i.educ;

streg age_s_yrs black hispanic _Ie* _Ii*, d(weibull) nohr;

* now get the hazard ratios, where each coefficient b is;

* reported as exp(b);

streg age_s_yrs black hispanic _Ie* _Ii*, d(weibull);

* for comparison purposes, look at results from an exponential;

streg age_s_yrs black hispanic _Ie* _Ii*, d(exp) nohr;

streg age_s_yrs black hispanic _Ie* _Ii*, d(exp);

log close;
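
The mortality file used here is not distributed with Stata, so the same sequence of commands can be tried on a shipped data set. This is a sketch under stated assumptions: it uses the cancer drug trial data that comes with Stata, and its variables studytime, died and age stand in for max_mths, diedin5 and age_s_yrs.

```stata
#delimit ;
set more off;
* illustrative data shipped with stata, not the nhis mortality file;
sysuse cancer, clear;
* studytime is months of follow-up and died marks a failure;
stset studytime, failure(died);
* kaplan-meier survivor function;
sts list;
* weibull model with raw coefficients;
streg age, distribution(weibull) nohr;
* the hazard ratio is the exponential of the coefficient;
display "hazard ratio for age: " exp(_b[age]);
```

Running streg again without nohr should report the same number in the Haz. Ratio column.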

STATA Results for Duration Data

surv_data.log

------------------------------------------------------------------------------

log: c:\bill\jpsm\surv_data.log

log type: text

opened on: 7 Nov 2004, 06:26:56

. * load up stata data set;

. use c:\bill\jpsm\surv_data;

. * get contents of data file;

. desc;

Contains data from c:\bill\jpsm\surv_data.dta

obs: 26,654

vars: 7 2 Nov 2004 10:59

size: 533,080 (97.5% of memory free)

------------------------------------------------------------------------------

storage display value

variable name type format label variable label

------------------------------------------------------------------------------

age_s_yrs byte %9.0g age in years at the time of

survey

max_mths byte %9.0g max months of followup

black byte %9.0g dummy variable, =1 if black

hispanic byte %9.0g dummy variable, =1 hispanic

income float %9.0g =1 if ]

------------------------------------------------------------------------------

1 26631 38 0 0.9986 0.0002 0.9980 0.9990

2 26593 42 0 0.9970 0.0003 0.9963 0.9976

3 26551 40 0 0.9955 0.0004 0.9946 0.9962

4 26511 49 0 0.9937 0.0005 0.9926 0.9945

5 26462 50 0 0.9918 0.0006 0.9906 0.9928

6 26412 61 0 0.9895 0.0006 0.9882 0.9906

7 26351 45 0 0.9878 0.0007 0.9864 0.9890

8 26306 60 0 0.9855 0.0007 0.9840 0.9869

9 26246 46 0 0.9838 0.0008 0.9822 0.9853

10 26200 42 0 0.9822 0.0008 0.9806 0.9838

11 26158 52 0 0.9803 0.0009 0.9785 0.9819

12 26106 56 0 0.9782 0.0009 0.9764 0.9799

13 26050 53 0 0.9762 0.0009 0.9743 0.9780

14 25997 64 0 0.9738 0.0010 0.9718 0.9756

15 25933 48 0 0.9720 0.0010 0.9699 0.9739

16 25885 49 0 0.9701 0.0010 0.9680 0.9721

17 25836 54 0 0.9681 0.0011 0.9659 0.9702

18 25782 46 0 0.9664 0.0011 0.9642 0.9685

19 25736 51 0 0.9645 0.0011 0.9622 0.9666

20 25685 38 0 0.9631 0.0012 0.9607 0.9652

21 25647 56 0 0.9609 0.0012 0.9586 0.9632

22 25591 51 0 0.9590 0.0012 0.9566 0.9613

23 25540 48 0 0.9572 0.0012 0.9547 0.9596

24 25492 51 0 0.9553 0.0013 0.9528 0.9577

25 25441 59 0 0.9531 0.0013 0.9505 0.9556

26 25382 58 0 0.9509 0.0013 0.9483 0.9535

27 25324 63 0 0.9486 0.0014 0.9458 0.9511

28 25261 50 0 0.9467 0.0014 0.9439 0.9493

29 25211 50 0 0.9448 0.0014 0.9420 0.9475

30 25161 52 0 0.9428 0.0014 0.9400 0.9456

31 25109 60 0 0.9406 0.0014 0.9377 0.9434

32 25049 52 0 0.9386 0.0015 0.9357 0.9415

33 24997 54 0 0.9366 0.0015 0.9336 0.9395

34 24943 56 0 0.9345 0.0015 0.9315 0.9374

35 24887 66 0 0.9320 0.0015 0.9289 0.9350

36 24821 70 0 0.9294 0.0016 0.9263 0.9324

37 24751 45 0 0.9277 0.0016 0.9245 0.9308

38 24706 59 0 0.9255 0.0016 0.9223 0.9286

39 24647 54 0 0.9235 0.0016 0.9202 0.9266

40 24593 48 0 0.9217 0.0016 0.9184 0.9248

41 24545 61 0 0.9194 0.0017 0.9160 0.9226

42 24484 63 0 0.9170 0.0017 0.9136 0.9203

43 24421 56 0 0.9149 0.0017 0.9115 0.9182

44 24365 52 0 0.9130 0.0017 0.9095 0.9163

45 24313 60 0 0.9107 0.0017 0.9072 0.9141

46 24253 56 0 0.9086 0.0018 0.9051 0.9120

47 24197 68 0 0.9060 0.0018 0.9025 0.9095

48 24129 59 0 0.9038 0.0018 0.9002 0.9073

49 24070 57 0 0.9017 0.0018 0.8981 0.9052

50 24013 57 0 0.8996 0.0018 0.8959 0.9031

51 23956 66 0 0.8971 0.0019 0.8934 0.9007

52 23890 57 0 0.8949 0.0019 0.8912 0.8986

53 23833 50 0 0.8931 0.0019 0.8893 0.8967

54 23783 53 0 0.8911 0.0019 0.8873 0.8947

55 23730 64 0 0.8887 0.0019 0.8848 0.8924

56 23666 55 0 0.8866 0.0019 0.8827 0.8903

57 23611 65 0 0.8842 0.0020 0.8803 0.8879

58 23546 66 0 0.8817 0.0020 0.8777 0.8855

59 23480 44 0 0.8800 0.0020 0.8761 0.8839

60 23436 50 2.3e+04 0.8781 0.0020 0.8742 0.8820

------------------------------------------------------------------------------

. * you can graph the functions as well;

. * output the graphs to a file;

. sts graph;

failure _d: diedin5

analysis time _t: max_mths

. graph save c:\bill\jpsm\graph1.gph, replace;

(file c:\bill\jpsm\graph1.gph saved)

. * you can draw graphs for various subgroups;

. * output the graphs to a file;

. sts graph, by(educ);

failure _d: diedin5

analysis time _t: max_mths

. graph save c:\bill\jpsm\graph2.gph, replace;

(file c:\bill\jpsm\graph2.gph saved)

. * run a duration model where the hazard varies across;

. * people. first, ask stata to print out the raw;

. * coefficients (nohr option), then do default;

. * show weibull first, then exponential;

. * first, construct dummies for the income and;

. * education categories. in the regression statement;

. * _Ie star includes all variables beginning with _Ie;

. * and _Ii star includes all variables starting with;

. * _Ii;

. xi i.income i.educ;

i.income _Iincome_1-5 (naturally coded; _Iincome_1 omitted)

i.educ _Ieduc_1-4 (naturally coded; _Ieduc_1 omitted)

. streg age_s_yrs black hispanic _Ie* _Ii*, d(weibull) nohr;

failure _d: diedin5

analysis time _t: max_mths

Fitting constant-only model:

Iteration 0: log likelihood = -12759.823

Iteration 1: log likelihood = -12723.121

Iteration 2: log likelihood = -12722.924

Iteration 3: log likelihood = -12722.924

Fitting full model:

Iteration 0: log likelihood = -12722.924

Iteration 1: log likelihood = -12454.553

Iteration 2: log likelihood = -12425.111

Iteration 3: log likelihood = -12425.055

Iteration 4: log likelihood = -12425.055

Weibull regression -- log relative-hazard form

No. of subjects = 26631 Number of obs = 26631

No. of failures = 3245

Time at risk = 1505705

LR chi2(10) = 595.74

Log likelihood = -12425.055 Prob > chi2 = 0.0000

------------------------------------------------------------------------------

_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age_s_yrs | .0452588 .0031592 14.33 0.000 .0390669 .0514508

black | .4770152 .0511122 9.33 0.000 .3768371 .5771932

hispanic | .1333552 .082156 1.62 0.105 -.0276676 .294378

_Ieduc_2 | .0093353 .0591918 0.16 0.875 -.1066786 .1253492

_Ieduc_3 | -.072163 .0503131 -1.43 0.151 -.1707748 .0264488

_Ieduc_4 | -.1301173 .0657131 -1.98 0.048 -.2589126 -.0013221

_Iincome_2 | -.1867752 .0650604 -2.87 0.004 -.3142914 -.0592591

_Iincome_3 | -.3268927 .0688635 -4.75 0.000 -.4618627 -.1919227

_Iincome_4 | -.5166137 .0769202 -6.72 0.000 -.6673747 -.3658528

_Iincome_5 | -.5425447 .0722025 -7.51 0.000 -.684059 -.4010303

_cons | -9.201724 .2266475 -40.60 0.000 -9.645945 -8.757503

-------------+----------------------------------------------------------------

/ln_p | .1585315 .0172241 9.20 0.000 .1247729 .1922901

-------------+----------------------------------------------------------------

p | 1.171789 .020183 1.132891 1.212022

1/p | .8533961 .014699 .8250675 .8826974

------------------------------------------------------------------------------

. * now get the hazard ratios, where each coefficient b is;

. * reported as exp(b);

. streg age_s_yrs black hispanic _Ie* _Ii*, d(weibull);

failure _d: diedin5

analysis time _t: max_mths

Fitting constant-only model:

Iteration 0: log likelihood = -12759.823

Iteration 1: log likelihood = -12723.121

Iteration 2: log likelihood = -12722.924

Iteration 3: log likelihood = -12722.924

Fitting full model:

Iteration 0: log likelihood = -12722.924

Iteration 1: log likelihood = -12454.553

Iteration 2: log likelihood = -12425.111

Iteration 3: log likelihood = -12425.055

Iteration 4: log likelihood = -12425.055

Weibull regression -- log relative-hazard form

No. of subjects = 26631 Number of obs = 26631

No. of failures = 3245

Time at risk = 1505705

LR chi2(10) = 595.74

Log likelihood = -12425.055 Prob > chi2 = 0.0000

------------------------------------------------------------------------------

_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age_s_yrs | 1.046299 .0033055 14.33 0.000 1.03984 1.052797

black | 1.611258 .082355 9.33 0.000 1.457667 1.781032

hispanic | 1.142656 .093876 1.62 0.105 .9727116 1.342291

_Ieduc_2 | 1.009379 .059747 0.16 0.875 .8988145 1.133544

_Ieduc_3 | .9303792 .0468103 -1.43 0.151 .8430114 1.026802

_Ieduc_4 | .8779924 .0576956 -1.98 0.048 .7718905 .9986788

_Iincome_2 | .8296302 .0539761 -2.87 0.004 .7303062 .9424625

_Iincome_3 | .7211611 .0496617 -4.75 0.000 .6301089 .8253706

_Iincome_4 | .5965372 .0458858 -6.72 0.000 .5130537 .6936049

_Iincome_5 | .5812672 .041969 -7.51 0.000 .5045648 .6696297

-------------+----------------------------------------------------------------

/ln_p | .1585315 .0172241 9.20 0.000 .1247729 .1922901

-------------+----------------------------------------------------------------

p | 1.171789 .020183 1.132891 1.212022

1/p | .8533961 .014699 .8250675 .8826974

------------------------------------------------------------------------------

. * for comparison purposes, look at results from an exponential;

. streg age_s_yrs black hispanic _Ie* _Ii*, d(exp) nohr;

failure _d: diedin5

analysis time _t: max_mths

Iteration 0: log likelihood = -12759.823

Iteration 1: log likelihood = -12493.913

Iteration 2: log likelihood = -12465.272

Iteration 3: log likelihood = -12465.218

Iteration 4: log likelihood = -12465.218

Exponential regression -- log relative-hazard form

No. of subjects = 26631 Number of obs = 26631

No. of failures = 3245

Time at risk = 1505705

LR chi2(10) = 589.21

Log likelihood = -12465.218 Prob > chi2 = 0.0000

------------------------------------------------------------------------------

_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age_s_yrs | .0450058 .0031587 14.25 0.000 .0388149 .0511968

black | .4739259 .0511077 9.27 0.000 .3737567 .574095

hispanic | .1325028 .0821549 1.61 0.107 -.0285178 .2935235

_Ieduc_2 | .0094567 .0591916 0.16 0.873 -.1065568 .1254701

_Ieduc_3 | -.071804 .0503096 -1.43 0.154 -.170409 .0268011

_Ieduc_4 | -.1293206 .0657092 -1.97 0.049 -.2581081 -.000533

_Iincome_2 | -.1855024 .0650573 -2.85 0.004 -.3130123 -.0579925

_Iincome_3 | -.3244382 .0688567 -4.71 0.000 -.4593948 -.1894816

_Iincome_4 | -.5134143 .0769126 -6.68 0.000 -.6641602 -.3626684

_Iincome_5 | -.5391811 .072196 -7.47 0.000 -.6806827 -.3976795

_cons | -8.491069 .2107085 -40.30 0.000 -8.90405 -8.078088

------------------------------------------------------------------------------

. streg age_s_yrs black hispanic _Ie* _Ii*, d(exp);

failure _d: diedin5

analysis time _t: max_mths

Iteration 0: log likelihood = -12759.823

Iteration 1: log likelihood = -12493.913

Iteration 2: log likelihood = -12465.272

Iteration 3: log likelihood = -12465.218

Iteration 4: log likelihood = -12465.218

Exponential regression -- log relative-hazard form

No. of subjects = 26631 Number of obs = 26631

No. of failures = 3245

Time at risk = 1505705

LR chi2(10) = 589.21

Log likelihood = -12465.218 Prob > chi2 = 0.0000

------------------------------------------------------------------------------

_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

age_s_yrs | 1.046034 .0033041 14.25 0.000 1.039578 1.05253

black | 1.606288 .0820936 9.27 0.000 1.453184 1.775523

hispanic | 1.141682 .0937948 1.61 0.107 .971885 1.341145

_Ieduc_2 | 1.009502 .059754 0.16 0.873 .898924 1.133681

_Ieduc_3 | .9307133 .0468238 -1.43 0.154 .8433198 1.027163

_Ieduc_4 | .8786922 .0577381 -1.97 0.049 .7725117 .9994672

_Iincome_2 | .8306869 .0540422 -2.85 0.004 .731241 .943657

_Iincome_3 | .7229334 .0497788 -4.71 0.000 .6316658 .827388

_Iincome_4 | .5984488 .0460282 -6.68 0.000 .5147056 .6958171

_Iincome_5 | .5832257 .0421066 -7.47 0.000 .5062713 .6718773

------------------------------------------------------------------------------

. log close;

log: c:\bill\jpsm\surv_data.log

log type: text

closed on: 7 Nov 2004, 06:27:08

------------------------------------------------------------------------------
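
Several figures in the duration output follow from simple arithmetic. The Kaplan-Meier survivor function is a running product of one minus deaths over the number at risk. The hazard ratios are the exponentiated nohr coefficients. The Weibull shape parameter p is the exponential of /ln_p. Because the exponential model is the Weibull model with p=1, twice the difference in log likelihoods gives a likelihood ratio test with one degree of freedom. The lines below are a minimal sketch using only numbers copied from the log above. The scalar names are illustrative.

```stata
#delimit ;
* kaplan-meier survivor function for months 1 and 2;
scalar s1 = 1 - 38/26631;
scalar s2 = s1*(1 - 42/26593);
display "S(1) = " s1 "   S(2) = " s2;
* hazard ratio for black is the exponential of the weibull coefficient;
display "hazard ratio, black = " exp(.4770152);
* weibull shape parameter from ln_p;
display "p = " exp(.1585315);
* likelihood ratio test of weibull against exponential, one restriction;
scalar lr = 2*(-12425.055 - (-12465.218));
display "LR = " lr "   p-value = " chi2tail(1, lr);
```

The test statistic is large relative to a chi-squared with one degree of freedom, in line with the z statistic on /ln_p. Since the estimated p is above one, the hazard rises with time since the survey.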
