STATA Program for OLS
cps87_or.do
* the data for this project are a small subsample;
* of full-time (30 or more hours) male workers;
* aged 21-64 from the outgoing rotation;
* samples of the 1987 current population survey;
* this line defines the semicolon as the ;
* end of line delimiter;
# delimit ;
* set memory to 10 meg;
set memory 10m;
* write results to a log file;
* the replace option writes over old;
* log files;
log using cps87_or.log,replace;
* open stata data set;
use c:\bill\stata\cps87_or;
* list variables and labels in data set;
desc;
* generate new variables;
* lines 1-2 illustrate basic math functions;
* lines 3-4 illustrate logical operators;
* line 5 illustrates the OR statement;
* line 6 illustrates the AND statement;
* after you construct new variables, compress the data again;
gen age2=age*age;
gen earnwkl=ln(earnwke);
gen union=unionm==1;
gen topcode=earnwke==999;
gen nonwhite=((race==2)|(race==3));
gen big_ne=((region==1)&(smsa==1));
* label the data;
label var age2 "age squared";
label var earnwkl "log earnings per week";
label var topcode "=1 if earnwke is topcoded";
label var union "1=in union, 0 otherwise";
label var nonwhite "1=nonwhite, 0=white" ;
label var big_ne "1= live in big smsa from northeast, 0=otherwise";
compress;
* get descriptive statistics;
sum;
* get detailed descriptive statistics for continuous variables;
sum earnwke, detail;
* get frequencies of discrete variables;
tabulate unionm;
tabulate race;
* get two-way table of frequencies;
tabulate region smsa, row column cell;
* run simple regression;
reg earnwkl age age2 educ nonwhite union;
* run regression adding smsa, region and race fixed-effects;
* the xi command constructs the dummies for you;
* the lowest numbered category is the;
* omitted (reference) group by default;
xi: reg earnwkl age age2 educ union i.race i.region i.smsa;
more;
* close log file;
log close;
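The xi: prefix used above is how older Stata releases build the dummy variables. As a minimal sketch, assuming Stata 11 or later (an assumption; the log in this handout dates from 2004), the same specification can be written with factor-variable notation, and the omitted group can be changed with the ib#. prefix. Choosing race==3 as the base category is purely illustrative and is not something the original program does.

```stata
* a sketch, assuming Stata 11 or later and the variables created above
* (default carriage-return delimiter, so no trailing semicolons)

* same specification as the xi: regression; the lowest category
* of each i. variable is the omitted group by default
reg earnwkl age age2 educ union i.race i.region i.smsa

* illustrative only: make race==3 the omitted group instead
reg earnwkl age age2 educ union ib3.race i.region i.smsa
```

With this notation no _I* variables are added to the data set; the category indicators appear only in the regression table.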
STATA Results for OLS
cps87_or.log
------------------------------------------------------------------------------
log: c:\bill\stata\cps87_or.log
log type: text
opened on: 6 Nov 2004, 08:14:10
. * open stata data set;
. use c:\bill\stata\cps87_or;
. * list variables and labels in data set;
. desc;
Contains data from c:\bill\stata\cps87_or.dta
obs: 19,906
vars: 7 6 Nov 2004 08:11
size: 636,992 (93.9% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
age             float  %9.0g                  age in years
race            float  %9.0g                  1=white, non-hisp, 2=place,
                                                n.h, 3=hisp
educ            float  %9.0g                  years of education
unionm          float  %9.0g                  1=union member, 2=otherwise
smsa            float  %9.0g                  1=live in 19 largest smsa,
                                                2=other smsa, 3=non smsa
region          float  %9.0g                  1=east, 2=midwest, 3=south,
                                                4=west
earnwke         float  %9.0g                  usual weekly earnings
------------------------------------------------------------------------------
Sorted by:
. * generate new variables;
. * lines 1-2 illustrate basic math functions;
. * lines 3-4 illustrate logical operators;
. * line 5 illustrates the OR statement;
. * line 6 illustrates the AND statement;
. * after you construct new variables, compress the data again;
. gen age2=age*age;
. gen earnwkl=ln(earnwke);
. gen union=unionm==1;
. gen topcode=earnwke==999;
. gen nonwhite=((race==2)|(race==3));
. gen big_ne=((region==1)&(smsa==1));
. * label the data;
. label var age2 "age squared";
. label var earnwkl "log earnings per week";
. label var topcode "=1 if earnwke is topcoded";
. label var union "1=in union, 0 otherwise";
. label var nonwhite "1=nonwhite, 0=white" ;
. label var big_ne "1= live in big smsa from northeast, 0=otherwise";
. compress;
age was float now byte
race was float now byte
educ was float now byte
unionm was float now byte
smsa was float now byte
region was float now byte
earnwke was float now int
age2 was float now int
union was float now byte
topcode was float now byte
nonwhite was float now byte
big_ne was float now byte
. more;
. * get descriptive statistics;
. sum;
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
age | 19906 37.96619 11.15348 21 64
race | 19906 1.199136 .525493 1 3
educ | 19906 13.16126 2.795234 0 18
unionm | 19906 1.769065 .4214418 1 2
smsa | 19906 1.908369 .7955814 1 3
-------------+--------------------------------------------------------
region | 19906 2.462373 1.079514 1 4
earnwke | 19906 488.264 236.4713 60 999
age2 | 19906 1565.826 912.4383 441 4096
earnwkl | 19906 6.067307 .513047 4.094345 6.906755
union | 19906 .2309354 .4214418 0 1
-------------+--------------------------------------------------------
topcode | 19906 .0719381 .2583919 0 1
nonwhite | 19906 .1408118 .3478361 0 1
big_ne | 19906 .1409625 .3479916 0 1
. * get detailed descriptive statistics for continuous variables;
. sum earnwke, detail;
usual weekly earnings
-------------------------------------------------------------
Percentiles Smallest
1% 128 60
5% 178 60
10% 210 60 Obs 19906
25% 300 63 Sum of Wgt. 19906
50% 449 Mean 488.264
Largest Std. Dev. 236.4713
75% 615 999
90% 865 999 Variance 55918.7
95% 999 999 Skewness .668646
99% 999 999 Kurtosis 2.632356
. more;
. * get frequencies of discrete variables;
. tabulate unionm;
1=union |
member, |
2=otherwise | Freq. Percent Cum.
------------+-----------------------------------
1 | 4,597 23.09 23.09
2 | 15,309 76.91 100.00
------------+-----------------------------------
Total | 19,906 100.00
. tabulate race;
1=white, |
non-hisp, |
2=place, |
n.h, 3=hisp | Freq. Percent Cum.
------------+-----------------------------------
1 | 17,103 85.92 85.92
2 | 1,642 8.25 94.17
3 | 1,161 5.83 100.00
------------+-----------------------------------
Total | 19,906 100.00
. more;
. * get two-way table of frequencies;
. tabulate region smsa, row column cell;
+-------------------+
| Key |
|-------------------|
| frequency |
| row percentage |
| column percentage |
| cell percentage |
+-------------------+
1=east, |
2=midwest, | 1=live in 19 largest smsa,
3=south, | 2=other smsa, 3=non smsa
4=west | 1 2 3 | Total
-----------+---------------------------------+----------
1 | 2,806 1,349 842 | 4,997
| 56.15 27.00 16.85 | 100.00
| 38.46 18.89 15.39 | 25.10
| 14.10 6.78 4.23 | 25.10
-----------+---------------------------------+----------
2 | 1,501 1,742 1,592 | 4,835
| 31.04 36.03 32.93 | 100.00
| 20.58 24.40 29.10 | 24.29
| 7.54 8.75 8.00 | 24.29
-----------+---------------------------------+----------
3 | 1,501 2,542 1,904 | 5,947
| 25.24 42.74 32.02 | 100.00
| 20.58 35.60 34.80 | 29.88
| 7.54 12.77 9.56 | 29.88
-----------+---------------------------------+----------
4 | 1,487 1,507 1,133 | 4,127
| 36.03 36.52 27.45 | 100.00
| 20.38 21.11 20.71 | 20.73
| 7.47 7.57 5.69 | 20.73
-----------+---------------------------------+----------
Total | 7,295 7,140 5,471 | 19,906
| 36.65 35.87 27.48 | 100.00
| 100.00 100.00 100.00 | 100.00
| 36.65 35.87 27.48 | 100.00
. more;
. * run simple regression;
. reg earnwkl age age2 educ nonwhite union;
Source | SS df MS Number of obs = 19906
-------------+------------------------------ F( 5, 19900) = 1775.70
Model | 1616.39963 5 323.279927 Prob > F = 0.0000
Residual | 3622.93905 19900 .182057239 R-squared = 0.3085
-------------+------------------------------ Adj R-squared = 0.3083
Total | 5239.33869 19905 .263217216 Root MSE = .42668
------------------------------------------------------------------------------
earnwkl | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0679808 .0020033 33.93 0.000 .0640542 .0719075
age2 | -.0006778 .0000245 -27.69 0.000 -.0007258 -.0006299
educ | .069219 .0011256 61.50 0.000 .0670127 .0714252
nonwhite | -.1716133 .0089118 -19.26 0.000 -.1890812 -.1541453
union | .1301547 .0072923 17.85 0.000 .1158613 .1444481
_cons | 3.630805 .0394126 92.12 0.000 3.553553 3.708057
------------------------------------------------------------------------------
. more;
. * run regression adding smsa, region and race fixed-effects;
. * the xi command constructs the dummies for you;
. * the lowest numbered category is the;
. * omitted (reference) group by default;
. xi: reg earnwkl age age2 educ union i.race i.region i.smsa;
i.race _Irace_1-3 (naturally coded; _Irace_1 omitted)
i.region _Iregion_1-4 (naturally coded; _Iregion_1 omitted)
i.smsa _Ismsa_1-3 (naturally coded; _Ismsa_1 omitted)
Source | SS df MS Number of obs = 19906
-------------+------------------------------ F( 11, 19894) = 920.86
Model | 1767.66908 11 160.697189 Prob > F = 0.0000
Residual | 3471.66961 19894 .174508375 R-squared = 0.3374
-------------+------------------------------ Adj R-squared = 0.3370
Total | 5239.33869 19905 .263217216 Root MSE = .41774
------------------------------------------------------------------------------
earnwkl | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .070194 .0019645 35.73 0.000 .0663435 .0740446
age2 | -.0007052 .000024 -29.37 0.000 -.0007522 -.0006581
educ | .0643064 .0011285 56.98 0.000 .0620944 .0665184
union | .1131485 .007257 15.59 0.000 .0989241 .1273729
_Irace_2 | -.2329794 .0110958 -21.00 0.000 -.254728 -.2112308
_Irace_3 | -.1795253 .0134073 -13.39 0.000 -.2058047 -.1532458
_Iregion_2 | -.0088962 .0085926 -1.04 0.301 -.0257383 .007946
_Iregion_3 | -.0281747 .008443 -3.34 0.001 -.0447238 -.0116257
_Iregion_4 | .0318053 .0089802 3.54 0.000 .0142034 .0494071
_Ismsa_2 | -.1225607 .0072078 -17.00 0.000 -.1366886 -.1084328
_Ismsa_3 | -.2054124 .0078651 -26.12 0.000 -.2208287 -.1899961
_cons | 3.76812 .0391241 96.31 0.000 3.691434 3.844807
------------------------------------------------------------------------------
. more;
. * close log file;
. log close;
log: c:\bill\stata\cps87_or.log
log type: text
closed on: 6 Nov 2004, 08:14:19
------------------------------------------------------------------------------
STATA Program for Probit/Logit Models
workplace.do
* the data for this program are a random sample;
* of 10k observations from the data used in;
* evans, farrelly and montgomery, aer, 1999;
* the data are indoor workers in the 1991 and 1993;
* national health interview survey. the survey;
* identifies whether the worker smoked and whether;
* the worker faces a workplace smoking ban;
* set semi colon as the end of line;
# delimit;
* ask it NOT to pause;
set more off;
* open log file;
log using c:\bill\jpsm\workplace1.log,replace;
* use the workplace data set;
use c:\bill\jpsm\workplace1;
* print out variable labels;
desc;
* get summary statistics;
sum;
* run a linear probability model for comparison purposes;
* estimate white standard errors to control for heteroskedasticity;
reg smoker age incomel male black hispanic
hsgrad somecol college worka, robust;
* run probit model;
probit smoker age incomel male black hispanic
hsgrad somecol college worka;
* predict probability of smoking;
predict pred_prob_smoke;
* get detailed descriptive data about predicted prob;
sum pred_prob, detail;
* predict binary outcome with 50% cutoff;
gen pred_smoke1=pred_prob_smoke>=.5;
label variable pred_smoke1 "predicted smoking, 50% cutoff";
* compare actual and predicted values;
tab smoker pred_smoke1, row col cell;
* ask for marginal effects/treatment effects;
mfx compute;
* the same type of estimates can be produced with;
* prchange. this command is, however, more flexible;
* in that you can change the reference individual;
prchange, help;
* get marginal effect/treatment effects for a specific person;
* age 40, white, hsgrad and somecol set to 0, no workplace smoking ban;
* if a variable is not specified, its value is assumed to be;
* the sample mean. in this case, male, college and log income;
* are not listed, so they are held at their sample means;
prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);
* using a wald test, test the null hypothesis that;
* all the education coefficients are zero;
test hsgrad somecol college;
* how to run the same test with a -2 log likelihood (LR) test;
* estimate the unrestricted model and save the estimates ;
* in urmodel;
probit smoker age incomel male black hispanic
hsgrad somecol college worka;
estimates store urmodel;
* estimate the restricted model. save results in rmodel;
probit smoker age incomel male black hispanic
worka;
estimates store rmodel;
lrtest urmodel rmodel;
* run logit model;
logit smoker age incomel male black hispanic
hsgrad somecol college worka;
* ask for marginal effects/treatment effects;
* logit model;
mfx compute;
log close;
STATA Results for Probit/Logit Models
workplace1.log
------------------------------------------------------------------------------
log: c:\bill\jpsm\workplace1.log
log type: text
opened on: 4 Nov 2004, 07:29:21
. * use the workplace data set;
. use c:\bill\jpsm\workplace1;
. * print out variable labels;
. desc;
Contains data from c:\bill\jpsm\workplace1.dta
obs: 16,258
vars: 10 28 Oct 2004 05:27
size: 325,160 (96.9% of memory free)
------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
smoker          byte   %9.0g                  is current smoking
worka           byte   %9.0g                  has workplace smoking bans
age             byte   %9.0g                  age in years
male            byte   %9.0g                  male
black           byte   %9.0g                  black
hispanic        byte   %9.0g                  hispanic
incomel         float  %9.0g                  log income
hsgrad          byte   %9.0g                  is hs graduate
somecol         byte   %9.0g                  has some college
college         float  %9.0g
------------------------------------------------------------------------------
Sorted by:
. * get summary statistics;
. sum;
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
smoker | 16258 .25163 .433963 0 1
worka | 16258 .6851396 .4644745 0 1
age | 16258 38.54742 11.96189 18 87
male | 16258 .3947595 .488814 0 1
black | 16258 .1119449 .3153083 0 1
-------------+--------------------------------------------------------
hispanic | 16258 .0607086 .2388023 0 1
incomel | 16258 10.42097 .7624525 6.214608 11.22524
hsgrad | 16258 .3355271 .4721889 0 1
somecol | 16258 .2685447 .4432161 0 1
college | 16258 .3293763 .4700012 0 1
. * run a linear probability model for comparison purposes;
. * estimate white standard errors to control for heteroskedasticity;
. reg smoker age incomel male black hispanic
> hsgrad somecol college worka, robust;
Regression with robust standard errors Number of obs = 16258
F( 9, 16248) = 99.26
Prob > F = 0.0000
R-squared = 0.0488
Root MSE = .42336
------------------------------------------------------------------------------
| Robust
smoker | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0004776 .0002806 -1.70 0.089 -.0010276 .0000725
incomel | -.0287361 .0047823 -6.01 0.000 -.03811 -.0193621
male | .0168615 .0069542 2.42 0.015 .0032305 .0304926
black | -.0356723 .0110203 -3.24 0.001 -.0572732 -.0140714
hispanic | -.070582 .0136691 -5.16 0.000 -.097375 -.043789
hsgrad | -.0661429 .0162279 -4.08 0.000 -.0979514 -.0343345
somecol | -.1312175 .0164726 -7.97 0.000 -.1635056 -.0989293
college | -.2406109 .0162568 -14.80 0.000 -.272476 -.2087459
worka | -.066076 .0074879 -8.82 0.000 -.080753 -.051399
_cons | .7530714 .0494255 15.24 0.000 .6561919 .8499509
------------------------------------------------------------------------------
. * run probit model;
. probit smoker age incomel male black hispanic
> hsgrad somecol college worka;
Iteration 0: log likelihood = -9171.443
Iteration 1: log likelihood = -8764.068
Iteration 2: log likelihood = -8761.7211
Iteration 3: log likelihood = -8761.7208
Probit estimates Number of obs = 16258
LR chi2(9) = 819.44
Prob > chi2 = 0.0000
Log likelihood = -8761.7208 Pseudo R2 = 0.0447
------------------------------------------------------------------------------
smoker | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0012684 .0009316 -1.36 0.173 -.0030943 .0005574
incomel | -.092812 .0151496 -6.13 0.000 -.1225047 -.0631193
male | .0533213 .0229297 2.33 0.020 .0083799 .0982627
black | -.1060518 .034918 -3.04 0.002 -.17449 -.0376137
hispanic | -.2281468 .0475128 -4.80 0.000 -.3212701 -.1350235
hsgrad | -.1748765 .0436392 -4.01 0.000 -.2604078 -.0893453
somecol | -.363869 .0451757 -8.05 0.000 -.4524118 -.2753262
college | -.7689528 .0466418 -16.49 0.000 -.860369 -.6775366
worka | -.2093287 .0231425 -9.05 0.000 -.2546873 -.1639702
_cons | .870543 .154056 5.65 0.000 .5685989 1.172487
------------------------------------------------------------------------------
. * predict probability of smoking;
. predict pred_prob_smoke;
(option p assumed; Pr(smoker))
. * get detailed descriptive data about predicted prob;
. sum pred_prob, detail;
Pr(smoker)
-------------------------------------------------------------
Percentiles Smallest
1% .0959301 .0615221
5% .1155022 .0622963
10% .1237434 .0633929 Obs 16258
25% .1620851 .0733495 Sum of Wgt. 16258
50% .2569962 Mean .2516653
Largest Std. Dev. .0960007
75% .3187975 .5619798
90% .3795704 .5655878 Variance .0092161
95% .4039573 .5684112 Skewness .1520254
99% .4672697 .6203823 Kurtosis 2.149247
. * predict binary outcome with 50% cutoff;
. gen pred_smoke1=pred_prob_smoke>=.5;
. label variable pred_smoke1 "predicted smoking, 50% cutoff";
. * compare actual and predicted values;
. tab smoker pred_smoke1, row col cell;
+-------------------+
| Key |
|-------------------|
| frequency |
| row percentage |
| column percentage |
| cell percentage |
+-------------------+
| predicted smoking,
is current | 50% cutoff
smoking | 0 1 | Total
-----------+----------------------+----------
0 | 12,153 14 | 12,167
| 99.88 0.12 | 100.00
| 74.93 35.90 | 74.84
| 74.75 0.09 | 74.84
-----------+----------------------+----------
1 | 4,066 25 | 4,091
| 99.39 0.61 | 100.00
| 25.07 64.10 | 25.16
| 25.01 0.15 | 25.16
-----------+----------------------+----------
Total | 16,219 39 | 16,258
| 99.76 0.24 | 100.00
| 100.00 100.00 | 100.00
| 99.76 0.24 | 100.00
. * ask for marginal effects/treatment effects;
. mfx compute;
Marginal effects after probit
y = Pr(smoker) (predict)
= .24093439
------------------------------------------------------------------------------
variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
---------+--------------------------------------------------------------------
age | -.0003951 .00029 -1.36 0.173 -.000964 .000174 38.5474
incomel | -.0289139 .00472 -6.13 0.000 -.03816 -.019668 10.421
male*| .0166757 .0072 2.32 0.021 .002568 .030783 .39476
black*| -.0320621 .01023 -3.13 0.002 -.052111 -.012013 .111945
hispanic*| -.0658551 .01259 -5.23 0.000 -.090536 -.041174 .060709
hsgrad*| -.053335 .01302 -4.10 0.000 -.07885 -.02782 .335527
somecol*| -.1062358 .01228 -8.65 0.000 -.130308 -.082164 .268545
college*| -.2149199 .01146 -18.76 0.000 -.237378 -.192462 .329376
worka*| -.0668959 .00756 -8.84 0.000 -.08172 -.052072 .68514
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
. * the same type of estimates can be produced with;
. * prchange. this command is, however, more flexible;
. * in that you can change the reference individual;
. prchange, help;
probit: Changes in Predicted Probabilities for smoker
min->max 0->1 -+1/2 -+sd/2 MargEfct
age -0.0269 -0.0004 -0.0004 -0.0047 -0.0004
incomel -0.1589 -0.0361 -0.0289 -0.0220 -0.0289
male 0.0167 0.0167 0.0166 0.0081 0.0166
black -0.0321 -0.0321 -0.0330 -0.0104 -0.0330
hispanic -0.0659 -0.0659 -0.0710 -0.0170 -0.0711
hsgrad -0.0533 -0.0533 -0.0544 -0.0257 -0.0545
somecol -0.1062 -0.1062 -0.1130 -0.0502 -0.1134
college -0.2149 -0.2149 -0.2366 -0.1123 -0.2396
worka -0.0669 -0.0669 -0.0652 -0.0303 -0.0652
0 1
Pr(y|x) 0.7591 0.2409
age incomel male black hispanic hsgrad somecol
x= 38.5474 10.421 .39476 .111945 .060709 .335527 .268545
sd(x)= 11.9619 .762452 .488814 .315308 .238802 .472189 .443216
college worka
x= .329376 .68514
sd(x)= .470001 .464475
Pr(y|x): probability of observing each y for specified x values
Avg|Chg|: average of absolute value of the change across categories
Min->Max: change in predicted probability as x changes from its minimum to
its maximum
0->1: change in predicted probability as x changes from 0 to 1
-+1/2: change in predicted probability as x changes from 1/2 unit below
base value to 1/2 unit above
-+sd/2: change in predicted probability as x changes from 1/2 standard
dev below base to 1/2 standard dev above
MargEfct: the partial derivative of the predicted probability/rate with
respect to a given independent variable
. * get marginal effect/treatment effects for a specific person;
. * age 40, white, hsgrad and somecol set to 0, no workplace smoking ban;
. * if a variable is not specified, its value is assumed to be;
. * the sample mean. in this case, male, college and log income;
. * are not listed, so they are held at their sample means;
. prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);
probit: Changes in Predicted Probabilities for smoker
min->max 0->1 -+1/2 -+sd/2 MargEfct
age -0.0323 -0.0005 -0.0005 -0.0056 -0.0005
incomel -0.1795 -0.0320 -0.0344 -0.0263 -0.0345
male 0.0198 0.0198 0.0198 0.0097 0.0198
black -0.0385 -0.0385 -0.0394 -0.0124 -0.0394
hispanic -0.0804 -0.0804 -0.0845 -0.0202 -0.0847
hsgrad -0.0625 -0.0625 -0.0648 -0.0306 -0.0649
somecol -0.1235 -0.1235 -0.1344 -0.0598 -0.1351
college -0.2644 -0.2644 -0.2795 -0.1335 -0.2854
worka -0.0742 -0.0742 -0.0776 -0.0361 -0.0777
0 1
Pr(y|x) 0.6479 0.3521
age incomel male black hispanic hsgrad somecol
x= 40 10.421 .39476 0 0 0 0
sd(x)= 11.9619 .762452 .488814 .315308 .238802 .472189 .443216
college worka
x= .329376 0
sd(x)= .470001 .464475
. * using a wald test, test the null hypothesis that;
. * all the education coefficients are zero;
. test hsgrad somecol college;
( 1) hsgrad = 0
( 2) somecol = 0
( 3) college = 0
chi2( 3) = 504.78
Prob > chi2 = 0.0000
. * how to run the same test with a -2 log likelihood (LR) test;
. * estimate the unrestricted model and save the estimates ;
. * in urmodel;
. probit smoker age incomel male black hispanic
> hsgrad somecol college worka;
Iteration 0: log likelihood = -9171.443
Iteration 1: log likelihood = -8764.068
Iteration 2: log likelihood = -8761.7211
Iteration 3: log likelihood = -8761.7208
Probit estimates Number of obs = 16258
LR chi2(9) = 819.44
Prob > chi2 = 0.0000
Log likelihood = -8761.7208 Pseudo R2 = 0.0447
------------------------------------------------------------------------------
smoker | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0012684 .0009316 -1.36 0.173 -.0030943 .0005574
incomel | -.092812 .0151496 -6.13 0.000 -.1225047 -.0631193
male | .0533213 .0229297 2.33 0.020 .0083799 .0982627
black | -.1060518 .034918 -3.04 0.002 -.17449 -.0376137
hispanic | -.2281468 .0475128 -4.80 0.000 -.3212701 -.1350235
hsgrad | -.1748765 .0436392 -4.01 0.000 -.2604078 -.0893453
somecol | -.363869 .0451757 -8.05 0.000 -.4524118 -.2753262
college | -.7689528 .0466418 -16.49 0.000 -.860369 -.6775366
worka | -.2093287 .0231425 -9.05 0.000 -.2546873 -.1639702
_cons | .870543 .154056 5.65 0.000 .5685989 1.172487
------------------------------------------------------------------------------
. estimates store urmodel;
. * estimate the restricted model. save results in rmodel;
. probit smoker age incomel male black hispanic
> worka;
Iteration 0: log likelihood = -9171.443
Iteration 1: log likelihood = -9022.2473
Iteration 2: log likelihood = -9022.1031
Probit estimates Number of obs = 16258
LR chi2(6) = 298.68
Prob > chi2 = 0.0000
Log likelihood = -9022.1031 Pseudo R2 = 0.0163
------------------------------------------------------------------------------
smoker | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0003514 .0009163 0.38 0.701 -.0014445 .0021473
incomel | -.1802868 .0143242 -12.59 0.000 -.2083617 -.152212
male | -.0117546 .0223519 -0.53 0.599 -.0555635 .0320543
black | -.0650982 .0345516 -1.88 0.060 -.1328181 .0026217
hispanic | -.152071 .0465132 -3.27 0.001 -.2432351 -.0609069
worka | -.2501544 .0227794 -10.98 0.000 -.2948012 -.2055076
_cons | 1.37729 .1472574 9.35 0.000 1.08867 1.665909
------------------------------------------------------------------------------
. estimates store rmodel;
. lrtest urmodel rmodel;
likelihood-ratio test LR chi2(3) = 520.76
(Assumption: rmodel nested in urmodel) Prob > chi2 = 0.0000
. * run logit model;
. logit smoker age incomel male black hispanic
> hsgrad somecol college worka;
Iteration 0: log likelihood = -9171.443
Iteration 1: log likelihood = -8770.6512
Iteration 2: log likelihood = -8760.9282
Iteration 3: log likelihood = -8760.9112
Logit estimates Number of obs = 16258
LR chi2(9) = 821.06
Prob > chi2 = 0.0000
Log likelihood = -8760.9112 Pseudo R2 = 0.0448
------------------------------------------------------------------------------
smoker | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | -.0026236 .0015594 -1.68 0.092 -.0056799 .0004327
incomel | -.1518663 .0251899 -6.03 0.000 -.2012376 -.102495
male | .0942472 .0390171 2.42 0.016 .0177751 .1707192
black | -.196468 .0598366 -3.28 0.001 -.3137456 -.0791904
hispanic | -.4024453 .0825043 -4.88 0.000 -.5641507 -.2407399
hsgrad | -.2906189 .0707661 -4.11 0.000 -.429318 -.1519199
somecol | -.6092455 .073822 -8.25 0.000 -.7539339 -.4645571
college | -1.325203 .0780572 -16.98 0.000 -1.478192 -1.172214
worka | -.3508271 .0389286 -9.01 0.000 -.4271257 -.2745285
_cons | 1.467936 .255991 5.73 0.000 .9662025 1.969669
------------------------------------------------------------------------------
. * ask for marginal effects/treatment effects;
. * logit model;
. mfx compute;
Marginal effects after logit
y = Pr(smoker) (predict)
= .23812502
------------------------------------------------------------------------------
variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
---------+--------------------------------------------------------------------
age | -.000476 .00028 -1.68 0.092 -.00103 .000078 38.5474
incomel | -.0275518 .00457 -6.03 0.000 -.0365 -.018604 10.421
male*| .0171866 .00715 2.40 0.016 .003174 .0312 .39476
black*| -.0342102 .00998 -3.43 0.001 -.053765 -.014655 .111945
hispanic*| -.0661959 .01217 -5.44 0.000 -.090044 -.042347 .060709
hsgrad*| -.0513887 .01219 -4.22 0.000 -.075278 -.0275 .335527
somecol*| -.102284 .01141 -8.97 0.000 -.124644 -.079924 .268545
college*| -.2120833 .0108 -19.64 0.000 -.233248 -.190919 .329376
worka*| -.0657566 .0075 -8.76 0.000 -.080464 -.05105 .68514
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
. log close;
log: c:\bill\jpsm\workplace1.log
log type: text
closed on: 4 Nov 2004, 07:30:16
------------------------------------------------------------------------------
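Two numbers in the log above can be re-derived by hand, which makes clear what lrtest and mfx are computing. The following is a minimal sketch using only Stata's built-in display calculator; every input is copied from the log, and the default carriage-return delimiter is assumed. The first result should agree with the LR chi2(3) of 520.76 reported by lrtest, and the last with the incomel marginal effect of -.0289139 reported by mfx after probit, up to rounding.

```stata
* a sketch: re-derive two results from the probit log by hand
* (default carriage-return delimiter; inputs copied from the log)

* LR statistic = 2*(lnL unrestricted - lnL restricted)
display 2*(-8761.7208 - (-9022.1031))

* p-value from a chi-squared distribution with 3 restrictions
display chi2tail(3, 2*(-8761.7208 - (-9022.1031)))

* probit marginal effect of incomel at the sample means:
* normal density at the index implied by Pr(smoker)=.24093439,
* times the incomel coefficient
display normalden(invnormal(.24093439))*(-.092812)
```

The density-times-coefficient formula applies to continuous variables such as incomel; for the dummy variables marked with (*) mfx reports the discrete change from 0 to 1 instead.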
STATA Program for Odds Ratio in Logit Models
natal95.do
* this data set is a small 0.5% (.005) random sample;
* of observations from the 1995 natality detail;
* data. we will examine the impact of smoking;
* on birth weight. two large states, NY and CA, do not;
* record mother's smoking status. therefore, of the ;
* 4 million births in the US, only 3 million have all;
* the necessary data so there should be 3 million*.005;
* or roughly 15,000 obs;
* set semi colon as the end of line;
# delimit;
* ask it NOT to pause;
set more off;
* open log file;
log using c:\bill\jpsm\natal95.log,replace;
* use the natality detail data set;
use c:\bill\jpsm\natal95;
* print out variable labels;
desc;
* construct indicator for low birth weight;
gen lowbw=birthw
STATA Results for Odds Ratio in Logit Models
natal95.log
              storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------------
birthw          int    %9.0g                  birth weight in grams
smoked          byte   %9.0g                  =1 if mom smoked during
                                                pregnancy
age             byte   %9.0g                  moms age at birth
married         byte   %9.0g                  =1 if married
race4           byte   %9.0g                  1=white,2=black,3=asian,4=other
educ5           byte   %9.0g                  1=0-8, 2=9-11, 3=12, 4=13-15,
                                                5=16+
visits          byte   %9.0g                  prenatal visits
------------------------------------------------------------------------------
Sorted by:
. * construct indicator for low birth weight;
. gen lowbw=birthw
lowbw | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoked | .6740651 .0897869 7.51 0.000 .4980861 .8500441
age | .0080537 .006791 1.19 0.236 -.0052564 .0213638
married | -.3954044 .0882471 -4.48 0.000 -.5683654 -.2224433
_Ieduc5_2 | -.1949335 .1626502 -1.20 0.231 -.5137221 .1238551
_Ieduc5_3 | -.1925099 .1543239 -1.25 0.212 -.4949791 .1099594
_Ieduc5_4 | -.4057382 .1676759 -2.42 0.016 -.7343769 -.0770994
_Ieduc5_5 | -.3569715 .1780322 -2.01 0.045 -.7059081 -.0080349
_Irace4_2 | .7072894 .0875125 8.08 0.000 .5357681 .8788107
_Irace4_3 | .386623 .307062 1.26 0.208 -.2152075 .9884535
_Irace4_4 | .3095536 .2047899 1.51 0.131 -.0918271 .7109344
_cons | -2.755971 .2104916 -13.09 0.000 -3.168527 -2.343415
------------------------------------------------------------------------------
. * get marginal effects;
. mfx compute;
Marginal effects after logit
y = Pr(lowbw) (predict)
= .05465609
------------------------------------------------------------------------------
variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
---------+--------------------------------------------------------------------
smoked*| .0436744 .00706 6.18 0.000 .029834 .057514 .136683
age | .0004161 .00035 1.19 0.236 -.000271 .001104 26.6564
married*| -.0218806 .0052 -4.21 0.000 -.032074 -.011687 .683204
_Ieduc~2*| -.0095123 .00749 -1.27 0.204 -.024188 .005164 .165495
_Ieduc~3*| -.0096965 .00758 -1.28 0.201 -.024554 .005161 .345397
_Ieduc~4*| -.0190499 .00714 -2.67 0.008 -.033043 -.005057 .22319
_Ieduc~5*| -.0169077 .00771 -2.19 0.028 -.032027 -.001788 .216093
_Irace~2*| .0453844 .00675 6.72 0.000 .032148 .058621 .17168
_Irace~3*| .0236917 .02204 1.07 0.282 -.019506 .06689 .010401
_Irace~4*| .018225 .01363 1.34 0.181 -.008488 .044938 .031694
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
. * run a logit but report the odds ratios instead;
. xi: logistic lowbw smoked age married i.educ5 i.race4;
i.educ5 _Ieduc5_1-5 (naturally coded; _Ieduc5_1 omitted)
i.race4 _Irace4_1-4 (naturally coded; _Irace4_1 omitted)
Logistic regression Number of obs = 14230
LR chi2(10) = 214.10
Prob > chi2 = 0.0000
Log likelihood = -3136.9912 Pseudo R2 = 0.0330
------------------------------------------------------------------------------
lowbw | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smoked | 1.962198 .1761796 7.51 0.000 1.645569 2.33975
age | 1.008086 .0068459 1.19 0.236 .9947574 1.021594
married | .6734077 .0594262 -4.48 0.000 .5664506 .8005604
_Ieduc5_2 | .8228894 .1338431 -1.20 0.231 .5982646 1.131852
_Ieduc5_3 | .8248862 .1272996 -1.25 0.212 .6095837 1.116233
_Ieduc5_4 | .6664847 .1117534 -2.42 0.016 .4798043 .9257979
_Ieduc5_5 | .6997924 .1245856 -2.01 0.045 .4936601 .9919973
_Irace4_2 | 2.028485 .1775178 8.08 0.000 1.70876 2.408034
_Irace4_3 | 1.472001 .4519957 1.26 0.208 .8063741 2.687076
_Irace4_4 | 1.362817 .2790911 1.51 0.131 .9122628 2.035893
------------------------------------------------------------------------------
. log close;
log: c:\bill\jpsm\natal95.log
log type: text
closed on: 4 Nov 2004, 05:48:39
------------------------------------------------------------------------------
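The odds ratios reported by logistic are the exponentiated logit coefficients, so the two tables above carry the same information. As a minimal sketch, the smoked row can be reproduced with Stata's display calculator; all inputs are copied from the logit coefficient table, and the default carriage-return delimiter is assumed. The results should match the smoked row of the logistic output (odds ratio 1.962198, std. err. .1761796, interval 1.645569 to 2.33975) up to rounding.

```stata
* a sketch: rebuild the smoked row of the odds-ratio table by hand
* (default carriage-return delimiter; inputs copied from the logit table)

* odds ratio = exp(logit coefficient)
display exp(.6740651)

* delta-method standard error = odds ratio * coefficient std. err.
display exp(.6740651)*.0897869

* confidence limits = exp() of the coefficient's confidence limits
display exp(.4980861)
display exp(.8500441)
```

Because the interval is exponentiated rather than built from the odds-ratio standard error, it is not symmetric around 1.962198, while the z statistic and p-value are the same as in the coefficient table.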
STATA Program for Ordered Probit Models
sr_health_status.do
* the data for this example are adults, 18-64;
* who answered the cancer control supplement to;
* the 1994 national health interview survey;
* the key outcome is self reported health status;
* coded 1-5, poor, fair, good, very good, excellent;
* a ke covariate is current smoking status and whether;
* one smoked 5 years ago;
# delimit;
set memory 20m;
set matsize 200;
set more off;
log using c:\bill\jpsm\sr_health_status.log,replace;
* load up stata data set;
use c:\bill\jpsm\sr_health_status;
* get contents of data file;
desc;
* get summary statistics;
sum;
* get tabulation of sr_health;
tab sr_health;
* run OLS models, just to look at the raw correlations in data;
reg sr_health male age educ famincl black othrace smoke smoke5;
* do ordered probit, self reported health status;
oprobit sr_health male age educ famincl black othrace smoke smoke5;
* get marginal effects, evaluated at y=5 (excellent);
mfx compute, predict(outcome(5));
* get marginal effects, evaluated at y=3 (good);
mfx compute, predict(outcome(3));
* use prchange, evaluate marginal effects for a;
* 40 year old white person with a college degree who is not a;
* smoker now or 5 years ago; male and log income are at their means;
prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);
log close;
STATA Results for Ordered Probit Models
sr_health_status.log
------------------------------------------------------------------------------
log: c:\bill\iadb\sr_health_status.log
log type: text
opened on: 1 Nov 2004, 12:06:56
. * load up stata data set;
. use sr_health_status;
. * get contents of data file;
. desc;
Contains data from sr_health_status.dta
obs: 12,900
vars: 9 1 Nov 2004 11:51
size: 322,500 (98.5% of memory free)
------------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------------
male byte %9.0g =1 if male
age byte %9.0g age in years
educ byte %9.0g years of education
smoke byte %9.0g current smoker
smoke5 byte %9.0g smoked in past 5 years
black float %9.0g =1 if respondent is black
othrace float %9.0g =1 if other race (white is ref)
sr_health float %9.0g 1-5 self reported health,
5=excel, 1=poor
famincl float %9.0g log family income
------------------------------------------------------------------------------
Sorted by:
. * get summary statistics;
. sum;
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
male | 12900 .438062 .4961681 0 1
age | 12900 39.84124 11.60603 21 64
educ | 12900 13.24016 2.73325 0 18
smoke | 12900 .2891473 .453384 0 1
smoke5 | 12900 .0813953 .2734519 0 1
-------------+--------------------------------------------------------
black | 12900 .1242636 .3298948 0 1
othrace | 12900 .0412403 .1988532 0 1
sr_health | 12900 3.888992 1.063713 1 5
famincl | 12900 10.21313 .95086 6.214608 11.22524
. * get tabulation of sr_health;
. tab sr_health;
1-5 self |
reported |
health, |
5=excel, |
1=poor | Freq. Percent Cum.
------------+-----------------------------------
1 | 342 2.65 2.65
2 | 991 7.68 10.33
3 | 3,068 23.78 34.12
4 | 3,855 29.88 64.00
5 | 4,644 36.00 100.00
------------+-----------------------------------
Total | 12,900 100.00
. * run OLS models, just to look at the raw correlations in data;
. reg sr_health male age educ famincl black othrace smoke smoke5;
Source | SS df MS Number of obs = 12900
-------------+------------------------------ F( 8, 12891) = 350.85
Model | 2609.62058 8 326.202572 Prob > F = 0.0000
Residual | 11985.4163 12891 .929750704 R-squared = 0.1788
-------------+------------------------------ Adj R-squared = 0.1783
Total | 14595.0369 12899 1.13148592 Root MSE = .96424
------------------------------------------------------------------------------
sr_health | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | .1033877 .0172399 6.00 0.000 .0695949 .1371804
age | -.0189687 .0007472 -25.39 0.000 -.0204333 -.0175041
educ | .074539 .0033897 21.99 0.000 .0678946 .0811833
famincl | .2299388 .0099542 23.10 0.000 .2104271 .2494504
black | -.2127016 .0265726 -8.00 0.000 -.2647878 -.1606153
othrace | -.2120907 .0429632 -4.94 0.000 -.2963049 -.1278765
smoke | -.1800193 .0196221 -9.17 0.000 -.2184815 -.1415572
smoke5 | -.1356116 .0317119 -4.28 0.000 -.1977716 -.0734515
_cons | 1.362405 .1005616 13.55 0.000 1.165289 1.55952
------------------------------------------------------------------------------
. * do ordered probit, self reported health status;
. oprobit sr_health male age educ famincl black othrace smoke smoke5;
Iteration 0: log likelihood = -17591.791
Iteration 1: log likelihood = -16403.785
Iteration 2: log likelihood = -16401.987
Iteration 3: log likelihood = -16401.987
Ordered probit estimates Number of obs = 12900
LR chi2(8) = 2379.61
Prob > chi2 = 0.0000
Log likelihood = -16401.987 Pseudo R2 = 0.0676
------------------------------------------------------------------------------
sr_health | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
male | .1281241 .0195747 6.55 0.000 .0897583 .1664899
age | -.0202308 .0008499 -23.80 0.000 -.0218966 -.018565
educ | .0827086 .0038547 21.46 0.000 .0751535 .0902637
famincl | .2398957 .0112206 21.38 0.000 .2179037 .2618878
black | -.221508 .029528 -7.50 0.000 -.2793818 -.1636341
othrace | -.2425083 .0480047 -5.05 0.000 -.3365958 -.1484208
smoke | -.2086096 .0219779 -9.49 0.000 -.2516855 -.1655337
smoke5 | -.1529619 .0357995 -4.27 0.000 -.2231277 -.0827961
-------------+----------------------------------------------------------------
_cut1 | .4858634 .113179 (Ancillary parameters)
_cut2 | 1.269036 .11282
_cut3 | 2.247251 .1138171
_cut4 | 3.094606 .1145781
------------------------------------------------------------------------------
. * get marginal effects, evaluated at y=5 (excellent);
. mfx compute, predict(outcome(5));
Marginal effects after oprobit
y = Pr(sr_health==5) (predict, outcome(5))
= .34103717
------------------------------------------------------------------------------
variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
---------+--------------------------------------------------------------------
male*| .0471251 .00722 6.53 0.000 .03298 .06127 .438062
age | -.0074214 .00031 -23.77 0.000 -.008033 -.00681 39.8412
educ | .0303405 .00142 21.42 0.000 .027565 .033116 13.2402
famincl | .0880025 .00412 21.37 0.000 .07993 .096075 10.2131
black*| -.0781411 .00996 -7.84 0.000 -.097665 -.058617 .124264
othrace*| -.0843227 .01567 -5.38 0.000 -.115043 -.053602 .04124
smoke*| -.0749785 .00773 -9.71 0.000 -.09012 -.059837 .289147
smoke5*| -.0545062 .01235 -4.41 0.000 -.078719 -.030294 .081395
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
. * get marginal effects, evaluated at y=3 (good);
. mfx compute, predict(outcome(3));
Marginal effects after oprobit
y = Pr(sr_health==3) (predict, outcome(3))
= .25239744
------------------------------------------------------------------------------
variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
---------+--------------------------------------------------------------------
male*| -.0276959 .00425 -6.51 0.000 -.036029 -.019363 .438062
age | .0043717 .0002 21.81 0.000 .003979 .004765 39.8412
educ | -.0178727 .00089 -20.02 0.000 -.019623 -.016123 13.2402
famincl | -.0518395 .00261 -19.85 0.000 -.056959 -.04672 10.2131
black*| .0464219 .00599 7.75 0.000 .034675 .058169 .124264
othrace*| .0501493 .00934 5.37 0.000 .031834 .068464 .04124
smoke*| .0443735 .00464 9.56 0.000 .035272 .053476 .289147
smoke5*| .0323707 .00739 4.38 0.000 .017882 .04686 .081395
------------------------------------------------------------------------------
(*) dy/dx is for discrete change of dummy variable from 0 to 1
. * use prchange, evaluate marginal effects for a;
. * 40 year old white person with a college degree who is not a;
. * smoker now or 5 years ago; male and log income are at their means;
. prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);
oprobit: Changes in Predicted Probabilities for sr_health
male
Avg|Chg| 1 2 3 4
0->1 .0203868 -.0020257 -.00886671 -.02677558 -.01329902
5
0->1 .05096698
age
Avg|Chg| 1 2 3 4
Min->Max .13358317 .0184785 .06797072 .17686112 .07064757
-+1/2 .00321942 .00032518 .00141642 .00424452 .00206241
-+sd/2 .03728014 .00382077 .01648743 .04910323 .0237889
MargEfct .00321947 .00032515 .00141639 .00424462 .00206252
5
Min->Max -.33395794
-+1/2 -.00804856
-+sd/2 -.09320036
MargEfct -.00804868
educ
Avg|Chg| 1 2 3 4
Min->Max .21397413 -.10945692 -.19725057 -.22822781 .07974288
-+1/2 .01315829 -.00133136 -.00579271 -.01734608 -.00842556
-+sd/2 .03589903 -.0036753 -.01587057 -.04728749 -.02291423
MargEfct .01316202 -.0013293 -.00579057 -.01735309 -.00843208
5
Min->Max .45519245
-+1/2 .03289571
-+sd/2 .08974758
MargEfct .03290504
famincl
Avg|Chg| 1 2 3 4
Min->Max .16759798 -.05486112 -.13623201 -.22790183 .00276569
-+1/2 .03808549 -.00390581 -.01684746 -.05016185 -.02429861
-+sd/2 .03622223 -.0037093 -.01601486 -.04771243 -.02311897
MargEfct .03817633 -.00385563 -.0167955 -.05033251 -.02445719
5
Min->Max .41622926
-+1/2 .09521371
-+sd/2 .09055558
MargEfct .09544083
black
Avg|Chg| 1 2 3 4
0->1 .03467907 .00473166 .01835598 .04779626 .01581377
5
0->1 -.08669767
othrace
Avg|Chg| 1 2 3 4
0->1 .03787661 .00532324 .02040636 .05239134 .0165706
5
0->1 -.09469151
smoke
Avg|Chg| 1 2 3 4
0->1 .03270518 .00438228 .01712416 .04497364 .01528287
5
0->1 -.08176297
smoke5
Avg|Chg| 1 2 3 4
0->1 .02411037 .00299019 .012047 .03281575 .01242298
5
0->1 -.06027591
1 2 3 4 5
Pr(y|x) .00563112 .03431748 .17979275 .30986777 .47039089
male age educ famincl black othrace smoke
x= .438062 40 16 10.2131 0 0 0
sd(x)= .496168 11.606 2.73325 .95086 .329895 .198853 .453384
smoke5
x= 0
sd(x)= .273452
. log close;
log: c:\bill\iadb\sr_health_status.log
log type: text
closed on: 1 Nov 2004, 12:07:40
------------------------------------------------------------------------------
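The predicted probabilities that prchange lists in its Pr(y|x) row can be rebuilt by hand from the oprobit coefficients and cut points. The sketch below is a minimal check under stated assumptions: it uses only numbers copied from the log, takes the mean of famincl from the summary table (10.21313), and leaves out black, othrace, smoke and smoke5 because they are set to zero. The scalar names are illustrative. prchange itself is not built into Stata; it comes from the user-written SPost package. normal() and normalden() are the standard normal distribution and density functions; older releases named them norm() and normden().

```stata
# delimit ;
* index x*b at male=.438062, age=40, educ=16, famincl at its mean;
scalar xb = .1281241*.438062 + (-.0202308)*40 + .0827086*16
          + .2398957*10.21313;
* probability of the lowest and highest outcome from the cut points;
scalar p1 = normal(.4858634 - xb);
scalar p5 = 1 - normal(3.094606 - xb);
* marginal effect of age on Pr(y=5): density at the top cut point times b;
scalar me_age5 = normalden(3.094606 - xb)*(-.0202308);
display "Pr(y=1) = " %9.8f p1 "   Pr(y=5) = " %9.8f p5;
display "marginal effect of age on Pr(y=5) = " %10.8f me_age5;
```

The three values should line up with the Pr(y|x) row and the MargEfct entry for age under outcome 5, up to the rounding of the copied inputs.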
STATA Program for Count Data Models
drvisits.do
* drvisits.do;
* this program estimates a poisson and negative binomial;
* count data model. the data include people aged 65+;
* from the 1987 nmes data set. dr visits are annual;
* this line defines the semicolon as the line delimiter;
# delimit ;
* set memory to 10 meg;
set memory 10m;
* open output file;
log using c:\bill\jpsm\drvisits.log,replace;
* open stata data set;
use c:\bill\jpsm\drvisits;
* generate new variables;
gen incomel=ln(income);
* get distribution of dr visits;
tabulate drvisits;
* get descriptive statistics;
sum;
* run poisson regression;
poisson drvisits age65 age70 age75 age80 chronic excel good fair female
black hispanic hs_drop hs_grad mcaid incomel;
* run neg binomial regression;
nbreg drvisits age65 age70 age75 age80 chronic excel good fair female
black hispanic hs_drop hs_grad mcaid incomel, dispersion(constant);
log close;
STATA Results for Count Data Models
drvisits.log
------------------------------------------------------------------------------
log: C:\bill\stata\drvisits.log
log type: text
opened on: 28 Oct 2004, 13:44:05
. * open stata data set;
. use drvisits;
. * generate new variables;
. gen incomel=ln(income);
(28 missing values generated)
. * get distribution of dr visits;
. tabulate drvisits;
annual doc |
visits | Freq. Percent Cum.
------------+-----------------------------------
0 | 915 17.18 17.18
1 | 601 11.28 28.46
2 | 533 10.01 38.46
3 | 503 9.44 47.91
4 | 450 8.45 56.35
5 | 391 7.34 63.69
6 | 319 5.99 69.68
7 | 258 4.84 74.53
8 | 216 4.05 78.58
9 | 192 3.60 82.19
10 | 147 2.76 84.94
11 | 123 2.31 87.25
12 | 99 1.86 89.11
13 | 81 1.52 90.63
14 | 80 1.50 92.13
15 | 66 1.24 93.37
16 | 56 1.05 94.42
17 | 56 1.05 95.48
18 | 34 0.64 96.11
19 | 26 0.49 96.60
20 | 17 0.32 96.92
21 | 21 0.39 97.32
22 | 20 0.38 97.69
23 | 11 0.21 97.90
24 | 15 0.28 98.18
25 | 4 0.08 98.25
26 | 12 0.23 98.48
27 | 9 0.17 98.65
28 | 6 0.11 98.76
29 | 4 0.08 98.84
30 | 5 0.09 98.93
31 | 6 0.11 99.04
32 | 2 0.04 99.08
33 | 2 0.04 99.12
34 | 3 0.06 99.17
35 | 2 0.04 99.21
36 | 2 0.04 99.25
37 | 4 0.08 99.32
38 | 2 0.04 99.36
39 | 5 0.09 99.46
40 | 2 0.04 99.49
41 | 1 0.02 99.51
42 | 4 0.08 99.59
43 | 2 0.04 99.62
44 | 2 0.04 99.66
47 | 1 0.02 99.68
48 | 2 0.04 99.72
49 | 1 0.02 99.74
50 | 1 0.02 99.76
51 | 1 0.02 99.77
53 | 2 0.04 99.81
55 | 1 0.02 99.83
56 | 1 0.02 99.85
58 | 2 0.04 99.89
61 | 1 0.02 99.91
63 | 1 0.02 99.92
65 | 1 0.02 99.94
66 | 1 0.02 99.96
68 | 1 0.02 99.98
89 | 1 0.02 100.00
------------+-----------------------------------
Total | 5,327 100.00
. * get descriptive statistics;
. sum;
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
drvisits | 5327 5.563732 6.676081 0 89
age65 | 5327 .3358363 .4723263 0 1
age70 | 5327 .2802703 .4491734 0 1
age75 | 5327 .2003004 .4002627 0 1
age80 | 5327 .1101934 .31316 0 1
-------------+--------------------------------------------------------
chronic | 5327 .6279332 .4834015 0 1
excel | 5327 .0749014 .263257 0 1
good | 5327 .3792003 .4852336 0 1
fair | 5327 .3305801 .4704662 0 1
hs_drop | 5327 .5029097 .5000385 0 1
-------------+--------------------------------------------------------
hs_grad | 5327 .2922846 .4548551 0 1
black | 5327 .1255866 .331414 0 1
hispanic | 5327 .0324761 .1772774 0 1
female | 5327 .5969589 .4905549 0 1
mcaid | 5327 .1019335 .3025893 0 1
-------------+--------------------------------------------------------
income | 5327 25381.78 28962.69 0 548224
incomel | 5299 9.754733 .8911269 2.639057 13.21444
. * run poisson regression;
. poisson drvisits age65 age70 age75 age80 chronic excel good fair female
> black hispanic hs_drop hs_grad mcaid incomel;
Iteration 0: log likelihood = -22275.374
Iteration 1: log likelihood = -22275.351
Iteration 2: log likelihood = -22275.351
Poisson regression Number of obs = 5299
LR chi2(15) = 3334.46
Prob > chi2 = 0.0000
Log likelihood = -22275.351 Pseudo R2 = 0.0696
------------------------------------------------------------------------------
drvisits | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age65 | .2144282 .026267 8.16 0.000 .1629458 .2659106
age70 | .286831 .0263077 10.90 0.000 .2352689 .3383931
age75 | .2801504 .0269802 10.38 0.000 .2272702 .3330307
age80 | .24314 .0292045 8.33 0.000 .1859001 .3003798
chronic | .4997173 .0137789 36.27 0.000 .4727111 .5267235
excel | -.7836622 .0305392 -25.66 0.000 -.8435178 -.7238065
good | -.4774853 .0159987 -29.85 0.000 -.5088422 -.4461284
fair | -.2578352 .0155473 -16.58 0.000 -.2883073 -.2273631
female | .0960976 .0123182 7.80 0.000 .0719543 .1202409
black | -.2838081 .0202163 -14.04 0.000 -.3234314 -.2441849
hispanic | -.2051023 .0368764 -5.56 0.000 -.2773788 -.1328258
hs_drop | -.2323802 .016066 -14.46 0.000 -.263869 -.2008914
hs_grad | -.1200559 .016517 -7.27 0.000 -.1524287 -.0876831
mcaid | .1535708 .0203414 7.55 0.000 .1137025 .1934392
incomel | .0211453 .0072946 2.90 0.004 .0068481 .0354425
_cons | 1.348084 .0804659 16.75 0.000 1.190374 1.505795
------------------------------------------------------------------------------
. * run neg binomial regression;
. nbreg drvisits age65 age70 age75 age80 chronic excel good fair female
> black hispanic hs_drop hs_grad mcaid incomel, dispersion(constant);
Fitting Poisson model:
Iteration 0: log likelihood = -22275.374
Iteration 1: log likelihood = -22275.351
Iteration 2: log likelihood = -22275.351
Fitting constant-only model:
Iteration 0: log likelihood = -17434.216
Iteration 1: log likelihood = -15076.44
Iteration 2: log likelihood = -14841.425
Iteration 3: log likelihood = -14840.935
Iteration 4: log likelihood = -14840.935
Fitting full model:
Iteration 0: log likelihood = -14840.935
Iteration 1: log likelihood = -14540.408
Iteration 2: log likelihood = -14519.799
Iteration 3: log likelihood = -14519.721
Iteration 4: log likelihood = -14519.721
Negative binomial (constant dispersion) Number of obs = 5299
LR chi2(15) = 642.43
Prob > chi2 = 0.0000
Log likelihood = -14519.721 Pseudo R2 = 0.0216
------------------------------------------------------------------------------
drvisits | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age65 | .1034281 .054664 1.89 0.058 -.0037113 .2105675
age70 | .2039634 .0546788 3.73 0.000 .0967949 .3111319
age75 | .2094928 .0560412 3.74 0.000 .0996541 .3193314
age80 | .2227169 .0605925 3.68 0.000 .1039579 .341476
chronic | .5091666 .0292189 17.43 0.000 .4518986 .5664347
excel | -.5272908 .0594584 -8.87 0.000 -.6438271 -.4107545
good | -.3422506 .0353507 -9.68 0.000 -.4115368 -.2729645
fair | -.1526385 .0351632 -4.34 0.000 -.2215571 -.0837198
female | .1321966 .0263028 5.03 0.000 .0806441 .183749
black | -.3300031 .0438969 -7.52 0.000 -.4160395 -.2439668
hispanic | -.1527763 .0763018 -2.00 0.045 -.3023251 -.0032275
hs_drop | -.1912903 .0344335 -5.56 0.000 -.2587787 -.1238018
hs_grad | -.0869843 .0354543 -2.45 0.014 -.1564733 -.0174952
mcaid | .1341325 .0442797 3.03 0.002 .0473459 .2209191
incomel | .0379834 .0155687 2.44 0.015 .0074693 .0684975
_cons | 1.11029 .17092 6.50 0.000 .7752924 1.445287
-------------+----------------------------------------------------------------
/lndelta | 1.65017 .0286445 1.594027 1.706312
-------------+----------------------------------------------------------------
delta | 5.207863 .1491766 4.923538 5.508607
------------------------------------------------------------------------------
Likelihood-ratio test of delta=0: chibar2(01) = 1.6e+04 Prob>=chibar2 = 0.000
. log close;
log: C:\bill\stata\drvisits.log
log type: text
closed on: 28 Oct 2004, 13:44:20
------------------------------------------------------------------------------
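Two quick hand calculations help in reading the count data output. Exponentiating a Poisson coefficient gives an incidence rate ratio, and comparing the variance of drvisits with its mean shows why the Poisson restriction (variance equal to mean) is doubtful here. The sketch below uses only numbers copied from the log; the scalar names are illustrative. It assumes the usual reading of the constant-dispersion negative binomial, in which the conditional variance is (1 + delta) times the conditional mean.

```stata
# delimit ;
* incidence rate ratio for chronic from the poisson coefficient;
scalar irr_chronic = exp(.4997173);
* raw variance-to-mean ratio of drvisits from the summary table;
scalar vmr = 6.676081^2/5.563732;
* variance-to-mean ratio implied by the estimated delta;
scalar nb_disp = 1 + 5.207863;
display "IRR for chronic = " %6.4f irr_chronic;
display "raw variance/mean = " %6.3f vmr "   1 + delta = " %6.3f nb_disp;
```

A ratio far above one, in the raw data and in the estimated model, is consistent with the likelihood-ratio test of delta=0 reported at the bottom of the nbreg output.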
STATA Program for Duration Data
Surv_data.do
* this data set has married males, aged 50-70;
* from the nhis multiple cause of death file;
* data are taken from the 1987-1990 nhis;
* surveys. all people are followed for;
* up to 60 months. max_mths is the number of;
* months a person is followed and diedin5;
* indicates whether the person died;
* within five years (60 months);
* set end of line marker;
# delimit;
set more off;
* increase memory;
set memory 20m;
* write results to file;
log using c:\bill\jpsm\surv_data.log,replace;
* load up stata data set;
use c:\bill\jpsm\surv_data;
* get contents of data file;
desc;
* get summary statistics;
sum;
* define the duration data in the analysis;
stset max_mths, failure(diedin5);
* list the kaplan meier survivor function;
sts list;
* you can graph the functions as well;
* output the graphs to a file;
sts graph;
graph save c:\bill\jpsm\graph1.gph, replace;
* you can draw graphs for various subgroups;
* output the graphs to a file;
sts graph, by(educ);
graph save c:\bill\jpsm\graph2.gph, replace;
* run a duration model where the hazard varies across;
* people. first, ask stata to print out the raw;
* coefficients (nohr option), then the default hazard ratios;
* show weibull first, then exponential;
* first, construct dummies for the income and;
* education categories. in the regression statement;
* _Ie star includes all variables beginning with _Ie;
* and _Ii star includes all variables starting with;
* _Ii;
xi i.income i.educ;
streg age_s_yrs black hispanic _Ie* _Ii*, d(weibull) nohr;
* now get the hazard ratios, which are the;
* exponentiated coefficients, exp(b);
streg age_s_yrs black hispanic _Ie* _Ii*, d(weibull);
* for comparison purposes, look at results from an exponential model;
streg age_s_yrs black hispanic _Ie* _Ii*, d(exp) nohr;
streg age_s_yrs black hispanic _Ie* _Ii*, d(exp);
log close;
STATA Results for Duration Data
surv_data.log
------------------------------------------------------------------------------
log: c:\bill\jpsm\surv_data.log
log type: text
opened on: 7 Nov 2004, 06:26:56
. * load up stata data set;
. use c:\bill\jpsm\surv_data;
. * get contents of data file;
. desc;
Contains data from c:\bill\jpsm\surv_data.dta
obs: 26,654
vars: 7 2 Nov 2004 10:59
size: 533,080 (97.5% of memory free)
------------------------------------------------------------------------------
storage display value
variable name type format label variable label
------------------------------------------------------------------------------
age_s_yrs byte %9.0g age in years at the time of
survey
max_mths byte %9.0g max months of followup
black byte %9.0g dummy variable, =1 if black
hispanic byte %9.0g dummy variable, =1 hispanic
income float %9.0g =1 if ]
------------------------------------------------------------------------------
1 26631 38 0 0.9986 0.0002 0.9980 0.9990
2 26593 42 0 0.9970 0.0003 0.9963 0.9976
3 26551 40 0 0.9955 0.0004 0.9946 0.9962
4 26511 49 0 0.9937 0.0005 0.9926 0.9945
5 26462 50 0 0.9918 0.0006 0.9906 0.9928
6 26412 61 0 0.9895 0.0006 0.9882 0.9906
7 26351 45 0 0.9878 0.0007 0.9864 0.9890
8 26306 60 0 0.9855 0.0007 0.9840 0.9869
9 26246 46 0 0.9838 0.0008 0.9822 0.9853
10 26200 42 0 0.9822 0.0008 0.9806 0.9838
11 26158 52 0 0.9803 0.0009 0.9785 0.9819
12 26106 56 0 0.9782 0.0009 0.9764 0.9799
13 26050 53 0 0.9762 0.0009 0.9743 0.9780
14 25997 64 0 0.9738 0.0010 0.9718 0.9756
15 25933 48 0 0.9720 0.0010 0.9699 0.9739
16 25885 49 0 0.9701 0.0010 0.9680 0.9721
17 25836 54 0 0.9681 0.0011 0.9659 0.9702
18 25782 46 0 0.9664 0.0011 0.9642 0.9685
19 25736 51 0 0.9645 0.0011 0.9622 0.9666
20 25685 38 0 0.9631 0.0012 0.9607 0.9652
21 25647 56 0 0.9609 0.0012 0.9586 0.9632
22 25591 51 0 0.9590 0.0012 0.9566 0.9613
23 25540 48 0 0.9572 0.0012 0.9547 0.9596
24 25492 51 0 0.9553 0.0013 0.9528 0.9577
25 25441 59 0 0.9531 0.0013 0.9505 0.9556
26 25382 58 0 0.9509 0.0013 0.9483 0.9535
27 25324 63 0 0.9486 0.0014 0.9458 0.9511
28 25261 50 0 0.9467 0.0014 0.9439 0.9493
29 25211 50 0 0.9448 0.0014 0.9420 0.9475
30 25161 52 0 0.9428 0.0014 0.9400 0.9456
31 25109 60 0 0.9406 0.0014 0.9377 0.9434
32 25049 52 0 0.9386 0.0015 0.9357 0.9415
33 24997 54 0 0.9366 0.0015 0.9336 0.9395
34 24943 56 0 0.9345 0.0015 0.9315 0.9374
35 24887 66 0 0.9320 0.0015 0.9289 0.9350
36 24821 70 0 0.9294 0.0016 0.9263 0.9324
37 24751 45 0 0.9277 0.0016 0.9245 0.9308
38 24706 59 0 0.9255 0.0016 0.9223 0.9286
39 24647 54 0 0.9235 0.0016 0.9202 0.9266
40 24593 48 0 0.9217 0.0016 0.9184 0.9248
41 24545 61 0 0.9194 0.0017 0.9160 0.9226
42 24484 63 0 0.9170 0.0017 0.9136 0.9203
43 24421 56 0 0.9149 0.0017 0.9115 0.9182
44 24365 52 0 0.9130 0.0017 0.9095 0.9163
45 24313 60 0 0.9107 0.0017 0.9072 0.9141
46 24253 56 0 0.9086 0.0018 0.9051 0.9120
47 24197 68 0 0.9060 0.0018 0.9025 0.9095
48 24129 59 0 0.9038 0.0018 0.9002 0.9073
49 24070 57 0 0.9017 0.0018 0.8981 0.9052
50 24013 57 0 0.8996 0.0018 0.8959 0.9031
51 23956 66 0 0.8971 0.0019 0.8934 0.9007
52 23890 57 0 0.8949 0.0019 0.8912 0.8986
53 23833 50 0 0.8931 0.0019 0.8893 0.8967
54 23783 53 0 0.8911 0.0019 0.8873 0.8947
55 23730 64 0 0.8887 0.0019 0.8848 0.8924
56 23666 55 0 0.8866 0.0019 0.8827 0.8903
57 23611 65 0 0.8842 0.0020 0.8803 0.8879
58 23546 66 0 0.8817 0.0020 0.8777 0.8855
59 23480 44 0 0.8800 0.0020 0.8761 0.8839
60 23436 50 2.3e+04 0.8781 0.0020 0.8742 0.8820
------------------------------------------------------------------------------
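The survivor function in the listing can be reproduced by hand with the Kaplan-Meier product: at each month, multiply the running survival probability by one minus failures divided by the number at risk. The sketch below does this for the first three months. The header row is missing from the listing, so it is an assumption here that the columns are time, number at risk, failures, net lost, survivor function, standard error and the 95 percent bounds. The scalar names are illustrative.

```stata
# delimit ;
* kaplan meier product for months 1 to 3: S(t) = S(t-1)*(1 - failures/at risk);
scalar s1 = 1 - 38/26631;
scalar s2 = s1*(1 - 42/26593);
scalar s3 = s2*(1 - 40/26551);
display "S(1) = " %6.4f s1 "   S(2) = " %6.4f s2 "   S(3) = " %6.4f s3;
```

Because nobody is lost to follow-up before month 60, each value is also just the number still at risk at the start of the next month divided by the 26631 people at risk at the start.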
. * you can graph the functions as well;
. * output the graphs to a file;
. sts graph;
failure _d: diedin5
analysis time _t: max_mths
. graph save c:\bill\jpsm\graph1.gph, replace;
(file c:\bill\jpsm\graph1.gph saved)
. * you can draw graphs for various subgroups;
. * output the graphs to a file;
. sts graph, by(educ);
failure _d: diedin5
analysis time _t: max_mths
. graph save c:\bill\jpsm\graph2.gph, replace;
(file c:\bill\jpsm\graph2.gph saved)
. * run a duration model where the hazard varies across;
. * people. first, ask stata to print out the raw;
. * coefficients (nohr option), then the default hazard ratios;
. * show weibull first, then exponential;
. * first, construct dummies for the income and;
. * education categories. in the regression statement;
. * _Ie star includes all variables beginning with _Ie;
. * and _Ii star includes all variables starting with;
. * _Ii;
. xi i.income i.educ;
i.income _Iincome_1-5 (naturally coded; _Iincome_1 omitted)
i.educ _Ieduc_1-4 (naturally coded; _Ieduc_1 omitted)
. streg age_s_yrs black hispanic _Ie* _Ii*, d(weibull) nohr;
failure _d: diedin5
analysis time _t: max_mths
Fitting constant-only model:
Iteration 0: log likelihood = -12759.823
Iteration 1: log likelihood = -12723.121
Iteration 2: log likelihood = -12722.924
Iteration 3: log likelihood = -12722.924
Fitting full model:
Iteration 0: log likelihood = -12722.924
Iteration 1: log likelihood = -12454.553
Iteration 2: log likelihood = -12425.111
Iteration 3: log likelihood = -12425.055
Iteration 4: log likelihood = -12425.055
Weibull regression -- log relative-hazard form
No. of subjects = 26631 Number of obs = 26631
No. of failures = 3245
Time at risk = 1505705
LR chi2(10) = 595.74
Log likelihood = -12425.055 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age_s_yrs | .0452588 .0031592 14.33 0.000 .0390669 .0514508
black | .4770152 .0511122 9.33 0.000 .3768371 .5771932
hispanic | .1333552 .082156 1.62 0.105 -.0276676 .294378
_Ieduc_2 | .0093353 .0591918 0.16 0.875 -.1066786 .1253492
_Ieduc_3 | -.072163 .0503131 -1.43 0.151 -.1707748 .0264488
_Ieduc_4 | -.1301173 .0657131 -1.98 0.048 -.2589126 -.0013221
_Iincome_2 | -.1867752 .0650604 -2.87 0.004 -.3142914 -.0592591
_Iincome_3 | -.3268927 .0688635 -4.75 0.000 -.4618627 -.1919227
_Iincome_4 | -.5166137 .0769202 -6.72 0.000 -.6673747 -.3658528
_Iincome_5 | -.5425447 .0722025 -7.51 0.000 -.684059 -.4010303
_cons | -9.201724 .2266475 -40.60 0.000 -9.645945 -8.757503
-------------+----------------------------------------------------------------
/ln_p | .1585315 .0172241 9.20 0.000 .1247729 .1922901
-------------+----------------------------------------------------------------
p | 1.171789 .020183 1.132891 1.212022
1/p | .8533961 .014699 .8250675 .8826974
------------------------------------------------------------------------------
. * now get the hazard ratios, which are the;
. * exponentiated coefficients, exp(b);
. streg age_s_yrs black hispanic _Ie* _Ii*, d(weibull);
failure _d: diedin5
analysis time _t: max_mths
Fitting constant-only model:
Iteration 0: log likelihood = -12759.823
Iteration 1: log likelihood = -12723.121
Iteration 2: log likelihood = -12722.924
Iteration 3: log likelihood = -12722.924
Fitting full model:
Iteration 0: log likelihood = -12722.924
Iteration 1: log likelihood = -12454.553
Iteration 2: log likelihood = -12425.111
Iteration 3: log likelihood = -12425.055
Iteration 4: log likelihood = -12425.055
Weibull regression -- log relative-hazard form
No. of subjects = 26631 Number of obs = 26631
No. of failures = 3245
Time at risk = 1505705
LR chi2(10) = 595.74
Log likelihood = -12425.055 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age_s_yrs | 1.046299 .0033055 14.33 0.000 1.03984 1.052797
black | 1.611258 .082355 9.33 0.000 1.457667 1.781032
hispanic | 1.142656 .093876 1.62 0.105 .9727116 1.342291
_Ieduc_2 | 1.009379 .059747 0.16 0.875 .8988145 1.133544
_Ieduc_3 | .9303792 .0468103 -1.43 0.151 .8430114 1.026802
_Ieduc_4 | .8779924 .0576956 -1.98 0.048 .7718905 .9986788
_Iincome_2 | .8296302 .0539761 -2.87 0.004 .7303062 .9424625
_Iincome_3 | .7211611 .0496617 -4.75 0.000 .6301089 .8253706
_Iincome_4 | .5965372 .0458858 -6.72 0.000 .5130537 .6936049
_Iincome_5 | .5812672 .041969 -7.51 0.000 .5045648 .6696297
-------------+----------------------------------------------------------------
/ln_p | .1585315 .0172241 9.20 0.000 .1247729 .1922901
-------------+----------------------------------------------------------------
p | 1.171789 .020183 1.132891 1.212022
1/p | .8533961 .014699 .8250675 .8826974
------------------------------------------------------------------------------
. * for comparison purposes, look at results from an exponential model;
. streg age_s_yrs black hispanic _Ie* _Ii*, d(exp) nohr;
failure _d: diedin5
analysis time _t: max_mths
Iteration 0: log likelihood = -12759.823
Iteration 1: log likelihood = -12493.913
Iteration 2: log likelihood = -12465.272
Iteration 3: log likelihood = -12465.218
Iteration 4: log likelihood = -12465.218
Exponential regression -- log relative-hazard form
No. of subjects = 26631 Number of obs = 26631
No. of failures = 3245
Time at risk = 1505705
LR chi2(10) = 589.21
Log likelihood = -12465.218 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age_s_yrs | .0450058 .0031587 14.25 0.000 .0388149 .0511968
black | .4739259 .0511077 9.27 0.000 .3737567 .574095
hispanic | .1325028 .0821549 1.61 0.107 -.0285178 .2935235
_Ieduc_2 | .0094567 .0591916 0.16 0.873 -.1065568 .1254701
_Ieduc_3 | -.071804 .0503096 -1.43 0.154 -.170409 .0268011
_Ieduc_4 | -.1293206 .0657092 -1.97 0.049 -.2581081 -.000533
_Iincome_2 | -.1855024 .0650573 -2.85 0.004 -.3130123 -.0579925
_Iincome_3 | -.3244382 .0688567 -4.71 0.000 -.4593948 -.1894816
_Iincome_4 | -.5134143 .0769126 -6.68 0.000 -.6641602 -.3626684
_Iincome_5 | -.5391811 .072196 -7.47 0.000 -.6806827 -.3976795
_cons | -8.491069 .2107085 -40.30 0.000 -8.90405 -8.078088
------------------------------------------------------------------------------
. streg age_s_yrs black hispanic _Ie* _Ii*, d(exp);
failure _d: diedin5
analysis time _t: max_mths
Iteration 0: log likelihood = -12759.823
Iteration 1: log likelihood = -12493.913
Iteration 2: log likelihood = -12465.272
Iteration 3: log likelihood = -12465.218
Iteration 4: log likelihood = -12465.218
Exponential regression -- log relative-hazard form
No. of subjects = 26631 Number of obs = 26631
No. of failures = 3245
Time at risk = 1505705
LR chi2(10) = 589.21
Log likelihood = -12465.218 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age_s_yrs | 1.046034 .0033041 14.25 0.000 1.039578 1.05253
black | 1.606288 .0820936 9.27 0.000 1.453184 1.775523
hispanic | 1.141682 .0937948 1.61 0.107 .971885 1.341145
_Ieduc_2 | 1.009502 .059754 0.16 0.873 .898924 1.133681
_Ieduc_3 | .9307133 .0468238 -1.43 0.154 .8433198 1.027163
_Ieduc_4 | .8786922 .0577381 -1.97 0.049 .7725117 .9994672
_Iincome_2 | .8306869 .0540422 -2.85 0.004 .731241 .943657
_Iincome_3 | .7229334 .0497788 -4.71 0.000 .6316658 .827388
_Iincome_4 | .5984488 .0460282 -6.68 0.000 .5147056 .6958171
_Iincome_5 | .5832257 .0421066 -7.47 0.000 .5062713 .6718773
------------------------------------------------------------------------------
. log close;
log: c:\bill\jpsm\surv_data.log
log type: text
closed on: 7 Nov 2004, 06:27:08
------------------------------------------------------------------------------
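The nohr and default versions of each streg table are linked by exponentiation, and the two distributions can be compared through their log likelihoods, since the exponential model is the Weibull model with p fixed at 1. The sketch below works through both using numbers copied from the log; the scalar names are illustrative. Treating twice the log-likelihood difference as chi-squared with one degree of freedom is the standard likelihood-ratio approach, offered here as a rough check and not as something the log itself reports.

```stata
# delimit ;
* hazard ratio for age from the weibull coefficient;
scalar hr_age = exp(.0452588);
* weibull shape parameter from /ln_p;
scalar shape = exp(.1585315);
* likelihood-ratio statistic: exponential (p=1) against weibull;
scalar lr = 2*(-12425.055 - (-12465.218));
scalar lr_pval = chi2tail(1, lr);
display "hazard ratio for age = " %8.6f hr_age "   p = " %8.6f shape;
display "LR statistic = " %7.3f lr "   p-value = " %6.4f lr_pval;
```

A shape parameter above 1 means the hazard rises with months of follow-up, and the large statistic points the same way as the z test on /ln_p in the Weibull output.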