Biost 513 - University of Washington



Biost 513

Spring 2000

Professor Breslow

HOMEWORK #4 Key

(Due Friday, April 28 in class)

NOTE: Unless explicitly stated, direct computer output is NOT desired. Typically only part of the computer output is asked for (such as a confidence interval) and then proper interpretation of the statistics is requested.

1. Download the Framingham data from the class website and, using the commands shown on pp. 41706 and 41707 of the class notes, create a dataset containing records for the 910 male subjects 40+ years of age who were free of CHD at exam 1 and had known values for the key covariates.

** Data prep: **

.infile lexam surv cause cexam chd cva ca oth sex age ht wt sc1 sc2 dbp sbp mrw smok using s/fram.dat

.mvdecode ht wt sc1 sc2 mrw smok, mv(-1)

a. Using commands identical to those shown on page 41708, create a four level factor bpg for systolic blood pressure with break points at 127, 147 and 167 mmHg and a four level factor scg for exam 1 serum cholesterol with breakpoints at 200, 220 and 260 mg/100ml. In contradistinction to what is shown on p. 41708, create a two-level (binary) factor for age using the commands

.gen agp1 = age

.recode agp1 min/49 = 0 50/max = 1

.drop if sex>1 | age<40 | sc1>1000 | smok>1000 | mrw>1000 | cexam==1

.gen bpg=sbp

.recode bpg min/126=1 127/146=2 147/166=3 167/max=4

.gen scg=sc1

.recode scg min/199=1 200/219=2 220/259=3 260/max=4
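The grouping rule behind the recode commands can be sketched outside STATA as follows. This is only an illustration in Python (not part of the class notes); the helper name to_group is hypothetical, and it assumes the measurements are whole numbers, as the integer recode ranges imply.

```python
from bisect import bisect_right

# Breakpoints taken from part a); a value equal to a breakpoint starts the next group.
SBP_BREAKS = [127, 147, 167]  # bpg: min/126, 127/146, 147/166, 167/max
SC_BREAKS = [200, 220, 260]   # scg: min/199, 200/219, 220/259, 260/max


def to_group(value, breakpoints):
    """Return the 1-based group number for value (hypothetical helper)."""
    # bisect_right counts how many breakpoints are <= value.
    return bisect_right(breakpoints, value) + 1


for sbp in (126, 127, 146, 147, 166, 167):
    print(sbp, to_group(sbp, SBP_BREAKS))
```

The same function applied with SC_BREAKS reproduces the four cholesterol groups.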

b. By means of the following STATA commands, examine the association between CHD and blood pressure groups separately for those under and over 50 years of age:

.tab chd bpg if age < 50

.tab chd bpg if age >= 50

Calculate (by hand) the odds ratios comparing the second, third and fourth levels of blood pressure relative to the baseline (first) level, separately for the two age groups.

For age < 50, the table showing bpg by chd is:

Table 1

| bpg

chd | 1 2 3 4 | Total

-------------+--------------------------------------------+----------

not diseased | 124 178 64 34 | 400

diseased | 17 32 15 12 | 76

-------------+--------------------------------------------+----------

Total | 141 210 79 46 | 476

The OR comparing the second level of blood pressure to baseline is:

OR = (32/178) / (17/124) = 1.31

To obtain this, the numerator is the odds of having chd given bpg 2, and the denominator is the odds of having chd given bpg 1. The remaining ORs are calculated in the same manner.

The OR comparing the third level of blood pressure to baseline is:

OR = (15/64) / (17/124) = 1.71

The OR comparing the fourth level of blood pressure to baseline is:

OR = (12/34) / (17/124) = 2.57

For age >= 50, the table showing bpg by chd is:

Table 2

| bpg

chd | 1 2 3 4 | Total

-------------+--------------------------------------------+----------

not diseased | 89 124 76 43 | 332

diseased | 15 31 27 29 | 102

-------------+--------------------------------------------+----------

Total | 104 155 103 72 | 434

The OR comparing the second level of blood pressure to baseline is:

OR = (31/124) / (15/89) = 1.48

The OR comparing the third level of blood pressure to baseline is:

OR = (27/76) / (15/89) = 2.11

The OR comparing the fourth level of blood pressure to baseline is:

OR = (29/43) / (15/89) = 4.00
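The hand calculations for Tables 1 and 2 can be double-checked with a few lines of Python. This is a sketch only; the function name odds_ratios is hypothetical, and the counts are copied from the two tables above.

```python
def odds_ratios(diseased, not_diseased):
    """Odds ratios of levels 2..k relative to level 1 (hypothetical helper)."""
    base_odds = diseased[0] / not_diseased[0]
    return [(d / n) / base_odds for d, n in zip(diseased[1:], not_diseased[1:])]


# Counts by bpg level 1..4, from Table 1 (age < 50) and Table 2 (age >= 50).
under_50 = odds_ratios([17, 32, 15, 12], [124, 178, 64, 34])
over_50 = odds_ratios([15, 31, 27, 29], [89, 124, 76, 43])

print([round(x, 2) for x in under_50])  # [1.31, 1.71, 2.57]
print([round(x, 2) for x in over_50])   # [1.48, 2.11, 4.0]
```

In both age groups the odds ratios rise with blood pressure level, and they are larger in the older group.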

c. Verify the calculations in part b) using the commands:

.tabodds chd bpg if age < 50

.tabodds chd bpg if age >= 50

Describe in words the associations that you see. What (tentative) conclusions do you draw?

[Note: if you add “, or” after the commands given above, the OR’s relative to baseline will be calculated for you in STATA.]

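. tabodds chd bpg if age < 50, or

------------+-------------------------------------------------------------

bpg (mm/Hg) | Odds ratio chi2 P>chi2 [95% Conf. Interval]

------------+-------------------------------------------------------------

. tabodds chd bpg if age >= 50, or

------------+-------------------------------------------------------------

bpg (mm/Hg) | Odds ratio chi2 P>chi2 [95% Conf. Interval]

------------+-------------------------------------------------------------

. xi: logit chd i.agp1*i.bpg

---------------------+------------------------------------------

chd | Coef. Std. Err. z P>|z|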

---------------------+------------------------------------------

AGP1 Beta1 | .2064821 .3805134 0.543 0.587

BPG(2) Beta2 | .2710206 .3221115 0.841 0.400

BPG(3) Beta3 | .5362353 .3862376 1.388 0.165

BPG(4) Beta4 | .9456143 .4238312 2.231 0.026

AGP1*BPG(2) Beta5 | .1232712 .4711486 0.262 0.794

AGP1*BPG(3) Beta6 | .2094544 .526571 0.398 0.691

AGP1*BPG(4) Beta7 | .4410675 .5614927 0.786 0.432

(constant) Beta0 | -1.987068 .2586268 -7.683 0.000

----------------------------------------------------------------

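To obtain the β’s, use the logit form of the logistic equation. We know that

logit(p) = β0 + β1*AGP1 + β2*BPG(2) + β3*BPG(3) + β4*BPG(4) + β5*AGP1*BPG(2) + β6*AGP1*BPG(3) + β7*AGP1*BPG(4)

where p is the probability of CHD.

To find β0, all the covariate terms must be equal to 0. That is, β0 is the log odds of having CHD for age < 50 and BPG = 1.

β1 is the effect of age at the baseline level of blood pressure. This is the difference between the log odds of CHD for age ≥ 50 at the baseline level of blood pressure and the log odds at baseline levels of age and blood pressure (β0).

β2 is the effect of the second level of blood pressure for age < 50. This is the difference between the log odds of CHD for the second level of blood pressure and age < 50 and the log odds at baseline levels of age and blood pressure (β0). β3 and β4 are calculated similarly with similar interpretation.

β5 tells us how much more we have to add to the main effect of BPG(2) (i.e., β2) to account for being in the ≥ 50 age group. This is the difference between the log OR of CHD for the second level of blood pressure (relative to baseline) at age ≥ 50 and the same log OR at age < 50. β6 and β7 are calculated similarly with similar interpretation.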
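Because the interaction model has one parameter for each age by blood pressure cell, its coefficients can be recovered directly from the counts in Tables 1 and 2. The Python sketch below illustrates this; it assumes the model in e) is the full interaction model (the one written as i.agp1*i.bpg in the lrtest commands), and the helper log_odds is hypothetical.

```python
from math import log

# Cell counts by bpg level 1..4; key 0 is age < 50 (Table 1), key 1 is age >= 50 (Table 2).
diseased = {0: [17, 32, 15, 12], 1: [15, 31, 27, 29]}
healthy = {0: [124, 178, 64, 34], 1: [89, 124, 76, 43]}


def log_odds(agp1, bpg):
    """Observed log odds of CHD in one age/blood-pressure cell (hypothetical helper)."""
    return log(diseased[agp1][bpg - 1] / healthy[agp1][bpg - 1])


beta0 = log_odds(0, 1)                               # baseline cell
beta1 = log_odds(1, 1) - beta0                       # age effect at bpg = 1
main = [log_odds(0, g) - beta0 for g in (2, 3, 4)]   # beta2, beta3, beta4
# Interaction: log OR at age >= 50 minus log OR at age < 50.
inter = [(log_odds(1, g) - log_odds(1, 1)) - m
         for g, m in zip((2, 3, 4), main)]           # beta5, beta6, beta7

print(round(beta0, 4), round(beta1, 4))
print([round(b, 4) for b in main])
print([round(b, 4) for b in inter])
```

The printed values should agree with the Coef. column of the regression output above to the displayed precision.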

f. Is there any evidence from e) to suggest that the CHD x BP odds ratios are modified by age? Why is it difficult to make a summary statement about these changes from the results shown in e)?

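It’s hard to say what is going on with CHD and AGE! The three interaction coefficients are all positive, which points toward larger CHD x BP odds ratios in the older age group, but none of them is individually significant. A summary evaluation is difficult because of the multiplicity of coefficients. You may think that the interaction coefficients are not necessary because their individual p-values are non-significant, but they may be significant as a group in the model. The individual p-values will not show you this.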

g. Now fit the additive (no interaction) model in which the CHD x BP odds ratios are assumed constant in age using the command

. xi: logit chd i.agp1 i.bpg

Interpret (in words) the regression coefficients in this model. Do they seem “reasonable” in light of the regression coefficients in part e)? By subtracting the log likelihood for this model from that in part e), and doubling the difference, create a chi-squared statistic. How many degrees of freedom does it have? What hypothesis does it test? Is there evidence to support the alternative hypothesis? What do you conclude?

. xi: logit chd i.agp1 i.bpg

Log likelihood = -434.92543 Pseudo R2 = 0.0330

--------------------------------

chd | Coef.

-----------------+--------------

AGP1 Beta1 | .3807617

BPG(2) Beta2 | .3297315

BPG(3) Beta3 | .6412845

BPG(4) Beta4 | 1.20235

(constant) Beta0 | -2.070352

--------------------------------

β0 is the log odds of having CHD for baseline levels of AGE and BPG (age < 50 and BPG=1).

β1 is the effect of age at the baseline level of blood pressure. This is the difference between the log odds of CHD for baseline levels of blood pressure and age ≥ 50 and the log odds at baseline levels for age and blood pressure (β0).

β2 is the effect of the second level of blood pressure for age < 50. This is the difference between the log odds of CHD for the second level of blood pressure and age < 50 and the log odds at baseline levels for age and blood pressure (β0).

β3 is the effect of the third level of blood pressure for age < 50. This is the difference between the log odds of CHD for the third level of blood pressure and age < 50 and the log odds at baseline levels for age and blood pressure (β0).

β4 is the effect of the fourth level of blood pressure for age < 50. This is the difference between the log odds of CHD for the fourth level of blood pressure and age < 50 and the log odds at baseline levels for age and blood pressure (β0).

Because this model has no interaction terms, the same blood pressure effects (β2, β3 and β4) also apply for age ≥ 50.

The estimates obtained for the betas in this regression are similar to the estimates obtained in the previous regression. They seem reasonable; nothing looks too wild.

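The likelihood ratio statistic is twice the difference between the log likelihood in part e) and the log likelihood in this model, which gives chi2(3) = 0.68 with p-value=0.8785.

This is distributed as a chi-squared with 3 degrees of freedom. It has 3 dof because the first model fit 7 covariates and the second model fit 4 covariates, and the difference between the two is the degrees of freedom for the chi-squared statistic.

This tests the hypothesis

H0: β5 = β6 = β7 = 0

against the alternative hypothesis that these betas are not all 0.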

These values can also be obtained by using the following STATA commands:

. xi: logit chd i.agp1*i.bpg

. lrtest, saving(1)

. xi: logit chd i.agp1 i.bpg

. lrtest, saving(2)

. lrtest, using(1) model(2)

Logit: likelihood-ratio test chi2(3) = 0.68

Prob > chi2 = 0.8785

We tested the full model (with interaction terms) against the model with no interaction terms. The likelihood ratio test statistic was non-significant; thus there is no significant difference between the models. Therefore the interaction terms are not needed in our model.
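The p-value reported by lrtest can be checked by hand from the test statistic. The Python sketch below uses the closed-form upper tail of a chi-squared distribution with 3 degrees of freedom; the function name chi2_sf_3df is hypothetical, and the input is the rounded statistic 0.68, so the result is expected to match 0.8785 only approximately.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_3df(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with 3 df."""
    # Closed form of the regularized upper incomplete gamma function Q(3/2, x/2).
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


lr_stat = 0.68  # rounded LR statistic from the lrtest output above
p_value = chi2_sf_3df(lr_stat)
print(round(p_value, 3))
```

A p-value this large gives no reason to prefer the interaction model over the additive one.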

2. Let’s continue the analysis of the grouped data collected by Tuyns et al. (1977) that we considered in HW #2. These data are contained in the files tuyns.dat or esoph.raw, the latter with one record per subject. Recall that the goal of the study is to characterize the cancer risk associated with both alcohol and tobacco consumption. In many applications age is a potential confounder. In class (notes pp. 32717) we noted an association of sorts between age and both alcohol and tobacco consumption among the controls, who are representative of the population. Here we will first study the association between age and cancer to confirm our belief that age is an important risk factor.

In HW #3 we looked at the association between tobacco and disease (cc) by considering a dichotomization of tobacco and then adjusting for age. Here we will use logistic regression to characterize the risk associated with all 4 levels of tobacco (by estimating odds ratios).

a. In view of the fact that the youngest age group had only 1 case, it would be a good idea to pool the first two age groups using the recode command

.recode age 1/2 = 1 3 = 2 4 = 3 5 = 4 6 = 5

after which age has 5 levels. Then use tabodds and mhodds to examine the odds ratios associating case-control status cc with age. Summarize in words the results and your conclusions. [Optional for epi students: Why is it legitimate to examine age as a risk factor with these data? Under what (common) circumstance would it NOT be legitimate to do so?]

. tabodds cc age [fweight=freq]

------------+-------------------------------------------------------------

age | cases controls odds [95% Conf. Interval]

------------+-------------------------------------------------------------

1 | 10 305 0.03279 0.01746 0.06155

2 | 46 167 0.27545 0.19875 0.38175

3 | 76 166 0.45783 0.34899 0.60061

4 | 55 106 0.51887 0.37463 0.71864

5 | 13 31 0.41935 0.21944 0.80138

------------+-------------------------------------------------------------

Test of homogeneity (equal odds): chi2(4) = 96.33

Pr>chi2 = 0.0000

Score test for trend of odds: chi2(1) = 79.23

Pr>chi2 = 0.0000
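The odds column in the tabodds output is simply cases divided by controls within each age group. A short Python check (illustrative only, with the counts copied from the table above):

```python
# Cases and controls for the five pooled age groups, from the tabodds output.
cases = [10, 46, 76, 55, 13]
controls = [305, 167, 166, 106, 31]

# Odds of being a case within each age group.
odds = [c / k for c, k in zip(cases, controls)]

for group, value in enumerate(odds, start=1):
    print(group, round(value, 5))
```

The odds rise from group 1 through group 4 and dip in group 5, which has the fewest subjects.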

Here we find an increase in odds of disease for age categories 1 to 4. The odds decrease for individuals in age category 5 (note the small number of cases and controls in this group). By the trend test we can conclude that there is a general upward trend with p-value < 0.0001.

. mhodds cc age [fweight=freq]

----------------------------------------------------------------

Odds Ratio chi2(1) P>chi2 [95% Conf. Interval]

----------------------------------------------------------------

1.784211 79.23 0.0000 1.570651 2.026808

----------------------------------------------------------------

Using the mhodds command, we find a significant association between age and disease (p-value < 0.0001).

Prob > chi2 = 0.0000

Log likelihood = -482.05896 Pseudo R2 = 0.0256

------------------------------------------------------------------------------

cc | Coef. Std. Err. z P>|z| [95% Conf. Interval]

---------+--------------------------------------------------------------------

tob | .3884458 .0761922 5.098 0.000 .2391119 .5377797

_cons | -2.081989 .1707138 -12.196 0.000 -2.416582 -1.747396

------------------------------------------------------------------------------

A trend test in terms of the regression coefficient is:

H0: b1=0

H1: b1 not equal to zero

We can use either the likelihood ratio test (given above as LR chi2(1)) or the Wald test (the z statistic for tob).

For the likelihood ratio test, with a test statistic of 25.37 on df=1, the p-value is < 0.0001.

Prob > chi2 = 0.0000

Log likelihood = -480.82075 Pseudo R2 = 0.0281

------------------------------------------------------------------------------

cc | Coef. Std. Err. z P>|z| [95% Conf. Interval]

---------+--------------------------------------------------------------------

tob | .6245092 .1947229 3.207 0.001 .2428594 1.006159

tob3 | -.601781 .3832742 -1.570 0.116 -1.352985 .1494227

tob4 | -.6255163 .5637647 -1.110 0.267 -1.730475 .4794423

_cons | -2.370359 .2882533 -8.223 0.000 -2.935325 -1.805393

------------------------------------------------------------------------------

Notice that we have the following fitted log odds:

TOB=1 -2.3704 + 0.6245*(1) = -1.746

TOB=2 -2.3704 + 0.6245*(2) = -1.121 = -1.7458 + 0.6245

TOB=3 -2.3704 + 0.6245*(3) –0.6018 = -1.099 = -1.7458 + 0.6472

TOB=4 -2.3704 + 0.6245*(4) –0.6255 = -0.498 = -1.7458 + 1.2480

This gives exactly the same fitted values (log odds ratios) as the dummy variable model!
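The fitted log odds listed above can be reproduced from the printed coefficients. The Python sketch below assumes, as the worked lines imply, that tob3 and tob4 are indicators for TOB=3 and TOB=4; the function name fitted_log_odds is hypothetical.

```python
# Coefficients copied from the logit output above.
B_CONS, B_TOB, B_TOB3, B_TOB4 = -2.370359, 0.6245092, -0.601781, -0.6255163


def fitted_log_odds(tob):
    """Fitted log odds of being a case at tobacco level tob (1 to 4)."""
    # (tob == 3) and (tob == 4) act as the 0/1 indicators tob3 and tob4 (an assumption).
    return B_CONS + B_TOB * tob + B_TOB3 * (tob == 3) + B_TOB4 * (tob == 4)


fitted = [fitted_log_odds(t) for t in (1, 2, 3, 4)]
for level, value in zip((1, 2, 3, 4), fitted):
    # Second column: difference from TOB=1, i.e. the log odds ratio versus TOB=1.
    print(level, round(value, 3), round(value - fitted[0], 4))
```

With four tobacco levels and four parameters, the model reproduces the observed log odds at every level, which is why it matches the dummy variable model.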

Clearly the linear model is nested within this model. We obtain the linear model as a special case where:

H0: b2=b3=0 (coefficients of TOB(3) and TOB(4) are zero)

H1: not both are zero

e. Use a likelihood ratio test to see if we’d reject the linear model in favor of the 3 parameter (for tobacco) model. Report the null hypothesis and test statistic, and interpret the p-value.

A likelihood ratio test of the linear model in 2(b) versus the model in 2(d) yields a likelihood ratio statistic of:

LR = 2*( -480.82 - (-482.06) ) = 2.48 on 2 df, p=0.2899

Where:

H0: b2=b3=0

H1: not both are zero

We would not reject the null hypothesis and conclude that the linear model is adequate.
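The statistic and p-value in e) can be checked from the two log likelihoods. For 2 degrees of freedom the chi-squared upper tail has the simple form exp(-x/2), so no statistics library is needed. The Python sketch below is illustrative only; the variable names are hypothetical.

```python
from math import exp

ll_linear = -482.05896  # log likelihood of the linear (grouped) model in 2(b)
ll_three = -480.82075   # log likelihood of the 3 parameter model in 2(d)

lr_stat = 2 * (ll_three - ll_linear)

# Upper tail of a chi-squared distribution with 2 df: P(X > x) = exp(-x / 2).
p_value = exp(-lr_stat / 2)

print(round(lr_stat, 2), round(p_value, 4))
```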

f. Age adjustment: Using TOB as in model b) (i.e., a grouped linear variable), consider a model that adjusts the TOB odds ratio for AGE. Use dummy variables for the AGE categories. Report the adjusted TOB odds ratio and interpret the adjusted odds ratio. Does AGE appear to confound the TOB/cancer association in these data?

. xi: logit cc tob i.age [fweight=freq]

Logit estimates Number of obs = 975

LR chi2(5) = 149.81

Prob > chi2 = 0.0000

Log likelihood = -419.84154 Pseudo R2 = 0.1514

------------------------------------------------------------------------------

cc | Coef. Std. Err. z P>|z| [95% Conf. Interval]

---------+--------------------------------------------------------------------

tob | .488134 .085711 5.695 0.000 .3201435 .6561244

Iage_2 | 2.157098 .365998 5.894 0.000 1.439755 2.874441

Iage_3 | 2.686403 .3543984 7.580 0.000 1.991795 3.381011

Iage_4 | 2.974758 .3699291 8.041 0.000 2.24971 3.699806

Iage_5 | 2.695906 .4709286 5.725 0.000 1.772903 3.618909

_cons | -4.410968 .3801178 -11.604 0.000 -5.155985 -3.665951

------------------------------------------------------------------------------

The resulting adjusted odds ratio for TOB is given by:

exp(0.488) = 1.63

This odds ratio is the multiplicative increase in the odds of disease for a 1 unit increase in TOB category, where age is controlled (i.e., held fixed). That is, this is the odds ratio for disease comparing TOB=2 to TOB=1, or comparing TOB=3 to TOB=2, or comparing TOB=4 to TOB=3, for fixed values of age.

The estimate for this odds ratio that did not adjust for age was given in 2(b) as

exp(0.388) = 1.47

Age adjustment results in only a minor change in the odds ratio estimate. This suggests that age does not confound tobacco.
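The crude and age-adjusted odds ratios can be compared directly by exponentiating the two tob coefficients. The Python sketch below is a rough check only; the variable names are hypothetical and the coefficients are copied from the outputs in 2(b) and 2(f).

```python
from math import exp

coef_crude = 0.3884458    # tob coefficient without age adjustment, 2(b)
coef_adjusted = 0.488134  # tob coefficient adjusted for age, 2(f)

or_crude = exp(coef_crude)
or_adjusted = exp(coef_adjusted)

# Relative change in the odds ratio after adjusting for age.
rel_change = (or_adjusted - or_crude) / or_crude

print(round(or_crude, 2), round(or_adjusted, 2), round(100 * rel_change, 1))
```

The relative change works out to roughly 10 percent.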
