Biostat 513

Homework 3 Key

Note to students:

The Stata output has been edited to remove information unrelated to the question of interest. If you must include Stata output, please edit it accordingly. In this key, the Stata commands are included with our tables for your convenience. You may include Stata output and commands in an appendix if you think they will be helpful to the graders.

. infile age alc tob freq1 freq0 using a:\tuyns_dat.txt

. reshape long freq, i(age alc tob) j(cc)

. gen tobexp=tob

. recode tobexp 1/2=0 3/4=1

. gen alcexp=alc

. recode alcexp 1/2=0 3/4=1

1. (a) Analyze the relationship between cancer (cc) and tobacco (tobexp) by creating a 2x2 table. Quote and interpret the odds ratio estimate and a 95% confidence limit for the odds ratio. Why is this called the “crude” estimate?

. cs cc tobexp [freq=freq], or

                 | tobexp                 |
                 |   Exposed   Unexposed  |     Total
-----------------+------------------------+----------
           Cases |        64         136  |       200
        Noncases |       150         625  |       775
-----------------+------------------------+----------
           Total |       214         761  |       975

                 |     Point estimate     |  [95% Conf. Interval]
                 |------------------------+----------------------
      Odds ratio |        1.960784        |  1.387991    2.770272  (Cornfield)
                 +-----------------------------------------------
                             chi2(1) =    14.84  Pr>chi2 = 0.0001

The odds of cancer are approximately 2 times as high for people who smoke 20+ cigarettes per day as for those who smoke fewer than 20 cigarettes per day (OR = 1.96, 95% CI: [1.39, 2.77]).

This odds ratio is called the crude estimate because it is not adjusted for any other covariates.
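As a cross-check on the Stata output, the crude odds ratio can be recomputed by hand from the four cell counts. The sketch below is a minimal Python illustration and not part of the original Stata session. It uses the Woolf (log-based) interval rather than the Cornfield interval that Stata reports, so the limits should agree only approximately. The function name odds_ratio_woolf is an assumption made for this example.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) 95% CI for a 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed noncases, d = unexposed noncases.
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

# Cell counts from the 2x2 table in 1(a)
or_hat, lower, upper = odds_ratio_woolf(64, 136, 150, 625)
print(f"OR = {or_hat:.6f}, Woolf 95% CI = [{lower:.2f}, {upper:.2f}]")
```

The Woolf limits differ from the Cornfield limits only in the third decimal place for this table, which is typical when all four cells are reasonably large.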

1. (b) Now adjust for age by stratification into the 6 age categories. Determine and interpret an adjusted odds ratio for tobexp and a 95% confidence limit using the Mantel-Haenszel method. State in simple terms how the meaning of this estimate differs from that calculated in (a).

. cs cc tobexp [freq=freq], by(age) or

    age in years |       OR       [95% Conf. Interval]
-----------------+-----------------------------------
           25-34 |          0           0           .
           35-44 |   1.817073    .4776855    6.966522
           45-54 |   2.857464    1.429078    5.721081
           55-64 |   2.442708     1.33919    4.457753
           65-74 |   2.186047    .9238396    5.176556
             75+ |   .9454545           0    5.039992
-----------------+-----------------------------------
           Crude |   1.960784    1.387991    2.770272
    M-H combined |   2.302855    1.578173    3.360306
-----------------+-----------------------------------
Test of homogeneity (M-H)    chi2(5) =    1.477  Pr>chi2 = 0.9157

Test that combined OR = 1:
                 Mantel-Haenszel chi2(1) =     19.16
                                 Pr>chi2 =    0.0000

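The summary (Mantel-Haenszel) odds ratio for cancer, holding age constant, is 2.30 (95% CI: [1.58, 3.36]). This implies that within each age group, the odds of cancer are about 2.3 times as high for people who smoke 20+ cigarettes per day as for those who smoke fewer than 20 cigarettes per day. Unlike the crude estimate in (a), which compares heavier and lighter smokers regardless of age, this estimate compares heavier and lighter smokers of the same age.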
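The Mantel-Haenszel estimate is a weighted combination of the stratum-specific 2x2 tables. The sketch below shows the calculation in Python under stated assumptions: the function name mantel_haenszel_or is made up for this example, and because the stratum-level cell counts are not printed in this key, the value 2.30 cannot be reproduced here. The two-stratum counts in the example are hypothetical and serve only to show the mechanics.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio.

    strata: iterable of (a, b, c, d) tuples, one per stratum, where
    a = exposed cases, b = unexposed cases,
    c = exposed noncases, d = unexposed noncases.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    return num / den

# Sanity check: with a single stratum the M-H estimate is the crude OR from 1(a).
crude = mantel_haenszel_or([(64, 136, 150, 625)])

# HYPOTHETICAL counts, invented for illustration only (not the study data).
toy = mantel_haenszel_or([(10, 20, 30, 40), (5, 15, 10, 60)])
print(round(crude, 6), round(toy, 4))
```

Strata with a zero cell still contribute to the sums, which is why Stata can report a combined estimate even though some age-specific odds ratios are 0 or undefined.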

1. (c) Is the assumption of a common odds ratio, which implicitly underlies the calculations in (b), a plausible assumption? Present evidence to support your conclusions.

By looking at the M-H test of homogeneity (from part (b)), we can conclude that the assumption of a common odds ratio is reasonable. The p-value is 0.9157, which provides no evidence that the odds ratios differ across age groups.
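The p-value quoted for the homogeneity test is the upper-tail area of a chi-square distribution. A minimal Python sketch of that calculation follows, using only the standard library. The helper chi2_sf is an assumption for this example (it is not a Stata or library function), and it handles integer degrees of freedom only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2,
    starting from the closed forms for df = 1 and df = 2.
    """
    z = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))   # closed form for df = 1
    else:
        a, q = 1.0, math.exp(-z)              # closed form for df = 2
    while 2 * a < df:
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1.0
    return q

p_age = chi2_sf(1.477, 5)   # homogeneity statistic and df from 1(b)
print(round(p_age, 4))
```

A statistic of 1.477 on 5 degrees of freedom is well below its expected value of 5 under homogeneity, which is why the p-value is so close to 1.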

1. (d) Repeat parts (b) and (c), but this time using simultaneous adjustment for age and alcexp.

. egen age_alc=group(age alcexp)

. cs cc tobexp [freq=freq], by(age_alc) or

age in years / alcohol consumption |       OR       [95% Conf. Interval]
-----------------------------------+-----------------------------------
             25-34 / 0-79 gms/day  |          .           .           .
             25-34 / 80+ gms/day   |          0           0           .
             35-44 / 0-79 gms/day  |   .8888889           0    6.180248
             35-44 / 80+ gms/day   |        4.2    .5867284    30.99466
             45-54 / 0-79 gms/day  |   3.916084    1.535386    10.01481
             45-54 / 80+ gms/day   |   1.767857    .5572697    5.599932
             55-64 / 0-79 gms/day  |   2.903704    1.316374    6.419849
             55-64 / 80+ gms/day   |        2.2    .7067541    6.768626
             65-74 / 0-79 gms/day  |   1.689655    .6156516    4.661245
             65-74 / 80+ gms/day   |   6.071429    .8038395           .
               75+ / 0-79 gms/day  |   1.733333    .3163389    10.09754
               75+ / 80+ gms/day   |          .           .           .
-----------------------------------+-----------------------------------
                             Crude |   1.960784    1.387991    2.770272
                      M-H combined |   2.382241    1.591432    3.566017
-----------------------------------+-----------------------------------
Test of homogeneity (M-H)    chi2(9) =    3.738  Pr>chi2 = 0.9278

Test that combined OR = 1:
                 Mantel-Haenszel chi2(1) =     18.47
                                 Pr>chi2 =    0.0000

part(b): The summary odds ratio for cancer, holding age and alcohol consumption constant, is 2.38 (95% CI: [1.59, 3.57]). Within each combination of age group and alcohol category, the odds of cancer are 2.38 times as high for people who smoke 20+ cigarettes per day as for those who smoke fewer than 20 cigarettes per day.

part(c): By looking at the M-H test of homogeneity, we can conclude that the assumption of a common odds ratio is reasonable. The p-value is 0.9278, which provides no evidence that the odds ratios differ across the age-by-alcohol strata.

1. (e) Is there evidence that alcohol and tobacco consumption are associated? After adjustment for age? Why is it best to examine this association using the control population only?

To check whether alcohol and tobacco consumption are associated among the controls (crude analysis):

. cc alcexp tobexp if cc==0 [freq=freq]

                 | tobexp                 |             Proportion
                 |   Exposed   Unexposed  |     Total     Exposed
-----------------+------------------------+----------------------
         Exposed |        23          86  |       109      0.2110
       Unexposed |       127         539  |       666      0.1907
-----------------+------------------------+----------------------
           Total |       150         625  |       775      0.1935

                 |     Point estimate     |  [95% Conf. Interval]
                 |------------------------+----------------------
      Odds ratio |        1.135049        |  .6918794    1.863025  (Cornfield)
                 +------------------------+----------------------
                             chi2(1) =     0.25  Pr>chi2 = 0.6187

Since this is a 2x2 table, the chi-square statistic can be interpreted as a test of association, with H0: no association between alcohol and tobacco consumption and H1: an association exists. The statistic is not significant (chi2(1) = 0.25, p = 0.62), so there is no evidence of an association between alcohol and tobacco consumption in the controls.
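The chi-square statistic in the output can be checked directly from the four cell counts. The following Python sketch assumes the uncorrected Pearson statistic (no continuity correction), which appears to match the value Stata reports here. The function name pearson_chi2_2x2 is an assumption for this example.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value on 1 df."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df
    return chi2, p_value

# Controls only: rows are alcexp (exposed, unexposed), columns are tobexp.
chi2, p_value = pearson_chi2_2x2(23, 86, 127, 539)
print(f"chi2(1) = {chi2:.2f}, p = {p_value:.4f}")
```

The same function applied to the case and noncase counts in 1(a), that is (64, 136, 150, 625), gives a statistic close to the 14.84 reported there.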

To check whether alcohol and tobacco consumption are associated after adjusting for age:

. cc alcexp tobexp if cc==0 [freq=freq], by (age)

    age in years |       OR       [95% Conf. Interval]
-----------------+-----------------------------------
           25-34 |   4.772727    1.269772    17.91537
           35-44 |   .8465608    .3096488    2.330879
           45-54 |   1.370629    .5427856    3.482617
           55-64 |   .9427609     .339913    2.636674
           65-74 |   .4117647           0    2.698141
             75+ |          .           .           .
-----------------+-----------------------------------
           Crude |   1.135049    .6918794    1.863025
    M-H combined |   1.170469    .7067113    1.938553
-----------------+-----------------------------------
Test of homogeneity (M-H)    chi2(4) =     5.47  Pr>chi2 = 0.2424

Test that combined OR = 1:
                 Mantel-Haenszel chi2(1) =      0.37
                                 Pr>chi2 =    0.5410

To test for an association between alcohol and tobacco consumption after adjusting for age, the M-H test of homogeneity is examined first. The test statistic is not significant (p = 0.24), so there is no evidence that the ORs differ across age groups. The M-H test of association is also non-significant (p = 0.54), so there is no evidence that the common age-adjusted OR differs from 1.

In conclusion, there is no evidence of an association between alcohol and tobacco consumption in the control population, with or without adjustment for age.

It is best to examine this association in the control population only because the controls reflect the population to which the results of the study will be applied. Both alcohol and tobacco are related to the disease, so selecting on disease status can distort their relationship: the diseased population is more likely to show an association between alcohol consumption and tobacco consumption.

2. (a) What would be the dependent variable in a logistic regression for the Ille-et-Vilaine data?

Cancer (cc) is the dependent variable.

2. (b) Define (write down the equation for) a logistic regression model that would characterize the unadjusted (crude) odds ratio that was measured in question 1(a).

pi(X) = expit(b0 + b1*tobexp) = [exp(b0 + b1*tobexp)]/[1 + exp(b0 + b1*tobexp)]

or

logit[pi(X)] = b0 + b1*tobexp

where pi(X) = P(cc = 1 | tobexp) and exp(b1) is the crude odds ratio.
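With a single binary covariate the model is saturated, so its maximum likelihood estimates can be written down from the 2x2 table in 1(a) without running a fitting routine. The Python sketch below illustrates this under that assumption. The variable names are chosen for this example, and in a case-control sample the intercept describes the proportion of cases in the sample, not a population risk.

```python
import math

def expit(t):
    """Inverse logit: maps a log odds t to a probability."""
    return 1.0 / (1.0 + math.exp(-t))

# Counts from the 2x2 table in 1(a)
cases_exp, cases_unexp = 64, 136
noncases_exp, noncases_unexp = 150, 625

# Closed-form ML estimates for logit[pi] = b0 + b1*tobexp
b0 = math.log(cases_unexp / noncases_unexp)    # log odds when tobexp = 0
b1 = math.log(cases_exp / noncases_exp) - b0   # log odds ratio for tobexp

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, exp(b1) = {math.exp(b1):.6f}")
print(f"pi(tobexp=1) = {expit(b0 + b1):.4f}, observed = {64 / 214:.4f}")
```

Exponentiating b1 returns the crude odds ratio from the table, which is the sense in which this model characterizes the unadjusted estimate.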

2. (c) Compute and interpret the estimated odds ratio for tobexp, with adjustment for age, and its 95% confidence limit. Compare the point and interval estimates to those obtained in question 1(b). Are they similar? Do they have similar interpretations? Why or why not?

OR = exp(0.83397) = 2.30

95% CI: [exp(0.455), exp(1.212)] = [1.58, 3.36]

The odds ratio and 95% CI agree, to two decimal places, with the point and interval estimates obtained in 1(b). The interpretation is identical: 2.30 is an estimate of the age-specific OR, which is assumed to be constant across age groups.
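The conversion from the logistic regression coefficient to the odds ratio scale is just exponentiation of the estimate and of each confidence limit. A short Python sketch using the numbers quoted in this answer is given below. The implied standard error is back-calculated on the assumption that the interval is a symmetric Wald interval, since the key does not print the regression output.

```python
import math

coef = 0.83397            # log odds ratio for tobexp, as quoted in 2(c)
ci_log = (0.455, 1.212)   # 95% limits on the log odds scale, as quoted in 2(c)

odds_ratio = math.exp(coef)
ci_or = tuple(math.exp(limit) for limit in ci_log)

# Standard error implied by the quoted limits (assumes a symmetric Wald interval)
se_implied = (ci_log[1] - ci_log[0]) / (2 * 1.96)

print(f"OR = {odds_ratio:.2f}, 95% CI = [{ci_or[0]:.2f}, {ci_or[1]:.2f}]")
print(f"implied SE of the coefficient = {se_implied:.4f}")
```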

3. (a) For each of the models fitted above, state the form of the logistic model that was used – stating the dependent variable, the interpretation of the probability pi(X), and the model for pi(X) in terms of the (unknown) population parameters and the independent variables.

The dependent variable is CVD mortality (1=death from CVD, 0 otherwise).

pi(X) is the probability of dying from Cardiovascular Disease within ten

years for a group of subjects with covariate values X.

The hypothesized models are:

*Model 1:

pi(X) = ( exp( b0 + b1*SOC + b2*SBP + b3*SMK + b4*SOC*SBP +
b5*SOC*SMK) ) / ( 1 + ( exp( b0 + b1*SOC + b2*SBP +
b3*SMK + b4*SOC*SBP + b5*SOC*SMK) ))

*Model 2:

pi(X) = ( exp( b0 + b1*SOC + b2*SBP + b3*SMK) ) /

(1 + ( exp( b0 + b1*SOC + b2*SBP + b3*SMK) ))

The fitted models are:

*Model 1:

pi(X) = ( exp(-1.180 - .520*SOC + .040*SBP - .560*SMK - .033*SOC*SBP +

.175*SOC*SMK) ) / ( 1 + ( exp(-1.180 - .520*SOC + .040*SBP -

.560*SMK - .033*SOC*SBP + .175*SOC*SMK) ))

*Model 2:

pi(X) = ( exp(-1.19 - .500*SOC + .010*SBP - .420*SMK) ) /

(1 + ( exp(-1.19 - .500*SOC + .010*SBP - .420*SMK) ))

3. (b) For each of the models in (a) state the form of the estimated log odds

functions: logit[pi(X)]= …

Model 1:

logit[pi(X)] = -1.180 - .520*SOC + .040*SBP - .560*SMK - .033*SOC*SBP

+ .175*SOC*SMK

Model 2:

logit[pi(X)] = -1.19 - .500*SOC + .010*SBP - .420*SMK

3. (c) Using model 1, compute the estimated risk for CVD death (i.e. CVD=1) for a high social class (SOC=1) smoker (SMK=1) with SBP=150 (person 1), and a low social class (SOC=0) smoker (SMK=1) with SBP=150 (person 2). What is the estimated relative risk comparing these individuals?

High Social Class:

pi(X) = ( exp(-1.180 - .520*1 + .040*150 - .560*1 - .033*1*150 +

.175*1*1) ) / ( 1 + ( exp(-1.180 - .520*1 + .040*150 - .560*1 -

.033*1*150 + .175*1*1) ))

= exp(-1.035) / (1 + exp(-1.035))

= 0.262

Low Social Class:

pi(X) = exp(-1.180 + .040*150 -.56*1) / (1 + exp(-1.180 + .040*150 -.56*1))

= exp(4.26) / (1 + exp(4.26))

= .986

RR = .262 / .986 = .2657

Smokers with an SBP of 150 in the high social class have an estimated .2657 times the risk of CVD mortality of smokers with an SBP of 150 in the low social class.
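The hand calculation can be verified with a few lines of Python. This is a sketch that plugs the fitted model 1 coefficients from 3(a) into the logistic function. The function name risk_model1 is an assumption for this example. Because it carries full precision instead of the rounded risks .262 and .986, the ratio may differ from the hand calculation in the fourth decimal place.

```python
import math

def risk_model1(soc, sbp, smk):
    """Estimated CVD death risk under fitted model 1 (coefficients from 3(a))."""
    logit = (-1.180 - 0.520 * soc + 0.040 * sbp - 0.560 * smk
             - 0.033 * soc * sbp + 0.175 * soc * smk)
    return 1.0 / (1.0 + math.exp(-logit))

p_high = risk_model1(soc=1, sbp=150, smk=1)   # person 1
p_low = risk_model1(soc=0, sbp=150, smk=1)    # person 2
rr = p_high / p_low
print(f"risk high = {p_high:.3f}, risk low = {p_low:.3f}, RR = {rr:.3f}")
```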

3. (d) Repeat parts (c) using model 2. Why is the estimate so different?

High Social Class:

pi(X) = exp(-1.19 - .5*1 + .01*150 - .42) / (1 + exp(-1.19 - .5*1 +

.01*150 - .42) )

= exp(-.61) / ( 1 + exp(-.61))

= .352

Low Social Class:

pi(X) = exp(-1.19 + .01*150 - .42) / (1 + exp(-1.19 + .01*150 - .42))

= exp(-.11) / (1 + exp(-.11))

= .4725

RR = .352 / .4725 = .745

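Smokers with an SBP of 150 in the high social class have an estimated .745 times the risk of CVD mortality of smokers with an SBP of 150 in the low social class.

The estimates differ because of the interaction terms in model 1. In model 1 the effect of SOC on the log odds depends on SBP and SMK: for a smoker with SBP=150 it is -.520 - .033*150 + .175 = -5.295, compared with a constant -.500 in model 2.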
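To see where the two models part ways, it helps to compare the difference in log odds between SOC=1 and SOC=0 for the same smoker with SBP=150. The Python sketch below does this with the fitted coefficients from 3(a). The function and dictionary names are assumptions made for this example.

```python
import math

def logit_model1(soc, sbp, smk):
    return (-1.180 - 0.520 * soc + 0.040 * sbp - 0.560 * smk
            - 0.033 * soc * sbp + 0.175 * soc * smk)

def logit_model2(soc, sbp, smk):
    return -1.19 - 0.500 * soc + 0.010 * sbp - 0.420 * smk

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

results = {}
for name, logit in (("model 1", logit_model1), ("model 2", logit_model2)):
    high = logit(1, 150, 1)   # SOC=1 smoker, SBP=150
    low = logit(0, 150, 1)    # SOC=0 smoker, SBP=150
    results[name] = {
        "logit_diff": high - low,
        "rr": expit(high) / expit(low),
    }
    print(name, round(high - low, 3), round(expit(high) / expit(low), 3))
```

In model 1 the SOC*SBP term alone contributes -.033*150 = -4.95 to the log odds difference, which dominates the comparison at this blood pressure.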

3. (e) What is the estimated odds ratio comparing SOC=1 to SOC=0 for non-smokers (SMK=0) with SBP=150 under model 1 and under model 2?

Model 1:

OR = exp(-.52 - .033(150)) = exp(-5.47) = .0042

Model 2:

OR = exp(-.5) = .607
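Under model 1 the odds ratio for SOC is a function of SBP and SMK, while under model 2 it is a single number. A small Python sketch of both calculations, using the fitted coefficients from 3(a), is shown below. The function names are assumptions for this example.

```python
import math

def or_soc_model1(sbp, smk):
    """OR for SOC=1 vs SOC=0 under model 1: exp(b1 + b4*SBP + b5*SMK)."""
    return math.exp(-0.520 - 0.033 * sbp + 0.175 * smk)

def or_soc_model2():
    """OR for SOC=1 vs SOC=0 under model 2: exp(b1), the same for every SBP and SMK."""
    return math.exp(-0.500)

or1 = or_soc_model1(sbp=150, smk=0)
or2 = or_soc_model2()
print(f"model 1: {or1:.4f}, model 2: {or2:.3f}")
```

Calling or_soc_model1 with other SBP values shows how strongly the model 1 estimate depends on blood pressure, which is the practical meaning of the SOC*SBP interaction.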

3. (f) If the study design had been a case-control study (retrospective), which risk estimate would you report (RR or OR)? Justify.

The OR. From a case-control study we cannot estimate the disease relative risk (comparing exposed to unexposed), because the numbers of cases and controls are fixed by the design. However, from this study design we can estimate the exposure odds ratio, which equals the disease odds ratio. For rare diseases the OR approximates the RR.
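The rare-disease approximation can be illustrated numerically. The Python sketch below compares the OR and RR for the model 2 risks computed in 3(d), where the outcome is common, and for a pair of small risks. The small risks (0.002 and 0.001) are hypothetical values chosen only to show the approximation and do not come from the study.

```python
def relative_risk(p1, p0):
    return p1 / p0

def odds_ratio(p1, p0):
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Common outcome: the model 2 risks from 3(d)
rr_common = relative_risk(0.352, 0.4725)
or_common = odds_ratio(0.352, 0.4725)

# Rare outcome: HYPOTHETICAL risks, for illustration only
rr_rare = relative_risk(0.002, 0.001)
or_rare = odds_ratio(0.002, 0.001)

print(round(rr_common, 3), round(or_common, 3))
print(round(rr_rare, 3), round(or_rare, 3))
```

When the risks are large the OR lies further from 1 than the RR, so the two should not be used interchangeably for a common outcome such as this one.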
