Notes on Homework 3 - University of Michigan




Your results for this homework may differ from those shown here, because of differences in how the raw data were entered.

Question 1: Recode variables.

Recode HRS_EXERCISE, HRTRATE1 and HRTRATE2 into the new variables ACTIVITY, HI1 and HI2:

• All recodes need to be included in a data step.

• The if ... then do; statements and the matching end; statements ensure that each recode applies only to those cases that have nonmissing data for the variable being recoded.

• The new variables are automatically created in the process of the recoding. You do not need to create them explicitly prior to the recoding.
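The reason for the nonmissing check can be seen in a small sketch. The data set names demo and demo2, the three data lines, and the variable bad_hi1 are made up for illustration and are not part of the homework data. SAS treats a numeric missing value as smaller than any number, so an unguarded if/else sends missing cases into the else branch.

```sas
/* Hypothetical three-row data set: case 3 has a missing heart rate */
data demo;
   input id hrtrate1;
   datalines;
1 90
2 70
3 .
;
run;

data demo2;
   set demo;

   /* Unguarded recode: missing compares as less than 85, so the
      ELSE branch assigns 2 to the case with missing hrtrate1 */
   if hrtrate1 >= 85 then bad_hi1 = 1;
   else bad_hi1 = 2;

   /* Guarded recode: hi1 is left missing when hrtrate1 is missing */
   if hrtrate1 not=. then do;
      if hrtrate1 >= 85 then hi1 = 1;
      else hi1 = 2;
   end;
run;

proc print data=demo2;
run;
```

In the printed output, case 3 should show bad_hi1 = 2 but a missing value for hi1, which is the behavior the guarded recodes in this homework rely on.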

Recode of GENDER (not required): GENDER was recoded to make all values consistent, since some values in my original Excel file were lower case values and others were upper case. Because GENDER was a character variable in my Excel file, the values are all enclosed in quotes.

Use of Quotes: Be sure to use quotes for the values of character variables when doing recodes. You should *not* use quotes when recoding numeric variables.

libname b510 v9 "e:\510\homework";

data bine2;

set bine;

/*This is the ID for the baby.

Delete this case from the data set*/

if id = 59 then delete;

if hrs_exercise not=. then do;

if hrs_exercise <2 then activity=1;

if hrs_exercise >=2 and hrs_exercise <=4 then activity=2;

if hrs_exercise >4 then activity=3;

end;

if hrtrate1 not=. then do;

if hrtrate1 >=85 then hi1=1;

else hi1 = 2;

end;

if hrtrate2 not=. then do;

if hrtrate2 >=85 then hi2=1;

else hi2 = 2;

end;

if ran = 1 then rran=1;

if ran = 0 then rran=2;

/*****************************

Recode Gender. Not required.

******************************/

if gender="m" then gender="M";

if gender="f" then gender="F";

run;
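As an alternative to one if statement per value, the GENDER recode could probably be done with the UPCASE function. This is only a sketch, assuming the only inconsistency in GENDER is upper versus lower case; the output data set name bine3 is hypothetical.

```sas
/* Sketch: standardize case with UPCASE instead of one IF per value.
   bine3 is a hypothetical name; bine2 is the data set created above. */
data bine3;
   set bine2;
   gender = upcase(gender);
run;

/* Check that only upper case values remain */
proc freq data=bine3;
   tables gender;
run;
```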

Question 2. Create formats for new variables. Note: you can use the same format for HI1 and HI2, although you can create separate formats for these two variables. You do not need to create a format for GENDER, although the method for doing so is shown below. Note the inclusion of the $ at the start of the format name $GENFMT and the use of quotes in the value statement.

proc format;

value actfmt 1="1: <2 hours"

2="2: 2-4 hours"

3="3: >4 hours";

value hifmt 1="1:High"

2="2:Not High";

value rranfmt 1="1:Yes"

2="2:No";

value $genfmt "M" = "Male"

"F" = "Female";

run;

Question 3: Check recoding of new variables.

proc means data=bine2;

title "Question 3: Proc Means for the Entire Data Set";

run;

This first Proc Means run is to check the sample sizes for all of the new variables. The n should match for the new variable and the original variable. If it does not, the syntax for the recode needs to be corrected, to be sure that missing values are not incorrectly being recoded into a value for the new variable.
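One way to make this comparison easier is to request only the statistics needed and list each original variable next to its recoded version. This is a sketch rather than required syntax; the title text is made up.

```sas
/* N and NMISS should agree within each original/recoded pair:
   hrs_exercise/activity, hrtrate1/hi1, hrtrate2/hi2, ran/rran */
proc means data=bine2 n nmiss min max;
   var hrs_exercise activity hrtrate1 hi1 hrtrate2 hi2 ran rran;
   title "Sketch: N and NMISS for Original and Recoded Variables";
run;
```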

The individual runs of Proc Means below are used to check that the minimum and maximum values for the new variables match the values that were desired.

proc means data=bine2;

class activity;

var hrs_exercise;

title "Question 3: Proc Means to Check ACTIVITY";

run;

proc means data=bine2;

class hi1;

var hrtrate1;

title "Question 3: Proc Means to Check HI1";

run;

proc means data=bine2;

class hi2;

var hrtrate2;

title "Question 3: Proc Means to Check HI2";

run;

proc means data=bine2;

class rran;

var ran;

title "Question 3: Proc Means to Check RRAN";

run;

A better way to check the recoding of RAN into RRAN is shown below. This uses Proc Freq and allows us to be sure that the value of 1 for RAN was recoded into a value of 1 for RRAN and that the value of 0 for RAN was recoded into 2 for RRAN. Either way is OK.

proc freq data=bine2;

tables ran*rran/ list;

title "Question 3: Proc Freq to Check RRAN";

run;

Question 4: Tabulate new variables.

Notice in the syntax below that both HI1 and HI2 use the same format, HIFMT. You can do this to save typing, if more than one variable has the same format. It is not required for this problem.

proc freq data=bine2;

tables ran rran activity hi1 hi2 ;

format activity actfmt.

hi1 hi2 hifmt. rran rranfmt.;

title "Question 4: Tabulation of New Variables";

run;

To get a tabulation of GENDER, using the format $GENFMT., you could use syntax like that shown below. Again, notice the use of the $ in the name of this format, because it applies to a character variable.

proc freq data=bine2;

tables hi2 gender;

format gender $genfmt.;

run;

Question 5: Crosstabs of GENDER by HI1 and ACTIVITY by HI1.

GENDER by HI1:

Here, we are testing the null hypothesis that there is no relationship between GENDER and an elevated heart rate at time 1 (HI1). The alternative hypothesis is that these variables are associated. We only need to use the chisq option for this question. The CMH option is not helpful here.

proc freq data=bine2;

tables gender*hi1 / chisq ;

format hi1 hifmt.;

title "Question 5: Crosstab of GENDER by HI1"; run;

Table of Gender by hi1

Gender(Gender)     hi1

Frequency|
Percent  |
Row Pct  |
Col Pct  |1:High  |2:Not Hi|  Total
         |        |gh      |
---------+--------+--------+
F        |     12 |     49 |     61
         |  15.19 |  62.03 |  77.22
         |  19.67 |  80.33 |
         |  75.00 |  77.78 |
---------+--------+--------+
M        |      4 |     14 |     18
         |   5.06 |  17.72 |  22.78
         |  22.22 |  77.78 |
         |  25.00 |  22.22 |
---------+--------+--------+
Total          16       63       79
            20.25    79.75   100.00

There were 61 females in this data set; 12/61 females, or 19.67%, had a high heart rate at time 1. There were 18 males, and 4/18 males, or 22.22%, had a high heart rate at time 1. We would not reject the null hypothesis that gender is independent of high heart rate at time 1 (Pearson chi-square with 1 df = 0.0560, p=0.8130). We conclude that there is no evidence to suggest an association between gender and having a high heart rate at time 1.
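As a check on the Pearson chi-square reported above, the statistic for a 2 x 2 table can be worked by hand from the cell counts. The letters a, b, c, d (cells read across the rows) and n (total sample size) are notation introduced here for illustration; they do not appear in the SAS output.

```latex
\chi^2 = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
       = \frac{79\,(12 \cdot 14 - 49 \cdot 4)^2}{61 \cdot 18 \cdot 16 \cdot 63}
       = \frac{79 \cdot 784}{1{,}106{,}784}
       \approx 0.0560
```

This agrees with the value of 0.0560 quoted from the Proc Freq output.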

ACTIVITY by HI1:

Here, we are testing the null hypothesis that there is no relationship between ACTIVITY level and an elevated heart rate at time 1 (HI1). The alternative hypothesis is that these variables are associated. We need to use the chisq option, plus the trend option for this question. The Cochran-Armitage test for trend is testing whether there is a linear trend in the proportion of those who have high heart rate with increasing levels of activity. It is appropriate when one of the variables (either the row or column variable) is ordinal and the other variable is binary (i.e. has two levels). This test is appropriate here since ACTIVITY is ordinal and HI1 has 2 levels. The trend could be either positive or negative. The CMH option is not necessary or particularly helpful here.

proc freq data=bine2;

tables activity*hi1 / chisq trend;

format activity actfmt. hi1 hifmt.;

title "Question 5: Crosstab of ACTIVITY by HI1, with TREND Test";

run;

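Table of activity by hi1

activity     hi1

Frequency    |
Percent      |
Row Pct      |
Col Pct      |1:High  |2:Not Hi|  Total
             |        |gh      |
-------------+--------+--------+
1: <2 hours  |      2 |     11 |     13
             |   2.53 |  13.92 |  16.46
             |  15.38 |  84.62 |
             |  12.50 |  17.46 |
-------------+--------+--------+
2: 2-4 hours |     10 |     34 |     44
             |  12.66 |  43.04 |  55.70
             |  22.73 |  77.27 |
             |  62.50 |  53.97 |
-------------+--------+--------+
3: >4 hours  |      4 |     18 |     22
             |   5.06 |  22.78 |  27.85
             |  18.18 |  81.82 |
             |  25.00 |  28.57 |
-------------+--------+--------+
Total              16       63       79
                20.25    79.75   100.00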

Cochran-Armitage Trend Test
---------------------------
Statistic (Z)          0.0757
One-sided Pr >  Z      0.4698
Two-sided Pr > |Z|     0.9397

There is no evidence of an association between activity level and whether one has an elevated heart rate at time 1, based on the Pearson chi-square test (chi-square with 2 df = 0.4160, p=0.8122). The Cochran-Armitage test for trend is also not significant (Z=0.0757, p=0.9397). Since this is a Z-test, it does not have degrees of freedom associated with it, and we use the two-sided p-value to be conservative. We conclude that there is no evidence of a trend in the proportion of students who have an elevated heart rate with increasing levels of activity per week. Note that the percentage of students with elevated heart rate for each of the activity levels does not show a consistently increasing or decreasing trend:

Activity level 1: 2/13 or 15.38%

Activity level 2: 10/44 or 22.73%

Activity level 3: 4/22 or 18.18%

The chart below (made in Excel) shows the lack of a trend in these percentages.

[pic]

Question 6: Cross-tabulation of RRAN by HI2.

Since we want to get the relative risk and the odds ratio, it is important that we have the row variable and column variable set up correctly. The row variable should be RRAN (which we can think of as the risk factor). The column variable should be HI2, which we can think of as the outcome. Since we have coded both of these variables so that 1=Yes and 2=No, the table is set up correctly.
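The syntax for this question is not reproduced in these notes. A minimal sketch that would produce a table laid out this way, assuming the formats from Question 2 have been created, uses the chisq and relrisk options of Proc Freq; the title text is made up.

```sas
/* Row variable = risk factor (rran), column variable = outcome (hi2).
   RELRISK requests the odds ratio and relative risks for a 2 x 2 table. */
proc freq data=bine2;
   tables rran*hi2 / chisq relrisk;
   format rran rranfmt. hi2 hifmt.;
   title "Question 6: Crosstab of RRAN by HI2";
run;
```

With this layout, the Column 1 relative risk compares the risk of a high heart rate (HI2=1) for those who ran with the risk for those who did not.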

Table of rran by hi2

rran      hi2

Frequency|
Percent  |
Row Pct  |
Col Pct  |1:High  |2:Not Hi|  Total
         |        |gh      |
---------+--------+--------+
1:Yes    |     23 |      7 |     30
         |  35.38 |  10.77 |  46.15
         |  76.67 |  23.33 |
         |  74.19 |  20.59 |
---------+--------+--------+
2:No     |      8 |     27 |     35
         |  12.31 |  41.54 |  53.85
         |  22.86 |  77.14 |
         |  25.81 |  79.41 |
---------+--------+--------+
Total          31       34       65
            47.69    52.31   100.00

Frequency Missing = 14
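The relative risk and odds ratio can also be worked by hand from the cell counts in the table above. The values below are hand calculations from those counts, not figures copied from SAS output; RR and OR are shorthand introduced here.

```latex
\mathrm{RR} = \frac{23/30}{8/35} = \frac{23 \cdot 35}{30 \cdot 8} = \frac{805}{240} \approx 3.35
\qquad
\mathrm{OR} = \frac{23 \cdot 27}{7 \cdot 8} = \frac{621}{56} \approx 11.09
```

Under this calculation, those who ran had roughly 3.35 times the risk, and about 11 times the odds, of a high heart rate at time 2 compared with those who did not run.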

The percentage of those who ran who had high heart rate at time 2 is 76.67% (or 23/30*100). The percentage of those who did not run who had high heart rate at time 2 is 22.86% (or 8/35*100). There was a highly significant association between running in place and having elevated heart rate at time 2 (Pearson chi-square with 1 df = 18.7491, p < 0.0001).

...

Pr > ChiSq    0.6195

The Breslow-Day test statistic is a chi-square, under the null hypothesis, with 1 degree of freedom. The null hypothesis here is that the odds ratio for males is equal to the odds ratio for females. I would not reject the null (chi-square = 0.2465, p=0.6195) and conclude that the odds ratios for males and females are not significantly different.
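The syntax that produced the Breslow-Day test is not shown in these notes. One way to request it in Proc Freq is a three-way table with the cmh option, which for stratified 2 x 2 tables also reports the Breslow-Day test for homogeneity of the odds ratios. The sketch below assumes GENDER is the stratification variable, as the paragraph above implies; the title text is made up.

```sas
/* The first variable in a three-way table request is the stratum:
   one RRAN by HI2 table is produced for each level of GENDER */
proc freq data=bine2;
   tables gender*rran*hi2 / cmh;
   format rran rranfmt. hi2 hifmt.;
   title "Sketch: RRAN by HI2, Stratified by GENDER";
run;
```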
