Biostat 510 - University of Michigan



Biostat 510: Statistical Computing Packages

SAS Homework 4

Due Tuesday, Oct 21, 2003

This homework assignment uses data collected by Afifi and Azen at the Los Angeles County Hospital in a study of shock, a condition manifested by low blood pressure and other symptoms. Shock is very serious and potentially fatal, and it can arise in a number of different ways. The data set lists 6 categories of shock type: 5 types of shock and one non-shock category. Each shock type has a different etiology. Measurements of several physiological variables were made on 113 patients at two time points. The first set of measurements was made when each patient was admitted to the emergency department. For patients who were discharged, the second set was the last one made prior to discharge; for patients who died, it was the last one made prior to death. In this homework we will use logistic regression to examine factors measured at time 1 that are associated with death in the emergency room.

Note: Throughout the homework, model the probability that Died=1. Whenever you report a statistical test, give the test statistic, its degrees of freedom, and the p-value. Use alpha = 0.05 to determine the significance of statistical tests. Whenever r-square is called for, use the maximum rescaled r-square.

1. Read in the raw data from the AFIFI.DAT file. It is one of the data sets contained in the data.exe archive on my web page. There is a codebook for this data set in Appendix III of your course-pack, giving the variable names, their column locations, and the missing value codes. Be sure to read in all variables, and include decimal points in your code where necessary. (The example code for reading in the data shows how this is done.) Each case occupies two lines of raw data, so read in the data so that 113 cases are read from the 226 lines.

a. Create a temporary data set called AFIFI, by reading in all of the variables, and set up decimals and missing values correctly.
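As a rough illustration of the mechanics only, the sketch below shows how two raw lines can be read into one case and how an implied decimal point is written in column input. The column ranges, the names idnum, sbp2 and hgb2, the file path, and the missing-value code 999 are all placeholders invented for this sketch, not the real AFIFI.DAT layout; the real values come from the codebook.

```sas
/* Sketch only: every column range, the ".1" decimal spec, the file path
   and the missing code 999 are hypothetical placeholders. Replace them
   with the values given in the codebook. */
data afifi;
  infile "afifi.dat";                    /* path is an assumption */
  input #1 idnum 1-4  age 5-8  sbp1 9-12  hgb1 13-16 .1
        #2 sbp2 9-12  hgb2 13-16 .1;     /* #2 moves to the second line of the case */

  /* convert the (hypothetical) missing-value code to SAS missing */
  if sbp1 = 999 then sbp1 = .;
  if sbp2 = 999 then sbp2 = .;
run;
```

Because each pass through the input statement consumes two records, 226 raw lines yield 113 observations. A spec such as .1 after a column range inserts one implied decimal place when the raw field contains no decimal point.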

b. Create new variables.

i. Create a new variable called Died that has a value of 1 if the patient died and a value of 0 if the patient lived.

ii. Create a new variable called Shock that has a value of 1 for those categories of Shoktype that represent Shock and has a value of 0 for the category that represents No Shock.

iii. Recode SBP1 into a new variable called Sbpcat, using the recodes as shown below:

Level of Sbpcat

1: SBP1 < 70

2: SBP1 from 70 to 89

3: SBP1 from 90 to 109

4: SBP1 from 110 to 129

5: SBP1 >=130

Be sure that patients who were missing for SBP1 are also missing for Sbpcat.
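One possible way to build the three new variables is sketched below. It assumes that AFIFI has already been read in, that the survival variable is named survive with 1 = lived and 3 = died, and that Shoktype = 2 is the non-shock category. These names and codes are assumptions for illustration and should be checked against the codebook.

```sas
/* Sketch: SURVIVE (1 = lived, 3 = died) and Shoktype = 2 as non-shock
   are assumed codings; verify them in the codebook before use. */
data afifi;
  set afifi;

  /* Died stays missing if the survival code is missing */
  if survive = 3 then died = 1;
  else if survive = 1 then died = 0;

  /* Shock stays missing if Shoktype is missing */
  if shoktype = 2 then shock = 0;
  else if shoktype > 2 then shock = 1;

  /* SAS treats a missing value as smaller than any number, so the
     missing check must come before the "< 70" comparison */
  if sbp1 = . then sbpcat = .;
  else if sbp1 < 70  then sbpcat = 1;
  else if sbp1 < 90  then sbpcat = 2;
  else if sbp1 < 110 then sbpcat = 3;
  else if sbp1 < 130 then sbpcat = 4;
  else sbpcat = 5;
run;
```

Testing for missing first matters: without it, a patient with missing SBP1 would satisfy sbp1 < 70 and be placed in level 1.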

c. Get descriptive statistics for all numeric variables. Get frequencies for all categorical variables in your data set. Comment in your write-up about the values of these variables, the number of cases for each variable, etc.

d. Get a proc contents for the data set. Comment in your write-up about the features of this data set.
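Parts c and d could be carried out along the following lines. This is a minimal sketch that assumes the data set is named AFIFI and that Sex, Shoktype, Died, Shock and Sbpcat are the categorical variables; adjust the tables list to match the variables you actually created.

```sas
/* Descriptive statistics for every numeric variable (the default when
   no VAR statement is given); NMISS shows how many cases are missing */
proc means data=afifi n nmiss mean std min max;
run;

/* One-way frequencies for the categorical variables (list is an assumption) */
proc freq data=afifi;
  tables sex shoktype died shock sbpcat;
run;

/* Variable names, types, lengths and the number of observations */
proc contents data=afifi;
run;
```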

2. Comparison of a Crosstab (2 X 2 table) vs. Logistic Regression:

a. Create formats for the variables Shock and Died, using proc format. Set up these formats as shown below, so that the value 1 will be alphabetically first, based on the formatted value, and the value 0 will be alphabetically second.

proc format;
  value shockfmt 1 = "A: Shock"
                 0 = "B: No Shock";
  value diedfmt  1 = "A: Died"
                 0 = "B: Lived";
run;

b. Get a crosstab of Shock as the Risk factor and Died as the Outcome.

i. Use the formats that you created for these variables. In order to make the values come out in the desired order, use syntax something like that shown below:

proc freq order = formatted ;

ii. What percent of patients in Shock died? What percent of patients not in Shock died?

iii. Test whether there is an association between being in shock and dying. What do you conclude from this test?

iv. Get the odds ratio (OR) and relative risk for this table. What are the OR and its 95% confidence interval? How does this result support the statistical test that you carried out?
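A crosstab of this kind might be requested as sketched below, assuming the data set is named AFIFI and the formats shockfmt and diedfmt from part a have already been created. Shock is placed in the rows and Died in the columns so that the row percents answer part ii.

```sas
/* Sketch: assumes data set AFIFI and the formats from part a exist */
proc freq data=afifi order=formatted;
  tables shock*died / chisq relrisk nocol nopercent;
  format shock shockfmt. died diedfmt.;
run;
```

The chisq option prints the Pearson and likelihood ratio chi-square tests, and relrisk prints the odds ratio and relative risks with their confidence limits. With order=formatted, the "A:" levels come first, so the table is laid out with Shock and Died in the first row and first column.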

c. Run a logistic regression using Shock to predict Died. Remember, even though Shock is categorical, you do not need to include it in a class statement, because it is a single dummy variable (0, 1 coding). Be sure you are modeling the probability that Died=1. Do not use any formats for the Logistic Procedure!

proc logistic descending;
  model died = shock / rsquare;
run;

i. What is the value of the parameter estimate for Shock? Please interpret this parameter estimate.

ii. Is Shock a significant predictor of dying? Give a statistical test, its degrees of freedom, and its p-value.

iii. What is the OR from this analysis, and what is its 95% confidence interval? Compare this to the OR results from Proc Freq in part b.

iv. Compare the Likelihood Ratio chi-square test for this model to the Likelihood Ratio Chi-square test from Proc Freq in part b. What are their values, degrees of freedom and p-values?

v. Compare the Score test from this model with the Pearson chi-square test from Proc Freq in part b. What are their values, degrees of freedom and p-values?

vi. What is the value of the (maximum rescaled) r-square for this model?

vii. What do you conclude about the relationship between Shock and Dying from this analysis?

3. Comparison of a 6 X 2 crosstab with a Logistic Regression using a class variable.

a. Get a crosstab using Shoktype as the Risk factor and Died as the Outcome. No formats are necessary for this cross-tabulation.

i. What percent of patients in each level of Shoktype died?

ii. Test whether there is an association between the Shoktype and Died. What do you conclude based on this test?

b. Carry out a logistic regression with Shock Type as the predictor and Died as the outcome, using Proc Logistic.

i. Be sure to use a class statement for Shoktype. Use the first level of Shoktype (non-shock) as the reference category.

ii. What is the overall p-value for this model (use the likelihood ratio chi-square test)? What is the maximum rescaled r-square for this model?

iii. Create a table with the parameter estimates, their standard errors, the Odds Ratios and their 95% Confidence intervals, the chi-square tests and the p-values for each parameter. Interpret this table in terms of the signs of the parameter estimates, the Odds Ratios and the p-values.

iv. Compare the Likelihood ratio test from this model to that from part a above. Compare the Score test from this model to the Pearson chi-square from part a.

v. What do you conclude about the relationship between being in the different types of shock and dying from this model?

vi. Compare the logistic regression model from question 2 to the one for this question. Which model would you prefer to use? Why?
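The two analyses in this question could be set up roughly as follows, assuming the data set is named AFIFI. The param=ref option is a choice made for this sketch rather than something the assignment specifies: by default Proc Logistic uses effect coding for class variables, whereas reference coding makes each parameter estimate the log odds ratio for that level versus the reference level.

```sas
/* 6 x 2 crosstab with chi-square tests */
proc freq data=afifi;
  tables shoktype*died / chisq nocol nopercent;
run;

/* Logistic regression with the first (lowest-valued) level of Shoktype
   as the reference; param=ref is this sketch's choice of coding */
proc logistic data=afifi descending;
  class shoktype (ref=first) / param=ref;
  model died = shoktype / rsquare;
run;
```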

4. Logistic regression using a continuous predictor.

a. Run a logistic regression, with Died as the dependent variable and systolic blood pressure at time 1 (Sbp1) as the predictor.

b. What is the overall model significance? What is the model r-square?

c. Interpret the parameter estimate and the OR for Sbp1. What does the negative coefficient mean?

d. Is Systolic blood pressure at entry into the emergency room a significant predictor of mortality? What do you conclude about the relationship between Sbp at time 1 and mortality?

e. Compare this model to the logistic regression model in question 2. Is Sbp1 a better predictor of mortality than Shock? Why or why not?
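A minimal sketch of this model is shown below, assuming the data set is named AFIFI. The units statement is an optional extra added here for illustration: it requests an odds ratio for a 10-unit change in Sbp1, which is often easier to interpret than the odds ratio for a 1-unit change.

```sas
proc logistic data=afifi descending;
  model died = sbp1 / rsquare;
  units sbp1 = 10;   /* optional: OR per 10-unit increase in Sbp1 */
run;
```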

5. Ordinal Categorical variable as a predictor in Logistic Regression.

a. Run a logistic regression with Sbpcat as the predictor and Died as the outcome.

b. Use a class statement for Sbpcat, using the first level of Sbpcat (the lowest level of Sbp) as the reference.

c. What is the overall significance of this model? The model r-square?

d. What do the parameter estimates suggest about the relationship of the log odds of dying to increasing levels of Sbp at time 1?

e. Make a small graph (by hand is OK, but include in your write-up) of the parameter estimates for each level of Sbpcat. Make sure you include the parameter value for the reference category (it will always be zero!). What do you conclude about the relationship between sbpcat and the log odds of dying from this graph? Do you think that there is a linear relationship between the levels of Sbp at time 1 and the log odds of dying?

f. Does this graph suggest any other things you might want to investigate?

g. Rerun this same model, but now change the Class statement, so that the reference category is the last category of Sbpcat. Compare the overall model significance and model r-square from this setup to the earlier model for this question.

h. Interpret the parameter estimates from this model. What are they now comparing?
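The two versions of this model might look like the sketch below, assuming the data set is named AFIFI and Sbpcat was created as in question 1. As before, param=ref is a choice made for this sketch so that each estimate is a comparison with the reference level.

```sas
/* Reference = lowest level of Sbpcat */
proc logistic data=afifi descending;
  class sbpcat (ref=first) / param=ref;
  model died = sbpcat / rsquare;
run;

/* Same model, reference = highest level of Sbpcat */
proc logistic data=afifi descending;
  class sbpcat (ref=last) / param=ref;
  model died = sbpcat / rsquare;
run;
```

Changing the reference level changes what each individual parameter compares, but it does not change the fitted probabilities for the five groups.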

6. Run the same logistic regression using Proc Genmod and compare output.

a. Run the same logistic regression model as in question 5 above. Note that the class statement in Proc Genmod is somewhat different from the one in Proc Logistic.

proc genmod descending;
  class sbpcat;
  model died = sbpcat / dist=bin type3;
run;

b. Compare the output from this model to that from Proc Logistic in question 5. What are the similarities? What are the differences? Compare especially the type III test from Proc Genmod to that from Proc Logistic. Note that the type III tests from Proc Genmod are based on a likelihood ratio chi-square test, while those from Proc Logistic are based on a Wald test. The likelihood ratio test is preferred.

c. Also, note the reference category for the class variable in Proc Genmod.

7. Logistic regression using several variables, both continuous and categorical.

a. Run a logistic regression with Age and Sex as predictors, to predict Died. Discuss the results from this model.

i. What is the overall model significance?

ii. What are the parameter estimates for each predictor? Please interpret them.

b. Run another logistic regression with Age, Sex, Shock, and Mean Arterial Pressure at time 1 (MAP1) as the predictors.

i. What is the overall model significance?

ii. What are the parameter estimates for each predictor? Please interpret them. How do the parameter estimates and the significance levels of the variables Age and Sex in this model compare to those in the earlier model?
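The two models in this question could be fitted as sketched below. The sketch assumes the data set is named AFIFI and that Sex is a two-level numeric variable that can be entered without a class statement; check its coding in the codebook before interpreting its estimate.

```sas
/* Model a: Age and Sex only (Sex assumed to be a two-level numeric code) */
proc logistic data=afifi descending;
  model died = age sex / rsquare;
run;

/* Model b: add Shock and MAP1 */
proc logistic data=afifi descending;
  model died = age sex shock map1 / rsquare;
run;
```

Proc Logistic drops any observation that is missing a value for a variable in the model, so it is worth checking the number of observations used in each run before comparing the two sets of estimates.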

8. Run a stepwise selection procedure for a logistic regression.

a. Carry out a stepwise logistic regression, in which all the physiological measurements made at time 1 are potential predictors in the model.

proc logistic descending;
  model died = age sex sbp1 urine1 bsa1 map1 heart1
               dbp1 cvp1 cardiac1 apptime1 circ1 plasma1 redcell1
               hgb1 hct1
        / selection = stepwise details rsquare;
run;

b. Note: the details option will give you information on each step of the stepwise logistic regression, but you only need to include output from the last step.

c. Put a table in your output reporting the final model that was selected by SAS, using the default selection criteria. Give all the variables included in the final model, their parameter estimates, standard errors, chi-square tests, odds ratios and p-values.

d. What is the overall significance of this final model? What is its r-square?

9. Run another logistic regression with automated variable selection, in which all the physiological measurements made at time 1 are potential predictors in the model, this time using Selection = backward.

a. Put a table in your output reporting the final model that was selected using this method. Give all the variables included in the final model, their parameter estimates, standard errors, chi-square tests, odds ratios and p-values.

b. What is the overall significance of this final model? What is its r-square?

c. Why do you think this model is so different from that chosen in question 8 above?
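The backward-elimination run could reuse the candidate list from question 8 with only the selection method changed, as in this sketch (the data set name AFIFI is assumed):

```sas
proc logistic data=afifi descending;
  model died = age sex sbp1 urine1 bsa1 map1 heart1
               dbp1 cvp1 cardiac1 apptime1 circ1 plasma1 redcell1
               hgb1 hct1
        / selection = backward details rsquare;
run;
```

Backward elimination starts from the model containing all candidates, so only cases with complete data on every listed variable are used from the first step onward. Comparing the number of observations used here with the number used in question 8 may help with part c.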

10. Check all your commands and rerun them to be sure they all run without any errors. Save your command file as hw4.sas. Be sure to hand in your commands along with your write-up.
