Lab Objectives - Stanford University



Lab Objectives

After today’s lab you should be able to:

1. Practice using PROC PHREG.

2. Obtain influence and diagnostic residuals from PROC PHREG.

3. Evaluate PH assumption using Schoenfeld residuals.

4. Add regression lines to PROC GPLOT using the I=option: add linear regression line and polynomial regression line with confidence intervals; add splines.

5. Practice time-dependent variables.

6. Practice an analysis using a new dataset.

SAS PROCs SAS EG equivalent

PROC LIFETEST Analyze(Survival Analysis(Life Tables

PROC PHREG Analyze(Survival Analysis(Proportional Hazards regression

PROC REG Analyze(Regression ((Linear Regression)

PROC GPLOT Graph ((Line Plot)

LAB EXERCISE STEPS:

Follow along with the computer in front…

1. Go to the class website: stanford.edu/~kcobb/courses/hrp262

Lab 3 data( Save(Save on your desktop as uis (SAS format)

Lab 4 data( Save(Save on your desktop as runners (SAS format)

2. Open SAS EG: From the desktop( double-click “Applications”( double-click SAS Enterprise Guide 4.2 icon

3. Click on “New Project”

4. You DO NOT need to import the datasets into SAS, since the datasets are already in SAS format (.sas7bdat). You DO need to name a library that points to the desktop, where the datasets are located. Assign the library name hrp262 to your desktop folder: [pic]Tools(Assign Project Library

Name the library HRP262 and then click Next.

[pic]

Browse to find your Desktop. Then Click Next.

[pic]

Click Next through the next screen.

[pic]

Click Finish.

[pic]

5. Confirm that you have created an hrp262 library that contains the SAS datasets uis and runners.

6. Recall last week we learned 2 methods for testing the PH assumption, (log(-log S)) plots and time*predictor interaction terms. This week we learned a third method, a plot of Schoenfeld residuals. We can perform this in EG using PH regression with a slight modification of the code to generate the desired residuals:

7. Test PH assumption for the UIS dataset using Schoenfeld residuals.

With the UIS dataset open, Select Analyze(Survival Analysis(Proportional Hazards regression:

[pic]

Drag time as your Survival time variable and Censor as your Censoring variable. Make sure to set the censoring value to 0. Add age, treat, becktota, and ndrugtx as your Explanatory variables.

[pic]

Under Results we can request Survival estimates, including Schoenfeld residuals (and a number of other estimates we won’t use for the moment). These will be output to a new dataset.

[pic]

Click Run.

Click on the Output Data tab and scroll to the right of your dataset. By default, EG only estimates the Schoenfeld residuals for your first explanatory variable (in this case, age).

[pic]

To get the residuals for the rest of our predictors, we have to modify the code very slightly. Click on the Code tab. Start typing anywhere and EG will ask you if you want to create a modifiable copy of the code. Click Yes.

[pic]

The culprit is the line of code under the PHREG model, RESSCH = _RESSCH1 - RESSCH1. We need to modify the numbers to reflect the number of explanatory variables we have (in our case, 4). In fact, we can change these titles to anything we want, to reflect that it is a Schoenfeld residual for a given predictor variable.

[pic]

Let’s modify the code so that we calculate Schoenfeld residuals for each of our predictors. Note that the order you list the variable names must be the same order that they are included in the PH model (in the code above).

[pic]

Once you have changed this line of code, click Run

After the proc has completed, click on Output Data. Your new variables for Schoenfeld residuals should have been created:

[pic]

FYI, this is MUCH EASIER done with SAS code:.

goptions reset=all;

proc phreg data=hrp262.uis;

title 'Schoenfeld residuals';

model time*censor(0)= age treat becktota ndrugtx/risklimits;

output out=outdata3 ressch=schage schtreat schbeck schtx;

run;

proc print data=outdata3;

run;

Schoenfeld residuals

Obs time censor age treat becktota ndrugtx schage schtreat schbeck schtx

1 188 1 39 1 9.000 1 6.6378 0.50174 -8.4672 -3.8568

2 26 1 33 1 34.000 8 1.1127 0.55672 15.8101 2.0560

3 207 1 33 1 10.000 3 0.5860 0.49680 -7.1708 -1.7670

4 144 1 32 0 20.000 1 -0.3562 -0.50575 2.6841 -3.9408

5 551 0 24 1 5.000 5 . . . .

6 32 1 30 1 32.550 1 -1.8788 0.55068 14.4386 -4.9026

7 459 1 39 1 19.000 34 6.5956 0.52044 1.3119 29.9617

8 22 1 27 1 10.000 2 -4.8943 0.56126 -8.2632 -3.9763

9 210 1 40 1 29.000 3 7.5668 0.50162 11.7884 -1.7998

10 184 1 36 1 25.000 7 3.6269 0.49483 7.6211 1.8979

11 5 1 35 1 . 12 . . . .

12 212 1 38 1 18.900 8 5.5427 0.50262 1.6679 3.1684

13 87 1 29 1 16.000 1 -3.1538 0.54049 -1.6367 -4.1958

14 598 0 32 1 36.000 2 . . . .

15 260 1 41 1 19.000 8 8.3894 0.48413 1.5622 3.2441

8. PLOT Schoenfeld residuals against time with a smooth line (cubic spline) fit to the points. Again, this is much easier done with code especially since we can generate multiple plots simultaneously. However, the EG version is straightforward. While you are still in Output Data, click on Graph > Line Plot.

[pic]

Choose Smooth Plot as your type of plot.

[pic]

Under Data, plot time (Horizontal) by schage (Vertical).

[pic]

Under Plots you can change the appearance of your graph. We will choose a Dot as the symbol for each data point, with a Height (size) of 5. Feel free to exercise your artistic license (

[pic]

Under Interpolations, make sure that your Interpolation method is Smooth. The smoothing value we will use is 60. This is somewhat arbitrary. A value of 1 basically connects all the points, very curvy; a value of 99 basically gives you a line; a value of 60 gives you something in-between. Also make sure that you have checked Sort data by the Horizonal column.

[pic]

Click Run.

[pic]

We will have to repeat this procedure for each Schoenfeld plot we want to obtain. Instead, we can obtain plots simultaneously for each of our predictors in code:

axis1 label=(angle=90);

axis2 order=(0 to 800 by 100);

proc gplot data=outdata3;

plot schage*time schtreat*time schtx*time schbeck*time /

vaxis=axis1 haxis=axis2 ; *do not overlay, get four separate plots;

symbol1 value=dot h=1 i=sm60S;

run;

[pic][pic]

[pic][pic]

9. We can also use linear regression with Schoenfeld residuals regressed against time to test the PH assumption. Significant correlation with time indicates a violation of the PH assumption (similar to results of plots above, but might miss non-linear patterns).

Under your Output Data, click on Analyze > Regression > Linear Regression.

[pic]

Under Data, choose time as your Dependent variable and schage as your Explanatory variable (we will have to repeat this regression for each of our Schoenfeld estimates).

[pic]

Click Run.

Repeating this for the 3 other Schoenfeld variables (schtreat, schbeck, schtx) produces the following results:

|Parameter Estimates |

|Variable |

|Variable |

|Variable |

|Variable |

|Parameter |

|Test |Chi-Square |DF |Pr > |

| | | |Chi-Square |

|Log-Rank |6.5419 |2 |0.0380 |

|Wilcoxon |7.8842 |2 |0.0194 |

|-2Log(LR) |6.4195 |2 |0.0404 |

[pic]

In code:

proc lifetest data=hrp262.runners plots=(s) graphics censoredsymbol=none maxtime=24;

time time*sf1(0);

strata calc1(1000,1500);

title 'Fracture-free survivorship by calcium level';

symbol1 v=none c=black w=2 i=join line=1;

symbol2 v=none c=black w=2 i=join line=2;

symbol3 v=none c=black w=2 i=join line=3;

run;

[pic]

7. Obtain hazard ratio for baseline calcium, controlling for bone mineral content of the skeleton and stratified on menstrual status. Also, obtain schoenfeld residuals and plot these against time.

Can you set up the appropriate analysis(es) in EG?

In code:

proc phreg data=hrp262.runners;

model time*sf1(0)=calc1 bmc1/ risklimits;

strata mencat1;

output out=outdata ressch=schcalc;

run;

axis1 label=(angle=90);

axis2 order=(0 to 24 by 2);

proc gplot data=outdata;

plot schcalc*time /

vaxis=axis1 haxis=axis2 ; *do not overlay, get four separate plots;

symbol1 value=dot h=1 i=sm60S;

run;

The PHREG Procedure

Analysis of Maximum Likelihood Estimates

Parameter Standard Hazard 95% Hazard Ratio

Variable DF Estimate Error Chi-Square Pr > ChiSq Ratio Confidence Limits

calc1 1 -0.00148 0.0006907 4.6185 0.0316 0.999 0.997 1.000

bmc1 1 -0.00131 0.00102 1.6374 0.2007 0.999 0.997 1.001

[pic]

APPENDIX: Interpolation Options on the Symbol statement

I=interpolation

specifies whether to leave the plotted points unconnected or to connect the plotted points with either straight lines or a smoothed line; use regression to fit a line to the points; or draw vertical lines connecting the points and the zero horizontal.

NOTE: You should sort by the X-variable as follows prior to invoking most interpolation options, or you will end up with more of a “splatter” plot!

[pic]

A. Basic Interpolation Options

I=NONE

requests that the points on the plot be left unconnected.

I=JOIN

requests that the points on the plot be connected by a straight line.

I=NEEDLE

draws a vertical line from each point on the plot to a horizontal line at zero on the Y axis.

I=STEPxx

requests that a step function be used to plot the data. When STEPL is used, the data point is on the left of the step; with STEPR, the point is on the right; with STEPC, the point is in the center of the step. If L, R, or C is not specified, L is assumed. Optionally, to connect the steps with a vertical line, follow the STEPx specification with a J.

I=Mxxxxx

fills a figure that your plotted points define. See the V= option in the PATTERN Statement for more information about the Mxxxxx value.

B. R-series Interpolation Option (regression lines)**

I=Rxxxxxxx

specifies the characteristics of the regression analysis to use for fitting a line to the plotted points. The regression equations can be linear, quadratic, or cubic; confidence limits can be drawn at one of three levels. The points on the plot do not necessarily fall on the regression line.

The form of the Rxxxxxxx value determines whether linear, quadratic, or cubic regression is used to fit a plot line; whether the intercept is set to zero to force the line through the origin; whether additional lines representing confidence limits should be drawn, and, if so, what confidence level should be used.

R x x x x x x x

L

Q

C

0

C L I

C L M

9 0

9 5

9 9

As shown above, valid values for the first variable after the R include L, Q, and C.

Confidence limits

When you request regression, you can ask that lines representing confidence limits for individual or mean predicted values be shown on the plot. Follow the L, Q, C, or zero in the Rxxxxxxx specification with the characters CLM to show confidence limits for mean predicted values; use CLI to show confidence limits for individual predicted values.

To specify the confidence level, include the numbers 90, 95, or 99 after CLI or CLM. If the confidence level is omitted, a value of 95 is used. The line style used for the confidence limit lines is determined by adding 1 to the L= value.

C. Spline Interpolation Options **

Several spline methods are available for smoothing points in a plot. SPLINE, which draws the smoothest line and is the least expensive of the non-trivial methods, is the method of choice for most plots. SM is the method to use for smoothing noisy data.

I=SPLINES

specifies that the plot line be smoothed using a spline routine. The points on the plot fall on the line. When you use I=SPLINE, the plot line is smoothed using a cubic spline method with continuous second derivatives. The polynomial passes through the plotted points and matches the first and second derivatives or neighboring segments at the points. If the values of the horizontal variable are not strictly increasing, the parametric interpolation method SPLINEP is used instead (Pizer, 1975).

I=SMxxS

specifies that a smooth line be fit to noisy data using a spline routine. The points on the plot do not necessarily fall on the line. Specifying I=SMxx results in fitting a cubic spline that minimizes a linear combination of the sum of squares of the residuals of fit and the integral of the square of the second derivative (Reinsch, 1967). The value xx can range from 01 to 99 and determines the relative importance of the two components: the larger the value, the smoother the fitted curve.

The letter S should be added to the end of the spline interpolation methods, as shown above, so that the procedure will automatically sort by the x axis variable before plotting!

(D. Interpolation options that we will use for continuous outcomes (gives mean and standard deviation/standard error of a continuous variable at each time point).

I=STDkxxx

is used when multiple Y values occur for each level of X and you want to join the mean Y value with (+ or -) 1, 2, or 3 standard deviations for each X. The value of k can be 1, 2, or 3 standard deviations. If you do not specify a value for k, the default value of two standard deviations is used. The xxx values can be M, P, J, B, and T. The sample variance is computed about each mean and from the sy, the standard deviation, is computed; or if you specify I=STDM, sy, the standard error of the mean, is computed.

If you specify I=STDP, sample variances are computed using a pooled estimate, as in a one-way ANOVA model.

If you specify I=STDJ, the means are connected from bar to bar. Use B to request bars (rather than lines) to connect the points for each X. T specifies that tops and bottoms should be added to each line.

B and T should not be used together, but other combinations of M, P, J, and B or T are acceptable.

Note: If VAXIS= is not specified, the vertical axis ranges from the minimum to maximum Y value in the data. If the requested number of standard deviations from the mean covers a range of values that exceeds the maximum or is less than the minimum, the STD lines are cut off at the minimum and maximum Y values. When this cutoff occurs, you should rescale the axis using the VAXIS= specification.

I=HILOxxx

is used when multiple Y values occur for each level of X. The value of xxx can be T, B, C, or J, or combinations of these letters (except T with B). When you specify I=HILOxxx, the minimum and maximum Y values at each X level are connected by a solid line.For each X value, the mean Y value is marked with a tick.)

-----------------------

No Schoenfeld residuals for censored observations

Test of Equality over Strata

Pr >

Test Chi-Square DF Chi-Square

Log-Rank 6.4934 2 0.0389

Wilcoxon 8.0359 2 0.0180

-2Log(LR) 6.4143 2 0.0405

With small numbers in each group, this is a pretty striking difference in fracture rates by calcium level. The significant log-rank test here means that at least one pair of groups is significantly different in fracture rate.

This asks SAS to divide into groups as follows: [-[pic],1000) [1000, 1500) [1500,[pic]).

I’ve added the option “maxtime=24” to cut the graph at 24 months of follow-up.

Formatting improves the appearance of the legend that appears on the Kaplan-Meier graph.

Notice that the curve plateaus after 24 months; that’s because participants were supposed to finish the study in 2 years. The flat line from 24 to 48 months represents a single participant who took much too long to finish and never had a fracture. When you see a flat curve at the end like this, it usually indicates that there are very few participants remaining in the study or remaining at risk, and does not represent a reduction in risk of the event.

Indicates that cumulative probability of fracture is about 20%.

Slightly different distribution of amenorrhea and oligomennorhea—could be an important confounder, as oligomenorrhea and amenorrhea increase fracture risk; and treatment efficacy may vary by menstrual regularity.

Asks for Schoenfeld residuals for each predictor

Controlling for bone mineral, and stratified on menstrual status, calcium significantly decreases fracture risk by 77% for every 1000mg/day:

[pic]

Slightly different distribution of amenorrhea and oligomennorhea—could be an important confounder, as oligomenorrhea and amenorrhea increase fracture risk; and treatment efficacy may vary by menstrual regularity.

There is evidence of violation of the PH assumption for the treat variable, as seen in plots (recall treat was better modeled as a time-dependent variable!)

With small numbers in each group, this is a pretty striking difference in fracture rates by calcium level. The significant log-rank test here means that at least one pair of groups is significantly different in fracture rate.

What does this point represent?

To avoid this:

PROC SORT data=YourData;

by Xvariable;

run;

Becktota

age

treat

ndrugtx

Note: graph for Schoenfeld residuals for the treat variable shows a clear pattern with time! Smoothing line helps you to visualize this.

Syntax (see also: APPENDIX):

sm – requests smooth line be fit, using cubic spline routine

60 – next 2 digits (01-99 allowed) specifies smoothness of the curve

▪ 01 basically connects all the points, very curvy

▪ 99 basically gives you a line

▪ 60 gives you something in-between

S –tells SAS to sort your X-variable before plotting; absolutely essential or the plot will be a mess!

Before asking for a regression line plot, you must sort by the X-variable (unlike spline plots which did this automatically)

Syntax for Regression Lines:

R x x x x x x

L (linear)

Q (quadratic)

C (cubic)

0 (intercept thru 0)

C L I (prediction interval)

C L M (CI for mean)

9 0 (confidence level)

9 5

9 9

linear

quadratic

Cubic

(with confidence intervals for an individual rather than the mean)

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download