STATISTICAL ANALYSIS OF SURVIVAL DATA

IN CLINICAL RESEARCH 3

INTERACTIONS – EFFECT MODIFICATION

Consider again the effects of type of infusion and anti-coagulation treatment on the rate of occurrence of it. In the analysis on day 2 (pages 34-37) the effect of AK-treatment was assumed to be the same in the two infusion groups (described by the multiplier [pic]). Is this a reasonable assumption?

Statistical Model: Two factors with interaction

Reference group: Na-lactate without AK-treatment

event-rate for reference group: [pic]

Assume:

* If Na-lactate is replaced by Glucose the rate is changed by a factor [pic].

* If the patient receives AK treatment the rate is changed by a factor [pic].

* If the patient receives both AK treatment and Glucose the rate is further changed by a factor [pic].

Then the event rates in the four groups become

| |- anti-coag. |+ anti-coag. |

|Na-lactate |[pic] |[pic] |

|Glucose |[pic] |[pic] |

Effect of AK-treatment in Na-lactate group: [pic]

Effect of AK-treatment in Glucose group: [pic]

No interaction corresponds to [pic]
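
The formulas in this section were figures in the original notes. As a sketch only, with symbols that are assumed here rather than taken from the notes (\lambda_0(t) for the reference rate, \theta_G, \theta_{AK} and \theta_I for the three multipliers), the four event rates can be written as:

```latex
\begin{array}{l|cc}
 & -\ \text{anti-coag.} & +\ \text{anti-coag.} \\ \hline
\text{Na-lactate} & \lambda_0(t) & \theta_{AK}\,\lambda_0(t) \\
\text{Glucose} & \theta_{G}\,\lambda_0(t) & \theta_{G}\,\theta_{AK}\,\theta_{I}\,\lambda_0(t)
\end{array}
```

In this notation the effect of AK-treatment is \theta_{AK} in the Na-lactate group and \theta_{AK}\theta_{I} in the Glucose group, so no interaction corresponds to \theta_{I} = 1.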

Using dummy variables to represent the model.

Define [pic], [pic] and [pic]

|Infusion |Anti-coag. |[pic] |[pic] |[pic] |

|Na-lactate |no |0 |0 |0 |

|Na-lactate |yes |0 |1 |0 |

|Glucose |no |1 |0 |0 |

|Glucose |yes |1 |1 |1 |

The model may then be written as

[pic]

where [pic]

No interaction corresponds to [pic]
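
A sketch of the dummy-variable form of the model, with assumed notation (x_1, x_2, x_3 for the three indicator columns of the table, in that order, and \beta_1, \beta_2, \beta_3 for the logarithms of the three multipliers listed under "Assume"):

```latex
\lambda(t) = \lambda_0(t)\,\exp(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3),
\qquad x_3 = x_1 x_2 .
```

With this coding, no interaction corresponds to \beta_3 = 0, i.e. a multiplier \exp(\beta_3) = 1 for the interaction term.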

NOTE: Interaction means that the effect of one variable on the occurrence of events depends on the value of another variable (i.e. the effect is modified by the other variable).

STATA commands (the two versions produce the same output)

xi: stcox i.group i.ak i.group*i.ak

xi: stcox i.group*i.ak

Output (selected parts only):

i.group _Igroup_1-2 (naturally coded; _Igroup_1 omitted)

i.ak _Iak_0-1 (naturally coded; _Iak_0 omitted)

i.group*i.ak _IgroXak_#_# (coded as above)

************ output omitted here **************

No. of subjects = 85 Number of obs = 85

No. of failures = 27

Time at risk = 9532

LR chi2(3) = 6.64

Log likelihood = -98.231844 Prob > chi2 = 0.0845

--------------------------------------------------------------

_t |Haz. Ratio Std. Err. z P>|z| [95% Conf. Int]

-------------+------------------------------------------------

_Igroup_2 | 2.2942 1.052591 1.81 0.070 .9334595 5.63855

_Iak_1 |.3840848 .4105318 -0.90 0.371 .0472731 3.12061

_IgroXak_2_1 |2.904121 3.630981 0.85 0.394 .2504779 33.6713

--------------------------------------------------------------

Comments to output

1. Note that the interpretation of the parameters is now different: [pic] describes the difference between the Na-lactate and Glucose groups among patients without AK. The difference for patients with AK is [pic].

A similar change in interpretation applies to [pic]

2. The relative rate ratio [pic] for the interaction term is not significantly different from 1 (p = 0.39). The likelihood ratio test statistic is 0.81 (computed as 2[98.6349 – 98.2318], see also day 2, page 36), which gives the same conclusion. The assumption of no interaction can therefore not be rejected.

3. Note however the extremely wide confidence interval for [pic]. Based on 27 events among 85 patients very little can be said about interactions.
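
The comments above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python (not part of the STATA session); the variable names are chosen here for illustration, and the numbers are copied from the output and from comment 2.

```python
import math

# Hazard ratios copied from the STATA output above.
HR_GROUP = 2.2942      # _Igroup_2: Glucose vs Na-lactate, patients without AK
HR_AK = 0.3840848      # _Iak_1: AK vs no AK, Na-lactate group
HR_INTER = 2.904121    # _IgroXak_2_1: interaction multiplier

# Effects in the remaining cells of the 2x2 layout.
hr_group_with_ak = HR_GROUP * HR_INTER   # Glucose vs Na-lactate, patients with AK
hr_ak_in_glucose = HR_AK * HR_INTER      # AK vs no AK, Glucose group

# Likelihood ratio test of no interaction (1 degree of freedom).
# For 1 df the upper tail is P(chi2 > x) = erfc(sqrt(x / 2)).
lr_stat = 2 * (98.6349 - 98.2318)
lr_p = math.erfc(math.sqrt(lr_stat / 2))

print(f"Glucose vs Na-lactate, with AK: {hr_group_with_ak:.2f}")
print(f"AK vs no AK, Glucose group:     {hr_ak_in_glucose:.2f}")
print(f"LR statistic: {lr_stat:.2f}, p-value: {lr_p:.2f}")
```

The products show how the interaction term modifies both main effects, which is why the main-effect parameters can no longer be read as overall group or AK effects.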

CHECKING THE VALIDITY OF THE MODEL - MORE

PREVIOUSLY (day 2, page 32-33): Checking the validity of a model with a single covariate.

HERE: Checking the validity of a model with several explanatory variables.

Example

Consider a model in which the risk of developing it depends on type of infusion, AK-treatment and sex of the patient.

The variables group, ak and sex are covariates in the model.

xi: stcox i.group i.ak i.sex

This model defines 8 (= 2 × 2 × 2) different subgroups of patients. In the model the 8 different event rates are assumed to be proportional.

Checking proportionality:

For each factor (group, ak and sex) consider the model in which this factor has been removed as a covariate and instead included as a stratification.

In a stratified Cox regression model a separate baseline hazard rate is estimated for each stratum.

A log-minus-log plot for a stratified Cox model will show if the proportionality between levels of the stratifying factor is reasonable after correction for the remaining factors.
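
The reason the log-minus-log plot works can be sketched in one line. The notation is assumed here: S_0(t) and S_1(t) are the survival functions at two levels of the stratifying factor (other covariates held fixed) and \beta is the log hazard ratio between the two levels.

```latex
S_1(t) = S_0(t)^{\exp(\beta)}
\quad\Longrightarrow\quad
\log\{-\log S_1(t)\} = \beta + \log\{-\log S_0(t)\}
```

Under proportional hazards the two curves therefore differ by the constant \beta at all times, so roughly parallel curves support the assumption.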

STATA’s command stphplot has an option which allows adjustment for other factors.

STATA commands for checking proportionality in the present example. Each variable in turn is used for stratification while adjusting for the remaining variables.

stphplot , by(group) adjust(ak sex) nolntime

stphplot , by(ak) adjust(group sex) nolntime

stphplot , by(sex) adjust(group ak) nolntime

Output

Stratified on treatment groups:

[pic]

Stratified on anti-coagulation treatment

[pic]

Stratified on patient’s sex

[pic]

Testing the proportional hazards assumption with STATA.

STATA can perform a formal test of the proportional hazards assumption based on so-called Schoenfeld residuals (overall test) and scaled Schoenfeld residuals (separate test for each variable in the model). These residuals must be saved in the stcox command which fits the model to be validated.

Example

stcox group ak sex , ///

sch(res*) sca(sres*) nolog noshow

stphtest , detail

Options nolog noshow minimize output from stcox. Alternative: To omit output from stcox completely add quietly in front.

OUTPUT from stphtest

Test of proportional hazards assumption

Time: Time

-----------------------------------------------------

| rho chi2 df Prob>chi2

-------------+---------------------------------------

group | -0.16714 0.66 1 0.4176

ak | -0.20866 1.29 1 0.2562

sex | 0.03580 0.03 1 0.8553

-------------+---------------------------------------

global test | 1.84 3 0.6057

-----------------------------------------------------

Alternative procedures:

Use time-dependent covariates to obtain a formal test of the proportional hazards assumption.

Note: The stcoxkm , by() command (day 2, page 33) allows only one variable at a time, so this command is less useful for models with several variables.

STRATIFIED COX REGRESSION MODELS

Use of stratified Cox models

If the effect of an important factor on the survival time is inadequately described by proportional hazards we may instead consider a model in which this factor enters as a stratifying factor with separate baseline hazards and the remaining factors are usual covariates.

The output contains no regression coefficient (or hazard ratio) for a stratifying factor, but the separate baseline hazards may be plotted and compared.

In STATA this model is specified with the option strata(varnames). Up to 5 stratifying factors are allowed.

Example

xi: stcox i.group , strata(sex)

COX REGRESSION WITH MANY COVARIATES

Typical Dataset:

For each patient: a waiting time t, a status d and socio-demographic, clinical and other variables, some of which may have impact on the prognosis.

Some problems:

1. How should the collected information be represented as covariates? (i.e. choice of categories, coding of variables, transformations of data etc.).

2. How many and which variables should be included?

3. Which variables should be investigated in detail, including scoring of information and interactions?

4. How should the results be presented?

Missing values

For some patients relevant information is missing. Only patients with complete information on the variables in the model are included in the analysis.

Missing values may therefore complicate the analysis and the interpretation of the results considerably.

SELECTION OF VARIABLES

Two main strategies:

1. Forward selection

2. Backward elimination

Forward selection

1. Start out with no variables in the model

2. Check each variable separately and include the variable that is most statistically significant.

3. Check the remaining variables one at a time together with the variables selected so far. Add the most significant to the model.

4. Continue this procedure until no new variable is statistically significant.

Backward elimination

1. Start out with all variables in the model

2. Check each variable and exclude the most non-significant.

3. Re-estimate the effects of the remaining variables and remove the variable which is now the most non-significant.

4. Continue until all remaining variables are statistically significant.

Hybrid methods exist – stepwise procedures – that allow both inclusion and removal of variables in each step.
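
The two strategies can be sketched as short loops. This is an illustration in Python, not the algorithm used by STATA's sw command; the function names, the callback pvalues_for (which stands for refitting the model and returning one p-value per variable) and the toy p-values are all hypothetical and are not results from the data.

```python
from typing import Callable, Dict, List

PValueFn = Callable[[List[str]], Dict[str, float]]


def backward_elimination(variables: List[str], pvalues_for: PValueFn,
                         p_remove: float = 0.10) -> List[str]:
    """Start with all variables; drop the least significant until all pass."""
    model = list(variables)
    while model:
        pvals = pvalues_for(model)              # refit with the current variables
        worst = max(model, key=lambda v: pvals[v])
        if pvals[worst] < p_remove:             # every remaining variable is significant
            break
        model.remove(worst)
    return model


def forward_selection(candidates: List[str], pvalues_for: PValueFn,
                      p_enter: float = 0.05) -> List[str]:
    """Start with no variables; add the most significant until none qualifies."""
    model: List[str] = []
    remaining = list(candidates)
    while remaining:
        # p-value of each candidate when added to the variables selected so far
        trial = {v: pvalues_for(model + [v])[v] for v in remaining}
        best = min(remaining, key=lambda v: trial[v])
        if trial[best] >= p_enter:              # no new variable is significant
            break
        model.append(best)
        remaining.remove(best)
    return model


# Toy stand-in for model fitting: fixed, made-up p-values that ignore
# which other variables are in the model (real p-values would change).
TOY_P = {"group": 0.02, "ak": 0.60, "sex": 0.45, "age": 0.03}


def toy_pvalues(model: List[str]) -> Dict[str, float]:
    return {v: TOY_P[v] for v in model}


selected_backward = backward_elimination(["group", "ak", "sex", "age"], toy_pvalues)
selected_forward = forward_selection(["group", "ak", "sex", "age"], toy_pvalues)
print(selected_backward, selected_forward)
```

With real data the p-values change every time a variable is added or removed, which is why the two strategies need not end with the same model.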

STATA

In STATA the command sw can be placed in front of any regression command, including stcox. Options are used to specify details of the variable selection procedure.

Examples

//backward elimination, sig. level 0.1, Wald’s test

sw stcox group ak sex age , pr(0.1)

//forward selection, sig. level 0.05, Wald’s test, sex and age evaluated together

sw stcox group ak (sex age) , pe(0.05)

//backward stepwise, based on likelihood ratio test

sw stcox group ak sex age , pr(0.1) pe(0.05) lr

//forward stepwise, hierarchical (=specified order)

sw stcox group ak sex age , pr(0.1) pe(0.05) forward hier

//backward elimination, group and ak forced in model

sw stcox (group ak) sex age , pr(0.1) lock

Forward selection or Backward elimination?

Both approaches have drawbacks:

* In forward selection all conclusions are based on comparison of “wrong” models.

* Backward elimination may often be infeasible because of too many missing values.

* Both methods are automatic and treat all variables in the same way.

Conclusion?

No nice and easy solution exists. An order of priority of the variables based on knowledge and insight is a must.

Recommendation

Never rely (completely) on automatic variable selection procedures.

PRESENTATION OF RESULTS

A table with only variable names and p-values does not say anything about the size or direction of the effects and is therefore clearly inadequate.

Regression coefficients (with standard errors and p-values) are rather uninteresting if you forget to explain how the variables are coded in the analysis.

The rate ratios - the [pic] or exp([pic]) – are usually easier to understand than the regression coefficients.

If the “final” model includes covariates [pic], one may compute an individual prognosis for a future patient with covariates values [pic].

For instance the 5 year survival probability for such a patient can be estimated

[pic],

where PI is the value of the prognostic index for the patient

[pic],

and [pic] is the estimated survival function for the reference group.
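
The prognosis formulas were figures in the original notes. A sketch of the standard Cox-model expressions, with assumed notation (\hat\beta_j for the estimated regression coefficients, x_j for the covariate values of the patient, \hat S_0 for the estimated survival function of the reference group):

```latex
\hat S(5 \mid x) = \hat S_0(5)^{\exp(PI)},
\qquad
PI = \hat\beta_1 x_1 + \hat\beta_2 x_2 + \dots + \hat\beta_k x_k .
```

A patient in the reference group has PI = 0 and therefore the baseline survival probability \hat S_0(5); a positive PI gives a lower and a negative PI a higher survival probability.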

A table giving an estimate of the 5-year survival or an estimate of the median survival time for patients with certain characteristics may also be a useful way to illustrate the implications of the model.

Formulas for standard errors of these quantities are available, but the calculations are not included in the standard statistical software packages.

Other possibilities

A plot of the estimated survival function for patients with particular covariate values.

A plot of the estimated 5-years survival probability or the median survival time against the prognostic index.

STATA

To obtain plots of survival function, integrated hazard or (smoothed) hazard rates for particular values of the covariates use the command stcurve after stcox. The corresponding baseline function must be specified and saved with stcox. Example:

quietly stcox group ak age , basesurv(surv0)

stcurve , survival at(group=1 ak=1 age=50)

After each fit stcox saves a large number of results in system variables that may be accessed and used for further calculations. Example:

mat define rc=e(b)

gen pi=rc[1,1]*(group==2)+rc[1,2]*(ak==1)+rc[1,3]*age

COX REGRESSION

TIME-DEPENDENT COVARIATES

In a standard Cox regression the user has two options:

1. The effect of a covariate is described by a single number, a rate ratio, giving the change of the rate if the covariate is increased by one unit.

For a dichotomous variate this is just the ratio of the rates in the two categories.

2. A covariate is used as a stratifying factor. The rate ratio between rates in different strata then becomes an unspecified function of follow-up time.

With time-dependent covariates the rate ratio may depend on follow-up time in a specified way. This may e.g. be used to obtain a statistical test of the proportional hazards assumption.

Example:

Occurrence of it in heart infarct patients (continued).

Based on a log-minus-log plot (see day 2 page 33, page 5 above) we concluded that the rates in the two infusion groups could be assumed to be proportional.

STATA also has a command, stphtest, giving a statistical test of the hypothesis of proportional hazard rates (see page 7).

An alternative test of the proportional hazards assumption can be established using a time-dependent covariate.

The idea is to fit a model in which the log(rate ratio) depends linearly on follow-up time and in this model test if the trend with follow-up time is significantly different from 0. This is accomplished using the option tvc.

Example

The following STATA command fits a model in which the regression coefficient (i.e. the log(hazard ratio)) is a linear function of follow-up time. The nolog option is included to omit output from the iterative estimation process.

xi: stcox i.group , tvc(i.group) nolog

Output

Cox regression -- Breslow method for ties

No. of subjects = 85 Number of obs = 85

No. of failures = 27

Time at risk = 9532

LR chi2(2) = 6.76

Log likelihood = -98.167651 Prob > chi2 = 0.0340

--------------------------------------------------------------

_t |Haz. Ratio Std. Err. z P>|z| [95% Conf. Int.]

----------+---------------------------------------------------

rh |

_Igroup_2 | 8.243608 9.967054 1.74 0.081 .770832 88.1607

----------+---------------------------------------------------

t |

_Igroup_2 | .9865033 .0134885 -0.99 0.320 .9604174 1.01300

--------------------------------------------------------------

Note: Second equation contains variables that continuously vary with respect to time; variables are interacted with current values of _t.

Comments:

1. The results show that the difference between the groups decreases with increasing follow-up time (the hazard ratio is less than 1 for _Igroup_2 in the section with time-varying estimates). This tendency is however not statistically significant (p = 0.32) so the data is consistent with the hypothesis of proportional rates in the two groups.

2. The regression coefficient estimated for group (in the first section) is considerably larger than before, since the parameter now gives the hazard ratio at the start of follow-up (the intercept at time = 0), and the negative trend with time therefore introduces this change. Note also that the standard error of this estimate is large, since the shortest follow-up time is 28.5 hours, so time 0 lies outside the range of the data.

3. Since this analysis does not reject proportional hazards one would then go on to fit the simpler model without time-dependence, i.e. the analysis shown on day 2, page 29-31.
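
To see what the two estimates imply together, the fitted hazard ratio can be evaluated at selected follow-up times. The following is a minimal sketch in Python, assuming that follow-up time is measured in hours (as the surrounding text suggests) and using function names chosen here for illustration; the two constants are copied from the output above.

```python
import math

HR_AT_ZERO = 8.243608     # _Igroup_2 in the "rh" section: hazard ratio at time 0
HR_PER_HOUR = 0.9865033   # _Igroup_2 in the "t" section: multiplier per unit of time


def hazard_ratio(t_hours: float) -> float:
    """Fitted hazard ratio (Glucose vs Na-lactate) at follow-up time t."""
    return HR_AT_ZERO * HR_PER_HOUR ** t_hours


def time_when_ratio_is_one() -> float:
    """Follow-up time at which the fitted hazard ratio equals 1."""
    return -math.log(HR_AT_ZERO) / math.log(HR_PER_HOUR)


for t in (0, 28.5, 80):
    print(f"t = {t:5.1f} hours: hazard ratio = {hazard_ratio(t):.2f}")
print(f"fitted ratio reaches 1 at about {time_when_ratio_is_one():.0f} hours")
```

Under the fitted model the ratio is roughly 5.6 at 28.5 hours and roughly 2.8 at 80 hours. Since the trend is not statistically significant, these values illustrate the model rather than describe an established decline.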

Occasionally, one may want to consider other forms of dependence on time, e.g. log(time). STATA has an option texp which allows specification of this.

Example

xi: stcox i.group , tvc(i.group) texp(ln(t))

Other uses of time dependent covariates

1. Time-dependent categorical variables

In some studies one may want to assess if occurrence of other events during follow-up, e.g. a specific complication or treatment, influences the subsequent rate of occurrence of the endpoint. Example: survival of heart transplant patients measured from enrollment in the study.

2. Effect of covariates that are updated during follow-up

In some studies we want to describe effect of diagnostic variables that are measured regularly during follow-up. Rather than studying the prognostic value of the baseline measurement, we may want to estimate a “dynamic effect” using the latest available measurement.

3. Cumulative exposure

In some studies we want to evaluate the effect of an exposure that accumulates during follow-up.

Example: In a study of cancer mortality among radar operators in the army one may want to relate the mortality rate at time t to the accumulated dose received prior to time t (perhaps allowing for some latency).

Warning

Interpretation of the results of analyses with time-dependent covariates may be complex.

TIME-DEPENDENT VARIABLES WITH STATA

STATA’s approach to fitting models with time-dependent covariates differs from most other statistical programs.

Most programs (incl. SPSS) use a special fitting algorithm which is considerably slower than the fitting algorithm used with no time-dependent covariates.

STATA, on the other hand, splits the follow-up time of each individual up into intervals on which the time-dependent covariate is constant, and the usual algorithm is therefore applicable for the new, larger dataset. This approach is advantageous for some problems, but not for others.

STATA’s approach is well suited for problems involving time-dependent categorical variables and covariates that are updated during follow-up, but models with accumulating exposure are often easier to define and analyze using the approach used by other programs.

The following examples explain the approach used by STATA in the analysis with a time-dependent categorical variable.

Example 1: Intravenous drops - continued

The standard Cox regression analysis showed that the it rate in the Glucose group was a factor 2.69 larger than the it rate in the Na-lactate group.

Above we used the tvc() option to assess if this effect varied with time.

Alternatively, we may split the follow-up period in two (or several) intervals, estimate an effect for each interval, and evaluate whether the effect varies significantly with follow-up period, i.e. perform a test of no interaction between treatment group and follow-up period.

The following commands define the data as survival time data and split the follow-up period at 80 hours.

stset time , failure(it==1) id(ptnr)

stsplit period , at(80)

Note:

• A variable identifying the individual patients (here ptnr) must be specified before splitting the data.

• A variable name (here period) must be given to the variate identifying the follow-up interval.

Output from stsplit

(56 observations (episodes) created)

The 56 patients with more than 80 hours follow-up are now represented by two records (lines) in the file: one for the first period and one for the second period.
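
What the splitting does to a single patient record can be sketched as follows. This is an illustration in Python, not STATA's implementation; the function name split_at and the example patients are hypothetical.

```python
from typing import List, Tuple

# One episode: (start, stop, event, period)
Episode = Tuple[float, float, int, float]


def split_at(time: float, event: int, cut: float) -> List[Episode]:
    """Split one follow-up record (0, time] at the cut point.

    A patient followed for at most `cut` time units keeps a single record.
    A patient followed longer gets two records: the first ends at `cut`
    without an event, the second carries the original outcome.
    """
    if time <= cut:
        return [(0, time, event, 0)]
    return [(0, cut, 0, 0), (cut, time, event, cut)]


# Hypothetical patients: an event at 60 hours and an event at 120 hours.
short_follow_up = split_at(60, 1, cut=80)
long_follow_up = split_at(120, 1, cut=80)
print(short_follow_up)
print(long_follow_up)
```

The last element of each record plays the role of the variable period, taking the value 0 in the first interval and 80 in the second.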

We may now consider

1. xi: stcox i.group

2. xi: stcox i.group if period==0

3. xi: stcox i.group if period==80

4. xi: stcox i.period*i.group

The first command is just to check that the standard comparison is unchanged by the splitting.

The last command evaluates the interaction between period and group, i.e. provides a test of the proportional hazards assumption.

Output (only the final parts)

command 1

--------------------------------------------------------------

_t | Haz.Ratio Std. Err. z P>|z| [95% Conf. Int]

----------+---------------------------------------------------

_Igroup_2 | 2.689008 1.171293 2.27 0.023 1.14504 6.314854

command 2

--------------------------------------------------------------

_t | Haz.Ratio Std. Err. z P>|z| [95% Conf. Int]

----------+---------------------------------------------------

_Igroup_2 | 3.079946 1.778889 1.95 0.051 .992918 9.553728

--------------------------------------------------------------

command 3

--------------------------------------------------------------

_t | Haz.Ratio Std. Err. z P>|z| [95% Conf. Int]

----------+---------------------------------------------------

_Igroup_2 | 2.202925 1.506434 1.15 0.248 .576659 8.415503

--------------------------------------------------------------

command 4

--------------------------------------------------------------

_t | Haz.Ratio Std. Err. z P>|z| [95% Conf. Int]

----------+---------------------------------------------------

Iperiod_80| 2.71834 . . . . .

_Igroup_2 | 3.079946 1.778889 1.95 0.051 .992918 9.553728

IperXgr_~2| .715248 .6402239 -0.37 0.708 .123748 4.13403

--------------------------------------------------------------

Comments

Command 1:

We get the same result as in the analysis of the data before splitting the follow-up times (day 2, page 30). On the full follow-up period the hazard ratio is estimated as 2.69.

Command 2 & 3:

As anticipated, the hazard ratio is slightly larger, 3.08, in the first period (0 – 80 hours) than in the second period, 2.20.

Command 4:

The first parameter (_Iperiod_80) is redundant and should be ignored.

The second parameter (_Igroup_2) gives the hazard ratio between the two treatments in the first period (cf. command 2).

The third parameter (_IperXgro_~2) is the ratio of the hazard ratio in the second period to that in the first period: 0.715 = 2.20/3.08, and the p-value (0.708) refers to the test of the hypothesis of no interaction between period and group, i.e. that the ratio of the hazard ratios is 1, or the hypothesis that the hazard ratio is independent of time. We see that the data do not contradict this hypothesis.
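
The relation between the three analyses can be verified directly from the printed estimates. A minimal sketch in Python with variable names chosen here for illustration; the last step assumes that the reported standard error of a hazard ratio equals the hazard ratio times the standard error of its logarithm.

```python
import math

HR_PERIOD_1 = 3.079946      # command 2: hazard ratio in the first period
HR_PERIOD_2 = 2.202925      # command 3: hazard ratio in the second period
HR_INTERACTION = 0.715248   # command 4: interaction estimate
SE_INTERACTION = 0.6402239  # command 4: its standard error

# Ratio of the two period-specific hazard ratios.
ratio = HR_PERIOD_2 / HR_PERIOD_1

# Wald z statistic on the log scale.
se_log = SE_INTERACTION / HR_INTERACTION
z = math.log(HR_INTERACTION) / se_log

print(f"ratio of hazard ratios: {ratio:.3f}")
print(f"Wald z: {z:.2f}")
```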

Example 2: Stanford Heart Transplant data

The file stanford.dta contains data on the survival of 103 heart transplant patients accepted for surgery at Stanford University Hospital from late 1967 to early 1974. In the present dataset all patients are followed until April 1 1974.

Once accepted a patient was put on a waiting list and a search started for a suitable donor heart. If the patient was still alive a transplant was performed when a donor heart was found. A few patients were transferred out of the program before receiving a transplant.

Problem:

Does a heart transplant prolong the life of a patient?

The situation can be represented by the following figure:

[pic]

Note that transplant status is time-dependent. Comparing the survival of patients who received a new heart with those who did not is therefore not meaningful.

All patients are initially in the state “Accepted” waiting for a heart transplant. If a patient survives long enough he/she receives a new heart and therefore changes transplant status.

The data

In the file stanford.dta the follow-up history of a patient is captured by the following variables

id patient identification number

wait time (in days) from entry until surgery,

wait is 0 if no transplant was performed.

transplant 1 if the patient received a transplant,

0 otherwise.

stime time (in days) from entry until death or

April 1 1974.

died 1 if dead, 0 otherwise

Some examples of follow-up histories

. list id wait transplant stime died ///

if id==5|id==20|id==50|id==100|id==102

+--------------------------------------+

| id wait transp~t stime died |

|--------------------------------------|

14. | 102 0 0 11 0 |

20. | 5 0 0 18 1 |

23. | 20 1 1 28 1 |

31. | 100 38 1 39 0 |

93. | 50 83 1 979 1 |

+--------------------------------------+

A simple model for the effect of a heart transplant on survival/mortality:

The effect of a transplant at some point in time s is to change the mortality rate at time t ([pic]) by a factor [pic], i.e. [pic].

A value of [pic] less/greater than 1 reflects a better/worse prognosis after a heart transplant.

By introducing a time-dependent dummy variable

[pic]

the mortality rate can be represented in the usual form [pic] with [pic] and [pic].

For a patient who survives the waiting period and receives a transplant the covariate is 0 from acceptance until transplant and from this time on the covariate is 1.

For a patient who dies while waiting for a suitable transplant the covariate is 0 all the time.

The problem can therefore be reduced to a problem involving a time-constant covariate if the follow-up time for each patient is split at the time of transplant.
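
A sketch of this model in symbols. The notation is assumed here (s for the transplant time of the patient, \theta for the multiplier, z(t) for the time-dependent dummy variable), since the original formulas were figures:

```latex
z(t) =
\begin{cases}
0 & \text{if } t \le s \text{ (or no transplant)} \\
1 & \text{if } t > s
\end{cases}
\qquad
\lambda(t) = \lambda_0(t)\,\exp\{\beta\, z(t)\},
\quad \beta = \ln\theta .
```

Within each of the two intervals, before and after s, the covariate z(t) is constant, which is what makes the record-splitting approach possible.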

Creating the new dataset

In the new dataset a patient receiving a transplant will have two records, a pre-transplant record and a post-transplant record, whereas a patient not receiving a transplant has only one record.

The following STATA commands set-up the new dataset.

expand 2 if transplant

* a duplicate record is inserted if transplant is

* 1, i.e. the file now has identical records for

* patients who had a transplant

bysort id: gen recno=_n

* the data are sorted on id and a new variable,

* recno, giving the record number (1 or 2) is created

gen posttrans=recno-1

* a variable, posttrans, taking the value 0 in the

* first record and 1 in the second record is created

by id: gen survtime=stime if _n==_N

* survtime is a new variable to be used as time

* variable in the Cox regression.

* _n is a system variable giving the number of the

* current record for the given patient.

* _N is a system variable giving the total number of

* records for a given patient.

* For patients with one record (_N=1), i.e. who never

* received a transplant, survtime is equal to stime.

* For patients with two records (_N=2) survtime is

* set equal to stime in the last record.

by id: replace survtime=wait if _n==1 & transplant

* For patients with two records survtime is set

* equal to wait in the first record.

by id: replace died=0 if _n==1 & transplant

* For patients with two records died is equal to 0

* in the first record (they survived the first period

* and received a transplant).
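
The effect of these commands on a single patient can be sketched as follows. This is an illustration in Python, not a replacement for the STATA code; the function name expand_patient is hypothetical, and the records carry an explicit start time, whereas STATA derives it from the previous record of the same patient. The three example patients are taken from the listing of follow-up histories.

```python
from typing import List, Tuple

# One record: (id, start, stop, posttrans, died)
Record = Tuple[int, float, float, int, int]


def expand_patient(pid: int, wait: float, transplant: int,
                   stime: float, died: int) -> List[Record]:
    """Return the pre- and post-transplant records for one patient."""
    if not transplant:
        # Never transplanted: one record covering the whole follow-up.
        return [(pid, 0, stime, 0, died)]
    return [
        (pid, 0, wait, 0, 0),          # pre-transplant: alive at time of transplant
        (pid, wait, stime, 1, died),   # post-transplant: original outcome
    ]


records_5 = expand_patient(5, wait=0, transplant=0, stime=18, died=1)
records_50 = expand_patient(50, wait=83, transplant=1, stime=979, died=1)
records_100 = expand_patient(100, wait=38, transplant=1, stime=39, died=0)
for rec in records_5 + records_50 + records_100:
    print(rec)
```

A patient whose transplant and death fall on the same day gets a second record with start equal to stop, i.e. an interval of length zero.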

The new and expanded data set should now be declared as survival time data. Since some patients have more than one record the option id is used to identify records belonging to the same patient.

stset survtime , failure(died) id(id)

Output

. stset survtime , failure(died) id(id)

id: id

failure event: died != 0 & died < .

obs. time interval: (survtime[_n-1], survtime]

exit on or before: failure

--------------------------------------------------------------

172 total obs.

2 multiple records at same instant PROBABLE ERROR

(survtime[_n-1]==survtime)

--------------------------------------------------------------

170 obs. remaining, representing

102 subjects

74 failures in single failure-per-subject data

31933 total analysis time at risk, at risk from t = 0

earliest observed entry t = 0

last observed exit t = 1799

We have a problem!

Inspection of the data reveals that patient 38 has two records (line 60 and 61). He first waits for 5 days, then has a transplant, but dies on the same day.

This is perfectly OK, except that STATA and other programs use the following convention in case of ties: first events, then censorings and finally entries.

Patient 38 therefore dies in the second period before he enters that period and STATA complains.

The following fix solves the problem for now

replace survtime=5.1 in 61

After redoing the stset command we are now ready to perform the Cox regression analysis.

stset survtime , failure(died) id(id)

stcox age posttrans surgery year, nolog noshow

The model includes four covariates

age age at acceptance

posttrans the (time-dependent) transplant status

surgery 1 if the patient had previously had surgery,

0 otherwise

year year of acceptance

Output

Cox regression -- Breslow method for ties

No. of subjects = 102 Number of obs = 170

No. of failures = 74

Time at risk = 31933

LR chi2(4) = 17.73

Log likelihood = -284.81589 Prob > chi2 = 0.0014

--------------------------------------------------------------

_t |Haz. Ratio Std. Err. z P>|z| [95% Conf. Int]

----------+---------------------------------------------------

age | 1.032286 .0145523 2.25 0.024 1.00416 1.06121

posttrans | .8951127 .2775564 -0.36 0.721 .487458 1.64368

surgery | .3779578 .1651275 -2.23 0.026 .160531 .889873

year | .8872802 .0602063 -1.76 0.078 .776788 1.01349

--------------------------------------------------------------

Older patients have higher mortality rates, patients accepted in later years tend to do better, and patients with previous surgery do better. Transplant status does not seem to make much difference to the mortality rate (hazard ratio 0.90, p = 0.72).

Note 1

Once the data have been reorganized and expanded so that covariates are constant within records STATA can compute a log rank test to assess the effect of a time-dependent categorical variable:

sts test posttrans , noshow

Output

Log-rank test for equality of survivor functions

| Events Events

posttrans | observed expected

----------+-------------------------

0 | 30 30.34

1 | 44 43.66

----------+-------------------------

Total | 74 74.00

chi2(1) = 0.01

Pr>chi2 = 0.9187

Note 2:

Also for this data set the command stsplit could be used to create a new dataset with multiple records per individual.

We shall consider stsplit again tomorrow in connection with Poisson regression.

SOME ADDITIONAL FEATURES IN STATA

So far, the following options have been introduced (note: options are placed after the comma):

nohr regression coefficients instead of

hazard ratios in output. (day 2, page 29)

strata() separate baseline hazards for each

category of the stratifying factor(s).

Useful for modeling factors that change

the event rate in a non-proportional

manner. (page 8)

basesurv() the baseline survival function is stored in

the new variate specified by the option.

Used e.g. prior to plotting the survival

curves for specified covariate patterns

(day 2, page 31)

noshow, nolog less important details of the output are

omitted (day 2, page 29, page 7)

tvc(), texp() used to specify a time-dependent

covariate (page 15-16)

schoenfeld(), scaledsch() two types of residuals are

stored in the specified variates. Used e.g. when testing the adequacy of the proportional hazards assumption. (page 7)

Additional options for stcox

robust specifies that so-called robust standard

errors are calculated instead of the usual

model-based standard errors. If an id variable has been set robust is clustered on this variable.

cluster() specifies the variable defining clusters.

Implies use of robust standard errors

clustered on this variable.

estimate forces fitting of the null model, i.e. a model with no covariates. Used mainly to obtain the value of the log-likelihood function for the null model.

offset() specifies a variable that is to be entered directly in the model with a fixed regression coefficient of 1.

level(#) specifies the confidence level in percent

shared(), frailty(), effects options used to define

Cox models with excess variability or correlated survival times. Used e.g. to model family data or data with multiple events per individual.

basehc(), basechazard() used in the same way as

basesurv() before plotting estimated hazards or estimated integrated hazards. Each option adds a new variable to data.

mgale(), esr() used for various diagnostic plots. Different types of residuals are added to data.

Short description of a few other st commands

stdes gives a short data-base description of the

basic features of data defined as survival

time data.

stsum summarizes basic statistical aspects of

data defined as survival time data

stgen generates variables reflecting the entire

history. Advanced version of sts gen

(day 2, page 17)

stci reports confidence intervals for means,

medians and percentiles of survival times

streg A comprehensive set of parametric

survival models can be fitted with this

command

stvary reports which covariates vary over time.

General pre and post-estimation commands

estimate a suite of commands that allow the user to

display and store specific parts of the output. Example:

quietly stcox group ak

estimate table , b se p

produce the following output

---------------------------

Variable | active

-------------+-------------

group | .97443927

| .43556917

| 0.0253

ak | -.25716537

| .54813542

| 0.6390

---------------------------

legend: b/se/p

lrtest used to compute likelihood ratio test for

nested models as was done manually on day 2 page 38. Example

xi: stcox i.group i.sex*i.ak

est store A

xi: stcox i.group

lrtest A

Some useful general modifications of STATA commands

by varname: when placed in front of a command

a separate analysis is done for each category of the specified variable

bysort varname: Similar to by varname, but data are

first sorted on the specified variable

quietly when placed in front of a command the output is suppressed (day 2, page 31)

in range, if logical expression when placed in before

the option comma the analyses is restricted to record in the specified range or records satisfying the logical expression. Examples:

stcox group ak if sex==1

Only males are included in the analysis

stcox group age in 1/50

The analysis is restricted to the first 50 records

COX LITE

In the first period the command sts test was used to compare groups, including stratification on confounding factors using the option strata(varlist).

The output gives the value of the log rank test (or some other specified test statistic) and the corresponding p-value, but no rate ratio estimates.

The command stmc performs similar analyses, but the output focuses on estimates of rate ratios and the corresponding confidence intervals.

The estimates reported are obtained by a method similar to the Mantel-Haenszel approach for analysis of several 2x2 tables (i.e. better than the QaD estimates, but not quite as efficient as maximum-likelihood estimates)

If a single categorical variable is of particular interest, the stmc command therefore provides a simple way to get adjusted estimates of the rate ratio for this variable.

Example

To obtain an estimate of the rate ratio (Glucose relative to Na-lactate) stratified for time and anti-coagulation treatment (ak) use

stmc group , by(ak)

Output:

. stmc group , by(ak)

         failure _d:  it == 1
   analysis time _t:  time

Mantel-Cox comparisons

Mantel-Haenszel estimates of the rate ratio
  comparing group==2 vs. group==1
  controlling for time (by clicks)
  by ak

RR estimate, and lower and upper 95% confidence limits

  +------------------------------+
  |  ak      RR   Lower    Upper |
  |------------------------------|
  |  no   2.524   0.955    6.672 |
  | yes   2.764   0.306   24.977 |
  +------------------------------+

Overall Mantel-Haenszel estimate, controlling for time and ak

  ----------------------------------------------------------
       RR       chi2       P>chi2       [95% Conf. Interval]
  ----------------------------------------------------------
    2.562       4.63       0.0314        1.053        6.232
  ----------------------------------------------------------

Approx chisq for unequal RRs (effect modification) 0.01
(1 df, p = 0.94116)

The results are very similar (but not identical) to those obtained from the Cox regression analysis of the same problem (day 2, page 36). Note, however, that no rate ratio is estimated for the stratifying factor(s). The chi-square test reported is the stratified log rank test.
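To see how the overall estimate relates to the two stratum-specific estimates, the sketch below pools them on the log scale with inverse-variance weights recovered from the reported 95% confidence limits. This is not the Mantel-Haenszel computation that stmc performs; it is a simpler stand-in chosen for illustration, and the helper name log_rr_and_weight is hypothetical. With only two strata the heterogeneity statistic has 1 degree of freedom.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def log_rr_and_weight(rr, lower, upper):
    """Recover log(RR) and an inverse-variance weight from a 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(rr), 1.0 / se**2

# Stratum-specific estimates copied from the stmc output above.
strata = {"ak = no": (2.524, 0.955, 6.672), "ak = yes": (2.764, 0.306, 24.977)}
pairs = [log_rr_and_weight(*v) for v in strata.values()]

total_weight = sum(w for _, w in pairs)
pooled_log_rr = sum(w * b for b, w in pairs) / total_weight
pooled_rr = math.exp(pooled_log_rr)

# Heterogeneity statistic, chi-square with 1 df for two strata.
q = sum(w * (b - pooled_log_rr) ** 2 for b, w in pairs)
p_heterogeneity = 2 * (1 - NormalDist().cdf(math.sqrt(q)))

print(f"pooled RR = {pooled_rr:.3f}, Q = {q:.3f}, p = {p_heterogeneity:.3f}")
```

The pooled value lies close to the overall estimate in the output, and most of the weight comes from the stratum without anti-coagulation treatment, whose confidence interval is much narrower.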

THE SIZE OF THE TRIAL –

HOW MANY PATIENTS DO WE NEED?

STATA’s command for sample size calculations, sampsi, cannot be used for problems with censored data. A few commercial software packages permit power and sample size calculations for survival data, e.g. nQuery and Power and Precision.

Here: Sample size calculations based on the tables in Andersen & Væth (1984).

Set-up

• Randomized clinical trial.

• Survival endpoint.

• Comparison of two treatments

• 1:1 randomization – total study size rather robust to small deviations from this (1:2 – 2:1)

[pic]

Terminology

A = length of accrual period

F = length of follow-up period

n = total number of patients

$\lambda_1$ = hazard rate in treatment group 1

$\lambda_2$ = hazard rate in treatment group 2

Hazard rates assumed constant (i.e. survival according to an exponential distribution).

Effect measure

Hazard rate ratio = measure of treatment difference

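$\theta = \lambda_1 / \lambda_2 = M_2 / M_1 = \mu_2 / \mu_1$

where $\lambda_i$, $M_i$ and $\mu_i$ denote the hazard rate, the median survival time and the mean survival time in treatment group $i$.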

Note: The second and the third equality are valid only if the rates are constant (i.e. when survival times follow an exponential distribution).
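The reason the equalities hold under constant hazards can be sketched in a few lines. The notation is assumed here for illustration: $\lambda_i$ is the hazard rate, $M_i$ the median and $\mu_i$ the mean survival time in treatment group $i$, and $\theta$ is the hazard rate ratio.

```latex
\begin{align*}
S_i(t) &= e^{-\lambda_i t}, \qquad i = 1, 2 \\
S_i(M_i) = \tfrac{1}{2} \;&\Rightarrow\; M_i = \frac{\ln 2}{\lambda_i},
\qquad \mu_i = \int_0^\infty S_i(t)\,dt = \frac{1}{\lambda_i} \\
\theta = \frac{\lambda_1}{\lambda_2} &= \frac{M_2}{M_1} = \frac{\mu_2}{\mu_1}
\end{align*}
```

Both the median and the mean are inversely proportional to the hazard rate, so a hazard rate ratio of 1.5 corresponds to a 50% longer median and mean survival in group 2.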

Usually the value of the hazard rate ratio $\theta$ considered reflects a “clinically relevant” treatment difference, i.e. a treatment effect that we want to detect with a high probability.

SPECIFICATION OF THE TEST

Level of significance = Risk of type 1 error = $\alpha$

Power = 1 - risk of type 2 error = $1 - \beta$

Example: A typical scenario

Design: A = 3 years

F = 3 years

Traditional treatment:

Median survival = M1 = 4 years

New treatment:

Median survival = M2

Clinically relevant difference: a 50% increase in median survival time, to be detected with high probability, i.e.

$\theta = M_2 / M_1 = 1.5$, so that $M_2 = 6$ years

Basic steps in the sample size determination

1. Calculate

Average value of medians

$M = (M_1 + M_2)/2 = (4 + 6)/2 = 5$ years

and

$A/M = 3/5 = 0.6$

$F/A = 3/3 = 1.0$

2. Use Table 6.1 (see page 39)

Find the probability that a randomly selected patient in the trial will die = the average probability of “event” while on study. From Table 6.1

p=0.460

The expected number of events in a trial with a total of n patients:

Expected number = n∙p = n∙0.460
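The table lookup in step 2 can be cross-checked numerically. The sketch below assumes uniform patient entry over the accrual period and exponential survival with median M; the closed-form expression and the function name average_event_probability are assumptions made for illustration and are not taken from Andersen & Væth (1984).

```python
import math

def average_event_probability(a_over_m, f_over_a):
    """Average probability of an event for uniform accrual over A,
    additional follow-up F, and exponential survival with median M."""
    lam_a = a_over_m * math.log(2)   # hazard rate times accrual length
    lam_f = f_over_a * lam_a         # hazard rate times follow-up length
    return 1 - math.exp(-lam_f) * (1 - math.exp(-lam_a)) / lam_a

# Design from the example: A = 3 years, F = 3 years, M = 5 years.
p = average_event_probability(a_over_m=3 / 5, f_over_a=3 / 3)
print(round(p, 3))
```

For A/M = 0.6 and F/A = 1.0 the result agrees with the value p = 0.460 read from Table 6.1 to three decimals.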

3. Use Table 6.2 (see page 39)

Table 6.2 gives the expected number of events needed to achieve the specification defined by $(\theta, \alpha, 1 - \beta)$.

If $\theta = 1.5$, $\alpha = 0.05$ and $1 - \beta = 0.95$, then a total of 320 events are needed.
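A rough check of the number of events is possible with a widely used approximation for the two-sided log rank test, in which the required number of events is proportional to the squared sum of two normal quantiles divided by the squared log hazard ratio. The formula is assumed here rather than taken from the source tables, and the function name events_needed and the parameter alloc (fraction of patients allocated to group 1) are hypothetical.

```python
import math
from statistics import NormalDist

def events_needed(theta, alpha=0.05, power=0.95, alloc=0.5):
    """Approximate number of events for a two-sided log rank test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return (z_alpha + z_power) ** 2 / (alloc * (1 - alloc) * math.log(theta) ** 2)

d = events_needed(theta=1.5)
print(round(d))

# Effect of 1:2 allocation instead of 1:1 on the number of events.
ratio = events_needed(theta=1.5, alloc=1 / 3) / d
print(round(ratio, 3))
```

The approximation gives a value slightly below the 320 events in Table 6.2, which appears to be rounded upwards. Under the same approximation a 1:2 allocation requires 12.5% more events than 1:1, in line with the remark that the total study size is rather robust to moderate deviations from equal allocation.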

4. Putting the pieces together

To determine the total sample size find n from

Expected number of events = 320 = n∙0.460

i.e.

$n = 320 / 0.460 \approx 696$

This corresponds to roughly 10 new patients per month in each treatment group for three years.
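The arithmetic of step 4 can be sketched as follows, using the two table values from steps 2 and 3. The variable names are chosen here for illustration.

```python
import math

events_needed = 320        # Table 6.2: theta = 1.5, alpha = 0.05, power = 0.95
p_event = 0.460            # Table 6.1: A/M = 0.6, F/A = 1.0
accrual_months = 3 * 12    # accrual period A = 3 years

# Total number of patients, rounded up to a whole patient.
n_total = math.ceil(events_needed / p_event)

# Required intake per treatment group per month under 1:1 randomization.
per_group_per_month = n_total / 2 / accrual_months

print(n_total, round(per_group_per_month, 1))
```

The required intake comes out just under 10 patients per month in each treatment group.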

|A/M \ F/A |0.0 |0.5 |1.0 |1.5 |2.0 |2.5 |3.0 |3.5 |4.0 |4.5 |5.0 |
|0.5 |0.155 |0.289 |0.402 |0.497 |0.577 |0.645 |0.701 |0.749 |0.789 |0.822 |0.851 |
|1.0 |0.279 |0.490 |0.639 |0.745 |0.820 |0.872 |0.910 |0.936 |0.955 |0.968 |0.977 |
|1.5 |0.378 |0.630 |0.780 |0.869 |0.922 |0.954 |0.973 |0.984 |0.990 |0.994 |0.997 |
|2.0 |0.459 |0.729 |0.865 |0.932 |0.966 |0.983 |0.992 |0.996 |0.998 |0.999 |0.999 |
|2.5 |0.525 |0.800 |0.916 |0.965 |0.985 |0.994 |0.997 |0.999 |1.000 |1.000 |1.000 |
|3.0 |0.579 |0.851 |0.947 |0.981 |0.993 |0.998 |0.999 |1.000 |1.000 |1.000 |1.000 |
|3.5 |0.624 |0.888 |0.967 |0.990 |0.997 |0.999 |1.000 |1.000 |1.000 |1.000 |1.000 |
|4.0 |0.662 |0.915 |0.979 |0.995 |0.999 |1.000 |1.000 |1.000 |1.000 |1.000 |1.000 |
|4.5 |0.694 |0.936 |0.986 |0.997 |0.999 |1.000 |1.000 |1.000 |1.000 |1.000 |1.000 |
|5.0 |0.720 |0.951 |0.991 |0.998 |1.000 |1.000 |1.000 |1.000 |1.000 |1.000 |1.000 |

|A/M \ F/A |0.0 |0.2 |0.4 |0.6 |0.8 |1.0 |1.2 |1.4 |1.6 |1.8 |2.0 |
|0.2 |0.066 |0.092 |0.112 |0.141 |0.164 |0.187 |0.209 |0.231 |0.252 |0.272 |0.292 |
|0.4 |0.127 |0.174 |0.218 |0.261 |0.300 |0.338 |0.374 |0.408 |0.440 |0.470 |0.498 |
|0.6 |0.182 |0.247 |0.307 |0.363 |0.413 |0.460 |0.503 |0.543 |0.579 |0.613 |0.644 |
|0.8 |0.232 |0.313 |0.385 |0.450 |0.507 |0.559 |0.605 |0.647 |0.684 |0.717 |0.747 |
|1.0 |0.279 |0.372 |0.453 |0.524 |0.586 |0.639 |0.686 |0.727 |0.762 |0.793 |0.820 |
|1.2 |0.321 |0.425 |0.513 |0.588 |0.651 |0.704 |0.750 |0.788 |0.821 |0.848 |0.871 |
|1.4 |0.360 |0.473 |0.566 |0.642 |0.706 |0.757 |0.800 |0.835 |0.865 |0.888 |0.908 |
|1.6 |0.396 |0.516 |0.612 |0.689 |0.751 |0.801 |0.840 |0.872 |0.898 |0.918 |0.934 |
|1.8 |0.429 |0.555 |0.635 |0.730 |0.789 |0.836 |0.872 |0.900 |0.922 |0.940 |0.953 |
|2.0 |0.459 |0.590 |0.689 |0.765 |0.822 |0.865 |0.897 |0.922 |0.941 |0.955 |0.966 |

Table 6.1. Average probability of event as a function of A/M (rows) and F/A (columns). The lower part of the table gives further details for values in the upper left part of the upper table.

|Alpha |0.05 |0.05 |0.05 |0.05 |0.05 |0.01 |0.01 |0.01 |0.01 |0.01 |
|Theta \ power |0.25 |0.50 |0.75 |0.90 |0.95 |0.25 |0.50 |0.75 |0.90 |0.95 |
|1.10 |730 |1700 |3100 |4650 |5750 |1600 |2950 |4700 |6600 |7850 |
|1.20 |200 |465 |840 |1270 |1570 |440 |800 |1280 |1800 |2150 |
|1.30 |96 |225 |410 |615 |760 |215 |390 |615 |865 |1040 |
|1.40 |59 |136 |250 |375 |460 |128 |235 |375 |530 |630 |
|1.50 |41 |94 |170 |260 |320 |88 |165 |260 |365 |435 |
|1.60 |30 |70 |126 |195 |240 |66 |121 |195 |270 |325 |
|1.70 |24 |55 |99 |150 |185 |52 |95 |155 |215 |255 |
|1.80 |20 |45 |81 |122 |151 |42 |77 |123 |175 |210 |
|1.90 |16 |38 |68 |103 |127 |36 |65 |103 |145 |175 |
|2.00 |14 |32 |58 |88 |109 |31 |56 |88 |124 |150 |

Table 6.2. The expected number of events as a function of the treatment effect theta, the significance level alpha and the power (1-beta).

Note:

The sample size calculations can also be used when the hazard rates are not constant as long as the effect measure is the hazard ratio $\theta$.

The expected number of events found in Table 6.2 is still applicable but the relation between the average probability of an event and the design parameters A, F, and M (i.e. Table 6.1) must be modified, since these results are based on calculations that rely on the hazard rates being constant.

Also, for non-constant hazard rates the relationships given above between the hazard ratio $\theta$ and the ratio of medians or ratio of means are no longer correct in general, so the alternative interpretation of the effect measure may not be valid.

A sample size calculation based only on the proportions surviving will be conservative (i.e. give too large a sample size), since the information in the survival times is not fully utilized.

