STATISTICAL ANALYSIS OF SURVIVAL DATA
IN CLINICAL RESEARCH 3
INTERACTIONS – EFFECT MODIFICATION
Consider again the effects of type of infusion and anti-coagulation (AK) treatment on the rate of occurrence of it. In the analysis on day 2 (pages 34-37) the effect of AK-treatment was assumed to be the same in the two infusion groups (described by a single multiplier). Is this a reasonable assumption?
Statistical Model: Two factors with interaction
Reference group: Na-lactate without AK-treatment
Event rate for the reference group: $\lambda_0$
Assume:
* If Na-lactate is replaced by Glucose the rate is changed by a factor $\theta_1$.
* If the patient receives AK treatment the rate is changed by a factor $\theta_2$.
* If the patient receives both AK treatment and Glucose the rate is further changed by a factor $\theta_3$.
Then the event rates in the 4 groups become
| |- anti-coag. |+ anti-coag. |
|Na-lactate |$\lambda_0$ |$\lambda_0\theta_2$ |
|Glucose |$\lambda_0\theta_1$ |$\lambda_0\theta_1\theta_2\theta_3$ |
Effect of AK-treatment in Na-lactate group: $\theta_2$
Effect of AK-treatment in Glucose group: $\theta_2\theta_3$
No interaction corresponds to $\theta_3 = 1$
Using dummy variables to represent the model.
Define $x_1 = 1$ for Glucose (0 otherwise), $x_2 = 1$ for AK treatment (0 otherwise) and $x_3 = x_1 x_2$
|Infusion |Anti-coag. |$x_1$ |$x_2$ |$x_3$ |
|Na-lactate |no |0 |0 |0 |
|Na-lactate |yes |0 |1 |0 |
|Glucose |no |1 |0 |0 |
|Glucose |yes |1 |1 |1 |
The model may then be written as
$\lambda = \lambda_0 \exp(\beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3)$
where $\beta_j = \ln\theta_j$, $j = 1, 2, 3$
No interaction corresponds to $\beta_3 = 0$
NOTE: Interaction means that the effect of one variable on the occurrence of events depends on the value of another variable (i.e. the effect is modified by the other variable).
STATA commands (the two versions produce the same output)
xi:stcox i.group i.ak i.group*i.ak
xi: stcox i.group*i.ak
Output (selected parts only):
i.group _Igroup_1-2 (naturally coded; _Igroup_1 omitted)
i.ak _Iak_0-1 (naturally coded; _Iak_0 omitted)
i.group*i.ak _IgroXak_#_# (coded as above)
************ output omitted here **************
No. of subjects = 85 Number of obs = 85
No. of failures = 27
Time at risk = 9532
LR chi2(3) = 6.64
Log likelihood = -98.231844 Prob > chi2 = 0.0845
--------------------------------------------------------------
_t |Haz. Ratio Std. Err. z P>|z| [95% Conf. Int]
-------------+------------------------------------------------
_Igroup_2 | 2.2942 1.052591 1.81 0.070 .9334595 5.63855
_Iak_1 |.3840848 .4105318 -0.90 0.371 .0472731 3.12061
_IgroXak_2_1 |2.904121 3.630981 0.85 0.394 .2504779 33.6713
--------------------------------------------------------------
Comments to output
1. Note that the interpretation of the parameters is now different: the hazard ratio for _Igroup_2 describes the difference between Na-lactate and Glucose among patients without AK. The difference for patients with AK is the product of the hazard ratios for _Igroup_2 and _IgroXak_2_1.
A similar change in interpretation applies to the hazard ratio for _Iak_1.
2. The ratio of rate ratios estimated for the interaction term (_IgroXak_2_1) is not significantly different from 1 (p = 0.39). The likelihood ratio test statistic is 0.81 (computed as 2[98.6349 - 98.2318], see also day 2, page 36), which on 1 degree of freedom gives the same conclusion. The assumption of no interaction can therefore not be rejected.
3. Note however the extremely wide confidence interval for the interaction term (0.25 to 33.7). Based on 27 events among 85 patients very little can be said about interactions.
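The arithmetic behind these comments can be checked by hand. The following is a minimal Python sketch (not part of the STATA session) that uses only the numbers printed in the output above. It assumes that the standard error reported for a hazard ratio is the standard error of the log hazard ratio multiplied by the hazard ratio, which is consistent with the printed confidence limits. All variable names are chosen for illustration.

```python
import math

# Estimates copied from the stcox output above.
hr_group = 2.2942     # _Igroup_2: Glucose vs Na-lactate among patients without AK
hr_ak = 0.3840848     # _Iak_1: AK vs no AK in the Na-lactate group
hr_inter = 2.904121   # _IgroXak_2_1: interaction (ratio of rate ratios)
se_inter = 3.630981   # standard error printed for the interaction hazard ratio

# Rates in the four groups relative to the reference group (Na-lactate, no AK).
relative_rate = {
    ("Na-lactate", "no AK"): 1.0,
    ("Na-lactate", "AK"): hr_ak,
    ("Glucose", "no AK"): hr_group,
    ("Glucose", "AK"): hr_group * hr_ak * hr_inter,
}

# Effects within the other level of the second factor.
glucose_effect_with_ak = hr_group * hr_inter   # Glucose vs Na-lactate, AK patients
ak_effect_in_glucose = hr_ak * hr_inter        # AK vs no AK, Glucose patients

# Likelihood ratio test of no interaction (1 degree of freedom).
lr_stat = 2 * (98.6349 - 98.2318)
# Upper tail of the chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x/2)).
lr_p = math.erfc(math.sqrt(lr_stat / 2))

# 95% confidence interval for the interaction, computed on the log scale.
# Assumption: se(log HR) = se(HR) / HR.
se_log = se_inter / hr_inter
z = 1.959964
ci_low = math.exp(math.log(hr_inter) - z * se_log)
ci_high = math.exp(math.log(hr_inter) + z * se_log)

for group, rate in relative_rate.items():
    print(group, round(rate, 3))
print("Glucose effect among AK patients:", round(glucose_effect_with_ak, 2))
print("AK effect in Glucose group:", round(ak_effect_in_glucose, 2))
print("LR statistic:", round(lr_stat, 2), "p-value:", round(lr_p, 2))
print("95% CI for interaction:", round(ci_low, 3), "to", round(ci_high, 2))
```

The confidence limits computed this way agree with those in the output, which supports the assumption about how the standard error is reported.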
CHECKING THE VALIDITY OF THE MODEL - MORE
PREVIOUSLY (day 2, page 32-33): Checking the validity of a model with a single covariate.
HERE: Checking the validity of a model with several explanatory variables.
Example
Consider a model in which the risk of developing it depends on type of infusion, AK-treatment and sex of the patient.
The variables group, ak and sex are covariates in the model.
xi: stcox i.group i.ak i.sex
This model defines 8 ($= 2 \times 2 \times 2$) different subgroups of patients. In the model the 8 different event rates are assumed to be proportional.
Checking proportionality:
For each factor (group, ak and sex) consider the model in which this factor has been removed as a covariate and instead included as a stratification.
In a stratified Cox regression model a separate baseline hazard rate is estimated for each stratum.
A log-minus-log plot for a stratified Cox model will show if the proportionality between levels of the stratifying factor is reasonable after correction for the remaining factors.
STATA’s command stphplot has an option which allows adjustment for other factors.
STATA commands for checking proportionality in the present example. Each variable in turn is used for stratification while adjusting for the remaining variables.
stphplot , by(group) adjust(ak sex) nolntime
stphplot , by(ak) adjust(group sex) nolntime
stphplot , by(sex) adjust(group ak) nolntime
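Output
Stratified on treatment groups:
[Figure: log-minus-log plot by group, adjusted for ak and sex]
Stratified on anti-coagulation treatment:
[Figure: log-minus-log plot by ak, adjusted for group and sex]
Stratified on patient’s sex:
[Figure: log-minus-log plot by sex, adjusted for group and ak]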
Testing the proportional hazards assumption with STATA.
STATA can perform a formal test of the proportional hazards assumption based on so-called Schoenfeld residuals (overall test) and scaled Schoenfeld residuals (separate test for each variable in the model). These residuals must be saved in the stcox command which fits the model to be validated.
Example
stcox group ak sex , ///
sch(res*) sca(sres*) nolog noshow
stphtest , detail
Options nolog noshow minimize output from stcox. Alternative: To omit output from stcox completely add quietly in front.
OUTPUT from stphtest
Test of proportional hazards assumption
Time: Time
-----------------------------------------------------
| rho chi2 df Prob>chi2
-------------+---------------------------------------
group | -0.16714 0.66 1 0.4176
ak | -0.20866 1.29 1 0.2562
sex | 0.03580 0.03 1 0.8553
-------------+---------------------------------------
global test | 1.84 3 0.6057
-----------------------------------------------------
Alternative procedures:
Use time-dependent covariates to obtain a formal test of the proportional hazards assumption.
Note: The stcoxkm , by() command (day 2, page 33) allows only one variable at a time, so this command is less useful for models with several variables.
STRATIFIED COX REGRESSION MODELS
Use of stratified Cox models
If the effect of an important factor on the survival time is inadequately described by proportional hazards we may instead consider a model in which this factor enters as a stratifying factor with separate baseline hazards and the remaining factors are usual covariates.
The output contains no regression coefficient (or hazard ratio) for a stratifying factor, but the separate baseline hazards may be plotted and compared.
In STATA this model is specified with the option strata(varnames). Up to 5 stratifying factors are allowed.
Example
xi: stcox i.group , strata(sex)
COX REGRESSION WITH MANY COVARIATES
Typical Dataset:
For each patient: a waiting time t, a status d and socio-demographic, clinical and other variables, some of which may have impact on the prognosis.
Some problems:
1. How should the collected information be represented as covariates? (i.e. choice of categories, coding of variables, transformations of data etc.).
2. How many and which variables should be included?
3. Which variables should be investigated in detail, including scoring of information and interactions?
4. How should the results be presented?
Missing values
For some patients relevant information is missing. Only patients with complete information on the variables in the model are included in the analysis.
Missing values may therefore complicate the analysis and the interpretation of the results considerably.
SELECTION OF VARIABLES
Two main strategies:
1. Forward selection
2. Backward elimination
Forward selection
1. Start out with no variables in the model
2. Check each variable separately and include the variable that is most statistically significant.
3. Check the remaining variables one at a time together with the variables selected so far. Add the most significant to the model.
4. Continue this procedure until no new variable is statistically significant.
Backward elimination
1. Start out with all variables in the model
2. Check each variable and exclude the most non-significant.
3. Re-estimate the effects of the remaining parameters and remove the variable which is now most non-significant.
4. Continue until all remaining variables are statistically significant.
Hybrid methods exist – stepwise procedures – that allow both inclusion and removal of variables in each step.
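The two strategies above can be sketched as follows. This is a minimal Python illustration of the selection logic only, not of STATA's sw command. The function p_value is a placeholder for "fit the model and return the p-value of this variable given the others", and the p-values in toy_p_value are hypothetical numbers invented for the illustration.

```python
def forward_selection(candidates, p_value, alpha=0.05):
    """Add the most significant remaining variable until none has p < alpha."""
    selected, remaining = [], list(candidates)
    while remaining:
        # p-value of each remaining variable when added to the current model
        pvals = {v: p_value(selected, v) for v in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_elimination(candidates, p_value, alpha=0.05):
    """Remove the least significant variable until all remaining have p < alpha."""
    selected = list(candidates)
    while selected:
        # p-value of each variable given the other variables still in the model
        pvals = {v: p_value([u for u in selected if u != v], v) for v in selected}
        worst = max(pvals, key=pvals.get)
        if pvals[worst] < alpha:
            break
        selected.remove(worst)
    return selected


def toy_p_value(others, variable):
    """Hypothetical p-values, for illustration only (not from any dataset)."""
    table = {"group": 0.02, "age": 0.04, "ak": 0.60, "sex": 0.80}
    # Pretend that age is no longer significant once group is in the model.
    if variable == "age" and "group" in others:
        return 0.20
    return table[variable]


variables = ["group", "ak", "sex", "age"]
forward = forward_selection(variables, toy_p_value)
backward = backward_elimination(variables, toy_p_value)
print("forward:", forward)
print("backward:", backward)
```

Both functions treat all variables alike, which is exactly the drawback of automatic procedures: nothing in the loop reflects prior knowledge about which variables matter.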
STATA
In STATA the command sw can be placed in front of any regression command, including stcox. Options are used to specify details of the variable selection procedure.
Examples
//backward elimination, sig. level 0.1, Wald’s test
sw stcox group ak sex age , pr(0.1)
//forward selection, sig. level 0.05, Wald’s test, sex and age evaluated together
sw stcox group ak (sex age) , pe(0.05)
//backward stepwise, based on likelihood ratio test
sw stcox group ak sex age , pr(0.1) pe(0.05) lr
//forward stepwise, hierarchical (=specified order)
sw stcox group ak sex age , pr(0.1) pe(0.05) forward hier
//backward elimination, group and ak forced in model
sw stcox (group ak) sex age , pr(0.1) lock
Forward selection or Backward elimination?
Both approaches have drawbacks:
* In forward selection all conclusions are based on comparison of "wrong" models.
* Backward elimination may often be infeasible because of too many missing values.
* Both methods are automatic and treat all variables in the same way.
Conclusion?
No nice and easy solution exists. An order of priority of the variables based on knowledge and insight is a must.
Recommendation
Never rely (completely) on automatic variable selection procedures.
PRESENTATION OF RESULTS
A table with only variable names and p-values says nothing about the size or direction of the effects and is therefore clearly inadequate.
Regression coefficients (with standard errors and p-values) are rather uninteresting if you forget to explain how the variables are coded in the analysis.
The rate ratios, $e^{\beta}$ or $\exp(\beta)$, are usually easier to understand than the regression coefficients.
If the “final” model includes covariates $x_1, \ldots, x_k$, one may compute an individual prognosis for a future patient with covariate values $x_1, \ldots, x_k$.
For instance, the 5-year survival probability for such a patient can be estimated as
$\hat{S}(5) = \hat{S}_0(5)^{\exp(PI)}$,
where PI is the value of the prognostic index for the patient
$PI = \hat{\beta}_1 x_1 + \cdots + \hat{\beta}_k x_k$,
and $\hat{S}_0(t)$ is the estimated survival function for the reference group.
A table giving an estimate of the 5-year survival or of the median survival time for patients with certain characteristics may also be a useful way to illustrate the implications of the model.
Formulas for standard errors of these quantities are available, but the calculations are not included in the standard statistical software packages.
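As an illustration of an individual prognosis, the following Python sketch computes the prognostic index and raises the baseline survival to the power exp(PI), the usual relation under proportional hazards. The baseline survival, the coefficients and the covariate values are hypothetical numbers chosen for the example, not estimates from these notes.

```python
import math

def prognosis(baseline_survival, coefficients, covariates):
    """Return (PI, survival probability) under a proportional hazards model.

    baseline_survival: estimated survival of the reference group at the
                       time point of interest (e.g. 5 years)
    coefficients:      estimated regression coefficients
    covariates:        covariate values of the patient, in the same order
    """
    pi = sum(b * x for b, x in zip(coefficients, covariates))
    return pi, baseline_survival ** math.exp(pi)

# Hypothetical values, for illustration only.
s0_five_years = 0.70
coefficients = [0.5, -0.3, 0.02]
patient = [1, 1, 10]

pi, s_five_years = prognosis(s0_five_years, coefficients, patient)
print("PI =", round(pi, 2))
print("estimated 5-year survival =", round(s_five_years, 3))

# A patient in the reference group (all covariates 0) gets the baseline survival.
_, s_reference = prognosis(s0_five_years, coefficients, [0, 0, 0])
print("reference group =", s_reference)
```

A positive PI lowers the survival probability relative to the reference group and a negative PI raises it.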
Other possibilities
A plot of the estimated survival function for patients with a particular covariate pattern.
A plot of the estimated 5-years survival probability or the median survival time against the prognostic index.
STATA
To obtain plots of survival function, integrated hazard or (smoothed) hazard rates for particular values of the covariates use the command stcurve after stcox. The corresponding baseline function must be specified and saved with stcox. Example:
quietly stcox group ak age , basesurv(surv0)
stcurve , survival at(group=1 ak=1 age=50)
After each fit stcox saves a large number of results in system variables that may be accessed and used for further calculations. Example:
mat define rc=e(b)
gen pi=rc[1,1]*(group==2)+rc[1,2]*(ak==1)+rc[1,3]*age
COX REGRESSION
TIME-DEPENDENT COVARIATES
In a standard Cox regression the user has two options:
1. The effect of a covariate is described by a single number, a rate ratio, giving the change of the rate if the covariate is increased by one unit.
For a dichotomous variate this is just the ratio of the rates in the two categories.
2. A covariate is used as a stratifying factor. The rate ratio between rates in different strata then becomes an unspecified function of follow-up time.
With time-dependent covariates the rate ratio may depend on follow-up time in a specified way. This may e.g. be used to obtain a statistical test of the proportional hazards assumption.
Example:
Occurrence of it in heart infarct patients (continued).
Based on a log-minus-log plot (see day 2 page 33, page 5 above) we concluded that the rates in the two infusion groups could be assumed to be proportional.
STATA also has a command, stphtest, giving a statistical test of the hypothesis of proportional hazard rates (see page 7).
An alternative test of the proportional hazards assumption can be established using a time-dependent covariate.
The idea is to fit a model in which the log(rate ratio) depends linearly on follow-up time and in this model test if the trend with follow-up time is significantly different from 0. This is accomplished using the option tvc.
Example
The following STATA command fits a model in which the regression coefficient (i.e. the log(hazard ratio)) is a linear function of follow-up time. The nolog option is included to omit output from the iterative estimation process.
xi: stcox i.group , tvc(i.group) nolog
Output
Cox regression -- Breslow method for ties
No. of subjects = 85 Number of obs = 85
No. of failures = 27
Time at risk = 9532
LR chi2(2) = 6.76
Log likelihood = -98.167651 Prob > chi2 = 0.0340
--------------------------------------------------------------
_t |Haz. Ratio Std. Err. z P>|z| [95% Conf. Int.]
----------+---------------------------------------------------
rh |
_Igroup_2 | 8.243608 9.967054 1.74 0.081 .770832 88.1607
----------+---------------------------------------------------
t |
_Igroup_2 | .9865033 .0134885 -0.99 0.320 .9604174 1.01300
--------------------------------------------------------------
Note: Second equation contains variables that continuously vary with respect to time; variables are interacted with current values of _t.
Comments:
1. The results show that the difference between the groups decreases with increasing follow-up time (the hazard ratio is less than 1 for _Igroup_2 in the section with time-varying estimates). This tendency is, however, not statistically significant (p = 0.32), so the data are consistent with the hypothesis of proportional rates in the two groups.
2. The regression coefficient estimated for group (in the first section) is considerably larger than before, since the parameter now gives the hazard ratio at the start of follow-up (the intercept at time = 0); the negative trend with time therefore introduces this change. Note also that the standard error of this estimate is large, since the shortest follow-up time is 28.5 hours, so time 0 lies outside the range of the observed times.
3. Since this analysis does not reject proportional hazards one would then go on to fit the simpler model without time-dependence, i.e. the analysis shown on day 2, page 29-31.
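To see what the fitted model implies, the hazard ratio at follow-up time t can be computed from the two estimates in the output. The Python sketch below assumes the default time function of tvc(), i.e. that the covariate is multiplied by the follow-up time itself, as stated in the note under the output. The time points are arbitrary and chosen for illustration.

```python
import math

# Estimates copied from the output above.
hr_at_zero = 8.243608   # _Igroup_2 in the first section: hazard ratio at time 0
hr_per_hour = 0.9865033 # _Igroup_2 in the second section: change per hour

def hazard_ratio(t):
    """Hazard ratio (Glucose vs Na-lactate) at follow-up time t in hours."""
    return hr_at_zero * hr_per_hour ** t

for t in (0, 28.5, 80, 150):
    print("t =", t, "hours: hazard ratio =", round(hazard_ratio(t), 2))

# Time at which the fitted hazard ratio has dropped to 1.
crossing_time = -math.log(hr_at_zero) / math.log(hr_per_hour)
print("hazard ratio equals 1 after about", round(crossing_time), "hours")
```

The large value at time 0 is an extrapolation; within the observed follow-up the fitted hazard ratios are of the same order as the overall estimate from the model without time-dependence.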
Occasionally, one may want to consider other forms of dependence on time, e.g. log(time). STATA has an option texp which allows specification of this.
Example
xi: stcox i.group , tvc(i.group) texp(ln(_t))
Other uses of time dependent covariates
1. Time-dependent categorical variables
In some studies one may want to assess if occurrence of other events during follow-up, e.g. a specific complication or treatment, influences the subsequent rate of occurrence of the endpoint. Example: survival of heart transplant patients measured from enrollment in the study.
2. Effect of covariates that are updated during follow-up
In some studies we want to describe effect of diagnostic variables that are measured regularly during follow-up. Rather than studying the prognostic value of the baseline measurement, we may want to estimate a “dynamic effect” using the latest available measurement.
3. Cumulative exposure
In some studies we want to evaluate the effect of an exposure that accumulates during follow-up.
Example: In a study of cancer mortality among radar operators in the army one may want to relate the mortality rate at time t to the accumulated dose received prior to time t (perhaps allowing for some latency).
Warning
Interpretation of the results of analyses with time-dependent covariates may be complex.
TIME-DEPENDENT VARIABLES WITH STATA
STATA’s approach to fitting models with time-dependent covariates differs from most other statistical programs.
Most programs (incl. SPSS) use a special fitting algorithm which is considerably slower than the fitting algorithm used with no time-dependent covariates.
STATA, on the other hand, splits the follow-up time of each individual up into intervals on which the time-dependent covariate is constant, and the usual algorithm is therefore applicable for the new, larger dataset. This approach is advantageous for some problems, but not for others.
STATA’s approach is well suited for problems involving time-dependent categorical variables and covariates that are updated during follow-up, but models with accumulating exposure are often easier to define and analyze using the approach used by other programs.
The following examples explain the approach used by STATA in the analysis with a time-dependent categorical variable.
Example 1: Intravenous drip (continued)
The standard Cox regression analysis showed that the it rate in the Glucose group was a factor 2.69 larger than the it rate in the Na-lactate group.
Above we used the tvc() option to assess if this effect varied with time.
Alternatively, we may split the follow-up period in two (or several) intervals, estimate an effect for each interval, and evaluate whether the effect varies significantly with follow-up period, i.e. perform a test of no interaction between treatment group and follow-up period.
The following commands define the data as survival time data and split the follow-up period at 80 hours.
stset time , failure(it==1) id(ptnr)
stsplit period , at(80)
Note:
• A variable identifying the individual patients (here ptnr) must be specified before splitting the data,
• A variable name (here period) must be given to the variate identifying the follow-up interval.
Output from stsplit
(56 observations (episodes) created)
The 56 patients with more than 80 hours of follow-up are now represented by two records (lines) in the file: one for the first period and one for the second period.
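The effect of the splitting on a single follow-up history can be sketched in Python as follows. This only illustrates the idea and is not STATA's implementation; the three patients are hypothetical. The period value of each episode is the start of the interval (0 or 80), matching the values period==0 and period==80 used in the commands below.

```python
def split_follow_up(time, event, cut=80):
    """Split one follow-up history at `cut` hours.

    Returns a list of episodes (start, stop, event, period). The event
    indicator is carried by the last episode only; an episode that ends
    at the cut point ends without an event.
    """
    if time <= cut:
        return [(0, time, event, 0)]
    return [(0, cut, 0, 0), (cut, time, event, cut)]

# Hypothetical patients: (follow-up time in hours, event indicator).
patients = [(50, 1), (120, 1), (140, 0)]

episodes = []
for time, event in patients:
    episodes.extend(split_follow_up(time, event))

for episode in episodes:
    print(episode)
```

Only patients followed for more than 80 hours contribute a second record, which is why the number of new records equals the number of such patients.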
We may now consider
1. xi: stcox i.group
2. xi: stcox i.group if period==0
3. xi: stcox i.group if period==80
4. xi: stcox i.period*i.group
The first command is just to check that the standard comparison is unchanged by the splitting.
The last command evaluates the interaction between period and group, i.e. provides a test of the proportional hazards assumption.
Output (only the final parts)
command 1
--------------------------------------------------------------
_t | Haz.Ratio Std. Err. z P>|z| [95% Conf. Int]
----------+---------------------------------------------------
_Igroup_2 | 2.689008 1.171293 2.27 0.023 1.14504 6.314854
command 2
--------------------------------------------------------------
_t | Haz.Ratio Std. Err. z P>|z| [95% Conf. Int]
----------+---------------------------------------------------
_Igroup_2 | 3.079946 1.778889 1.95 0.051 .992918 9.553728
--------------------------------------------------------------
command 3
--------------------------------------------------------------
_t | Haz.Ratio Std. Err. z P>|z| [95% Conf. Int]
----------+---------------------------------------------------
_Igroup_2 | 2.202925 1.506434 1.15 0.248 .576659 8.415503
--------------------------------------------------------------
command 4
--------------------------------------------------------------
_t | Haz.Ratio Std. Err. z P>|z| [95% Conf. Int]
----------+---------------------------------------------------
Iperiod_80| 2.71834 . . . . .
_Igroup_2 | 3.079946 1.778889 1.95 0.051 .992918 9.553728
IperXgr_~2| .715248 .6402239 -0.37 0.708 .123748 4.13403
--------------------------------------------------------------
Comments
Command 1:
We get the same result as in the analysis of the data before splitting the follow-up times (day 2, page 30). On the full follow-up period the hazard ratio is estimated to be 2.69.
Command 2 & 3:
As anticipated, the hazard ratio is slightly larger, 3.08, in the first period (0 – 80 hours) than in the second period, 2.20.
Command 4:
The first parameter (_Iperiod_80) is redundant and should be ignored.
The second parameter (_Igroup_2) gives the hazard ratio between the two treatments in the first period (cf. command 2).
The third parameter (_IperXgro_~2) is the ratio of the hazard ratio in the second period to that in the first period: 0.715 = 2.20/3.08, and the p-value (0.708) refers to the test of the hypothesis of no interaction between period and group, i.e. that the ratio of the hazard ratios is 1, or the hypothesis that the hazard ratio is independent of time. We see that the data do not contradict this hypothesis.
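The relation between the period-specific estimates and the interaction term can be verified from the printed output. The Python sketch below assumes that the two period-specific estimates are independent and that the standard error printed for a hazard ratio equals the standard error of the log hazard ratio times the hazard ratio. Both assumptions are consistent with the numbers in the output, but they are not stated there.

```python
import math

# Estimates copied from the output of commands 2 and 3.
hr_first, se_first = 3.079946, 1.778889     # period 0-80 hours
hr_second, se_second = 2.202925, 1.506434   # period after 80 hours

# Ratio of the hazard ratios (second period relative to first period).
ratio = hr_second / hr_first

# Standard error of log(ratio), assuming independent estimates and
# se(log HR) = se(HR) / HR.
se_log_ratio = math.sqrt((se_first / hr_first) ** 2 + (se_second / hr_second) ** 2)

# Wald test of the hypothesis that the ratio is 1.
z = math.log(ratio) / se_log_ratio
p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value from the normal distribution

print("ratio of hazard ratios:", round(ratio, 3))
print("z =", round(z, 2), "p =", round(p, 3))
```

The ratio, the z-value and the p-value reproduce the line for the interaction term in the output of command 4.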
Example 2: Stanford Heart Transplant data
The file stanford.dta contains data on the survival of 103 heart transplant patients accepted for surgery at Stanford University Hospital from late 1967 to early 1974. In the present dataset all patients are followed until April 1 1974.
Once accepted a patient was put on a waiting list and a search started for a suitable donor heart. If the patient was still alive a transplant was performed when a donor heart was found. A few patients were transferred out of the program before receiving a transplant.
Problem:
Does a heart transplant prolong the life of a patient?
The situation can be represented by the following figure:
[pic]
Note that transplant status is time-dependent. Comparing the survival of patients who received a new heart with those who did not is therefore not meaningful.
All patients are initially in the state “Accepted” waiting for a heart transplant. If a patient survives long enough he/she receives a new heart and therefore changes transplant status.
The data
In the file stanford.dta the follow-up history of a patient is captured by the following variables
id          patient identification number
wait        time (in days) from entry until surgery; wait is 0 if no transplant was performed
transplant  1 if the patient received a transplant, 0 otherwise
stime       time (in days) from entry until death or April 1 1974
died        1 if dead, 0 otherwise
Some examples of follow-up histories
. list id wait transplant stime died ///
if id==5|id==20|id==50|id==100|id==102
+--------------------------------------+
| id wait transp~t stime died |
|--------------------------------------|
14. | 102 0 0 11 0 |
20. | 5 0 0 18 1 |
23. | 20 1 1 28 1 |
31. | 100 38 1 39 0 |
93. | 50 83 1 979 1 |
+--------------------------------------+
A simple model for the effect of a heart transplant on survival/mortality:
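The effect of a transplant at some point in time s is to change the mortality rate at time t ($\lambda(t)$) by a factor $\theta$, i.e. the rate is $\lambda(t)$ for $t < s$ and $\theta\lambda(t)$ for $t \ge s$.
A value of $\theta$ less/greater than 1 reflects a better/worse prognosis after a heart transplant.
By introducing a time-dependent dummy variable
$x(t) = 0$ for $t < s$ and $x(t) = 1$ for $t \ge s$
the mortality rate can be represented in the usual form $\lambda_0(t)\exp(\beta x(t))$ with $\lambda_0(t) = \lambda(t)$ and $\beta = \ln\theta$.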
For a patient who survives the waiting period and receives a transplant the covariate is 0 from acceptance until transplant and from this time on the covariate is 1.
For a patient who dies while waiting for a suitable transplant the covariate is 0 all the time.
The problem can therefore be reduced to a problem involving a time-constant covariate if the follow-up time for each patient is split at the time of transplant.
Creating the new dataset
In the new dataset a patient receiving a transplant will have two records, a pre-transplant record and a post-transplant record, whereas a patient not receiving a transplant has only one record.
The following STATA commands set up the new dataset.
expand 2 if transplant
* a duplicate record is inserted if transplant is
* 1, i.e. the file now has two identical records for
* patients who had a transplant
bysort id: gen recno=_n
* the data are sorted on id and a new variable,
* recno, giving the record number (1 or 2) is created
gen posttrans=recno-1
* a variable, posttrans, taking the value 0 in the
* first record and 1 in the second record is created
by id: gen survtime=stime if _n==_N
* survtime is a new variable to be used as time
* variable in the Cox regression.
* _n is a system variable giving the number of the
* current record for the given patient.
* _N is a system variable giving the total number of
* records for a given patient.
* For patients with one record (_N=1), i.e. who never
* received a transplant, survtime is equal to stime.
* For patients with two records (_N=2) survtime is
* set equal to stime in the last record.
by id: replace survtime=wait if _n==1 & transplant
* For patients with two records survtime is set
* equal to wait in the first record.
by id: replace died=0 if _n==1 & transplant
* For patients with two records died is equal to 0
* in the first record (they survived the first period
* and received a transplant).
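The result of these steps can be illustrated on the five follow-up histories listed earlier. The Python sketch below mimics the logic of the STATA commands and is not a replacement for them. The column start is added here only for readability; in STATA the start of an interval is derived by stset from the previous record of the same patient.

```python
# (id, wait, transplant, stime, died), copied from the listing above.
patients = [
    (102, 0, 0, 11, 0),
    (5, 0, 0, 18, 1),
    (20, 1, 1, 28, 1),
    (100, 38, 1, 39, 0),
    (50, 83, 1, 979, 1),
]

def expand_patient(pid, wait, transplant, stime, died):
    """Return one record per period with constant transplant status."""
    if not transplant:
        # never transplanted: a single record with posttrans = 0
        return [{"id": pid, "posttrans": 0, "start": 0,
                 "survtime": stime, "died": died}]
    # transplanted: a pre-transplant record ending at wait without death,
    # and a post-transplant record ending at stime with the final status
    return [
        {"id": pid, "posttrans": 0, "start": 0, "survtime": wait, "died": 0},
        {"id": pid, "posttrans": 1, "start": wait, "survtime": stime, "died": died},
    ]

records = []
for patient in patients:
    records.extend(expand_patient(*patient))

for record in records:
    print(record)
```

Each transplanted patient contributes time at risk with posttrans = 0 while waiting and with posttrans = 1 afterwards, so the waiting time is never counted as time after transplant.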
The new and expanded data set should now be declared as survival time data. Since some patients have more than one record the option id is used to identify records belonging to the same patient.
stset survtime , failure(died) id(id)
Output
. stset survtime , failure(died) id(id)
id: id
failure event: died != 0 & died < .
obs. time interval: (survtime[_n-1], survtime]
exit on or before: failure
--------------------------------------------------------------
172 total obs.
2 multiple records at same instant PROBABLE ERROR
(survtime[_n-1]==survtime)
--------------------------------------------------------------
170 obs. remaining, representing
102 subjects
74 failures in single failure-per-subject data
31933 total analysis time at risk, at risk from t = 0
earliest observed entry t = 0
last observed exit t = 1799
We have a problem!
Inspection of the data reveals that patient 38 has two records (line 60 and 61). He first waits for 5 days, then has a transplant, but dies on the same day.
This is perfectly OK, except that STATA and other programs use the following convention in case of ties: first events, then censorings and finally entries.
Patient 38 therefore dies in the second period before he enters that period and STATA complains.
The following fix solves the problem for now
replace survtime=5.1 in 61
After redoing the stset command we are now ready to perform the Cox regression analysis.
stset survtime , failure(died) id(id)
stcox age posttrans surgery year, nolog noshow
The model includes four covariates
age age at acceptance
posttrans the (time-dependent) transplant status
surgery 1 if the patient had previously had surgery,
0 otherwise
year year of acceptance
Output
Cox regression -- Breslow method for ties
No. of subjects = 102 Number of obs = 170
No. of failures = 74
Time at risk = 31933
LR chi2(4) = 17.73
Log likelihood = -284.81589 Prob > chi2 = 0.0014
--------------------------------------------------------------
_t |Haz. Ratio Std. Err. z P>|z| [95% Conf. Int]
----------+---------------------------------------------------
age | 1.032286 .0145523 2.25 0.024 1.00416 1.06121
posttrans | .8951127 .2775564 -0.36 0.721 .487458 1.64368
surgery | .3779578 .1651275 -2.23 0.026 .160531 .889873
year | .8872802 .0602063 -1.76 0.078 .776788 1.01349
--------------------------------------------------------------
Older patients have higher rates, patients do better over time and patients with previous surgery do better. Whether a patient ultimately receives a heart transplant does not seem to make much difference.
Note 1
Once the data have been reorganized and expanded so that covariates are constant within records STATA can compute a log rank test to assess the effect of a time-dependent categorical variable:
sts test posttrans , noshow
Output
Log-rank test for equality of survivor functions
| Events Events
posttrans | observed expected
----------+-------------------------
0 | 30 30.34
1 | 44 43.66
----------+-------------------------
Total | 74 74.00
chi2(1) = 0.01
Pr>chi2 = 0.9187
Note 2:
Also for this data set the command stsplit could be used to create a new dataset with multiple records per individual.
We shall consider stsplit again tomorrow in connection with Poisson regression.
SOME ADDITIONAL FEATURES IN STATA
So far, the following options have been introduced (note: options are placed after the comma):
nohr        regression coefficients instead of hazard ratios in output (day 2, page 29)
strata()    separate baseline hazards for each category of the stratifying factor(s). Useful for modeling factors that change the event rate in a non-proportional manner (page 8)
basesurv()  the baseline survival function is stored in the new variate specified by the option. Used e.g. prior to plotting the survival curves for specified covariate patterns (day 2, page 31)
noshow, nolog   less important details of the output are omitted (day 2, page 29; page 7)
tvc(), texp()   used to specify a time-dependent covariate (page 15-16)
schoenfeld(), scaledsch()   two types of residuals are stored in the specified variates. Used e.g. when testing the adequacy of the proportional hazards assumption (page 7)
Additional options for stcox
robust      specifies that so-called robust standard errors are calculated instead of the usual model-based standard errors. If an id variable has been set, robust is clustered on this variable.
cluster()   specifies the variable defining clusters. Implies use of robust standard errors clustered on this variable.
estimate    forces fitting of the null model, i.e. a model with no covariates. Used mainly to obtain the value of the log-likelihood function for the null model.
offset()    specifies a variable that is to be entered directly in the model with a fixed regression coefficient of 1.
level(#)    specifies the confidence level in percent
shared(), frailty(), effects   options used to define Cox models with excess variability or correlated survival times. Used e.g. to model family data or data with multiple events per individual.
basehc(), basechazard()   used in the same way as basesurv() before plotting estimated hazards or estimated integrated hazards. Each option adds a new variable to data.
mgale(), esr()   used for various diagnostic plots. Different types of residuals are added to data.
Short description of a few other st commands
stdes gives a short data-base description of the
basic features of data defined as survival
time data.
stsum summarizes basic statistical aspects of
data defined as survival time data
stgen generates variables reflecting the entire
history. Advanced version of sts gen
(day 2, page 17)
stci reports confidence intervals for means,
medians and percentiles of survival times
streg A comprehensive set of parametric
survival models can be fitted with this
command
stvary reports which covariates vary over time.
General pre and post-estimation commands
estimate a suite of commands that allow the user to
display and store specific parts of the output. Example:
quietly stcox group ak
estimate table , b se p
produces the following output
---------------------------
Variable | active
-------------+-------------
group | .97443927
| .43556917
| 0.0253
ak | -.25716537
| .54813542
| 0.6390
---------------------------
legend: b/se/p
lrtest used to compute likelihood ratio test for
nested models as was done manually on day 2 page 38. Example
xi: stcox i.group i.sex*i.ak
est store A
xi: stcox i.group
lrtest A
Some useful general modifications of STATA commands
by varname:   when placed in front of a command, separate analyses are done for each category of the specified variable
bysort varname:   similar to by varname, but data are first sorted on the specified variable
quietly   when placed in front of a command the output is suppressed (day 2, page 31)
in range, if logical expression   when placed before the option comma, the analysis is restricted to records in the specified range or records satisfying the logical expression. Examples:
stcox group ak if sex==1
Only males are included in the analysis
stcox group age in 1/50
The analysis is restricted to the first 50
records
COX LITE
In the first period the command sts test was used to compare groups, including stratification on confounding factors using the option strata(varlist).
The output gives the value of the log rank test (or some other specified test statistic) and the corresponding p-value, but no rate ratio estimates.
The command stmc performs similar analyses, but the output focuses on estimates of rate ratios and the corresponding confidence intervals.
The estimates reported are obtained by a method similar to the Mantel-Haenszel approach for analysis of several 2x2 tables (i.e. better than the QaD estimates, but not quite as efficient as maximum-likelihood estimates).
If a single categorical variable is of particular interest, the stmc command therefore provides a simple way to get adjusted estimates of a rate ratio for this variable.
Example
To obtain an estimate of the rate ratio (Glucose relative to Na-lactate) stratified for time and anti-coagulation treatment (ak), use
stmc group , by(ak)
Output:
. stmc group , by(ak)
failure _d: it == 1
analysis time _t: time
Mantel-Cox comparisons
Mantel-Haenszel estimates of the rate ratio
comparing group==2 vs. group==1
controlling for time (by clicks)
by ak
RR estimate, and lower and upper 95% confidence limits
+------------------------------+
| ak RR Lower Upper |
|------------------------------|
| no 2.524 0.955 6.672 |
| yes 2.764 0.306 24.977 |
+------------------------------+
Overall Mantel-Haenszel estimate, controlling for time and
ak
----------------------------------------------------------
RR chi2 P>chi2 [95% Conf. Interval]
----------------------------------------------------------
2.562 4.63 0.0314 1.053 6.232
----------------------------------------------------------
Approx chisq for unequal RRs (effect modification) 0.01
(1 df, p = 0.94116)
The results are very similar (but not identical) to those obtained from the Cox regression analysis of the same problem (day 2, page 36). Note, however, that no rate ratio is estimated for the stratifying factor(s). The chi-square test reported is the stratified log rank test.
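The Mantel-Haenszel idea of pooling stratum-specific comparisons into one rate ratio can be sketched as follows. This is a simplified Python illustration for strata summarized by events and person-time. It does not reproduce stmc, which in addition stratifies on time, and the numbers are hypothetical and not taken from the infusion data.

```python
def mh_rate_ratio(strata):
    """Mantel-Haenszel rate ratio for person-time data.

    Each stratum is a tuple
    (events_exposed, time_exposed, events_unexposed, time_unexposed).
    """
    num = 0.0
    den = 0.0
    for a, t1, b, t0 in strata:
        total = t1 + t0
        num += a * t0 / total   # exposed events weighted by unexposed time
        den += b * t1 / total   # unexposed events weighted by exposed time
    return num / den

# Hypothetical strata (NOT the infusion data); the rate ratio is 2 in both
toy = [(10, 100.0, 5, 100.0),
       (4, 50.0, 6, 150.0)]
print(mh_rate_ratio(toy))
```

When the stratum-specific rate ratios are equal, as in this toy example, the pooled estimate equals the common value. Otherwise it is a weighted compromise between them.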
THE SIZE OF THE TRIAL –
HOW MANY PATIENTS DO WE NEED?
STATA’s command for sample size calculations, sampsi, cannot be used for problems with censored data. A few commercial software packages permit power and sample size calculations for survival data, e.g. nQuery and Power and Precision.
Here: Sample size calculations based on the tables in Andersen & Væth (1984).
Set-up
• Randomized clinical trial.
• Survival endpoint.
• Comparison of two treatments.
• 1:1 randomization. The total study size is rather robust to small deviations from this (1:2 to 2:1).
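[Figure: time line of the trial, with an accrual period (patient intake) of length A followed by a follow-up period of length F]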
Terminology
A = length of accrual period
F = length of follow-up period
n = total number of patients
λ1 = hazard rate in treatment group 1
λ2 = hazard rate in treatment group 2
Hazard rates assumed constant (i.e. survival according to an exponential distribution).
Effect measure
Hazard rate ratio = measure of treatment difference
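θ = λ1/λ2 = M2/M1 = (mean survival time in group 2)/(mean survival time in group 1), where M1 and M2 are the median survival times in the two groups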
Note: The second and the third equality are valid only if the rates are constant (i.e. when survival times follow an exponential distribution).
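The reason why the hazard ratio equals the inverse ratio of medians and of means under constant rates can be written out briefly. The symbols below (λ for the hazard rate, M for the median, μ for the mean, indices 1 and 2 for the treatment groups) are notation chosen for this sketch.

```latex
\begin{align*}
S_i(t) &= e^{-\lambda_i t}, \qquad i = 1, 2 \\
S_i(M_i) = \tfrac{1}{2} \;&\Longrightarrow\; M_i = \frac{\ln 2}{\lambda_i} \\
\mu_i = \int_0^\infty S_i(t)\,dt \;&\Longrightarrow\; \mu_i = \frac{1}{\lambda_i} \\
\frac{\lambda_1}{\lambda_2} &= \frac{M_2}{M_1} = \frac{\mu_2}{\mu_1}
\end{align*}
```

Both the median and the mean are inversely proportional to the rate, so a 50% increase in median survival corresponds to a hazard ratio of 1.5 when the rates are constant.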
Usually the value of the hazard ratio θ considered reflects a “clinically relevant” treatment difference, i.e. a treatment effect that we want to detect with a high probability.
SPECIFICATION OF THE TEST
Level of significance = Risk of type 1 error = α
Power = 1 - risk of type 2 error = 1 - β
Example: A typical scenario
Design: A = 3 years
F = 3 years
Traditional treatment:
Median survival = M1 = 4 years
New treatment:
Median survival = M2
Clinically relevant difference: a 50% increase in median survival time, to be detected with high probability, i.e.
M2 = 1.5∙M1 = 6 years, corresponding to a hazard ratio of 1.5
Basic steps in the sample size determination
1. Calculate
Average value of medians
M = (M1 + M2)/2 = (4 + 6)/2 = 5 years
and
A/M = 3/5 = 0.6
F/A = 3/3 = 1.0
2. Use Table 6.1 (see page 39)
Find the probability that a randomly selected patient in the trial will die = the average probability of “event” while on study. From Table 6.1
p=0.460
The expected number of events in a trial with a total of n patients:
Expected number = n∙p = n∙0.460
3. Use Table 6.2 (see page 39)
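Table 6.2 gives the expected number of events needed to achieve the specification defined by the treatment effect theta, the significance level alpha and the power (1-beta).
If theta = 1.5, alpha = 0.05 and power = 0.95, then a total of 320 events are needed.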
4. Putting the pieces together
To determine the total sample size find n from
Expected number of events = 320 = n∙0.460
i.e.
n = 320/0.460 ≈ 696 patients,
i.e. about 10 new patients per month in each treatment group for three years (696/2 = 348 patients per group over 36 months).
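The four steps above can be collected in a few lines of arithmetic. The Python sketch below only repeats the calculation of the example. The two table look-ups (0.460 from Table 6.1 and 320 from Table 6.2) are entered by hand, and the variable names are chosen for this illustration.

```python
import math

# Design and assumptions of the example
A, F = 3.0, 3.0          # accrual and follow-up period (years)
M1 = 4.0                 # median survival, traditional treatment (years)
M2 = 1.5 * M1            # 50% increase in median survival

# Step 1: average median and the two ratios used to enter Table 6.1
M = (M1 + M2) / 2
a_over_m = A / M
f_over_a = F / A

# Steps 2 and 3: values read from the tables (entered by hand)
p_event = 0.460          # Table 6.1 at A/M = 0.6, F/A = 1.0
events_needed = 320      # Table 6.2 at theta = 1.5, alpha = 0.05, power = 0.95

# Step 4: total sample size and the implied accrual rate
n_total = math.ceil(events_needed / p_event)
per_group_per_month = n_total / 2 / (A * 12)

print(f"A/M = {a_over_m:.1f}, F/A = {f_over_a:.1f}")
print(f"n = {n_total}, about {per_group_per_month:.1f} patients per group per month")
```

Rounding n up rather than to the nearest integer is a choice made here so that the expected number of events does not fall below the target.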
|A/M \ F/A |0.0 |0.5 |1.0 |1.5 |2.0 |2.5 |3.0 |3.5 |4.0 |4.5 |5.0 |
|0.5 |0.155 |0.289 |0.402 |0.497 |0.577 |0.645 |0.701 |0.749 |0.789 |0.822 |0.851 |
|1.0 |0.279 |0.490 |0.639 |0.745 |0.820 |0.872 |0.910 |0.936 |0.955 |0.968 |0.977 |
|1.5 |0.378 |0.630 |0.780 |0.869 |0.922 |0.954 |0.973 |0.984 |0.990 |0.994 |0.997 |
|2.0 |0.459 |0.729 |0.865 |0.932 |0.966 |0.983 |0.992 |0.996 |0.998 |0.999 |0.999 |
|2.5 |0.525 |0.800 |0.916 |0.965 |0.985 |0.994 |0.997 |0.999 |1.000 |1.000 |1.000 |
|3.0 |0.579 |0.851 |0.947 |0.981 |0.993 |0.998 |0.999 |1.000 |1.000 |1.000 |1.000 |
|3.5 |0.624 |0.888 |0.967 |0.990 |0.997 |0.999 |1.000 |1.000 |1.000 |1.000 |1.000 |
|4.0 |0.662 |0.915 |0.979 |0.995 |0.999 |1.000 |1.000 |1.000 |1.000 |1.000 |1.000 |
|4.5 |0.694 |0.936 |0.986 |0.997 |0.999 |1.000 |1.000 |1.000 |1.000 |1.000 |1.000 |
|5.0 |0.720 |0.951 |0.991 |0.998 |1.000 |1.000 |1.000 |1.000 |1.000 |1.000 |1.000 |

|A/M \ F/A |0.0 |0.2 |0.4 |0.6 |0.8 |1.0 |1.2 |1.4 |1.6 |1.8 |2.0 |
|0.2 |0.066 |0.092 |0.112 |0.141 |0.164 |0.187 |0.209 |0.231 |0.252 |0.272 |0.292 |
|0.4 |0.127 |0.174 |0.218 |0.261 |0.300 |0.338 |0.374 |0.408 |0.440 |0.470 |0.498 |
|0.6 |0.182 |0.247 |0.307 |0.363 |0.413 |0.460 |0.503 |0.543 |0.579 |0.613 |0.644 |
|0.8 |0.232 |0.313 |0.385 |0.450 |0.507 |0.559 |0.605 |0.647 |0.684 |0.717 |0.747 |
|1.0 |0.279 |0.372 |0.453 |0.524 |0.586 |0.639 |0.686 |0.727 |0.762 |0.793 |0.820 |
|1.2 |0.321 |0.425 |0.513 |0.588 |0.651 |0.704 |0.750 |0.788 |0.821 |0.848 |0.871 |
|1.4 |0.360 |0.473 |0.566 |0.642 |0.706 |0.757 |0.800 |0.835 |0.865 |0.888 |0.908 |
|1.6 |0.396 |0.516 |0.612 |0.689 |0.751 |0.801 |0.840 |0.872 |0.898 |0.918 |0.934 |
|1.8 |0.429 |0.555 |0.635 |0.730 |0.789 |0.836 |0.872 |0.900 |0.922 |0.940 |0.953 |
|2.0 |0.459 |0.590 |0.689 |0.765 |0.822 |0.865 |0.897 |0.922 |0.941 |0.955 |0.966 |
Table 6.1. Average probability of event as a function of A/M and F/A. The lower part of the table gives further details for values in the upper left part of the upper table.
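For combinations of A/M and F/A that fall between the tabulated values, the average probability of an event can be computed directly. The Python sketch below rests on assumptions that the notes do not state explicitly: patients enter uniformly over the accrual period, everyone is followed until the end of the study, and survival is exponential with median M in the trial as a whole. Under these assumptions the closed form reproduces the value 0.460 used in the example. The function name is made up for this illustration.

```python
import math

def average_event_probability(a_over_m, f_over_a):
    """Average probability of an event, given the ratios A/M and F/A.

    ASSUMES uniform accrual over [0, A], follow-up of all patients to
    time A + F, and exponential survival with median M.
    """
    lam_a = math.log(2) * a_over_m     # hazard rate times A
    lam_f = lam_a * f_over_a           # hazard rate times F
    # average survival probability over follow-up times between F and A + F
    avg_survival = math.exp(-lam_f) * (1.0 - math.exp(-lam_a)) / lam_a
    return 1.0 - avg_survival

print(round(average_event_probability(0.6, 1.0), 3))   # design of the example
```

With this function the design parameters A and F can be varied to see how a longer follow-up period reduces the number of patients needed for a given number of events.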
|Alpha |0.05 | | | | |0.01 | | | | |
|Power |0.25 |0.50 |0.75 |0.90 |0.95 |0.25 |0.50 |0.75 |0.90 |0.95 |
|Theta | | | | | | | | | | |
|1.10 |730 |1700 |3100 |4650 |5750 |1600 |2950 |4700 |6600 |7850 |
|1.20 |200 |465 |840 |1270 |1570 |440 |800 |1280 |1800 |2150 |
|1.30 |96 |225 |410 |615 |760 |215 |390 |615 |865 |1040 |
|1.40 |59 |136 |250 |375 |460 |128 |235 |375 |530 |630 |
|1.50 |41 |94 |170 |260 |320 |88 |165 |260 |365 |435 |
|1.60 |30 |70 |126 |195 |240 |66 |121 |195 |270 |325 |
|1.70 |24 |55 |99 |150 |185 |52 |95 |155 |215 |255 |
|1.80 |20 |45 |81 |122 |151 |42 |77 |123 |175 |210 |
|1.90 |16 |38 |68 |103 |127 |36 |65 |103 |145 |175 |
|2.00 |14 |32 |58 |88 |109 |31 |56 |88 |124 |150 |
Table 6.2. The expected number of events as a function of the treatment effect theta, the significance level alpha and the power (1-beta).
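For values of theta, alpha or power not covered by Table 6.2, a widely used approximation for a two-sided test with 1:1 randomization gives the required number of events as 4(z1 + z2)²/(ln theta)², where z1 and z2 are the standard normal quantiles for 1 - alpha/2 and for the power. Whether Table 6.2 was computed in exactly this way is an assumption. The formula gives values close to the table entries, which appear to be rounded upwards. The function name below is made up for this Python sketch.

```python
import math
from statistics import NormalDist

def required_events(theta, alpha=0.05, power=0.95):
    """Approximate total number of events for a two-sided test, 1:1 allocation.

    ASSUMES the normal approximation for the log hazard ratio.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / math.log(theta) ** 2

for theta, alpha, power in [(1.5, 0.05, 0.95), (2.0, 0.05, 0.90)]:
    d = required_events(theta, alpha, power)
    print(f"theta = {theta}, alpha = {alpha}, power = {power}: {d:.1f} events")
```

For theta = 1.5, alpha = 0.05 and power 0.95 the approximation gives about 316 events, compared with 320 in the table.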
Note:
The sample size calculations can also be used when the hazard rates are not constant as long as the effect measure is the hazard ratio theta.
The expected number of events found in Table 6.2 is still applicable but the relation between the average probability of an event and the design parameters A, F, and M (i.e. Table 6.1) must be modified, since these results are based on calculations that rely on the hazard rates being constant.
Also, for non-constant hazard rates the relationships given above between the hazard ratio and the ratio of medians or ratio of means are no longer correct in general, so the alternative interpretation of the effect measure may not be valid.
A sample size calculation based only on the proportions surviving will be conservative (i.e. give too large a sample size), since the information in the survival times is not fully utilized.