SURVIVAL DATA ANALYSIS - LIFE TABLE METHODS

Statistics 581

Wei Zhang, Synergic Resources Corporation

Key words: Censored observations, Cohort, Hazard function.

Abstract

Although SAS PC DOS version 6.04 was released some time ago, its LIFETEST procedure, and especially the life table option, may still be new to some SAS statistical users. This tutorial focuses on life table techniques for estimating the survival function, the hazard function, and lifetime (median residual lifetime). The use of the SAS LIFETEST procedure to create life tables is discussed along with a detailed example of its application. The example comes from a utility conservation program, but the technique can be used in many other areas, such as pharmaceutical research, marketing research, and demographic research. The key concept of censored observations in survival data analysis is introduced and the SAS coding techniques are provided. The distinction between the medical (clinical) life table and the population (demographic) life table is also discussed.

Introduction

Although life table methods have been widely used by demographers and medical researchers, they are not well known to many others. Some researchers use parametric approaches for every kind of survival data estimation without realizing that nonparametric methods, such as life tables, can be good alternatives because they are easy to use and to understand.

It is inappropriate to think that parametric methods are always better than nonparametric methods. Nonparametric, or distribution-free, methods are less efficient than parametric methods when survival times follow a theoretical distribution and more efficient when no suitable theoretical distribution is known (Elisa T. Lee, 1992). According to Lee, nonparametric methods should be used to analyze survival data before attempting to fit a theoretical distribution. In the following sections, some general concepts of survival analysis and its applications are introduced. Life table methods and the SAS LIFETEST procedure are then described in detail.

What Is Survival Data Analysis?

Survival data analysis centers on the period of time it takes a group of individuals (persons or things) to reach a predefined event, called failure. Survival data can come from any type of subject: the duration of employment in a company, the length of a stock market upswing, or the survival times of patients in a clinical trial. In many studies, researchers follow subjects until one of their characteristics disappears. However, the study may be completed before the characteristic disappears, or subjects may withdraw in the middle of the study. In these cases, the survival times are incomplete. Incomplete survival times are called censored times. Completed survival times are called event times. All survival analysis techniques take into account the fact that not all subjects being studied have complete survival times at the point when the data are collected. The techniques of survival analysis therefore differ from conventional statistical methods, whether parametric or nonparametric, because survival data are almost always incomplete.

The concept of censored observations is essential to understanding survival analysis. Figure 1 illustrates how censored observations are defined.

As illustrated in Figure 1, observations A and B have completed their lifetimes (e.g., A started in 1981 and failed in 1983). For observation C, however, the lifetime was not completed at the time of data collection in 1993 (the survey cut off date), i.e., observation C was still alive. Thus, observation C is a censored observation. Although observation C's complete lifetime was not available, it should not be excluded from the survival analysis, because estimating a lifetime for a group requires the total number exposed to a risk of failure (this includes both failures and censored observations). In other words, any observation exposed to a risk of failure, for the purpose of this study, has either a complete or an incomplete lifetime history and must be included in the survival analysis. Including censored observations in calculating a group lifetime is the unique feature of survival analysis. In the above example, the nine years (1984-1993) of life history from observation C contribute information that is useful in estimating a group lifetime.

NESUG '93 Proceedings

SAS has three procedures for survival analysis: LIFETEST, LIFEREG, and PHREG. The LIFETEST procedure uses nonparametric methods, including life tables. LIFEREG fits parametric models, and PHREG fits the semiparametric Cox proportional hazards model. Primarily, LIFETEST is used to estimate the survival time, while LIFEREG and PHREG are used to investigate the factors affecting the survival time.

Figure 1

An Example of Censored Observations

[Figure: lifetimes of measures A, B, and C plotted against year.]

Where Can Survival Analysis Be Used?

Survival analysis can be used in many different areas. Medical researchers may use it to study patients' survival time after a medical treatment. Marketing researchers at a telephone company or a credit card company can use survival analysis to estimate the average length of stay of their customers. The example given in this paper comes from a research project for a utility company. The subject studied was the water heater tank wrap, one kind of insulation measure from an energy conservation program. The methodology used in this study can be applied to many other types of studies involving survival data.

Life Table Methods

There are two general types of life tables: the population (demographic) life table and the medical (clinical) life table. In SAS LIFETEST, the life table option produces a medical life table by default. Each type can be further classified as a cohort (longitudinal) life table or a period (cross-sectional) life table. In the example given in this paper, a period life table was used for the analysis. The main reason for this choice is that the insulation measures covered by the survey were installed from 1981 to 1988 and therefore do not belong to the same cohort (see Note 1). The period life table can include eight years of data in one life table and generate meaningful statistical estimates from these observations even though the data come from different cohorts. The period life table is a mathematical model of the life history of a hypothetical cohort. The key assumption of the period life table is that the hazard function (a point estimate of age-specific failure experience) during the current time represents the failure experience of the whole cohort. That is, we assume that the failure rate of a measure that is "x" years old now is the same failure rate that measures "(x-1)" years old today will have next year, when they become "x" years old.

The following statistical estimates are generated in the life table analysis: survivorship function, probability density function, hazard function, effective sample size, and conditional probability of failure. Definitions of these statistical terms can be found in Appendix A. As mentioned before, the key concept in survival analysis is the censored observation, so it is important to understand how life table methods handle censored observations. The effective sample size takes care of the censoring problem. As can be seen from its definition, the effective sample size equals the actual sample size minus one half of the censored observations in the corresponding interval. Censored observations are subtracted from the actual sample size because, from the time they are censored, they are no longer in a position to provide useful information for survival estimation. Before that time, however, they do provide such information, which is why censored observations cannot be excluded from the survival data at the outset. Only one half of the censored observations is subtracted because the actual and the effective sample sizes are point estimates: sample sizes in survival data are normally defined at the mid-point of each age interval. Thus, if censored observations are symmetrically distributed within each survival year interval, only one half of them needs to be subtracted from the actual sample size to obtain the effective sample size.
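The arithmetic behind these estimates can be sketched in a few lines. The Python fragment below is an illustration only, not the SAS code of Appendix B. It assumes the standard actuarial definitions (effective sample size = number entering minus half the censored; hazard and density evaluated at the interval mid-point), which may differ in detail from the definitions in Appendix A. The function name `life_table` and the input counts are hypothetical.

```python
def life_table(n_start, counts, width=1.0):
    """Build life table rows from per-interval (failed, censored) counts."""
    rows = []
    entering = n_start      # units still at risk at the start of the interval
    survival = 1.0          # survivorship function at the start of the interval
    for i, (failed, censored) in enumerate(counts):
        # Effective sample size: each censored unit counts as half an exposure.
        effective = entering - censored / 2.0
        # Conditional probability of failure within the interval.
        q = failed / effective if effective > 0 else 0.0
        p = 1.0 - q
        rows.append({
            "start": i * width,
            "entering": entering,
            "effective": effective,
            "cond_fail": q,
            "survival": survival,
            "density": survival * q / width,          # at the interval mid-point
            "hazard": 2.0 * q / (width * (1.0 + p)),  # at the interval mid-point
        })
        survival *= p                   # carry survival to the next interval
        entering -= failed + censored   # both failures and censored units leave
    return rows


# Hypothetical data: 100 units, then (failed, censored) for years 0-1, 1-2, 2-3.
table = life_table(100, [(2, 10), (4, 8), (6, 20)])
for row in table:
    print(row)
```

In the first interval of this made-up data set, 10 censored units reduce the 100 units entering to an effective sample size of 95, so the conditional probability of failure is 2/95 rather than 2/100.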

SAS LIFETEST Procedure

PROC LIFETEST has a life table option, LT (specified as METHOD=LT). If this option is not specified, PROC LIFETEST uses the product-limit method for survival estimation by default. The product-limit estimate can be considered a special case of the life table estimate in which each interval contains only one observation. The basic input data for constructing a life table are three dates: the date of installation (starting date), the date of removal (ending date), and the date of data collection (survey cut off date). The survival history contributed to the life table by an event observation is obtained by subtracting the date of installation from the date of removal; the survival history contributed by a censored observation is obtained by subtracting the date of installation from the date of data collection. The detailed SAS code for a life table is provided in Appendix B. As can be seen from the SAS code, all one has to do in life table methods is to make the durations of survival time available for both events and censored observations. These durations are then used as the time variable in the TIME statement that follows the PROC LIFETEST statement. The PLOTS option in LIFETEST can produce not only plots of the estimated survival function, hazard function, and density function against time, but also a plot of the negative log of the estimated survival function against time (by specifying LS) and a plot of the log of the negative log of the estimated survival function against log time (by specifying LLS). The LS and LLS plots provide an empirical check of the appropriateness of the exponential model and the Weibull model, respectively, for the survival data. The LLS plot is also useful for checking the proportional hazards assumption before attempting to use PHREG. The OUTSURV= option names an output data set that contains the survival estimates together with their 95% confidence limits.
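The derivation of durations from the three dates can be illustrated as follows. This is a Python sketch rather than the SAS code of Appendix B. The function name `survival_record`, the coding of the censoring flag (1 = censored, 0 = event), the conversion of days to years by dividing by 365.25, and the exact calendar days are all assumptions made for the illustration; only the years for observations A and C come from Figure 1.

```python
from datetime import date


def survival_record(installed, removed, cutoff):
    """Return (duration in years, censoring flag) for one measure.

    removed is None when the measure was still in place at the cut off date.
    Flag coding is an assumption: 1 = censored, 0 = event (failure).
    """
    if removed is not None and removed <= cutoff:
        end, censored = removed, 0   # complete lifetime: installation to removal
    else:
        end, censored = cutoff, 1    # incomplete lifetime: installation to cut off
    duration_years = (end - installed).days / 365.25
    return duration_years, censored


cutoff = date(1993, 1, 1)   # survey cut off date (day and month are assumed)

# Observation A: installed 1981, failed 1983 (event).
print(survival_record(date(1981, 1, 1), date(1983, 1, 1), cutoff))
# Observation C: installed 1984, still in place in 1993 (censored).
print(survival_record(date(1984, 1, 1), None, cutoff))
```

The two values returned for each record correspond to the duration variable and the censoring indicator that a TIME statement expects.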

Results of the Survival Analysis

From these life tables, we are able to examine both the level of measure retention and the patterns related to this retention in greater detail than is possible through simpler univariate analysis techniques.

Table 1 summarizes the major statistical estimates from the life table. More detailed findings can be found in Appendix C. Both point and interval estimates of the survivorship function and hazard function are provided. The point estimates can be used as a framework for comparison with estimates of measure life developed by engineers. The interval estimates can serve as a range within which the true measure life lies. It should be noted that both the upper and lower confidence limits for the life table estimates were calculated under the assumption of a normal distribution at a 95% confidence level. However, a survival distribution is often skewed and departs from a normal distribution. Because the survival distribution may not be exactly normal, the level of confidence associated with the interval estimates (i.e., the range within which the true measure life lies) may deviate slightly from 95%. Caution should therefore be taken when using the interval estimates for planning and/or forecasting.
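For readers who want to see how normal-approximation limits of this kind arise, the Python sketch below computes the survivorship function with 95% limits of the form estimate ± 1.96 × standard error. The paper does not state which standard error was used; Greenwood's formula is assumed here because it is the usual choice for life tables. The function name `survival_with_limits` and the input values are hypothetical.

```python
import math


def survival_with_limits(effective_sizes, cond_fail_probs, z=1.96):
    """Survival at the start of each interval with normal-approximation limits.

    Assumes Greenwood's formula for the standard error (an assumption,
    not something stated in the paper).
    """
    results = []
    survival = 1.0
    greenwood_sum = 0.0   # running sum of q / (n' * p) over earlier intervals
    for n_eff, q in zip(effective_sizes, cond_fail_probs):
        se = survival * math.sqrt(greenwood_sum)
        results.append((survival, survival - z * se, survival + z * se))
        p = 1.0 - q
        if p > 0:
            greenwood_sum += q / (n_eff * p)
        survival *= p
    return results


# Hypothetical effective sample sizes and conditional failure probabilities.
limits = survival_with_limits([95.0, 84.0, 66.0], [2 / 95, 4 / 84, 6 / 66])
for estimate, lower, upper in limits:
    print(round(estimate, 3), round(lower, 3), round(upper, 3))
```

Because the limits are symmetric around the estimate, they can fall outside the range 0 to 1 or cover the true value less often than 95% when the distribution is skewed, which is the reason for the caution expressed above.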

Table 1. Survival Analysis for the Water Heater Tank Wrap

Year          Survival      Year           Hazard
(Interval)    Function      (Mid Point)    Function

0-1           1.000         0.5            0.002
1-2           0.998         1.5            0.010
2-3           0.988         2.5            0.028
3-4           0.961         3.5            0.018
4-5           0.944         4.5            0.038
5-6           0.909         5.5            0.045
6-7           0.869         6.5            0.052
7-8           0.825         7.5            0.048
8-9           0.786         8.5            0.061
9-10          0.740         9.5            0.076
10-11         0.689         10.5           0.118
11-12         0.609         11.5           0.095
12-13         0.554         12.5           0.138
13-14         0.482         13.5           0.566

The following are the most important results from the life table analysis:

• Median Residual Lifetime
• Mortality Patterns
• Probability of Survival

• Median Residual Lifetime

A very important statistic estimated from the life table is the median residual lifetime, also known as the median future lifetime. This is the amount of time that elapses before the number of at-risk units is reduced to one half. For this study, the estimate is available only for the first three years, because data were available for a limited number of years and a relatively high proportion of these measures were observed to be still in place. The median residual lifetime can be used as a proxy for the water heater tank wrap lifetime. See Table 1 in Appendix C for details.
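The calculation can be sketched in Python by linear interpolation within the interval in which the survival function falls to half of its value at the starting age. The sketch uses the rounded survival values from Table 1, so its output is only approximate and need not match the figures in Appendix C. The function name `median_residual_lifetime` is hypothetical.

```python
# Survival function at the start of each yearly interval (rounded, from Table 1).
TIMES = list(range(14))
SURVIVAL = [1.000, 0.998, 0.988, 0.961, 0.944, 0.909, 0.869,
            0.825, 0.786, 0.740, 0.689, 0.609, 0.554, 0.482]


def median_residual_lifetime(start_index, times=TIMES, survival=SURVIVAL):
    """Years until survival drops to half of its value at times[start_index].

    Returns None when the table ends before that level is reached.
    """
    target = survival[start_index] / 2.0
    for j in range(start_index, len(survival) - 1):
        if survival[j] >= target > survival[j + 1]:
            width = times[j + 1] - times[j]
            # Linear interpolation inside the interval that contains the target.
            fraction = (survival[j] - target) / (survival[j] - survival[j + 1])
            return times[j] + width * fraction - times[start_index]
    return None


for i in range(5):
    print(TIMES[i], median_residual_lifetime(i))
```

With these rounded values the sketch gives roughly 12.75 years at installation and returns None from year 3 onward, because half of 0.961 lies below the last tabulated survival value of 0.482. This is in line with the estimate being available only for the first three years.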

• Mortality Patterns

Using a life table statistic called the hazard function, it is possible to identify trends in measure failure over the life of a measure. In other words, at what time during the life of a measure is failure most likely to occur? Mortality patterns were explored using the instantaneous failure rates for each age interval, as represented by the hazard function. The value of the hazard function for each year can be compared with more traditional engineering estimates.

Figure 2

Hazard Function h(t) Estimates

[Figure: h(t) plotted against year for the tank wrap.]

Mortality patterns are illustrated in Figure 2. The failure rate for the tank wrap increases with age, with some fluctuations. These fluctuations are not easy to explain without prior knowledge about the tank wrap. In general, more sophisticated modelling techniques may be needed to examine the fluctuations in the hazard function further.

• Probability of Survival

The survivorship function generated in the life tables provides the probability that, at any given time following installation, the measure will still be in place. These probabilities are referred to as "Survival Function Estimates" and are presented in Table 1 and Figure 3.

Figure 3

Survival Function S(t) Estimates

[Figure: S(t) plotted against year for the tank wrap.]

Tips and Pitfalls

• Measure Retention by Income Group

A comparative analysis can be conducted with life tables by dividing the data into two (or more) mutually exclusive strata. In the given example, a separate survival analysis was performed for high and low income groups. To do this, a new variable called 'group' was created, and the STRATA statement in PROC LIFETEST was used with the 'group' variable. The SAS code for this analysis is also in Appendix B. In this example, a family income below $30,000 was defined as the low income group. A family income greater than or equal to $30,000 was considered to be the high income group. Comparisons were made in terms of survival level and pattern.
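The construction of the 'group' variable amounts to a single threshold test. The Python sketch below illustrates it and splits the records into strata that could each be fed to a separate life table. The record layout (income, duration, censoring flag) and the function names are hypothetical; only the $30,000 cut-off comes from the text.

```python
from collections import defaultdict

LOW_INCOME_CUTOFF = 30_000   # dollars; incomes below this are "low"


def income_group(family_income):
    """Assign a stratum label from family income."""
    return "low" if family_income < LOW_INCOME_CUTOFF else "high"


def split_by_group(records):
    """Split (income, duration, censored) records into mutually exclusive strata."""
    strata = defaultdict(list)
    for income, duration, censored in records:
        strata[income_group(income)].append((duration, censored))
    return dict(strata)


# Hypothetical records: (family income, duration in years, 1 = censored).
records = [(22_000, 9.0, 1), (30_000, 4.5, 0), (55_000, 7.0, 1), (18_500, 2.0, 0)]
print(split_by_group(records))
```

Each record lands in exactly one stratum, which is what makes the two survival curves directly comparable.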

As illustrated in Figure 4, the survival curves by income group are similar. However, the low income group has higher survival rates.

Figure 4

Survival Function S(t) by Income Group

[Figure: S(t) plotted against year for the low income and high income groups.]

• Missing Values

It is not uncommon for survival data to contain missing values. Most often, a portion of the failure dates are missing. One commonly used solution is to take the average duration of survival time among the events whose failure dates are not missing and use it as a proxy to impute the missing values. For example, suppose the average duration of survival time for events with non-missing failure dates is six years (use PROC MEANS, with the VAR variable being the difference between the date of failure and the date of installation for the events). One can then replace each missing failure date with the date of installation plus six years. Caution should be taken when a relatively large portion of events has missing values. If 30% of events have missing failure dates and a six-year lifetime is used to impute them, the hazard function can be biased (the age-specific failure rate at age six and a half will go up sharply). In this case, interpretations of the mortality pattern (hazard function) should be made carefully. A simple simulation model can be used to avoid destroying the mortality pattern. Instead of adding six years to all observations with a missing failure date, one can give some of them more than six years, some less, and some exactly six, so that the overall average of the 'imputation years' is still six. With this method, the mortality pattern can remain the same as it would be with no missing values, assuming that the missing failure dates are evenly distributed.
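One way to spread the imputed lifetimes around the average is sketched below in Python. It is a deterministic stand-in for the simulation described above, not the author's procedure: imputed durations are generated in symmetric pairs (mean minus an offset, mean plus the same offset) so that their average stays exactly at the mean. The function name `spread_imputed_durations`, the maximum offset of two years, and the sample data are all assumptions.

```python
def spread_imputed_durations(mean_duration, n_missing, max_offset=2):
    """Return n_missing imputed durations whose average equals mean_duration.

    max_offset should be smaller than mean_duration so that every
    imputed duration stays positive.
    """
    durations = []
    for pair in range(n_missing // 2):
        offset = pair % (max_offset + 1)   # cycles through 0, 1, ..., max_offset
        durations.append(mean_duration - offset)
        durations.append(mean_duration + offset)
    if n_missing % 2 == 1:
        durations.append(mean_duration)    # an odd leftover gets the mean itself
    return durations


# Hypothetical durations (years) for events with known failure dates.
observed_event_durations = [3.0, 5.0, 6.0, 7.0, 9.0]
mean_duration = sum(observed_event_durations) / len(observed_event_durations)

imputed = spread_imputed_durations(mean_duration, n_missing=7)
print(imputed, sum(imputed) / len(imputed))
```

Each imputed duration is then added to the date of installation to obtain the imputed failure date. Spreading the values over several age intervals avoids the artificial spike that a single imputed value would create in one interval of the hazard function.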

• Which Type of Life Table Should Be Used?

Depending on what kind of survival data are available, one can decide whether a population life table or a medical life table is more appropriate. For population life tables, a complete history of death data is needed. In other words, to use a population life table, there should be no censored observations at the survey cut off date (or, if censored observations occur, one has to estimate the future death date of each censored observation before a population life table is constructed). A unique statistic from population life tables is life expectancy, which is the average number of years of life remaining at the beginning of an age interval. A good general reference on how to construct population life tables is Shryock and Siegel (1975).

For medical or clinical life tables, one does not need to know all the death records when data are collected. Withdrawals in the middle of a study and survivors at the day when data are collected are allowed. Both can be treated as censored observations. In the given example, there were no withdrawals. Only survivors (measures still in place) were coded as censored observations. If withdrawals occur, they should be coded as censored observations the same way as the survivors.

Note:

1. A cohort is defined as a group of individuals or subjects that enter a study at the same time (normally in the same year).
