


The interpretation of exposure effect estimates in chronic air pollution studies

Sebastien J-P.A. Haneuse, PhD,1 Jon Wakefield, PhD,2,3 Lianne Sheppard, PhD.2,4

1 Center for Health Studies, Group Health Cooperative, Seattle, WA.

2 Department of Biostatistics, University of Washington, Seattle, WA.

3 Department of Statistics, University of Washington, Seattle, WA.

4 Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA.

Institutional Address: Center for Health Studies,

Group Health Cooperative,

1730 Minor Ave, Suite 1600,

Seattle, WA 98101-1448, USA.

Corresponding author: Sebastien J-P.A. Haneuse,

Center for Health Studies, Group Health Cooperative,

1730 Minor Ave, Suite 1600,

Seattle, WA 98101-1448, USA.

Tel: (206) 285 2005

Fax: (206) 287 2871

e-mail: haneuse.s@

Running title: Interpretation of chronic effects studies.

Keywords: Air pollution, chronic effects studies, Cox PH regression,

exposure, model misspecification.

Acknowledgements: This work was funded by UW/EPA NW Research Center for Particulate Air Pollution and Health (CR827355), and the second author was supported by grant R01 CA095994 from the National Institutes of Health.

Outline of section headers:

Abstract.

1. Introduction.

2. Ecological exposure assessment.

3. Cox’s proportional hazards model.

4. Individual-level models.

5. Insensitivity analysis.

6. Discussion.

Abstract

In this article, we consider the interpretation of regression parameters used to represent “chronic” or “long-term” air pollution exposure effects. Studies that investigate such effects have generally been semi-ecological; outcomes and confounder/adjustment variables are available at the level of the individual, while exposures are only observed at the aggregate or group level. The use of this design has been driven by the lack of adequate long-term individual-level air pollution information. We assume that primary interest lies in understanding individual-level associations, and consequently a precise interpretation of results from a semi-ecological design must take into account the aggregated nature, both spatial and temporal, of the exposure measure. We suggest that this can only be done within the context of an individual-level model, which is specified on the basis of an underlying question of interest. We outline one possible framework and discuss several key components. The most common analysis approach for assessing chronic exposure effects has been Cox's proportional hazards model. We revisit the underlying assumptions of this model and discuss the implications of two common aspects of chronic effects studies: time-varying exposures and time-varying effects. Focusing on the consequences of temporal aggregation of exposure, we show that an estimate obtained from a time-aggregated semi-ecological design can correspond to very different time-varying exposure and risk scenarios. Further, distinguishing which of these is correct is not possible from the semi-ecological data alone. Suggestions for designs that allow discrimination between competing exposure-risk models are briefly discussed.

1. Introduction

The assessment of the impact of air pollution on health outcomes may broadly be classified into two groups: acute and chronic effect studies. Although the air pollution taxonomy is not always clear cut (Vedal, 1997), acute effects generally refer to the short-term impact of exposure to air pollution (e.g., Nafstad et al, 2003). Typically, a time-series framework is used where daily morbidity or mortality counts are related to current and lagged daily assessments of air pollution. Recently, there has been considerable research on methodological issues in the assessment of acute effects, particularly in the context of time series studies (see Dominici et al., 2003, and references therein). This has resulted in several advances including the combination of information from multiple sources (e.g., multiple areas and/or separate sources for pollutant, outcome and confounder data), estimation of health effects taking into account model uncertainties, the consequences of measurement error, and the estimation of dose-response curves. Each of these advances adds to the totality of evidence for a causal interpretation (in terms of an individual’s risk of outcome) for the observed impact of short-term exposure to air pollution.

In contrast, chronic effects studies typically refer to the impact of medium or long-term average exposure, often over many years (e.g., Dockery et al., 1993; Pope et al., 1995; Abbey et al., 1999; Peters et al., 1999; Avol et al., 2001; Pope et al., 2002). The assessment of chronic effects has generally been based on longitudinal cohort studies where mortality and morbidity outcomes are compared across communities, such as cities or metropolitan areas, which exhibit varying levels of air pollution concentration. Such studies may be described as semi-ecological, in the sense that, while outcome and confounder/adjustment variables are available at the level of the individual, the design employs a group-level assessment of exposure.

Compared to developments for acute effects studies, less work has been done towards identifying and addressing methodological issues associated with chronic effects studies. Kunzli et al (2001) consider the assessment of deaths attributable to air pollution and argue that estimates based on chronic effect studies, rather than those based on acute effect studies, are more appropriate in this respect. In this paper, we examine the interpretation of commonly used statistical models for chronic effects studies. Cox’s proportional hazards (PH) model has often been used for analysis, and following Chiogna and Bellini (2002), it is important to distinguish two issues: (i) the existence of a statistical association between air pollution and adverse health outcomes and (ii) a clear understanding of the nature of the association, if present. The current literature on the study of chronic effects generally provides insight towards the first issue. The focus of this paper is to consider the extent to which the second issue may be addressed on the basis of results obtained from commonly used design and analysis strategies. Specifically, we examine the interpretation of regression parameters in the Cox model in the context of two common features of chronic or long-term air pollution studies: (i) exposure is generally only observed at the aggregate or group level, and (ii) the proportional hazards assumption in the Cox model.

Throughout this work, we consider the scientific goal to be the assessment of individual-level effects. Hence, the precise interpretation of parameter estimates must be made within the context of individual-level associations. In terms of such associations, the misclassified (or summarized) nature of the exposure assessment and its impact on the interpretation of the results has rarely been addressed. The remainder of the paper is as follows: Section 2 provides a brief discussion on the semi-ecological nature of exposure assessment, and Section 3 provides a review of Cox’s PH model, paying particular attention to the underlying assumptions and their impact on both model interpretation and bias in parameter estimation. In addition, we examine the use of the PH model in two published reports and discuss the underlying assumptions of the analyses contained in these reports. In Section 4 we discuss the formulation of individual-level models in the context of chronic effects studies and in Section 5 provide a brief illustration of the potential pitfalls of ignoring the underlying time-varying individual-level model. Section 6 contains a concluding discussion.

2. Ecological exposure assessment.

A common feature of chronic effects air pollution studies is that exposure is generally observed as an aggregated or group-level summary, while outcomes and confounders are available at the level of the individual. Kunzli and Tager (1997) refer to this design as a semi-individual study. We prefer the term “semi-ecological”, as this better reflects the extent of available information for the exposure of interest. In terms of methodological and inferential properties, Kunzli and Tager (1997) suggest that an analysis based on a semi-ecological study design is more closely related to an analysis based on an individual-level study than a purely ecological study. Since the implications of within-area variability in confounders need not be considered, the semi-ecological study is clearly superior to an ecological study. However, in contrast to the recent intense focus on the pitfalls of purely ecological studies (for example, Greenland and Robins, 1994; Morgenstern, 1998; Wakefield, 2003), more research is needed to shed light on the exact consequences of only having group-level exposure measures available. In particular, while recognizing that assessments of exposure from point locations are often the only feasible measurements in environmental epidemiology (due to financial, technological, or logistical reasons), it is also important to recognize their limitations and to be as precise as possible in the interpretation of results.

Depending on the nature of exposure measurement, information that is critical to addressing a scientific question may be lost. Misclassification of exposure may occur as a consequence of the spatial and/or temporal collection process. Spatial misclassification occurs when individual exposures are assigned on the basis of geographic location with a single exposure measure (e.g. concentration from a single monitor) being assumed for all individuals in any given area. This form of misclassification, and corresponding loss of information, is explicitly related to traditional ideas of ecological bias (Greenland and Robins, 1994; Morgenstern, 1998; Wakefield, 2003). Temporal aggregation occurs when information from repeated concentration measures taken over time is reduced to a single summary measure. A typical example is to compute the average concentration over a specified time period. A key distinction between the two types of summarization is that spatial aggregation is across individuals (because individuals are nested within areas) while temporal aggregation is within individuals and results in the same individuals appearing in different time periods. Consequently, one might not expect the impact of summarization to be equivalent for the two types. While recognizing the importance of accounting for spatial summarization, in this article we focus on the consequences of temporal aggregation of exposure measures. One motivation for this is that temporal variation is likely to dominate spatial variability for individuals within a particular city who are assigned a spatially and temporally summarized exposure.

3. Cox’s proportional hazards model.

Since the first major longitudinal studies in the early 1990s, researchers have employed Cox's proportional hazards model (Cox, 1972) to elucidate the effects of long-term exposure to air pollution on mortality. In this section we briefly review the basic Cox model, including key underlying assumptions and parameter interpretation. To understand the impact of violations of the assumptions, some knowledge of how parameter estimation is performed is useful. While we focus on the Cox model, other models, such as linear or logistic regression models, have been used for a variety of outcomes (Peters et al, 1999; Avol et al, 2001), and the issues we outline below also apply to these.

Assumptions, Interpretation and Estimation

Consider modeling the time to an event, denoted T, where the event may refer to death or the development of a respiratory condition. For simplicity, suppose interest lies in a single exposure, X, and the association between X and T. A common approach for addressing this association is to model the hazard function, which may be interpreted as the instantaneous risk of failure at time t given survival until that time. Cox’s proportional hazards model has the form

\[ \lambda(t \mid X) = \lambda_0(t) \exp(\beta X), \]

where λ0(t) is the baseline hazard function over time, t, and β denotes the regression parameter associated with exposure X. It is assumed that there is a well-defined origin at which follow-up begins for all subjects and at which the explanatory exposure variable, X, is measured. In an effort to distinguish this specific form of the model from the many extensions that have been proposed, in this article we refer to the above as the standard Cox model.

In general, the interpretation of parameters in any given model is driven by several factors. These include, but are not limited to, the functional form (in our case this refers, in part, to the use of the exponential function), the definition of the exposures of interest (e.g., cumulative over a specified period, or the amount of time that a pollutant is above a threshold), the inclusion of confounder and adjustment variables, and the characteristics of the sampling scheme. In the proportional hazards model the interpretation of exp(β) may be obtained via a comparison of the hazard between two individuals whose X values differ by 1 unit. Taking the ratio of the hazards comparing two such individuals yields

\[ \frac{\lambda(t \mid X = x + 1)}{\lambda(t \mid X = x)} = \frac{\lambda_0(t) \exp\{\beta (x + 1)\}}{\lambda_0(t) \exp(\beta x)} = \exp(\beta), \]

which is independent of time. Hence the proportional hazards assumption means that the hazard ratio or relative risk between the two individuals (defined on the basis of X measured at baseline) is the same regardless of the time at which the comparison is being made. This well-known implicit assumption has been found to be empirically useful in a large number of applications, particularly clinical trials in which the time scale is relatively short. However, in the context of a model for chronic air pollution effects this assumption may not be as appropriate. In particular, the assumption implies that the impact on risk of an increase in X is the same both 1 day and 15 years after the increase. As we outline below, it may be more reasonable to assume different effects at different time lags, if the exposure information is extensive enough to allow such models to be fitted.

The comparison used to establish the interpretation of the regression parameter β is often referred to as a contrast. These contrasts also form the basis of estimation of the parameter, which proceeds as follows. Initially, all times at which an event is observed to occur are identified. At each such time, a risk set is determined, composed of all individuals who were at risk of being observed to fail but did not. Estimation of the parameter is then based on a comparison of the exposure value of the individual who failed with the exposure values of those individuals in the corresponding risk set. Heuristically, if the individual who failed had a large exposure value relative to the exposure values in the risk set, then the contrast favors a large value of β. The estimate of β combines information from all such contrasts at each failure time.
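The risk-set contrasts described above can be sketched in a few lines of code. The following is a minimal illustration only, assuming a single time-fixed exposure and no tied event times. The function name `cox_partial_loglik`, the toy data, and the grid search are hypothetical choices made for this sketch. Following the usual convention for the partial likelihood, the individual who fails is included in the denominator for their own risk set.

```python
import numpy as np

def cox_partial_loglik(beta, times, events, x):
    """Log partial likelihood for one time-fixed covariate (assumes no tied event times)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    x = np.asarray(x, dtype=float)
    loglik = 0.0
    for i in np.flatnonzero(events):
        # Everyone still under observation at this failure time,
        # including the individual who fails (usual convention).
        at_risk = times >= times[i]
        # Contrast: exposure of the failure versus exposures in the risk set.
        loglik += beta * x[i] - np.log(np.sum(np.exp(beta * x[at_risk])))
    return loglik

# Hypothetical toy data: follow-up times, event indicators (1 = failed), exposures.
times = np.array([2.0, 3.0, 5.0, 7.0, 11.0])
events = np.array([1, 1, 0, 1, 0])
x = np.array([3.0, 1.0, 2.0, 0.5, 0.0])

# Crude grid search for the maximizer; real software uses Newton-type iterations.
grid = np.linspace(-2.0, 4.0, 601)
beta_hat = grid[np.argmax([cox_partial_loglik(b, times, events, x) for b in grid])]
print(beta_hat)
```

In this toy example the individuals who fail tend to have higher exposure than the others in their risk sets, so the maximizing value of β is positive.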

Application in chronic effects studies.

We illustrate the use of the Cox model and the interpretation of the hazard ratio parameters in the context of two published studies: Dockery et al (1993) which reports on results of the Harvard Six Cities Study, and, more recently, Pope et al (2002) which reports on results from the American Cancer Society (ACS) Cancer Prevention Study (CPS) II. Examples of other papers that employ Cox regression for the assessment of chronic effects include Pope et al (1995), Abbey et al (1999), Hoek et al (2002), Pope et al (2002), and Nafstad et al (2003).

The Harvard Six Cities Study enrolled 8111 individuals in 1974 with 14-16 years of follow-up (Dockery et al., 1993). The motivation for the study was that previous studies found associations between long-term exposure to air pollution and mortality but failed to account for potentially strong confounders, and in particular smoking. Study participants provided baseline information regarding sex, age, smoking habits, as well as other risk factors. They were also asked to complete annual questionnaires in order to monitor health outcomes. For a range of pollutants two sets of exposure measures were considered: city-specific indicator variables and city-specific pollution concentrations. The latter were computed by taking the mean concentration over the time period for which air pollution measurements were available for each of the cities. The time periods of measurement varied by pollutant, and none corresponded to the entire 14-16 year follow-up period. Furthermore, although annual concentrations were available, the time-dependent measures were aggregated to a single mean. The mean concentrations were then included in a standard Cox regression as if they were obtained at the start of observation. Unless exposure is constant over time, this can lead to biased estimates of exposure effects. To see this, note from the above discussion that the contrasts that form the basis of estimation of exposure effects depend in part on future exposure values. That is, the model is using future exposure values to assess current risk.

The second example is from the American Cancer Society (ACS) Cancer Prevention Study (CPS) II (Pope et al., 2002). This study enrolled approximately 1.2 million individuals from 151 metropolitan areas in 1982, with follow-up continuing until December 31, 1998. Individual level risk factor data, based on completed questionnaires at baseline, were available for 552,138 participants and were linked to air pollution data for the metropolitan area of residence at baseline. Information on a variety of airborne pollutants was available for 1979-83 from the Inhalable Particle Monitoring Network, including PM2.5 (particulate matter with aerodynamic diameter less than 2.5 microns) which was the main exposure of interest in the report. In addition, data on PM2.5 for 1999-2000 were available from the Environmental Protection Agency Aerometric Information Retrieval System (AIRS) database. On the basis of the available information, the authors considered three fine particle exposure variables: (a) PM2.5(1979-83), the five-year mean PM2.5 concentration during 1979 to 1983, (b) PM2.5(1999-2000), the mean PM2.5 concentration during 1999 and the first three quarters of 2000, and (c) PM2.5(average), the average of PM2.5(1979-83) and PM2.5(1999-2000). The latter was taken to represent the mean PM2.5 concentration during the period when PM2.5 measurements were not directly available. The primary analysis of the report is based on a two-stage approach. The first stage involved fitting a standard Cox regression with indicator variables for each of the 151 metropolitan areas. The second stage involved the use of linear random effects models to relate the city-specific intercepts (from the first stage) to each of the three exposures. The slope from the second stage analysis is then interpreted as the hazard ratio associated with exposure. (Although we focus on this model, we note that the authors considered extensions in an effort to account for spatial autocorrelation between the 151 city-specific intercepts.) While recognizing the practical difficulty of having limited exposure data, the use of the PM2.5(1999-2000) variable in an analysis for outcomes observed from 1982-1998 provides another example of using exposure defined in the future.

Although the imputation of the mean exposure from future time periods can be problematic, Dockery et al (1993) argue that due to long-term transport and large-scale mixing the concentrations would be relatively uniform. This argument can be interpreted as assuming that an individual’s exposure distribution over time is constant. Consequently, the contrasts in “exposure” between cities can be interpreted as uniform shifts in exposure distributions, over the entire follow-up period. This would suggest that the mean future exposure they use would be a good surrogate for exposure at the origin. In this case, one may expect the bias in the estimates of β to be small. If, however, the exposure distribution is not constant then it is no longer clear how one might interpret a unit change in mean exposure. Such a contrast between means can arise in an infinite number of different ways when we allow differences in the entire time-varying exposure distributions, as we illustrate in Section 5. Since model parameters are defined in terms of these contrasts, it would therefore be very difficult to interpret the corresponding coefficient.

4. Individual-level models

In both of the above examples, a key study feature was the aggregated nature of the exposure variable. A consequence of this is that, while outcome and confounder information is available at the level of the individual, the overall analysis must be viewed at the group-level (Sheppard, 2003). Situations where group-level associations are of direct interest include regional policy development, where assessment at the ecological level may be more appropriate for the determination of suitable regional air pollution standards. Typically, however, primary scientific interest lies at the level of the individual. Following the discussion of Section 3, one may view the assessment and interpretation of air pollution effects as a contrast between two (possibly hypothetical) individuals who differ in exposure, but remain equivalent in every other aspect. When individual-level exposure information is not available, however, and only an ecological assessment is employed, it is natural to ask which individuals we are referring to when the contrasts are performed. The issue is complicated by the summarization of exposure over both space and time.

To investigate the challenges of not having individual-level data available, we consider the following strategy. Initially, postulate an individual level model whose components reflect the underlying question of interest. This model is viewed as being scientifically relevant and would form the basis of inference under ideal exposure measurement scenarios. Given this model, consider the impact of having limited exposure data available for the estimation of parameters in the individual level model. Other examples of the use of this approach include Prentice and Sheppard (1995), Sheppard (2003) and Wakefield (2003). Such an individual-level model has not been explicitly considered in the chronic effects literature and requires care in its specification, as we illustrate shortly. In the following we do not build the individual level model. Rather we introduce a general framework within which such an individual level model, which will depend on the particular application, may be specified. In this framework, we consider the time period over which chronic effects are assumed to propagate and allow the effect parameter to vary over time. The latter may be used to represent scenarios in which, for example, transient or irreversible effects may be present.

Before presenting the general framework, it is important to consider the unit of time for the study. While an individual's exposure to air pollution occurs on a continuous basis, when developing an individual-level model one must establish the lowest resolution at which exposure measurement is relevant. Since chronic effects are presumed to propagate over the course of many years, it is unclear that the measurement of daily exposure variation is necessary. While the following notation is generic, in Section 5 we take the unit of time to be a year, although in practice the choice will depend heavily on the specific application. We note that the issue of time resolution provides a key distinction between acute effect studies, in which daily variation is of direct interest, and chronic effects studies.

As pointed out above, one of the main limitations of many applications of Cox’s proportional hazards model for chronic effects is the lack of inclusion of time-varying exposure. Consider a study with K areas and let Xk(s) denote exposure in area k, k = 1, …, K, at time s. At any given time t, we assume that individual i in area k will have an entire exposure history until time t: Hki(t) = {Xk(s); s ≤ t}, for i=1,…, Nk so that Nk is the number of such individuals in area k. For simplicity and towards isolating issues of temporal aggregation we have assumed constant exposure across all individuals within an area. Issues of spatial aggregation may be addressed by extending this notation to include a subject-specific index on the exposure measurement (i.e. Xki(s)). In specific applications, when developing an individual level model, it will be important to consider which aspects of exposure are relevant for the development of adverse health effects. For example, in addition to ambient concentration, it may be of interest to incorporate recent intensity of exposure, exposure levels above some threshold or time since exposure cessation.

A second limitation of the standard Cox model is the absence of time-varying effects. In chronic effects studies it may not be reasonable to assume that the long-term effects due to elevated exposure at some specific time point, such as the origin, propagate at constant levels. Rather, a reasonable hypothesis might be that elevated exposure does not influence risk immediately following exposure (a latency period) and although the risk may increase for a short period of time, in the long run there is no influence so that the effects are reversible (e.g., Blot and Fraumeni, 1996). Towards a more flexible modeling framework, let β(t;s) denote the influence of exposure at time s on risk at time t. At any given time t, the full set of parameters representing influence of prior exposure on current risk is referred to as the risk model or association paradigm and is denoted by B(t) = {β(t;s); s ≤ t}. Further examples of association paradigms for time-varying risk models include delayed effects, diminishing effects over time, recovery after exposure cessation, and irreparable damage. This approach follows closely that of Bandeen-Roche et al (1999) (see also Breslow et al, 1983).

Thus, for a given choice of time scale, exposure and risk model, a general form of the model for the hazard function is given by

\[ \lambda_{ki}(t \mid H_{ki}(t)) = \lambda_0(t) \exp\Big\{ \sum_{s \le t} \beta(t;s) X_k(s) \Big\}. \]
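As a check on the notation, and assuming the general hazard is written in discrete (for example, yearly) time as a sum over past exposure, the standard Cox model of Section 3 appears as the special case in which only exposure at the origin carries a non-zero, time-constant effect. Labeling the origin as s = 0 and using a sum rather than an integral are assumptions made here for illustration:

\[ \beta(t;s) = \begin{cases} \beta, & s = 0, \\ 0, & 0 < s \le t, \end{cases} \quad \Longrightarrow \quad \lambda_0(t) \exp\Big\{ \sum_{s \le t} \beta(t;s) X_k(s) \Big\} = \lambda_0(t) \exp\{ \beta X_k(0) \}. \]

Any other choice of β(t;s) gives a hazard that depends on exposure after the origin, which the standard Cox model cannot represent.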

5. Insensitivity analysis

We now present an insensitivity analysis in which we show that results obtained from a semi-ecological analysis may arise from a broad range of underlying time-varying exposure-risk models. The genesis of our terminology is a sensitivity analysis (e.g. Rothman and Greenland 1998, Chapter 19) in which a range of individual-level models are specified and the variation in results is examined.

In Section 4 we outlined a strategy for investigating the impact of only having group-level exposure data, with the first step being the development of an individual-level model. In this section we consider the second component of the strategy and illustrate the pitfalls of not having individual-level data. The aim of the insensitivity analysis approach is to show that under a variety of assumptions regarding the underlying time-varying individual-level model (including both exposure history and risk model), a standard Cox regression analysis based on a semi-ecological (time-aggregated) assessment of exposure will produce the same results, that is, the same estimates.

The general set-up of the study assumes a hypothetical air pollution study based on 100 areas. To allow prior exposure to impact risk during the observation period, we generate individual-level data over a 60-year period, with the final 20 years taken to be the observation period. Throughout, and for simplicity, we assume the underlying hazard is constant in time, with λ0 = 0.0035. This corresponds to a 60-year survival rate, in the absence of exposure, of approximately 81%. Finally, we assume that censoring occurs only at the end of the observation period. Note that for simplicity our set-up does not include any additional confounder or adjustment variables. Furthermore, the aggregation is over time and the exposure is constant across space within areas.
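The quoted survival figure can be verified directly. Assuming λ0 is a rate per year (the unit is not stated explicitly) and that exposure contributes nothing to the hazard, a constant hazard gives an exponential survival function:

\[ S(60) = \exp(-\lambda_0 \times 60) = \exp(-0.0035 \times 60) = \exp(-0.21) \approx 0.81. \]

Under the same assumptions, survival over a 20-year window alone would be exp(-0.07) ≈ 0.93.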

To examine the impact of time-varying exposures we consider four sets of underlying exposure scenarios over the 60-year period. Figure 1 provides a graphical summary of the exposure trajectories under each scenario for a sample of 5 areas. The first assumes uniform (constant) exposure throughout the 60 years. The second scenario assumes exposure levels were constant for the first 40 years, while during the final 20 years the levels have been decreasing. In this scenario, areas for which exposure was highest during the first 40 years experienced greater rates of decrease during the final 20 years. Such a scenario could correspond to the increased effort on the part of highly polluted areas to conform to the levels of remaining areas. The third scenario assumes pollution levels to be decreasing over the entire 60 years, again with the most polluted areas exhibiting greater rates of decrease. The final scenario assumes levels to be decreasing over the entire 60 years, although the rate of decline is the same for all areas. Under each scenario, the scale of the pollutant measure was taken to resemble the scale of variation in “total particles” reported by Dockery et al (1993) (see their Figure 1). For the decreasing exposures, the rates of decrease were also taken to resemble the observed rates across the six cities reported by Dockery et al (1993). While the overall exposure trajectories under the four scenarios are different, the levels have been specified to ensure that during the final 20 years of follow-up the average exposure in any given area is the same across all four scenarios. Specifically, mean exposure during the final 20 years varies uniformly from 50 units to 100 units across the 100 areas, and we assume no within-area variation. For both the second and third exposure scenarios, exposure is fixed to range from 55 to 140 units during year 40 and from 45 to 60 units during year 60. For the final scenario the ranges during years 40 and 60 are from 75 to 125 units and 25 to 75 units respectively. In Figure 1 the outer two lines correspond to the minimum and maximum scenarios.
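One way to construct exposure histories that satisfy these constraints is sketched below. This is an illustration under stated assumptions rather than a description of how the trajectories were actually generated: trends are taken to be linear, exposure is evaluated once per year at mid-year, and the year-40 and year-60 levels are spaced evenly across areas. All variable and scenario names are hypothetical.

```python
import numpy as np

N_AREAS, N_YEARS, OBS_START = 100, 60, 40
mid_year = np.arange(N_YEARS) + 0.5  # assumed evaluation times: 0.5, 1.5, ..., 59.5

def linear_path(x40, x60, t):
    """Straight line through (40, x40) and (60, x60); one row per area (linearity assumed)."""
    slope = (x60 - x40) / 20.0
    return x40[:, None] + slope[:, None] * (t[None, :] - 40.0)

# Target area-specific means over the final 20 years: 50 to 100 units.
mean_final = np.linspace(50.0, 100.0, N_AREAS)

# Year-40 and year-60 levels quoted in the text.
x40_conv, x60_conv = np.linspace(55.0, 140.0, N_AREAS), np.linspace(45.0, 60.0, N_AREAS)
x40_par, x60_par = np.linspace(75.0, 125.0, N_AREAS), np.linspace(25.0, 75.0, N_AREAS)

constant = np.repeat(mean_final[:, None], N_YEARS, axis=1)      # scenario 1
converging = linear_path(x40_conv, x60_conv, mid_year)          # scenario 3
late_decline = converging.copy()                                # scenario 2
late_decline[:, :OBS_START] = x40_conv[:, None]                 # flat for first 40 years
parallel = linear_path(x40_par, x60_par, mid_year)              # scenario 4

scenarios = {
    "constant": constant,
    "late decline": late_decline,
    "converging decline": converging,
    "parallel decline": parallel,
}

# Area-specific mean exposure over the 20-year observation period, per scenario.
final_means = {name: x[:, OBS_START:].mean(axis=1) for name, x in scenarios.items()}
for name, m in final_means.items():
    print(name, m.min(), m.max())
```

Under these assumptions every scenario returns the same set of 100 area-specific means over the observation period, even though the full 60-year histories differ markedly.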

Figure 1 here

To examine the impact of time-varying effects we consider five risk models. The first assumes risk at time t is affected by exposure solely at time t. Thus, β(t;s) = β1 for s=t and zero otherwise, to give

\[ \lambda_{ki}(t \mid H_{ki}(t)) = \lambda_0(t) \exp\{ \beta_1 X_k(t) \}. \]

This model may loosely be interpreted as an acute effect with no lag, since it is only current exposure that determines disease risk. The second risk model assumes that risk is determined by cumulative exposure during the previous 5 years, so that β(t;s) = β2 for t-5 < s ≤ t, and zero otherwise to give

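\[ \lambda_{ki}(t \mid H_{ki}(t)) = \lambda_0(t) \exp\Big\{ \beta_2 \sum_{t-5 < s \le t} X_k(s) \Big\}. \]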

Hence at any time the total exposure accrued over the previous 5 years is relevant, with the exposures in earlier periods assumed to be irrelevant. The third model attempts to capture lagged effects where risk is determined over a 5-year period, but with a 5-year lag. Under this model we have β(t;s) = β3 for t-10 < s ≤ t-5, and zero otherwise:

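\[ \lambda_{ki}(t \mid H_{ki}(t)) = \lambda_0(t) \exp\Big\{ \beta_3 \sum_{t-10 < s \le t-5} X_k(s) \Big\}. \]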

Fourthly, we consider effects that propagate over much longer periods by assuming that risk is determined by cumulative exposure during the previous 20 years, so that β(t;s) = β4 for t-20 < s ≤ t and zero otherwise:

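\[ \lambda_{ki}(t \mid H_{ki}(t)) = \lambda_0(t) \exp\Big\{ \beta_4 \sum_{t-20 < s \le t} X_k(s) \Big\}. \]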

Finally, we examine a model where risk is determined by exposure over the previous 10 years, although the effect of exposure is double during the most recent 5 years. Under this model we have β(t;s) = β5 for t-10 < s ≤ t-5, β(t;s) = 2β5 for t-5 < s ≤ t, and zero otherwise:

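\[ \lambda_{ki}(t \mid H_{ki}(t)) = \lambda_0(t) \exp\Big\{ \beta_5 \sum_{t-10 < s \le t-5} X_k(s) + 2 \beta_5 \sum_{t-5 < s \le t} X_k(s) \Big\}. \]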

Figure 2 provides a graphical illustration of these risk models.
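The five risk models can also be written as weights on lagged exposure. The sketch below assumes yearly time steps, so that the window t-5 < s ≤ t corresponds to lags 0 to 4, and expresses β(t;s) as a function of the lag t - s. The function names `lag_weights` and `log_relative_hazard` and the numerical values are hypothetical.

```python
import numpy as np

def lag_weights(model, beta, max_lag=20):
    """Return w with w[l] = beta(t; t - l) for lags l = 0, ..., max_lag - 1 (years)."""
    w = np.zeros(max_lag)
    if model == 1:
        w[0] = beta            # current exposure only (s = t)
    elif model == 2:
        w[0:5] = beta          # t-5 < s <= t
    elif model == 3:
        w[5:10] = beta         # t-10 < s <= t-5
    elif model == 4:
        w[0:20] = beta         # t-20 < s <= t
    elif model == 5:
        w[5:10] = beta         # t-10 < s <= t-5
        w[0:5] = 2.0 * beta    # doubled effect for t-5 < s <= t
    else:
        raise ValueError("model must be 1, 2, 3, 4 or 5")
    return w

def log_relative_hazard(x_history, t, weights):
    """Sum over s of beta(t;s) * X(s), given yearly exposures x_history[0], ..., x_history[t]."""
    lags = np.arange(len(weights))
    return float(np.sum(weights * x_history[t - lags]))  # requires t >= len(weights) - 1

# Hypothetical constant exposure of 10 units over 60 years, evaluated in the final year.
x_const = np.full(60, 10.0)
results = {m: log_relative_hazard(x_const, 59, lag_weights(m, 0.01)) for m in range(1, 6)}
print(results)
```

With constant exposure the five models differ only by a multiplicative factor on the exposure level (1, 5, 5, 20 and 15 under the yearly discretization assumed here), which gives some intuition for why a single aggregated exposure measure cannot separate them.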

Figure 2 here

To illustrate the loss of information that results from aggregation of the exposure over time, we consider each of these 20 exposure/risk scenarios and fit a semi-ecological model. The latter is taken to be a function of the area-specific mean exposure, over the entire 20-year observation period, denoted $\bar{X}_k$, for k = 1,…,100 areas. Specifically, the form of the fitted semi-ecological proportional hazards model is

[pic]. (1)

Note that, as indicated above, the same set of area-specific means, [pic], k = 1,…,100, is obtained under each of the four exposure scenarios. Further, although [pic] is a summary of exposure between years 40 and 60, this model implicitly assumes [pic] is measured at baseline, here year 40. We use the notation γ to distinguish the “exposure effect” in the semi-ecological model from the parameters {β(t;s); s ≤ t} of the general exposure-risk model. The key issue is that, while primary interest lies in β, by fitting the semi-ecological model we are only estimating γ. The question then becomes the extent to which knowing γ provides any information regarding β.

Under a properly specified model, standard estimation procedures for Cox’s proportional hazards model, based on partial likelihood, yield consistent estimation of the regression coefficients. Consistency refers to the concept that as the sample size increases the value of the estimate converges to the true underlying value of the parameter. In other words, the partial likelihood estimates for a correctly specified Cox’s proportional hazards model are unbiased in large samples. For an improperly specified model estimates will generally be inconsistent (see e.g. Ford et al, 1995), and in such situations it is desirable to be able to quantify the degree of large-sample bias. To determine large-sample bias of γ as an estimate of β, rather than employing simulation techniques we consider estimation based on a single very large dataset. This provides a means of understanding exactly what is being estimated when we fit a semi-ecological model. We therefore report on results based on assuming a sample size of Nk = 50,000 in each of 100 areas providing a total sample size of 5 million individuals. With such a large sample size, issues of sampling variability (which are typically addressed via simulation) may be ignored.

Table 1 here

For each of the 20 exposure-risk scenarios the data were generated as follows. A particular value of β was chosen and data were simulated under this value. Model (1) was then fitted and the coefficient γ was estimated. This procedure was repeated until a value of β was found for which the estimated relative risk associated with a 10-unit increase in mean exposure was exp(10γ) = 1.05. Table 1 provides the results, and we note that the heights in Figure 2 correspond to values of exp(β) associated with the third exposure history (decreasing; converging). We reiterate that we seek to determine what exactly is being estimated when a semi-ecological model is fitted, under a variety of individual-level time-varying models.

Generally, when the semi-ecological and individual-level models do not coincide, then one cannot expect the estimates from a semi-ecological analysis to correspond to the values of the parameters in the individual-level time-varying model. Under certain special cases, however, one can obtain a direct correspondence between the two models. For example, under the first exposure scenario we have uniform (constant) exposure, so that [pic] for all s. Under the first (current exposure) risk model, the individual-level model then reduces to

[pic],

in which case this model is equivalent to the semi-ecological model above, and we would therefore expect the large-sample estimate of γ to equal the true underlying β. This can be seen in the first entry of Table 1, where the true underlying model is equivalent to the semi-ecological model. We also see that for the remaining exposure scenarios under the current exposure risk model there is little impact, and we get approximately the correct answer by fitting the semi-ecological model. Under the uniform exposure scenario but with risk determined by cumulative exposure over the previous 5 years, we see a substantial difference between the semi-ecological and true individual-level time-varying model parameters. However, in this case one can (at least approximately) recover the individual-level parameter from the semi-ecological estimate. To see this, note that under this exposure-risk model we have

[pic].

The final approximation indicates that we have γ ≈ 5β, and the corresponding value in Table 1 shows this to be the case. The approximation is not exact since times before t=5 will not have accrued enough exposure (since time zero) to have a complete history. However, using this approach requires additional information, namely the specification of the period over which risk is determined by previous exposure (in this case the previous 5 years). The same ideas may be applied to each of the scenarios where exposure history is uniform, except for the final risk model, where one cannot recover the individual-level parameter. Here, further information (in particular the exact form of the risk model) is required.
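The back-calculation described above can be illustrated with a few lines of arithmetic. The following is a minimal sketch, not the procedure used to produce Table 1: it assumes uniform exposure, ignores the incomplete-history issue at early times, and treats the number of exposure-years entering each risk model as known. The names `effective_years` and `implied_rr` are introduced here for illustration only. For the changing-risk model the weights (5 years at weight 1 plus 5 years at weight 2) are supplied explicitly, which is precisely the extra information about the form of the risk model that the semi-ecological data cannot provide.

```python
import math

# Target semi-ecological relative risk per 10-unit increase in mean exposure (from the text).
TARGET_RR = 1.05
gamma = math.log(TARGET_RR) / 10.0  # semi-ecological log relative risk per unit exposure

# ASSUMPTION: under uniform exposure, gamma is approximately beta times the total
# weight of exposure-years entering the risk model (e.g. gamma ~ 5 * beta for model 2).
effective_years = {
    "current exposure": 1,
    "previous 5 years": 5,
    "5 years with 5-year lag": 5,
    "previous 20 years": 20,
    "changing risk": 5 * 1 + 5 * 2,  # weights must be known in advance
}


def implied_rr(weight_sum, gamma=gamma, increment=10.0):
    """Relative risk exp(increment * beta) implied by beta = gamma / weight_sum."""
    beta = gamma / weight_sum
    return math.exp(increment * beta)


implied = {name: implied_rr(m) for name, m in effective_years.items()}
for name, rr in implied.items():
    print(f"{name:>24s}: {rr:.4f}")
```

Under these assumptions the implied values fall close to, but not exactly on, the uniform row of Table 1 (1.0513, 1.0100, 1.0101, 1.0025, 1.0033), as expected given that the table entries come from fitting model (1) to simulated data rather than from this approximation.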

The results for the exposure scenario where the trajectories are decreasing in a parallel manner mimic those of the uniform exposure history. To see why this is the case we refer back to the above discussion regarding estimation of coefficients in the proportional hazards model. In particular, for the uniform exposure history we find that exposure contrasts based on the area-specific means [pic] are equivalent throughout the entire time period. That is, differences in exposure levels across the areas are constant throughout the 20-year observation period. For the decreasing/parallel exposure history the same phenomenon occurs; although exposures are decreasing, between-area differences remain constant. Hence the contrasts under the uniform and the decreasing/parallel exposure histories are equivalent, and estimation in the corresponding semi-ecological models is the same. In contrast, where between-area differences in exposure do not remain constant over time, as for the second and third exposure histories, one cannot hope to recover the individual-level parameters without further information about the form of the individual-level model.
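The point about contrasts can be checked numerically. The sketch below is illustrative only: the five area means, the linear trajectories and the `converging` functional form are assumptions made for this example, chosen so that the parallel history runs from 75 to 125 units at year 40 down to 25 to 75 units at year 60, as stated for the final scenario. It shows that all three histories yield the same time-averaged exposure per area, while the between-area contrast is constant over time only for the uniform and parallel histories.

```python
# Yearly time points over the observation window (years 40 to 60 in the text).
years = list(range(40, 61))
mid = sum(years) / len(years)                      # midpoint of the window, 50.0

# ASSUMPTION: five illustrative area-specific means, evenly spaced from 50 to 100.
area_means = [50.0, 62.5, 75.0, 87.5, 100.0]
overall = sum(area_means) / len(area_means)        # 75.0


def uniform(mean):
    """Constant exposure at the area mean."""
    return [mean for _ in years]


def parallel(mean, slope=-2.5):
    """Same linear decline in every area, centred so the time average is `mean`."""
    return [mean + slope * (s - mid) for s in years]


def converging(mean, slope=-2.5, shrink=0.04):
    """Common decline with between-area spread narrowing over time (ASSUMED form)."""
    return [overall + slope * (s - mid) + (mean - overall) * (1 - shrink * (s - mid))
            for s in years]


histories = {"uniform": uniform, "parallel": parallel, "converging": converging}

# Time-averaged exposure per area under each history: the semi-ecological covariate.
means = {name: [sum(h(m)) / len(years) for m in area_means]
         for name, h in histories.items()}

# How much the contrast between the highest and lowest area changes over time.
spread = {}
for name, h in histories.items():
    diff = [hi - lo for hi, lo in zip(h(area_means[-1]), h(area_means[0]))]
    spread[name] = max(diff) - min(diff)

for name in histories:
    print(name, [round(v, 6) for v in means[name]], round(spread[name], 6))
```

A spread of zero means the contrast between the two areas is the same in every year, so the semi-ecological covariate carries the same information as the full trajectory; a non-zero spread, as for the converging history, means the time average hides how the contrast evolved.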

Overall, from Table 1, we find that the results from a single semi-ecological analysis correspond to a broad range of very different underlying individual-level models. A primary difficulty, however, is that with only time-aggregated exposure assessment, one cannot distinguish between the 20 possible exposure-risk scenarios. If exposure over time is available then including this time-varying history offers the opportunity of fitting a selection of models, and then choosing between them on the basis of model checking diagnostics. Distributed lag models (e.g. Diggle et al, 2002) may be helpful in this regard, by embedding a number of plausible models within a single framework.

6. Discussion

A key difficulty in assessing the effects of long-term exposure to air pollution is the lack of high-quality individual-level data. Logistical and ethical considerations often limit the availability of individual-level data, and the only recourse for a researcher is to base their analyses on group-level data. Published work has served to confirm links between long-term air pollution exposure and adverse health events, and has been influential in determining the public health impact (e.g. Kunzli et al, 2000) and in shaping policy (e.g. EPA Particulate Matter Staff Paper, 2005). However, it is important to also consider the limitations inherent in the loss of information associated with aggregate or group-level exposure data. Here, we examine the interpretation of statistical models, in particular with a view to broadening the understanding of the exact nature of the individual-level associations.

In this paper we have sought to clarify basic statistical assumptions within the Cox proportional hazards model framework that are required for the interpretation of model parameters. The primary issue that we seek to highlight in this paper relates to the consequences of reducing information regarding a time-varying exposure, such as air pollution exposure, to a single summary statistic (such as the mean), and assuming constant effects over time. By focusing on the impact of temporal aggregation, we have shown that a broad range of individual-level models may lead to the same ecological or semi-ecological estimates of parameters that have been used to represent chronic effects.  Our work builds on similar issues raised by Krewski et al (2000) and supports the notion that cohort studies estimate both acute and chronic effects (e.g. Kunzli et al, 2001). Further, the loss of information associated with aggregation renders the task of choosing between competing individual-level models very difficult. In terms of interpreting results from published reports one can, at best, interpret estimates of association as being mixtures of a variety of effects over different time periods. However, unless one is willing to make strong assumptions, neither the components of the mixture nor the nature of the mixture may be recovered.

The loss of information is similar in nature to that found in ecological studies in spatial epidemiology. A key difference, however, is that in spatial epidemiology exposures are aggregated across individuals within specific areas and the effect parameters are generally assumed to be constant across all areas. In the context of temporal aggregation that we consider here, we have aggregation of exposure across time and with (potentially) different effect parameters in different time periods. To enhance understanding of the role of temporal aggregation, we have ignored the issue of spatial summarization by assuming that the exposure in a given area is constant across all individuals within the area. Realistic individual-level models will allow exposure to vary within areas, and as a result ecological bias, in its traditional sense, must also be considered.

Although the issues we have dealt with focus on the interpretation of model parameters, they should be considered within a broader scope for the interpretation of results. For example, as the effects of long-term exposure to air pollution are fairly small, an important consideration is the adequate control of potential confounding variables. Of particular concern is the need to control for time-dependent confounding, such as smoking patterns (which may be associated with an individual’s exposure). This requires considerable care, particularly if the response at a given time predicts the covariate at a future time (which may be relevant if the response of interest is a continuous measure of lung function, for example). Diggle et al (2002, Chapter 12) discuss the pitfalls of modeling time-varying exposures/confounders.

To obtain an understanding of the etiological nature of the relationship between air pollution and health outcomes we need rich datasets and a focus on specific scientific hypotheses that drive such studies. In the statistical literature a number of authors including Bandeen-Roche et al (1999) and Breslow et al. (1983) have attempted to consider more general exposure-risk models. Such models may form the basis for more realistic investigations into the nature of the association, if it exists, between long-term exposure to air pollution and adverse health outcomes. Generally it is clear that longitudinal exposure information is required, including information that precedes the start of the observation period of the cohort.

To summarize: the impact of chronic exposure to air pollution has primarily been assessed using a semi-ecological longitudinal cohort study design. This choice has been driven by the lack of adequate long-term, individual-level exposure information, and hence relies on between-community contrasts of time-averaged exposure as the basis for estimation. While ecological or summarized measures of air pollution are often the only feasible means of assessing exposure, it is critical to account for the loss of information, both temporal and spatial, when attempting to interpret results from an individual or etiological perspective. From the work in this paper, the message in the temporal setting is clear: an estimate based on a single exposure summary for an area across time is consistent with multiple different exposure distributions across time and with multiple effect parameter distributions across time.

References

Abbey D, Nishino N, McDonnell W, Burchette R, Knutsen S, Beeson W, et al. 1999. Long-term inhalable particles and other air pollutants related to mortality in nonsmokers. American Journal of Respiratory and Critical Care Medicine 159:373-382.

Avol E, Gauderman W, Tan S, London S, Peters JM. 2001. Respiratory effects of relocating to areas of differing air pollution levels. American Journal of Respiratory and Critical Care Medicine 164:2067-2072.

Bandeen-Roche K, Hall CB, Stewart WF, Zeger SL. 1999. Modeling disease progression in terms of exposure history. Statistics in Medicine 18:2899-2916.

Blot W, Fraumeni J. 1996. Cancer of the lung and pleura. In: Cancer Epidemiology (Schottenfeld D and Fraumeni J, eds). 2nd ed. Oxford University Press, 637-665.

Breslow N, Lubin J, Marek P, Langholz B. 1983. Multiplicative models and cohort analysis. Journal of the American Statistical Association 78:1-11.

Chiogna M, Bellini P. 2002. Alternative air pollution measures for detecting short-term health effects in epidemiological studies. Environmetrics 13:55-69.

Cox DR. 1972. Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B 34:187-220.

Dockery D, Pope A, Xu X, Spengler J, Ware J, Fay M, et al. 1993. An association between air pollution and mortality in six US cities. NEJM 329: 1753-1759.

Diggle PJ, Heagerty P, Liang K-Y, Zeger SL. 2002. Analysis of Longitudinal Data, Second Edition. Oxford University Press.

Dominici F, Sheppard L, Clyde M. 2003. Health effects of air pollution: A statistical review. International Statistical Review 71: 243-276.

U.S. Environmental Protection Agency (EPA). Review of the National Ambient Air Quality Standards for Particulate Matter: Policy assessment of scientific and technical information. OAQPS Staff Paper, EPA-452/R-05-005a, December 2005.

Ford I, Norrie J, Ahmadi S. 1995. Model inconsistency, illustrated by the Cox proportional hazards model. Statistics in Medicine 14:735-746.

Greenland S, Robins J. 1994. Ecologic studies: Biases, misconceptions, and counterexamples (Disc: p761-771). American Journal of Epidemiology 139:747-760.

Hoek G, Brunekreef B, Goldbohm S, Fischer P, van den Brandt P. 2002. Association between mortality and indicators of traffic-related air pollution in the Netherlands: a cohort study. Lancet 360, 15:1203-1209.

Krewski D, Burnett RT, Goldberg MS, Hoover K, Siemiatycki J, Jerrett M, Abrahamowicz M, White WH, et al. 2000. Reanalysis of the Harvard Six Cities Study and the American Cancer Society Study of Particulate Air Pollution and Mortality: Part II Sensitivity Analyses. A Special Report of the Health Effects Institute’s Particle Epidemiology Reanalysis Project.

Kunzli N, Tager IB. 1997. The semi-individual study in air pollution epidemiology: A valid design as compared to ecological studies. Environmental Health Perspectives 105:1078-1083.

Kunzli N, Kaiser R, Medina S, Studnicka M, Chanel O, Filliger P, Herry M, Horak F, Puybonnieux-Texier V, Quenel P, Schneider J, Seethaler R, Vergnaud J-C, Sommer H. 2000. Public-health impact of outdoor and traffic-related air pollution: a European assessment. The Lancet 356:795-801.

Kunzli N, Medina S, Kaiser R, Quenel P, Horak F, Studnicka M. 2001. Assessment of deaths attributable to air pollution: Should we use risk estimates based on time series or on cohort studies? American Journal of Epidemiology 153 (11): 1050-1055.

Morgenstern H. 1998. Ecological studies. In: Modern Epidemiology (Rothman K, Greenland S, eds). 2nd ed. Lippincott, Williams and Wilkins, 459-480.

Nafstad P, Haheim L, Oftedal B, Gram F, Holme I, Hjermann I, et al. 2003. Lung cancer and air pollution: A 27 year follow up of 16,209 Norwegian men. Thorax 58: 1071-1076.

Peters J, Avol E, Navidi W, London S, Gauderman W, Lurmann F, et al. 1999. A study of twelve southern California communities with differing levels and types of air pollution: I. Prevalence of respiratory morbidity. American Journal of Respiratory and Critical Care Medicine 159:760-767.

Pope A, Thun M, Namboodiri M, Dockery D, Evans J, Speizer F, et al. 1995. Particulate air pollution as a predictor of mortality in a prospective study of U.S. adults. American Journal of Respiratory and Critical Care Medicine 151:669-674.

Pope C, Burnett R, Thun M, Calle E, Krewski D, Ito K, et al. 2002. Lung cancer, cardiopulmonary mortality and long-term exposure to fine particulate air pollution. JAMA 287:1132-1141.

Prentice RL, Sheppard L. 1995. Aggregate data studies of disease risk factors. Biometrika 82:113-125.

Rothman KJ, Greenland S. 1998. Modern Epidemiology, Second Edition. Lippincott, Williams and Wilkins, Philadelphia.

Sheppard L. 2003. Insights on bias and information in group-level studies. Biostatistics 4:265-278.

Vedal S. 1997. Ambient particles and health: Lines that divide. Journal of the Air and Waste Management Association 47:551-581.

Wakefield J. 2003. Sensitivity analyses for ecological regression. Biometrics 59:9-17.

Figure 1: Four exposure history scenarios, for a sample of 5 areas from 100, considered in the insensitivity analysis. Exposure occurs over 60 years, although study observation only occurs during the final 20. For each area, mean exposure during the 20-year observation period is fixed to be the same in each of the scenarios.

[pic]

Figure 2: Illustration of five risk models; current exposure, previous 5 years, 5 years with 5 year lag, previous 20 years and changing risk.

[pic]

Table 1: Results from the insensitivity analysis; value of the relative risk associated with a 10-unit increase in exposure, exp(Σs 10β(s)), such that the corresponding relative risk estimate from an ecological analysis is exp(10γ) = 1.05.

| | Risk model |
| Exposure history | Current exposure | Previous 5 years | 5 years with 5-year lag | Previous 20 years | Changing risk |
| | [pic] | [pic] | [pic] | [pic] | [pic] |
| Uniform | 1.0513 | 1.0100 | 1.0101 | 1.0025 | 1.0033 |
| Uniform; decreasing | 1.0493 | 1.0086 | 1.0069 | 1.0017 | 1.0026 |
| Decreasing (converging) | 1.0495 | 1.0085 | 1.0065 | 1.0014 | 1.0026 |
| Decreasing (parallel) | 1.0517 | 1.0101 | 1.0101 | 1.0025 | 1.0033 |
