How D-I-D you do that? Basic Difference-in-Differences Models in SAS®

E Margaret Warton, Kaiser Permanente Division of Research, Oakland, CA

Melissa M Parker, Kaiser Permanente Division of Research, Oakland, CA

Andrew J Karter, Kaiser Permanente Division of Research, Oakland, CA

ABSTRACT

Long a mainstay in econometrics research, difference-in-differences (D-I-D) models have only recently become more commonly used in health services and epidemiologic research. D-I-D study designs are quasi-experimental, can be used with retrospective observational data, and do not require exposure randomization. This study design estimates the difference in pre-post changes in an outcome comparing an exposed group to an unexposed (reference) group. The outcome change in the unexposed group estimates the expected change in the exposed group had the group been, counterfactually, unexposed. By subtracting this change from the change in the exposed group (the "difference in differences"), the effects of background secular trends are removed. In the basic D-I-D model, each subject serves as his or her own control, removing confounding by known and unknown individual factors associated with the outcome of interest. Thus, the D-I-D generates a causal estimate of the change in an outcome associated with the initiation of the exposure of interest while controlling for biases due to secular trends and confounding. A basic repeated-measures generalized linear model provides estimates of population-average slopes between two time points for the exposed and unexposed groups and tests whether the slopes differ by including an interaction term between the time and exposure variables. In this paper, we illustrate the concepts behind the basic D-I-D model and present SAS® code for running these models. We include a brief discussion of more advanced D-I-D methods and present an example of a real-world analysis using data from a study on the impact of introducing a value-based insurance design (VBID) medication plan at Kaiser Permanente Northern California on change in medication adherence.

INTRODUCTION

Difference-in-differences (D-I-D) methods have been used in the field of econometrics for several decades but have only recently become more widely used in the fields of epidemiology and health research. D-I-D analysis is a quasi-experimental design used in the study of longitudinal cohort data with pre- and post-exposure repeated measures. It allows the comparison of changes over time in an outcome between exposed and control groups while accounting for changes in secular trends and controlling for both measured and unmeasured confounding. Because the design forces adherence to time-ordering in exposure and outcome measures, estimates from D-I-D models can be interpreted causally. Simple D-I-D models can be used effectively when data are available from a longitudinal pre/post cohort design. Either prospective or retrospective data collection is possible, so long as the timing of measurements is known. In this paper, we explain the fundamental D-I-D study design and illustrate a basic analysis using SAS®, specifically the GLM and MIXED procedures that allow accounting for repeated measures.

THE D-I-D DESIGN OVERVIEW

The D-I-D design is conceptually simple: measure the change in an outcome between the pre and post periods for an exposed group and a control group, then subtract one from the other to see the "difference in the differences" between the groups. In other words, the most basic D-I-D study is a "pre-post" design that compares the changes between two groups over two time points.
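As a minimal worked illustration of this subtraction, suppose (purely hypothetically; these numbers are not from the study) that mean adherence moves from 70 to 75 in the exposed group and from 70 to 72 in the control group. The notation $\bar{Y}$ for a group mean is introduced here only for this sketch.

```latex
\[
\begin{aligned}
\text{D-I-D} &= (\bar{Y}_{\text{exposed, post}} - \bar{Y}_{\text{exposed, pre}})
              - (\bar{Y}_{\text{control, post}} - \bar{Y}_{\text{control, pre}}) \\
             &= (75 - 70) - (72 - 70) \\
             &= 5 - 2 = 3
\end{aligned}
\]
```

In this made-up case, 2 of the 5 points of improvement in the exposed group are attributed to the background trend, leaving 3 points as the estimated effect of the exposure.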

EXPOSED VERSUS UNEXPOSED

In order to use the D-I-D analytic approach, a longitudinal cohort is divided into at least two groups: subjects exposed and unexposed to the condition or treatment of interest. Outcome measures must be available for members of both groups before and after a time point at which exposure occurs for the exposed group. The time points do not have to be specified calendar dates or even the same for each subject, but a single shared date is the simplest way to create the pre-post longitudinal study. For this reason, this study design is very useful for measuring the results of programs, policies, or protocols that are implemented at a specific time and are applied to a subgroup within a population. As with any study design requiring an unexposed comparison group, the identification of an appropriate control group is key. The control group should be as similar as possible to the exposed group and observed over the same period of time, ideally differing only in the exposure.

PRE AND POST MEASURES

In its simplest form, the only data required for a D-I-D analysis are the exposure flag, the outcome measures, identified as pre or post, and an identifier variable for each individual. In situations where the predicted outcomes should take account of the various population characteristics (age and sex, for example), these variables can be included in the model and then used to adjust predicted values. The simplest D-I-D models are used with continuous outcomes, as changes in continuous outcomes are more easily interpreted. D-I-D models can be used with binary outcomes, although the interpretation for binary outcomes is a little more complicated. This paper will focus on continuous outcomes.
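The minimal data requirements described above can be sketched as a long-format data set with one row per subject per period. The data set name `did_long`, the variable names (`id`, `exposed`, `post`, `adherence`), and all values are hypothetical and chosen only for illustration.

```sas
/* Hypothetical long-format layout: one row per subject per period.   */
/* Names and values are illustrative assumptions, not study data.     */
data did_long;
  input id exposed post adherence;
  datalines;
1 0 0 82
1 0 1 80
2 0 0 91
2 0 1 92
3 1 0 78
3 1 1 85
4 1 0 88
4 1 1 90
;
run;

/* Unadjusted mean outcome in each exposure-by-period cell */
proc means data=did_long n mean;
  class exposed post;
  var adherence;
run;
```

The four cell means from PROC MEANS are the raw ingredients of the D-I-D estimate; covariates such as age and sex would simply be additional columns in the same layout.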

CAUSAL INFERENCE

The pre-post design maintains the time-ordering of events, an important aspect of the design that meets the basic requirement of any model that can be interpreted causally. In addition, in a D-I-D analysis, each subject serves as its own control: the characteristics of each subject that remain the same in both periods therefore cannot be confounders. This is true whether or not those characteristics are measured, so results from the D-I-D model account for both measured and unmeasured confounders. Including the unexposed control group in the model adjusts for underlying temporal trends in the outcome; thus, differences between changes in the exposed group and the unexposed group represent changes specifically due to the exposure. For these reasons, the results from a D-I-D analysis can be interpreted causally, making it an ideal design for pre-post analyses in observational studies.

EXAMPLE STUDY BACKGROUND

Healthcare costs have been rising rapidly in the United States for many years. In the past, nearly all health care plans provided by Kaiser Permanente of Northern California (KPNC) had no deductible and low co-pays. However, during the past decade, employers and individuals have begun to purchase health insurance with a deductible and higher co-pays to reduce premium costs. To counter increasing out-of-pocket costs, new benefit programs have been introduced to reduce costs for effective treatments that have been shown to improve patient health. The theory is that encouraging use of these preventive treatments will lower the cost of providing care over time. Plans like these that are built into the insurance policy are known as Value-Based Insurance Design (VBID) benefits.

One important healthcare area experiencing rising costs is prescription medications. As costs increase for medications, patients may take lower doses than recommended or stop taking a medication altogether. These behaviors are often measured by medication adherence: the percentage of time over a given period during which a patient has adequate medication available, usually based on prescription refill data. In 2013, KPNC began offering a VBID pharmacy benefit option (VBID Rx) to provide certain prescription medications for free, including drugs to treat high cholesterol, diabetes, and hypertension.

Using this potential natural experiment, we wanted to know if the VBID medication benefit improved adherence to medications for chronic conditions among patients with a deductible plan (Reed, Mary. 2016). However, we did not have a large enough sample of patients who were on a deductible plan and had the VBID medication plan added later. Instead, we identified a cohort of patients on a non-deductible plan in 2013 whose employers switched to a deductible plan with the VBID benefit at the beginning of 2014. Our comparison cohort consisted of patients on a non-deductible plan in 2013 who switched to a deductible plan with no VBID benefit in 2014. We then compared medication adherence among those with and without the VBID benefit in 2013 and 2014. With the D-I-D design, we could identify changes in medication adherence due to the VBID plan while simultaneously removing the influence of the underlying change to a deductible plan and adjusting for confounding, measured or not, at the subject level. Figure 1 illustrates the study timing, cohort composition, and sample sizes.

Figure 1. Cohort Description for Difference-in-Differences Study of VBID Medication Plan Implementation and Medication Adherence

THE D-I-D STUDY DESIGN IN DETAIL

A graphical illustration can be helpful in understanding the D-I-D study design. In Figure 2, A1 and A2 indicate the mean medication adherence at the pre and post time periods, respectively, in the unexposed group. Similarly, B1 and B2 represent the mean adherence at the pre and post time periods for the exposed group. The change in the unexposed group over time is represented by the difference in height between the pre and post mean outcomes, connected by the dashed line. Since the measurement points are a year apart, the slope of this line represents the annual rate of change and is interpreted as the background secular trend in the outcome over time in a group not affected by the exposure. Similarly, the slope of the solid line indicates the change in medication adherence between the pre and post periods among those who experienced the exposure. Subtracting the change in the unexposed (control) group from the change in the exposed group, or vice versa, provides an estimate of the effect of the exposure, adjusted for background trends.

Figure 2. Graphic Representation of the Difference-in-Differences Design

EQUATION FOR A BASIC DIFFERENCE-IN-DIFFERENCES MODEL

In Figure 2, the slopes of the lines for the exposed and unexposed groups are allowed to vary, which implies that the underlying model will need to include an interaction term. The basic equation for the D-I-D model is

$$Y_{it} = \beta_0 + \beta_{post} \cdot Post + \beta_{exp} \cdot Exposure + \beta_{interaction} \cdot Post \cdot Exposure + \varepsilon_{it}$$

where $Y_{it}$ is the outcome value for subject i at time t, Post is a binary indicator that the outcome measurement was made in the post period, Exposure is a binary indicator that the subject is in the exposure group during the post period, and $\varepsilon_{it}$ is the error term for the outcome measure of subject i at time t. As usual, errors are assumed to be normally distributed with a mean of zero. Note that this model equation includes only the outcome, time, and exposure measures; it includes no other subject-level measures. Using this equation along with the figure, we can show that the coefficient on the interaction term alone provides the estimate and inference of the difference-in-differences. From Figure 2, the Difference-in-Differences estimate (ignoring error terms) is

$$
\begin{aligned}
A - B &= (\beta_0 - [\beta_0 + \beta_{post}]) - ([\beta_0 + \beta_{exp}] - [\beta_0 + \beta_{post} + \beta_{exp} + \beta_{interaction}]) \\
&= (\beta_0 - \beta_0 - \beta_{post}) - (\beta_0 + \beta_{exp} - \beta_0 - \beta_{post} - \beta_{exp} - \beta_{interaction}) \\
&= (-\beta_{post}) - (-\beta_{post} - \beta_{interaction}) \\
&= \beta_{interaction}
\end{aligned}
$$

In Figure 3, the parameter $\beta_0$ represents the intercept, the mean medication adherence in the non-VBID group at the 2013 measurement. $\beta_{post}$ is the change in medication adherence in the non-VBID group between 2013 and 2014. The coefficient $\beta_{exp}$ indicates the difference in adherence between the VBID and the non-VBID groups in 2013, while $\beta_{interaction}$ measures the difference in slopes between the two groups: it is a direct measure of the difference-in-differences between the two groups. In the model results, if this coefficient estimate is statistically significant, it is likely the slopes in the two groups are not parallel, and so the exposure has affected the outcome in the exposed group differently than the underlying background trend, as captured by the unexposed group.
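The coefficient interpretations above follow from evaluating the model's expected value in each of the four exposure-by-period cells. A short sketch, writing the four coefficients as $\beta_0$, $\beta_{post}$, $\beta_{exp}$, and $\beta_{interaction}$ (the Greek symbols are a notational assumption here) and setting the error term to its mean of zero:

```latex
\[
\begin{aligned}
E[Y \mid Post = 0,\ Exposure = 0] &= \beta_0 \\
E[Y \mid Post = 1,\ Exposure = 0] &= \beta_0 + \beta_{post} \\
E[Y \mid Post = 0,\ Exposure = 1] &= \beta_0 + \beta_{exp} \\
E[Y \mid Post = 1,\ Exposure = 1] &= \beta_0 + \beta_{post} + \beta_{exp} + \beta_{interaction}
\end{aligned}
\]
```

The pre-to-post change is $\beta_{post}$ in the unexposed group and $\beta_{post} + \beta_{interaction}$ in the exposed group, so the two changes differ by exactly $\beta_{interaction}$.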

Figure 3. Graphic Representation of the Difference-in-Differences Design with Model Coefficients

D-I-D MODELING METHODS

Because the D-I-D model is a repeated measures design, outcome values for a given subject are assumed to be correlated, while the outcome values between subjects are assumed independent. To account for the correlation within subjects, any model used will need to handle repeated measures as well as provide results for comparisons between the exposure groups.

Repeated Measures ANOVA (The GLM Procedure)

One method for analyzing D-I-D studies is repeated measures ANOVA using PROC GLM in SAS. This method accounts for correlation within subjects and provides the mean outcome values in each exposure group at each time period. It also determines the statistical significance of the interaction term and can compare the differences in the mean outcome values among the time/exposure groups. However, being an ANOVA, it does not generate model coefficients or allow for the generation of individual predicted values; it simply tests whether the mean values between the groups are different. This means that the mean outcome values from the ANOVA are based on the characteristics of a specific cohort and are not easily generalizable to other populations with different characteristics.
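A minimal sketch of this approach follows. It assumes a hypothetical long-format data set `did_long` with variables `id`, `exposed` (0/1), `post` (0/1), and `adherence`, with exactly one pre and one post record per subject; none of these names come from the study. Because the REPEATED statement in PROC GLM expects one row per subject, the data are first transposed to wide format.

```sas
/* Assumed input: did_long (id, exposed, post, adherence); names are hypothetical. */
proc sort data=did_long;
  by id exposed;
run;

/* Reshape to one row per subject: adh_0 = pre value, adh_1 = post value */
proc transpose data=did_long out=did_wide(drop=_name_) prefix=adh_;
  by id exposed;
  id post;
  var adherence;
run;

/* Repeated measures ANOVA: the time*exposed test is the D-I-D test */
proc glm data=did_wide;
  class exposed;
  model adh_0 adh_1 = exposed;
  repeated time 2;       /* two within-subject levels: pre and post */
  lsmeans exposed;       /* group means for each period */
run;
quit;
```

In the output, the within-subject `time*exposed` effect corresponds to the interaction term of the D-I-D model; the label `time` is simply the name given on the REPEATED statement.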

Repeated Measures Linear Regression Model (The MIXED Procedure)

A more flexible analytic approach is the repeated measures linear model using PROC MIXED (Wolfinger, Russ and Chang, Ming). As in repeated measures ANOVA, this method tests the significance of the interaction term while accounting for correlation between measures. Unlike the ANOVA model, however, this method provides estimates of the model parameters and can be used to create predicted outcome values. Like the ANOVA model, an interaction term between time and exposure in the model allows the slopes of the exposure groups to differ. The LSMEANS and ESTIMATE statements can be used to generate summary predicted values for different subgroups within the analytic cohort. In addition, the model coefficients can be used to predict mean outcome values in the full cohort under different counterfactual scenarios. For example, to estimate the mean adherence in the full cohort during the pre period assuming no exposure in the post period, keep only the POST=0 records, set EXPOSED=0 for all cohort members, and generate predicted values using the model coefficients. Repeating this for all time periods and exposures can provide more realistic estimates of the outcome than the methodology used by the LSMEANS statement.
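A minimal sketch of such a model is shown below, assuming a hypothetical long-format data set `did_long` with numeric 0/1 variables `post` and `exposed`, a subject identifier `id`, and a continuous outcome `adherence`. The compound symmetry covariance structure (`type=cs`) is an assumption made for this illustration, not a choice stated in the text; other structures may be more appropriate for a given study.

```sas
/* Assumed input: did_long (id, exposed, post, adherence); names are hypothetical. */
proc mixed data=did_long method=reml;
  class id;
  /* post and exposed are left out of CLASS so they enter as 0/1 indicators; */
  /* the post*exposed coefficient is then the D-I-D estimate.                */
  model adherence = post exposed post*exposed / solution cl;
  /* Correlated pre/post measurements within each subject */
  repeated / subject=id type=cs;
run;
```

In the "Solution for Fixed Effects" table, the intercept, `post`, `exposed`, and `post*exposed` rows correspond to the four coefficients of the basic D-I-D equation, with confidence limits requested by the CL option.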

Using Propensity Score

In many situations, it is desirable to analyze a cohort composed of exposure groups with very similar characteristics, as would be the case in a cohort created for a randomized controlled trial. As in other causal observational methods, this similarity of distributions may be achieved by first modeling the probability of being in the exposed group using the available measured characteristics for the cohort. These probabilities, the "propensity" to be exposed, are then used to weight the cohort, ideally resulting in similar distributions of potential confounders in the exposed and unexposed groups.
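One way this two-step idea might be sketched is shown below. The data set `cohort_baseline` (one row per subject), the covariates `age` and `sex`, and the use of inverse probability of treatment weights are all assumptions made for illustration; the text does not specify which covariates or which weighting scheme were used.

```sas
/* Assumed input: cohort_baseline (id, exposed, age, sex); names are hypothetical. */

/* Step 1: model the probability of being in the exposed group */
proc logistic data=cohort_baseline;
  class sex / param=ref;
  model exposed(event='1') = age sex;
  output out=ps_out p=pscore;   /* pscore = predicted P(exposed = 1) */
run;

/* Step 2: convert propensity scores to weights                  */
/* (inverse probability of treatment weighting, an assumption)   */
data ps_weights;
  set ps_out;
  if exposed = 1 then iptw = 1 / pscore;
  else iptw = 1 / (1 - pscore);
run;
```

The resulting weights could then be merged onto the analytic data set by `id` and supplied to the outcome model, for example through a WEIGHT statement; covariate balance in the weighted cohort should be checked before relying on them.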

THE D-I-D STUDY DESIGN IN ACTION

DO VBID DRUG BENEFITS AFFECT MEDICATION ADHERENCE IN PATIENTS WITH CHRONIC CONDITIONS FACING INCREASED DEDUCTIBLE COSTS?

As a first step in any D-I-D analysis, it is helpful to plot a small sample of pre and post outcome values, grouped by the exposure variable, to get a feel for the overall data. A small random sample of 30 individuals from the analytic cohort is shown in Figure 4. From this plot, it is clear that the majority of the cohort had relatively high adherence in 2013 and that most patients had only small alterations in adherence during the study period. The overall impression is that those in the exposed group, those with the VBID Rx plan, may be increasing adherence slightly, while those without the plan are more likely to remain stable or decrease slightly.
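A plot of this kind could be produced along the following lines. This is a sketch under stated assumptions, not the code used for Figure 4: it assumes a hypothetical long-format data set `did_long` with variables `id`, `exposed`, `post`, and `adherence`, containing at least 30 subjects, and the seed value is arbitrary.

```sas
/* Assumed input: did_long (id, exposed, post, adherence); names are hypothetical. */

/* One row per subject, so that sampling is done at the subject level */
proc sort data=did_long(keep=id exposed) out=subjects nodupkey;
  by id;
run;

/* Simple random sample of 30 subjects (arbitrary seed for reproducibility) */
proc surveyselect data=subjects out=sample_ids method=srs
                  sampsize=30 seed=12345 noprint;
run;

/* Keep the pre and post records for the sampled subjects only */
proc sql;
  create table sample_long as
  select l.*
  from did_long as l
       inner join sample_ids as s
       on l.id = s.id
  order by l.id, l.post;
quit;

/* One line per subject, with a separate panel for each exposure group */
proc sgpanel data=sample_long noautolegend;
  panelby exposed / columns=2;
  series x=post y=adherence / group=id;
  colaxis values=(0 1);
run;
```

Each line connects one subject's pre and post values, so the direction and size of individual changes can be compared visually between the exposed and unexposed panels.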
