An information resource for aging surveys



A. Specific Aims

The overarching goal of the current joint project between RAND and the University of Michigan, “Internet Interviewing and the HRS,” has been to inform the Health and Retirement Study (HRS) of the potential of Internet interviewing of its respondents. We have conducted Internet interviews of a subsample of HRS respondents and have set up a separate Internet panel (the “American Life Panel”, ALP) of approximately 1,500 respondents over 40. The ALP has been used to compare telephone interviewing and Internet interviewing; recently, we provided non-Internet users with Web TVs, which allow them to access the Internet using a TV set and a telephone line. Both the HRS Internet interviews and the ALP have been used to conduct a large number of pilot experiments dealing with mode effects, selectivity, new ways of measuring beliefs and preferences (including probabilities of several events and preferences for different Medicare Part D plans), measurement of health and health histories, visual presentation of tasks, and measurement of objective quantities such as wealth and consumption.

Based on these results, we propose a competing renewal to further integrate Internet interviewing into the tool box of the HRS and conduct a substantial number of new substantive experiments that will not only be specifically helpful in designing the content of the HRS interviews but also have potential applications in other surveys. These experiments will be aimed at measurement of preferences, expectations, cognition, health and well-being, as well as at supplementing current interview modes (e.g., CATI, Mail, CAPI) by Internet interviewing, high-frequency interviewing, and event-related interviewing.

We expect that experiments will lead to knowledge about how the increased use of high-quality and cost-effective Internet surveys can enhance and supplement HRS data collection.

Specifically our aims are as follows:

1. To contribute to the long-run improvement of the HRS instrument by designing and fielding new experiments in questionnaire and question design and measurement methods, exploiting the potential of Internet interviewing. These activities will include:

a. Experimental measurement of variables such as health, consumption, well-being, expectations, and preferences, among others, with special attention to cognitive functioning.

b. High-frequency interviewing to measure expectations with respect to, for example, stock markets, house prices, or health.

c. Use of mixed-mode designs combining Internet and mail surveys to determine optimal approaches to maximizing response rates and minimizing costs.

d. Hypothetical choice experiments related to policy issues that are important for middle-aged and older Americans, such as Medicare Part D, retirement decisions, and savings and consumption.

e. Choice experiments with real pay-offs to increase understanding of decision-making processes, expectations formation, and preferences.

2. To develop an Internet version of the HRS that could be administered as a substitute for the HRS core interview. This would involve:

a. Developing HRS-core modules that could be administered in successive sessions

b. Testing these modules and piloting new content with the ALP

c. Identifying mode and selection effects and developing ways to eliminate or correct for them.

3. To develop Internet surveying to supplement HRS, such as follow-up interviewing and targeted interviewing surrounding retirement, health events, and so forth.

4. To expand the use of Web TVs and explore other ways of giving Internet access to respondents lacking access, and to further work on selectivity and propensity scoring, with the aim of producing datasets with sample weights that make them representative of the general population.

5. To make the data generated by these activities available to the research community.

B. Background and Significance

B.1. General Context

When we started the current project, several methodological issues needed addressing. In addition, we aimed to conduct experiments exploiting the advantages of the Internet as a data collection environment. We will discuss these in our progress report (Section C.1). The proposed project aims to further integrate Internet interviewing into the toolbox of HRS and to take advantage of the unique possibilities of Internet to carry out new studies that help us better understand health and retirement issues.

We discussed major aspects of Internet data collection in the Background and Significance section of the proposal that led to the current project. Here, we mainly update issues pertaining to data collection over the Internet, mostly based on recent studies. Background and significance specifically relevant to each of the proposed new experiments is mainly given in Section D.

B.2. Collection of data over the Internet

Overall, the web survey literature contains a substantial body of research on data quality and related dimensions, in particular response rates, timeliness, cost, selection, and mode effects. The literature on access gaps in the context of web surveys is sparse.

B.2.1. Internet Access and Response Rates

Internet access in the U.S. is increasing at a steady pace. In April 2006, 73% of the U.S. population used the Internet (), up from 66% in January 2005. About 42% of Americans now have broadband connections at home. By contrast, telephone coverage is about 94% (Fricker et al., 2005). Americans without Internet access are more likely to be poor, poorly educated, elderly, black, or Hispanic than people with Internet access. See, for instance, Couper et al. (2006), Robinson et al. (2003), and Bimber (2000).

Recruiting web survey participants. People without Internet access cannot participate in web surveys, leading to coverage error. Two approaches have emerged in the quest to represent the full population and not only those with Internet access. One solution is to draw a probability sample in a traditional way (e.g. using random digit dialing (RDD)) and provide Internet access to those who do not have it. Several panels (Knowledge Networks and the American Life Panel in the U.S., CentERpanel in the Netherlands) have given respondents without Internet access devices that allow them to complete surveys using their TV screen as a monitor and to hook up to the Internet using a telephone line. Because of the costs of supplying hardware, this approach is most effective in the case of a panel (in contrast to a cross-section).

The second approach does not rely on a probability sample. Instead, a volunteer panel is used and adjusted for selectivity afterwards by reweighting. Harris Interactive has a volunteer panel of 6 million people. Participants are recruited from Harris Interactive’s web site, from banner ads, and from other advertising. Harris Interactive conducts monthly RDD phone surveys containing a set of so-called webographic questions, which are used to calibrate the web survey against the phone survey. See Schonlau et al. (2004) for more details. Other companies have taken this approach with limited or no adjustment for selectivity, including the NPD Group, with 3 million panelists globally, and Carbonview Research, with 25,000 panelists in the U.S. Unusually for a web survey company, Carbonview recruits web survey respondents by approaching shoppers at malls (Enright, 2006). Other well-known panels include NFO Research () and Greenfield Online ().

Development of a standard contact methodology. When the Web survey is offered as part of a mixed-mode strategy, an effective means of raising response rates is to offer one mode first and then offer non-respondents the opportunity to take the interview in a different mode (e.g., Dillman et al., 2001). When a single response mode is offered, respondents can be confused if pre-notification and reminder contacts arrive in a different mode. In a randomized experiment, Schaefer and Dillman (1998) studied various combinations of mail and e-mail pre-notification, survey implementation, and follow-up contact. Their survey was sent as an e-mail attachment. They achieved similar response rates when respondents were contacted only by e-mail as when they were contacted only by mail. Response rates were lower for e-mail surveys that had a mail pre-notification or a mail follow-up contact.

Unlike in Dillman’s “Total Survey Design strategy” for mail and phone surveys, personalization of contact e-mails may not improve response rates, presumably because respondents are already inundated with personalized junk e-mail. Explicitly telling respondents that they are part of a small group chosen for the survey may improve response, especially when combined with a deadline (Porter and Whitcomb, 2003).

Timeliness. Response speed is generally higher than in mail surveys (e.g. Tuten et al, 2002). Harris Interactive, for example, often keeps surveys open for only a few days. Carbonview Research receives 80% of the eventual responses within 3 days and a “full” response at around 12 days (Enright, 2006). However, respondents who respond after 3 days may be important because they use the Internet less often and may be more representative of the offline population.

B.2.2. Data Quality

Response rates (Unit Nonresponse). The literature on response rates is inconclusive, mostly because response rates are often not directly comparable. Response rates are affected by the contact mode. For example, if respondents are contacted by mail and given the option to reply either by mail or on the Web, they tend to favor the mail option even when that option is only offered to them later (Schonlau, 2003).

In an experimental study in a university setting, Kaplowitz et al. (2004) find that a Web survey with a postcard pre-notification achieves almost the same response rate as a mail survey with a postcard pre-notification. However, in both cases the response rate is only about 30%.

In a survey of university and community educators, Kiernan et al. (2005) find a higher response rate over the Web (95%) than by mail (79%). In this case, all respondents were initially contacted by e-mail. Web survey participants also provided longer and more substantive responses to open-ended questions.

Parks et al. (2006) randomly assign college women to a telephone or a Web survey collecting data on alcohol use and alcohol-related victimization. The response rate was higher for the Web survey (60%) than for the telephone survey (45.7%).

Fricker et al (2005) achieve a much higher response rate using phone interviews (97.5%) than using Web surveys (51.6%). In this study all respondents were initially contacted by phone and passed a screener.

Item nonresponse. Web surveys may have somewhat lower item nonresponse rates than other modes, and there is strong evidence that they tend to generate longer answers to open-ended questions. Tuten et al. (2002) review several papers on data quality issues. In a study of alcohol use, Link and Mokdad (2005) report very low item nonresponse among telephone and Web survey respondents compared to mail survey respondents. Fricker et al. (2005) find less item nonresponse in a Web survey than in a telephone survey. Schaefer and Dillman (1998) find that their e-mail questionnaire had a higher completion rate (69%) than the identical mail survey (57%). Open-ended responses in the e-mail version contained an average of 40 words, compared to 10 words in the paper version.

Mode effects. The survey mode (web survey, phone survey, mail survey) may affect the responses. In auditory modes (phone), there may be a tendency to choose the last response option for questions on which the respondent does not have a pre-formed opinion (a recency effect). In visual modes (web survey, mail survey), the respondent may tend to choose the first response option (a primacy effect): the respondent may be satisfied with the first acceptable option without evaluating later ones. Knauper (1999) finds that recency effects are exacerbated by older age. For example, in a question about whether divorce should be easier or more difficult to obtain, she finds that reordering the response options reveals a recency effect of only 5% among respondents aged 18–54 but of 36% among respondents over 70. In a randomized study of 15-year-old school children on smoking, Denscombe (2006) finds little evidence of a mode effect between web-based and paper-based questionnaires.

Time to complete the survey. Parks et al. (2006) found that Web survey participants took slightly longer than respondents who did the same survey by phone (medians of 19.5 minutes versus 18.8 minutes). Fricker et al. (2005) also reported that Web survey respondents took longer than phone survey respondents (21.6 minutes versus 20.8 minutes). The longer completion time is consistent with longer answers to open-ended questions and less item nonresponse. Chang and Krosnick (2003b) compared both modes in a laboratory setting where subjects were randomly assigned to either self-administered interviews on a computer or oral interviews over an intercom system. They find completion time in the computer mode to be significantly shorter than in the intercom mode (17.3 minutes versus 26.6 minutes).

Social Desirability. Like mail surveys, Web surveys do not require the presence of an interviewer. For sensitive questions, the presence of an interviewer might prompt a respondent to give answers that are socially more acceptable, thus distorting responses toward social desirability. See Parks et al. (2006), Rietz and Wahl (1999), and Chang and Krosnick (2002a).

B.2.3. Web Survey Design for Elderly Respondents

Rogers and Badre (2002) review web site design for older adults. Design recommendations for older adults include the use of sans serif fonts (e.g. Arial), a font size of 14 points for body text and 18-24 points for headers, avoiding italics and avoiding the use of all capital letters.

Web sites for older adults should not require users to distinguish between colors of similar hue (especially in the violet, blue, and green range). The text and background on websites should be of high contrast (e.g. black on white background) rather than low (e.g. black on blue background). Informational cues like color or highlighting are especially helpful for older adults.

All in all, neither the literature nor our own project provides evidence that the quality of the answers in Internet surveys is lower than with other interview modes (mail, phone, or personal interviewing). On the contrary, there is some evidence that Internet interviewing does better in certain respects, e.g., avoiding social desirability bias and eliciting longer and more complete open-ended answers. On the other hand, selection will remain a concern for the near future, even though Internet penetration has been increasing steadily in the past decade. Particularly for elderly respondents, Internet access can be expected to remain far from complete, even over the next ten years. Providing non-Internet users access to the Internet using a Web TV or some other device is a promising development that needs to be explored further, and this is part of the proposed work. Still, it seems reasonable to expect that non-response among some groups may remain substantial, and research on reweighting, matching, and other ways to correct for unit non-response will remain necessary.

C. Preliminary Studies

Since this is a renewal application, we limit the overview of preliminary studies to a progress report of the (roughly) first four years of the current project.

C.1. Progress Report

This progress report is organized as follows. After describing the project organization, we discuss the main findings. In doing so, we distinguish two broad (and somewhat overlapping) domains. First, we consider studies that shed light on the properties of Internet interviewing vis-à-vis other modes under the heading “measurement and design.” This includes selectivity, reweighting, and mode effects. Next, we pay attention to a number of substantive topics for which the Internet is a particularly effective survey mode under the heading “content.”

C.1.1. Surveys and Organization

The original proposal for the project anticipated two Internet interviews with a subset of HRS respondents, eight interviews with respondents in our own special-purpose Internet panel, and four interviews with a telephone control group (the so-called CATI sample). The non-HRS sample is drawn from respondents to the University of Michigan Survey Research Center’s (SRC) Monthly Survey (MS).

Anticipated sample sizes were 500 for the telephone interviews (which we will denote as CATI1, CATI2, etc.) and 1000 for the Internet interviews (which are denoted as MS1, MS2, etc.). For HRS, we planned a gross sample of 2,800 for the first Internet interview and 1,400 for the second one. Currently (end of October 2006), the following interviews have been conducted.

1. HRS Internet 1. This interview was completed in 2003. Total number of observations: 2,180; response rate 80.2%.

2. HRS Internet 2. Because we wanted to monitor the progress of the Medicare Part D drug plan, we divided HRS Internet 2 into two phases. Phase 1 went into the field in February 2006 and measures respondents’ experience with the initial insurance offerings. We obtained 1,338 observations (70.0% response rate). We are planning a phase 2, to be administered in early 2007, to take advantage of the period in which Medicare Part D beneficiaries can change plans.

3. CATI1. Started in 2003; 616 observations at this moment. This amounts to a response rate of 84.4 percent if we exclude as ineligible potential respondents who are deceased, ill, or for whom English as the interview language poses a problem.

4. CATI2. Started in 2005. We have not yet contacted all respondents who completed CATI1. At the moment, there are 403 respondents. This amounts to a response rate of 87.4 percent if we apply the same exclusions as before for respondents who are ill or deceased. If we also exclude non-contacts from the base (some of these may still be contacted), the response rate rises to 93.7 percent. If we further exclude respondents for whom an appointment for an interview is pending, the response rate rises to 98 percent.

As noted above, the original plan was to have four CATI waves. It turned out that most of the mode effect issues we wanted to study could be covered by the first two telephone interviews. In view of the limited use for further CATI interviews and the cost and operational advantages of having everyone on the Internet, that plan has been adjusted: CATI households without Internet access are offered a Web TV, using funds from other grants to create the American Life Panel (ALP[1]) ( ). This set-up follows similar set-ups of CentERdata ( ) and of Knowledge Networks (). In addition, we have simplified the funding of the panel by costing it at a fixed price per interviewee minute[2].

Of the 401 CATI households we have approached with the offer of providing them with a Web TV, 102 have agreed to join the ALP[3]. We expect to reach the target number of respondents to CATI2 in November 2006. After that, MS respondents without Internet access will be asked directly if they are interested in joining the ALP and receiving free Internet access. The current situation with respect to the various planned surveys is as follows:

1. MS1: Started in 2003 and we are still adding respondents when they become available from the Monthly Survey. Total number of observations as of now: 1,073

2. MS2: Started in 2004 and we are still adding respondents who have finished MS1. Current number of observations: 692

3. MS3: Started in 2005. Current number of observations: 536

4. MS4: Started in 2006. Current number of observations: 468

5. MS5: Started in May 2006. Current number of observations: 239

6. MS6: Started in August 2006. Current number of observations: 182

7. MS7: Early November 2006. Material consists of versions of the Day Reconstruction Method (DRM) pioneered by Kahneman, Krueger, Schkade, Schwarz, and Stone (2004).

8. MS8: Early December 2006. Material for MS8 consists of a questionnaire about the role of default options and Social Security Expectations.

Calculation of response rates is a bit complicated for each of these surveys, since we are still adding respondents to each of them. Compared to the original plans, the interviews have gone into the field later, partly because MS sample built up more slowly than anticipated. Respondents are given surveys in chronological order, so first MS1 hits its target, next MS2, etc.[4] One way to calculate response rates is to consider the number of respondents who are eligible for a survey and who have been invited (and are technically capable[5]) to do a survey. This leads to the following response rates: MS2, 73 percent; MS3, 81.6 percent; MS4, 67.3 percent; MS5, 64.9 percent; MS6, 84.1 percent. These are not final response rates; we expect each of the response rates to increase substantially, in line with the response rates for the CATI interviews.

Although the proposal for the current project anticipated a total of 8 interviews, we will add more interviews because some content of the Internet questionnaires so far overlaps with topics funded from different grants (including a supplement to study behavioral consequences of the introduction of Medicare Part D, and a grant for a Roybal Center for Economic Decision Making, which concentrates on decision making related to financial well-being in retirement). In particular, we expect to do more work on the effect of screen layout and visual displays as part of the current grant.

The data collected through the ALP are made available to the research community as public-use files, to be downloaded from the Web, subject to registration as a user (). The data collected from HRS respondents are made available to the research community as public-use files, to be downloaded from the Web, subject to registration as a user ().

C.1.2. Measurement and Design

C.1.2.1. Self-selection into Internet Samples and Weighting

Based on data from the first HRS Internet Interview conducted in 2003, several members of the research group wrote papers on self-selection into the Internet samples and on statistical methods correcting for the resulting sample selection bias based on propensity score weighting or variants of that approach.

Both Couper et al. (2006) and Weir (2004) examined several issues critical to the potential integration of Internet interviewing with other modes in HRS. Selectivity was studied with respect to having Internet access, expressing willingness to participate in an Internet interview when asked during 2002 core interviews, and participating in the Internet interview when asked. Internet access was positively related to education, cognitive abilities, and health and negatively related to minority race and age. These factors were generally less important for predicting who would participate conditional on having access. This suggests that the overall selectivity of Internet interviews should decline as Internet access becomes more widespread. Respondents who were more cooperative with core interview requests (no resistance, fewer telephone calls) were slightly more likely to have Internet access but considerably more likely to participate conditional on access. Couper et al. (2006) find that disparities in health and socioeconomic status persist after controlling for demographic differences in coverage and response. Weighting on demographics alone is thus unlikely to yield a representative sample in such surveys. Similar work is under way using the respondents from the MS surveys.

Similarly, Schonlau et al. (2004b) and Schonlau et al. (2006) investigated propensity scoring and matching as methods for dealing with selection bias in Web surveys. They find that this adjustment generally improves the estimation of the distribution of the target variables but does not completely eliminate bias.

It is not obvious what variables should be used to adjust for the differences between the Web survey respondents and the general population. Schonlau et al. (forthcoming) investigate the effect of selected webographic variables (cf. Section B.2.1), as suggested by Harris Interactive. They find that demographic variables are generally more important than webographic variables in discriminating between membership in a Web sample and a reference sample. However, webographic variables inform sample membership above and beyond demographic variables and are therefore useful.
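To make the weighting approach concrete, the following minimal sketch (in Python) illustrates generic propensity-score weighting of a Web sample toward a probability-based reference sample: a logistic regression predicts reference-sample membership from demographic and webographic covariates, and each Web respondent is weighted by the estimated odds. The variable names and two-sample setup are hypothetical; this illustrates the general technique, not the exact procedure of the cited papers.

    # Minimal sketch of propensity-score weighting for a Web sample (hypothetical data).
    import numpy as np
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    def propensity_weights(web, ref, covars):
        """Weight Web respondents so their covariate mix matches the reference sample."""
        pooled = pd.concat([web[covars], ref[covars]], ignore_index=True)
        in_ref = np.r_[np.zeros(len(web)), np.ones(len(ref))]   # 1 = reference sample
        model = LogisticRegression(max_iter=1000).fit(pooled, in_ref)
        p = model.predict_proba(web[covars])[:, 1]              # P(reference | x)
        w = p / (1 - p)                                         # odds up-weight under-represented cases
        return w * len(web) / w.sum()                           # normalize weights to the sample size

    # Usage (hypothetical columns): estimate a mean measured only in the Web sample.
    # w = propensity_weights(web, ref, ["age", "educ", "uses_email_daily"])
    # estimate = np.average(web["target"], weights=w)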

C.1.2.2. Mode effects

Weir (2004) examined mode effects for three types of content: objective self-reports of health, subjective measures of affect, and evaluative measures of numeric ability. This was done by direct comparison of matched individuals responding to both the Internet survey and the HRS core interviews. The patterns of mode effects between Internet and interviewer-administered modes mirror what has been found for self-administered written questionnaires versus interviewer-administered modes. For objective reports of health, there were no significant mode effects when adjusted for trends between the 2002 and 2004 interviewer-administered modes. Consistent with the literature, subjective affect measures of loneliness and mastery did show evidence of mode effects, with greater proportions acknowledging negative affect over the Internet. Respondents scored significantly higher on evaluative measures of numeric ability over the Internet than they did in the core survey, most likely because of a combination of taking more time to think about the answer and of being able to use a calculator or computer to assist.

McFadden, Schwarz, and Winter (2006) analyze order effects in Internet and CATI surveys. The theoretical prediction is that in an auditory survey, a recency effect should be found (i.e., the last response option should be selected more often by respondents), while in a visual format—such as an Internet survey—there should not be any order effects in questions with just a few alternatives. The experiment used questions taken from the HRS Consumption and Activities Mail-Out Survey (CAMS) 2001. Respondents were asked what they would do with a 20 percent cash windfall, with three response options: “save all”, “save some and spend some”, and “spend all.” Confirming theoretical predictions, a small but significant recency effect is found in the data from the CATI study, while no order effects are found in data from the Internet study.
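To illustrate how such an order effect can be detected, the sketch below applies a standard chi-square test to hypothetical counts from two randomized option orderings; it mirrors the logic of the experiment, not its actual data or analysis.

    # Sketch: testing for a recency effect when option order is randomized (hypothetical counts).
    import numpy as np
    from scipy.stats import chi2_contingency

    # Rows: presentation order (original / reversed); columns: chosen option
    # ("save all", "save some and spend some", "spend all").
    counts = np.array([[120, 260, 40],
                       [100, 250, 70]])
    chi2, pval, dof, _ = chi2_contingency(counts)
    print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {pval:.3f}")
    # A recency effect appears as a higher share for whichever option is presented last,
    # shifting the response distribution between the two rows.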

Dominitz, Hurd, and Kapteyn (2006) analyze the subjective probabilities of survival that have been reported in the HRS since 1992 and that were also reported by respondents to the HRS 2003 Internet interview. While there exist clear, systematic differences between the distribution of responses in the Internet interviews and in other HRS interviews, these differences appear to arise from sample selection rather than mode effects. Respondents with Internet access in 2002 tend to report higher survival probabilities and to less frequently choose focal responses than do those without access. Further, survival probabilities reported by those with Internet access tend to be more strongly correlated across interviews. These differences are more notable in the younger cohort of War Baby (WB) respondents than in the HRS cohort, where Internet access is less prevalent and measures of numeracy seem to play a larger role in predicting intrapersonal variation in reported probabilities.

We have fielded several modules on assets in Internet surveys and the CATI control sample and compared the data with data from the regular HRS surveys. Kapteyn et al. (2004) compared ownership patterns and the distributions of amounts held for three assets (checking and saving accounts, stocks, and the primary residence) in HRS 2002 and HRS Internet 2003. As a robustness check, they also considered the first Internet interview in MS1, in 2003, and the corresponding phone interviews (CATI1, in 2004). As expected, large selection effects were found for all three assets: Respondents participating in the Internet panel (who all had Internet access) more often hold each of the assets and hold higher amounts than other HRS participants. More surprisingly, the authors also found large discrepancies between HRS 2002 and HRS Internet 2003 when controlling for selection by only considering respondents who participated in HRS Internet 2003: Both stock ownership rates and amounts held in checking and saving accounts reported in HRS Internet 2003 were much higher than in HRS 2002. No discrepancies were found for the primary residence. Similar discrepancies were found between MS1 and CATI1, but selectivity in these samples could not be controlled for.

To investigate these discrepancies, questions on checking and saving accounts and stocks were also included in HRS Internet wave 2 (in 2006); see Kapteyn and van Soest (2006). In contrast to the earlier findings, no large differences were found between the reports in the HRS Internet interview and the reports in HRS 2004 for the same people. A possible explanation is that the questions in HRS Internet 2003 (and MS1) did not specify that all the other asset categories explicitly asked in HRS 2002 should be excluded. A phrase stating that this should be done was added in HRS Internet 2006. Further experiments with the second phase of HRS Internet wave 2 should help to investigate whether this explanation is indeed correct. If the explanation holds, it appears that mode effects do not play an important role in the measurement of assets.

In household surveys, quantities such as income, consumption, or wealth, or components of these quantities, are frequently elicited not with open-ended question formats but by giving respondents categorized response options such as range cards. Results from social psychology and survey research indicate that the choice of bracket values may influence responses. In research conducted in parallel to the HRS Internet Interviewing project, Winter (2002) collected data using controlled experiments administered in another Internet survey (the Dutch CentERpanel). He confirmed that bracketing effects arise in consumption questions with range-card response formats and that the resulting biases are of economically relevant magnitude.

Dominitz and Hung (2006) investigate differences in financial literacy, conditional on Internet usage. They find that HRS respondents with Internet access display higher levels of financial literacy than respondents without Internet access. Furthermore, younger cohorts display higher rates of financial literacy. Lusardi and Mitchell (2005) find that among HRS 2004 respondents only 34.3 percent of respondents correctly answered three questions related to stock risk, compound interest, and inflation. However, when Dominitz and Hung analyze responses using data from MS5, they find that 74.6 percent of respondents answer those same three questions correctly. Their analysis suggests that the difference in responses across the two surveys largely results from sample selection differences, but mode effects still remain. Which mode gives a more accurate picture of real-life decision-making is a worthy subject of future research.

C.1.3. Content

Both the ALP and the HRS Internet interviews have been used to explore a number of new issues in the measurement of preferences, beliefs, and cognitive functioning. We group the discussion by topic.

C.1.3.1. Measurement of Preference Parameters

Individual preferences are central to economic theories of behavior, yet there are few attempts to directly measure and study preferences. The graphics capability of the Internet greatly expands our ability to capture difficult concepts, such as time preference and intertemporal substitution.

To assess time preference and intertemporal substitution, respondents to the Internet survey answer one of three hypothetical question series on consumption before and after retirement. (See Kimball et al. (2006) for details.) In each series of four or five questions, we implicitly vary the interest rate and study its impact on an individual’s consumption path.

To infer preferences, we use the following model of consumption growth:

Δ ln C = s (r − ρ)    (1)

where Δ ln C is the growth rate of desired consumption, r is the real interest rate, s is the elasticity of intertemporal substitution, and ρ is the subjective discount rate. The three versions experiment with different ways of varying the interest rate and measuring desired consumption.

The first Internet version with a discrete choice set is comparable to the 1999 HRS Mailout Survey, but it randomizes the presentation of choices and interest rates. Our analysis of the responses shows that the new question features do encourage more active decision making. For example, as the interest rate increases from 0 to 13.8 percent, half the Web respondents choose a steeper consumption path versus only one-quarter of the mail respondents. The estimates obtained are in line with Euler equation estimates based on the time series of aggregate consumption.

The other two versions of the questions use Java-enabled graphics to make the trade-offs between current and future consumption even more salient. At each interest rate, respondents use the mouse to actively form their desired consumption path. In addition to the final choice, these versions track the decision process as respondents manipulate the instrument.
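For concreteness, the sketch below shows how equation (1) maps desired consumption growth (written g = Δ ln C), observed at two experimentally assigned interest rates, into the preference parameters s and ρ; the slopes used are hypothetical illustrations, not estimates from our data.

    # Sketch: recovering s and rho from equation (1), g = s * (r - rho), where g is the
    # desired consumption growth rate chosen at an assigned real interest rate r.
    # The inputs below are hypothetical.

    def infer_preferences(r1, g1, r2, g2):
        s = (g2 - g1) / (r2 - r1)   # slope of g with respect to r identifies s
        rho = r1 - g1 / s           # either (r, g) point then identifies rho
        return s, rho

    s, rho = infer_preferences(r1=0.0, g1=-0.012, r2=0.138, g2=0.046)
    print(f"s = {s:.3f}, rho = {rho:.3f}")   # s = 0.420, rho = 0.029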

C.1.3.2. Eliciting Subjective Probabilities and Expectations

Previous research finds that greater respondent uncertainty about subjective probabilities leads to more prevalent reports of focal values (e.g., 0, 50, or 100 percent chance) and is predictive of more conservative or risk-averse choices in savings and portfolio behavior (e.g., Lillard and Willis, 2001; Hill, Perry, and Willis, 2004). These analyses infer respondent uncertainty from HRS data on other phenomena. Delavande and Dominitz (2006) analyze direct measures of uncertainty about survival probabilities that were included in MS2. The measures are of two types—min-max and strength of beliefs. The min-max questions concern the lowest and highest possible values for the probability of survival to age 75 (or 85). The strength of belief questions, originally developed by Delavande (2005), concern the chance that the probability of survival to age 75 (or 85) exceeds x, for up to three values of x. Response rates to both types of questions are well over 90 percent. Responses generally appear to be coherent, especially in the Internet interviews. The strength of beliefs responses appear to vary more sensibly with individual attributes, survival horizon, and reported probabilities than do the min-max responses.

Delavande and Rohwedder (2006a) analyze the results of an experimental module in which respondents are asked to allocate 20 balls into 7 bins to reflect the percent chance that their SS benefits fall within a given interval. The motivation for this new format is to reduce respondents’ burden. The visual format is compared with the standard ‘percent chance’ format. Response rates and answers consistent with a probability distribution are higher with the visual format. Central tendencies and spreads of the distribution of beliefs are very similar across formats. Plausibly, respondents who are closer to expected claiming age, who have less wealth (and, thus, will rely more heavily on SS during retirement), who have lower expected probability of working past 62, and who have higher education (and, thus, are potentially better informed) tend to have distributions with smaller variance.
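As an illustration, the minimal sketch below converts a 20-ball allocation over 7 benefit intervals into a subjective probability distribution and computes its central tendency and spread; the bin boundaries and the allocation are hypothetical.

    # Sketch: summarizing a balls-and-bins response (20 balls over 7 intervals).
    # Bin edges (monthly SS benefit, dollars) and the allocation are hypothetical.
    import numpy as np

    edges = np.array([0, 400, 800, 1200, 1600, 2000, 2400, 2800])   # 7 intervals
    balls = np.array([0, 2, 5, 8, 4, 1, 0])                         # respondent's allocation
    assert balls.sum() == 20 and len(balls) == len(edges) - 1

    probs = balls / balls.sum()                # each ball represents a 5 percent chance
    midpoints = (edges[:-1] + edges[1:]) / 2   # place mass at bin midpoints (a simplification)
    mean = np.sum(probs * midpoints)
    sd = np.sqrt(np.sum(probs * (midpoints - mean) ** 2))
    print(f"mean = ${mean:,.0f}, sd = ${sd:,.0f}")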

Delavande and Rohwedder (2006b) analyze data from two waves of the ALP on point estimates and probabilistic expectations about future claiming age and SS benefits, as well as about the probability of SS reforms. SS income is the largest source of income for about 65 percent of retired households. Thus, forming realistic expectations about future SS is central to successful retirement planning. Individuals are found to exhibit uncertainty about the age at which they will claim SS benefits: About 40 percent of the respondents provide a probability larger than 50 percent when asked about the chance to claim 2 years after their expected claiming age. Using a randomized experimental design, they compare respondents’ expectations about SS benefits conditional on claiming at the expected claiming age and unconditional on claiming age. Despite the uncertainty about claiming age described above, the conditional and unconditional distributions are strikingly similar. This could occur because respondents do not distinguish them conceptually or because they have little knowledge about the impact of claiming age on benefits.

C.1.3.3. Vignettes and Work Disability

Self-reports on whether people have an impairment or health problem that limits the amount or type of work they can do suggest large differences between countries and socioeconomic groups. The issue is whether these are genuine differences or differences in reporting styles. Vignettes (short descriptions of people with a certain work-related health problem) can be used to make this distinction: if respondents use different response scales, they will evaluate the same vignette person differently.
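A minimal sketch of one simple, nonparametric way to exploit this idea (in the spirit of the recoding proposed by King et al., 2004) is given below: each respondent's self-report is re-expressed relative to his or her own ordered vignette ratings, which removes that respondent's personal response scale. This illustrates the general logic only; it is not the parametric model used in our papers.

    # Sketch: nonparametric vignette recoding. Self-ratings and vignette ratings are on
    # the same ordinal scale (here hypothetically 1 = no problem ... 5 = extreme problem).

    def recode_self_report(self_rating, vignette_ratings):
        """Rank the self-rating against the respondent's own vignette ratings.

        Vignettes are ordered from mildest to most severe; a return value of 0 means
        'below the mildest vignette', k means 'at or above the k-th vignette'.
        """
        return sum(self_rating >= v for v in vignette_ratings)

    # Two respondents give the same self-rating but use different personal scales:
    print(recode_self_report(3, [2, 3, 4]))   # -> 2 on the common scale
    print(recode_self_report(3, [1, 2, 2]))   # -> 3: severe relative to all vignettes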

We have fielded 15 identical vignettes covering back pain, depression, and heart problems in the CentERpanel and in MS1. In the 51–64 age bracket, self-reported work disability in the Netherlands is estimated at 36 percent, compared to 23 percent in the United States. The model using vignettes shows that if the Dutch used the U.S. response scales, their work disability rate would fall to 28 percent, a reduction of the cross-country difference by about 60%. See Kapteyn, Smith, and van Soest (forthcoming, American Economic Review). Banks, Kapteyn, Smith, and van Soest (2005) emphasize the importance of pain in explaining work disability in the two countries.

Further research has focused on methodological issues with the vignettes, such as their sensitivity to order effects and to the way vignettes are worded. Another issue is how response scales are driven by reference-group effects, particularly the number of friends and acquaintances on disability rolls. This has been studied using data from the CentERpanel by Van Soest, Kapteyn, Andreyeva, and Smith (2006), and similar questions are now in the field for the ALP. Respondents who know more people on DI are found to be significantly more likely to call a given health condition work-limiting than people who know fewer people on DI.

We have also designed and fielded vignettes on other issues, including several domains of general (not work-related) health (mobility, pain, sleep, cognition, depression, etc.) and satisfaction with several aspects of life (income, relations with family and friends, work or other daily activities, and life in general (“happiness”)). These questions are partly still in the field.

C.1.3.4. Collecting Consumption Data

The experience with the CAMS has proven the feasibility of collecting consumption data as part of a general-purpose survey. The instrument asked about 32 spending categories (6 big-ticket items and 26 non-durable items). Analyses of data quality show very low item nonresponse (almost exclusively in the single digits), population estimates of total spending that compare fairly closely to the Consumer Expenditure Survey (Hurd and Rohwedder, 2005a), and age patterns in spending that are consistent with age patterns of income and wealth change (Hurd and Rohwedder, 2006b).

One important innovation of the CAMS is to allow respondents to choose the reference period for which they report their spending in specific categories. For example, respondents can choose whether they prefer to report the dollar amount they spent last week, last month, or during the last 12 months. The resulting distribution of spending shows reasonable patterns overall but produces a small number of fairly large outliers, apparently because respondents inadvertently enter the amount in the wrong periodicity field (e.g., a monthly amount in the field for amount spent “last week”). Hurd and Rohwedder (2006c) have conducted experiments over the Internet in the ALP to study the issue and to explore alternative designs. They found that the small number of large outliers is not specific to the particular design of CAMS but occurs in any self-administered format, unless specific procedures are put in place to address them during the interview. Moreover, allowing respondents to choose the reference period has an important advantage over forced reference periods (see Section D.1.2).

C.1.3.5. Collecting Retrospective Histories

One of the great difficulties prominent longitudinal surveys like HRS, SHARE, and ELSA face concerns how to deal with “initial conditions”—the lives of respondents before the baseline year of a survey. Knowing respondents’ health or economic status beginning only at the first year of the survey may not be sufficient, since the entire prior histories of health and economic trajectories may matter for current decision making.

Analytically, the absence of any information on pre-baseline histories means that in practice researchers have had to rely heavily on an important untestable assumption—that baseline conditions sufficiently summarize individuals’ histories. If they do not, new events that unfold during the panel may simply be the delayed consequence of some prior part of an individual’s history. If they are indeed a delayed consequence, these events occurring within the panel cannot be used analytically to measure the effects of exogenous influences. Therefore, the value of information obtained from health histories may be great.

In the current R01, we conducted several experiments on collecting retrospective life information. One set of experiments dealt with collecting health histories during childhood. We fielded a childhood retrospective health module asking about the existence and timing of the most prevalent and important childhood illnesses. This list was obtained by consultation with experts in the field and with the principal investigators of the 1946 British cohort study, the primary criterion being that the childhood disease is important for later-life disease outcomes.

We constructed childhood health histories using calendar life history (CLH) methods. These methods create a timeline in which dates are anchored around salient life events. The specific markers used included house moves, marital events of parents, and entry into different levels of schooling before age 17.

The results obtained were very promising. First, we compared prevalence rates of these diseases obtained retrospectively with the actual prevalence of the diseases recorded at the time these respondents were children. These matches were very close. Second, we were also able to test for a major source of bias in collecting retrospective histories: individuals may “back cast” new health events during adulthood and attribute them to childhood health conditions. Besides being asked about childhood health in the HRS Internet panel, these respondents had been asked the same question six years earlier in the regular HRS survey. Revisions between these two reports were not related to the onset of new diseases in the intervening period.

Based on the success of these experiments, a childhood health retrospective history is now being introduced into the PSID, ELSA, and SHARE. These childhood health retrospectives follow very closely the module we fielded in the Internet panels.

D. Research Design and Methods

Central issues in the HRS are determinants of health, of saving for retirement, and of the timing of retirement. The likely determinants involve preferences, perceptions/beliefs (including expectations), objective circumstances, and the cognitive capabilities needed to make the often complicated choices that are required. These all need to be measured. The Internet offers unique possibilities to address a large number of measurement and substantive issues in a novel and cost-effective way. We have brought together a team representing the broad array of disciplines needed to tackle the measurement and substantive issues at hand.

We present the planned research for the next five years under the following broad headings:

1. Measurement and design: Under this heading we will (1) propose experiments that help us to improve estimation of objective quantities, in particular consumption, income, and wealth, (2) suggest approaches to monitor and improve representativity of HRS Internet and ALP with the purpose of facilitating generalizability of results to the population in cases where that is desirable, and (3) conduct experiments that improve comparability of response scales across respondents, using anchoring vignettes.

2. Preferences, beliefs, choice, and cognitive functioning: To understand decisions about health and well-being at older ages we need a good understanding of preferences, beliefs, choice, and cognitive functioning. The combined environment of ALP and HRS provides an ideal laboratory for this.

3. Integrating Internet Interviewing into the HRS: We outline a number of steps to make Internet interviewing an integral part of the HRS toolbox. This has two components: (1) exploiting the Internet to collect new types of information from HRS respondents, and (2) sketching an approach toward gradually developing a full Internet-based instrument for administering the HRS core interview to respondents who would be willing (or who would prefer) to answer the HRS questionnaire over the Internet.

In broad overview, the activities during the new grant period will include the design of the ALP experiments, ongoing interviewing of the ALP panel, and the design and fielding of the HRS Internet interviews. We anticipate approximately 1,000 ALP interviews every two months on topics discussed in this section. We aim for some 3,000 HRS Internet interviews on average per year. The topics and survey instrument will be informed by the ALP experiments.

D.1. Measurement and Design

D.1.1. Monitoring and Improving Representativity

Couper, Kapteyn, Schonlau, Van Soest

Internet penetration is increasing over time. Table 1 illustrates this for the HRS. The table presents the number of respondents who reported having Internet access in 2004 and 2006, respectively. The numbers for 2006 pertain to interviews completed up to October 10. We observe large differences across cohorts. For instance, in 2004, 11.9 percent of the AHEAD cohort (age 80 or older in 2004) reported Internet access, in contrast to 57.5 percent of the Early Baby Boomer cohort (EBB; ages 51–56 in 2004). Within each cohort, we see a gradual increase in Internet penetration, typically by a couple of percentage points. The numbers for the older cohorts need to be interpreted with care, because they may partly reflect differential mortality between those with Internet access (who are more likely to be of higher SES) and others.

Table 1: Internet access among HRS cohorts

| Cohort (birth years)   | 2004: Number of obs. | 2004: Percent | 2006: Number of obs. | 2006: Percent |
| AHEAD (before 1924)    | 334                  | 11.9          | 254                  | 12.6          |
| CODA (1924–1930)       | 385                  | 23.6          | 327                  | 24.7          |
| HRS (1931–1941)        | 3,142                | 36.5          | 2,720                | 37.4          |
| War Babies (1942–1947) | 1,216                | 57.0          | 948                  | 57.5          |
| EBB (1948–1953)        | 1,776                | 57.5          | 1,323                | 61.1          |
| Total                  | 6,853                | 37.5          | 5,608                | 38.7          |

A reasonable forecast is that Internet penetration in the HRS age range will grow further. For the general population, Internet penetration is now 73 percent (see Section B.2.1), and for younger cohorts it is even higher.

In the current project, we have performed a number of studies of selectivity and the efficacy of reweighting procedures in creating representative samples. As reported in Section C.1.2.1, reweighting with a basic set of demographics could not achieve full representativity with respect to all variables of interest. In the newly proposed project, we plan to monitor whether reweighting procedures become more effective remedies for non-representativity of HRS when Internet penetration increases. This will take the form of replicating our earlier propensity score adjustments for the HRS Internet respondents going forward, targeted at the applications envisaged in this proposal. It seems likely that reweighting will become more effective when Internet penetration increases, given that we have found (see Section C.1) that selectivity is mainly associated with whether a respondent has Internet access, not with willingness to do an interview conditional on Internet access.

The ALP aims at being representative of the American population over 40, partly by giving Web TVs to respondents without Internet access. The sample is stratified into those with and without Internet access, and Internet respondents are over-represented. To correct for this and to generalize the results of the experiments in the ALP to the population of interest, one focus of study will be how much Web TV respondents differ from the other respondents, over and above differences that can be explained by demographics and perhaps webographics (see Sections B.2.1 and C.1.2.1). In addition, like any panel, the ALP experiences unit non-response and attrition. We will use propensity scoring to correct for this, both using the available information on respondents and nonrespondents and comparing with an external source (i.e., the CPS). We will also use the ALP to explore the causes and correlates of non-response and attrition in Internet panels. Very little is known about this, since most of the panels (except for the CentERpanel) are proprietary and little research is available on attrition and how to maintain high response rates.

For both HRS and ALP, we will provide weights that can be used in cases where population representativity is desired and (in the case of HRS) where the Internet interview is not accompanied by interviews using alternative modes. The weights will be updated regularly and the procedures used to construct them will be documented extensively. For HRS, the construction of weights will use the rich information that is available for all HRS respondents. For ALP, we will merge the dataset with microdata from CPS.
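As an illustration of how such weights might be aligned with CPS population margins, the sketch below implements simple raking (iterative proportional fitting) in Python; the variable names, categories, and margin shares are hypothetical, and the actual weighting procedures will be documented separately.

    # Sketch: raking (iterative proportional fitting) of sample weights to external
    # population margins such as those from the CPS. All names and shares are hypothetical.
    import numpy as np
    import pandas as pd

    def rake(df, margins, n_iter=25):
        """Iteratively scale weights until weighted shares match each margin."""
        w = np.ones(len(df))
        for _ in range(n_iter):
            for var, target in margins.items():
                # current weighted share of each row's category
                cur = pd.Series(w, index=df.index).groupby(df[var]).transform("sum") / w.sum()
                tgt = df[var].map(target)      # desired population share for that category
                w = w * (tgt / cur).to_numpy()
        return w * len(df) / w.sum()

    # Hypothetical CPS margins for the population over 40:
    # margins = {"age_group": {"40-54": 0.45, "55-69": 0.35, "70+": 0.20},
    #            "educ": {"hs_or_less": 0.40, "some_college": 0.30, "ba_plus": 0.30}}
    # alp["weight"] = rake(alp, margins)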

D.1.2. Internet Experiments to Inform Survey Methods for Collecting Consumption, Income and Wealth Data

Hurd, Rohwedder, Winter

Consumption, income and wealth are aggregates of their components. Although it is generally believed that a more accurate total is obtained by asking about many components rather than just a few, we do not know of any systematic study of the optimal number of components. We will conduct experiments where we vary the number of components of consumption, income, and wealth that we query and study the variation in the total. In particular we will investigate:

Level of Aggregation. Dedicated expenditure surveys such as the CEX ask about spending in several hundred different categories. This level of detail is not an option in a general-purpose survey; instead, respondents are queried about more aggregated categories of spending. CAMS wave 1 asked about 32 categories, including cues to help respondents think of what to include. We will design experiments to be administered in the ALP to study the optimal level of aggregation. Among survey specialists, the prevailing view is that more categories will produce a larger total, and a larger total is likely to be closer to the true value because respondents are likely to omit spending on small categories. It is likely that increasing the number of categories will increase the total up to some point, but it is not obvious that this will continue; in fact, the total could even decrease. For example, many respondents would not be able to recall how much they spent on peaches or on lettuce (or whether they purchased those items some days ago), whereas they may have a good idea of total spending at the grocery store. On the other hand, there could also be overshooting when the number of categories becomes large: respondents may be tempted to affirm spending on a particular item in the recall period simply because it is something they usually purchase.

Length of Recall Period. The Consumer Expenditure Survey asks respondents about their spending over the last three months in the recall interviews, and administers diaries to obtain information about high-frequency purchases, such as food. The longer the recall period, the larger the risk of misreporting amounts, both because the respondent may forget some purchases and because he or she may be unsure whether a particular purchase fell within the recall period. On the other hand, the shorter the recall period, the higher the risk that infrequent purchases will not be captured. For example, asking about spending on home repairs last week will yield reports of zero dollars for most households; an aggregate measure of spending on home repairs would then be based on a small number of observations and tend to be inaccurate. Building on the insights obtained from our earlier experiments in the ALP, we will pilot designs that first elicit how often a household tends to spend money on a certain spending category (“every week,” “every month,” or “less often than every month”), and then adapt the question about the amount spent so that the recall period matches the reported spending frequency; a sketch of this logic is given below. Similar procedures will be followed with respect to income components.
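The following minimal sketch illustrates the adaptive logic: the spending frequency is elicited first, the recall period is matched to it, and the amount is annualized for aggregation. The wording, categories, and conversion factors are illustrative assumptions.

    # Sketch: adapt the recall period to the reported spending frequency, then
    # annualize for aggregation. Wording and factors are illustrative.

    RECALL = {"every week": ("last week", 52),
              "every month": ("last month", 12),
              "less often than every month": ("the last 12 months", 1)}

    def ask_amount(category, frequency):
        period, times_per_year = RECALL[frequency]
        amount = float(input(f"How much did you spend on {category} {period}? $"))
        return amount * times_per_year   # annualized spending for this category

    # A weekly grocery shopper reports "last week"; a household that buys clothing
    # "less often than every month" reports a 12-month total, avoiding many zero reports.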

Outliers. The predominant objective in collecting spending, wealth, or income data is to obtain a measure of the total so as to permit household-level analysis. Every one of the elicited components may contain outliers, implying an outlier for the total. The Internet makes it possible to prompt respondents when unusually high or unusually low values are entered. We will test several strategies for identifying outliers during the interview, combined with follow-up questions asking respondents to verify the entered amount. These include range checks by component and showing aggregates and asking respondents whether these are plausible; a sketch of such a check follows.
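As an illustration of an in-interview range check, the sketch below flags amounts outside plausibility bounds (hypothetical bounds per category and recall period) and asks the respondent to verify rather than silently accepting the entry; this is the kind of procedure that catches a monthly amount typed into a weekly field.

    # Sketch: flagging likely data-entry or periodicity errors during the interview.
    # Plausibility bounds per (category, recall period) are hypothetical.

    BOUNDS = {("groceries", "last week"): (0, 500),
              ("groceries", "last month"): (0, 2000),
              ("home repairs", "last 12 months"): (0, 50000)}

    def confirm_or_reenter(category, period, amount):
        lo, hi = BOUNDS[(category, period)]
        if lo <= amount <= hi:
            return amount
        # Unusual value: echo it back and ask the respondent to verify.
        reply = input(f"You entered ${amount:,.0f} for {category} {period}. Is that correct? (y/n) ")
        return amount if reply.lower().startswith("y") else float(input("Please re-enter: $"))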

Financing of big-ticket items. Purchases of vehicles and housing are difficult to ask about in a mail survey because of the many different ways people may finance them. Spending should include only the interest paid plus depreciation of the asset; any payments on the principal of a mortgage or loan should be counted as saving. Therefore, eliciting the monthly mortgage payment is not sufficient, because respondents will report the sum of interest and principal. With car purchases there is the additional complication of possible trade-ins, so that the purchase price of the car may not reflect the net outlay. The Internet allows designing sufficiently sophisticated skip patterns to first determine the financing arrangements a household has chosen and then to follow up with detailed questions adapted to the household’s specific situation; a simple illustration of the interest/principal split follows.
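To make the interest/principal split concrete, the minimal sketch below divides a monthly mortgage payment into a spending component (interest) and a saving component (principal), given the outstanding balance and interest rate; depreciation is omitted, and the numbers are hypothetical.

    # Sketch: splitting a monthly mortgage payment into spending (interest) and
    # saving (principal). Depreciation of the home is omitted for simplicity.

    def split_mortgage_payment(payment, balance, annual_rate):
        interest = balance * annual_rate / 12   # spending component
        principal = payment - interest          # counted as saving, not spending
        return interest, principal

    # A hypothetical $1,200 payment on a $150,000 balance at a 6 percent rate:
    print(split_mortgage_payment(1200, 150_000, 0.06))   # -> (750.0, 450.0)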

D.1.3. Comparing Response Scales Using Anchoring Vignettes

Kapteyn, Smith, Van Soest

Internet panels are very effective for studying the properties of anchoring vignettes and for applying the vignette methodology to new substantive applications. For instance, the work disability vignettes discussed in Section C.1.3.3 were collected in three waves of the Dutch CentERpanel, in August, October, and December of 2003, and in the ALP as part of the current R01. Each of these waves took into account empirical results from the previous wave, so the question format could be optimized and new treatments could be tested. Anchoring vignettes in Internet surveys can serve as a test bed for vignettes in CAPI or CATI surveys. For example, HRS and SHARE have implemented the work disability vignettes first tested in the CentERpanel and the ALP. ELSA will do so in the near future.

As with any new approach, many aspects of anchoring vignettes still deserve more study before their properties are fully understood. At least as important, we propose to carry out these experiments in parallel in different countries, namely the United States, the Netherlands (the CentERpanel or its successor MESS, see ), and Ireland, where we are involved in building a new Internet panel patterned after the CentERpanel and the ALP. The main reason for performing the same experiments in different countries lies in the very motivation for the use of anchoring vignettes: their use in making response scales comparable across countries. This dovetails with the fact that the HRS has now been replicated in more than 20 countries, with the aim of learning from international differences.

We intend to study a number of fundamental issues. The first is response consistency: the assumption that a respondent evaluates a vignette in the same way as he or she would evaluate him- or herself, which is crucial for using vignettes to identify and correct for response scale differences. Objective measures can be used to test this assumption. A nice example is a study we are conducting with an Internet survey of Irish students, in which we ask how much they drink and then whether they consider their drinking behavior to be “mild,” “moderate,” “a cause for concern,” “excessive,” or “extreme.” The respondents are also shown vignettes of students with a given drinking pattern and are asked to evaluate the vignette person’s drinking behavior on the same scale. By matching the number of drinks of the persons described in the vignettes with the self-reported number of drinks, we can see whether the verbal label attached to one’s own drinking matches the label attached to the vignette person. Our first results suggest that this match is fairly close, thus supporting response consistency. In another experiment, however, in which we compare self-reported cognitive capabilities in SHARE to the outcomes of cognitive tests administered in the same survey, the match is not as close.

Similarly, we generally find that correcting for response scale differences across countries leads to very plausible results for work disability, whereas for various other health dimensions the outcomes are harder to interpret. Why do vignettes seem to do such a good job of making scales comparable in some cases and yet do less well in others? One important possibility is that the vignette description is incomplete; for instance, in experiments with the work disability vignettes, we found that it makes a significant (though not very large) difference if a job description is added to the health problems, or if the respondent is instructed to assume that the vignette person has about the same age, education, and career history as the respondent him- or herself.

We thus propose a number of experiments investigating the properties and usefulness of anchoring vignettes. These will include tests of response consistency, validation by comparing to objective external measures, and variations in the amount of detail in the vignette descriptions. We also plan to continue our work testing new applications of the vignette methodology. Thus far, our applications have concentrated on work disability, health status, and life satisfaction or happiness. In this new research, we will apply the vignette methodology to drinking behavior among adults, ADL and IADL scales, psychosocial variables, and perceptions of poverty and inequality.

The traditional ADL and IADL scales all have an intrinsically subjective component; the use of the word “difficulty” in the typical ADL question illustrates this point. Little is currently known about whether differences across countries in stated ADL problems reflect real differences in difficulty or merely differences in response scales. As a result, we also do not know to what extent differences in disability trends across countries are partly the result of different response scales.

Psychosocial factors have been hypothesized to be closely related to the onset of several diseases, especially heart disease, in part because they are markers for the impact of stress. Most of these measures are also subject to potentially important threshold differences, especially across nations. For example, a common question meant to measure control at work, contained in ELSA, HRS, and SHARE, is “At work, I feel I have control over what I do in most situations.” Similar measures exist for control of family life and social situations. Within countries, these measures have been shown to be related to heart disease, but little is known about their ability to explain cross-national differences in health outcomes.

D.1.4. Collecting Retrospective Histories

Smith

Based on the successes under the previous grant (see Section C.1.5), we plan to conduct additional experiments with retrospective histories. One reason to do so is that ELSA, SHARE, and KLoSA are planning to field significant retrospective modules in future waves. In the case of ELSA, for example, the retrospective module will include, in addition to standard marital and fertility histories, residential histories (conditioned on residing somewhere for six months or more), education histories, and employment histories (again conditioned on a six-month stay). These employment histories obtain information on occupation, full-time or part-time work, starting and ending salary, and the reason for any job switch.

This project will use the ALP and HRS Internet samples to experiment with the construction of life histories using CLH methods. These methods create a time line on which salient life events are marked, such as dates of marriage, births of children, moves, and job changes. These markers can then be used to assist respondents in recalling the occurrence and dates of other events, such as those in their health histories. The calendars can be presented in a simple graphic format that is much more transparent and easier to understand than what is possible, say, in a telephone interview. In the two Internet samples, we will also experiment with the amount and types of information given to the respondent (no markers, partial markers, full markers, etc.).
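
As an illustration of the underlying data structure, a timeline with marker events might be represented as follows; the field names and the crude text rendering are assumptions, since the fielded version would be graphical:

    # Sketch of a life-history timeline with marker events.
    from dataclasses import dataclass

    @dataclass
    class Marker:
        year: int
        label: str           # e.g., "married", "first child", "changed jobs"

    def render_timeline(markers, start, end):
        """Crude text stand-in for the graphical calendar: one row per year,
        with marker labels printed beside the years they anchor."""
        by_year = {m.year: m.label for m in markers}
        return "\n".join(f"{y}: {by_year.get(y, '')}"
                         for y in range(start, end + 1))

    markers = [Marker(1968, "married"), Marker(1971, "first child"),
               Marker(1980, "changed jobs")]
    print(render_timeline(markers, 1965, 1985))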

For the HRS respondent sample, we have the additional option of checking the quality of answers to history questions. By the time we interview them, HRS respondents will have been surveyed for up to eight rounds over a 14-year period. During these 14 years, many of them experienced employment events that were recorded in the survey. Because these events were reported contemporaneously, the quality of those reports should be high. We will select respondents who are in our HRS Internet sample and explore alternative methods of retrieving this information retrospectively. The HRS Internet sample will also be useful because respondents’ reports of the timing of events can be cross-checked against their earnings histories available from the Social Security earnings match.

D.1.5. High frequency and event related interviewing

Schwarz, Dominitz

The HRS has included probabilistic questions on expected mutual fund returns in its core expectations module since 2002. Dominitz and Manski (2006) find that the data yield promising results on the relationship between expectations and portfolio choice. However, these data provide little evidence on how expectations are formed and are probably of limited value for estimating models of portfolio choice. We propose to design questions that elicit beliefs about recent returns and about prospective near-term and mid-term returns. Then, with monthly interviews, we can measure how beliefs are updated in response to new information, as indicated by the time series of individual reports on prospective and retrospective returns. In conjunction with data on investment activity, these surveys promise to yield critical data for the analysis of portfolio choice behavior.

We will also develop high frequency interviews that monitor doctor visits, medical tests, and/or prescription drug use behavior among ALP respondents, or that react to new policy events[6]. The interviews will be developed for both randomly-selected respondents and respondents who are selected based on events reported in regular interviews. For example, respondents who report the onset of a major health event such as a heart attack will be followed with at least monthly interviews to collect data on both objective and subjective outcomes. On the latter, we will include not only the usual self-assessed health questions but also a day reconstruction method (DRM) questionnaire to measure well-being. An important goal of these targeted interviews is to measure the evolution of well-being in response to major health events.

In its original form, the DRM was administered in person via paper-and-pencil surveys that took 45 to 75 minutes to complete (Kahneman et al., 2004). Such an instrument cannot easily be included in the regular HRS or a supplemental interview. Internet interviewing offers the opportunity to design shorter interviews that further ease respondent burden by reducing the complexity of the instrument. We have begun experimenting with Internet DRM interviews using the ALP. We developed a long questionnaire of 30-40 minutes and a shorter version of 10-15 minutes that asks about a randomly selected portion of the day, yielding the partial day reconstruction method (PDRM). We will give the PDRM questionnaire to randomly selected ALP respondents, with others completing the full questionnaire, and will compare the results to each other and to previous findings from in-person DRM questionnaires.
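
A minimal sketch of the PDRM routing follows; the day segmentation and the 50/50 split are assumptions made for illustration:

    # Sketch of random assignment to full DRM versus partial DRM (PDRM).
    import random

    DAY_SEGMENTS = ["morning (6 a.m.-noon)", "afternoon (noon-6 p.m.)",
                    "evening (6 p.m.-midnight)"]    # assumed segmentation

    def assign_drm_form(respondent_id):
        """Route a respondent to the full DRM or to a PDRM covering one
        randomly drawn segment of the previous day."""
        rng = random.Random(respondent_id)   # reproducible per respondent
        if rng.random() < 0.5:
            return {"form": "full DRM", "segment": "entire day"}
        return {"form": "PDRM", "segment": rng.choice(DAY_SEGMENTS)}

    print(assign_drm_form(1017))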

D.2. Measurement of Preferences, Beliefs, Choice, and Cognitive Capabilities

D.2.1. Measurement of Retirement Preferences

Van Soest, Kapteyn

Individuals and couples who approach and enter retirement have to make important decisions about labor supply and retirement, health insurance and health expenditures, and consumption and saving. In structural models of economic behavior in a life cycle context, these decisions are functions of preferences about, for example, income versus leisure, health versus consumption, risk aversion, and the extent of discounting, as well as opportunities and expected future opportunities and constraints. Data on actual behavior are usually incomplete in describing opportunities now and in the future. Since actual choice opportunities are often quite limited, observed choice may not say much about underlying preferences. Stated preference (SP) data can be collected to overcome these problems and obtain accurate and efficient estimates of preferences. SP data have also been shown to lead to results that are broadly consistent with actual behavior and are, thus, a valuable tool in better identifying economic choice models (see Louviere, Hensher and Swait, 2000).

Using the CentERpanel, Van Soest, Kapteyn, and Zissimopoulos (2006) have shown that hypothetical retirement scenarios, in which respondents are given trajectories of work, part-time work, or retirement with corresponding earnings and pensions, can be used to identify the effects of financial incentives on early retirement and phased retirement. Similar data are being collected in the ALP. We propose to extend this work in several directions. First, we will look at couples and use SP data to analyze collective bargaining models for the joint retirement of husband and wife (cf., e.g., Apps and Rees, 1988). Such models typically outperform the traditional unitary model (Browning and Chiappori, 1998), but they are also much harder to identify (Chiappori, 1988). SP data can be particularly helpful here, since respondents can be asked about their own choices, what their partner would choose, and what the most likely household decision would be. Second, we will explicitly incorporate future uncertainty into the scenarios (e.g., in the form of wage uncertainty, inflation risk for future benefits, or the possibility of health shocks). This also links these experiments to the information on subjective expectations that we aim to collect, and to the SP work on savings and consumption aimed at, among other things, measuring risk aversion (see Section D.2.3 below). Third, we want to make retirement scenarios more realistic by incorporating employer-provided health insurance and job attributes, and by distinguishing between phased retirement at the career job and bridge jobs with another employer (cf. Ruhm, 1990).

D.2.2. Choice of Health Insurance Plans

Van Soest, Winter, McFadden, Hurd

Americans are increasingly concerned about their future health expenditures. An important question for public policy is whether governments and private firms will be able to offer insurance contracts that cover individual health expenditure risks and whether these contracts will be purchased by vulnerable groups of the population (in particular, those in poor health and with low incomes and little or no wealth). From a research perspective, an important aspect of this larger question is how private demand for such insurance contracts is formed. Do people make rational decisions when it comes to health insurance? Does their demand vary with personal factors such as socioeconomic and health status, as well as health risk? A leading example is the new Medicare Part D program; other applications on which we will focus are the demand for long-term care (LTC) insurance and for insurance that covers the cost of assisted living arrangements more generally.

Specifically, we will conduct hypothetical choice experiments that involve health insurance products. In these experiments, survey respondents are presented with choices among two or more hypothetical insurance contracts that differ in various dimensions, such as the premium and the specific form of coverage offered. In the example of Medicare Part D, contract features that may be experimentally varied include the premium and whether a plan has a deductible, offers coverage in the “gap,” and covers generics only or both generics and brand-name drugs.
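
A sketch of how such randomized choice sets might be generated follows; the attribute levels are illustrative, not actual Part D parameters:

    # Sketch of attribute randomization for hypothetical drug-plan choices.
    import itertools
    import random

    ATTRIBUTES = {                      # levels are illustrative assumptions
        "premium": ["$20/month", "$35/month", "$60/month"],
        "deductible": ["none", "$250"],
        "gap coverage": ["no", "yes"],
        "drugs covered": ["generics only", "generics and brand-name"],
    }

    def draw_choice_set(rng, n_plans=2):
        """Draw n_plans distinct hypothetical contracts for one choice task."""
        universe = [dict(zip(ATTRIBUTES, combo))
                    for combo in itertools.product(*ATTRIBUTES.values())]
        return rng.sample(universe, n_plans)

    for plan in draw_choice_set(random.Random(7)):
        print(plan)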

We plan to develop hypothetical choice experiments for both the ALP and the HRS Internet samples, extending earlier work by Winter et al. (2006) and Heiss, McFadden, and Winter (2007). Using the ALP allows us to develop and test new forms of hypothetical choice experiments. Ultimately, we will implement these experiments in the HRS Internet sample so that hypothetical choice data can be linked with existing HRS background variables. We will also collect factual information over the Internet, taking advantage of the fact that databases can be accessed during the interview. For example, when a respondent enters a prescription drug name over the Internet, software can check the name against a database, assist with the spelling through automatic completion, or offer a list of common dosages of the entered drug from which the respondent can choose the one she uses.
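
The drug-name support could work roughly as sketched below; the drug list and dosages are placeholders rather than an actual formulary, and this is not a description of MMIC’s interface:

    # Sketch of in-interview drug-name matching; the database is a placeholder.
    import difflib

    DRUG_DB = {
        "atorvastatin": ["10 mg", "20 mg", "40 mg", "80 mg"],
        "metformin": ["500 mg", "850 mg", "1000 mg"],
        "lisinopril": ["5 mg", "10 mg", "20 mg"],
    }

    def suggest_drug(entry):
        """Return the closest matching drug name in the database, if any."""
        match = difflib.get_close_matches(entry.lower(), list(DRUG_DB),
                                          n=1, cutoff=0.6)
        return match[0] if match else None

    name = suggest_drug("atorvastatine")   # misspelled respondent entry
    print(name)                            # 'atorvastatin'
    print(DRUG_DB[name])                   # offer its common dosages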

D.2.3. Intertemporal Consumption Choices

Kimball, Shapiro

To further develop Web survey methods for measuring preference parameters, both the intensive and extensive margins are important. It is important to refine the measurement of the central time preference and intertemporal substitution parameters and to gain experience with Web survey techniques as applied to the measurement of additional preference parameters.

There are three key directions in which to push the measurement of time preference and intertemporal substitution. First, we have so far constrained agents to flat consumption in the first (pre-retirement) and second (post-retirement) periods. People’s choices might be different if they were allowed to choose a more flexible path of consumption over time. The Web format makes it possible to have respondents manipulate the graph of a consumption path flexibly with a mouse.

Second, while we have documented the value of shifts in the choice set with the implicit interest rate as a mechanism to force people to make active choices rather than passively keeping the same consumption growth rate (see Section C.1.3.1), it is important to compare the power of this mechanism to the effects of simpler mechanisms, such as instructing respondents in the preamble to the question that “many people find that they want to change what they do in the new situation.”

Third, the time-preference/intertemporal substitution questions we have used so far have the bias that inaction from any cause can look like a low elasticity of intertemporal substitution. It is important to compare these results with results from questions in which inaction implies a high elasticity of intertemporal substitution; this can help separate cognitive issues in understanding the question from the true underlying elasticity. This can be achieved as follows: instead of imposing an interest rate and asking for a consumption profile, we can impose a consumption profile and then find the implicit interest rate at which an agent would trade off small amounts of money when starting from that profile. To the extent that the expressed willingness to trade off small amounts of money is relatively unaffected by the initial consumption profile, a high elasticity of intertemporal substitution is implied.
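
Under standard CRRA preferences, the logic of this third design can be written compactly (a sketch, not the exact parameterization that will be fielded). With period utility $u(c) = c^{1-\gamma}/(1-\gamma)$ and discount factor $\beta$, a respondent facing the imposed profile $(c_1, c_2)$ is indifferent to trading small amounts of money at the interest rate $r^*$ satisfying

$$1 + r^* \;=\; \frac{u'(c_1)}{\beta\, u'(c_2)} \;=\; \frac{1}{\beta}\left(\frac{c_2}{c_1}\right)^{\gamma}.$$

If the elicited $r^*$ barely moves as the imposed ratio $c_2/c_1$ is varied, then $\gamma$ must be small, that is, the elasticity of intertemporal substitution $1/\gamma$ is high; if $r^*$ moves strongly, the elasticity is low.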

To gain experience with Web survey techniques applied to other preference parameters, labor supply elasticities, including retirement-age elasticities, are an appropriate set of parameters on which to focus. Not only does the HRS focus on the determinants of retirement, it has also included some limited labor supply elasticity questions. Kimball and Shapiro (2006) show that HRS data on what people would do if they won a large lottery imply a large income effect on labor supply. Several important questions arise in the face of these suggestive data. First, while this suggests that people react strongly to income shocks, do they, in fact, respond strongly to the incentives implied by changes in the marginal (as opposed to average) wage? Second, do hypothetical choice tasks yield income and substitution effects that approximately cancel? Third, what happens when the size of a lottery or of a marginal wage change is varied, that is, “What is the dose-response mapping?”

D.2.4 Preference Elicitation through Experiments

Hung, Dominitz, Van Soest

In addition to the various hypothetical experiments in preference elicitation discussed above, we propose to use controlled experiments with real payoffs to elicit preference parameters. Among other things, this offers the ability to cross-validate the measurement outcomes obtained with the various approaches. Traditionally, experiments have been carried out in laboratory settings with small numbers of experimental subjects (often students). More recently, experiments have been conducted outside the laboratory, for example on the Internet (Bellemare and Kroeger, 2006; Bellemare et al., 2005; Bossaerts and Plott, 2004). Besides providing a much more representative pool of subjects, the ALP has the advantage that a rich set of background variables is available. We propose to implement several economic experiments to gather data on behaviors and preferences that researchers can use to analyze retirement savings behaviors. Some examples follow.

Risk Preference Elicitation. Understanding risk preferences is fundamental to understanding retirement investment decisions. Risk preference experiments will involve a simple choice task over risky alternatives. Paired with survey interviews eliciting preferences, this experimental design supports direct comparison between stated preferences and revealed preferences. For instance, the ALP has already collected two waves of responses to hypothetical gambles over lifetime income as developed by Barsky et al. (1997) on the HRS. 

We propose to elicit risk preferences using the lottery-choice decisions from Holt and Laury (2002) and will experiment with visual presentations to make the meaning of probabilities clearer, such as the pie charts used by Von Gaudecker, Van Soest, and Wengstroem (2006). Similar experiments have been conducted in Germany using a sample of subjects drawn from the German Socioeconomic Panel (Dohmen et al., 2006).
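
A sketch of this menu and the implied inference appears below; the payoffs follow the published Holt-Laury low-stakes design, while the CRRA bookkeeping is our own illustration:

    # Holt-Laury (2002) lottery menu: in row k, the high payoff occurs with
    # probability k/10. A CRRA agent's switch point brackets risk aversion.
    import math

    A = (2.00, 1.60)     # "safe" option: high/low payoff in dollars
    B = (3.85, 0.10)     # "risky" option: high/low payoff in dollars

    def crra_u(x, r):
        return math.log(x) if r == 1 else x ** (1 - r) / (1 - r)

    def prefers_safe(p_high, r):
        eu_a = p_high * crra_u(A[0], r) + (1 - p_high) * crra_u(A[1], r)
        eu_b = p_high * crra_u(B[0], r) + (1 - p_high) * crra_u(B[1], r)
        return eu_a > eu_b

    def switch_row(r):
        """First row at which a CRRA(r) agent takes the risky option."""
        for k in range(1, 11):
            if not prefers_safe(k / 10, r):
                return k
        return 11        # never switches

    for r in (-0.5, 0.0, 0.5, 1.2):
        print(f"CRRA r = {r:+.1f}: switches to risky at row {switch_row(r)}")

A risk-neutral agent (r = 0) switches at row 5; later switch points indicate greater risk aversion.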

Time Preference Elicitation. Our analysis of time preferences will take advantage of the unique structure of the ALP. In particular, each experimental session will conclude with an offer to defer receipt of payoffs to a later date, at an experimentally varied rate of interest. To identify whether individual choices are consistent with a model of hyperbolic discounting, we will include additional choice tasks. In addition to the choice between immediate receipt of payment and a one-month delay of payment, we will also offer a choice between receipt of payment one month ahead and receipt of payment two months ahead. Those who choose the latter option will be given the opportunity to change the decision one month later, but perhaps at a penalty. With judicious design of the experiment, some hyperbolic discounters would, for example, be induced to initially select the two-month delay and then, after one month, would take a lower payoff than was originally offered. The structure of the ALP should reduce concerns about the failure to make future payments, and we will include questions about the motivation for observed choices to learn whether subjects express such concerns.
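
The identifying logic can be sketched with quasi-hyperbolic discounting; the parameter values below are purely illustrative. Suppose a payoff $x_t$ received $t$ months from now is valued at $\beta\delta^t x_t$ for $t \ge 1$, and at $x_0$ if immediate, with $0 < \beta < 1$. Then

$$\beta\delta^2 x_2 > \beta\delta x_1 \;\Leftrightarrow\; x_2 > x_1/\delta \quad \text{(initial choice of the later payment)}, \qquad x_1' > \beta\delta x_2 \quad \text{(later reversal to an immediate payment)}.$$

For example, with $\beta = 0.7$, $\delta = 0.99$, $x_1 = \$100$, and $x_2 = \$105$, the agent initially prefers the two-month payment (since $105 > 100/0.99 \approx 101$) but, one month later, would accept any immediate payoff above $0.7 \times 0.99 \times 105 \approx \$73$, whereas an exponential discounter ($\beta = 1$) would decline any immediate payoff below $0.99 \times 105 \approx \$104$.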

“Social” Preference Elicitation. We propose to investigate social preferences such as inequity aversion, reciprocity, and altruism to gain insight into individuals’ support of social security systems. This involves games such as the Ultimatum game and Dictator game that have been played numerous times in economic laboratories and, more recently, over the Internet (see Camerer and Fehr, 2006). Much of the evidence on social preferences is derived from student subject populations. Recent work by Bellemare et al. (2005) shows that students are less inequality-averse than other subgroups in the population. The ALP presents an opportunity to investigate social preferences in the U.S. population. We will target the experiments specifically at measuring intergenerational solidarity and support for old age social insurance.

D.2.5. Increasing reliability and validity of health utility elicitation

Ubel

Utility measurement is a crucial determinant of the output of decision analyses and cost-effectiveness analyses. There are three generally accepted ways to measure health-related utility: rating scale, standard gamble, and time trade-off. We have conducted two studies within the HRS population that demonstrate some of the flaws of utility elicitation. In both studies, we elicited rating scale and time trade-off (TTO) utility values over the telephone from a random subset of 1,031 HRS participants.

In the first part of the study, we asked people to report their overall health on a 0-100 rating scale. Across participants, we randomly varied the definition of “100” on the scale to represent either “perfect health,” “perfect health for someone your age,” or “perfect health for a twenty-year-old.” Our results demonstrated that when people are asked to rate their health on a standard rating scale, they adjust their response based on the norms they expect for someone their own age. People with the scale labeled “perfect health” or “perfect health for someone your age” gave similar ratings (73.1 and 72.9, respectively; p = n.s.), whereas people with the scale labeled “perfect health for a twenty-year-old” gave lower ratings (mean of 65.0; p < .001 compared with both other groups). People interpret “perfect health” as meaning “perfect health for someone my age.” This recalibration threatens the validity of rating scale utility elicitations, because different participants interpret the endpoints of the scale in different ways.

In the second component of our study, we elicited TTO utility values from these same participants. We asked them to imagine they had 10 years left to live, and that they were living with their current level of health (which for most participants was substantially below perfect). We then asked how many months of those 10 years they would give up in order to live in perfect health for the remainder of their life. If the utility measure is valid, people’s self-reported health ought to correlate with health-related utility. However, we found almost no correlation. In fact, among those participants who were below the median in measures of cognitive ability (numeracy, serial sevens, or a word recall task), there was essentially no correlation between self-reported health and health-related utility. Among those above the median, we found very low correlations, explaining less than 2% of the variance. From these data, we conclude that telephone administration of TTO utility elicitations does not yield valid responses.
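
For reference, the standard scoring behind this design assigns the current health state the utility

$$U_{\mathrm{TTO}} = \frac{T - t}{T},$$

where $T = 120$ months is the stated remaining lifetime and $t$ is the number of months given up; a respondent who gives up 12 of the 120 months thus implies a utility of $108/120 = 0.90$.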

We want to test whether the validity of utility elicitations can be increased by use of the Internet. We have developed an Internet utility elicitation tool that provides graphical representations of either standard gamble (SG) or TTO elicitation tasks, in order to make it easier for people to understand the questions they are being asked. The SG tool uses a pictograph to represent visually the probabilities of cure versus death. The TTO tool uses a slider bar to indicate the amount of time participants would choose to live in the suboptimal health state, and also shows how much time the participant is choosing to give up. Both tools have undergone extensive usability testing and have been explored for comprehensibility in think-aloud protocols.

The tool also includes consistency checks and feedback, to make sure people understand the task. We have not tested these tools in a general population asked to consider the utility of their own health. In addition, we have primarily tested these tools in younger populations, and have yet to determine whether the tools work well in an older population.

We will test the validity of TTO and SG measures using our interactive, graphical elicitation tool compared with a text-only Internet elicitation that closely mirrors a telephone script. We expect the text-only tool to perform better than a telephone interview, because people can reread the text and better understand the task; we hypothesize that performance will be higher still with the graphical tool. We predict that correlations between health and utility will be significantly higher for participants randomized to the graphical elicitation tool than for those receiving the text-only elicitation. In addition, we predict that there will be a positive correlation not only for participants who are above the median on cognitive measures, but also for those below the median. To test these hypotheses, we will conduct a 2x2 randomized Internet experiment: SG versus TTO utility elicitation, crossed with text versus graphical presentation.

D.2.6. Measuring Expectations of Retirement Wealth and Expenditures

Dominitz, Manski, Van Soest, Delavande, Rohwedder, Haider

Numerous surveys have elicited point expectations of retirement income, especially SS benefits. These data, either point forecasts or qualitative assessments of often poorly defined outcomes, seem to be of limited use for understanding economic decision making (Dominitz and Manski, forthcoming).

We will build on recent research eliciting subjective probability distributions to develop new methods for measuring expectations of total pension income, other forms of retirement wealth, and retirement expenditures. Empirical analysis of these data, in combination with data on current savings and investment outcomes, promises to yield more credible findings on retirement savings decision making than would standard revealed preference analysis of the outcomes alone. See Dominitz and van Soest (forthcoming) and Manski (2004).

Measuring Retirement Wealth Expectations. Building on recent innovations in survey design, we will elicit probabilistic expectations of private-sector defined-benefit pension income (e.g., employer-provided pensions), wealth accumulation in dedicated retirement savings accounts (e.g., IRA and 401(k)), and wealth accumulation outside of retirement accounts. Elicitation of pension-benefit expectations will also include questions on the timing of benefit take-up (cf. Section C.1.3.2). Elicitation of wealth accumulation expectations will also include questions on expected contributions and on the expected rate of return to retirement account contributions, building on recent efforts to measure mutual fund return expectations in the HRS and elsewhere (Dominitz and Manski, 2005, 2006).
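
As an illustration of how such probabilistic responses might be summarized, the following sketch fits a lognormal distribution to stated chances that retirement wealth falls below a set of thresholds; the thresholds, responses, and the lognormal form are all assumptions made for the example:

    # Sketch: fit a lognormal to elicited answers to questions of the form
    # "What is the percent chance your retirement wealth is below $X?"
    import math
    from scipy.optimize import minimize
    from scipy.stats import norm

    thresholds = [50_000, 100_000, 250_000, 500_000]   # invented thresholds
    elicited_p = [0.10, 0.30, 0.70, 0.95]              # invented responses

    def sq_error(params):
        mu, sigma = params
        fitted = [norm.cdf((math.log(x) - mu) / sigma) for x in thresholds]
        return sum((f - p) ** 2 for f, p in zip(fitted, elicited_p))

    res = minimize(sq_error, x0=[math.log(150_000), 1.0],
                   bounds=[(0.0, None), (1e-3, None)])
    mu, sigma = res.x
    print(f"implied median = ${math.exp(mu):,.0f}, spread sigma = {sigma:.2f}")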

One goal in conducting this type of research is to assess uncertainty about total financial resources in retirement. This total is the sum of numerous components, described by a joint probability distribution that respondents would find very difficult to summarize, much less report coherently. We will develop questions eliciting expectations of income aggregated across various sources, and will assess the impact of variation in question ordering and in the level of aggregation on response rates, the internal consistency of responses, and the extent to which beliefs match current population outcomes.

Measuring Retirement Expenditure Expectations. By eliciting expectations of retirement expenditures, we may directly address many important policy questions. For example, how much do households engage in precautionary savings to pay for large out-of-pocket medical expenditures? Also, how much do households anticipate reductions in consumption at retirement (Hurd and Rohwedder, 2003) and how much do observed reductions arise from unanticipated shocks (Banks et al., 1998; Haider and Stephens, forthcoming)?

Hurd and Rohwedder (2003) find that a large fraction of respondents expect spending to drop at retirement and that, on average, these expectations are in line with observed population outcomes. As in the case of expected retirement wealth, we will add probability elicitation to measure uncertainty about consumption during retirement, rather than eliciting point expectations alone. To assess the sensitivity of these beliefs to the retirement date and to the circumstances that bring about retirement, we will design a line of questioning that elicits these expectations conditional on health or employment shocks (e.g., the onset of disability or job loss) leading to retirement at an early age.

Evidence exists that eliciting retrospective reports of total expenditures is problematic, but the question of the optimal level of disaggregation has not been resolved (see, e.g., Browning, Crossley, and Weber, 2003 and Section D.1.2 above). We know of no comparable evidence on disaggregation of prospective expenditures. We will coordinate closely with the analysis proposed in Section D.1.2 to make sure that questions about prospective consumption are consistent with retrospective consumption questions, so that optimal levels of aggregation can be compared. Given the importance of potentially large out-of-pocket medical expenditures to both policymakers and individual decision makers, these expenses will be a central focus.

D.2.7. Assessment of cognitive and decision-making abilities

Bruine de Bruin, Parker, McArdle, Peters

Good decisions require the ability to understand relevant information, to process it, and to use that information in coherent decisions (Finucane et al., 2005; Grisso et al., 1995) – respectively crystallized intelligence, fluid intelligence, and decision-making competence (Bruine de Bruin et al., 2005; Cattell, 1941, 1987; Horn, 1988; Parker & Fischhoff, 2005). These cognitive abilities vary considerably across individuals and decrease with age (Finucane et al., 2002; 2005; McArdle et al., 2002). This variance may help explain behavioral outcomes, as well as individuals’ understanding of key economic parameters, such as discount rates, risk aversion, and elasticities (e.g., Bruine de Bruin et al., 2005; Frederick, 2005; Parker & Fischhoff, 2005).

Indeed, the measurement of cognitive ability has long been a part of the HRS, in recognition that a “mismatch between cognitive capacity and cognitive demands” may be crucial in understanding health and retirement decisions. We therefore propose three sets of measures, for (a) fluid intelligence, (b) crystallized intelligence, and (c) basic decision-making skills. A short, adaptive-testing scale based on number series such as those used in the Woodcock-Johnson tests (McArdle et al., 2006; Woodcock et al., 2001) was the focus of a recent HRS experimental module and will provide a metric of fluid abilities. Similar adaptive instruments are currently being programmed for crystallized abilities. Both sets will include Internet-ready instruments.
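
A toy version of the adaptive (staircase) logic is sketched below; the item bank, difficulty levels, and step rule are invented for illustration and are far simpler than the Woodcock-Johnson-based instrument:

    # Toy adaptive number-series test: step up after a correct answer,
    # step down after an incorrect one. Items and rule are illustrative.

    ITEM_BANK = {                  # difficulty level -> (stem, answer)
        1: ("2, 4, 6, ?", 8),
        2: ("3, 6, 12, ?", 24),
        3: ("1, 4, 9, 16, ?", 25),
        4: ("2, 3, 5, 8, 12, ?", 17),
        5: ("1, 2, 6, 24, ?", 120),
    }
    ANSWER_KEY = {stem: ans for stem, ans in ITEM_BANK.values()}

    def run_adaptive_test(respond, start=3, n_items=4):
        """Administer n_items items, adapting difficulty to performance;
        return the (difficulty, correct) path as a crude ability record."""
        level, path = start, []
        for _ in range(n_items):
            stem, key = ITEM_BANK[level]
            correct = respond(stem) == key
            path.append((level, correct))
            level = min(5, level + 1) if correct else max(1, level - 1)
        return path

    def perfect(stem):             # simulated respondent who knows all items
        return ANSWER_KEY[stem]

    print(run_adaptive_test(perfect))   # [(3, True), (4, True), (5, True), ...]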

A third set of measures assesses specific decision-making skills. These skills are presumably subserved by more general cognitive abilities, but they are more proximal predictors of behavioral decision outcomes (Bruine de Bruin et al., 2005; Parker & Fischhoff, 2005). A measure of numeracy (Lipkus et al., 2001) provides a high-level assessment of probabilistic reasoning; Frederick’s (2005) “cognitive reflection” scale assesses an individual’s cognitive flexibility in the face of non-intuitive problems; and a measure of Decision-Making Competence (DMC) reveals adherence to different normative principles of decision making (Bruine de Bruin et al., 2005; Parker & Fischhoff, 2005). Short forms of three DMC tasks are currently being piloted in MS6. These scales assess an individual’s resistance to sunk-cost considerations, consistency in risk perception, and overconfidence.

In general, the three sets of measures target four empirical goals: (1) to improve econometric analyses by providing adequate measurement of, and control for, a range of important cognitive and decision-making abilities, (2) to explore the relationships between cognitive and decision-making abilities and health and economic outcomes, as well as key economic parameters, such as discount rates, risk aversion, and elasticities, (3) to promote the development of adaptive testing procedures for evaluating cognitive and decision-making abilities that take advantage of the dynamic, interactive interface provided by Internet administration, and (4) to understand how cognitive and decision-making abilities are affected by increasing age.

Building on these experiments, and on synergies with ongoing work in related projects, particular attention will be paid to developing measures of fluid intelligence and decision-making competencies, which are absent from the current set of measures.

D.3. Integrating Internet Interviewing into the HRS

We believe that Internet interviewing will be central to the future of large surveys such as HRS because of its promise of lower cost and its potential for innovative methods of data collection. It is, however, unlikely that Internet alone will supplant more traditional modes of mail, telephone, and in-person interviewing—all of which are now used extensively in the HRS. Rather, the foreseeable future is one of mixed-mode interviewing. We propose here a program of development and substantive and methodological experiments to integrate Internet into the mode mix of the HRS.

Our research so far and a reading of the literature suggest the following: (1) Internet access is still available to only a minority of older Americans and is highly selective; (2) mode effects vary by topic area; and (3) access is much broader among younger cohorts than older ones, suggesting that this mode will become increasingly viable in the future.

These facts suggest the following lines of future work. First, to rigorously test the use of Internet surveys as an alternative to mail surveys, i.e., to design mixed-mode experiments that determine what combination of mail and Internet produces the best response rate and data quality at the lowest cost, and to continue to explore mode-effect differences. Second, to continue the use of stand-alone Internet surveys to obtain unique content and to determine the best methods of eliciting difficult concepts. Third, to develop Internet-administrable versions of core HRS content that can be tested on non-HRS samples and that could serve as an alternative to telephone administration.

D.3.1 Mixed-mode Experiments on Mail Surveys

The first mixed-mode experiment combining mail and Internet surveys will be conducted in the Fall of 2007. The HRS anticipates conducting three different topical mail surveys at that time: the fourth wave of the Consumption and Activities Mail Survey (CAMS), the second wave of the Prescription Drug Survey (PDS), and a new survey using health vignettes to provide data for international calibration of self-reported health and disability (cf. Sections C.1.3.3 and D.1.3). Of the three topics, the vignettes are best suited to a mixed-mode experiment at this time. Vignettes have been administered via the Internet, notably in the CentERpanel and the ALP, and data on mode effects obtained from the HRS experiment will be valuable for interpreting those results. Because the PDS targeted Medicare beneficiaries (most of whom are over 65) to study the implementation of Part D, it has relatively low proportions of people with Internet access. The Internet holds great promise for the measurement of consumption and time use; content experiments will be conducted in early 2007 in Phase 2 under the current grant.

The 2007 mixed-mode experiment will therefore be focused on the vignette mail survey. Anticipated sample size is 6000, of which 2700 will have Internet access. Those with Internet access will be divided into five equal-sized treatment groups. One group will be a control group that will receive the same mail-only protocol as those without Internet access. A second group will be approached to do the survey on the Internet only, with no option to do it by mail. A third group will be approached first to do the survey by Internet, but then will be offered the mail option if they have not responded after one month. A fourth group will get the reverse treatment of mail first, followed by an Internet alternative. The fifth group will be approached from the outset with the option of doing the survey either by mail or by Internet. All other aspects of the protocols will be identical (advance letter containing a twenty-dollar respondent payment as is customary in HRS, with follow-up reminders at two, four, and six weeks).

This experiment will yield data from which a best-practice mixed-mode approach can be determined for future use in mail surveys, including CAMS, in 2009 and 2011. It will also provide data for testing of mode effects between mail and Internet.

D.3.2 Stand-alone Internet Surveys with HRS Respondents

We plan to conduct stand-alone Internet surveys on HRS respondents at the rate of one survey per respondent between each wave of HRS. These will take a variety of forms, utilizing the new content described in the previous sections of this proposal, and also continuing tests of mode effects with the HRS core interviews. Following the 2008 wave of HRS, stand-alone Internet surveys will be conducted with Internet-eligible respondents. These can begin as early as mid-2008 with those who have completed their core interviews, and will be completed by early Spring of 2009 because many of the same respondents will be contacted for mixed-mode mail/Internet surveys in the Fall of 2009. Much of the content of these interviews will have been previously developed and tested in the ALP surveys.

D.3.3 Developing an Internet Instrument for the HRS Core (and Beyond)

While the use of Internet surveys as an alternative to self-administered mail-back surveys is relatively straightforward, using the Internet as an alternative to an interviewer-conducted core survey presents several challenges. However, we believe that the potential benefits of developing and pilot testing a full-scale HRS core questionnaire for Internet interviewing are sufficient to justify the effort.

Programming. Integrating the Internet with other modes raises two related programming challenges. One is to develop sample management systems that allow the seamless exchange between modes of information from preload (prior-wave data) or partially completed interviews. The second is to design a computer-assisted Internet interview (CAII) for the HRS that embodies the same complex skip logic as the current CATI/CAPI systems but is easily used by an unassisted respondent rather than a trained interviewer. This development will take place in close collaboration between SRC and RAND, whose MMIC software underlies the Internet applications in the current project and can handle multiple modes relatively easily. Using this approach, we will achieve important economies of scale and scope in the design and programming phases of introducing Internet interviewing.

To develop the CAII counterpart to the HRS CATI/CAPI instrument, we would begin by breaking the questionnaire into modules, each of which could be completed in a single sitting of no more than about 30 minutes. The entire core could then be administered in a sequence of sittings over a period of days or weeks. The average length of an HRS telephone interview is about 80 minutes, suggesting that three or at most four modules of 30 minutes or less would be sufficient. Experience with the CentERpanel suggests that administration over multiple sittings is feasible: the DNB household survey collects about two and a half hours’ worth of financial information over five 30-minute interviews, typically spread over a couple of months. We plan to use the ALP to conduct tests of the HRS CAII to see whether we can succeed in getting individuals and couples to navigate the entire instrument. Once we are confident that we have a viable CAII for a given module, we would undertake careful testing for mode effects. This could be done by administering telephone, personal, and Internet interviews to the same subjects, using ALP panel members and, ultimately, randomly selected HRS subjects.

Mode Effects. From its inception, the HRS has utilized a dual-mode design that uses personal and telephone interviewing interchangeably, building on prior research in survey methods that had established the absence of important mode effects. Numerous additional tests for mode effects between telephone and in-person interviewing have been conducted by the HRS project over the years, most recently comparing scores on cognitive measures, where again only small effects were found (McArdle, Fisher, and Kadlec, 2006).

As described earlier in Section C.1, a major goal of the first phase of the Internet project has been to test for mode effects between Internet and traditional modes for a variety of types of HRS content. Consistent with most research in the survey methods literature, we find that mode effects tend to be small when objective or factual information is sought. The great majority of HRS content is of this type, from disease diagnoses to amounts in checking accounts. However, significant mode effects are found between self-administered (mail or Internet) and interviewer-assisted (personal or telephone) methods for questions concerning well-being, quality of relationships, and other psycho-social measurements. Our judgment is that in the latter case better data are obtained using self-administered methods such as the psycho-social “leave behind” questionnaire that was introduced into the HRS in 2006. Self-administration may also be better in situations such as the CAMS survey, where respondents may wish to consult their records before answering questions about expenditure items, for example by looking at utility bills or credit card statements. On the other hand, tests of cognitive performance may be sensitive to self- versus interviewer administration for a number of reasons, including the difficulty of monitoring the use of aids or even the possibility that someone other than the respondent takes the test. In Section D.2.7 we discussed experiments aimed at developing cognitive tests that are suitable for administration over the Internet.

An Internet Alternative to the Core HRS. Under current plans, half the HRS sample will receive the enhanced face-to-face interview (with biomarkers and physical measures) in 2010, and half will be interviewed primarily by telephone. Assuming that an acceptable CAII alternative has been prepared and tested, it can be offered, at no additional cost to this project, to Internet-ready HRS sample members scheduled for telephone interviews, with, as always, a control group receiving the conventional telephone approach. We would expect three sorts of outcomes from the Internet offer. Some respondents would successfully complete all the modules, at great cost savings to HRS even if an added financial incentive were offered for Internet completion. Some would complete only some of the modules and would need to be contacted by interviewers to finish the interview. Finally, some would never begin and would also need to be contacted by interviewers. HRS would risk little by integrating the Internet into its mode mix in this way, and as Internet access expands, the potential cost savings will grow.

Bringing the HRS to Other Samples. An important byproduct of developing a CAII that is integrated into a seamless multi-mode implementation of the HRS survey is that all or part of the HRS could be given to members of other samples. There are many possible applications of this capability. For example, there are a number of samples of individuals on whom specialized and expensive measurements have been obtained, such as medical examinations, brain imaging, blood testing, genetic analysis, or in-depth psychological assessments. Often these are opportunity samples, and little is known about their members’ social, economic, and health histories, status, or behaviors of the kind measured in the HRS. With an HRS CAII in place, it would be relatively easy and cheap to obtain measurements of HRS variables for members of such samples, thus expanding the cross-domain power of data on aging populations.

E. Human Subjects

Protection of Human Subjects

The University of Michigan has described its human subjects protection plan as part of the subcontract to RAND. Here we describe only the plan regarding the subjects participating in the American Life Panel.

1. Risks to the Subjects

a. Human Subjects Involvement and Characteristics

The Monthly Survey of Consumers includes women and minorities at the rates these groups occur in the population. Respondents to this survey must be at least 18 years old. Respondents to the MS follow-up interview will be told that the University of Michigan is undertaking a joint project with RAND and asked whether they would object to sharing their contact information and some demographic data (gender, age, education, income, race or ethnicity, and marital status) with RAND so that they can be contacted later and asked whether they would be willing to participate in an Internet survey. Respondents with limited or no Internet access will be told that RAND is willing to provide them with Internet access and the equipment necessary to respond to the interview over the Internet.

b. Sources of Materials

For both the HRS Internet and ALP respondents, identifying information will be removed from the data set and stored separately for purposes of future data collection. A unique identifier will be assigned to the participant record so that over time data may be linked for analytical purposes.

The data collected, via Internet, mail, or telephone, will be used for the research purposes described in this proposal. They will also be made available to the public as de-identified secondary analysis files.

c. Potential Risks

We foresee no risks to the subjects. As with any study, there is a potential for identification of a participant, which we assess as a rare event. We use stringent data collection protocols designed to ensure participant confidentiality, including limited access to identifiers, secure storage, and electronic security measures (such as private servers, limited access, and encryption). Additionally, all project personnel pledge confidentiality.

2. Adequacy of Protection Against Risks

a. Recruitment and Informed Consent

Participants in the American Life Panel will be recruited by letter after having given permission in the MS to be contacted by RAND, following a statement that contact and demographic information will remain confidential. The respondent contact information and demographic data are encrypted for transmission from the University of Michigan to RAND and are stored in an encrypted format on a secure server at RAND.

The initial letter to potential ALP respondents who already have Internet access will describe the goals of the study and provide the web address, login name, and password for the respondent to access the survey. The letter will also describe the measures that are taken to ensure confidentiality of responses, and will inform respondents that the data that are collected will be made available to the research community in anonymous form for the purpose of bringing insight and information to policy makers. The letter will also state: “Participation in this study is completely voluntary. Even if you agree to participate now, you may change your mind at any time during the course of the study. You may also refuse to answer any questions you don’t wish to answer.” Follow-up contacts may be made by letter, e-mail, and/or phone. A participant who chooses to log in to the survey will encounter a screen with a consent script. Subjects will complete the actual survey via the Internet. Telephone technical support will be available.

The initial letter to potential ALP respondents who lack Internet access will contain similar information with regard to the purpose of the study, the use of the data, and the voluntary nature of participation. In addition, this letter will describe the equipment and services that RAND will provide to enable Internet access at no financial cost or risk to respondents. In particular, the letter will inform these potential respondents that RAND will pay for the equipment and all service fees; that the equipment will remain the property of RAND and cannot be resold; and that, if the equipment or other components are stolen, lost, or damaged, the loss will be covered by RAND’s insurance and not borne by the subject.

b. Protection Against Risk

At the Survey Research Center and at RAND, we maintain high security standards, including limited access to individual identifiers, use of encryption, and secure servers. Additionally, all personnel involved in the project annually pledge confidentiality as a condition of employment. Electronic data are stripped of individual identifiers prior to data processing; the identifiers are maintained on a secure server and accessed only when required for the conduct of the study. Participants are assigned a unique identifier, which is attached to the electronic data to allow tracking of data and results over time.

3. Potential Benefits of the Proposed Research to the Subjects and Others

While we do not anticipate any direct benefits of participation in this research, we anticipate societal benefits related to policy reforms through the continued provision of accurate data to researchers and policy makers. In addition, discoveries could lead to improved data collection techniques, possibly reducing the burden of participation in social science research.

4. Importance of the Knowledge to be Gained

This research will determine the viability of Internet interviewing as an alternative to mail and telephone methods, potentially lowering the cost of obtaining data and reducing the burden on respondents. It will also develop new methods that use Internet technology to improve the quality of the data obtained and the experience of participants.

The risks to subjects are extremely low. Participation is voluntary and can be terminated at any time.

5. Data and Safety Monitoring Plan

The current data collection from participants in the American Life Panel takes place according to a Data Safeguarding Plan (DSP) approved by the RAND Human Subjects Protection Committee. Upon funding of the current proposal, we will request renewal of the current DSP.

Inclusion of Women and Minorities

Participants in the American Life Panel will be recruited from a nationally representative telephone survey, the Monthly Survey of Consumers. We do not intentionally exclude any sex/gender or racial/ethnic group.

Inclusion of Children

Participants in the American Life Panel are aged 40 and above; thus, children are not included in the study.

F. Vertebrate Animals

Not applicable.

G. Select Agent Research

Not applicable.

H. Literature Cited

Apps, P.F. and R. Rees (1988), “Taxation and the Household”, Journal of Public Economics, 35, 355-369.

Baker, D.W., Parker, R.M., Williams, M.V., & Clark, W.S. (1998). Health literacy and the risk of hospital admission. Journal of General Internal Medicine, 13, 791-798.

Baker, D.W., Parker, R.M., Williams, M.V., Clark, W.S., & Nurss, J. (1997). The relationship of patient reading ability to self-reported health and use of health services. American Journal of Public Health, 87, 1027-1030.

Banks, J., Kapteyn, A., Smith, J.P., and van Soest, A. (2005). Work Disability is a Pain in the *****, Especially in England, The Netherlands, and the United States. RAND Labor and Population Program, RAND Working Paper WR-280.

Banks, James, Richard Blundell, and Sarah Tanner (1998). “Is There a Retirement-Savings Puzzle?” American Economic Review, 88, 769-788.

Baron, J. (2000). Thinking and deciding. New York, NY: Cambridge University Press.

Barsky, R.B., Juster, F.T., Kimball, M.S., and Shapiro, M.D. (1997). “Preference Parameters and Behavioral Heterogeneity: An Experimental Approach in the Health and Retirement Study,” Quarterly Journal of Economics, 112(2), 537-579.

Bellemare, C. and S. Kroeger (2006), “On representative social capital,” European Economic Review, forthcoming.

Bellemare, C., Kroeger, S. and A. van Soest (2005), “Actions and beliefs: estimating distribution-based preferences using a large scale experiment with probability questions on expectations,” IZA discussion paper 1666.

Bimber, B. (2000) Measuring the Gender Gap on the Internet.  Social Science Quarterly, 81, 3, 868-875.

Bossaerts, P.L. and C. R. Plott (2004) “Basic Principles of Asset Pricing Theory: Evidence From Large-Scale Experimental Financial Markets," Review of Finance, 8: 135-169.

Browning, M. and P.-A. Chiappori (1998), “Efficient Intra-Household Allocations: A General Characterization and Empirical Tests”, Econometrica, 66, 1241-1278.

Browning, M., T. F. Crossley, and G. Weber (2003). “Asking Consumption Questions in General Purpose Surveys”, Economic Journal, 113, F540-F567.

Bruine de Bruin, W., Parker, A.M., & Fischhoff, B. (2005). Decision-making competence: Measures of process and outcome. Paper presented at the Annual Meeting of the Society for Judgment and Decision Making, Toronto, Canada.

Camerer, C.F. and E. Fehr (2006) "Measuring social norms and preferences using experimental games: A guide for social scientists," in Foundations of Human Sociality: Experimental and Ethnographic Evidence from 15 Small-scale Societies. Oxford University Press, forthcoming

Cattell, R.B. (1941). Some theoretical issues in adult intelligence testing. Psychological Bulletin, 38, 592.

Cattell, R.B. (1987). Intelligence: Its structure, growth and action. Amsterdam: Elsevier.

Chang, L.C. and Krosnick, J.A. (2002). Comparing Self-administered Computer Surveys and Auditory Interviews: An Experiment. Paper presented at the American Association for Public Opinion Research (AAPOR) 57th Annual Conference.

Chiappori, P.-A. (1988), “Rational Household Labor Supply”, Econometrica, 56, 63-89.

Coen, T. and Lorch, J. (2005). The Effects of Survey Frequency on Panelists’ Responses. White paper, ESOMAR.

Couper, M.P., Kapteyn, A., Schonlau, M., and Winter, J. (2006), “Noncoverage and Nonresponse in an Internet Survey.” Social Science Research.

Couper MP, Tourangeau R, Kenyon K. (2004). Picture this!: exploring visual effects in Web surveys. Public Opinion Quarterly, Winter, 68, pp 255–266.

Couper MP, Traugott MW, Lamias MJ. (2001). Web survey design and administration. Public Opinion Quarterly, Summer, 65, pp 230–253.

Dawes, R.M., & Hastie, R. (2001). Rational choice in an uncertain world: The psychology of judgment and decision making. Thousand Oaks, CA: Sage Publications.

de Walt, D.A., & Pignone, M.P. (2005). Reading Is Fundamental: The Relationship Between Literacy and Health. Archives of Internal Medicine, 165, 1943-1944.

Delavande, A. (2005). “Measuring Revisions to Subjective Expectations: Learning about Contraceptives,” working paper.

Delavande, A. and S. Rohwedder (2006a), Using Visual Aid to Elicit Expectations about Future Social Security Benefits: Results from a module in HRS Internet IW 2, forthcoming

Delavande, A. and S. Rohwedder (2006b), Uncertainty about Future Social Security Benefits, Working Paper, forthcoming.

Denscombe M. (2006). Web-based questionnaires and the Mode Effect: an evaluation based on completion rates and data contents of near-identical questionnaires delivered in different modes. Social Science Computer Review, 24(2), pp 246–254.

Dillman, D., G. Phelps, R. Tortora, K. Swift, J. Kohrell, and J. Berck (2001). Response Rate and Measurement Differences in Mixed Mode Surveys Using Mail, Telephone, Interactive Voice Response and the Internet. Working Paper, Washington State University.

Dohmen, T.J., Huffman, D., Schupp, J., Sunde, U., and Wagner, G. G. (2006) "Individual Risk Attitudes: New Evidence from a Large, Representative, Experimentally-Validated Survey." Centre for Economic Policy Research Discussion Paper No. 5517.

Dominitz, J. and Hung, A. A., (2006) “Financial Literacy in HRS and ALP,” working paper forthcoming.

Dominitz, J., M. Hurd, and A. Kapteyn (2006). “The Quality of Subjective Probability Data in the HRS: CAPI, CATI, and Internet Interviews.” RAND Working Paper, Santa Monica, CA.

Dominitz, Jeff, and Charles F. Manski (2005). “Measuring and Interpreting Expectations of Equity Returns” NBER Working Paper 11313.

Dominitz, Jeff, and Charles F. Manski (2006). “Expected Equity Returns and Portfolio Choice: Evidence from the Health and Retirement Study”, presented at the annual meetings of the European Economic Association, Vienna, Austria.

Dominitz, Jeff, and Charles F. Manski (forthcoming). “Measuring Pension-Benefit Expectations Probabilistically”, Labour.

Dominitz, Jeff, and Arthur van Soest (forthcoming). “Analysis of Survey Data”, in Steven Durlauf and Lawrence Blume, eds., The New Palgrave Dictionary of Economics, 2nd Edition, Palgrave Macmillan.

Enright, E. (2006) Make the Connection: Recruit for online surveys offline as well. Marketing News, 40, 6, pp. 21-22.

Finucane, M.L., Mertz, C.K., Slovic, P., & Schmidt, E.S. (2005). Task complexity and older adults' decision-making competence. Psychology and Aging, 20, 71-84.

Finucane, M.L., Slovic, P., Hibbard, J.H., Peters, E., Mertz, C.K., & Macgregor, D.G. (2002). Aging and decision-making competence: An analysis of comprehension and consistency skills in older versus younger adults. Journal of Behavioral Decision Making, 15, 141-164.

Frederick, S. (2005). Cognitive Reflection and Decision Making. Journal of Economic Perspectives, 19(4), 25-42.

Fricker S, Galesic M, Tourangeau R, Yan T. (Fall 2005). An experimental comparison of Web and telephone surveys. Public Opinion Quarterly, 69, pp 370–392.

Haider, Steven, and Melvin Stephens, Jr. (forthcoming). "Is There a Retirement-Consumption Puzzle? Evidence Using Subjective Retirement Expectations", Review of Economics and Statistics.

Heiss, F., McFadden, D., and Winter, J. (2007): "Mind the gap! Consumer perceptions and choices of Medicare Part D prescription drug plans." In preparation for the Annual Meeting of the American Economic Association, Chicago, January 2007.

Hill, D., M. Perry, and R. Willis, "Do Internet Surveys Alter Estimates of Uncertainty and Optimism about Survival Chances?" Working Paper.

Holt, C.A. and S. Laury (2002). "Risk Aversion and Incentive Effects in Lottery Choices," American Economic Review, 92: 1644-55.

Huggins, V.J., and Krotki, K. (2001). "Implementation of nationally representative web-based surveys," Proceedings of the Annual Meeting of the American Statistical Association.

Hurd, Michael D. and Susann Rohwedder (2003). "The Retirement-Consumption Puzzle: Anticipated and Actual Declines in Spending at Retirement", NBER Working Paper 9586.

Hurd, Michael D. and Susann Rohwedder (2006a). "Empirical Analysis of Saving and Dissaving at Older Ages: the Advantages of Using Consumption Data," RAND L&P Working Paper, forthcoming.

Hurd, Michael D. and Susann Rohwedder (2006b). "Life-cycle Consumption and Wealth Paths at Older Ages," RAND L&P Working Paper, forthcoming.

Hurd, Michael D. and Susann Rohwedder (2006c). "Methodological Issues in Collecting Consumption Data: Results from the Consumption and Activities Mail Survey and Experiments in the American Life Panel," RAND L&P Working Paper, forthcoming.

Kahneman, Daniel, Alan B. Krueger, David A. Schkade, Norbert Schwarz, and Arthur A. Stone (2004). “A Survey Method for Characterizing Daily Life Experience: The Day Reconstruction Method,” Science, 306, 1776-1780.

Kahneman, D., Slovic, P., & Tversky, A. (1982). Judgment under uncertainty: Heuristics and biases. New York, NY: Cambridge University Press.

Kaplowitz MD, Hadlock TD, Levine R. (2004). A comparison of Web and mail survey response rates. Public Opinion Quarterly, Spring, 68, pp 94–101.

Kapteyn, A., J.P. Smith, and A. van Soest, “An International Comparison of Work Disability”, American Economic Review, forthcoming.

Kapteyn, Arie, Pierre-Carl Michaud, James Smith, and Arthur van Soest, (2006). “Effects of Attrition and Non-Response in the Health and Retirement Study”, RAND manuscript, May, 2006.

Kapteyn, A. and A. van Soest (2006), Asset Ownership and Asset Amounts: Comparing HRS 2002, HRS Internet 2003, HRS 2004 and HRS Internet 2006, paper presented at the Internet Interviewing conference in Ann Arbor, September 2006.

Kapteyn, A., J.P. Smith, and A. van Soest (2004). "Self-reported Work Disability in the US and The Netherlands," RAND L&P Working Paper WR-206; American Economic Review, forthcoming.

Kiernan NE, Kiernan M, Oyler MA, Gilles C. (2005). Is a Web survey as effective as a mail survey? A field experiment among computer users. American Journal of Evaluation, 26(2), pp 245–252.

Kimball, Miles S., Claudia R. Sahm, and Matthew D. Shapiro, (2006.) “Measuring Time Preference and Intertemporal Substitution with Web Surveys,” Preliminary Report.

Knäuper, B. (1999). The Impact of Age and Education on Response Order Effects in Attitude Measurement. Public Opinion Quarterly, 63(3), 347-370.

Lillard, L.A., and R.J. Willis (2001), “Cognition and Wealth: The Importance of Probabilistic Thinking,” prepared for the Third Annual Joint Conference for the Retirement Research Consortium, Washington D.C., May 17-18.

Link MW, Mokdad, AH. (2005). Effects of survey mode on self-reports of adult alcohol consumption: a comparison of mail, Web and telephone approaches. Journal of Studies on Alcohol, March, 66(2), pp 239–245.

Lipkus, I.M., Samsa, G., & Rimer, B.K. (2001). General performance on a numeracy scale among highly educated samples. Medical Decision Making, 21, 37-44.

Lusardi, A., and Mitchell, O.S. (2005). "Financial Literacy and Planning: Implications for Retirement Wellbeing," Michigan Retirement Research Center Working Paper 2005-108.

Manski, Charles F. (2004). “Measuring Expectations”, Econometrica 72, 1329–76.

McArdle, J.J., Ferrer-Caja, E., Hamagami, F., & Woodcock, R.W. (2002). Comparative longitudinal structural analyses of the growth and decline of multiple intellectual abilities over the life span. Developmental Psychology, 38, 115-142.

McArdle, J.J., Fisher, G.G., & Kadlec, K.M. (2006). Latent variable analysis of age trends of cognition in the Health and Retirement Study, 2001-2004. Manuscript under review.

McArdle, J.J., Fisher, G.G., Rodgers, W., Horn, J.L., & Woodcock, R.W. (2006). An Experiment in Measuring Fluid Intelligence over the Telephone in the Health and Retirement Study (HRS). Unpublished manuscript.

McFadden, D., Schwarz, N., Winter, J. (2006): “Measuring perceptions and behavior in household surveys.” Unpublished manuscript.

Montagnier, P., Muller, E., and Vickery, G. (2002). The Digital Divide: Diffusion and Use of ICTs. IAOS Conference on Official Statistics and the New Economy.

Parker, A.M. & Fischhoff, B. (2005). Decision-making competence: External validation through an individual-differences approach. Journal of Behavioral Decision Making, 18, 1-27.

Parks KA, Pardi AM, Bradizza CM. (2006). Collecting data on alcohol use and alcohol-related victimization: a comparison of telephone and Web-based survey methods. Journal of Studies on Alcohol, 67(2), pp 318–323.

Peters, E., Västfjäll, D., Slovic, P., Mertz, C.K., Mazzocco, K., & Dickert, S. (2006).  Numeracy and decision making.  Psychological Science, 17(5), 408-414.

Porter SR, Whitcomb ME. (2003). The impact of contact type on Web survey response rates. Public Opinion Quarterly, 67, pp 578–588.

Reyna, V. F. (2004). How people make decisions that involve risk: A dual-processes approach. Current Directions in Psychological Science, 13, 60–66.

Rietz and Wahl (1999), as cited in Tuten, T.L., Urban, D.J., and Bosnjak, M. (2002), Internet Surveys and Data Quality: A Review.

Ruhm, C. (1990): “Bridge Jobs and Partial Retirement”, Journal of Labor Economics, 8(4), 482-501.

Schaefer, D.R., and Dillman, D.A. (1998). "Development of a Standard E-mail Methodology: Results of an Experiment," Public Opinion Quarterly, 62, 378-397.

Schonlau, M., A. van Soest, and A. Kapteyn (forthcoming a). Webographic survey questions, RAND Working Paper.

Schonlau, M., A. van Soest, and A. Kapteyn (forthcoming b). Beyond demographics: do additional 'Webographic' questions facilitate differentiating between Web and Phone survey respondents? RAND Working Paper.

Schonlau, M. (forthcoming). "Propensity weighted web surveys," Encyclopedia of Survey Research Methods, SAGE.

Schonlau, M., van Soest, A., Kapteyn, A., and Couper, M. (2006). Selection bias in Web surveys and the use of propensity scores, RAND Working Paper WR-279.

Schonlau, M., van Soest, A., Kapteyn, A., Couper, M.P., and Winter, J. (2004b): "Adjusting for selection bias in web surveys using propensity scores: The case of the Health and Retirement Study." Proceedings of the American Statistical Association, Survey Research Methods Section (CD-ROM), Alexandria, VA: American Statistical Association.

Schonlau M, Asch BJ, Du C. (2003). Web surveys as part of a mixed mode strategy for populations that cannot be contacted by e-mail. Social Science Computer Review. 21(2): 218-222.

Schonlau, M., Fricker R., Elliott, M. (2002). Conducting Research Surveys via Email and the Web, Santa Monica, CA: RAND.

Simon, H.A., (1978). Rationality as process and product of thought. American Economic Review, 68, 1-16.

Tuten, T.L., Urban, D.J., and Bosnjak, M. (2002). Internet Surveys and Data Quality: A Review.

van Soest, A., A. Kapteyn, T. Andreyeva, and J.P. Smith (2006), "Self-Reported Disability and Reference Groups", RAND Labor and Population Working Paper WR-409.

Van Soest, A., Kapteyn, A., and J. Zissimopoulos (2006): "Using Stated-Preference Data to Analyze Preferences for Full and Partial Retirement", RAND L&P Working Paper 345.

Von Gaudecker, H.-M., Van Soest, A., and E. Wengstroem (2006), “Preference heterogeneity (and portfolio choice),” paper presented at the RTN-AGE workshop in Paris, May 2006.

Weir, David R. (2004). "Modes, Trends, and Content: A Comparison of the 2003 HRS Internet Survey with HRS 2002 and 2004 Core Survey Responses for Evaluative, Subjective, and Objective Content," paper prepared for the RAND-HRS Internet Project Meeting, Traverse City, MI, July 2004.

Wiebe (2002). Nonresponse Error and Mode Effect Error in Web Surveys. Presentation at the ZUMA Workshop on Web Surveys, Mannheim, Germany.

Winter, J., Balza, R., Caro, F., Heiss, F., Jun, B., Matzkin, R., McFadden, D. (2006): “Medicare prescription drug coverage: Consumer information and preferences.” Proceedings of the National Academy of Sciences of the United States of America, 103(20), 7929–7934.

Winter, J. (2002): “Bracketing effects in categorized survey questions and the measurement of economic quantities.” Discussion Paper No. 02-35, Sonderforschungsbereich 504, University of Mannheim.

Woodcock, R.W., McGrew, K.S., & Mather, N. (2001). The Woodcock-Johnson III Tests of Cognitive Abilities. Itasca, IL: Riverside Publishing Company.

Yates, J.F. (1990). Judgment and decision making. Englewood Cliffs, NJ: Prentice Hall.

I. Multiple PI Leadership Plan

Not applicable.

J. Consortium/Contractual Arrangements

See Attached.

K. Resource Sharing

The data collected through the ALP are made available to the research community as public-use files, to be downloaded from the Web, subject to registration as a user. The data collected from HRS respondents are made available to the research community on the same terms.

L. Letters of Support

Letters enclosed:

Wändi Bruine de Bruin, Carnegie Mellon University

Steven J. Haider, Michigan State University

Charles F. Manski, Northwestern University

John J. McArdle, University of Southern California

Daniel McFadden, University of California, Berkeley

Ellen Peters, Decision Research

Peter Ubel, University of Michigan

Joachim Winter, University of Munich

Consortium Letters:

University of Michigan

Institute for Social Research

-----------------------

[1] See for details.

[2] This applies going forward. It does not, of course, apply to the current grant.

[3] This relatively low response rate raises the possibility of further selectivity issues; investigating and addressing these is part of the research proposed in the new grant.

[4] There is an exception to this rule. In January 2006 we decided to administer MS4 (before MS2) to the new respondents who had completed MS1; the reason is that MS4 contains material relating to the new Medicare Part D drug benefit, which was used in part as a pilot for similar material in HRS Internet 2.

[5] It turns out that not every potential Internet respondent referred to us from the MS sample actually has Internet access.

[6] The analysis of choices around Medicare Part D drug benefits is an example of the power of Internet panels to accommodate analyses of new policies quickly.
