Simultaneous Raking of Survey Weights at Multiple Levels



Stas Kolenikov, Abt SRBI and Heather Hammer, Abt SRBI

Abstract

This paper discusses the problem of calibrating survey weights to data at different levels of aggregation, such as households and individuals. We present and compare three different methods. The first does the weighting in two stages, using first only the household data, and then only the individual data. The second redefines targets at the individual level, if possible, and uses these targets to calibrate only the individual level weights. The third uses multipliers of household size to produce household level weights that simultaneously calibrate to the individual level totals. We discuss advantages and disadvantages of these approaches, including their requirements in terms of access to the control total data and software. We conclude by outlining directions for further research.

Motivation

In social, behavioral, health and other surveys, weight calibration is commonly used to correct for non-response and coverage errors (Kott, 2006; Kott, 2009; Deville & Sarndal, 1992). The essence of the method is to adjust the survey weights so that the weighted totals (means, proportions) agree with externally known benchmarks. The latter may come from complete frame enumeration data (population registers available in some European countries) or from other large scale, high quality surveys (such as the American Community Survey (ACS) in the USA).

One commonly used implementation of calibration algorithms is iterative proportional fitting, or raking (Deming & Stephan, 1940; Kolenikov, 2014). In this algorithm, the calibration margins are adjusted one at a time (i.e., effectively post-stratified), with the variables being cycled through repeatedly until the desired degree of convergence is achieved.
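As a rough illustration of the cycle just described, the following Stata sketch rakes a toy data set to two margins by hand. The data, the variable names (sex, agegrp, w), the control totals and the convergence tolerance are all made up for this illustration. The ipfraking package used later in the paper does this work internally and may well do it differently. The sketch is meant to be run from a do-file.

```stata
* Toy raking (iterative proportional fitting) on two margins.
* All data and control totals are hypothetical.
clear
input byte sex byte agegrp
1 1
1 2
1 2
2 1
2 1
2 2
end
generate double w = 1

* Hypothetical control totals; both margins sum to 100
matrix T_sex    = (48, 52)
matrix T_agegrp = (45, 55)

local niter = 0
forvalues iter = 1/100 {
    local niter = `iter'
    generate double w_old = w
    foreach v in sex agegrp {
        forvalues k = 1/2 {
            * post-stratify category k of margin v to its control total
            quietly summarize w if `v' == `k'
            quietly replace w = w * T_`v'[1, `k'] / r(sum) if `v' == `k'
        }
    }
    * convergence check: largest relative change in any weight
    generate double reldiff = abs(w / w_old - 1)
    quietly summarize reldiff
    local maxdiff = r(max)
    drop w_old reldiff
    if `maxdiff' < 1e-10 {
        continue, break
    }
}
display "stopped after `niter' cycles"

* both margins should now be reproduced
quietly summarize w if sex == 1
assert abs(r(sum) - 48) < 1e-6
quietly summarize w if agegrp == 2
assert abs(r(sum) - 55) < 1e-6
```

Each pass through a margin is an ordinary post-stratification step; the outer loop repeats the passes until the weights stop changing.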
Implementations of raking may differ. In the simplest implementations, only adjustments of proportions may be feasible, and, as shown later in this paper, this may limit the survey statistician’s ability to produce accurate weights.

Many real world populations exhibit hierarchical structure that sampling statisticians can use (or simply find unavoidable). Persons in non-institutionalized populations are nested in households; patients are nested within hospitals; students are nested in classrooms, which are in turn nested in schools. Calibration target data may exist at these multiple levels. This paper demonstrates how raking can be implemented to utilize these data.

The running examples in the paper are households and individuals, which are often the last two stages of selection in general population surveys. The survey data that can be used for calibration may include the number of adults in the household and the household income at the household level; and age, gender, race and education at the individual level.

In the demonstration, we describe and exemplify three approaches to survey weighting:

1. A two-stage process in which the household weights are produced first by calibrating only to the household targets, using the base weights as input to calibration. Then the individual weights are produced using the first stage calibrated household weights as inputs and calibrating to the individual targets only.

2. The individual weights are produced in a single pass using both the individual and household targets, but the latter are redefined at the individual level (e.g., the number of individuals that live in households with exactly two adults). Here, the household weights can be produced by dividing the individual weights by the number of eligible adults in the household.

3. The household weights are produced in a single pass using the expansion multipliers (i.e., household size) from the household level to the individual level. The targets can remain at the level at which they were defined. Here, the individual weights can be produced by multiplying the household weights by the expansion multipliers that were used in calibration.

These three approaches have their advantages and disadvantages. Approach 1 may be the simplest to implement; however, the household weights will not benefit from the accuracy gains afforded by calibration to the individual targets. Also, the weights produced by a two-step procedure are likely to be more variable, reducing the efficiency of the survey estimates (Korn & Graubard, 1999). Approaches 2 and 3 may or may not produce weights at the “other” level that are accurate for their targets. Specifically, the implied household weights from Approach 2 may or may not match the household targets, and the implied individual weights from Approach 3 may or may not match the individual targets.

Approach 2 requires access to large scale microdata. While the number of individuals residing in households of different sizes can be inferred from the household level data (if there are 10 million households with one adult, and 15 million households with two adults, we know that there are 10 million individuals residing in households with one adult, and 30 million individuals residing in households with two adults), there is no real way to transform, for example, information on household income unless it is also available by household size. Households with income of $50,000 to $75,000 may have any number of residents. If the available raking package only supports raking to proportions, then Approach 3 cannot be implemented.

The remainder of the paper compares and contrasts these three approaches. The next section introduces a numerical example based on ACS data. Then calibration is done using the three approaches, and the paper concludes with a short discussion of the findings.
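The household size conversion described above for Approach 2 can be checked with a few lines of Stata arithmetic. The sketch below uses the household counts that appear in Table 3; the matrix names N_hh and adults are made up for this illustration.

```stata
* Household counts by number of adults (population totals from Table 3)
matrix N_hh   = (388470, 629353, 131801)
* Number of adults per household in each closed size category
matrix adults = (1, 2, 3)

* Individuals living in households of each size = households x adults
forvalues k = 1/3 {
    display "HH size `k': " %12.0fc (N_hh[1,`k']*adults[1,`k']) " individuals"
}

* The open-ended category "4 or more" cannot be converted this way:
* 57,791 households times 4 falls short of the 252,319 individuals in Table 3
display %12.0fc (57791*4)
display %6.3f (252319/57791)
```

The first three products (388,470, 1,258,706 and 395,403) agree with the individual level household size targets in Table 3. The top category does not: its households average about 4.37 adults rather than 4, so, like the income targets, it needs either microdata or a published table at the individual level.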
The Stata 12 statistical package (StataCorp. LP, 2011) is used for data management and analysis, and a third party raking package written by one of the authors (Kolenikov, 2014) is used for calibration. The complete Stata code is provided in the Appendix.

The analysis assumes a general population survey; however, specialized populations can be handled by appropriate screening of the survey sampling units and subsetting the frame/population data to define the targets.

Data set up

This demonstration of the three approaches uses 1 year ACS 2012 data downloaded from the Integrated Public Use Microdata Series website (Ruggles, Alexander, Genadek, Goeken, Schroeder, & Sobek, 2010). The variables used in the data simulation and analysis are listed in Table 1.

Table 1. ACS variables used in examples.

Variable    Description
serial      Household serial number
pernum      Person number in sample unit
relate      Relationship to household head [general version]
hhincome    Total household income
age         Age
sex         Sex
race        Race [general version]
educd       Educational attainment [detailed version]

The full ACS data set was subset to include only adults ages 18 and above, totaling 2,294,898 individuals in 1,207,415 households. The resulting (unweighted) data set is treated as the finite population under study.
The following derived variables were produced from the variables listed in Table 1:

Household size (number of adults) with 4 categories: 1, 2, 3, 4 or more.
Race with 3 categories: White only, Black/African American only, other.
Education with 5 categories: below high school, high school/general education diploma, some college/associate degree, bachelor’s degree, graduate/professional degree.
Total household income with 5 categories: under $20,000, $20,000 to under $40,000, $40,000 to under $65,000, $65,000 to under $100,000, $100,000 and above.
Age group with 5 categories: 18-29, 30-44, 45-54, 55-64, and 65 and above.

An initial simple random sample of 5,000 households was drawn from the data, and one adult was randomly selected from each household. To produce non-trivial deviations from the population distribution of the key variables, non-response was simulated using a simple logistic regression response model with the coefficients given in Table 2. Response propensities had a mean of 0.230 and ranged from 0.129 to 0.323. In real world surveys, response propensities need to be estimated (rather than being known as in this simulation example), and these estimated response propensities usually have more variability.

Table 2. Response model: Prob[response] = (1 + exp(-x’β))^(-1)

Variable     Category / transformation     Logistic regression coefficient
Race         White                          0.25
Race         Black, Other                   0
Education    Below high school             -0.4
Education    High school, some college      0
Education    Bachelor’s degree             +0.1
Education    Graduate degree               +0.3
Income       Ln( income + 20,000 )         -0.1
Intercept                                  -0.3

The population and sample counts and proportions are given in Table 3. Population totals listed in this table are used as raking targets subsequently.

Table 3. Population (calibration targets) and sample counts and proportions.

Variable          Category                        Population total  Population %  Sample count  Sample %
Households                                                 1207415       100%          1137      100%
Household size    1 (one adult)                             388470      32.17%          393     34.56%
Household size    2 (two adults)                            629353      52.12%          588     51.72%
Household size    3 (three adults)                          131801      10.92%          112      9.85%
Household size    4 (four or more adults)                    57791       4.79%           44      3.87%
Household income  1 Under $20,000                           224677      18.61%          207     18.21%
Household income  2 $20,000–under $40,000                   252356      20.90%          240     21.11%
Household income  3 $40,000–under $65,000                   249978      20.70%          254     22.34%
Household income  4 $65,000–under $100,000                  219408      18.17%          211     18.56%
Household income  5 $100,000–above                          260996      21.62%          225     19.79%
Individuals                                                2294898       100%          1137      100%
Household size    1 (one adult)                             388470      16.93%          393     34.56%
Household size    2 (two adults)                           1258706      54.85%          588     51.72%
Household size    3 (three adults)                          395403      17.23%          112      9.85%
Household size    4 (four or more adults)                   252319      10.99%           44      3.87%
Household income  1 Under $20,000                           307896      13.42%          207     18.21%
Household income  2 $20,000–under $40,000                   429951      18.74%          240     21.11%
Household income  3 $40,000–under $65,000                   484136      21.10%          254     22.34%
Household income  4 $65,000–under $100,000                  471183      20.53%          211     18.56%
Household income  5 $100,000–above                          601732      26.22%          225     19.79%
Gender            Male                                     1085531      47.30%          464     40.81%
Gender            Female                                   1209367      52.70%          673     59.19%
Race              White only                               1814707      79.08%          953     83.82%
Race              Black/African American only               227826       9.93%          102      8.97%
Race              Other                                     252365      11.00%           82      7.21%
Education         Below high school                         299730      13.06%          106      9.32%
Education         High school/GED                           656608      28.61%          315     27.70%
Education         Some college                              697947      30.41%          355     31.22%
Education         Bachelor's degree                         399943      17.43%          209     18.38%
Education         Graduate/professional degree              240670      10.49%          152     13.37%
Age               18-29                                     395250      17.22%          166     14.60%
Age               30-44                                     528792      23.04%          267     23.48%
Age               45-54                                     437672      19.07%          207     18.21%
Age               55-64                                     428807      18.69%          226     19.88%
Age               65+                                       504377      21.98%          271     23.83%

The resulting sample of respondents has a sample size of 1137, and demonstrates some imbalances relative to the population proportions.

Approach 1: raking in two steps

The first approach to weighting at multiple levels is to produce weights sequentially, first for households, then for individuals. Base household weights are used as inputs for household level raking. Raked household weights multiplied by the household size are used as inputs for person level raking. Household size may be capped to avoid extreme weights, and in this example, household size was capped at 4, consistent with the categorical variable of household size.

Raking converged successfully in 7 and 6 iterations for the household and individual weights, respectively. The raked weights for both households and individuals reproduce their respective targets from Table 3 within numeric accuracy. Descriptive statistics for the Approach 1 weights are given in Table 6, along with those for the other approaches.

Approach 2: raking individual weights using redefined targets for households

The second approach relies on redefining the population targets for households at the individual level. In other words, rather than specifying the number (or proportion) of households with income under $20,000 in the population, the targets are defined as the number of adults who live in such households. Only one pass of raking is required, and it uses all the calibration variables at once. The base individual weights that combine both stages of selection (the household selection and the selection of an adult within the household) can be used as input weights. The household weights are derived from the raked individual weights under this approach as the ratio of the raked individual weights to the household size, capped at 4 to avoid extremely small weights.

Raking converged successfully in 14 iterations. All of the proper individual level control totals (gender, race, education, and age), as well as the household targets expressed at the individual level, were reproduced within numeric accuracy, and are thus not reported. Weight summaries are reported later in Table 6 in the Discussion section. Note that Table 4 reports the results for household level variables whose convergence is not guaranteed.
While household size is generally on target (as it is one of the raking margins, and for values from 1 to 3 was calibrated to the correct total), household income is not that accurate. These problematic values are shown in bold, italicized red.

Table 4. Household weights from Approach 2.

Variable          Category                        Population total  Population %  Weighted count  Weighted %
Households                                                 1207415       100%
Household size    1 (one adult)                             388470      32.17%       388469.96      32.03%
Household size    2 (two adults)                            629353      52.12%       629353.00      51.90%
Household size    3 (three adults)                          131801      10.92%       131801.01      10.87%
Household size    4 (four or more adults)                    57791       4.79%        63079.76       5.20%
Household income  1 Under $20,000                           224677      18.61%       233052.57      19.22%
Household income  2 $20,000–under $40,000                   252356      20.90%       241094.20      19.88%
Household income  3 $40,000–under $65,000                   249978      20.70%       255299.71      21.05%
Household income  4 $65,000–under $100,000                  219408      18.17%       222660.13      18.36%
Household income  5 $100,000–above                          260996      21.62%       260597.10      21.49%

Approach 3: raking household weights with multipliers

The third approach rakes household level weights, and uses the individual level targets via the household size multipliers. Individual level weights are then obtained as the product of household level weights and the number of adults in the household (capped at 4, as in the other approaches). The household base weights can be used as raking inputs.

In simple raking, the sum of (individual level) weights for, say, less than high school education, is equated to the number of people with this education level in the population. In the extended version of raking with multipliers, the former sum is replaced by the sum of the household level weights being raked, multiplied by the household size, with the sum taken only over individuals in the sample with the education level being processed. The (household) weights for these cases are then adjusted by the ratio of the population total to the aforementioned sum, so that the weighted sum of household sizes for this education level is equal to the population control total.
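The adjustment step described above can be sketched in a few lines of Stata. This is a hand-rolled toy version written for illustration, not the implementation inside ipfraking. The data, the variable names (hhsize, educ, w_hh, wm, w_per), the control totals and the tolerance are all hypothetical, and the control totals were chosen to be mutually consistent (40 one-adult and 60 two-adult households contain 160 adults). The sketch is meant to be run from a do-file.

```stata
* Toy raking of household weights with household size as the multiplier.
* One sampled adult per household; all numbers are hypothetical.
clear
input byte hhsize byte educ
1 1
1 1
1 2
2 1
2 2
2 2
end
generate double w_hh = 1
generate double wm = .

matrix T_hhsize = (40, 60)      // household totals by household size
matrix T_educ   = (70, 90)      // adult totals by education; 70 + 90 = 160

local niter = 0
forvalues iter = 1/200 {
    local niter = `iter'
    generate double w_old = w_hh
    * household level margin: plain sums of household weights
    forvalues k = 1/2 {
        quietly summarize w_hh if hhsize == `k'
        quietly replace w_hh = w_hh * T_hhsize[1, `k'] / r(sum) if hhsize == `k'
    }
    * individual level margin: sums of household weight times household size
    forvalues k = 1/2 {
        quietly replace wm = w_hh * hhsize
        quietly summarize wm if educ == `k'
        quietly replace w_hh = w_hh * T_educ[1, `k'] / r(sum) if educ == `k'
    }
    * convergence check: largest relative change in any weight
    generate double reldiff = abs(w_hh / w_old - 1)
    quietly summarize reldiff
    local maxdiff = r(max)
    drop w_old reldiff
    if `maxdiff' < 1e-10 {
        continue, break
    }
}
display "stopped after `niter' cycles"

* implied individual weights
generate double w_per = w_hh * hhsize

* household weights reproduce household totals, person weights reproduce adult totals
quietly summarize w_hh if hhsize == 1
assert abs(r(sum) - 40) < 1e-6
quietly summarize w_per if educ == 1
assert abs(r(sum) - 70) < 1e-6
```

The only difference from ordinary raking is in the second inner loop, where the quantity summed within an education category is the household weight times the household size rather than the weight alone.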
The Stata code (Kolenikov, 2014) was designed to allow this raking modification.

Although raking converged in 15 iterations, warnings were produced. One warning stated that the control total inputs summed to different values: the household level variables (number of adults and income) had control totals that summed to 1207415, while the remaining individual level variables had control totals that summed to 2294898. Another warning stated that the weighted totals for the number of adults and income, i.e., the household level variables, did not match the targets. Table 5 provides the details, with these problematic values shown in bold, italicized red. As shown in Table 5, the marginal proportions have been reproduced perfectly, meaning that the overall scale is the problem. The scale issue is an artifact of the raking implementation in Kolenikov (2014), where the scale of the weights is determined by the last raking variable. In this case, the last variable was age group, which is an individual level variable, and the weights inherited this variable’s scale overall. Had the last raking variable been a household level variable with control totals summing up to the number of households, we might have observed the reverse, with household targets matching both in absolute and relative terms, and individual targets being missed in absolute terms (but accurate in terms of the marginal proportions).

Table 5. Household weights from Approach 3.

Variable          Category                        Population total  Population %  Weighted count  Weighted %
Households                                                 1207415       100%
Household size    1 (one adult)                             388470      32.17%       392084.28      32.17%
Household size    2 (two adults)                            629353      52.12%       635208.53      52.12%
Household size    3 (three adults)                          131801      10.92%       133027.29      10.92%
Household size    4 (four or more adults)                    57791       4.79%        58328.70       4.79%
Household income  1 Under $20,000                           224677      18.61%       226767.40      18.61%
Household income  2 $20,000–under $40,000                   252356      20.90%       254703.93      20.90%
Household income  3 $40,000–under $65,000                   249978      20.70%       252303.80      20.70%
Household income  4 $65,000–under $100,000                  219408      18.17%       221449.37      18.17%
Household income  5 $100,000–above                          260996      21.62%       263424.30      21.62%

Individual level weights produced weighted distributions that matched the control totals within numeric accuracy, and results for them are not reported.

Discussion

Table 6 reports summary statistics for the raked weights, with problematic values shown in bold, italicized red.

Table 6. Weight summary statistics.

                               Approach 1              Approach 2              Approach 3
Statistic                      Household   Person      Household   Person      Household   Person
Mean                             1061.93   2018.38       1066.58   2018.38       1071.81   2018.38
Total                            1207415   2294898     1212703.7   2294898     1218648.8   2294898
Min                               902.03    619.75        607.45    607.45        628.71    628.71
Max                              1422.41   9170.15       2447.72   9790.87       2225.12   8900.47
Standard deviation                 95.41   1133.19        244.67   1169.59        238.84   1120.75
Apparent DEFF = 1 + CV^2           1.008     1.315         1.053     1.336         1.050     1.308

As mentioned in the section on Approach 2, household weights from Approach 2 are not sufficiently accurate. Table 6 shows that their sum does not match the population total number of households. While this problem can be easily corrected with rescaling, that section also reported that the household proportions could not be matched with these weights, which is more problematic.

Although the variability of individual level weights is comparable across the three methods, the household weights from Approach 1 appear underdispersed compared to the other two methods. Their apparent design effect is also implausibly small.
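Both the apparent design effect in Table 6 and the rescaling correction mentioned above are simple functions of the weights. The following Stata sketch first redoes the Approach 1 arithmetic from Table 6, and then rescales a made-up weight variable (w_hh) to a hypothetical known household total (N_hh); neither name nor the toy numbers come from the paper. The sketch is meant to be run from a do-file.

```stata
* Apparent DEFF = 1 + CV^2, recomputed from the Approach 1 columns of Table 6
display %5.3f (1 + (95.41/1061.93)^2)      // household weights
display %5.3f (1 + (1133.19/2018.38)^2)    // person weights

* Toy household weights (made up) and a hypothetical known household total
clear
input double w_hh
900
1000
1100
1500
end
scalar N_hh = 5000

quietly summarize w_hh
scalar deff_before = 1 + (r(sd)/r(mean))^2

* Rescale so that the weights sum to N_hh
quietly replace w_hh = w_hh * scalar(N_hh) / r(sum)

quietly summarize w_hh
assert abs(r(sum) - scalar(N_hh)) < 1e-6
* A constant rescaling leaves the CV, and hence the apparent DEFF, unchanged
assert abs(1 + (r(sd)/r(mean))^2 - scalar(deff_before)) < 1e-12
```

The two display lines give 1.008 and 1.315, the Approach 1 values in Table 6. The second half shows why rescaling fixes a wrong total but nothing else: multiplying all weights by a constant changes neither the weighted proportions nor the coefficient of variation.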
Clearly, these weights, unlike the household weights from Approaches 2 and 3, do not benefit from the non-response adjustments afforded by the person-level characteristics. As expected, they do not correct the sample enough, and estimates of household characteristics based on them are likely to be biased. The individual level weights are slightly less variable in Approach 3, but it is difficult to say whether this result is generalizable.

In this simple, controlled simulation setting with a known response mechanism and calibration variables that are a superset of the variables determining non-response, it is reasonable to expect that perfect convergence can be achieved if it is theoretically possible. Thus any deviations from the fully accurate representation of the population figures should be seen as problematic. Approaches that do not perform well in this setting should be expected to produce greater biases in real world applications. From this point of view, the limited evidence of this example suggests that Approach 3 (raking at the higher, household level with household size expansion multipliers) provides the most accurate results. Its immediate application only missed the scale of the household weights while reproducing all marginal proportions exactly.

Approach 3 was used in calibrating the final survey weights for Wave 3 of the National Survey of Children’s Exposure to Violence (NatSCEV III) (Finkelhor, Turner, Ormrod, & Hamby, 2009). NatSCEV is the most comprehensive national survey of the incidence and prevalence of children’s exposure to violence in the U.S. Each of the three repeated cross-sectional surveys has been conducted with computer-assisted telephone interviewing (CATI). NatSCEV III used a multiple frame design that included cell and landline RDD frames, an ABS frame, a listed landline frame, and a pre-screened probability sample of households with children.
In this survey, the weights were calibrated to a mix of household level variables (landline and cell phone use, household size, income), parent level variables (education, employment status), and child level variables (age, gender, race and ethnicity).

As a conclusion and take-away, Table 7 summarizes the main features of all three methods. Bold entries signify unique features of a given approach. Approach 2 requires access to the microdata needed to define the targets for household level variables in terms of individual targets, in case the official statistics or tools from the national statistical offices do not produce these targets (the number or proportion of people who live in households with specified characteristics). Approach 3 is only implementable if the raking software can use population totals (rather than proportions) as controls, and if it can use multipliers (household size) rather than simple sums of weights in a category to adjust the raked weights. Whereas all three methods seem to deal with individual level data without any issues, household level weights had unique quirks in each of the methods. Approach 1 does not seem to move the household weights enough, and they fail to incorporate the information contained in the individual level variables that drive individual level weights in the other methods. Approach 2 missed some of the targets, both in absolute and relative terms. Approach 3 missed some of the targets in absolute terms, but provided an accurate representation of proportions, meaning that a final pass through these weights to bring them to the right scale is called for.

Table 7. Information and software requirements for the three approaches

                                  Approach 1:        Approach 2:              Approach 3:
                                  two stages of      individual weights       HH weights using
                                  raking             using expanded HH        multipliers
                                                     targets
Sources of HH targets:
  Aggregated tables               Yes                No                       Yes
  Microdata                       Yes                Yes                      Yes
Sources of individual targets:
  Aggregated tables               Yes                Yes                      Yes
  Microdata                       Yes                Yes                      Yes
Raking software capabilities
  Raking to proportions           Yes                Yes                      N/A
  Raking to totals                Yes                Yes                      Must support
  Raking using multipliers        N/A                N/A                      Must support
Performance
  Exact HH targets *              Yes †              No                       Yes ‡
  Exact individual targets *      Yes                Yes                      Yes

* Assuming raking converges.
† Non-response biases in household weighted analysis may remain if non-response is due to individual level variables.
‡ Sensitive to the order of variables in raking. While the weighted totals were different from their targets, a scaling correction can bring the totals to the right level.

While this paper provides a very limited analysis of three feasible options, unanswered questions remain. One of the most important questions that this paper does not address is variance estimation. Even in software that fully supports correct variance estimation with complex survey data and probability weights, naïve standard error computations may produce standard errors that are too small if the weights are treated as probability weights fixed by the design. In reality, the final weights produced by raking are random. They depend on the random unit response process, and they correct for random (although often systematic) imbalances in the sample compared to the population structure. This extra randomness must produce higher sampling variances. At the same time, calibration procedures produce more efficient estimates in the absence of non-response (Deville & Sarndal, 1992). The complex interplay of these opposite effects on the ultimate survey error needs to be properly accounted for.
The most common practice is to produce replicate weights (Shao, 1996), where the non-response adjustment and calibration procedures are carried out within each replicate.

A common practice in practical weight production is weight trimming, where extremely large weights are decreased to reduce their influence, and extremely small weights are increased so that the corresponding observations contribute a non-negligible amount of information to the final figures. Trimming is aimed at increasing the effective sample size by reducing weight variability. However, this reduction comes at the price of increased bias. Overall, the effect on the mean squared error of the estimates is unclear. Moreover, the effect of trimming as a source of bias in the context of weighting at multiple levels is also unclear.

Appendix: Stata code

This appendix provides the complete Stata code used in the above examples. It assumes that the ACS data with the necessary variables have been downloaded to the reader’s computer. ACS data in Stata format can be downloaded from the website of the Minnesota Population Center (Ruggles, Alexander, Genadek, Goeken, Schroeder, & Sobek, 2010).
The raking package by Kolenikov (2014) is available online; the exact link can be found by typing findit ipfraking and following the instructions inside Stata.

* (i) preamble
version 12
clear
capture log close
log using SMIF-Weighting-Kolenikov-Multiple-Levels, replace
set seed 112233

* (ii) load the ACS 2012 data
use ACS2012_mult_level_raking_example
* unique identifiers
isid serial pernum
* adult non-institutionalized population
keep if age >= 18 & relate <= 12

* (iii) intermediate variables
bysort serial (pernum): gen int num_adults = _N
label variable num_adults "HH size (# of adults)"
recode num_adults (1=1) (2=2) (3=3) (4/20=4 "4 or more"), gen(num_adults4)
label variable num_adults4 "HH size (# of adults), capped at 4"
recode race (1=1 "White only") (2=2 "Black/African American only") ///
    (3/9=3 "Other"), generate(race3)
label variable race3 "Race, 3 categories"
recode educd (1/61=1 "Below high school") (63/64=2 "High school/GED") ///
    (65/99=3 "Some college") (100/110=4 "Bachelor's degree") ///
    (111/116=5 "Graduate/professional degree"), gen( educ5 )
label variable educ5 "Education, 5 categories"
recode hhincome (-20000/19999=1 "Under $20,000") ///
    (20000/39999=2 "$20,000 to under $40,000") ///
    (40000/64999=3 "$40,000 to under $65,000") ///
    (65000/99999=4 "$65,000 to under $100,000") ///
    (100000/1e9=5 "$100,000 and above"), gen( hhincome5 )
label variable hhincome5 "Household income, 5 categories"
recode age (18/29=1 "18-29") (30/44=2 "30-44") (45/54=3 "45-54") ///
    (55/64=4 "55-64") (65/100=5 "65+"), gen(age5)
label variable age5 "Age, 5 categories"

* (iv) exclude renters, non-inmates, and HH without the HH head
drop if num_adults == 1 & relate == 12
bysort serial (relate pernum): gen byte _weird = !(relate[1] == 1)
drop if _weird
drop _weird
bysort serial (relate pernum): assert relate[1] == 1
total num_adults* if relate==1
matrix list e(b), f( %12.2f)

* (v) review the analysis variables
d serial pernum relate hhincome num_adults age sex race educd ///
    num_adults4 hhincome5 race3 educ5 age5

* (vi) checks
foreach x of varlist hhincome num_adults age sex race educd ///
    num_adults4 hhincome5 race3 educ5 age5 {
    assert !mi(`x')
}

* (vii) define calibration targets for ipfraking
gen byte _one = 1
svyset _n [pw=_one]
* HH targets
foreach x of varlist num_adults4 hhincome5 {
    * scaled at household level: # of households of this kind
    total _one if relate==1, over(`x', nolab)
    matrix ACS2012_hh_`x' = e(b)
    matrix rownames ACS2012_hh_`x' = `x'
    * scaled at individual level: # of people who live
    * in the households of this kind
    total _one, over(`x', nolab)
    matrix ACS2012_per_`x' = e(b)
    matrix rownames ACS2012_per_`x' = `x'
    * for later tabulations
    svy : tab `x' if relate==1, count format(%10.2f) se
    est store hh0_`x'_count
    svy : tab `x' if relate==1, format(%6.4f) se
    est store hh0_`x'_prop
}
* individual targets
foreach x of varlist sex race3 educ5 age5 {
    * scaled at individual level
    total _one , over(`x', nolab)
    matrix ACS2012_per_`x' = e(b)
    matrix rownames ACS2012_per_`x' = `x'
    * scaled at HH level: use num_adults4 as multiplier
    matrix ACS2012_hhsc_`x' = e(b)
    matrix rownames ACS2012_hhsc_`x' = `x'
    matrix coleq ACS2012_hhsc_`x' = num_adults4
    * for later tabulations
    svy : tab `x', count format(%10.2f) se
    est store per0_`x'_count
    svy : tab `x', format(%6.4f) se
    est store per0_`x'_prop
}

* (viii) sample
* seed was set on top of the file
* sample households
gen byte hh_head = relate == 1
gen rr = uniform()
sort hh_head rr serial pernum
assert hh_head == 1 if _n > _N - 5000
gen byte sampled_hh = (_n > _N - 5000)
bysort serial (relate) : replace sampled_hh = sampled_hh[1]
* sample one person per household
gen rr2 = uniform()
bysort serial (rr2 pernum): gen byte sampled_person = (_n==1) * sampled_hh
count if sampled_person == 1
assert r(N) == 5000
count if hh_head == 1
gen baseweight_hh = r(N)/5000
* the actual data sampling step
keep if sampled_person
* remove intermediate sampling variables
drop sampled* rr* hh_head
gen baseweight_per = baseweight_hh * num_adults4

* (ix) non-response
gen response_propensity = 1/(1+exp( -( -0.3 + 0.25*1.race3 ///
    - 0.4*1.educ5 + 0.1*4.educ5 + 0.3*5.educ5 - 0.1*ln(hhincome+20000))))
gen byte responded = (uniform() < response_propensity)
sum responded
scalar response_rate = r(mean)

* (xi) approach one: two passes, produce HH weights first,
* then produce person weights
gen weight1_hh_base = baseweight_hh / scalar(response_rate) if responded
ipfraking [pw=weight1_hh_base], gen( weight1_hh_final ) meta ///
    ctotal(ACS2012_hh_num_adults4 ACS2012_hh_hhincome5)
gen weight1_per_base = weight1_hh_final * num_adults4
ipfraking [pw=weight1_per_base], gen(weight1_per_final) meta ///
    ctotal(ACS2012_per_sex ACS2012_per_race3 ACS2012_per_educ5 ACS2012_per_age5)

* (xii) approach two: single pass, produce individual weights
* with person-level targets
gen weight2_per_base = baseweight_per / scalar(response_rate) if responded
ipfraking [pw=weight2_per_base], gen(weight2_per_final) meta ///
    ctotal(ACS2012_per_num_adults4 ACS2012_per_hhincome5 ///
    ACS2012_per_sex ACS2012_per_race3 ACS2012_per_educ5 ACS2012_per_age5)
gen weight2_hh_final = weight2_per_final / num_adults4

* (xiii) approach 3: single pass, produce HH weights with multipliers
gen weight3_hh_base = baseweight_hh / scalar(response_rate) if responded
ipfraking [pw=weight3_hh_base], gen( weight3_hh_final ) meta ///
    ctotal(ACS2012_hh_num_adults4 ACS2012_hh_hhincome5 ///
    ACS2012_hhsc_sex ACS2012_hhsc_race3 ACS2012_hhsc_educ5 ///
    ACS2012_hhsc_age5)
gen weight3_per_final = weight3_hh_final * num_adults4

* (xv) unified checks
forvalues k=1/3 {
    * household variables
    svyset [pw=weight`k'_hh_final]
    foreach x of varlist num_adults4 hhincome5 {
        svy : tab `x', count se format(%10.2f) deff
        est store hh`k'_`x'_count
        svy : tab `x', se format(%6.4f) deff
        est store hh`k'_`x'_prop
    }
    * individual variables
    svyset [pw=weight`k'_per_final]
    foreach x of varlist sex race3 educ5 age5 {
        svy : tab `x', count se format(%10.2f) deff
        est store per`k'_`x'_count
        svy : tab `x', se format(%6.4f) deff
        est store per`k'_`x'_prop
    }
}
foreach x of varlist num_adults4 hhincome5 {
    est tab hh*_`x'_count, b(%10.2f) modelwidth(15)
    est tab hh*_`x'_prop, b(%6.4f) modelwidth(15)
}
foreach x of varlist sex race3 educ5 age5 {
    est tab per*_`x'_count, b(%10.2f) modelwidth(15)
    est tab per*_`x'_prop, b(%6.4f) modelwidth(15)
}
* weight summaries: total and apparent DEFF = 1 + CV^2
foreach x of varlist weight*final {
    sum `x'
    di r(sum)
    di 1 + ( r(sd) / r(mean) )^2
}

* (xxx) all done
log close
exit

Bibliography

Deming, E., & Stephan, F. (1940). On a Least Squares Adjustment of a Sampled Frequency Table When the Expected Marginal Totals are Known. Annals of Mathematical Statistics, 11(4), 427-444.

Deville, J., & Sarndal, C. (1992). Calibration estimators in survey sampling. Journal of the American Statistical Association, 87(418), 376-382.

Finkelhor, D., Turner, H. A., Ormrod, R. K., & Hamby, S. L. (2009). Violence, abuse, and crime exposure in a national sample of children and youth. Pediatrics, 124(5), 1-14.

Kolenikov, S. (2014). Calibrating survey data using iterative proportional fitting (raking). The Stata Journal, 14(1), 22-59.

Korn, E., & Graubard, B. (1999). Analysis of Health Surveys. New York, USA: Wiley.

Kott, P. S. (2006). Using calibration weighting to adjust for nonresponse and coverage errors. Survey Methodology, 32(2), 133-142.

Kott, P. S. (2009). Calibration Weighting: Combining Probability Samples and Linear Prediction Models. In D. Pfeffermann, & C. Rao (Eds.), Handbook of Statistics: Sample Surveys: Inference and Analysis (Vol. 29B, pp. 55-82). Amsterdam, The Netherlands: Elsevier.

Ruggles, S., Alexander, J., Genadek, K., Goeken, R., Schroeder, M., & Sobek, M. (2010). Integrated Public Use Microdata Series: Version 5.0 [Machine-readable database]. Retrieved June 13, 2014.

Shao, J. (1996). Resampling methods in sample surveys. Statistics, 27, 203-254.

StataCorp. LP. (2011). Stata: Release 12. Statistical Software. College Station, TX, USA.