Weight, Weight, Please Tell Me!
(Even for people who have never really used SPSS!)

Gregg Bell, Ph.D.
The University of Alabama
gbell@cba.ua.edu

Presented at the 2013 NCES 26th Annual Management Information Systems (MIS) Conference.
February 13, 2013

Working paper, not for quotation or publication, please!

Abstract:

Weights are essential to the correct analysis of almost every NCES data file. This short course will explain a little of the why and a lot of the how with regard to the use of weights with publicly available education data. Particular attention will be paid to the National Household Education Survey (NHES) and examples will be presented using a variety of software, including both popular commercial software and freeware.

Introduction: A simple example of weights:

Weights are generated to compensate for different response rates for different groups when it is known (or suspected) that members of a group respond similarly to other members of the same group, but differently from members of other groups.

If gender, for instance, is considered a potential source of bias, the researcher may use gender as a weight generating factor.

For example, if we know that the male/female breakdown for the population is 50/50 and the sample we collect has response rates of 83% for males and 75% for females, then sample weights could be calculated as the inverse of each response rate:

1/.83 = 1.204819 and 1/.75 = 1.333333

So for a population of 10,000 people, we may assume that the population actually includes approximately 5,000 males and 5,000 females.

If, for our sample, we get 4,150 males and 3,750 females, we adjust these counts using the weights to get

4150*1.204819 = 4999.999
3750*1.333333 = 4999.999

Weights are always created as multiplicative factors so they can be used with other factors in a larger
weighting scheme.

Consequences of Ignoring Weights

A simple grade prediction model will demonstrate that differences may be observed when probabilistically sampled data are analyzed in three different ways. For this demonstration, an ordinary least squares regression model will be created and used to predict students’ grades using independent variables for race, income and television. The model used to illustrate the impact of weights is:

Grades = Race, Income, TV

In the first analysis, no weights will be used. In the second analysis, only the Final Child weight will be used. In the third analysis, both the Final Child weight and the replicate weights will be used (this is the correct analysis). AM Statistical Software performs the first two analyses using ordinary least squares regression and the last, replicated regression analysis using a Jackknife method. AM Statistical Software (Beta Version 0.06.03) was developed by The American Institutes for Research (A.I.R.) and Jon Cohen.

The Data

The analyses use data from the National Household Education Surveys Program of 2007, Parent and Family Involvement in Education Survey (PFI-NHES:2007).

The model Grades = Race, Income, TV will be specified using the following variables:

SEGRADES – Overall, what are the child’s grades across all subjects?

Response      Value
Mostly A’s    1
Mostly B’s    2
Mostly C’s    3
Mostly D’s    4

CBLACK – Is the child Black or African American?

Response      Value
Yes           1
No            2

TVWKDYNU – How much time does the child spend watching television or videos on a typical weekday?
Response      Value
1-16          1-16

HINCOME – What is the total household income?

Response             Value
$5,000 or less       1
$5,001-$10,000       2
$10,001-$15,000      3
$15,001-$20,000      4
$20,001-$25,000      5
$25,001-$30,000      6
$30,001-$35,000      7
$35,001-$40,000      8
$40,001-$45,000      9
$45,001-$50,000      10
$50,001-$60,000      11
$60,001-$75,000      12
$75,001-$100,000     13
Over $100,000        14

Note: the following results are generated solely for the purpose of demonstrating the differences observed when data of this type are analyzed in different ways. The reader should attempt no further interpretation of the models presented here as none of the underlying assumptions of the models has been checked.

The AM output for the three analyses is:

Model: SEGRADES = CBLACK HINCOME TVWKDYNU

Regression: No Weights

Parameter    Estimate    SE       t         p > |t|
Constant      2.860      0.081    35.120    0.000
CBLACK       -0.078      0.042    -1.879    0.060
HINCOME      -0.022      0.004    -5.436    0.000
TVWKDYNU      0.428      0.018    23.608    0.000

Regression: Final Weight Only

Parameter    Estimate    SE       t         p > |t|
Constant      2.913      0.118    24.611    0.000
CBLACK       -0.051      0.059    -0.863    0.388
HINCOME      -0.033      0.006    -5.942    0.000
TVWKDYNU      0.394      0.024    16.154    0.000

Replicated Regression: All Weights

Parameter    Estimate    SE       t         p > |t|
Constant      2.913      0.130    22.434    0.000
CBLACK       -0.051      0.063    -0.807    0.422
HINCOME      -0.033      0.005    -6.324    0.000
TVWKDYNU      0.394      0.023    16.922    0.000

AM Statistical Software Beta Version 0.06.03. (c) The American Institutes for Research and Jon Cohen

For simplicity, define the three weighting levels as:

Level 1 – No weights are used.
Level 2 – Only Final Weight is used.
Level 3 – Both Final Weight and Replicate Weights are used.

Each value in the table below lists the weighting levels at which a statistic changes relative to the previous level. For instance, a value of “2” indicates that a change is noted when the method of analysis moves from level 1 (no weights) to level 2 (final weight only).
A value of “2, 3” indicates a change for that statistic at both steps, that is, when moving among all 3 levels of weighting.

Parameter    Estimate    SE      t       p > |t|
Constant     2           2, 3    2, 3    2, 3
CBLACK       2           2, 3    2, 3    2, 3
HINCOME      2           2, 3    2, 3    2, 3
TVWKDYNU     2           2, 3    2, 3    2, 3

Summarizing, as we move from the use of no weights to the use of only the final weight to the use of both the final weight and the replicate weights (the correct analysis), the regression coefficients change when the final weight is introduced, but do not change further when the replicate weights are added. This demonstrates that parameter estimates can be correctly calculated using only the final weight. The final weight alone does not, however, allow for the correct calculation of the standard errors necessary for inference. This requires some form of variance estimation such as replication.

Using Data from Complex Samples

The first task with any database is to determine if weights are needed in the analysis. If weights are required, we must then determine what types of weights are involved. If the only weights required are cell weights (aka “final weight,” “child weight,” “household weight,” “hospital weight,” etc.) then the analysis can be done in SPSS or any other standard statistical package that allows weights. If, in addition to the final weight, the sample requires replicate weights (which, in my experience, are always called “replicate weights”) specialized software such as SAS, WesVar, AM, or Stata must be used.
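
The division of labor described above (the final weight gives the estimates, the replicate weights give the standard errors) can be sketched in a few lines of Python with NumPy. This is only an illustration, not the code AM runs. It assumes the commonly used JK1 formula, in which the variance is (G-1)/G times the sum of squared differences between each replicate estimate and the full-sample estimate. The function names, the synthetic data, and the way the replicate weights are built here (delete one random group at a time) are all hypothetical and are not taken from the NHES files.

```python
import numpy as np


def weighted_ols(X, y, w):
    """Weighted least-squares coefficients: solve (X'WX) b = X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)


def jk1_regression(X, y, final_weight, replicate_weights):
    """Estimates from the final weight; JK1 standard errors from replicates.

    replicate_weights has shape (n_cases, G), one column per replicate.
    """
    beta = weighted_ols(X, y, final_weight)
    G = replicate_weights.shape[1]
    # Re-estimate the model once per replicate weight.
    reps = np.array(
        [weighted_ols(X, y, replicate_weights[:, g]) for g in range(G)]
    )
    # JK1: (G-1)/G times the squared deviations from the full-sample estimate.
    variance = (G - 1) / G * ((reps - beta) ** 2).sum(axis=0)
    return beta, np.sqrt(variance)


# Synthetic stand-in data (NOT the PFI file): intercept plus two coded predictors.
rng = np.random.default_rng(0)
n, G = 400, 20
X = np.column_stack(
    [np.ones(n), rng.integers(1, 3, n), rng.integers(1, 15, n)]
)
y = 3.0 - 0.05 * X[:, 1] - 0.03 * X[:, 2] + rng.normal(0, 0.5, n)
final_weight = rng.uniform(50, 150, n)

# Hypothetical replicate weights: drop one random group per replicate and
# scale the remaining cases up by G/(G-1).
group = rng.permutation(n) % G
replicate_weights = np.column_stack(
    [np.where(group == g, 0.0, final_weight * G / (G - 1)) for g in range(G)]
)

beta, se = jk1_regression(X, y, final_weight, replicate_weights)
print("estimates:", beta.round(3))
print("JK1 SEs:  ", se.round(3))
```

Note that the coefficients come from a single weighted fit, which is why levels 2 and 3 above agree on the estimates. Only the standard errors need the extra G fits. With a real file, the final weight and replicate weight columns would come from the data set itself rather than being constructed as they are here.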
An Example of Data Analysis using NHES data.

Before you begin:

1. Create a folder named “datafiles” on your “C:\” drive.
2. Download the AM Statistical Software.

To Download the AM Software:

1. Point your browser to the registration
2. Click on “Download”
3. Log in with your newly created password
4. Download the “AM Software Version: 0.06.04 Beta”
5. Download the “AM’s transfer component”

Once these preliminary tasks are complete, we can begin the process of correctly analyzing the data.

The Big Steps

1. Download the data file
2. Bring the data into SPSS for cleaning
3. Move the data to AM
4. Perform the correct analysis

Downloading the Data

To download this (and much more) data, point your web browser to:

For our purposes today, hover over “Surveys and Programs” and then choose National Household Education Survey (NHES). Now choose “NHES:2007 data now available.” This will take you to the download page for the 2007 data as well as earlier surveys. We will use the Parent and Family Involvement in Education (PFI) file. Click on the “PFI data” link and agree to the terms of service. The folder will now download.

Open the newly downloaded folder and move the file to the “C:\datafiles” folder you created earlier.

Bring the data into SPSS

Open SPSS and click “Cancel” on the first screen that appears.

Now, back at the web browser, click on the SPSS setup file.

When the screen below appears:

1. Click in it (to activate it)
2. “Select all” using Ctrl-A
3. Copy the entire document using Ctrl-C
4. Activate the SPSS window by clicking in it.
5. Paste this text into a new syntax file.
6. Scroll to the top of the SPSS screen (where you just pasted the syntax code) and change the “D:\” to “C:\”

Now that you have created the syntax file, the next step is to run it. To do this, click anywhere in the syntax file and then “select all” by pressing Ctrl-A. This will highlight the entire syntax window (shown below). Now click on the green arrow to run the program. The program will run for a few seconds.
The SPSS spreadsheet should now be filled with data.

The next step is to clean the data of missing value codes which will otherwise masquerade as response data. In order to do this we must refer to the codebook. The codebook is located on the same web page as the data downloads and SPSS Setup files. Scroll down to the User’s Manual section and click on PFI Codebook.

This will open a searchable .pdf file. To open the search window, click in the codebook window to activate it and then press Ctrl-F. When the search window appears, enter the first variable we want to investigate, “SEGRADES.” Press enter to search for the variable.

The codebook for SEGRADES reveals that only codes 1 through 4 actually indicate the students’ grades. Codes “5” and “-1” indicate that either the child’s school does not give these letter grades or the data value is missing. In either case, we want neither “5” nor “-1” to be included in our analysis.

Continue to search the codebook for the rest of the variables that will be used in the analysis and note the values of each variable that need to be deleted. Note that you need to scroll to the top of the codebook for each search. This ensures that the entire document is searched. It is also sometimes helpful to search for only part of the variable name. For instance, better results may be found using the search term “grades,” rather than “segrades.”

To delete the unwanted values using SPSS:

1. Choose “Select Cases” under the “Data” menu.
2. Choose “If condition is satisfied” and then click on the blue “If…” button.
3. Type in the if condition based on the values you chose to delete for each variable. For our example we want to delete data for which SEGRADES = 5, SEGRADES = -1, or TVWKDYNU = -1. Because Select Cases keeps the cases that satisfy the condition, state the condition in terms of the cases to retain, for example SEGRADES >= 1 & SEGRADES <= 4 & TVWKDYNU ~= -1.
4. Click “Continue.”
5. Choose “Copy selected cases to a new dataset” and name the new dataset “NoMissings.”
6. Click “OK.”
7. Under the “File” menu, choose “Save As” and save the new dataset to the “datafiles” folder you created earlier.
8. Close the SPSS windows.
This will exit SPSS.

Open the AM software by double clicking on it (the icon may be named “AMbeta”).

Now choose “File, Import, SPSS.sav file.”

Locate the “NoMissings” dataset and open it using AM.

Your screen should look like this:

For the purposes of this demonstration we want to create a regression model that predicts grades using race, income, and television. Specifically, the model we will create will be:

SEGRADES = CBLACK, HINCOME, TVWKDYNU

The data are the result of a complex sample. Thus, we must determine the weights to be used in the correct analysis. In this case, both the final child weight and the replicate weights must be used. To find out how a file is weighted, search its documentation. In general, the following search terms will produce results: “JK,” “BRR,” “Fay,” and “Taylor,” and of course, “weights.”

Search the User’s Manual, Volume 1

Click in the User’s Manual window to activate it. Press Ctrl-F to search the document. A search of the term “JK” finds the following paragraph which tells us that the replication method should be specified as “JK1.” Hold on to that information; we will use it a bit later.

Since the data use replicate weights, we will need to use replication methods for our analysis. Back in the AM software, choose Statistics, Replication Procedures for Basic Statistics, Regression.

To specify the model, first search for each variable using the search window at the bottom of the screen. Once the variable is located, right click on it and drag the variable from the variable list to the appropriate box. Once the model variables are all in their places, drag the weights to their boxes.

Note: in most databases, the weights are at the end of the variable list. It is easier to simply scroll down to weights and drag them over rather than searching for them individually.

Choose the “JK1” option for the Replication Method.

Choose Plain Text Output or Spreadsheet Output for the output options.
Note: both of these options suffer from formatting issues, but the spreadsheet option suffers much less than the plain text option.

Click “OK”

OUTPUT!

REFERENCES

AM Statistical Software. (2011). Manual. Retrieved July 1, 2011.
Barnett, W. S., & Belfield, C. R. (2006). Early childhood development and social mobility. The Future of Children, 16(2), 73-98.
Brick, J. M., Waksberg, J., Kulp, D., & Starer, A. (1995). Bias in list-assisted telephone samples. Public Opinion Quarterly, 59(2), 218-235.
Brogan, D. J. (1998). Pitfalls of using standard statistical software packages for sample survey data. Retrieved July 1, 2011.
Casady, R. J., & Lepkowski, J. M. (1993). Stratified telephone survey designs. Survey Methodology, 19(1), 103-113.
Conway, S. (1982). The weighting game. Paper presented at the Market Research Society Conference, Metropole Hotel, Brighton.
Deming, W. E., & Stephan, F. F. (1940). On a least square adjustment of a sampled frequency table: When the expected marginal totals are known. Annals of Mathematical Statistics, 11, 427-444.
Deming, W. E. (1943). Statistical adjustment of data. New York: Wiley.
Dorofeev, S., & Grant, P. (2006). Statistics for real life sample surveys: Non-simple-random samples and weighted data. New York: Cambridge University Press.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26.
Hagedorn, M., Roth, S. B., O’Donnell, K., Smith, S., & Mulligan, G. (2008). National Household Education Surveys Program of 2007: Data File User’s Manual, Volume I (NCES 2009-024). Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education.
Hagedorn, M., O’Donnell, K., Smith, S., & Mulligan, G. (2008). National Household Education Surveys Program of 2007: Data File User’s Manual, Volume III, Parent and Family Involvement in Education Survey (NCES 2009-024). Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education.
Hagedorn, M., Roth, S. B., Carver, P., Van de Kerckhove, W., & Smith, S. (2009). National Household Education Surveys Program of 2007: Methodology Report (NCES 2009-047). Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education.
Kim, J. K., Navarro, A., & Fuller, W. (2000). Variance estimation for 2000 Census Coverage Estimates. Proceedings of the Survey Research Methods Section, American Statistical Association, Alexandria, VA.
Kish, L. (1995). Questions/answers from The Survey Statistician, 1978-1994. Paris: The International Association of Survey Statisticians, Section of the International Statistical Institute.
Kish, L. (1965). Survey sampling. New York: John Wiley & Sons.
Kim, J., Murdock, T., & Choi, D. (2005). Investigation of parents' beliefs about readiness for kindergarten: An examination of National Household Education Survey (NHES: 93). Educational Research Quarterly, 29(2), 3-17.
Korn, E. L., & Graubard, B. I. (1995). Examples of differing weighted and unweighted estimates from a sample survey. The American Statistician, 49, 291-295.
Landis, R. J., Lepkowski, J. M., Eklund, S. A., & Stehouwer, S. A. (1982). A statistical methodology for analyzing data from a complex survey: The first national health and nutrition examination survey (DHHS Pub. No. 82-1366). Vital and Health Statistics, Series 2, No. 92. Washington, DC: National Center for Health Statistics.
Moser, C. A., & Kalton, G. (1971). Survey methods in social investigation (2nd ed.). London: Heinemann Educational Books.
Quenouille, M. (1949). Approximation tests of correlation in time series. Journal of the Royal Statistical Society B, 11, 18-84.
Shao, J., & Tu, D. (1995). The jackknife and bootstrap, Springer series in statistics. New York: Springer-Verlag.
Sharot, T. (1986). Weighting survey results. Journal of the Market Research Society, 28(3), 269-284.
Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: John Wiley & Sons.