


Department of Sociology and Anthropology, Carleton College
Statistical Tools for Quantitative Reasoning, Soan 280, Winter Term 2009
Peter D. Brandon, Leighton Hall 229
Phone: 222-7199
Email: pbrandon@carleton.edu

DATA SOURCES FOR ASSIGNMENTS, FINAL PAPERS, AND TEAM POSTERS

Notes:
* You cannot use the assignment data sets for either final papers or team posters.
* You cannot use the data set your team uses for the poster for your final paper.
* The hints are only guides; you can make other discoveries from which to argue.

DATA SETS FOR TEAM POSTERS OR FINAL PAPERS

AIDS Survival in Australia (AidsinAustralia.dta)
(Hint: What factors could you assert make death less likely?)

The variables include state (NSW, QLD, VIC, and other), sex (M, F), date of diagnosis, date of death, status (A for alive, D for dead), transmission category (as explained in the table below), age (in completed years), died (again, whether the patient died), days of survival, years of survival, identification number, exposure time to the AZT drug, and whether or not the patient received the AZT drug. Note that state, sex, status, and transmission are string variables.

Code     Description of transmission
hs       male homosexual or bisexual contact
hsid     as above and also intravenous drug user
id       female or heterosexual male intravenous drug user
het      heterosexual contact
haem     haemophilia or coagulation disorder
blood    receipt of blood, blood components or tissue
mother   mother with or at risk of HIV infection
other    other or unknown

The data were assembled in January 1992, but to allow for delays in the notification of deaths, the effective ending date is six months earlier. The file includes all patients diagnosed before July 1991, with their status as of that date. There are 2,843 patients and 1,761 deaths. All dates are coded as elapsed days since January 1, 1960, which is also the way Stata stores dates. There are 29 cases that were diagnosed after death; these are coded as having zero survival.
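Because the dates are stored as elapsed-day counts, converting them to calendar dates and computing survival is simple arithmetic. The Python sketch below is illustrative only. It assumes the counts start from Stata's own date origin, 1 January 1960 (the convention the file is said to share), and that survival in days is the date of death (or end of observation) minus the date of diagnosis, floored at zero for patients diagnosed after death. The function names and the example patient are hypothetical, not variables in AidsinAustralia.dta.

```python
from datetime import date, timedelta

# Assumption: elapsed days are counted from Stata's date origin, 1 Jan 1960.
STATA_ORIGIN = date(1960, 1, 1)


def days_to_date(days: int) -> date:
    """Convert an elapsed-day count into a calendar date."""
    return STATA_ORIGIN + timedelta(days=days)


def date_to_days(d: date) -> int:
    """Convert a calendar date into an elapsed-day count."""
    return (d - STATA_ORIGIN).days


def survival_days(diag: int, death: int) -> int:
    """Days from diagnosis to death (or end of observation).

    Assumption: a patient diagnosed after death gets zero survival,
    so negative gaps are floored at 0.
    """
    return max(0, death - diag)


# Hypothetical patient: diagnosed 1 Jan 1990, followed to 1 July 1991.
diag = date_to_days(date(1990, 1, 1))
end = date_to_days(date(1991, 7, 1))
print(end)                        # 11504
print(survival_days(diag, end))   # 546
print(days_to_date(end))          # 1991-07-01
```

In Stata itself, the same elapsed-day values display as calendar dates once a `%td` display format is applied, so no conversion of the stored numbers is needed there.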
Divorce in America (Divorceinamerica.dta)
(Hint: Are black males more likely to divorce than others?)

You should assess this data set on its strengths and weaknesses. For confidentiality reasons, there is no link to a data description. However, it is a rich data set, and the variable descriptions below still allow you to evaluate its strengths and weaknesses. The data set has 3,371 couple observations and 18 variables, as described below.

Variable name   Variable description
id              Unique respondent ID
marnum          Indicator of first marriage
censor          Censoring indicator (1 = censored, 0 = divorced)
hiseduc         Husband's education (in years of schooling)
hereduc         Wife's education (in years of schooling)
heblack         Indicator for whether the husband is African American
sheblack        Indicator for whether the wife is African American
age             Age of husband (at marriage)
agediff         Age difference between husband and wife
durat           Duration of marriage in years
fail            Marriage ends in divorce
husbandeduc     Husband graduated from high school
wifeeduc        Wife graduated from high school
heold           Husband is at least 3 years older than his wife
mixmar          1 if a mixed racial marriage
whitemarr       Couple are both white
blackmarr       Couple are both black
educdiff        Years of education differ by at least 2 years between spouses

Recidivism (Recid.dta)
(Hint: Are minorities more likely to return to U.S. prisons?)

The data set considered here is analyzed in Wooldridge (2002) and credited to Chung, Schmidt, and Witte (1991). The data pertain to a random sample of convicts released from prison between July 1, 1977 and June 30, 1978. Of interest for our purposes is whether they returned to prison. The information was collected retrospectively by looking at records in April 1984, so the maximum possible length of observation is 81 months. To learn more about these data and to evaluate the data source, refer to the citations above.

Variable       Description
black          =1 if black
alcohol        =1 if alcohol problems
drugs          =1 if drug history
super          =1 if release supervised
married        =1 if married when incarcerated
felon          =1 if felony sentence
workprg        =1 if in N.C. prison work program
property       =1 if property crime
person         =1 if crime against person
priors         number of prior convictions
educ           years of schooling
rules          number of rules violations in prison
age            in months
tserved        time served, rounded to months
follow         length of follow-up period, months
durat          max(time until return, follow)
backinprison   =1 if duration right censored

Women and Partnering Arrangements (Livingarrangement.dta)
(Hint: Are poor white women more likely to cohabit than any other group of people?)

The data set considered here is drawn from Wave 1 of the 2001 Survey of Income and Program Participation (SIPP). This data set has 23 variables and 5,925 observations of adult men and women. The SIPP is a rich source of data. See the Census Bureau website to learn more about the nature of the SIPP and its strengths and weaknesses; it is well documented!

Variable name       Variable label
nkid014             Number of kids in house age 0 to 14
nkid017             Number of kids in house age 0 to 17
id                  Person identifier
base_sex            1 if sex =
base_age            Age measured in years
base_race           Race of respondent
base_ms             Marital status of respondent
base_state          State residing in
emp_status          Employment status
emp_njobs           Number of jobs currently working
emp_disab           Person is unable to work due to disability
hh_hnf              Number of families in household
hh_fnp              Number of persons in family
hh_fkind            Kind of family
hh_fnkids           Number of children in household
hh_htenure          Housing tenure
hh_hpubhsyesno      1 or 0 living in public housing
inc_htotinc         Total household income
inc_hpovyesno       1 or 0 household poverty threshold
inc_hIsPovyesno     1 or 0 household is in poverty
inc_hRelPovrelpov   Relative household poverty status
educ_highQual1      Highest level of education
cohabiting          Cohabit rather than marry

Allocation of Time to Housework (Timeuse.dta)
(Hint: Is there gender equity in the allocation of time to household work?)

This time use data set is from Australia. It is the 1997 Australian Time Use Study.
This well-known data set can be used to examine the time allocations of Australians, i.e., how they spend their time. See the Australian Bureau of Statistics website for many more details about this data set. This data set has 19 variables and 4,926 observations of adult men and women.

Variable   Variable label
randidp    random identifier at person level
famtype    family type
areacd     capital city and balance of state
sex        sex of person
marstat    marital status
coubirth   birthplace
curstu     currently studying
anyben     received government pensions, benefits or allowance
totinc     weekly income
fulstat    full-time/part-time status
statwork   status in employment
uhw        hours worked per week
houtypeh   household structure
famsu      number of families in household
pers       number of persons including children
depkids    number of dependants in household
emphh      labour force status of reference person
empsphh    labour force status of spouse
pinddom2   average time spent doing housework

Out-of-School Child Care for School-Aged Children (SchAgechildcare.dta)
(Hint: Do single mothers leave their children to care for themselves after school?)

The data set considered here is drawn from the Child Care module of the 2001 Survey of Income and Program Participation (SIPP). See the Census Bureau website to learn more about the nature of the SIPP and its strengths and weaknesses. This data set has 30 variables and 43,388 observations of women with dependent children. The SIPP is a rich source of data, and there are many questions you could ask besides the one above. Because this is a larger data set with more variables, please open the data set to examine the variables in the file.

Labor Unions in America (CPS88.dta)
(Hints: Do union workers earn premium wages? What determines union membership in America?)

CPS88 comes from the Current Population Survey. It is a random sample, with replacement, of 1,000 observations from a sample of males with non-missing information on all 11 variables in the data set.
Variable   Variable label
AGE        Age in years
LNWAGE     Log of wage
OCC1       Occupational category code (see below)
IND1       Industry category code (see below)
UNION      1 if union member, 0 otherwise
GRADE      Highest educational grade completed
MARRIED    1 if married, 0 otherwise
PARTT      1 if part-time worker, 0 otherwise
POTEXP     Years of potential experience
EXP2       POTEXP squared
WEIGHT     Sampling weight
HIGH       "Highly" unionized industry (IND1 equals 1, 2, 3, 4, 5, 10, 11, or 14)

Categories for OCC1 are:
1 Managers and administrators
2 Professionals
3 Nurses and other non-doctors
4 Clerical
5 Sales people
6 Service workers
7 Manual workers
8 Craft workers

Categories for IND1 are:
1 Natural resources
2 Durables
3 Non-durables
4 Construction
5 Transportation
6 Communication and utilities
7 Wholesale trade
8 Retail trade
9 Finance, Insurance, Real Estate
10 Education
11 Health and Welfare
12 Business services
13 Personal and other services

Affording Non-Parental Child Care (Childcare.dta)
(Hint: Do more educated mothers spend more on non-parental child care?)

The data set considered here is drawn from the Child Care module of the 1996 Survey of Income and Program Participation (SIPP). See the Census Bureau website to learn more about the nature of the SIPP and its strengths and weaknesses. This data set has 30 variables and 3,242 observations of mothers with dependent children, some of whom use non-parental child care. Note that the variable labels are not on the file; you can add them easily in Stata.
Variable   Variable label
totsibs    Total number of siblings in the family
faminc     Family income
eduksun    Mother's education
avccpce    Average price of child care per hour
hersalre   Mother's monthly salary
famsize    Number of persons in family
race       Race of mother
povlev     Whether below poverty or not
dumysth    Live in South
dumywst    Live in West
dumymws    Live in Midwest
dumynre    Live in Northeast
herwkhrs   Mother's weekly work hours
chcrcs     Monthly expenditure on hours of child care
totcchrs   Total number of hours per month spent in non-parental child care

B: DATA SETS FOR ASSIGNMENTS

B.1.a CPS78.dta
This data set consists of 550 randomly selected employed workers from the May 1978 Current Population Survey conducted by the U.S. Department of Commerce. This survey of over 50,000 households is conducted monthly, and it serves as the basis for the national employment and unemployment statistics. Data are collected on a number of individual characteristics as well as employment status. This data extract contains information on twenty-one variables for the 552 employed workers in the sample.
1. ED = years of education
2. SOUTH = 1 if lives in South
3. NONWH = 1 if nonwhite
4. HISP = 1 if Hispanic
5. FE = 1 if female
6. MARR = 1 if married with spouse present (in household)
7. MARRFE = 1 if married female with spouse present
8. EX = years of labor market experience (= AGE - ED - 6)
9. EXSQ = years of labor market experience squared
10. UNION = 1 if working on a union job
11. LNWAGE = natural logarithm of average hourly earnings
12. AGE = age in years
13. NDEP = number of dependent children under 18 in household
14. MANUF = 1 if working in manufacturing industry
15. CONSTR = 1 if working in construction industry
16. MANAG = 1 if occupation is managerial or administrative
17. SALES = 1 if occupation is sales worker
18. CLER = 1 if occupation is clerical worker
19. SERV = 1 if occupation is service worker
20. PROF = 1 if occupation is professional/technical worker

B.1.b CPS85.dta
Same principle as above.
Number of observations: 534
1. ED = years of education
2. SOUTH = 1 if lives in South
3. NONWH = 1 if nonwhite
4. HISP = 1 if Hispanic
5. FE = 1 if female
6. MARR = 1 if married with spouse present (in household)
7. MARRFE = 1 if married female with spouse present
8. EX = years of labor market experience (= AGE - ED - 6) (minimum = 0 imposed ex post)
9. EXSQ = years of labor market experience squared
10. UNION = 1 if working on a union job
11. LNWAGE = natural logarithm of average hourly earnings
12. AGE = age in years
13. MANUF = 1 if working in manufacturing industry
14. CONSTR = 1 if working in construction industry
15. MANAG = 1 if occupation is managerial or administrative
16. SALES = 1 if occupation is sales worker
17. CLER = 1 if occupation is clerical worker
18. SERV = 1 if occupation is service worker
19. PROF = 1 if occupation is professional/technical worker

B.2 Whether Women Work and How Much They Get Paid (Womenandwork.dta)
This is the well-known Mroz data file taken from the 1976 Panel Study of Income Dynamics, based on data for the previous year, 1975. This data file contains 753 observations on 19 variables for married white women aged 30-60 in 1975. The first 428 observations are for women whose hours of work in 1975 were positive, while the final 325 observations are for women who did not work for pay in 1975. The first variable, LFP, is a labor force participation dummy variable that equals 1 if the woman's hours of work in 1975 were positive; otherwise, it equals zero. WHRS is the wife's hours of work in 1975, while KL6 and K618 indicate the number of children in the household under age six and between ages six and 18, respectively. WA is the wife's age in years, WE is the wife's educational attainment in years of schooling, WW is the wife's 1975 average hourly earnings in 1975 dollars, and RPWG is the wife's wage reported at the time of the 1976 interview, in dollars.
The HHRS variable is the husband's hours worked in 1975, HA is his age, HE is his educational attainment in years of schooling, and HW is his 1975 wage in 1975 dollars. FAMINC is family income in 1975 dollars; hence, to calculate the wife's property income, one must subtract the product of WW and WHRS from FAMINC. MTR is the wife's marginal tax rate, evaluated as if her hours of work were zero. MTR is taken from published federal tax tables (it excludes state and local income taxes, but the taxable income on which it is calculated includes any applicable Social Security benefits). WMED is the wife's mother's years of schooling, and WFED is the wife's father's years of schooling. UN is the unemployment rate in the county of residence, in percentage points, while CIT is a dummy variable that equals 1 if the family lives in a large city (a Standard Metropolitan Statistical Area, SMSA); otherwise, it equals zero. Finally, AX is the wife's previous labor market experience, in years.

1. LFP: Dummy variable = 1 if woman worked in 1975, else 0
2. WHRS: Wife's hours of work in 1975
3. KL6: Number of children less than 6 years old in household
4. K618: Number of children between ages 6 and 18 in household
5. WA: Wife's age
6. WE: Wife's educational attainment, in years
7. WW: Wife's average hourly earnings, in 1975 dollars
8. RPWG: Wife's wage reported at the time of the 1976 interview (not the same as the 1975 estimated wage). To use the subsample with this wage, one needs to select 1975 workers with LFP = 1, then select only those women with non-zero RPWG. Only 325 women worked in 1975 and have a non-zero RPWG in 1976.
9. HHRS: Husband's hours worked in 1975
10. HA: Husband's age
11. HE: Husband's educational attainment, in years
12. HW: Husband's wage, in 1975 dollars
13. FAMINC: Family income, in 1975 dollars. This variable is used to construct the property income variable.
14. MTR: Marginal tax rate facing the wife, taken from published federal tax tables (state and local income taxes are excluded). The taxable income on which this tax rate is calculated includes Social Security, if applicable to the wife.
15. WMED: Wife's mother's educational attainment, in years
16. WFED: Wife's father's educational attainment, in years
17. UN: Unemployment rate in county of residence, in percentage points. This is taken from bracketed ranges.
18. CIT: Dummy variable = 1 if living in a large city (SMSA), else 0
19. AX: Actual years of wife's previous labor market experience
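The two derived quantities described for the Mroz file, the wife's property income (FAMINC minus WW times WHRS) and the subsample with a usable 1976 wage (LFP = 1 and non-zero RPWG), can be sketched in Python. This is a minimal illustration only: the `Woman` record type, the lowercase field names, and the example rows are hypothetical and are not values from Womenandwork.dta; the course itself works in Stata, where the same steps would be done on the actual file.

```python
from dataclasses import dataclass


@dataclass
class Woman:
    """Hypothetical record; fields mirror the Mroz variables named above."""
    lfp: int        # LFP: 1 if the wife worked in 1975
    whrs: float     # WHRS: wife's hours of work in 1975
    ww: float       # WW: wife's average hourly earnings, 1975 dollars
    rpwg: float     # RPWG: wage reported at the 1976 interview
    faminc: float   # FAMINC: family income, 1975 dollars


def property_income(w: Woman) -> float:
    """FAMINC minus the wife's own earnings, WW * WHRS."""
    return w.faminc - w.ww * w.whrs


def rpwg_subsample(rows: list[Woman]) -> list[Woman]:
    """1975 workers (LFP == 1) who also report a non-zero RPWG."""
    return [w for w in rows if w.lfp == 1 and w.rpwg != 0]


# Made-up rows for illustration only (not from Womenandwork.dta).
rows = [
    Woman(lfp=1, whrs=1000, ww=3.0, rpwg=3.2, faminc=20000),
    Woman(lfp=1, whrs=500, ww=2.0, rpwg=0.0, faminc=15000),
    Woman(lfp=0, whrs=0, ww=0.0, rpwg=0.0, faminc=12000),
]
print(property_income(rows[0]))   # 17000.0
print(len(rpwg_subsample(rows)))  # 1
```

Note that for non-workers (WHRS = 0) property income simply equals FAMINC, and the subsample filter applies both conditions in the order the codebook describes: first LFP = 1, then non-zero RPWG.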
