The assignment is due Thursday 1/27/05 at the beginning of ...



The assignment is due Friday 2/20 by 5 p.m. Submit your assignment via email to evenwe@mimaioh.edu by the deadline. Name the file hw1_xx_yy, where xx and yy are the uniqueids for the two team members. Late assignments will be penalized at the rate of 20 percentage points for every day (or part thereof) that the assignment is overdue. All team members will receive the same grade unless someone convinces me that I should do otherwise. Provide a type-written response to all the questions. Paste the relevant portion of the Stata log (both the Stata commands and the output) beneath the relevant part of each question in this Word document and then provide a type-written explanation (or leave adequate space for handwritten explanations beneath the relevant Stata code and results). Be sure to include enough Stata code that I can determine exactly how you generated your data, variables, and results. If I am unable to determine what you did, I will assume that it is wrong.

The objective of this homework is to provide you with a real-world example where you can apply many of the results we've learned about OLS regression using Stata. In the process, you will also learn something about whether state or federal workers are overpaid relative to their private sector counterparts. Note that this data does not include fringe benefits, so we cannot comment on their overall level of compensation without adding information about their pensions, health insurance, vacation pay, and other fringe benefits.

The data sets you will use for this exercise are contained in g:\eco\evenwe\marchcps. I want you to combine three years of data (2012-2014) to assure sufficient sample size for your assigned region and public sector employee (see the append command to learn how to append data sets) and merge in the consumer price index by year (see dmerge). The CPI data is in g:\eco\evenwe\cpi\cpi_annual.dta and includes variables named cpi and year. Use the CPI to convert your earnings into 2014 dollars.
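The data preparation described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the yearly file names (cps2012.dta, cps2013.dta, cps2014.dta) and the names cpi2014 and realearn are hypothetical, the files are assumed not to contain a year variable already, and the built-in merge command is used here in place of dmerge.

```stata
* Sketch only: file names cps2012/cps2013/cps2014 are hypothetical,
* and the files are assumed not to contain a variable named year.
clear all
cd "g:\eco\evenwe\marchcps"

* Stack the three survey years, tagging each observation with its year
use cps2012, clear
generate year = 2012
append using cps2013
replace year = 2013 if missing(year)
append using cps2014
replace year = 2014 if missing(year)

* Attach the annual CPI to every observation (many persons per year).
* The built-in merge is used here instead of dmerge.
merge m:1 year using "g:\eco\evenwe\cpi\cpi_annual.dta", keep(match master) nogenerate

* Store the 2014 CPI in a scalar (hypothetical name cpi2014)
summarize cpi if year == 2014, meanonly
scalar cpi2014 = r(mean)

* Express earnings in 2014 dollars (hypothetical name realearn)
generate realearn = ern_val * cpi2014 / cpi
```

After running something like this, summarize realearn by year to check that the 2014 values equal ern_val and that earlier years have been scaled up.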
Team  Region/Public sector group to analyze    Team  Assignment
1     Northeast/state                          5     Northeast/local
2     Midwest/state                            6     Midwest/local
3     South/state                              7     South/local
4     West/state                               8     West/local

For the assignment, based on the relevant variables listed in italics, restrict your sample to
   - people employed in the prior year (workyn)
   - wage and salary workers in the prior year (ljcw)
   - workers between 21 and 60 (a_age)
   - private and public sector employees (ljcw). Depending on your team assignment, keep only local or state workers in your sample of public sector employees for the remainder of the assignment.
   - the census region that your team has been assigned (gereg)

The dependent variable of interest is annual earnings (ern_val) after you convert it to 2014 dollars. For control variables, create the following:
   - Education dummies (based on a_hga)
        ed_11 = less than a high school degree
        ed_12 = high school degree
        ed_15 = some college, but less than a bachelor's degree
        ed_16 = bachelor's degree
        ed_17 = more than a bachelor's degree
   - St_emp: a dummy variable indicating whether a worker is a state employee (ljcw)
   - Race (prdtrace)
        white only
        black only
        all other races
   - A dummy for Hispanic status (pehspnon)
   - A female dummy (a_sex)
   - A spline variable for age allowing for the effect of age to vary in the following ranges: 21-30; 30-40; 40-50; 50-60. Be sure to make the spline variables so that the earnings function is continuous in the age variable.
   - Dummies indicating the occupation of the longest job held last year (wemocg)
   - Number of weeks worked in the prior year (wkswork)
   - Dummies for state of residence (gestcen) [Note: this will be only the subset of states that are in the census region you have been assigned.]

QUESTIONS (relevant Stata routines are provided in italics.)

1. (10 points)
   a. Summarize your data (sum) and verify that you don't have missing data and that the mean/min/max values make sense. Describe any changes you made to the data based on what you observed.
   b. Estimate the sample size and the population for the two assigned groups of workers. Keep in mind that with 3 years of data, you will need to adjust sample weights before estimating the relevant population. [tabstat or table]
   c. Estimate the weighted mean of earnings for your two assigned groups of workers [tabstat or table]. The relevant weight in the March CPS is marsupwt.

2. (10 points) Use the Box-Cox procedure to determine whether real earnings or the log of real earnings should be the dependent variable in a regression that includes all of the control variables listed in 1-10 above. Explain how you decided which functional form is appropriate. For the remainder of this problem set, use the preferred functional form for earnings for any further regression analysis. Any reference to "earnings" means the preferred functional form for earnings.

3. (30 points)
   a. Estimate an unweighted regression model of earnings as a function of only a dummy for whether the person is a public sector worker. How does the coefficient on this dummy compare to the difference in the relevant unweighted measure of earnings for the two groups?
   b. Using this simple regression, show that the mean of predicted wages for the entire sample, the public sector sample, and the private sector sample matches the actual means. [See the predict command for reg and use the table command to compare the means of the variables.]
   c. Repeat a-b using weighted regression.
   d. Use the first-order conditions (FOC) from OLS to demonstrate why the mean of predicted wages for the public sector must match the actual mean for the public sector in the unweighted regression. [Hint: consider the FOC for the coefficients on the intercept and the public sector dummy.]
   e. Estimate a regression model of earnings using all the control variables in 1-10 above. Use matrix language in Stata to verify that "the regression line passes through the mean". Present your results and discuss how they demonstrate the desired property.

4. (15 points)
   a. Estimate an unweighted real earnings regression with all the controls listed in 1-10 with the exception of the education dummies. Also, include a dummy variable for whether the worker is a public sector worker.
   b. Repeat (4a) but add controls for the education dummies. For future reference, this will be referred to as the complete specification from 4b.
   c. Summarize the coefficient on the public sector dummy and the coefficients on the education dummies for specifications 4a and 4b (coefficients, t-stats) in a single table using outreg2. You may copy/paste the output from outreg2 into this document. [Be sure to use outreg2 for this part of the exercise because I want you to have the ability to summarize regression results for the remainder of the semester.]
   d. Based upon what we know about omitted variables bias, why does the coefficient on the public dummy change in the observed direction when the education dummies are added to the regression? Perform an auxiliary regression to show that the bias in the coefficient on the public dummy caused by the omission of the education dummies matches the theoretical prediction. [I want you to provide a numerical estimate of the bias in the coefficient on the public dummy using the theoretical expression for the omitted variables bias.]

5. (10 points)
   a. Using the least educated group as the reference group for education and the complete specification in 4b, test the null hypothesis that the intercept in the earnings equation is identical across all education groups. Interpret the result (i.e. indicate whether the null is rejected and at what critical level). (See the test command in Stata.)
   b. Repeat the test in 5a using the most educated group as the reference group and compare the results with those in (a). Be as precise as possible in your comparison. Why should you have expected this?
   c. Show how the coefficients from the first specification (5a) could have been used to estimate the coefficient on the high school graduate dummy that you estimated in the second specification (5b). Explain.

6. (20 points) Using the complete specification you estimated in (4b):
   a. Examine the coefficients on the age spline variables and discuss the implications of the coefficients for how the marginal effect of age on earnings changes as workers age. In particular, does the marginal effect of age rise or fall as a person moves through the age categories? Is the marginal effect negative in some ranges and positive in others? [Hint: you have to be careful when considering marginal effects with spline variables. In a given age category, the marginal effect of age could be the sum of several coefficients. Be careful!]
   b. Using the complete specification in 4b as the starting point, test the null hypothesis that the effect of public sector employment is identical for men and women while allowing for different intercepts by gender but constraining all other coefficients to be equal across gender. Explain what the results tell you about the relative advantage or disadvantage of public sector employment for men versus women. [Note: interactions are useful here.]

7. (10 points) Test the hypothesis that all coefficients including the intercept (using the complete specification in 4b) are equal for public and private sector employees. Discuss the implications of your test statistic.

8. (30 points) Use matrix commands to perform a Oaxaca-Blinder decomposition of the wage gap between private and public sector workers using the complete specification employed in 4b (without the public dummy). You may use the coefficients for the private sector to perform the decomposition of the "explained" portion. Use the results to identify
   a. how much of the wage gap between public and private sector workers can be accounted for by differences in all of the control variables?
   b. how much of the wage gap between public and private sector workers can be accounted for by differences in the level of education?
   Note: While there are canned routines available for doing Oaxaca-Blinder decompositions in Stata, I want you to learn how to use the matrix programming features. So you are required to use matrix (not canned routines) for this problem.
   c. Use the Oaxaca routine available in Stata to verify your results in parts a & b. Discuss how the results compare.
   d. Based on your results in a-c, is there statistical evidence that public sector workers are over- or under-paid? Discuss.

Some useful tips when using matrix in Stata

After a regression, you can import the coefficient estimates into a matrix (call it beta1), the variance-covariance matrix into v1, and the matrix of means for a list of variables as follows:

   regress wage age
   * puts coefficients in a row vector
   matrix beta1 = get(_b)
   * gets variance-covariance matrix for beta1
   matrix v1 = get(VCE)
   * puts means of age in a matrix called xbar
   matrix accum xx = age, means(xbar)

(Note: xbar will automatically include a one in the last column, corresponding to the constant.)

You can create a vector of means named xbar2 for public sector workers only with

   matrix accum xx = age if public==1, means(xbar2)

You can use matrix commands to manipulate the matrices. For example, to create the predicted mean at xbar,

   matrix ybar = xbar*beta1'

You can also extract subvectors of a matrix. For example,

   matrix xbar2 = xbar[1,2]

creates a matrix containing the element in the first row and second column of xbar. Alternatively,

   matrix xbarfem = xbar[1,"age"]

creates a matrix containing the elements corresponding to the column with the age variable in it.

For other matrix commands, see the chapter on matrix programming in the Stata manual or go to .

You can see the matrices and their dimensions by typing:

   matrix list xbar
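The matrix tips above can be tried end to end on a small example before turning to the CPS data. The sketch below is only an illustration: it assumes Stata's bundled nlsw88 example data (loaded with sysuse) rather than the assignment files, and it uses e(b) and e(V), which return the same results as get(_b) and get(VCE).

```stata
* Sketch only: uses Stata's bundled nlsw88 example data, not the CPS files
sysuse nlsw88, clear

regress wage age

* Coefficients as a 1 x 2 row vector (age, _cons); same as get(_b)
matrix beta1 = e(b)
* Variance-covariance matrix of the coefficients; same as get(VCE)
matrix v1 = e(V)

* Means of the regressors over the estimation sample.
* xbar is 1 x 2: the mean of age, then a 1 for the constant.
matrix accum xx = age if e(sample), means(xbar)

* Predicted wage at the regressor means
matrix ybar = xbar * beta1'
matrix list ybar

* Actual mean wage over the same observations, for comparison
summarize wage if e(sample), meanonly
display "mean of wage = " r(mean)
display "xbar*beta1'  = " ybar[1,1]
```

The two displayed numbers should agree up to rounding, which is the "regression line passes through the mean" property in a one-regressor setting. The columns of xbar and beta1 must be in the same order for the product to be meaningful, which matrix list can confirm.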