Sites.bu.edu



Study Questions for the QM222 Test

Part I: Some Thought Questions

The vast majority of hotel guests do not take the time to rate their hotel experience on Tripadvisor. What sort of bias do you think this might induce in Tripadvisor's hotel ratings, and in what direction (positive or negative)? Explain why.

People who have very negative or very positive hotel experiences are probably more likely to rate their hotel on Tripadvisor than guests whose stays were uneventful and met their expectations. This is a form of selection bias in Tripadvisor's hotel ratings. If more raters had very negative stays than very positive ones, Tripadvisor's ratings would be too low; if more had very positive stays than very negative ones, the ratings would be too high. In either case, the ratings miss the many uneventful stays, so the direction of the bias depends on which extreme dominates.

A study finds that companies that engage in management practices conventionally viewed as risky, such as mandating the use of interdisciplinary teams, tend to perform better than companies that don't. The study concludes that risky practices improve firm performance in that industry. Describe how survivor bias could have been responsible for the findings.

Risky management practices probably have either big positive consequences or big negative consequences. Companies that suffered the big negative consequences are more likely to have gone out of business and dropped out of the data, so the companies with risky practices that are still in business are probably the ones that got lucky. This survivor bias makes the companies with risky practices look much better than they actually were on average.

Part II:

Here is a correlation table computed from data on US News' 2004 rankings of MBA programs and various variables about each program (in the "MBA programs" tab).

                                        04 Ave. GMAT  04 Acceptance Rate  Ave. Salary & Bonus  04 Employment at Graduation
04 Ave. GMAT                            1
04 Acceptance Rate                      -0.79506341   1
Ave. Salary & Bonus (Dec. 2010 prices)  0.79725018    -0.576677628        1
04 Employment at Graduation             0.48986355    -0.514341991        0.442578558          1

What is the correlation coefficient between GMAT score and acceptance rate? In 1-2 sentences, describe what this correlation coefficient tells us.

The correlation coefficient between GMAT score and acceptance rate is -0.795. This tells us that schools with lower acceptance rates tend to have higher average GMAT scores: there is a fairly strong negative correlation between GMAT scores and acceptance rates.

A junior statistician examining the correlation table you just made concludes that there is a stronger association between GMAT score and employment than between acceptance rate and employment. He bases this on the fact that the correlation between GMAT score and employment is positive, while the correlation between acceptance rate and employment is negative. What do you think of the statistician's conclusion? Explain in 1-2 sentences.

No! The sign of a correlation (negative or positive) tells us the direction of the relationship, not its strength; strength is measured by the magnitude. Both correlations are similar in magnitude (0.490 and -0.514, so about 0.5, and if anything the acceptance-rate correlation is slightly stronger). They have opposite signs because higher GMAT scores are associated with higher employment rates, while lower acceptance rates are associated with higher employment rates, so it makes sense that GMAT and employment are positively correlated while acceptance rate and employment are negatively correlated.

Here is a regression of salaries on GMAT scores using Data Analysis. (It is Excel output, but it contains the same information as Stata output.)

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.79725
R Square            0.635608
Adjusted R Square   0.631321
Standard Error      10731.03
Observations        87

              Coefficients  Standard Error  t Stat     P-value   Lower 95%  Upper 95%
Intercept     -172211       21456.71592     -8.02596   5.02E-12  -214873    -129549
04 Ave. GMAT  403.381       33.12804663     12.17642   2.5E-20   337.5135   469.2484

a. In 1-2 sentences, interpret the coefficient on GMAT Scores, i.e., exactly what do we learn from it?

Each additional point of a school's average GMAT score is associated with a $403 higher predicted average starting salary (salary and bonus, in Dec. 2010 prices).

b. What information do you get from both this regression coefficient and the correlation between GMAT and salaries?

Both tell us that higher salaries are associated with higher GMAT scores (both are positive).

c. What information do you get from this coefficient that you don't get from the correlation between GMAT and salaries?

The correlation doesn't tell us how much higher salary is when the GMAT score is higher. The regression coefficient ($403 per GMAT point) is the slope of the line, so it gives the size of that difference in dollars.

d. Predict Salaries for someone with a GMAT = 680. Show your calculations.

For a program whose average GMAT is 680:
Predicted Salary = -172211 + 403.381*680 = $102,088

Part III: Fuel Efficiency

The US has regulations requiring a minimum value for the average MPG of the passenger cars that a manufacturer makes. For SUVs (including light trucks), the regulations are less strict. An environmental group wanted to test whether, per pound, SUVs had a worse (lower) MPG. They collected data on 45 vehicles, calculated each vehicle's MPG per 1000 pounds, and used this as the dependent (left-hand-side) variable. They estimated the following regression (t-statistics in parentheses):

MPG/'000lb = 9.671 - 1.213 SUV
            (22.65)  (-5.99)
R² = .831   adj. R² = .829   SEE = 7.98

where SUV is a dummy/indicator variable for SUVs (SUV = 1 if the vehicle is an SUV, and SUV = 0 if it is a passenger car).

What is the average MPG/'000lb of SUVs? Of cars? Explain how you know, showing any calculations.

SUVs (SUV = 1): MPG/'000lb = 9.671 - 1.213(1) = 8.458
Cars (SUV = 0): MPG/'000lb = 9.671 - 1.213(0) = 9.671

When a single dummy is the only explanatory variable, the predicted value for each group equals that group's average.

If the researcher had made a dummy/indicator variable for "cars" instead of "SUVs," what would the equation be?
(Fill in the intercept (constant) and the coefficient on CARS.)

MPG/'000lb = 8.458 + 1.213 CARS

The regression equation is now of the form MPG/'000lb = b0 + b1 CARS, where CARS is a dummy/indicator variable for cars. The intercept b0 is the prediction when CARS = 0, in other words the predicted MPG/'000lb for SUVs, which we found in the previous question to be 8.458. The coefficient b1 on CARS is the difference in MPG/'000lb between cars and SUVs, +1.213 (the same size as in the original regression, with the opposite sign).

Part IV: Brookline Housing Data and Goodness of Fit

We have run the following three regressions to predict the price of Brookline condos. Here are the regressions and their R-squared values.

Regression A:
Regression Statistics
Multiple R          0.86546822
R Square            0.749
Adjusted R Square   0.7488035
Standard Error      131,746
Observations        1085

            Coefficients  Standard Error  t Stat     P-value     Lower 95%  Upper 95%
Intercept   12934.124     9705.712        1.33263    0.18293354  -6110.01   31978.25
Size        407.4513      7.166659        56.85373   0           393.3892   421.5134

Regression B:
Regression Statistics
Multiple R          0.66516099
R Square            0.44243915
Adjusted R Square   0.44192432
Standard Error      196,372
Observations        1085

            Coefficients  Standard Error  t Stat     P-value     Lower 95%  Upper 95%
Intercept   -41574.565    19922.79        -2.08678   0.037141    -80666.2   -2482.93
Rooms       116705.426    3981.037        29.31534   1.5E-139    108894     124516.8

Regression C:
Regression Statistics
Multiple R          0.196
R Square            0.038
Adjusted R Square   0.038
Standard Error      257,887
Observations        1085

            Coefficients  Standard Error  t Stat     P-value     Lower 95%  Upper 95%
Intercept   541,452       8,753           61.9       0           524276.4   558626.9
HighRise    -128,722      19,573          -6.6       7.48E-11    -167127    -90316.8

Interpret each coefficient (in words) so that anyone would understand.

A. On average, each extra square foot is associated with a condo price that is $407.45 higher.
B. On average, each additional room is associated with a condo price that is $116,705 higher.
C. High-rise condos are on average $128,722 cheaper than condos that are not in high-rises.

Which of the three regressions gives the most precise predictions of price? How can you tell?

Regression A (Size). It has the lowest SEE.
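To see concretely why the lowest SEE means the most precise predictions, the sketch below (Python, standard library only) converts each regression's SEE into the width of a rough 95% prediction interval, using the common rule of thumb prediction ± 2·SEE. The SEE values are copied from the output above; the dictionary labels and the `interval_width` helper are illustrative names, and the ±2·SEE rule is an approximation rather than an exact interval.

```python
# Sketch: compare the three Brookline regressions by their standard error of
# the estimate (SEE). A rough 95% prediction interval is prediction +/- 2*SEE
# (rule-of-thumb approximation), so a smaller SEE means a narrower interval.
# SEE values are copied from the Excel output for Regressions A, B and C.

SEE = {
    "A (Size)": 131_746,
    "B (Rooms)": 196_372,
    "C (HighRise)": 257_887,
}


def interval_width(see: float, k: float = 2.0) -> float:
    """Total width of an approximate prediction interval of +/- k*SEE."""
    return 2 * k * see


widths = {name: interval_width(see) for name, see in SEE.items()}
most_precise = min(widths, key=widths.get)  # regression with the narrowest interval

for name, width in widths.items():
    print(f"Regression {name}: approx. 95% interval width = ${width:,.0f}")
print("Most precise:", most_precise)
```

Because every interval width is proportional to its SEE, ranking the regressions by interval width is the same as ranking them by SEE.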
The SEE is what we use to build confidence intervals around a prediction, so a lower SEE means narrower intervals and more precise predictions. (Regression A also has the highest R².)

Using the most precise regression, make a prediction of the sale price for a 1500 SqFt, 4-room condo in a high-rise building.

Predicted price, using regression A:
predicted_price = 12934.12 + 407.4513*1500 = $624,111

(Regression A uses only size, so the number of rooms and the high-rise location do not enter this prediction.)

Without doing any new calculations, circle the correlation that is the highest:
Correlation of size and price (THIS IS THE CORRECT ANSWER)
Correlation of rooms and price
Correlation of high-rise location and price

In 1 sentence, explain how you know the answer:

It has the highest R², and R² = (correlation)² when the regression has just two variables (Y and one X), so the highest R² goes with the largest correlation in magnitude.

Regression C gives you an estimate of how much higher condo prices are in high-rises than elsewhere.

What is your estimate of how much higher prices are in high-rises? Show the calculations that led to your answer.

On average, prices of high-rise units are $128,722 lower than prices of other types of units. This comes from the coefficient on HighRise.

What is your 68% confidence interval around that estimate? Show the calculations.

Here we look at the coefficient on high-rise units: -128,722. We need to make a confidence interval around this COEFFICIENT (recall that the question asks us to estimate the difference in prices between high-rise and non-high-rise units). To do so we use the standard error of the coefficient, 19,573; a 68% confidence interval extends 1 standard error on either side:

-128,722 - 1*StdErr(b1) = -128,722 - 19,573 = -148,295
-128,722 + 1*StdErr(b1) = -128,722 + 19,573 = -109,149

So the 68% confidence interval runs from -148,295 to -109,149.

Compared to condos with fewer rooms, on average, do condos with more rooms sell for more, less, or about the same? (circle one)

More.

If your answer was "more" or "less", with what percent confidence did you reject the hypothesis that the sale price was about the same?
Show your calculations.

The coefficient on Rooms is positive. We test the null hypothesis that the coefficient on Rooms is 0:

t = (116705.426 - 0) / 3981.037 = 29.31

We could also just read the t-statistic off the Excel output.

Since |t| > 2, we can reject the hypothesis that the sale price was about the same with p < 0.05, which means we have at least 95% confidence. In fact, the Excel output gives p = 1.5E-139, which is essentially zero, so we have almost 100% confidence (99.99999999999999999999999999999...%).

Your intern collects more data indicating whether each condo featured hardwood floors. She creates a new indicator variable that equals 1 if the house has hardwood floors and 0 otherwise. She runs a regression and gets the following result:

Sale Price = 425,000 + 200,000*HardwoodFloors

What is the average sale price of a home with hardwood floors? Show the calculations that led to your answer.

Sale Price = 425,000 + 200,000*1 = 625,000

What is the average sale price of a home without hardwood floors? Show the calculations that led to your answer.

Sale Price = 425,000 + 200,000*0 = 425,000

Imagine that your intern had instead defined the variable to be NoHardwoodFloors and set it equal to 1 if the house does not have hardwood floors. She runs a new regression that looks like this:

Sale Price = b0 + b1*NoHardwoodFloors

What is the value of b0 in the new regression? 625,000
What is the value of b1 in the new regression? -200,000

Show calculations that explain how you derived these answers:

We know that b1 has to be -200,000.
Units with hardwood floors are $200,000 more expensive than units without, so units without must be $200,000 less expensive than units with.

We know that b0 is the predicted price for a unit with hardwood floors, since it is the predicted price for a unit when NoHardwoodFloors = 0:

Sale Price = b0 + b1*(0) = b0

From the first hardwood-floor question above, the predicted price for a unit with hardwood floors is 625,000, so b0 = 625,000.

Part V: Movies

You have collected data on 1832 movies released in the United States between 1995 and 2011. You have information on the following variables:

Lifetime Gross: the total gross revenue the movie made as of May 22, 2013, in millions of 2011 dollars
Lifetime Theaters: the total number of theaters the movie was shown in as of May 22, 2013
Year: the year of the date of release
Season: the season in which the movie was released
Budget: the estimated total cost of producing the movie, in millions of 2011 US dollars
Metascore: a numerical score assigned to the movie by Metacritic, ranging from 1 (worst reviews) to 10 (best reviews)

You created indicator/dummy variables for season (Spring is the omitted season) and then ran the following regression:

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.67
R Square            0.45
Adjusted R Square   0.45
Standard Error      54.97
Observations        1832

            Coefficients  Standard Error  t Stat  P-value  Lower 95%  Upper 95%
Intercept   -33.67        4.71            -7.14   0.00     -42.91     -24.42
metascore   0.85          0.07                    0.00
budget      1.08          0.03
Summer      9.89          3.62            2.73    0.01     2.78       16.99
Fall        -5.57         3.63            -1.54   0.12     -12.68     1.54
Winter      5.54          3.82            1.45    0.15     -1.95      13.03

Interpret the R² of the regression.

Close to half (45%) of the variation in Lifetime Gross (box office) is explained by variation in metascore, budget, and season.

Interpret the coefficient of budget.

For a $1 million increase in budget, the predicted Lifetime Gross increases by $1.08 million, holding metascore and season constant.

What is the t-statistic associated with the coefficient on Budget? What does it mean?

t = 35.77 (roughly 1.08/0.03). The coefficient on budget is significantly different from zero at the 95% level.
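A t-statistic is the coefficient divided by its standard error. The sketch below (Python, standard library only) reproduces the budget t-statistic from the rounded table values and turns a t-statistic into a two-sided p-value with a normal approximation, an assumption that is reasonable here with 1,832 observations. The helper names `t_stat` and `two_sided_p_normal` are illustrative. With the rounded inputs 1.08 and 0.03, t comes out to 36 rather than the reported 35.77, presumably because the reported figure uses unrounded estimates.

```python
import math


def t_stat(coef: float, se: float, null: float = 0.0) -> float:
    """t-statistic for the null hypothesis that the coefficient equals `null`."""
    return (coef - null) / se


def two_sided_p_normal(t: float) -> float:
    """Two-sided p-value from the standard normal (large-sample approximation)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Rounded values from the movie regression table
t_budget = t_stat(1.08, 0.03)            # 36 with rounded inputs; output reports 35.77
p_budget = two_sided_p_normal(35.77)     # p-value for the reported t-statistic
p_winter = two_sided_p_normal(t_stat(5.54, 3.82))  # Winter: t is about 1.45

print(f"budget t from rounded values: {t_budget:.2f}")
print(f"budget two-sided p-value (t = 35.77): {p_budget:.3g}")
print(f"winter two-sided p-value: {p_winter:.3f}")
print("budget significant at the 95% level:", abs(35.77) > 1.96)
```

Applied to Winter, the same helpers give a p-value of roughly 0.15, matching the table.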
A t-statistic this large is strong evidence that budget has a positive effect on Lifetime Gross.

Extra: the reported p-value rounds to .00 (at 2 digits); calculating it from t = 35.77 shows that we can be more than 99.99% certain that budget has a positive effect on Lifetime Gross.

Based on the coefficient on budget, is investing in producing a movie profitable? Explain (based on the table).

It is: the approximate 95% confidence interval for the coefficient, from 1.08 - 2*0.03 = 1.02 to 1.08 + 2*0.03 = 1.14, lies entirely above one, which tells us that we are quite certain that if we invest one million dollars, we will get back more than one million dollars. Of course, we are not considering the alternative uses of the money here, and the returns, though positive, are not very high.

Interpret the coefficient of winter. What can we learn from the corresponding p-value?

Keeping budget and score constant, a movie released in the Winter makes $5.54 million more on average than one released in the Spring (the omitted season).

Since p = 0.15 > 0.05, we do not have enough evidence to reject the hypothesis that Winter and Spring releases have the same average Lifetime Gross, holding other factors constant.

Holding metascore and budget constant, on average, in which season should the producer release the movie? Which season should they avoid?

The producer should release in Summer, since the coefficient on Summer is the highest (9.89). The producer should avoid releasing in Fall, since the coefficient on Fall is the lowest (-5.57): Fall has the lowest predicted Lifetime Gross, holding other factors constant. Note, however, that the Fall coefficient is not statistically significant (p = 0.12).

A corrupt employee of Metacritic offers to change the score of a movie from 5 to 8 and asks for 2.7 million dollars to do so. The (also corrupt) producer of the movie says that the amount is completely unreasonable. Do you agree or disagree with the producer's statement? Explain.

I disagree. Increasing the score by 3 points will increase Lifetime Gross by somewhere between 3*(0.85 - 2*0.07) = 2.13 and 3*(0.85 + 2*0.07) = 2.97 million dollars.
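The bribe calculation can be checked in a few lines. The sketch below (Python, standard library only) uses the metascore coefficient (0.85) and its standard error (0.07) from the table with the same ±2 standard-error rule as the answer; the variable names are illustrative, and all values are in millions of 2011 dollars of Lifetime Gross.

```python
# Sketch: value of raising a movie's Metascore by 3 points (5 -> 8), using the
# metascore coefficient and standard error from the table and the rough
# +/- 2 standard-error rule for a 95% interval. Units: millions of 2011 dollars.

COEF, SE = 0.85, 0.07
points = 8 - 5
asking_price = 2.7

low = points * (COEF - 2 * SE)
point_estimate = points * COEF
high = points * (COEF + 2 * SE)

print(f"point estimate: {point_estimate:.2f}")
print(f"approx. 95% range: {low:.2f} to {high:.2f}")
print("asking price inside the range:", low <= asking_price <= high)
```

The point estimate (about $2.55 million) is slightly below the $2.7 million asking price, but the upper end of the range (about $2.97 million) is above it, which is why the answer speaks of a reasonable chance rather than a sure gain.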
The upper limit is greater than 2.7, so there is a reasonable chance that the producer would make more money by taking the deal.

Part VI: Omitted Variable Bias

Researchers studying the relationship between guns and crime have collected data on the following variables across all states in 1999: [See list with Question]

                          (1)            (2)
Dependent Variable:       Murder rate    Murder rate
Concealed Carry Law       -1.356         0.099
                          (0.747)        (0.563)
region=northeast          -0.949         -1.594
                          (0.885)        (0.624)
region=south              2.830          1.341
                          (0.952)        (0.698)
region=west               0.080          -0.053
                          (0.954)        (0.665)
Robbery Rate                             0.025
                                         (0.004)
Intercept                 5.179          2.152
                          (0.641)        (0.630)
Standard errors in parentheses

Do states with Concealed Carry Laws have higher or lower robbery crime rates than states without such laws? Explain how you can tell. [Hint: this is a missing-variable bias question.]

Lower. The coefficient on CCL increases (and even becomes positive) once we add Robbery Rate to the regression (compare regression 1 to regression 2). This means that the indirect effect through Robbery Rate that the CCL coefficient picked up in regression (1) was negative. Or, calculate it as:

Total effect (regression 1):         -1.356
minus direct effect (regression 2):  - 0.099
equals bias (indirect effect):       -1.455

Also, in regression 2 we see that Robbery Rate has a positive direct effect on the murder rate. So the correlation between CCL and Robbery Rate must be negative: the sign of the indirect effect picked up by CCL in regression 1 (where Robbery Rate is missing) equals the sign of the correlation between CCL and Robbery Rate times the sign of the direct effect of Robbery Rate. In equation form:

Coefficient on CCL in regression 1 = b1 + b2*a1
-1.356 = 0.099 + 0.025*a1
a1 = (-1.356 - 0.099)/0.025 = -58.2

where b1 (+0.099) is the direct effect of CCL and b2 (+0.025) is the direct effect of Robbery Rate, both from regression 2, and a1 is the coefficient on CCL when Robbery Rate is regressed on CCL and the region dummies. Clearly, a1 must be negative for this to be true.

[Diagram: CCL → Murder Rate (b1 = +0.099, regression 2); CCL → Robbery Rate (a1); Robbery Rate → Murder Rate (b2 = +0.025, regression 2)]

Part VII:

Executives at a major financial company are trying to model which households own stocks. They collected data for a national sample of households from around the country.
The data they collected include:

own_stock: whether or not the household owns any stocks
college: whether the most educated person in the household completed college
highschool: whether the most educated person in the household completed high school but not college

They ask you to model who owns stocks, so you run a set of regressions with "own_stock" as the left-hand-side (Y) variable.

You run this regression. Without any statistics terms or jargon, what does the coefficient 0.1296 tell us? (1-2 sentences.)

Regression 1:
Regression Statistics
Multiple R          0.33171
R Square            0.110032
Adjusted R Square   0.109733
Standard Error      0.394304
Observations        5962

             Coefficients  Standard Error  t Stat     P-value    Lower 95%  Upper 95%
Intercept    0.108553      0.007151        15.1791    4.35E-51   0.094533   0.122572
highschool   0.129613      0.01289         10.05568   1.34E-23   0.104345   0.154881
college      0.332212      0.012254        27.1094    1E-152     0.308188   0.356235

It says that a household where someone has a high school degree (but no one has a college degree) is 12.96 percentage points more likely to own stock than a household where no one even has a high school degree (about a 23.8% chance instead of 10.9%).

You would like to estimate a regression of BMI on age. What regression would you run that could fit this curve? How would you do this in Stata?

Run a regression of BMI on age and agesq, where agesq is equal to age squared:

gen agesq = age*age
regress BMI age agesq
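The Stata commands above fit a quadratic in age by ordinary least squares. To show what that regression does, here is a minimal Python sketch (standard library only) that builds the age and age-squared columns and solves the OLS normal equations. The data are hypothetical, generated from an assumed curve BMI = 15 + 0.5·age - 0.005·age² purely so the fit can be checked; they are not course data. `fit_quadratic` and `solve` are illustrative helpers, not Stata or library functions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_quadratic(age, bmi):
    """OLS of bmi on a constant, age and agesq = age*age (like `regress BMI age agesq`)."""
    rows = [[1.0, a, a * a] for a in age]  # design matrix X
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, bmi)) for i in range(3)]
    return solve(xtx, xty)  # [b0, b_age, b_agesq]


# Hypothetical data generated from an assumed curve (not real BMI data)
ages = list(range(20, 81, 5))
bmis = [15 + 0.5 * a - 0.005 * a * a for a in ages]

b0, b_age, b_agesq = fit_quadratic(ages, bmis)
peak_age = -b_age / (2 * b_agesq)  # age at which the fitted curve turns over
print(f"BMI = {b0:.3f} {b_age:+.3f}*age {b_agesq:+.5f}*agesq; peak at age {peak_age:.1f}")
```

A negative coefficient on agesq makes the fitted curve rise and then fall, and -b_age/(2*b_agesq) gives the age at which it peaks.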

