Faculty Directory | UVA Darden School of Business



Consumer Reports provided extensive testing and ratings of 24 treadmills. They provided an overall score based on ease of use, ergonomics, exercise range, and quality. Higher scores indicated higher performance. The following data (available in file “EXAM# DATA” on the course website) gives Price (in dollars), an overall categorical quality rating (Good, Very Good, Excellent), and the performance score (Consumer Reports, February 2006). Basic summary statistics for the numerically-scaled variables are also provided. (Based on EMBS question 45, page 603.)

Brand & Model               Price   Quality   Score
Landice L7                   2900   Excellent    86
NordicTrack S3000            3500   Very good    85
SportsArt 3110               2900   Excellent    82
Precor                       3500   Excellent    81
True Z4 HRC                  2300   Excellent    81
Vision Fitness T9500         2000   Excellent    81
Precor M 9.31                3000   Excellent    79
Vision Fitness T9200         1300   Very good    78
Star Trac TR901              3200   Very good    72
Trimline T350HR              1600   Very good    72
Schwinn 820p                 1300   Very good    69
Bowflex 7-Series             1500   Excellent    83
NordicTrack S1900            2600   Very good    83
Horizon Fitness PST8         1600   Very good    82
Horizon Fitness 5.2T         1800   Very good    80
Evo by Smooth Fitness FX30   1700   Very good    75
ProForm 1000S                1600   Very good    75
Horizon Fitness CST4.5       1000   Very good    74
Keys Fitness 320t            1200   Very good    73
Smooth Fitness 7.1HR Pro     1600   Very good    73
NordicTrack C2300            1000   Good         70
Spirit Inspire               1400   Very good    70
ProForm 750                  1000   Good         67
Image 19.0 R                  600   Good         66

                         Price   Score
Sample mean             1920.8   76.54
Sample std. deviation    855.2    5.92
Count                       24      24
Median                    1600    76.5

1. One would expect the performance score to differ (on average) across the three quality categories. We expect that for two reasons. First, the performance score was based, to some extent, on the quality of the treadmill. Second, we expect better-performing treadmills to belong to a different quality category than lower-performing treadmills. Formulate and test an appropriate null hypothesis. State the alternative hypothesis, conduct the test, and clearly state your conclusion.
[20 points]

Regressing Score on two (of the three) quality dummy variables (De = 1 for Excellent, Dvg = 1 for Very good, with Good as the omitted baseline) gives:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.740615594
R Square             0.548511459
Adjusted R Square    0.50551255
Standard Error       4.162651126
Observations         24

ANOVA
             df        SS        MS        F   Significance F
Regression    2   442.077   221.039   12.756          0.00024
Residual     21   363.881    17.328
Total        23   805.958

             Coefficients   Standard Error   t Stat   P-value   Lower 95%
Intercept          67.667            2.403   28.156    0.0000      62.669
De                 14.190            2.873    4.940    0.0001       8.217
Dvg                 8.119            2.648    3.066    0.0059       2.612

H0: the mean score is equal for the three quality groups.
Ha: the mean scores are not all equal.
The model p-value (Significance F) is 0.00024. Thus we reject H0 in favor of Ha.

2. Which is the better predictor of price, score or quality? (In answering this question, treat quality as a categorically-scaled variable with three categories.) [20 points]
The regression of price on score has a model standard error of 651.7 (and an adjusted R-square of 42%), while the regression of price on the two quality dummies has a model standard error of 698.0 (adjusted R-square of 33%). Score, with the smaller standard error and the higher adjusted R-square, is the better predictor of price.

3. A new treadmill was recently tested and received a score of 76.54. It was the first time in the history of Consumer Reports that a product received a score with two decimal places. We do not know its quality category. If we use linear regression to construct a point forecast of price for the new treadmill, what will that point forecast be? [10 points]
The easy way to answer this is to take advantage of the knowledge imparted to me by my dear professor Pfeifer that regressions always go through the averages. Since 76.54 is the average score, the regression forecast will be $1920.8, the sample average price. I did not actually need to run the regression.

4. Consumer Reports hires you as a summer intern to identify the treadmill that is the “best value”. In other words, which of the 24 treadmills has the best (lowest) price for its score?
Hint: Although “Image 19.0 R” has the lowest price at $600, it is not a very good value given its low score of 66. Justify your pick. [15 points]
This is a question about model residuals. The treadmill with the largest negative residual is the one whose price is furthest below its expected price. That is treadmill 12 (Bowflex 7-Series), with a predicted price of $2,543 and an actual price of only $1,500, for a residual of -$1,043.

Regression of Price on Score:

Regression Statistics
Multiple R           0.666746911
R Square             0.444551444
Adjusted R Square    0.419303782
Standard Error       651.6556981
Observations         24

ANOVA
            df
Regression   1
Residual    22
Total       23

             Coefficients
Intercept    -5451.589722
Score         96.31908184

Observation   Predicted Price   Residual
1                     2831.85      68.15
2                     2735.53     764.47
3                     2446.57     453.43
4                     2350.26    1149.74
5                     2350.26     -50.26
6                     2350.26    -350.26
7                     2157.62     842.38
8                     2061.30    -761.30
9                     1483.38    1716.62
10                    1483.38     116.62
11                    1194.43     105.57
12                    2542.89   -1042.89
13                    2542.89      57.11
14                    2446.57    -846.57
15                    2253.94    -453.94
16                    1772.34     -72.34
17                    1772.34    -172.34
18                    1676.02    -676.02
19                    1579.70    -379.70
20                    1579.70      20.30
21                    1290.75    -290.75
22                    1290.75     109.25
23                    1001.79      -1.79
24                     905.47    -305.47

5. Another new treadmill scored 84. Will its price be less than or more than $4,000? [20 points]
The point forecast (based on the regression of price on score) is -5451.59 + 96.319 × 84 = $2,639. So we are fairly certain the price will be less than $4,000. The probability that the price will be less than $4,000 is 1 - T.DIST.RT((4000 - 2639)/652, 22), or about 98%.

6. Which one (pick just one) of the four assumptions is most clearly violated by a regression of price on score? [10 points]
Homoskedasticity is clearly violated: the scatter of price about the line increases with score. The relationship looks linear, and the sorted residuals show no obvious signs of skewness (non-normality). We might assume that these are either ALL the treadmills on the market or a random sample of ALL treadmills, so independence is reasonable. Homoskedasticity is clearly the one.

7. If we account for the violated assumption mentioned above, in what way (what direction) will your answer to Q5 change?
[10 points]
Since 84 is a high score, the standard error of 652 underestimates the actual uncertainty for this treadmill. If we recalculate the probability using, say, 700 as the standard error, we get 1 - T.DIST.RT((4000 - 2639)/700, 22), or about 97%. So with more uncertainty due to the heteroskedasticity, the probability of being less than $4,000 goes down slightly. This makes sense: the more uncertainty, the less chance the treadmill will be priced under $4,000.

8. Al bought a treadmill with a score of 80. Bo’s treadmill received a score of 70. How much more do we expect Al to have paid? How much more do we expect Al to have paid if both treadmills received a “very good” quality rating? [20 points]
First assume these two treadmills are NOT in the dataset. If all we know is score, we expect Al to have paid 96.3 × 10 = $963 more. If we know both to be “very good” quality, we include two quality dummies in the regression (see below) and find the expected gap in price to be only 65.1 × 10 = $651. (Note: to account for quality, we need to use two dummies. Some papers used just one dummy for “very good”, which is NOT a good model, as it lumps together “excellent” and “good”, which does not make sense.)
Since the set of two quality dummies was not statistically significant (we don’t really know how to reach that conclusion, but it is true nonetheless), we also accepted $963 as the answer to the second part of this question if you made that argument.

Regression of Price on Score and two quality dummies:

             Coefficients
Intercept    -3539.658444
Score         65.11810508
De            794.9906993
Dvg           418.9220516

9. Pfeifer thought the scores were too low, so he applied a linear “curve” so that the 86 became 100 and the 66 became 70. What is the equation for that “curve”? [10 points]

Score   Curved Score
86               100
66                70

SUMMARY OUTPUT

Regression Statistics
Multiple R           1
R Square             1
Adjusted R Square    65535 (not meaningful: two points leave zero residual degrees of freedom)
Standard Error       0
Observations         2

             Coefficients
Intercept             -29
Score                 1.5

Since we know two points, we can use regression to determine the equation of the line that goes through those two points. The regression is a “perfect” fit with a standard error of zero.
The equation is

Curved Score = -29 + 1.5 × Score

10. What is the curved score for the treadmill scoring 84? What is the point forecast of price using only the curved scores in a linear regression? [10 points]
The curved score is -29 + 1.5 × 84 = 97. The point forecast of price will be $2,639, the same as in the answer to Q5. There is no need to waste my valuable exam time running a new regression: the point forecasts will be identical to those from the regression using the original scores. This is always the case if one replaces an X in a regression with any exact linear function of that X.

11. Al regressed Y on X after first subtracting the sample average X from each X. What can we say about the estimated intercept (a) in Al’s model? (Circle one and only one answer.) [10 points]
A. Nothing
B. It will be zero
C. It will be the average of the Y’s
D. It will be the square root of pi
Briefly explain your answer.
C. It will be the average of the Y’s. Since regression always goes through the averages, and the average of Al’s X’s will be zero, the point forecast at X = 0 (the average) will be the average of the Y’s. That forecast is also the intercept.

12. Al and Bo built identical models using the Oakland A’s data except:
Al used opponent dummies 1 to 12 (leaving out Kansas City).
Bo used dummies 1 to 3 and 5 to 13 (leaving out the Yankees).
Which are true? (Circle all those that are true. Do not circle those that are false.) [10 points]
A. Al’s standard error will be equal to Bo’s.
B. Bo’s adjusted R-square will be lower than Al’s (because Bo left out the Yanks).
C. Bo’s t-stats for the 12 opponent dummies will all be negative.
D. Bo’s intercept will be greater than Al’s.
A, C, and D are all true. A is true and B is false because the models give identical forecasts with an identical number of parameters. Bo’s intercept will be high (because it represents the Yankees), and the coefficients of all 12 dummies will be negative (since the Yankees are the most popular opponent). Negative coefficients have negative t-stats.

13.
Al and Bo built identical models using the Oakland A’s data except:
Al included DOW as defined in the data set (Monday = 1, Sunday = 7).
Bo changed DOW to go from 7 on Monday to 1 on Sunday.
Which are true? (Circle all those that are true. Do not circle those that are false.) [10 points]
A. The coefficients of DOW will be equal but of opposite signs.
B. The t-stats for DOW will be equal but of opposite signs.
C. Al will get a higher adjusted R2 because attendance is higher on weekends.
D. The models will produce identical forecasts.
A, B, and D are true. C is not. Again, these models produce identical forecasts (and therefore identical adjusted R-squares), as regression will adjust its intercept and coefficient to account for whether we use 1 to 7 or 7 to 1.
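The Q1 test can be checked without a spreadsheet. Below is a minimal Python sketch that runs a one-way ANOVA of Score on quality rating. It uses the standard library only, and the scores are typed in from the data table. The names `SCORES_BY_QUALITY` and `one_way_anova` are introduced here for illustration and are not part of the original solution.

A one-way ANOVA is equivalent to regressing Score on two of the three quality dummies, so its F statistic should agree with the regression's F. For the p-value, the sketch uses the closed-form upper tail of an F distribution with 2 numerator degrees of freedom, P(F > f) = (1 + 2f/d2)^(-d2/2). That shortcut applies only because there are exactly three groups.

```python
# One-way ANOVA of Score across the three quality ratings (Q1).
# Scores are copied from the data table and grouped by quality rating.
SCORES_BY_QUALITY = {
    "Excellent": [86, 82, 81, 81, 81, 79, 83],
    "Very good": [85, 78, 72, 72, 69, 83, 82, 80, 75, 75, 74, 73, 73, 70],
    "Good": [70, 67, 66],
}


def mean(values):
    return sum(values) / len(values)


def one_way_anova(groups):
    """Return (F, df_between, df_within, p_value); requires exactly 3 groups."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below needs exactly 3 groups")
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    # Between-group SS = regression SS of the two-dummy model.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group SS = residual SS of the two-dummy model.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper tail of an F(2, d2) distribution: (1 + 2f/d2) ** (-d2/2).
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return f_stat, df_between, df_within, p_value


f_stat, df1, df2, p_value = one_way_anova(list(SCORES_BY_QUALITY.values()))
print(f"F = {f_stat:.3f} on ({df1}, {df2}) df, p = {p_value:.5f}")
```

Run as-is, it should print `F = 12.756 on (2, 21) df, p = 0.00024`, in line with the Significance F in the Q1 regression output.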
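For Q2, the comparison comes down to the two model standard errors, and the sketch below computes both. It is plain Python with the data typed in from the table. `simple_ols` and the column lists are illustrative names, not from the original solution.

The sketch relies on one fact about dummy-only models. When the only regressors are an intercept plus dummies for all but one category, each fitted value is simply its group's mean price. The residual sum of squares is therefore the within-group sum of squares, divided here by n - 3 degrees of freedom.

```python
import math

# Price (dollars), Score, and Quality rating for the 24 treadmills, in table order.
PRICE = [2900, 3500, 2900, 3500, 2300, 2000, 3000, 1300, 3200, 1600, 1300, 1500,
         2600, 1600, 1800, 1700, 1600, 1000, 1200, 1600, 1000, 1400, 1000, 600]
SCORE = [86, 85, 82, 81, 81, 81, 79, 78, 72, 72, 69, 83,
         83, 82, 80, 75, 75, 74, 73, 73, 70, 70, 67, 66]
QUALITY = ["E", "VG", "E", "E", "E", "E", "E", "VG", "VG", "VG", "VG", "E",
           "VG", "VG", "VG", "VG", "VG", "VG", "VG", "VG", "G", "VG", "G", "G"]


def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def std_error_score_model():
    a, b = simple_ols(SCORE, PRICE)
    sse = sum((p - (a + b * s)) ** 2 for s, p in zip(SCORE, PRICE))
    return math.sqrt(sse / (len(PRICE) - 2))  # intercept + slope


def std_error_quality_model():
    groups = {}
    for q, p in zip(QUALITY, PRICE):
        groups.setdefault(q, []).append(p)
    # Every treadmill's fitted value is its quality group's mean price.
    sse = sum((p - mean(g)) ** 2 for g in groups.values() for p in g)
    return math.sqrt(sse / (len(PRICE) - len(groups)))  # intercept + 2 dummies


se_score = std_error_score_model()
se_quality = std_error_quality_model()
print(f"Price on Score:   standard error = {se_score:.1f}")
print(f"Price on Quality: standard error = {se_quality:.1f}")
```

The score model's 651.7 matches the Excel output. The quality model comes out at about 698.0, so Score has the smaller standard error by this criterion.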
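Several answers (Q3, Q10, Q11, and Q13) rest on two properties of least squares:

- The fitted line passes through (mean X, mean Y).
- Replacing X by an exact linear function c + d × X (with d not zero) changes the intercept and slope but not the forecasts.

The sketch below checks both numerically on the treadmill data. The "reversed" scale 152 - Score is a hypothetical recoding, chosen only to mimic Q13's switch from 1-to-7 to 7-to-1 DOW. Since 152 = 66 + 86, it maps the lowest score to the highest and vice versa.

```python
# Price (dollars) and Score for the 24 treadmills, in table order.
PRICE = [2900, 3500, 2900, 3500, 2300, 2000, 3000, 1300, 3200, 1600, 1300, 1500,
         2600, 1600, 1800, 1700, 1600, 1000, 1200, 1600, 1000, 1400, 1000, 600]
SCORE = [86, 85, 82, 81, 81, 81, 79, 78, 72, 72, 69, 83,
         83, 82, 80, 75, 75, 74, 73, 73, 70, 70, 67, 66]


def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


a, b = simple_ols(SCORE, PRICE)
mean_score, mean_price = mean(SCORE), mean(PRICE)

# Q3: the forecast at the average score is the average price.
forecast_at_mean = a + b * mean_score

# Q10: regress on the curved score -29 + 1.5*Score; the forecast at 84 is unchanged.
curved = [-29 + 1.5 * s for s in SCORE]
a_c, b_c = simple_ols(curved, PRICE)
forecast_84 = a + b * 84
forecast_84_curved = a_c + b_c * (-29 + 1.5 * 84)

# Q11: after centering X, the intercept is the average Y.
a_centered, _ = simple_ols([s - mean_score for s in SCORE], PRICE)

# Q13 analogue (hypothetical reversed scale): the slope flips sign, forecasts agree.
a_r, b_r = simple_ols([152 - s for s in SCORE], PRICE)
forecast_84_reversed = a_r + b_r * (152 - 84)

print(f"forecast at mean score {forecast_at_mean:.2f}, mean price {mean_price:.2f}")
print(f"forecast at 84: {forecast_84:.1f} original, {forecast_84_curved:.1f} curved, "
      f"{forecast_84_reversed:.1f} reversed")
print(f"centered intercept {a_centered:.2f}; slopes {b:.3f} and {b_r:.3f}")
```

All three forecasts at a score of 84 agree (about 2639.2). The reversed-scale slope is the negative of the original, which is the Q13 pattern of equal coefficients with opposite signs.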
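For Q4 and Q6, the residuals from the price-on-score regression do the work. The sketch below fits the line and picks the treadmill with the most negative residual as the best value. It is plain Python with the data typed in from the table, and the helper names are illustrative.

It then makes a crude heteroskedasticity check. It compares the average absolute residual for the 12 lowest-scoring and the 12 highest-scoring treadmills. This is only a rough descriptive check, not a formal test.

```python
# Names, Price (dollars), and Score for the 24 treadmills, in table order.
NAMES = ["Landice L7", "NordicTrack S3000", "SportsArt 3110", "Precor",
         "True Z4 HRC", "Vision Fitness T9500", "Precor M 9.31",
         "Vision Fitness T9200", "Star Trac TR901", "Trimline T350HR",
         "Schwinn 820p", "Bowflex 7-Series", "NordicTrack S1900",
         "Horizon Fitness PST8", "Horizon Fitness 5.2T",
         "Evo by Smooth Fitness FX30", "ProForm 1000S", "Horizon Fitness CST4.5",
         "Keys Fitness 320t", "Smooth Fitness 7.1HR Pro", "NordicTrack C2300",
         "Spirit Inspire", "ProForm 750", "Image 19.0 R"]
PRICE = [2900, 3500, 2900, 3500, 2300, 2000, 3000, 1300, 3200, 1600, 1300, 1500,
         2600, 1600, 1800, 1700, 1600, 1000, 1200, 1600, 1000, 1400, 1000, 600]
SCORE = [86, 85, 82, 81, 81, 81, 79, 78, 72, 72, 69, 83,
         83, 82, 80, 75, 75, 74, 73, 73, 70, 70, 67, 66]


def mean(values):
    return sum(values) / len(values)


def simple_ols(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


a, b = simple_ols(SCORE, PRICE)
residuals = [p - (a + b * s) for s, p in zip(SCORE, PRICE)]

# Q4: the best value has the most negative residual (price furthest below prediction).
best = min(range(len(residuals)), key=lambda i: residuals[i])
print(f"Best value: #{best + 1} {NAMES[best]}, residual {residuals[best]:.2f}")

# Q6: crude spread check, lower vs upper half of the treadmills ranked by score.
by_score = sorted(range(len(SCORE)), key=lambda i: SCORE[i])
half = len(by_score) // 2
low_spread = mean([abs(residuals[i]) for i in by_score[:half]])
high_spread = mean([abs(residuals[i]) for i in by_score[half:]])
print(f"Mean |residual|: lower half {low_spread:.0f}, upper half {high_spread:.0f}")
```

It should report #12 Bowflex 7-Series with a residual of -1042.89. The mean absolute residual is roughly 331 for the lower-scoring half and 570 for the upper half.

Note that Star Trac TR901, scoring 72, has the largest single residual (+1716.62) yet sits in the lower half. The pattern is suggestive rather than clean.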
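The probabilities in Q5 and Q7 use Excel's T.DIST.RT. Outside Excel, the same number is the Student t CDF evaluated at (4000 - forecast)/(standard error), with 22 degrees of freedom.

The sketch below computes that CDF with the standard library only, by integrating the t density with Simpson's rule. This numerical integration is a substitute chosen only to avoid dependencies. With SciPy installed, `scipy.stats.t.cdf(x, df)` would be the usual call. The forecast of 2639 and the standard errors of 652 and 700 are taken from the answers to Q5 and Q7, and the function names are illustrative.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), integrating the density over [0, |x|] with Simpson's rule."""
    h = abs(x) / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    # The t density is symmetric about zero, so P(T <= 0) = 0.5.
    return 0.5 + area if x >= 0 else 0.5 - area


def prob_below(threshold, forecast, std_error, df):
    """Same quantity as Excel's 1 - T.DIST.RT((threshold - forecast)/std_error, df)."""
    return t_cdf((threshold - forecast) / std_error, df)


p_q5 = prob_below(4000, 2639, 652, 22)  # model standard error (Q5)
p_q7 = prob_below(4000, 2639, 700, 22)  # inflated standard error (Q7)
print(f"Q5: {p_q5:.1%}   Q7: {p_q7:.1%}")
```

This gives about 97.6% for Q5 and about 96.8% for Q7, which round to the 98% and 97% quoted in the answers. The probability falls as the assumed standard error rises.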
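For Q8, the coefficient on Score in the regression that also includes the two quality dummies equals the pooled within-group slope. This follows from the Frisch-Waugh-Lovell theorem: including dummies for the quality groups (alongside the intercept) is the same as demeaning Price and Score within each quality group.

The sketch below uses that shortcut instead of a general multiple-regression solver. It therefore reproduces only the Score coefficient, not the intercept or the dummy coefficients. Data are typed in from the table, and the function names are illustrative.

```python
# Price (dollars), Score, and Quality rating for the 24 treadmills, in table order.
PRICE = [2900, 3500, 2900, 3500, 2300, 2000, 3000, 1300, 3200, 1600, 1300, 1500,
         2600, 1600, 1800, 1700, 1600, 1000, 1200, 1600, 1000, 1400, 1000, 600]
SCORE = [86, 85, 82, 81, 81, 81, 79, 78, 72, 72, 69, 83,
         83, 82, 80, 75, 75, 74, 73, 73, 70, 70, 67, 66]
QUALITY = ["E", "VG", "E", "E", "E", "E", "E", "VG", "VG", "VG", "VG", "E",
           "VG", "VG", "VG", "VG", "VG", "VG", "VG", "VG", "G", "VG", "G", "G"]


def mean(values):
    return sum(values) / len(values)


def simple_slope(x, y):
    """Slope from the simple regression of y on x."""
    mx, my = mean(x), mean(y)
    return (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))


def within_group_slope(x, y, groups):
    """Slope on x in y ~ x + group dummies, via within-group demeaning."""
    members = {}
    for xi, yi, g in zip(x, y, groups):
        members.setdefault(g, []).append((xi, yi))
    sxy = sxx = 0.0
    for pairs in members.values():
        mx = mean([xi for xi, _ in pairs])
        my = mean([yi for _, yi in pairs])
        sxy += sum((xi - mx) * (yi - my) for xi, yi in pairs)
        sxx += sum((xi - mx) ** 2 for xi, _ in pairs)
    return sxy / sxx


b_score_only = simple_slope(SCORE, PRICE)
b_with_quality = within_group_slope(SCORE, PRICE, QUALITY)
gap = 80 - 70  # Al's score minus Bo's score
print(f"Expected gap, score only:        ${gap * b_score_only:,.0f}")
print(f"Expected gap, both 'Very good':  ${gap * b_with_quality:,.0f}")
```

It prints gaps of $963 and $651. The slopes behind them (about 96.319 and 65.118) match the Score coefficients in the two Excel outputs.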
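The Q9 regression is just the line through two points, so the same answer follows from slope = rise/run. Below is a small sketch of that calculation, which also yields the curved score used in Q10. The function name `line_through` is illustrative.

```python
def line_through(p1, p2):
    """Intercept and slope of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return y1 - slope * x1, slope


# Raw score 86 maps to 100 and raw score 66 maps to 70.
intercept, slope = line_through((86, 100), (66, 70))
print(f"Curved Score = {intercept:g} + {slope:g} * Score")
print(f"Curved score for 84: {intercept + slope * 84:g}")
```

It prints `Curved Score = -29 + 1.5 * Score` and `Curved score for 84: 97`.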
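The Oakland A's data are not reproduced here, but Q12's reasoning about which category is left out can be illustrated with the treadmill data. In a regression of Score on quality dummies alone, the intercept is the mean of the omitted (baseline) group. Each dummy coefficient is that group's mean minus the baseline mean.

The sketch below switches the baseline from Good (as in the Q1 output) to Excellent, the highest-scoring group. The intercept rises and every dummy coefficient turns negative, as in Bo's model when the Yankees are left out, while the fitted values stay the same.

In a model with additional regressors, such as the Oakland model, changing the baseline likewise shifts only the intercept and the dummy coefficients. The other slopes and all forecasts are unchanged. The names below are illustrative.

```python
# Scores grouped by quality rating, copied from the data table.
SCORES_BY_QUALITY = {
    "Excellent": [86, 82, 81, 81, 81, 79, 83],
    "Very good": [85, 78, 72, 72, 69, 83, 82, 80, 75, 75, 74, 73, 73, 70],
    "Good": [70, 67, 66],
}


def mean(values):
    return sum(values) / len(values)


def dummy_model(groups, baseline):
    """Intercept and dummy coefficients for Score ~ quality dummies, one group omitted."""
    intercept = mean(groups[baseline])
    coefs = {g: mean(v) - intercept for g, v in groups.items() if g != baseline}
    return intercept, coefs


def fitted(intercept, coefs, group):
    """Fitted score for a treadmill in the given quality group."""
    return intercept + coefs.get(group, 0.0)


models = {base: dummy_model(SCORES_BY_QUALITY, base) for base in ("Good", "Excellent")}
for baseline, (intercept, coefs) in models.items():
    terms = ", ".join(f"{g}: {c:+.3f}" for g, c in coefs.items())
    print(f"Baseline {baseline}: intercept {intercept:.3f}; {terms}")
```

With Good as the baseline it prints an intercept of 67.667 with Excellent +14.190 and Very good +8.119, matching the Q1 regression output. With Excellent as the baseline, the intercept is 81.857 and both coefficients are negative (Very good -6.071, Good -14.190).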

In order to avoid copyright disputes, this page is only a partial summary.
