DS 533 - Western Illinois University

Fall 2004

Exam # 3

Show all your work.

An automobile rental company wants to predict the yearly maintenance expense (Y) for an automobile using the number of miles driven during the year (X1) and the age of the car (X2, in years) at the beginning of the year. The company has gathered data on 10 automobiles, and the regression output from Excel is presented below. Use this information to answer the following questions.

|Summary measures | |
|Multiple R |0.9689 |
|R-Square |0.9387 |
|Adj R-Square |0.9212 |
|Standard Error |72.218 |

|Regression coefficients | | | | |
| |Coefficient |Std Err |t-value |p-value |
|Constant |33.796 |48.181 |0.7014 |0.5057 |
|Miles Driven |0.0549 |0.0191 |2.8666 |0.0241 |
|Age of car |21.467 |20.573 |1.0434 |0.3314 |

a. Use the information above to estimate the linear regression model.

b. Interpret each of the estimated regression coefficients of the regression model in Question a.

c. Identify and interpret the coefficient of determination (R²) and the standard error of the estimate (Sy.x) for the model in Question a.

d. Does the given set of explanatory variables do a good job of explaining changes in the maintenance costs? Explain why or why not.

e. Would you recommend that this company examine any other factors to predict maintenance expense? If yes, what other factors would you want to consider? Explain your answer.

f. Give a 95% confidence interval for the change in the average yearly maintenance cost of an automobile for each additional mile driven during the year (β1).

g. What is the average yearly maintenance cost for a 10-year-old automobile that drives 12000 miles per year?
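The arithmetic behind parts f and g can be sketched from the Excel output alone. This is a minimal illustration, not an official answer key. It assumes 10 observations and 2 predictors (so 7 residual degrees of freedom) and takes the two-sided 95% critical value 2.365 from a standard t table. The names `predict_expense` and `slope_interval` are made up for this example.

```python
# Coefficients and standard error as reported in the Excel output above.
B0, B1, B2 = 33.796, 0.0549, 21.467
SE_B1 = 0.0191

# Assumption: n = 10 cars, k = 2 predictors, so df = 10 - 2 - 1 = 7.
# 2.365 is the two-sided 95% critical value of t with 7 df (from a t table).
T_CRIT_7DF = 2.365


def predict_expense(miles, age):
    """Point estimate of yearly maintenance expense from the fitted model."""
    return B0 + B1 * miles + B2 * age


def slope_interval(coef, std_err, t_crit):
    """Confidence interval: coefficient +/- t_crit * standard error."""
    margin = t_crit * std_err
    return coef - margin, coef + margin


if __name__ == "__main__":
    low, high = slope_interval(B1, SE_B1, T_CRIT_7DF)
    print(f"95% CI for the miles coefficient: ({low:.4f}, {high:.4f})")
    print(f"Predicted expense (12000 miles, age 10): {predict_expense(12000, 10):.2f}")
```

The interval uses only the coefficient and its standard error, so the same helper applies to the age coefficient if its row of the table is substituted.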

Mid-Valley Travel Agency (MVTA) has offices in 12 cities. The company believes that its monthly airline bookings are related to the mean income in those cities and has collected the following data:

|Location |Bookings |Income |
|1 |1098 |43299 |
|2 |1131 |45021 |
|3 |1120 |40290 |
|4 |1142 |41893 |
|5 |971 |30620 |
|6 |1403 |48105 |
|7 |855 |27482 |
|8 |1054 |33025 |
|9 |1081 |34687 |
|10 |982 |28725 |
|11 |1098 |37892 |
|12 |1387 |46198 |

The data are analyzed using regression analysis. The partial computer output is given below:


|Regression Statistics | |
|Multiple R |0.879189 |
|R Square |0.772974 |
|Adjusted R Square |0.750271 |
|Standard Error |78.16735 |
|Observations |12 |

|ANOVA | | | | |
| |df |SS |MS |F |
|Regression |1 |208036.3 |208036.3 |34.04775 |
|Residual |10 |61101.35 |6110.135 | |
|Total |11 |269137.7 | | |

| |Coefficients |Standard Error |t Stat |P-value |
|Intercept |371.6758 |128.5571 |2.891133 |0.016076 |
|X Variable 1 |0.019381 |0.003322 | | |

a) What is the estimated least squares regression line?

b) What is the standard error of the estimate?

c) Forecast the number of bookings when the mean income is $51385.

d) Test the significance of the regression coefficient at the 5% level (state the null and alternative hypotheses, the value of your test statistic, the p-value or the decision rule, and your conclusion).

e) Give an interval estimate of β1 with a 95% confidence coefficient.
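The partial computer output can be checked against the raw data with the textbook least squares formulas. The sketch below is an illustration in plain Python and assumes nothing beyond the 12 data points in the table. The function name `fit_simple_ols` is invented for this example and is not part of Excel or of the exam.

```python
import math

# Data from the table above: mean income (x) and monthly bookings (y).
income = [43299, 45021, 40290, 41893, 30620, 48105,
          27482, 33025, 34687, 28725, 37892, 46198]
bookings = [1098, 1131, 1120, 1142, 971, 1403,
            855, 1054, 1081, 982, 1098, 1387]


def fit_simple_ols(x, y):
    """Return (intercept, slope, std_error_of_estimate, std_error_of_slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return intercept, slope, s, s / math.sqrt(sxx)


if __name__ == "__main__":
    b0, b1, s, se_b1 = fit_simple_ols(income, bookings)
    print(f"intercept={b0:.4f}  slope={b1:.6f}  s={s:.4f}  se(slope)={se_b1:.6f}")
```

Dividing the slope by its standard error gives the t statistic that the output leaves blank, and its square should agree with the F value in the ANOVA table.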

Multiple Choice Questions

Select the best answer

1. In choosing the “best-fitting” line through a set of points in linear regression, we choose the one with the:

a. smallest sum of squared residuals

b. largest sum of squared residuals

c. smallest number of outliers

d. largest number of points on the line

e. none of the above

2. In a multiple regression analysis, there are 25 data points and 5 independent variables, and the sum of the squared differences between observed and predicted values of y is 160. The multiple standard error of estimate will be:

a. 2.530

b. 3.464

c. 2.902

d. 5.657

e. none of the above

3. In a simple linear regression analysis, the following sums of squares are produced:


The proportion of the variation in y that is explained by the variation in x is:

a. 20%

b. 80%

c. 25%

d. 50%

e. none of the above

4. Given the least squares regression line ŷ = 8 – 3x,

a. the relationship between x and y is positive

b. the relationship between x and y is negative

c. as x increases, so does y

d. as x decreases, so does y

e. there is no relationship between x and y

5. A multiple regression equation includes 6 independent variables, and the coefficient of multiple determination is 0.91. The percentage of the variation in y that is explained by the regression equation is:

a. 91%

b. 95%

c. 83%

d. about 15%

e. none of the above

6. A “fan” shape in a scatterplot indicates:

a. unequal variance

b. a nonlinear relationship

c. the absence of outliers

d. sampling error

7. The values of the regression parameters βi are not known. We estimate them from the data.

a) True b) False c) Not enough information

8. Residual plots can be used to check the aptness of the model for the data.

a) True b) False c) Not enough information

9. We need to estimate the variance of the error terms because:

I) It gives an indication of the variability of the distribution of y.

II) It is needed for making inferences concerning the regression function and the prediction of y.

a) Only (I) is true.

b) Only (II) is true.

c) Both (I) and (II) are true.

d) Neither (I) nor (II) is true.


