Technology - Winona State University



Handout #6: Confidence and Prediction Intervals for Predictions

Section 6.1: Modeling Used Car Prices

Example 6.1: The first example in this handout uses the CarPrices dataset from our course website. This dataset includes various variables that are thought to influence the price of a vehicle. A snippet of the data is provided here.

Simple Linear Regression Setup

Model to be fit using only used cars, i.e., New=No.
Response Variable: Price
Predictor Variable: Miles

Assume the following structure for the mean and variance functions:

$$E(Price \mid Miles, New=No) = \beta_0 + \beta_1 \cdot Miles$$
$$Var(Price \mid Miles, New=No) = \sigma^2$$

The first step in running an analysis in JMP is to subset or filter the original dataset so that it includes only the used vehicles. This can be done easily in JMP. First, select Analyze > Distribution. Place New in the Y, Columns box and click OK.

[Figure: Select Analyze > Distribution, place New in the Y, Columns box.]

Double-clicking on New=No will create a new dataset that includes only the New=No vehicles.

JMP clearly names this new subset of the original data. This new subset is called CarPrices (New=No). We can see below that this dataset includes 170 of the observations from the original CarPrices dataset.

First, consider a scatterplot to visualize the relationship between Price | Miles, New=No.

Questions
What is the general pattern/trend/relationship between Price and Miles?
Do you think the assumed mean function stated above is appropriate? How about the assumed form for the variance function? Discuss.

[Figure: Lowess smoother for Price | Miles, New=No]
[Figure: Lowess smoother of the variability]

Consider again the much simpler form for the mean function.

$$E(Price \mid Miles, New=No) = \beta_0 + \beta_1 \cdot Miles$$

To fit this model in JMP, select Analyze > Fit Model. Put Price, i.e., the response, in the Y box, and Miles, i.e., the predictor variable, in the Construct Model Effect box.
Click Run.

[Figure: The distribution of Price | Miles, New=No with the estimated mean function and the Summary of Fit output as provided by JMP.]

Questions
Does a linear mean function appear to fit this data well? Discuss.
The New=No filtered dataset had 170 observations (see the JMP spreadsheet above), but it appears only 169 were used in our regression analysis (see Observations (or Sum Wgts) = 169). Why is this the case?
What is the best estimate for the variance in the conditional distribution, i.e., $\widehat{Var}(Price \mid Miles, New=No) = \hat{\sigma}^2$?
The standard deviation is simply the square root of the variance. Compute this quantity for the conditional distribution given here. That is, compute $\sqrt{\widehat{Var}(Price \mid Miles, New=No)} = \sqrt{\hat{\sigma}^2}$. Does this match the Root Mean Square Error computed by JMP?
What is the interpretation of the Root Mean Square Error quantity in the context of this problem?
What is the interpretation of the $R^2$ value for this model?

Consider again the form of the mean function and the Parameter Estimates portion of the output provided by JMP.

$$E(Price \mid Miles, New=No) = \beta_0 + \beta_1 \cdot Miles$$

Questions
What is the best estimate for the slope of the true mean function? That is, what is $\hat{\beta}_1$?
What is the best estimate for the y-intercept of the true mean function? That is, what is $\hat{\beta}_0$?
Write out the estimated mean function using the estimated parameters.
Interpret, in context and using layman's language, the slope in the above equation.
Interpret, in context and using layman's language, the y-intercept in the above equation.

The 95% confidence intervals for the parameters in the true mean function can be obtained by selecting Show All Confidence Intervals under Regression Reports from the red drop-down menu in JMP.

Questions
Interpret, in context and using layman's language, the 95% confidence interval for $\beta_1$.
Interpret, in context and using layman's language, the 95% confidence interval for $\beta_0$.
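The quantities JMP reports for a simple linear regression (parameter estimates, Root Mean Square Error, and R-squared) can also be worked out by hand. The following is a minimal Python sketch of those calculations, not a reproduction of the JMP output: the (Miles, Price) pairs are made up for illustration and are not the CarPrices data, and the function name is hypothetical.

```python
import math

def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x with basic fit summaries."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx                # estimated slope
    b0 = y_bar - b1 * x_bar       # estimated y-intercept

    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)         # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)     # total sum of squares
    mse = sse / (n - 2)           # estimate of sigma^2 (n - 2 degrees of freedom)

    return {
        "b0": b0,
        "b1": b1,
        "mse": mse,
        "rmse": math.sqrt(mse),   # Root Mean Square Error
        "r2": 1 - sse / sst,
        "n": n,
    }

# Made-up (Miles, Price) pairs, for illustration only.
miles = [20000, 40000, 60000, 80000, 100000, 120000]
price = [19500, 16000, 15500, 12000, 11500, 8500]

fit = fit_simple_linear_regression(miles, price)
print(f"Estimated mean function: {fit['b0']:.2f} + ({fit['b1']:.4f}) * Miles")
print(f"Root Mean Square Error:  {fit['rmse']:.2f}")
print(f"R-squared:               {fit['r2']:.4f}")
```

The Root Mean Square Error is the square root of the estimated variance, which is the relationship the questions above ask you to check against the JMP output.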
The estimated mean function is given by the quantity

$$\hat{E}(Price \mid Miles, New=No) = \hat{\beta}_0 + \hat{\beta}_1 \cdot Miles = 20889 - 0.10 \cdot Miles$$

This quantity can be used to estimate the average Price of a used car with 50,000 Miles. The math is shown here.

$$\hat{E}(Price \mid Miles=50000, New=No) = 20889 - 0.10 \cdot 50000 = \$15{,}889$$

The $15,889 value appears to be reasonable when we isolate vehicles near 50,000 miles in our original scatterplot.

Confidence Interval for Predictions

As with all other estimated quantities, we can expect variation to exist in this estimate. That is, a different random sample will produce a different estimated price for a vehicle with 50,000 miles.

A 95% confidence interval for the quantity $E(Price \mid Miles=50000, New=No)$ can be obtained directly in JMP by selecting Mean Confidence Interval from the Save Columns menu item.

[Figure: A 95% confidence interval for every observation in the dataset is provided in the JMP spreadsheet.]

The JMP spreadsheet can be sorted by Miles in order to more easily find a 95% confidence interval for vehicles with 50,000 miles. From a review of the table above, we can see there are no observations that have exactly 50,000 miles. That is, none of the confidence intervals provided are correct for the quantity

$$E(Price \mid Miles=50000, New=No)$$

Obtaining Output for a New Observation

To obtain a prediction, standard error, and confidence interval for a new observation, use the Formula versions of these quantities as provided in JMP.

Next, create a pseudo-observation in JMP with the desired characteristics. For our example here, this pseudo-observation will have Miles = 50000.

Questions
Does the predicted price (aside from rounding) provided by JMP agree with what we computed above? Discuss.
What is the standard error for $\hat{E}(Price \mid Miles=50000, New=No)$?
Give a practical interpretation of this quantity.

The formula for the 95% normal-based confidence interval for the average predicted value is given by

Lower Limit = Predicted Value - c * Standard Error
Upper Limit = Predicted Value + c * Standard Error

where c is the 97.5th percentile from a t-distribution with n-2 degrees of freedom.

Task: Verify the calculations for the 95% confidence interval for the average predicted value here.

Lower Limit:
Upper Limit:

t-distribution with df = 169 - 2 = 167
In Excel:

Interpret the 95% confidence interval for the average predicted value for a used vehicle with 50,000 miles, i.e., the 95% confidence interval for the quantity $E(Price \mid Miles=50000, New=No)$.

Comment
The above confidence interval is not appropriate when attempting to predict the Price for a single vehicle; instead, it is a reasonable range of values for the average predicted price for vehicles with 50,000 miles.

Confidence Interval    Average Predicted Price    All vehicles that have 50,000 Miles
Prediction Interval    Single Predicted Price     Single vehicle with 50,000 Miles

[Figure: An overlay plot of the data, the estimated mean function, and the 95% confidence interval for the average predicted price for vehicles with 50,000 miles.]

Consider the plot of the estimated mean function over repeated samples. These plots were introduced in a previous handout. Notice that the variation in the estimated mean function is smaller for some values and larger for others. In particular, the variation is smallest near the average miles and increases as miles either increases or decreases. The reason for this is that all estimated linear mean functions must pass through the point (Average Miles, Average Price).
The average number of miles for our dataset is 78,442; thus, the standard error for the average prediction will be smallest when making predictions for vehicles near 78,000 miles.

[Figure: The 95% confidence interval bands for an average prediction across all miles.]

Prediction Interval for Predictions

Recall that a confidence interval is the appropriate quantity when interested in the average predicted price for vehicles with 50,000 miles.

$$E(Price \mid Miles=50000, New=No)$$

However, a prediction interval is necessary when attempting to make predictions for a single vehicle.

$$Price \mid Miles=50000, New=No$$

The 95% prediction intervals can be obtained in JMP by selecting Indiv Confidence Interval from the Save Columns menu.

[Figure: The prediction intervals are placed in the spreadsheet in JMP.]

Again, the spreadsheet can be sorted by Miles to obtain prediction intervals for used vehicles with close to 50,000 miles.

Similar to what was done above, in order to obtain a 95% prediction interval for a new observation not currently in the dataset, you must use the Formula versions of these quantities.

[Figure: Obtaining a prediction interval for a vehicle with 50,000 miles.]

The formula for the 95% normal-based prediction interval for a single value is given by

Lower Limit = Predicted Value - c * Standard Error
Upper Limit = Predicted Value + c * Standard Error

where c is the 97.5th percentile from a t-distribution with n-2 degrees of freedom.

For some reason, JMP does not compute the standard error for an individual prediction; notice the dot in the StdErr Indiv Price column for our new observation.
The standard error for a prediction interval is computed as follows.

$$\text{Standard Error} = \sqrt{\underbrace{Var\big(\hat{E}(Price \mid Miles=50000, New=No)\big)}_{\text{Variability in mean function}} + \underbrace{Var(Price \mid Miles, New=No)}_{\text{Variability in individual observations}}} = \sqrt{426.38^2 + 4850.512^2} = 4869.22$$

The actual calculations for the 95% prediction interval:

Lower Limit: $15,882.72 - 1.9743 * 4869.22 = $6,269.4
Upper Limit: $15,882.72 + 1.9743 * 4869.22 = $25,496.02

t-distribution with df = 169 - 2 = 167
In Excel:

Questions
Interpret the above 95% prediction interval for a single used vehicle with 50,000 miles, i.e., the 95% prediction interval for $Price \mid Miles=50000, New=No$.
In the context of this example, explicitly explain the difference in the scope of inference between the 95% confidence interval and the 95% prediction interval.

[Figure: Visual contrast of the 95% confidence interval and 95% prediction interval for used vehicles with 50,000 miles.]

Plotting the 95% confidence and prediction bands across all observations in our dataset in JMP can be done using an overlay plot.

[Figure: Overlay plot in JMP]
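The interval arithmetic in this section can be checked with a short script. The following is a minimal Python sketch, not part of the JMP workflow. It assumes the values quoted in this handout for a used vehicle with Miles = 50000: the predicted value (15,882.72), the standard error of the estimated mean (426.38), the estimated standard deviation of individual prices (4850.512), and the t percentile c = 1.9743. The function and variable names are hypothetical.

```python
import math

def interval(predicted, std_error, c):
    """Return (lower, upper) = predicted -/+ c * std_error."""
    margin = c * std_error
    return predicted - margin, predicted + margin

# Values quoted in the handout for a used vehicle with Miles = 50000.
predicted_price = 15882.72   # predicted value reported by JMP
se_mean = 426.38             # standard error of the estimated mean
sd_individual = 4850.512     # estimated std. dev. of individual prices
c = 1.9743                   # 97.5th percentile of t-distribution, df = 167

# The standard error for an individual prediction combines the variability
# in the estimated mean function with the variability of single observations.
se_individual = math.sqrt(se_mean ** 2 + sd_individual ** 2)

ci_lower, ci_upper = interval(predicted_price, se_mean, c)        # average price
pi_lower, pi_upper = interval(predicted_price, se_individual, c)  # single vehicle

print(f"Std. error (individual): {se_individual:.2f}")
print(f"95% confidence interval: ({ci_lower:.2f}, {ci_upper:.2f})")
print(f"95% prediction interval: ({pi_lower:.2f}, {pi_upper:.2f})")
```

Both intervals share the same center and the same multiplier c; the prediction interval is wider only because its standard error also includes the variability of individual observations.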