10-4 Variation and Prediction Intervals
Explained and unexplained variation
In this section, we study two measures used in correlation and regression studies: the coefficient of determination and the standard error of estimate. We also learn how to construct a prediction interval for y using a regression line and a given value of x. To study these concepts, we need to understand and calculate the total deviation, the explained deviation, and the unexplained deviation for each ordered pair in a data set.
Assume that we have a collection of paired data containing the sample point \((x, y)\), that \(\hat{y}\) is the predicted value of y, and that the mean of the sample y-values is \(\bar{y}\).
The total variation about a regression line is the sum of the squares of the differences between the y-value of each ordered pair and the mean of y.
total variation = \(\sum (y_i - \bar{y})^2\)
The explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y.
explained variation = \(\sum (\hat{y}_i - \bar{y})^2\)
The unexplained variation is the sum of the squares of the differences between the y-value of each ordered pair and the corresponding predicted y-value.
unexplained variation = \(\sum (y_i - \hat{y}_i)^2\)
The sum of the explained and unexplained variations is equal to the total variation.
Total variation = Explained variation + Unexplained variation
As its name implies, the explained variation can be explained by the relationship between x and y. The unexplained variation cannot be explained by the relationship between x and y and is due to chance or other variables.
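Consider the advertising and sales data used throughout this section, with a regression line of \(\hat{y} = 50.729x + 104.061\).
Using the data point (2.0, 220), for which \(\hat{y} = 50.729(2.0) + 104.061 \approx 205.52\) and \(\bar{y} = 204.25\), we can find the total, explained, and unexplained deviations:
total deviation = \(y - \bar{y} = 220 - 204.25 = 15.75\)
explained deviation = \(\hat{y} - \bar{y} = 205.52 - 204.25 = 1.27\)
unexplained deviation = \(y - \hat{y} = 220 - 205.52 = 14.48\)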
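The same quantities can be summed over every data point to obtain the three variations. The Python sketch below is one way to do this for the eight advertising and sales pairs used in this section. The function names fit_line and variations are hypothetical, and the slope and intercept are recomputed by least squares (an assumption, rather than reusing the rounded equation) so that the identity total = explained + unexplained holds up to floating-point error.

```python
# Advertising expenses (x) and company sales (y), both in thousands of dollars.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]


def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(xi * yi for xi, yi in zip(xs, ys))
    sum_x2 = sum(xi * xi for xi in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


def variations(xs, ys):
    """Return (total, explained, unexplained) variation about the fitted line."""
    b0, b1 = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    y_hat = [b0 + b1 * xi for xi in xs]
    total = sum((yi - y_bar) ** 2 for yi in ys)
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(ys, y_hat))
    return total, explained, unexplained


total, explained, unexplained = variations(x, y)
print(f"total variation       = {total:.1f}")
print(f"explained variation   = {explained:.1f}")
print(f"unexplained variation = {unexplained:.1f}")
```

For this data set the three values come out to about 3813.5, 3178.2, and 635.3, so the explained and unexplained variations add up to the total.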
The Coefficient of determination
The coefficient of determination \(r^2\) is the ratio of the explained variation to the total variation.
\(r^2 = \dfrac{\text{explained variation}}{\text{total variation}}\)
We can compute \(r^2\) by using the definition or by squaring the linear correlation coefficient r.
Ex 1)
The correlation coefficient for the following advertising expenses and company sales data is 0.913. Find the coefficient of determination. What does this tell you about the explained variation of the data about the regression line? About the unexplained variation? (r= 0.913 suggests a strong positive linear correlation)
\(r^2 = (0.913)^2 \approx 0.834\)
About 83.4% of the variation in the company sales can be explained by the variation in the advertising expenditures. About 16.6% of the variation is unexplained and is due to chance or other variables.
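| Advertising expenses (1000s of $), x | Company sales (1000s of $), y | xy | \(x^2\) | \(y^2\) |
|---|---|---|---|---|
| 2.4 | 225 | 540 | 5.76 | 50,625 |
| 1.6 | 184 | 294.4 | 2.56 | 33,856 |
| 2.0 | 220 | 440 | 4 | 48,400 |
| 2.6 | 240 | 624 | 6.76 | 57,600 |
| 1.4 | 180 | 252 | 1.96 | 32,400 |
| 1.6 | 184 | 294.4 | 2.56 | 33,856 |
| 2.0 | 186 | 372 | 4 | 34,596 |
| 2.2 | 215 | 473 | 4.84 | 46,225 |
| Sums: 15.8 | 1634 | 3289.8 | 32.44 | 337,558 |
\(\bar{x} = 15.8/8 = 1.975\), \(\bar{y} = 1634/8 = 204.25\)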
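The two routes to the coefficient of determination mentioned above can be compared directly. The following Python sketch, using the Example 1 data, computes r from the column sums and then forms the ratio of explained to total variation. The variable names are illustrative, and the correlation and least-squares formulas used here are the standard computational forms, which this section does not restate.

```python
import math

# Example 1 data, both variables in thousands of dollars.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Route 1: square the linear correlation coefficient r.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r_squared = r ** 2

# Route 2: explained variation divided by total variation.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
b0 = sum_y / n - b1 * sum_x / n                                # intercept
y_bar = sum_y / n
explained = sum((b0 + b1 * xi - y_bar) ** 2 for xi in x)
total = sum((yi - y_bar) ** 2 for yi in y)
ratio = explained / total

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}, explained/total = {ratio:.3f}")
```

Both routes agree. Squaring the unrounded r gives about 0.833, while the 0.834 in Example 1 comes from squaring the rounded value 0.913.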
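The Standard Error of Estimate
The standard error of estimate \(s_e\) is the standard deviation of the observed y-values about the predicted \(\hat{y}\)-value for a given x-value. It is given by
\(s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}\)
or by the following equivalent formula:
\(s_e = \sqrt{\dfrac{\sum y^2 - b_0 \sum y - b_1 \sum xy}{n - 2}}\)
where \(b_0\) is the y-intercept and \(b_1\) is the slope of the regression line \(\hat{y} = b_0 + b_1 x\).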
Ex 2)
The regression equation of the advertising expenses and company sales data in Example 1 is
\(\hat{y} = 50.729x + 104.061\)
Find the standard error of estimate.
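| x | y | \(\hat{y}\) | \((y - \hat{y})^2\) |
|---|---|---|---|
| 2.4 | 225 | 225.81 | 0.6561 |
| 1.6 | 184 | 185.23 | 1.5129 |
| 2.0 | 220 | 205.52 | 209.6704 |
| 2.6 | 240 | 235.96 | 16.3216 |
| 1.4 | 180 | 175.08 | 24.2064 |
| 1.6 | 184 | 185.23 | 1.5129 |
| 2.0 | 186 | 205.52 | 381.0304 |
| 2.2 | 215 | 215.66 | 0.4356 |
| Sum | | | 635.3463 |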
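With n = 8, \(s_e = \sqrt{\dfrac{635.3463}{8 - 2}} \approx 10.290\)
Because sales are measured in thousands of dollars, the standard error of estimate of the company sales for a specific advertising expense is about $10,290.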
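Both formulas for the standard error of estimate can be checked against this result. The sketch below is a minimal Python version under one assumption: it recomputes the intercept and slope by least squares instead of using the rounded values 104.061 and 50.729, because the second formula subtracts large, nearly equal quantities and is therefore sensitive to rounding in the coefficients. Variable names such as se_definition and se_shortcut are illustrative only.

```python
import math

# Example 2 data, both variables in thousands of dollars.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# Unrounded least-squares coefficients for y_hat = b0 + b1 * x.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Formula 1: sum of squared residuals divided by n - 2.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_definition = math.sqrt(sse / (n - 2))

# Formula 2: the equivalent form that uses only column sums.
se_shortcut = math.sqrt((sum_y2 - b0 * sum_y - b1 * sum_xy) / (n - 2))

print(f"s_e (definition) = {se_definition:.3f}")
print(f"s_e (shortcut)   = {se_shortcut:.3f}")
```

With unrounded coefficients the two forms agree at about 10.290, matching the hand calculation in the table.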
In Chapter 7, we saw that point estimates do not give us any information about how accurate they might be. Thus, we developed confidence interval estimates to overcome this disadvantage. In this section, we follow the same approach to construct a prediction interval.
A prediction interval is an interval estimate of a predicted value of y.
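Given a linear regression equation \(\hat{y} = b_0 + b_1 x\) and \(x_0\), a specific value of x, a prediction interval for y is
\(\hat{y} - E < y < \hat{y} + E\)
where
\(E = t_{\alpha/2} \, s_e \sqrt{1 + \dfrac{1}{n} + \dfrac{n(x_0 - \bar{x})^2}{n \sum x^2 - \left(\sum x\right)^2}}\)
and \(t_{\alpha/2}\) has \(n - 2\) degrees of freedom.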
Ex 3)
Using the results of the previous example, construct a 95% prediction interval for the company sales when the advertising expenses are $2100. What can you conclude?
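One way to carry out this computation is sketched below in Python, following the margin-of-error formula above. Two things here are assumptions not stated in the text: the critical value 2.447 is the usual t-table entry for a 95% interval with 6 degrees of freedom, and the regression coefficients are recomputed by least squares instead of being rounded. Since x is measured in thousands of dollars, $2100 corresponds to x0 = 2.1.

```python
import math

# Advertising expenses (x) and company sales (y), both in thousands of dollars.
x = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
y = [225, 184, 220, 240, 180, 184, 186, 215]

n = len(x)
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
x_bar = sum_x / n

# Least-squares line y_hat = b0 + b1 * x and its standard error of estimate.
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))

x0 = 2.1        # $2100 expressed in thousands of dollars
t_crit = 2.447  # assumed t-table value: 95% confidence, n - 2 = 6 degrees of freedom

# Point estimate and margin of error E from the prediction interval formula.
y_hat0 = b0 + b1 * x0
E = t_crit * s_e * math.sqrt(
    1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
)
lower, upper = y_hat0 - E, y_hat0 + E

print(f"y_hat = {y_hat0:.3f}, E = {E:.3f}")
print(f"{lower:.3f} < y < {upper:.3f}")
```

Under these assumptions the interval is roughly 183.7 < y < 237.4. In other words, when advertising expenses are $2100, we can be 95% confident that company sales will fall between about $183,700 and $237,400.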