10-4 Variation and Prediction Intervals


Explained and unexplained variation

In this section, we study two measures used in correlation and regression studies: the coefficient of determination and the standard error of estimate. We also learn how to construct a prediction interval for y using a regression line and a given value of x. To study these concepts, we need to understand and calculate the total deviation, explained deviation, and unexplained deviation for each ordered pair in a data set.

Assume that we have a collection of paired data containing the sample point $(x, y)$, that $\hat{y}$ is the predicted value of $y$, and that the mean of the sample $y$-values is $\bar{y}$.

The total variation about a regression line is the sum of the squares of the differences between the y-value of each ordered pair and the mean of y.

total variation = $\sum (y - \bar{y})^2$

The explained variation is the sum of the squares of the differences between each predicted y-value and the mean of y.

explained variation = $\sum (\hat{y} - \bar{y})^2$

The unexplained variation is the sum of the squares of the differences between the y-value of each ordered pair and the corresponding predicted y-value.

unexplained variation = $\sum (y - \hat{y})^2$

The sum of the explained and unexplained variations is equal to the total variation.

Total variation = Explained variation + Unexplained variation
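To check this identity numerically, here is a minimal Python sketch using only the standard library. It fits the least-squares line to the advertising and sales data tabulated in Ex 1) and computes the three sums of squares; the function names `least_squares` and `variation_components` are illustrative, not part of the text.

```python
# Advertising expenses x and company sales y (both in 1000s of dollars), from Ex 1)
xs = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
ys = [225, 184, 220, 240, 180, 184, 186, 215]


def least_squares(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def variation_components(xs, ys):
    """Return (total, explained, unexplained) sums of squares."""
    b0, b1 = least_squares(xs, ys)
    y_bar = sum(ys) / len(ys)
    y_hat = [b0 + b1 * x for x in xs]
    total = sum((y - y_bar) ** 2 for y in ys)
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)
    unexplained = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
    return total, explained, unexplained


total, explained, unexplained = variation_components(xs, ys)
print(f"total       = {total:.4f}")
print(f"explained   = {explained:.4f}")
print(f"unexplained = {unexplained:.4f}")
print(f"explained + unexplained = {explained + unexplained:.4f}")
```

For this data the total variation is 3813.5, and the explained and unexplained sums add up to it (up to floating-point rounding). The unexplained sum comes out near 635.34 rather than the 635.3463 in the Ex 2) table, because that table rounds each predicted value to two decimals before squaring.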

As its name implies, the explained variation can be explained by the relationship between x and y. The unexplained variation cannot be explained by the relationship between x and y and is due to chance or other variables.

Consider the advertising and sales data used throughout this section, with regression line $\hat{y} = 50.729x + 104.061$.

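For the data point (2.0, 220), where $\hat{y} = 50.729(2.0) + 104.061 \approx 205.52$ and $\bar{y} = 204.25$, the three deviations are:

total deviation = $y - \bar{y} = 220 - 204.25 = 15.75$
explained deviation = $\hat{y} - \bar{y} = 205.52 - 204.25 = 1.27$
unexplained deviation = $y - \hat{y} = 220 - 205.52 = 14.48$

The explained and unexplained deviations add up to the total deviation: 1.27 + 14.48 = 15.75.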

The Coefficient of determination

The coefficient of determination $r^2$ is the ratio of the explained variation to the total variation:

$$r^2 = \frac{\text{explained variation}}{\text{total variation}}$$

We can compute $r^2$ either from this definition or by squaring the linear correlation coefficient $r$.
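The two routes to $r^2$ can be compared directly. This standard-library Python sketch (variable names are illustrative) computes r from the column sums with the usual computational formula and squares it, then separately forms explained variation / total variation from the fitted line. The two results should agree up to floating-point error.

```python
import math

# Ex 1) data: advertising expenses x and company sales y (1000s of dollars)
xs = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
ys = [225, 184, 220, 240, 180, 184, 186, 215]
n = len(xs)

# Column sums used throughout this section
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Linear correlation coefficient r (computational formula)
r = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
)

# Route 1: square r
r2_from_r = r ** 2

# Route 2: definition, explained variation / total variation
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n
y_bar = sum_y / n
explained = sum((b0 + b1 * x - y_bar) ** 2 for x in xs)
total = sum((y - y_bar) ** 2 for y in ys)
r2_from_def = explained / total

print(f"r = {r:.4f}")
print(f"r^2 (squared r)  = {r2_from_r:.4f}")
print(f"r^2 (definition) = {r2_from_def:.4f}")
```

Both routes give about 0.8334. Squaring the rounded value r = 0.913 gives 0.8336, which rounds to the 0.834 used in Ex 1).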

Ex 1)

The correlation coefficient for the advertising expenses and company sales data below is $r = 0.913$, which suggests a strong positive linear correlation. Find the coefficient of determination. What does this tell you about the explained variation of the data about the regression line? About the unexplained variation?

$r^2 = (0.913)^2 \approx 0.834$

About 83.4% of the variation in the company sales can be explained by the variation in the advertising expenditures. About 16.6% of the variation is unexplained and is due to chance or other variables.

Advertising expenses x and company sales y, both in 1000s of dollars:

x | y | xy | x² | y²
--- | --- | --- | --- | ---
2.4 | 225 | 540 | 5.76 | 50,625
1.6 | 184 | 294.4 | 2.56 | 33,856
2.0 | 220 | 440 | 4 | 48,400
2.6 | 240 | 624 | 6.76 | 57,600
1.4 | 180 | 252 | 1.96 | 32,400
1.6 | 184 | 294.4 | 2.56 | 33,856
2.0 | 186 | 372 | 4 | 34,596
2.2 | 215 | 473 | 4.84 | 46,225
Σx = 15.8 | Σy = 1634 | Σxy = 3289.8 | Σx² = 32.44 | Σy² = 337,558

$\bar{x} = 15.8/8 = 1.975$, $\bar{y} = 1634/8 = 204.25$
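The column sums above are what the usual computational formulas for the slope, intercept, and correlation coefficient require. As a worked check (assuming those standard least-squares formulas, which this excerpt does not state), they reproduce the regression line $\hat{y} = 50.729x + 104.061$ and the value r = 0.913 used in this section:

```latex
\begin{aligned}
b_1 &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}
     = \frac{8(3289.8) - (15.8)(1634)}{8(32.44) - 15.8^2}
     = \frac{501.2}{9.88} \approx 50.729 \\
b_0 &= \bar{y} - b_1\bar{x} = 204.25 - \frac{501.2}{9.88}(1.975) \approx 104.061 \\
r &= \frac{n\sum xy - \sum x \sum y}
         {\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}
   = \frac{501.2}{\sqrt{9.88}\,\sqrt{30508}} \approx 0.913
\end{aligned}
```

Here $8(337{,}558) - 1634^2 = 2{,}700{,}464 - 2{,}669{,}956 = 30{,}508$. Keeping the unrounded slope when computing $b_0$ matters: using the rounded 50.729 gives 104.060 instead of 104.061.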

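The Standard Error of Estimate

The standard error of estimate $s_e$ is the standard deviation of the observed $y$-values about the predicted $\hat{y}$-value for a given $x$-value. It is given by

$$s_e = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$$

or by the equivalent formula

$$s_e = \sqrt{\frac{\sum y^2 - b_0 \sum y - b_1 \sum xy}{n - 2}}$$

where $n$ is the number of ordered pairs, $b_0$ is the y-intercept, and $b_1$ is the slope of the regression line.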
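Both formulas for $s_e$ can be evaluated side by side. This standard-library Python sketch (variable names are illustrative) applies them to the Ex 1) data, keeping the coefficients at full precision rather than rounding them first.

```python
import math

# Ex 1) data: advertising expenses x and company sales y (1000s of dollars)
xs = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
ys = [225, 184, 220, 240, 180, 184, 186, 215]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)

# Least-squares slope b1 and intercept b0, kept at full precision
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b0 = sum_y / n - b1 * sum_x / n

# Definition: sum of squared residuals divided by n - 2
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
se_definition = math.sqrt(sse / (n - 2))

# Shortcut: only the column sums and the coefficients are needed
se_shortcut = math.sqrt((sum_y2 - b0 * sum_y - b1 * sum_xy) / (n - 2))

print(f"s_e (definition) = {se_definition:.3f}")
print(f"s_e (shortcut)   = {se_shortcut:.3f}")
```

Both lines print 10.290. The shortcut subtracts large, nearly equal numbers, so it is sensitive to rounding: plugging the rounded coefficients 104.061 and 50.729 into it by hand gives about 10.28 instead.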

Ex 2)

The regression equation for the advertising expenses and company sales data in Ex 1) is

$\hat{y} = 50.729x + 104.061$

Find the standard error of estimate.

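x | y | $\hat{y}$ | $(y - \hat{y})^2$
--- | --- | --- | ---
2.4 | 225 | 225.81 | 0.6561
1.6 | 184 | 185.23 | 1.5129
2.0 | 220 | 205.52 | 209.6704
2.6 | 240 | 235.96 | 16.3216
1.4 | 180 | 175.08 | 24.2064
1.6 | 184 | 185.23 | 1.5129
2.0 | 186 | 205.52 | 381.0304
2.2 | 215 | 215.66 | 0.4356
Sum | | | 635.3463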

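$$s_e = \sqrt{\frac{635.3463}{8 - 2}} \approx \sqrt{105.891} \approx 10.290$$

Because sales are measured in thousands of dollars, the standard error of estimate of the company sales for a specific advertising expense is about $10,290.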

In Chapter 7, we saw that a point estimate by itself gives no information about how accurate it might be. We therefore developed confidence intervals to overcome this disadvantage. In this section, we follow the same approach to construct a prediction interval.

A prediction interval is an interval estimate of a predicted value of y.

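Given a linear regression equation $\hat{y} = b_0 + b_1 x$ and $x_0$, a specific value of $x$, a prediction interval for $y$ is

$$\hat{y} - E < y < \hat{y} + E$$

where $\hat{y} = b_0 + b_1 x_0$ and

$$E = t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{n(x_0 - \bar{x})^2}{n\sum x^2 - \left(\sum x\right)^2}}$$

Here $t_{\alpha/2}$ is the critical value of the t-distribution with $n - 2$ degrees of freedom.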
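The prediction-interval formula can be sketched as a small Python function. This is an illustration under stated assumptions: the function name `prediction_interval` is hypothetical, and the caller supplies the critical value $t_{\alpha/2}$ (here 2.447, the two-tailed 95% value for 6 degrees of freedom from a standard t-table) so that no third-party library is needed.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Hypothetical helper: return (lower, y_hat0, upper) at x0.

    t_crit is the two-tailed critical t-value with n - 2 degrees of
    freedom, supplied by the caller (e.g. read from a t-table).
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n

    # Least-squares coefficients at full precision
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * x_bar

    # Standard error of estimate s_e
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (n - 2))

    # Maximum error of estimate E, as in the formula above
    E = t_crit * se * math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    y_hat0 = b0 + b1 * x0
    return y_hat0 - E, y_hat0, y_hat0 + E


# Ex 1) data: advertising expenses x and company sales y (1000s of dollars)
xs = [2.4, 1.6, 2.0, 2.6, 1.4, 1.6, 2.0, 2.2]
ys = [225, 184, 220, 240, 180, 184, 186, 215]

# 95% level with n - 2 = 6 degrees of freedom: t = 2.447 from a t-table
lower, y_hat0, upper = prediction_interval(xs, ys, x0=2.1, t_crit=2.447)
print(f"{lower:.2f} < y < {upper:.2f}  (y-hat = {y_hat0:.2f})")
```

With x0 = 2.1 (advertising expenses of $2100) the call prints `183.73 < y < 237.45  (y-hat = 210.59)`, in thousands of dollars. A different confidence level only requires passing the matching t-value.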

Ex 3)

Using the results of the previous example, construct a 95% prediction interval for the company sales when the advertising expenses are $2,100. What can you conclude?

................
................

