Chapter 1 Simple Linear Regression (part 4)

1 Analysis of Variance (ANOVA) approach to regression analysis

Recall the model again:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \ldots, n.$$
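To fix ideas, here is a minimal sketch in R that simulates data from this model; the parameter values and variable names are illustrative, not from the notes.

```r
# Minimal sketch: simulate n observations from Y_i = beta0 + beta1*X_i + eps_i.
# Parameter values below are illustrative assumptions.
set.seed(1)
n     <- 25
beta0 <- 2; beta1 <- 3; sigma <- 2
X   <- runif(n, 0, 10)
eps <- rnorm(n, mean = 0, sd = sigma)
Y   <- beta0 + beta1 * X + eps
head(cbind(obs = 1:n, Y, X))   # the observation layout shown below
```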

The observations can be written as

obs     Y        X
1       $Y_1$    $X_1$
2       $Y_2$    $X_2$
...     ...      ...
n       $Y_n$    $X_n$

The deviation of each $Y_i$ from the mean $\bar Y$ is

$$Y_i - \bar Y.$$

The fitted values $\hat Y_i = b_0 + b_1 X_i$, $i = 1, \ldots, n$, are obtained from the regression and are determined by the $X_i$.

Their mean is

$$\bar{\hat Y} = \frac{1}{n}\sum_{i=1}^n \hat Y_i = \bar Y.$$

Thus the deviation of $\hat Y_i$ from its mean is

$$\hat Y_i - \bar Y.$$

The residuals are $e_i = Y_i - \hat Y_i$, with mean

$$\bar e = 0 \quad \text{(why?)}$$

Thus the deviation of $e_i$ from its mean is $e_i - \bar e = e_i = Y_i - \hat Y_i$.


We can write

$$\underbrace{Y_i - \bar Y}_{\text{total deviation}} = \underbrace{\hat Y_i - \bar Y}_{\text{deviation due to the regression}} + \underbrace{e_i}_{\text{deviation due to the error}}.$$

obs              deviation of $Y_i$                 deviation of $\hat Y_i = b_0 + b_1 X_i$      deviation of $e_i = Y_i - \hat Y_i$
1                $Y_1 - \bar Y$                     $\hat Y_1 - \bar Y$                          $e_1 - \bar e = e_1$
2                $Y_2 - \bar Y$                     $\hat Y_2 - \bar Y$                          $e_2 - \bar e = e_2$
...              ...                                ...                                          ...
n                $Y_n - \bar Y$                     $\hat Y_n - \bar Y$                          $e_n - \bar e = e_n$
Sum of squares   $\sum_{i=1}^n (Y_i - \bar Y)^2$    $\sum_{i=1}^n (\hat Y_i - \bar Y)^2$         $\sum_{i=1}^n e_i^2$
                 Total sum of squares (SST)         Sum of squares due to regression (SSR)       Sum of squares of error/residuals (SSE)

$$\underbrace{\sum_{i=1}^n (Y_i - \bar Y)^2}_{SST} = \underbrace{\sum_{i=1}^n (\hat Y_i - \bar Y)^2}_{SSR} + \underbrace{\sum_{i=1}^n e_i^2}_{SSE}.$$

Proof:
\begin{align*}
\sum_{i=1}^n (Y_i - \bar Y)^2 &= \sum_{i=1}^n (\hat Y_i - \bar Y + Y_i - \hat Y_i)^2 \\
&= \sum_{i=1}^n \left\{ (\hat Y_i - \bar Y)^2 + (Y_i - \hat Y_i)^2 + 2(\hat Y_i - \bar Y)(Y_i - \hat Y_i) \right\} \\
&= SSR + SSE + 2\sum_{i=1}^n (\hat Y_i - \bar Y)(Y_i - \hat Y_i) \\
&= SSR + SSE + 2\sum_{i=1}^n (\hat Y_i - \bar Y) e_i \\
&= SSR + SSE + 2\sum_{i=1}^n (b_0 + b_1 X_i - \bar Y) e_i \\
&= SSR + SSE + 2 b_0 \sum_{i=1}^n e_i + 2 b_1 \sum_{i=1}^n X_i e_i - 2 \bar Y \sum_{i=1}^n e_i \\
&= SSR + SSE,
\end{align*}
since $\sum_{i=1}^n e_i = 0$ and $\sum_{i=1}^n X_i e_i = 0$ (the normal equations).

It is also easy to check that

$$SSR = \sum_{i=1}^n (b_0 + b_1 X_i - b_0 - b_1 \bar X)^2 = b_1^2 \sum_{i=1}^n (X_i - \bar X)^2, \tag{1}$$

using $\bar Y = b_0 + b_1 \bar X$ (the fitted line passes through $(\bar X, \bar Y)$).
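As a quick numerical check, here is a minimal R sketch verifying the decomposition $SST = SSR + SSE$ and equation (1) on simulated data; the data-generating values are illustrative assumptions.

```r
# Minimal sketch: verify SST = SSR + SSE and equation (1) on simulated data.
set.seed(1)
n <- 25
X <- runif(n, 0, 10)
Y <- 2 + 3 * X + rnorm(n, sd = 2)   # illustrative true values

fit  <- lm(Y ~ X)
Yhat <- fitted(fit)
e    <- resid(fit)
b1   <- unname(coef(fit)["X"])

SST <- sum((Y - mean(Y))^2)
SSR <- sum((Yhat - mean(Y))^2)
SSE <- sum(e^2)

all.equal(SST, SSR + SSE)                    # TRUE: decomposition holds
all.equal(SSR, b1^2 * sum((X - mean(X))^2))  # TRUE: equation (1)
```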


Breakdown of the degrees of freedom

The degrees of freedom for SST is $n - 1$: notice that $Y_1 - \bar Y, \ldots, Y_n - \bar Y$ have one constraint, $\sum_{i=1}^n (Y_i - \bar Y) = 0$.

The degrees of freedom for SSR is 1: notice that the $\hat Y_i = b_0 + b_1 X_i$ all lie on the fitted line, so by (1) the deviations $\hat Y_i - \bar Y = b_1 (X_i - \bar X)$ depend on the single quantity $b_1$ (see Figure 1).

[Figure 1: three panels plotting $Y$, the fitted values $\hat y$, and the residuals $e$ against $X$, illustrating the degrees of freedom.]

The degrees of freedom for SSE is $n - 2$: notice that $e_1, \ldots, e_n$ have TWO constraints, $\sum_{i=1}^n e_i = 0$ and $\sum_{i=1}^n X_i e_i = 0$ (i.e., the normal equations).
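These two constraints are easy to see numerically; a minimal R sketch on simulated data (illustrative values):

```r
# Minimal sketch: the residuals of a fitted simple linear regression satisfy
# sum(e_i) = 0 and sum(X_i * e_i) = 0, up to floating-point error.
set.seed(1)
n <- 25
X <- runif(n, 0, 10)
Y <- 2 + 3 * X + rnorm(n, sd = 2)

e <- resid(lm(Y ~ X))
sum(e)      # ~ 0
sum(X * e)  # ~ 0
```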

Mean (of) Squares

$$MSR = SSR/1, \quad \text{called the regression mean square};$$
$$MSE = SSE/(n - 2), \quad \text{called the error mean square}.$$

Analysis of variance (ANOVA) table. Based on the breakdown, we write it as a table:

Source of variation   SS                                           df      MS                   F-value        P(>F)
Regression            $SSR = \sum_{i=1}^n (\hat Y_i - \bar Y)^2$   1       $MSR = SSR/1$        $F = MSR/MSE$  p-value
Error                 $SSE = \sum_{i=1}^n (Y_i - \hat Y_i)^2$      $n-2$   $MSE = SSE/(n-2)$
Total                 $SST = \sum_{i=1}^n (Y_i - \bar Y)^2$        $n-1$


R command for the calculation: anova(object, ...), where "object" is the output of a regression (e.g., the value returned by lm()).

Expected Mean Squares

$$E(MSE) = \sigma^2$$

and

$$E(MSR) = \sigma^2 + \beta_1^2 \sum_{i=1}^n (X_i - \bar X)^2.$$

[Proof: the first equation was proved (where?). By (1), we have
\begin{align*}
E(MSR) &= E(b_1^2) \sum_{i=1}^n (X_i - \bar X)^2 = \left[\mathrm{Var}(b_1) + (E b_1)^2\right] \sum_{i=1}^n (X_i - \bar X)^2 \\
&= \left[ \frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar X)^2} + \beta_1^2 \right] \sum_{i=1}^n (X_i - \bar X)^2 = \sigma^2 + \beta_1^2 \sum_{i=1}^n (X_i - \bar X)^2. \quad ]
\end{align*}
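A Monte Carlo check of these two expectations, as a minimal sketch with illustrative parameter values and $X$ held fixed across replications:

```r
# Minimal sketch: Monte Carlo check of E(MSE) = sigma^2 and
# E(MSR) = sigma^2 + beta1^2 * sum((X_i - Xbar)^2).
set.seed(1)
n <- 25; beta1 <- 3; sigma <- 2
X <- runif(n, 0, 10)   # fixed design

ms <- replicate(5000, {
  Y   <- 2 + beta1 * X + rnorm(n, sd = sigma)
  tab <- anova(lm(Y ~ X))
  c(MSR = tab["X", "Mean Sq"], MSE = tab["Residuals", "Mean Sq"])
})

rowMeans(ms)                               # simulated E(MSR), E(MSE)
sigma^2 + beta1^2 * sum((X - mean(X))^2)   # theoretical E(MSR)
sigma^2                                    # theoretical E(MSE)
```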

2 F-test of $H_0: \beta_1 = 0$

Consider the hypothesis test

$$H_0: \beta_1 = 0 \quad \text{versus} \quad H_a: \beta_1 \neq 0.$$

Note that $\hat Y_i = b_0 + b_1 X_i$ and

$$SSR = b_1^2 \sum_{i=1}^n (X_i - \bar X)^2.$$

If $b_1 = 0$ then $SSR = 0$ (why?). Thus we can test $\beta_1 = 0$ based on SSR; i.e., under $H_0$, SSR or MSR should be "small".

We consider the F-statistic

$$F = \frac{MSR}{MSE} = \frac{SSR/1}{SSE/(n-2)}.$$

Under $H_0$,

$$F \sim F(1, n - 2).$$
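This null distribution can be checked by simulation; a minimal sketch in which $\beta_1 = 0$ so the simulated data obey $H_0$ (parameter values illustrative):

```r
# Minimal sketch: simulate the null distribution of F (beta1 = 0) and
# compare its quantiles with those of F(1, n-2).
set.seed(1)
n <- 25
X <- runif(n, 0, 10)

F_null <- replicate(5000, {
  Y <- 2 + rnorm(n, sd = 2)          # beta1 = 0 under H0
  anova(lm(Y ~ X))["X", "F value"]
})

quantile(F_null, c(0.50, 0.90, 0.99))   # simulated quantiles
qf(c(0.50, 0.90, 0.99), 1, n - 2)       # F(1, n-2) quantiles
```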

For a given significance level $\alpha$, our criterion is:

If $F \le F(1 - \alpha, 1, n - 2)$ (i.e., $F$ is indeed small), accept $H_0$;
if $F > F(1 - \alpha, 1, n - 2)$ (i.e., $F$ is not small), reject $H_0$;

where $F(1 - \alpha, 1, n - 2)$ is the $(1 - \alpha)$ quantile of the $F(1, n - 2)$ distribution. We can also do the test based on the p-value $= P(F(1, n - 2) > F)$:

If p-value $\ge \alpha$, accept $H_0$; if p-value $< \alpha$, reject $H_0$.
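In R, the critical value and the p-value can be computed with qf() and pf(); a minimal sketch, where F_obs stands for the observed F-statistic (the value is taken from the example below):

```r
# Minimal sketch: F-test decision quantities for given alpha and n.
alpha <- 0.01
n     <- 25
F_obs <- 105.88   # observed F-statistic from Example 2.1

qf(1 - alpha, df1 = 1, df2 = n - 2)                  # critical value F(1-alpha, 1, n-2)
pf(F_obs, df1 = 1, df2 = n - 2, lower.tail = FALSE)  # p-value = P(F(1, n-2) > F_obs)
```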

Example 2.1 For the example above (with n = 25, in part 3), we fit the model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i.$$

By the R code, we have the following output:

Analysis of Variance Table

Response: Y
          Df Sum Sq Mean Sq F value    Pr(>F)
X          1 252378  252378  105.88 4.449e-10 ***
Residuals 23  54825    2384

Suppose we need to test $H_0: \beta_1 = 0$ with significance level 0.01. Based on the calculation, the p-value is $4.449 \times 10^{-10} < 0.01$, so we reject $H_0$.

Note that $F = (t^*)^2$, where $t^*$ is the t-statistic for testing $H_0: \beta_1 = 0$ (from part 3). Hence

$$F > F(1 - \alpha, 1, n - 2) \iff (t^*)^2 > (t(1 - \alpha/2, n - 2))^2 \iff |t^*| > t(1 - \alpha/2, n - 2),$$

and

$$F \le F(1 - \alpha, 1, n - 2) \iff (t^*)^2 \le (t(1 - \alpha/2, n - 2))^2 \iff |t^*| \le t(1 - \alpha/2, n - 2).$$

(You can check in the statistical table that $F(1 - \alpha, 1, n - 2) = (t(1 - \alpha/2, n - 2))^2$.) Therefore, the test results based on the F and t statistics are the same. (But ONLY for the simple linear regression model.)
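Both identities can be checked numerically in R; a minimal sketch on simulated data (illustrative values):

```r
# Minimal sketch: for simple linear regression, F = (t*)^2 and
# F(1-alpha, 1, n-2) = t(1-alpha/2, n-2)^2.
set.seed(1)
n <- 25
X <- runif(n, 0, 10)
Y <- 2 + 3 * X + rnorm(n, sd = 2)
fit <- lm(Y ~ X)

F_stat <- anova(fit)["X", "F value"]
t_stat <- summary(fit)$coefficients["X", "t value"]
all.equal(F_stat, t_stat^2)                                   # TRUE

alpha <- 0.01
all.equal(qf(1 - alpha, 1, n - 2), qt(1 - alpha/2, n - 2)^2)  # TRUE
```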

