Chapter 1 Simple Linear Regression (part 4)
1 Analysis of Variance (ANOVA) approach to regression analysis
Recall the model again
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \ldots, n$$
The observations can be written as

    obs    Y     X
     1     Y1    X1
     2     Y2    X2
    ...   ...   ...
     n     Yn    Xn
The deviation of each $Y_i$ from the mean $\bar{Y}$ is
$$Y_i - \bar{Y}$$
The fitted values $\hat{Y}_i = b_0 + b_1 X_i$, $i = 1, \ldots, n$, come from the regression and are determined by $X_i$. Their mean is
$$\bar{\hat{Y}} = \frac{1}{n}\sum_{i=1}^n \hat{Y}_i = \bar{Y}$$
Thus the deviation of $\hat{Y}_i$ from its mean is
$$\hat{Y}_i - \bar{Y}$$
The residuals $e_i = Y_i - \hat{Y}_i$ have mean
$$\bar{e} = 0 \quad \text{(why?)}$$
Thus the deviation of $e_i$ from its mean is $e_i - \bar{e} = e_i = Y_i - \hat{Y}_i$.
We can write
$$\underbrace{Y_i - \bar{Y}}_{\text{total deviation}} = \underbrace{\hat{Y}_i - \bar{Y}}_{\text{deviation due to the regression}} + \underbrace{e_i}_{\text{deviation due to the error}}$$
Across the observations $1, 2, \ldots, n$:
- deviation of $Y_i$: the values $Y_1 - \bar{Y}, Y_2 - \bar{Y}, \ldots, Y_n - \bar{Y}$, with sum of squares $\sum_{i=1}^n (Y_i - \bar{Y})^2$, the total sum of squares (SST);
- deviation of $\hat{Y}_i = b_0 + b_1 X_i$: the values $\hat{Y}_1 - \bar{Y}, \hat{Y}_2 - \bar{Y}, \ldots, \hat{Y}_n - \bar{Y}$, with sum of squares $\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2$, the sum of squares due to regression (SSR);
- deviation of $e_i = Y_i - \hat{Y}_i$: the values $e_1 - \bar{e} = e_1, e_2 - \bar{e} = e_2, \ldots, e_n - \bar{e} = e_n$, with sum of squares $\sum_{i=1}^n e_i^2$, the sum of squares of error/residuals (SSE).
We have
$$\underbrace{\sum_{i=1}^n (Y_i - \bar{Y})^2}_{\text{SST}} = \underbrace{\sum_{i=1}^n (\hat{Y}_i - \bar{Y})^2}_{\text{SSR}} + \underbrace{\sum_{i=1}^n e_i^2}_{\text{SSE}}$$
Proof:
$$\begin{aligned}
\sum_{i=1}^n (Y_i - \bar{Y})^2 &= \sum_{i=1}^n (\hat{Y}_i - \bar{Y} + Y_i - \hat{Y}_i)^2 \\
&= \sum_{i=1}^n \left\{(\hat{Y}_i - \bar{Y})^2 + (Y_i - \hat{Y}_i)^2 + 2(\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)\right\} \\
&= \mathrm{SSR} + \mathrm{SSE} + 2\sum_{i=1}^n (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i) \\
&= \mathrm{SSR} + \mathrm{SSE} + 2\sum_{i=1}^n (\hat{Y}_i - \bar{Y}) e_i \\
&= \mathrm{SSR} + \mathrm{SSE} + 2\sum_{i=1}^n (b_0 + b_1 X_i - \bar{Y}) e_i \\
&= \mathrm{SSR} + \mathrm{SSE} + 2 b_0 \sum_{i=1}^n e_i + 2 b_1 \sum_{i=1}^n X_i e_i - 2\bar{Y} \sum_{i=1}^n e_i \\
&= \mathrm{SSR} + \mathrm{SSE},
\end{aligned}$$
where the last step uses $\sum_{i=1}^n e_i = 0$ and $\sum_{i=1}^n X_i e_i = 0$ (the normal equations).
It is also easy to check (using $\bar{Y} = b_0 + b_1 \bar{X}$) that
$$\mathrm{SSR} = \sum_{i=1}^n (b_0 + b_1 X_i - b_0 - b_1 \bar{X})^2 = b_1^2 \sum_{i=1}^n (X_i - \bar{X})^2 \tag{1}$$
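The decomposition SST = SSR + SSE and equation (1) can be checked numerically. The sketch below is illustrative: the data are simulated, and the true coefficient values 2.0 and 0.5 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
X = rng.uniform(0, 10, n)
Y = 2.0 + 0.5 * X + rng.normal(0, 1, n)   # simulated data (illustrative values)

# Least-squares estimates b0, b1
Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X
e = Y - Yhat

SST = np.sum((Y - Y.mean()) ** 2)
SSR = np.sum((Yhat - Y.mean()) ** 2)
SSE = np.sum(e ** 2)

print(np.isclose(SST, SSR + SSE))         # decomposition SST = SSR + SSE
print(np.isclose(SSR, b1 ** 2 * Sxx))     # equation (1)
print(np.isclose(e.mean(), 0.0))          # residuals have mean 0
```

Note that the residual mean is zero only because the model includes an intercept (it is one of the normal equations).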
Breakdown of the degrees of freedom
The degrees of freedom for SST is $n-1$: note that $Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y}$ have one constraint, $\sum_{i=1}^n (Y_i - \bar{Y}) = 0$.
The degrees of freedom for SSR is 1: note that $\hat{Y}_i = b_0 + b_1 X_i$, so the deviations $\hat{Y}_i - \bar{Y} = b_1(X_i - \bar{X})$ are determined by the single estimate $b_1$ (see Figure 1).
Figure 1: A figure illustrating the degrees of freedom (panels show $Y$, the fitted values $\hat{Y}$, and the residuals $e$ plotted against $X$).
The degrees of freedom for SSE is $n-2$: note that $e_1, \ldots, e_n$ have TWO constraints,
$$\sum_{i=1}^n e_i = 0 \quad \text{and} \quad \sum_{i=1}^n X_i e_i = 0$$
(i.e., the normal equations).
Mean (of) Squares
$$\mathrm{MSR} = \mathrm{SSR}/1, \quad \text{called the regression mean square}$$
$$\mathrm{MSE} = \mathrm{SSE}/(n-2), \quad \text{called the error mean square}$$
Analysis of variance (ANOVA) table Based on the break-down, we write it as a table
Source of
variation
SS
df MS
F-value P (> F )
Regression Error Total
SSR = SSE = SST =
ni=1(Y^i - Y? )2 ni=1(Yi - Y^i)2 ni=1(Y^i - Y? )2
1
MSR
=
SSR 1
n-2
MSE
=
SSE n-2
n-1
F
=
MSR MSE
p-value
R command for the calculation:
    anova(object, ...)
where "object" is the output of a regression.
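For readers working in Python, the same table can be assembled by hand from the formulas above; this is only a sketch on simulated data (the coefficient values are illustrative), not a reimplementation of R's anova:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 25
X = rng.uniform(0, 10, n)
Y = 1.0 + 0.8 * X + rng.normal(0, 2, n)   # simulated data (illustrative values)

# Least-squares fit
Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
Yhat = b0 + b1 * X

# ANOVA table entries
SSR = np.sum((Yhat - Y.mean()) ** 2)
SSE = np.sum((Y - Yhat) ** 2)
MSR = SSR / 1
MSE = SSE / (n - 2)
F = MSR / MSE
p_value = stats.f.sf(F, 1, n - 2)         # P(F(1, n-2) > F*)

print(f"Regression: df=1,  SS={SSR:.1f}, MS={MSR:.1f}, F={F:.2f}, p={p_value:.3g}")
print(f"Error:      df={n-2}, SS={SSE:.1f}, MS={MSE:.1f}")
```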
Expected Mean Squares
$$E(\mathrm{MSE}) = \sigma^2$$
and
$$E(\mathrm{MSR}) = \sigma^2 + \beta_1^2 \sum_{i=1}^n (X_i - \bar{X})^2$$
[Proof: the first equation was proved (where?). By (1), we have
$$E(\mathrm{MSR}) = E(b_1^2) \sum_{i=1}^n (X_i - \bar{X})^2 = \left[\mathrm{Var}(b_1) + (E b_1)^2\right] \sum_{i=1}^n (X_i - \bar{X})^2$$
$$= \left[\frac{\sigma^2}{\sum_{i=1}^n (X_i - \bar{X})^2} + \beta_1^2\right] \sum_{i=1}^n (X_i - \bar{X})^2 = \sigma^2 + \beta_1^2 \sum_{i=1}^n (X_i - \bar{X})^2.]$$
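These two expectations can be checked by simulation. The sketch below uses assumed true values ($\sigma = 1$, $\beta_1 = 0.5$, a fixed design $X$) and averages MSE and MSR over many simulated samples:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, beta0, beta1 = 20, 1.0, 2.0, 0.5   # assumed true values (illustrative)
X = np.linspace(0.0, 10.0, n)                # fixed design
Sxx = np.sum((X - X.mean()) ** 2)

mse_vals, msr_vals = [], []
for _ in range(10000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, n)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b0 = Y.mean() - b1 * X.mean()
    Yhat = b0 + b1 * X
    msr_vals.append(np.sum((Yhat - Y.mean()) ** 2) / 1)    # MSR = SSR/1
    mse_vals.append(np.sum((Y - Yhat) ** 2) / (n - 2))     # MSE = SSE/(n-2)

print(np.mean(mse_vals))                 # close to sigma^2 = 1
print(np.mean(msr_vals))                 # close to sigma^2 + beta1^2 * Sxx
print(sigma ** 2 + beta1 ** 2 * Sxx)     # theoretical E(MSR)
```

Note that under $H_0: \beta_1 = 0$ the two averages would agree, which is what makes the ratio MSR/MSE a natural test statistic.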
2 F-test of $H_0: \beta_1 = 0$
Consider the hypothesis test
$$H_0: \beta_1 = 0, \qquad H_a: \beta_1 \neq 0.$$
Note that $\hat{Y}_i = b_0 + b_1 X_i$ and
$$\mathrm{SSR} = b_1^2 \sum_{i=1}^n (X_i - \bar{X})^2$$
If $b_1 = 0$ then $\mathrm{SSR} = 0$ (why?). Thus we can test $\beta_1 = 0$ based on SSR, i.e., under $H_0$, SSR or MSR should be "small".
We consider the F-statistic
$$F^* = \frac{\mathrm{MSR}}{\mathrm{MSE}} = \frac{\mathrm{SSR}/1}{\mathrm{SSE}/(n-2)}.$$
Under $H_0$,
$$F^* \sim F(1, n-2)$$
For a given significance level $\alpha$, our criterion is:
If $F^* \le F(1-\alpha; 1, n-2)$ (i.e., indeed small), accept $H_0$.
If $F^* > F(1-\alpha; 1, n-2)$ (i.e., not small), reject $H_0$.
where $F(1-\alpha; 1, n-2)$ is the $(1-\alpha)$ quantile of the $F(1, n-2)$ distribution. We can also do the test based on the p-value $= P(F(1, n-2) > F^*)$:
If p-value $\ge \alpha$, accept $H_0$; if p-value $< \alpha$, reject $H_0$.
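Both forms of the decision rule can be carried out with scipy's F distribution. In this sketch, $\alpha = 0.05$, $n = 25$, and the observed statistic value are illustrative:

```python
from scipy import stats

alpha, n = 0.05, 25
F_star = 105.88                             # illustrative observed F statistic
crit = stats.f.ppf(1 - alpha, 1, n - 2)     # quantile F(1-alpha; 1, n-2)
p_value = stats.f.sf(F_star, 1, n - 2)      # p-value P(F(1, n-2) > F*)

print(f"critical value = {crit:.3f}")
print("reject H0" if F_star > crit else "accept H0")   # quantile rule
print("reject H0" if p_value < alpha else "accept H0") # p-value rule
```

The two print statements always agree, since $F^* > F(1-\alpha; 1, n-2)$ holds exactly when $P(F(1,n-2) > F^*) < \alpha$.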
Example 2.1 For the example above (with $n = 25$, in part 3), we fit the model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
By the R code, we have the following output:

Analysis of Variance Table
Response: Y
              Df  Sum Sq  Mean Sq  F value   Pr(>F)
    X          1  252378   252378   105.88  4.449e-10 ***
    Residuals 23   54825     2384

Suppose we need to test $H_0: \beta_1 = 0$ with significance level $0.01$. Based on the calculation, the p-value is $4.449 \times 10^{-10} < 0.01$, so we reject $H_0$.

Relation to the t-test: since $F^* = (t^*)^2$ in the simple linear regression model,
$$F^* > F(1-\alpha; 1, n-2) \iff (t^*)^2 > (t(1-\alpha/2; n-2))^2 \iff |t^*| > t(1-\alpha/2; n-2)$$
and
$$F^* \le F(1-\alpha; 1, n-2) \iff (t^*)^2 \le (t(1-\alpha/2; n-2))^2 \iff |t^*| \le t(1-\alpha/2; n-2).$$
(You can check in the statistical table that $F(1-\alpha; 1, n-2) = (t(1-\alpha/2; n-2))^2$.) Therefore, the test results based on the F and t statistics are the same (but ONLY for the simple linear regression model).
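Instead of a statistical table, the quantile identity $F(1-\alpha; 1, \nu) = (t(1-\alpha/2; \nu))^2$ can be checked directly with scipy; the values $\alpha = 0.05$ and $\nu = 23$ below are illustrative:

```python
from scipy import stats

alpha, nu = 0.05, 23                      # illustrative values
f_q = stats.f.ppf(1 - alpha, 1, nu)       # F(1-alpha; 1, nu)
t_q = stats.t.ppf(1 - alpha / 2, nu)      # t(1-alpha/2; nu)
print(f_q, t_q ** 2)                      # the two agree up to numerical precision
```

This reflects the distributional fact that if $T \sim t(\nu)$ then $T^2 \sim F(1, \nu)$.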