Assumptions for Linear Regression



Summary: Basic Concepts of Linear Regression Analysis (one independent variable)

Regression analysis is a statistical technique for modeling and investigating the relationship between two or more variables. Once a relationship is established, it is used to predict the dependent variable for a given value of the independent variable.

Model for two variables:

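Y = β0 + β1X + ε

ε ~ N(0, σ²): this notation means ε is a normally distributed random variable, called the random error, with mean 0 and variance σ². (It is the vertical deviation of a response from the true regression line; the residual, the deviation from the fitted line, estimates it.)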

• For any value of Xi, ε is normally distributed.

• For any value of Xi, the mean and variance are E(ε) = 0 and Var(ε) = σ².

• Random errors are mutually independent.

E(Y|X) = E(β0 + β1X + ε)

E(Y|X) = β0 + β1X + 0 = β0 + β1X

β0, β1, and σ²: unknown model parameters

Ŷ = b0 + b1X, where b0 and b1 estimate β0 and β1 in the model

1. Construct a scatter plot of the sample data and consider the relationship. Is it linear? Quadratic? Other?

2. Check the assumptions of the model.

1) There is no error in the X-values. They are set prior to the experiment. The X variable is not random. This is established by the experimental design.

2) At each X, there is a normal distribution of Y-values that are independent of each other. The variance σ² of the normal distribution is the same at each X-value. (sLF estimates the common standard deviation.) This equal-variance assumption is referred to as homoscedasticity. To evaluate it, plot the residuals vs. X (or the residuals vs. the fitted values). The plot should show a random scatter of points about 0 on the vertical axis, with no pattern and roughly constant spread.

3) The random error terms ε are independent and, for any value of X, have a normal distribution with mean 0 and variance σ². To check normality, examine a normal probability plot of the residuals.
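
The residual checks described above can be sketched numerically. This is a minimal illustration, assuming a small made-up data set and the plotting position (i - 0.5)/n for the normal probability plot; neither comes from the handout. It prints the coordinates that would be plotted rather than drawing the plots.

```python
from statistics import NormalDist

# Illustrative data only (hypothetical, not from the handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least squares fit, needed to obtain the residuals.
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Residual = observed Y minus fitted Y.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Residuals vs. X: these pairs should scatter randomly about 0.
for xi, ri in zip(x, residuals):
    print(f"X = {xi:.1f}  residual = {ri:+.3f}")

# Normal probability plot coordinates: sorted residuals against standard
# normal quantiles. The plotting position (i - 0.5) / n is an assumed choice.
sorted_residuals = sorted(residuals)
normal_quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
for q, r in zip(normal_quantiles, sorted_residuals):
    print(f"normal quantile = {q:+.3f}  residual = {r:+.3f}")
```

If the second set of pairs falls roughly on a straight line, the normality assumption looks reasonable; with only five points, as here, any such judgment is very rough.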

3. Write the equation for Ŷ using the least squares method.
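
As a rough sketch of the least squares calculation, the slope and intercept can be computed from the sums of squares and cross-products. The data and the function name `least_squares_line` are hypothetical, chosen only for illustration.

```python
# Illustrative data only (hypothetical, not from the handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def least_squares_line(x, y):
    """Return (b0, b1) minimizing the sum of squared vertical deviations."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


b0, b1 = least_squares_line(x, y)
print(f"fitted line: Y-hat = {b0:.3f} + {b1:.3f} X")
```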

4. Examine R² and sLF. What do they tell you about the relationship?

R² is the coefficient of determination. It is the fraction of the raw variation in Y accounted for by the fitted equation (often reported as a percent).

sLF estimates the common standard deviation in Y for a fixed X.

sLF² = SSE/(n − 2) = Σ(Yi − Ŷi)²/(n − 2)
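
A minimal sketch of how R² and sLF might be computed from a fitted line, again on hypothetical data. It assumes the usual definitions: R² = 1 − SSE/SST and sLF² = SSE/(n − 2).

```python
import math

# Illustrative data only (hypothetical, not from the handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation

r_squared = 1 - sse / sst        # fraction of variation in Y explained
s_lf = math.sqrt(sse / (n - 2))  # estimate of the common standard deviation

print(f"R^2 = {r_squared:.3f}, sLF = {s_lf:.3f}")
```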

5. Test the slope of the line to see if there is a significant relationship between the two variables.

Test the following hypothesis.

Ho: β1 = 0

Ha: β1 ≠ 0

b1 has a normal distribution with mean μb1 = β1 and

Var(b1) = σ²/Σ(Xi − X̄)², estimated by sLF²/Σ(Xi − X̄)² (df = n − 2)

Use t = b1 / (sLF/√(Σ(Xi − X̄)²))

Failure to reject H0 means the data do not provide sufficient evidence of a linear relationship between X and Y; it does not prove that no relationship exists.
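
The slope test can be sketched as follows. The data are hypothetical, and the critical value 3.182 is the two-sided 5% value for 3 degrees of freedom taken from a t table; it is entered by hand here rather than computed.

```python
import math

# Illustrative data only (hypothetical, not from the handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # from a t table: two-sided alpha = 0.05, df = n - 2 = 3

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_lf = math.sqrt(sse / (n - 2))

se_b1 = s_lf / math.sqrt(s_xx)  # estimated standard deviation of b1
t_stat = b1 / se_b1             # test statistic for H0: beta1 = 0
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

With only five points, this sample does not reject H0 even though the fitted slope is positive, which illustrates why failing to reject is not evidence that the slope is zero.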

6. Establish a confidence interval estimate of β1.

b1 ± t* · sLF/√(Σ(Xi − X̄)²), where t* is the t critical value with df = n − 2
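
A short sketch of the slope interval on hypothetical data, assuming a 95% level and a critical value read from a t table (3.182 for df = 3).

```python
import math

# Illustrative data only (hypothetical, not from the handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # from a t table: 95% two-sided, df = n - 2 = 3

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_lf = math.sqrt(sse / (n - 2))

half_width = T_CRIT * s_lf / math.sqrt(s_xx)
lower, upper = b1 - half_width, b1 + half_width

print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

An interval that contains 0, as this one does, agrees with a two-sided t test that fails to reject H0: β1 = 0 at the same level.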

7. Establish a confidence interval estimate of the mean response. A confidence interval estimate is an estimate of the mean Y for a fixed value of X. (df = n − 2)

Ŷ ± t* · sLF · √(1/n + (X − X̄)²/Σ(Xi − X̄)²)

8. Construct a prediction interval for Y. A prediction interval predicts an additional single observation of Y for a particular (fixed) value of X. (df = n-2)

Ŷ ± t* · sLF · √(1 + 1/n + (X − X̄)²/Σ(Xi − X̄)²)

Predict values of Y for the given X. Be careful not to extrapolate far beyond the range of the observed X values.
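
The two intervals differ only by the extra 1 under the square root, which accounts for the variation of a single new observation about its mean. The sketch below uses hypothetical data, a hand-entered t table value, and an illustrative helper named `interval`; none of these come from the handout.

```python
import math

# Illustrative data only (hypothetical, not from the handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
T_CRIT = 3.182  # from a t table: 95% two-sided, df = n - 2 = 3

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_lf = math.sqrt(sse / (n - 2))


def interval(x0, new_observation):
    """Return (lower, upper) at x0; add 1 under the root for a single new Y."""
    y_hat = b0 + b1 * x0
    extra = 1.0 if new_observation else 0.0
    half = T_CRIT * s_lf * math.sqrt(extra + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - half, y_hat + half


conf_int = interval(4.0, new_observation=False)  # interval for the mean Y at X = 4
pred_int = interval(4.0, new_observation=True)   # interval for one new Y at X = 4

print(f"confidence interval: ({conf_int[0]:.3f}, {conf_int[1]:.3f})")
print(f"prediction interval: ({pred_int[0]:.3f}, {pred_int[1]:.3f})")
```

The prediction interval is always wider than the confidence interval at the same X, and both widen as X moves away from X̄.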

ANOVA Approach

The ANOVA approach is an additional method that is used to examine the model.

SST = Σ(Yi − Ȳ)²: Sum of Squares Total. Variation of the Y values around their mean.

SSE = Σ(Yi − Ŷi)²: Sum of Squares Error (residuals). Unexplained variation: variation of the observed values from the predicted values (for a fixed X). This is random variation, which can be attributed to factors other than the relationship.

SSR = Σ(Ŷi − Ȳ)²: Sum of Squares Regression. Explained variation: variation of the predicted values from the mean, which can be attributed to the relationship between X and Y.

SST = SSR + SSE

R² = SSR/SST

F = MSR/MSE

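F is the ratio of the explained mean square to the unexplained mean square. When H0 is true, F is expected to be near 1; values well above 1 indicate that the regression explains more than random variation alone would. Use the F table (with 1 and n − 2 degrees of freedom) to check significance.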

The F test is used to test the following hypothesis.

Ho: β1 = 0

Ha: β1 ≠ 0

F = MSR/MSE = (SSR/1)/(SSE/(n − 2)) = SSR/sLF²

The information is generally organized in a table as follows.

|Source |SS |df |MS |F |
|Regression |SSR |1 |SSR/1 |MSR/MSE |
|Error |SSE |n-2 |SSE/(n-2) | |
|Total |SST |n-1 | | |
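
The table entries can be filled in numerically along the following lines. This is a sketch on hypothetical data; the significance check against the F table is left to the reader.

```python
# Illustrative data only (hypothetical, not from the handout).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained variation

msr = ssr / 1        # regression mean square, df = 1
mse = sse / (n - 2)  # error mean square, df = n - 2
f_stat = msr / mse

print(f"{'Source':<12}{'SS':>8}{'df':>4}{'MS':>8}{'F':>8}")
print(f"{'Regression':<12}{ssr:>8.3f}{1:>4}{msr:>8.3f}{f_stat:>8.3f}")
print(f"{'Error':<12}{sse:>8.3f}{n - 2:>4}{mse:>8.3f}")
print(f"{'Total':<12}{sst:>8.3f}{n - 1:>4}")
```

For a single independent variable, F equals the square of the t statistic for the slope, so the F test and the two-sided t test lead to the same conclusion.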

See an Excel Example:

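[Figure: scatter plot with the fitted line. At a given Xi it marks the observed point (X, Y), the fitted point (X, Ŷ), and the mean Ȳ; the deviations of Ŷ from Ȳ make up SSR and the deviations of Y from Ŷ make up SSE.]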
