Stat 112 Review Notes for Chapter 3, Lecture Notes 1-5
1. Simple Linear Regression Model: The simple linear regression model for the mean of $Y$ given $X$ is
$E(Y|X) = \beta_0 + \beta_1 X$ (1.1)
where $\beta_1$ = slope = change in the mean of $Y$ for each one-unit change in $X$;
$\beta_0$ = intercept = mean of $Y$ given $X = 0$. The disturbance $e_i$ for the simple linear regression model is the difference between the actual $Y_i$ and the mean of $Y$ given $X$ for observation $i$: $e_i = Y_i - (\beta_0 + \beta_1 X_i)$. In addition to (1.1), the simple linear regression model makes the following assumptions about the disturbances $e_i$:
(i) Linearity assumption: $E(e_i) = 0$. This implies that the linear model (1.1) for the mean of $Y$ given $X$ is the correct model for the mean.
(ii) Constant variance assumption: The disturbances $e_i$ are assumed to all have the same variance $\sigma_e^2$.
(iii) Normality assumption: The disturbances $e_i$ are assumed to have a normal distribution.
(iv) Independence assumption: The disturbances $e_1, \ldots, e_n$ are assumed to be independent.
2. Least Squares Estimates of the Simple Linear Regression Model: Based on a sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, we estimate the slope and intercept by the least squares principle --
we minimize the sum of squared prediction errors in the data, $\sum_{i=1}^{n} (Y_i - (b_0 + b_1 X_i))^2$. The least squares estimates of the slope and intercept are the $b_0$ and $b_1$ that minimize the sum of squared prediction errors. Some properties of the least squares estimates are:
(i) Unbiased estimators: The means of the sampling distributions of $b_0$ and $b_1$ are $\beta_0$ and $\beta_1$ respectively.
(ii) Consistent estimators: As the sample size $n$ increases, the probability that $b_0$ and $b_1$ will come close to $\beta_0$ and $\beta_1$ respectively converges to 1.
(iii) Minimum variance estimators: The least squares estimators are the best possible estimators of $\beta_0$ and $\beta_1$ in the sense of having the smallest variance among unbiased estimators.
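As a quick numerical sketch (Python, not part of the original notes; the data values are made up for illustration), the least squares estimates can be computed directly from the closed-form formulas:

```python
# Least squares estimates for simple linear regression, from the
# closed-form formulas. The data below are made up for illustration.
def least_squares(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # slope: sum of (Xi - Xbar)(Yi - Ybar) over sum of (Xi - Xbar)^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (Xbar, Ybar)
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(b0, b1)
```

For this toy data a hand check gives $b_1 = 6/10 = 0.6$ and $b_0 = 4 - 0.6 \cdot 3 = 2.2$.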
3. Residuals: The disturbance $e_i$ is the difference between the actual $Y_i$ and the mean of $Y$ given $X_i$: $e_i = Y_i - (\beta_0 + \beta_1 X_i)$. The residual $\hat{e}_i$ is an estimate of the disturbance: $\hat{e}_i = Y_i - (b_0 + b_1 X_i)$.
4. Using the Residuals to Check the Assumptions of the Simple Linear Regression Model: The residual plot is a scatterplot of the [pic]pairs, i.e., a plot of the [pic]variable versus the residuals. To check the linearity assumption, we check if [pic] is approximately zero for each part of the range of [pic]. To check the constant variance assumption, we check if the spread of the residuals remains constant as [pic]varies. To check the normality assumption, we check if the histogram of the residuals is approximately bell shaped. For now, we will not consider the independence assumption; we will consider it in Section 6.
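A minimal sketch of computing the residual-plot points (Python, not from the notes; the data and the fitted values $b_0 = 2.2$, $b_1 = 0.6$ are the illustrative values used above). A useful sanity check is that least squares residuals always sum to zero:

```python
# Residuals from a least squares fit. The residual plot is the set of
# (Xi, residual_i) pairs. Data and estimates are illustrative only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least squares estimates for this toy data
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
pairs = list(zip(x, residuals))  # points to plot in the residual plot
print(sum(residuals))  # sums to (essentially) zero by construction
```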
5. Root Mean Square Error: The root mean square error, $RMSE = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n} \hat{e}_i^2}$, is approximately the average absolute error that is made when using $\hat{Y} = b_0 + b_1 X$ to predict $Y$. The RMSE is denoted by $s_e$ in the textbook.
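A short computation of the RMSE (Python, not from the notes; same illustrative data and estimates as above):

```python
import math

# RMSE = sqrt(SSE / (n - 2)); the divisor is n - 2 because two
# parameters (slope and intercept) were estimated from the data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least squares estimates for this toy data
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
rmse = math.sqrt(sse / (len(x) - 2))
print(rmse)
```

Here SSE = 2.4 with n = 5, so the RMSE is $\sqrt{0.8} \approx 0.894$.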
6. Confidence Interval for the Slope: The confidence interval for the slope is a range of plausible values for the true slope $\beta_1$ based on the sample $(X_1, Y_1), \ldots, (X_n, Y_n)$. The 95% confidence interval for the slope is $b_1 \pm t_{.025, n-2} \cdot SE(b_1)$, where $SE(b_1)$ is the standard error of the slope, $SE(b_1) = \frac{s_e}{\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2}}$. The 95% confidence interval for the slope is approximately $b_1 \pm 2 \cdot SE(b_1)$.
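A sketch of the approximate 95% confidence interval (Python, not from the notes; uses the rough cutoff 2 in place of the exact $t_{.025, n-2}$ value, and the same illustrative data as above):

```python
import math

# Approximate 95% CI for the slope: b1 +/- 2 * SE(b1), where
# SE(b1) = RMSE / sqrt(sum of (Xi - Xbar)^2). Data are illustrative.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least squares estimates for this toy data
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
rmse = math.sqrt(sse / (n - 2))
se_b1 = rmse / math.sqrt(sxx)
ci = (b1 - 2 * se_b1, b1 + 2 * se_b1)
print(ci)
```

With only n = 5 observations the exact $t_{.025, 3}$ cutoff (about 3.18) would give a wider interval; the factor 2 is only reasonable for larger samples.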
7. Hypothesis Testing for the Slope: To test hypotheses for the slope, we use the t-statistic $t = \frac{b_1 - \beta_1^0}{SE(b_1)}$, where $\beta_1^0$ is the hypothesized value of the slope; the rejection rules are detailed below.
(i) Two-sided test: $H_0: \beta_1 = \beta_1^0$ vs. $H_a: \beta_1 \neq \beta_1^0$. We reject $H_0$ if $t \geq t_{\alpha/2, n-2}$ or $t \leq -t_{\alpha/2, n-2}$.
(ii) One-sided test I: $H_0: \beta_1 = \beta_1^0$ vs. $H_a: \beta_1 < \beta_1^0$. We reject $H_0$ if $t \leq -t_{\alpha, n-2}$.
(iii) One-sided test II: $H_0: \beta_1 = \beta_1^0$ vs. $H_a: \beta_1 > \beta_1^0$. We reject $H_0$ if $t \geq t_{\alpha, n-2}$.
When $\beta_1^0 = 0$, we can calculate the p-values for these tests using JMP as follows:
(i) Two-sided test: the p-value is Prob>|t|.
(ii) One-sided test I: If $t$ is negative (i.e., the sign of the t-statistic is in favor of the alternative hypothesis), the p-value is (Prob>|t|)/2. If $t$ is positive (i.e., the sign of the t-statistic is in favor of the null hypothesis), the p-value is 1-(Prob>|t|)/2.
(iii) One-sided test II: If $t$ is positive (i.e., the sign of the t-statistic is in favor of the alternative hypothesis), the p-value is (Prob>|t|)/2. If $t$ is negative (i.e., the sign of the t-statistic is in favor of the null hypothesis), the p-value is 1-(Prob>|t|)/2.
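The sign rules above reduce to simple arithmetic on JMP's reported Prob>|t|; a small helper (Python, not from the notes; function name and example numbers are hypothetical) makes the logic explicit:

```python
# Convert JMP's two-sided "Prob > |t|" into a one-sided p-value,
# following the sign rules above (hypothesized slope of 0).
def one_sided_p(prob_gt_abs_t, t_stat, alternative):
    # alternative = "less"    for Ha: beta1 < 0 (one-sided test I)
    # alternative = "greater" for Ha: beta1 > 0 (one-sided test II)
    sign_favors_alt = t_stat < 0 if alternative == "less" else t_stat > 0
    if sign_favors_alt:
        return prob_gt_abs_t / 2       # halve the two-sided p-value
    return 1 - prob_gt_abs_t / 2       # sign favors the null instead

print(one_sided_p(0.04, 2.3, "greater"))  # 0.02
print(one_sided_p(0.04, 2.3, "less"))     # 0.98
```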
8. R Squared: The R squared statistic measures how much of the variability in the response the regression model explains. R squared ranges from 0 to 1, with higher R squared values meaning that the regression model is explaining more of the variability in the response.
$R^2 = \frac{\sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2} = 1 - \frac{\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}$
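A short computation of R squared as 1 minus the ratio of residual to total variability (Python, not from the notes; same illustrative data and estimates as above):

```python
# R squared = 1 - SSE/SST: the fraction of the variability in Y
# explained by the regression. Data and estimates are illustrative.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least squares estimates for this toy data
y_bar = sum(y) / len(y)
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual
sst = sum((yi - y_bar) ** 2 for yi in y)                        # total
r_squared = 1 - sse / sst
print(r_squared)  # approximately 0.6 for this data
```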
9. Prediction Intervals: The best prediction for the $Y$ of a new observation with $X = X^*$ is the estimated mean of $Y$ given $X^*$: $\hat{Y} = b_0 + b_1 X^*$.
The 95% prediction interval for the $Y$ of a new observation with $X = X^*$ is an interval that will contain the value of $Y$ most (95%) of the time. The formula for the prediction interval is:
$\hat{Y} \pm t_{.025, n-2} \cdot s_{pred}$, where
$\hat{Y} = b_0 + b_1 X^*$;
$s_{pred} = s_e \sqrt{1 + \frac{1}{n} + \frac{(X^* - \bar{X})^2}{(n-1)s_X^2}}$;
$s_X^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$.
When n is large (say n>30), the 95% prediction interval is approximately equal to
$\hat{Y} \pm 2 s_e$.
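A sketch of the prediction interval computation (Python, not from the notes; same illustrative data as above, with the rough cutoff 2 in place of the exact $t_{.025, n-2}$ value, and using $(n-1)s_X^2 = \sum_i (X_i - \bar{X})^2$):

```python
import math

# 95% prediction interval for a new observation at X = x_star:
# y_hat +/- 2 * s_pred, with s_pred = RMSE * sqrt(1 + 1/n + (x*-Xbar)^2/Sxx).
# Data, estimates, and x_star are illustrative only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6  # least squares estimates for this toy data
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)  # equals (n-1) * sample var of X
rmse = math.sqrt(sum((yi - (b0 + b1 * xi)) ** 2
                     for xi, yi in zip(x, y)) / (n - 2))

x_star = 4
y_hat = b0 + b1 * x_star  # best prediction for the new observation
s_pred = rmse * math.sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)
interval = (y_hat - 2 * s_pred, y_hat + 2 * s_pred)
print(interval)
```

Note how $s_{pred}$ grows as $X^*$ moves away from $\bar{X}$: predictions far from the center of the data are less precise.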
10. Cautions in Interpreting Regression Results:
(i) The regression of $Y$ on $X$ measures the association between $Y$ and $X$. A strong association between $Y$ and $X$ does not necessarily mean that changes in $X$ cause changes in $Y$. A strong association between $Y$ and $X$ could be explained by $Y$ causing changes in $X$ or by there being a lurking variable that is related to both $X$ and $Y$.
(ii) The regression model cannot be relied on to make accurate predictions for the $Y$ of observations with $X$ outside the range of the observed $X$'s, $[\min(X_1, \ldots, X_n), \max(X_1, \ldots, X_n)]$. The prediction intervals for the $Y$ of observations with $X$ outside the range of the observed $X$'s are also not reliable. Trying to use the regression model to predict the $Y$ of observations with $X$ outside the range of the observed $X$'s is called extrapolation.