AP Statistics - Miami Killian Senior High School



AP Statistics Chapter 14 Inference for Regression: The regression slope

Recall the concepts from relationships between two quantitative variables:

• Scatterplots

• Least-squares regression line

• Outliers - points which lie far away from the overall linear pattern

• Influential observations - points which, if removed, would cause a large change in the position of the regression line

Inference is not appropriate if there are influential points.

Compute the correlation coefficient r and r² in order to determine the strength of the linear relationship. The r² value is the proportion of the observed variation in y that is explained by the least-squares regression of y on x.

The regression model

ŷ = a + bx

The slope, b, and the y-intercept, a, are statistics because they are computed from the sample data.
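As an added illustration (not part of the original handout), here is a short Python sketch, assuming numpy is available, that computes a, b, r, and r² from their formulas for a small made-up data set:

    import numpy as np

    # small made-up data set, for illustration only
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    n = len(x)

    x_bar, y_bar = x.mean(), y.mean()
    s_x, s_y = x.std(ddof=1), y.std(ddof=1)

    # correlation coefficient r, then the least-squares slope and intercept
    r = np.sum((x - x_bar) * (y - y_bar)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x        # slope: a statistic computed from the sample
    a = y_bar - b * x_bar    # y-intercept: also a statistic

    print(f"r = {r:.4f}, r^2 = {r**2:.4f}")
    print(f"y-hat = {a:.4f} + {b:.4f}x")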

Assumptions for Regression Inference

We have n observations on an explanatory variable, x, and a response variable, y. The goal of inference is to predict the behavior of y for given values of x.

• For any fixed value of x, the response, y, varies according to a normal distribution. Repeated responses, y, are independent of each other (see the diagram on p. 757).

• The mean response, μ_y, has a straight-line relationship with x, given by the model μ_y = α + βx, where the slope β and the y-intercept α are unknown parameters.

• The standard deviation of y, σ, is the same for all values of x. The value of σ is unknown. Small σ = points are tightly clustered around the true regression line. Large σ = points are widely scattered around the true regression line.

Ways to check these assumptions will be discussed later.
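To make these conditions concrete, here is a small simulation sketch in Python (an addition for illustration; numpy is assumed, and the parameter values are hypothetical). Each response is drawn from a normal distribution whose mean lies on the true line μ_y = α + βx and whose standard deviation σ is the same for every x:

    import numpy as np

    rng = np.random.default_rng(1)

    # hypothetical parameter values, chosen only for illustration
    alpha, beta, sigma = 2.0, 0.5, 1.0

    x = np.repeat(np.arange(1, 11), 20)      # several x values, with repeated observations
    mu_y = alpha + beta * x                  # mean responses lie on the true regression line
    y = rng.normal(loc=mu_y, scale=sigma)    # independent normal responses, constant sigma

    # at each fixed x, the responses should be roughly normal with sd near sigma
    for xv in (1, 5, 10):
        print(xv, y[x == xv].mean().round(2), y[x == xv].std(ddof=1).round(2))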

14.1 Inference About the Model

ŷ = a + bx is the regression line computed from the sample.

μ_y = α + βx is the true regression line.

The value a is an unbiased estimator of α, the true intercept.

The value b is an unbiased estimator of β, the true slope.

The standard deviation σ describes the variability of the response, y, about the true regression line. The residuals are used to estimate this variability.

Recall that the residuals are the vertical deviations of the points from the least-squares regression line.

Residual = observed y - predicted y

= y - ŷ

The standard deviation, σ, can be estimated by the sample standard deviation of the residuals, called the standard error about the line, s.

s = √( Σ residual² / (n - 2) ) = √( Σ (y - ŷ)² / (n - 2) )

We use s to estimate the unknown σ; strictly speaking, it is s² that is an unbiased estimator of σ².
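A minimal sketch of this formula in Python (assuming numpy; the observed and predicted values are hypothetical):

    import numpy as np

    # hypothetical observed and predicted responses, for illustration only
    y     = np.array([5.1, 7.2, 8.9, 11.3, 12.8])
    y_hat = np.array([5.0, 7.5, 9.0, 10.9, 12.9])

    residuals = y - y_hat                 # observed y - predicted y
    n = len(y)

    # standard error about the line: s = sqrt( sum(residual^2) / (n - 2) )
    s = np.sqrt(np.sum(residuals**2) / (n - 2))
    print(f"s = {s:.4f}")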

Example

Archaeopteryx is an extinct beast having feathers like a bird but teeth and a long bony tail like a reptile. Here are the lengths in centimeters of the femur (a leg bone) and the humerus (a bone in the upper arm) for the five fossil specimens that preserve both bones:

|Femur (cm)   |38 |56 |59 |64 |74 |

|Humerus (cm) |41 |63 |70 |72 |84 |

The strong linear relationship between the lengths of the two bones helped persuade scientists that all five specimens belong to the same species.

a) Graphically examine the data, with femur length as the explanatory variable. Calculate the correlation r and the equation of the least-squares regression line. Do you think that femur length will allow good prediction of humerus length?

b) Complete the following table with the residual for each data point. (remember - your calculator automatically calculates the residuals)

|Femur |38 |56 |59 |64 |74 |

|Residual | | | | | |

Check that the sum of the residuals is 0.

c) The model for regression inference has three parameters: α, β, and σ. Estimate these parameters from the data. (To get the sum of the squared residuals, run 1-Var Stats on the list of residuals.)

d) Explain in words what the slope β says about the bones in Archaeopteryx.
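If software is available, the calculations in parts (a) through (c) can be checked with a short Python sketch like the one below (numpy assumed; the variable names are mine). It follows the same formulas given above rather than a built-in regression routine:

    import numpy as np

    femur   = np.array([38.0, 56.0, 59.0, 64.0, 74.0])   # explanatory variable x
    humerus = np.array([41.0, 63.0, 70.0, 72.0, 84.0])   # response variable y
    n = len(femur)

    x_bar, y_bar = femur.mean(), humerus.mean()
    s_x, s_y = femur.std(ddof=1), humerus.std(ddof=1)

    r = np.sum((femur - x_bar) * (humerus - y_bar)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x              # estimates the true slope (beta)
    a = y_bar - b * x_bar          # estimates the true intercept (alpha)

    y_hat = a + b * femur
    residuals = humerus - y_hat    # should sum to (essentially) zero
    s = np.sqrt(np.sum(residuals**2) / (n - 2))   # estimates sigma

    print(f"r = {r:.4f},  y-hat = {a:.4f} + {b:.4f}x")
    print("residuals:", residuals.round(4), " sum =", residuals.sum().round(6))
    print(f"s = {s:.4f}")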

Confidence Intervals for Regression Slope

The slope β is usually the most important parameter in practical regression problems.

The slope is the rate of change of the mean response as the explanatory variable increases.

A level C confidence interval for the slope β of the true regression line is

b ± t* SE_b

where

SE_b = s / √( Σ (x - x̄)² )

and t* is the upper (1 - C)/2 critical value from the t distribution with n - 2 degrees of freedom.

Note: regression software will give you the values of b and SE_b.
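As an illustration of the interval formula (not from the handout), here is a Python sketch; scipy is assumed available for the t* critical value, and the values of b, SE_b, and n are hypothetical software output:

    from scipy import stats

    # hypothetical values, for illustration only
    b, SE_b, n = 2.50, 0.40, 20

    C = 0.95
    t_star = stats.t.ppf(1 - (1 - C) / 2, df=n - 2)   # upper (1-C)/2 critical value, n-2 df

    lower, upper = b - t_star * SE_b, b + t_star * SE_b
    print(f"{C:.0%} confidence interval for the slope: ({lower:.4f}, {upper:.4f})")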

Testing the hypothesis of no linear relationship

If the regression line has a slope of zero, then it is horizontal. This means that the mean of y does not change when x changes.

The null hypothesis says that there is no linear relationship between x and y. This is equivalent to saying that there is no correlation between x and y: because the slope satisfies b = r(s_y/s_x), the slope is zero exactly when r = 0.

To test the hypothesis H0: β = 0, use the test statistic

t = b / SE_b

where t has n - 2 degrees of freedom.

Statistical software usually gives a 2-sided P-value. So, if Ha is one-sided, divide the P-value from the output by 2.
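A sketch of the test computation in Python (scipy assumed; the values of b, SE_b, and n are hypothetical software output):

    from scipy import stats

    # hypothetical values, for illustration only
    b, SE_b, n = 2.50, 0.40, 20

    t = b / SE_b                               # test statistic for H0: beta = 0
    df = n - 2

    p_two_sided = 2 * stats.t.sf(abs(t), df)   # what software usually reports
    p_one_sided = p_two_sided / 2              # for a one-sided Ha, divide by 2

    print(f"t = {t:.3f}, df = {df}, two-sided P = {p_two_sided:.4f}, one-sided P = {p_one_sided:.4f}")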

Example

Refer to the Archaeopteryx example. Here is part of the regression output from a statistical software package:

|          |Coef    |Std. Err |t.stat  |p.value |

|Intercept |-3.6596 |4.4590   |-0.8207 |0.4719  |

|Femur     | 1.1969 |0.0751   |*       |*       |

a) What is the equation of the least-squares regression line?

b) Use the output to determine the value of the t statistic in testing the null hypothesis

H0: β = 0

c) What are the degrees of freedom for t?

d) Approximate the P-value of the test against the alternative Ha: β > 0.

Explain the conclusion to this test in plain language.

e) Give a 95% confidence interval for the slope β of the true regression line.
