Kenwood Academy



AP Statistics

Chapter 3 Notes – Examining Relationships

|3.2 Least-Squares Regression |Objectives: |

| |Part I: 35, 37, 39, 41 |

| |Interpret the slope and y intercept of a least squares regression line in context. |

| |Use the least-squares regression line to predict y for a given x. |

| |Explain the dangers of extrapolation. |

| |Part II: 43, 45, 47, 53 |

| |Calculate and interpret residuals in context. |

| |Explain the concept of least squares. |

| |Use technology to find a least-squares regression line. |

| |Find the slope and intercept of the least-squares regression line from the means and standard deviations of x and y and their correlation. |

| |Part III: 49, 54, 56, 58–61 |

| |Construct and interpret residual plots to assess if a linear model is appropriate. |

| |Use the standard deviation of the residuals to assess how well the line fits the data. |

| |Use r² to assess how well the line fits the data. |

| |Interpret the standard deviation of the residuals and r² in context. |

| |Part IV: 63, 65, 68, 69, 71–78 |

| |Identify the equation of a least-squares regression line from computer output. |

| |Explain why association doesn’t imply causation. |

| |Recognize how the slope, y intercept, standard deviation of the residuals, and r² are influenced by extreme observations. |

| | |

| | |

| |Least-squares regression is a method for finding a line that summarizes the relationship between two variables. |

| | |

|Regression line |Regression line – A straight line that describes how a response variable y changes as an explanatory variable x changes. |

|See Example pages 164-165 |(Sound familiar?) We often use a regression line to predict the value of y for a given value of x. |

| | |

| |If we believe that the data show a linear trend, then it would be appropriate to fit a least-squares regression line (LSRL) to the data. |

| | |

| | |

| | |

| |When nonexercise activity = 800 cal, our line predicts a fat gain of about 0.8 kg after 8 weeks. |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Interpreting a Regression Line | |

| | |

| |Different people might draw different lines by eye on a scatterplot. This is especially true when the points are widely scattered. So we need a regression line that isn’t dependent on our guess. No line will pass through all the points, but we want one that is as close as possible. |

| | |

| | |

| | |

| | |

| | |

| |Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis). A regression line relating y to x has an equation of the form |

| |ŷ = a + bx |

| |In this equation, |

| |ŷ (read “y hat”) is the predicted value of the response variable y for a given value of the explanatory variable x. |

| |b is the slope, the amount by which y is predicted to change when x increases by one unit. |

| |a is the y intercept, the predicted value of y when x = 0. |

| | |

| | |

| | |

| |The slope b = -0.00344 tells us that the amount of fat gained is predicted to go down by 0.00344 kg for each added calorie of NEA. |

| | |

| |The y intercept a = 3.505 kg is the fat gain estimated by this model if NEA does not change when a person overeats. |
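To make these interpretations concrete, here is a minimal Python sketch (the function name is ours, not the textbook's) that encodes the fat gain line ŷ = 3.505 − 0.00344x from the example above:

```python
# A minimal sketch of the fat-gain regression line from the NEA example.
# The slope and intercept are the values interpreted above.

def predict_fat_gain(nea_cal: float) -> float:
    """Predicted fat gain (kg) for a given change in NEA (calories)."""
    a = 3.505      # y intercept: estimated fat gain if NEA does not change
    b = -0.00344   # slope: predicted change in fat gain per added calorie of NEA
    return a + b * nea_cal

print(round(predict_fat_gain(800), 3))  # 0.753 -- the "about 0.8 kg" prediction above
```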

| | |

| | |

| | |

| | |

| | |

| | |

|Predictions |We can use a regression line to predict the response ŷ for a specific value of the explanatory variable x. |

| | |

| | |

| |Use the NEA and fat gain regression line to predict the fat gain for a person whose NEA increases by 400 cal when she overeats. |
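| |Working it out (plugging x = 400 into the line above): ŷ = 3.505 − 0.00344(400) = 3.505 − 1.376 = 2.129, so the model predicts a fat gain of about 2.13 kg. |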

| | |

| | |

|Example: | |

| | |

| | |

| | |

| | |

| | |

| | |

|Extrapolation | |

| | |

| |We can use a regression line to predict the response ŷ for a specific value of the explanatory variable x. The accuracy of the prediction depends on how much the data scatter about the line. While we can substitute any value of x into the equation of the regression line, we must exercise caution in making predictions outside the observed values of x. |

| | |

| |Extrapolation is the use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate. |

| | |

| |Check Your Understanding page 167 |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Residuals |Residual is the difference between an observed value of the response variable and the value predicted by the regression line. |

| |residual = observed y – predicted y = y – ŷ |

| |Because the residuals show how far the data fall from our regression line, examining the residuals helps assess how well the line describes the data. |

| | |

| | |

| |[figure: scatterplot with a residual shown as the vertical distance from a point to the line] |

| | |

| | |

|Least-Squares Regression Line |Different regression lines produce different residuals. The regression line we want is the one that minimizes the sum of the squared residuals. |

| | |

| |The least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible.|
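A quick way to see the "least squares" idea is to compare the sum of squared residuals for two candidate lines; the data and the candidate slopes and intercepts below are made up for illustration:

```python
# A small sketch (made-up data) of the least-squares idea: among candidate
# lines, the LSRL is the one that minimizes the sum of squared residuals.

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def sum_sq_resid(a: float, b: float) -> float:
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

print(sum_sq_resid(0.0, 2.0))   # one candidate line: y-hat = 0 + 2x
print(sum_sq_resid(0.5, 1.8))   # another candidate; the smaller sum fits better
```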

| | |

| |[figure: the squared residuals shown as squares on the scatterplot] |

| | |

| | |

|Equation of the LSRL |Equation of the LSRL: ŷ = a + bx |

| |With slope: b = r(sy/sx) |

| |And intercept: a = ȳ – b·x̄ |

| | |

| |Every LSRL passes through the point (x̄, ȳ), and the slope is equal to the product of the correlation and the quotient of the standard deviations (sy/sx). |

| |The slope is the rate of change, or the amount of change in ŷ when x increases by 1. |

| |The intercept of the regression line is the value of ŷ when x = 0. |
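Here is a short sketch of these formulas in Python; the summary statistics (r, the means, and the standard deviations) are invented values, not from the NEA example:

```python
# Sketch of the summary-statistics formulas: b = r * (sy / sx), a = y-bar - b * x-bar.
# All of the numbers below are made up for illustration.

r = 0.80                   # correlation between x and y
x_bar, s_x = 50.0, 10.0    # mean and standard deviation of x
y_bar, s_y = 20.0, 4.0     # mean and standard deviation of y

b = r * (s_y / s_x)        # slope
a = y_bar - b * x_bar      # intercept; forces the line through (x-bar, y-bar)
print(f"y-hat = {a:.2f} + {b:.3f}x")   # y-hat = 4.00 + 0.320x
```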

| | |

|Least-squares lines on the calculator |Technology Corner pages 170-171 |

| |(AP Exam Tip page 172) |

|Example : | |

| |Check Your Understanding page 171 |

| | |

| | |

| | |

| |Page 192 #48 |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Residual Plot |A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data. |

| |A residual plot magnifies the deviations of the points from the line, making it easier to see unusual observations and patterns. |

| |The residual plot should show no obvious patterns. |

| |The residuals should be relatively small in size. |

| | |

| |What to look for when you are examining the residuals: |

| |Uniform scatter of points indicates that the regression line fits the data well, so the line is a good model. |

| | |

| |[figure: residual plot with uniform scatter] |

| |A curved pattern shows that the relationship is not linear. |

| | |

| | |

| |[figure: residual plot with a curved pattern] |

| |Increasing or decreasing spread about the line as x increases indicates that prediction of y will be less accurate for larger x. |

| | |

| |[figure: residual plot whose spread increases as x increases] |
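As a sketch of what the calculator or software draws, here is a residual plot built with matplotlib; the residuals below are made up for illustration:

```python
import matplotlib.pyplot as plt

# Sketch of a residual plot (made-up values): residuals on the vertical axis,
# the explanatory variable on the horizontal, with a reference line at 0.

x = [1, 2, 3, 4, 5, 6]
residuals = [0.4, -0.2, 0.1, -0.5, 0.3, -0.1]

plt.scatter(x, residuals)
plt.axhline(0, color="red")   # residuals should scatter evenly around this line
plt.xlabel("x (explanatory variable)")
plt.ylabel("residual")
plt.show()
```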

| |Check Your Understanding page 176 |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| |The residuals from the least-squares line have a special property: the mean of the least-squares residuals is always 0. That’s because the positive and negative residuals “balance each other out”. |

| | |

|Example: |To tell how far off the predictions are, on average, we use the standard deviation of the residuals. |

| |The standard deviation of the residuals (s) is given by |

| | |

| |s = √(Σ residuals² / (n – 2)) = √(Σ(yᵢ – ŷᵢ)² / (n – 2)) |

| | |

| |This gives the typical size of a prediction error: roughly, the average vertical distance of the data points from the regression line. |
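A minimal sketch of this computation, using made-up residuals (note that they sum to 0, as LSRL residuals always do):

```python
import math

# Sketch (made-up residuals): s = sqrt(sum(residual^2) / (n - 2)).

residuals = [0.5, -0.3, 0.1, -0.4, 0.1]   # sum to 0, as they must for a LSRL
n = len(residuals)
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(round(s, 3))   # about 0.416: the typical size of a prediction error
```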

| | |

| | |

| |Technology Corner page 178 – Residual Plots and s on the calculator |

| | |

| | |

| |Check Your Understanding page 179 |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Example: | |

| | |

| | |

| | |

| |The coefficient of determination, r², measures how well the least-squares regression line predicts values of the response y. Squaring the correlation gives us a better idea of the strength of the association. |

| |Perfect correlation means the points lie exactly on a line (r = 1 or r = -1). This means r² = 1 (100%), and all of the variation in one variable is accounted for by the linear relationship with the other variable. If r = -0.7 or 0.7, then r² = 0.49 (49%), so about half of the variation is accounted for by the linear relationship. |

| |r² is an overall measure of how successful the regression line is in relating y to x. When you report a regression, be sure to give r² as a measure of how successful the regression was in explaining the response. |

| | |

| |The coefficient of determination r² is the fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r² using the following formula: |

|The role of r² in regression |r² = 1 – SSE/SST |

| |where SSE = Σ(yᵢ – ŷᵢ)² (the sum of squared residuals) |

| |and SST = Σ(yᵢ – ȳ)² (the total variation in the observed y-values) |
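A small sketch of this formula with invented observed values and predictions:

```python
# Sketch of r^2 = 1 - SSE/SST with made-up data.

y     = [2.0, 4.0, 5.0, 4.0, 5.0]   # observed responses
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]   # predictions from the regression line

y_bar = sum(y) / len(y)
SSE = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))   # squared prediction errors
SST = sum((yi - y_bar) ** 2 for yi in y)                  # total variation in y
r_sq = 1 - SSE / SST
print(f"{r_sq:.0%} of the variation in y is accounted for by the line")  # 60%
```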

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| |When interpreting r², write “_____% of the variation in [response variable name] is accounted for by the regression line.” |

| | |

| | |

| |Check Your Understanding page 181 |

| | |

| | |

| | |

| | |

| | |

| | |

|Example: | |

| | |

| | |

| | |

| | |

| | |

|Interpreting Computer Regression Output | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Example Pages 182-183: | |

| | |

| | |

| | |

| | |

| | |

| | |

|Correlation and Regression Wisdom |1. The distinction between explanatory and response variables is important in regression. |

| |2. Correlation and regression lines describe only linear relationships. |

| |3. Correlation and least-squares regression lines are not resistant. |

| |4. Association does not imply causation. |

| |An association between an explanatory variable x and a response variable y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y. |

| | |

| | |

| |An outlier is an observation that lies outside the overall pattern of the other observations. Points that are outliers in the y direction but not the x direction of a scatterplot have large residuals. Other outliers may not have large residuals. |

| | |

|Example page 187 |An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation. Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line. |

| | |

| |[figure: scatterplot showing an outlier and an influential observation] |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Example: |A serious study once found that people with two cars live longer than people who only own one car. Owning three cars is even better, and so on. There is a substantial positive correlation between number of cars x and length of life y. Why? |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

| | |

|Summary |

(Textbox from the r² example: if we use the mean backpack weight as our prediction, the sum of the squared residuals is SST = 83.87.)
