Chapter 3



AP Stats 3.3: Correlation and Regression Wisdom

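Limitations of correlation & regression:

1) They describe only linear relationships.

2) Extrapolation (predicting outside the range of the observed x-values) can be unreliable.

3) They are not resistant: a few unusual points can change r and the regression line a lot.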

Outliers & Influential Observations In Regression

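outlier - a point that lies away from the overall pattern of the observations. Outliers in the y direction have large residuals. Use the “oval rule” to spot them (points falling outside an oval drawn around the main cloud of data). An outlier is not necessarily influential.

influential observation - a point whose removal markedly changes the calculations (slope, intercept, r). Outliers in the x direction are often influential.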

Example #1:

Height (x) : 62 64 66 70 72 68 60 67 84

Weight (y) : 125 130 140 160 180 155 105 220 270

1) scatterplot

2) regression line + graph

3) r, r²

4) possible outliers?

5) influential? redo the calculations without each possible outlier to check

Lurking variable - a variable that is neither the explanatory nor the response variable but that influences the relationship between them

Example #2:

Year    # of Methodist Ministers in Boston    Amt. of Cuban Rum Imported into Boston
1860    63                                    8376
1865    48                                    6406
1870    53                                    7005
1875    64                                    8486
1880    72                                    9595
1885    80                                    10643
1890    85                                    11265
1895    76                                    10071
1900    80                                    10547
1905    83                                    11008
1910    105                                   13885
1915    140                                   18559

Describe the relationship.

Is there a correlation between the number of ministers and the amount of rum imported?

Homework: p. 238 #59; p. 242 #63-65

Review: p. 251 #77-79

1) Height (x) : 60 64 68 72 63 65

Weight (y) : 100 130 150 200 100 220

a) scatterplot

b) regression line + graph

c) what is the slope and interpret it

d) what is the y-intercept and interpret it

e) r

f) r² and explain

g) possible outliers?

h) influential? redo the calculations to check

i) residual plot

j) predict weight for height of 24 inches

k) predict height for weight of 250 lbs.

2) For the following data, use the oval rule to determine outliers. Test each outlier to determine if it is influential or not.

SAT-M: 400 500 600 650 550 450 500 550 600 650 400 750 200

SAT-V: 450 510 450 700 500 400 520 450 600 600 750 750 220

a. Draw the scatterplot of the regression of SAT-V on SAT-M. Interpret it. Use the oval rule to determine outliers. Test each outlier to determine if it is influential or not.


b. Create a residual plot. Use it to interpret the linear fit.


c. Interpret the linear fit of SAT-V on SAT-M using r.

d. Find and interpret r².

e. Find r for the regression of SAT-M on SAT-V.

f. Double each SAT-M score and find r for SAT-V on SAT-M.

g. After f), add 50 points to each SAT-V score and find r for SAT-M on SAT-V.

h. What can we say about r and linear transformations?

3) Is there a relationship between scores on an Auto Mechanic Aptitude test and the number of hours of TV watched as grade-school children? A group of mechanics was surveyed. Their TV hours are normally distributed with a mean of 20 hours and a standard deviation of 2 hours. Their average test score is 270 with a standard deviation of 35. If the correlation is 0.5110:

a) Find the equation of the best-fit line.

b) Find r² and explain its meaning.

c) Predict the test score for an auto mechanic who watched TV 37 hours.
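Problem 3 gives only summary statistics, so the line comes from the standard formulas b = r·sy/sx and a = ȳ - b·x̄ (the least-squares line passes through (x̄, ȳ)). The sketch below assumes TV hours is x and test score is y, as part (c) implies. It also reports how far 37 hours is from the mean, since a value that far out makes the prediction in (c) an extrapolation.

```python
# Summary statistics from problem 3 (x = TV hours, y = test score)
x_bar, s_x = 20, 2
y_bar, s_y = 270, 35
r = 0.5110

b = r * s_y / s_x      # slope
a = y_bar - b * x_bar  # intercept: the line passes through (x-bar, y-bar)
print(f"(a) y-hat = {a:.2f} + {b:.4f}x")

# r^2: fraction of the variation in test scores explained by the linear fit
print(f"(b) r^2 = {r ** 2:.4f}")

x_new = 37
y_hat_37 = a + b * x_new
print(f"(c) predicted score at {x_new} h: {y_hat_37:.2f}")
print(f"    {x_new} h is {(x_new - x_bar) / s_x:.1f} SDs above the mean TV time")
```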

4) If the best-fit line for predicting weight from height is ŷ = 5x - 120, find the correlation if x̄ = 10, sx = 2, ȳ = 100, and sy = 30.
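Problem 4 only needs the slope: rearranging b = r·sy/sx gives r = b·sx/sy. The sketch also runs a consistency check worth doing by hand, since a least-squares line always passes through (x̄, ȳ): plugging x̄ = 10 into ŷ = 5x - 120 gives -70 rather than 100, so the given intercept and means do not line up, although that does not affect r.

```python
# From y-hat = 5x - 120: slope b = 5; given s_x = 2, s_y = 30
b, s_x, s_y = 5, 2, 30
r = b * s_x / s_y  # rearranged from b = r * s_y / s_x
print(f"r = {r:.3f}")

# Consistency check: a least-squares line passes through (x-bar, y-bar)
x_bar, y_bar = 10, 100
print(f"line at x-bar: {5 * x_bar - 120}, given y-bar: {y_bar}")
```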

5) Shown below is output from Minitab:

Dependent variable is: height

R squared = 98.9%    R squared (adjusted) = 98.8%

s = 0.256 with 12 - 2 degrees of freedom

Variable    Coefficient    s.e. of Coeff    t-ratio    prob
Constant    64.9283        0.5084           128        0.0001
Temp        0.634965       0.0214           29.7       0.0001

a) Find the best-fit line.

b) How many observations were used to create the output?

c) Interpret the relationship.

d) If the independent variable is “age”, find the residual for the observation with age 40 and “height” 100.
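The numbers in problem 5 can be read off the output and checked with a short sketch. It assumes, as part (d) states, that the row labeled "Temp" is the independent variable being called "age". In simple regression the degrees of freedom are n - 2, and r takes the sign of the slope.

```python
from math import sqrt

# Values read from the regression output
intercept = 64.9283
slope = 0.634965
r_squared = 0.989
df = 12 - 2

n = df + 2                          # simple regression: df = n - 2
r = sqrt(r_squared)                 # positive, because the slope is positive
predicted = intercept + slope * 40  # "age" 40
residual = 100 - predicted          # residual = observed - predicted
print(f"(a) y-hat = {intercept} + {slope}x")
print(f"(b) n = {n}, r = {r:.4f}")
print(f"(d) predicted {predicted:.4f}, residual {residual:.4f}")
```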
