Section 10



Chapter 4

The student will be able to:

1. Use the scatter diagram and linear correlation coefficient to determine whether a linear relationship exists between two variables.

2. Determine the regression line for bivariate data.

3. Test hypotheses about correlation coefficients.

4. Understand that correlated data may not have a causal relationship.

5. Determine the best prediction relative to correlation.

Section 4.1 – Scatter Diagrams and Correlation

Objectives

1. Draw and interpret scatter diagrams

2. Describe the properties of the linear correlation coefficient

3. Compute and interpret the linear correlation coefficient

4. Determine whether a linear relation exists between two variables

5. Explain the difference between correlation and causation

Objective 1 – Draw and interpret scatter diagrams

Univariate data – One variable

Bivariate data – Two variables

|Credit Score |Interest Rate (%) |

|545 |19 |

|595 |18 |

|640 |12 |

|675 |9 |

|705 |7 |

|750 |5 |

The response variable is the variable whose value can be explained by the value of the explanatory or predictor variable.

A scatter diagram is a graph that shows the relationship between two quantitative variables measured on the same individual. Each individual in the data set is represented by a point. The explanatory variable is plotted on the horizontal axis, and the response variable is plotted on the vertical axis.

Correlation – There is a correlation between two variables when one of them is related to the other in some way.

Example

|Temperature |Cricket Chirps |

|83 |1025 |

|72 |960 |

|88 |1200 |

|84 |1100 |

|80 |900 |

|76 |860 |

|70 |880 |

|93 |1180 |

Enter the data above into L1 and L2 and draw a scatter plot

[pic]

Looking at a scatter diagram can help you determine if the variables have a linear relationship.

(pg. 192)

Two variables that are linearly related are positively associated when above-average values of one variable are associated with above-average values of the other variable and below-average values of one variable are associated with below-average values of the other variable. That is, two variables are positively associated if, whenever the value of one variable increases, the value of the other variable also increases.

As the explanatory variable goes up the response variable goes up and at a constant rate.

Two variables that are linearly related are negatively associated when above-average values of one variable are associated with below-average values of the other variable. That is, two variables are negatively associated if, whenever the value of one variable increases, the value of the other variable decreases.

Looking at a scatter plot of the data can help you determine if the two variables are positively associated, negatively associated, or have no association. (pg. 194)

Objective 2 - Describe the Properties of the Linear Correlation Coefficient

The linear correlation coefficient measures the strength and direction of the linear relation between two quantitative variables. The Greek letter ρ (rho) represents the population correlation coefficient, and r represents the sample correlation coefficient.

r = Σ [ ((xi − x̄) / sx) · ((yi − ȳ) / sy) ] / (n − 1) (round to 3 decimal places)

where

x̄ is the sample mean of the explanatory variable

sx is the sample standard deviation of the explanatory variable

ȳ is the sample mean of the response variable

sy is the sample standard deviation of the response variable

n is the number of individuals in the sample

[pic]

Sample scatter plots with associated value for r

[pic]

Objective 3 - Compute and Interpret the Linear Correlation Coefficient

[pic]

By default, LinReg assumes the explanatory variable is in L1 and the response variable is in L2. If the explanatory and response variables are stored in other lists, enter those lists after LinReg, for example

LinReg (ax+b) L3, L4

NOTE: the formula chart does not mention LinReg. Also, r can be found using LinRegTTest (see instructions below)

Example

Find r for the following set of data.

|Temperature |Cricket Chirps |

|83 |1025 |

|72 |960 |

|88 |1200 |

|84 |1100 |

|80 |900 |

|76 |860 |

|70 |880 |

|93 |1180 |
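As a cross-check on the calculator, r can also be computed directly from its definition: standardize each x and each y, multiply the pairs, add the products, and divide by n − 1. The following is a minimal Python sketch of that calculation, not part of the calculator workflow for this course; the function name `correlation` is our own.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample linear correlation coefficient r, computed from z-scores."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

temperature = [83, 72, 88, 84, 80, 76, 70, 93]
chirps = [1025, 960, 1200, 1100, 900, 860, 880, 1180]

r = correlation(temperature, chirps)
print(round(r, 3))
```

For this data the result rounds to 0.864, which should agree with the value of r shown by LinReg (ax+b) once diagnostics are turned on.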

Objective 4 - Determine whether a linear relation exists between two variables

Method 1: P-Value approach

Since the formula chart specifically mentions LinRegTTest, we will prefer the P-value approach instead of the critical value approach.

Note this approach is not listed in the book.

[pic]

[pic]

Method 2: Critical value approach

Following is the critical value approach which is the approach given in the book

[pic]

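Critical Values for Correlation Coefficient (Table II Appendix A from book)

Note: as listed here, the rows are indexed by n − 2 (the degrees of freedom), where n is the number of data pairs; for example, 10 pairs of data use the row labeled 8.

|n − 2 |Critical Value |n − 2 |Critical Value |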

|1 |0.997 |21 |0.413 |

|2 |0.950 |22 |0.404 |

|3 |0.878 |23 |0.396 |

|4 |0.811 |24 |0.388 |

|5 |0.754 |25 |0.381 |

|6 |0.707 |26 |0.374 |

|7 |0.666 |27 |0.367 |

|8 |0.632 |28 |0.361 |

|9 |0.602 |29 |0.355 |

|10 |0.576 |30 |0.349 |

|11 |0.553 |40 |0.304 |

|12 |0.532 |50 |0.273 |

|13 |0.514 |60 |0.250 |

|14 |0.497 |70 |0.232 |

|15 |0.482 |80 |0.217 |

|16 |0.468 |90 |0.205 |

|17 |0.456 |100 |0.195 |

|18 |0.444 | | |

|19 |0.433 | | |

|20 |0.423 | | |

Example

Assume that 20 pairs of data result in a value of r = 0.855. Is there a linear relation between x and y?

Example

Assume that 10 pairs of data result in a value of r = 0.601. Is there a linear relation between x and y?
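The critical value approach for the two examples above can be sketched as a table lookup. This Python sketch copies four entries from the table and assumes that the table's row label is the degrees of freedom, n − 2, rather than the number of pairs; that reading is an assumption to check against the book's Table II. The names `CRITICAL_VALUE_BY_DF` and `linear_relation_exists` are our own.

```python
# Critical values copied from the table above.
# ASSUMPTION: the table's row label is the degrees of freedom, n - 2.
CRITICAL_VALUE_BY_DF = {8: 0.632, 10: 0.576, 18: 0.444, 20: 0.423}

def linear_relation_exists(r, n):
    """Return (decision, critical value) for n data pairs with correlation r."""
    critical = CRITICAL_VALUE_BY_DF[n - 2]
    return abs(r) > critical, critical

print(linear_relation_exists(0.855, 20))  # 20 pairs: compare |r| with 0.444
print(linear_relation_exists(0.601, 10))  # 10 pairs: compare |r| with 0.632
```

Under this reading, r = 0.601 from 10 pairs falls just short of its critical value. Reading the row label as n instead would give 0.576 and the opposite conclusion, so the indexing matters for borderline cases.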

Example

Is there a linear relationship between temperature and cricket chirps? Use the P-value approach and α = 0.05. (Use the temperature and cricket chirp data from the earlier examples.)

Example

|Length (cm) |Weight |

|139 |110 |

|138 |60 |

|139 |90 |

|120.5 |60 |

|149 |85 |

|141 |100 |

|141 |95 |

|150 |85 |

|166 |155 |

|151.5 |140 |

|129.5 |105 |

|150 |110 |

Does a linear relationship exist between the weight of the bear and its length? Use the P-value approach and α = 0.05.

Objective 5 - Explain the difference between correlation and causation

Note: do not read “causal” as “casual”; they are not the same!

Causation can only be established through designed experiments, not observational studies.

A lurking variable is related to both the explanatory and response variables. Because of a lurking variable, two variables can be correlated without there being a causal relationship between them.

Causation

If there is a significant linear correlation between two variables, then one of five situations can be true.

• There is a direct cause and effect relationship

• There is a reverse cause and effect relationship

• The relationship may be caused by a third variable

• The relationship may be caused by complex interactions of several variables

• The relationship may be coincidental

Common Errors

There are some common errors that are made when looking at correlation.

• Avoid concluding causation. Just because there is a linear relationship doesn't mean that one thing caused the other. It could be any of the five situations above.

• Avoid data based on rates or averages. Variation is suppressed when using a rate or an average. Remember the central limit theorem? The variance of the sample means was the variance of the population divided by the sample size. So, if you work with averages, the variances are smaller and you might be able to find linear relationships that are significant when they would not be if the original data was used.

• Watch out for linearity. All that we're testing here is the strength of a linear relationship. There are other kinds of relationships. In algebra, we talk about linear, quadratic, cubic, quartic, exponential, logarithmic, Gaussian (bell shaped), logistic, and power models. A scatter plot is a good way to look for patterns.

Section 4.2 – Least-Squares Regression

Objectives

1. Find the least-squares regression line and use the line to make predictions

2. Interpret the slope and the y-intercept of the least-squares regression line

3. Compute the sum of squared residuals

Objective 1 - Find the least-squares regression line and use the line to make predictions

Once the linear correlation coefficient has indicated that a linear relationship exists between two variables, our next step is to find a linear equation that describes the relationship between the two variables.

The goal of this section is to find not just any linear equation, but the “best” linear equation that fits our data.

What does “best” mean?

We will define “best” in terms of residuals, or errors. A residual is the difference between an observed y-value (y) and a predicted y-value (ŷ). The predicted y-value comes from the line we chose to represent the data.

From an example in the book

[pic]

Residual = observed y − predicted y = y − ŷ

Positive residuals indicate that a data point is above the line, i.e., the observed value is higher than predicted.

Negative residuals indicate that a data point is below the line, i.e., the observed value is lower than predicted.

So the definition of “best” is to minimize the sum of the squared residuals

[pic]

The line of best-fit or the least-squares regression line is the line that minimizes the sum of the squared residuals.
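To make the least-squares criterion concrete, the sketch below computes the sum of squared residuals for the least-squares line through the cricket data from Section 4.1 and for two other candidate lines. It is a minimal Python illustration; the slope and intercept come from the standard closed-form least-squares formulas, and the function names are our own.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar

def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for the line y = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

temperature = [83, 72, 88, 84, 80, 76, 70, 93]
chirps = [1025, 960, 1200, 1100, 900, 860, 880, 1180]

slope, intercept = least_squares_line(temperature, chirps)
best = sum_squared_residuals(temperature, chirps, slope, intercept)

# Any other line, e.g. one with a different slope or intercept, does worse.
steeper = sum_squared_residuals(temperature, chirps, slope + 1, intercept)
shifted = sum_squared_residuals(temperature, chirps, slope, intercept + 25)
print(best < steeper, best < shifted)
```

Both comparisons print True: changing the slope or the intercept away from the least-squares values can only increase the sum of squared residuals.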

[pic]

On the calculator it will be y = ax + b.

Relate this equation back to the slope-intercept form for a linear equation, y = mx + b, that you learned in Algebra, where m is the slope and b is the y-intercept.

x is called the predictor variable and y is called the response variable.

The good news is that the calculator will do all of the work for us. Use either LinReg or LinRegTTest to get the a and b values to form the least-squares regression line. Calculator instructions for these functions were presented earlier.

[pic]

Example

Use your calculator to find the least-squares regression line for the temperature and cricket chirp data from Section 4.1.

Example

|Length (cm) |Weight |

|139 |110 |

|138 |60 |

|139 |90 |

|120.5 |60 |

|149 |85 |

|141 |100 |

|141 |95 |

|150 |85 |

|166 |155 |

|151.5 |140 |

|129.5 |105 |

|150 |110 |

Use your calculator to find the least-squares regression line.

Predict the weight of a bear if the length is 150 cm, 160 cm, and 200 cm.
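The bear example can be checked off the calculator with a short Python sketch. It assumes that the first column of the table above is length in cm and the second is weight (the table gives no unit for weight), and it treats "within scope" simply as "between the smallest and largest observed length", which is a simplification. The name `predict` is our own.

```python
from statistics import mean

# ASSUMPTION: first column of the table is length (cm), second is weight.
length = [139, 138, 139, 120.5, 149, 141, 141, 150, 166, 151.5, 129.5, 150]
weight = [110, 60, 90, 60, 85, 100, 95, 85, 155, 140, 105, 110]

x_bar, y_bar = mean(length), mean(weight)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(length, weight))
s_xx = sum((x - x_bar) ** 2 for x in length)
slope = s_xy / s_xx                # "a" on the calculator's LinReg(ax+b) screen
intercept = y_bar - slope * x_bar  # "b" on the calculator

def predict(x):
    """Predicted weight for a bear of length x, or None if x is out of scope."""
    if not min(length) <= x <= max(length):
        return None  # outside the observed lengths: do not extrapolate
    return slope * x + intercept

for x in (150, 160, 200):
    print(x, predict(x))
```

A length of 200 cm is beyond the longest bear in the sample (166 cm), so the sketch declines to predict there rather than extrapolate.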

Objective 2 - Interpret the slope and the y-intercept of the least-squares regression line

The y-intercept of any line is the point where the line intersects with the vertical axis. Find the y-intercept by letting x=0 in the equation and solving for y.

To interpret the y-intercept, first ask two questions:

1. Is 0 a reasonable value for the explanatory variable?

2. Do any observations near x = 0 exist in the data set?

If the answer is no to either question, do not interpret the y-intercept.

Do not use the regression model to make predictions outside the scope of the model. That is, do not use the regression model for values of the explanatory variable that are much smaller or larger than the observed data.

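The slope is the rate of change, on average: the amount by which the predicted response changes for each one-unit increase in the explanatory variable.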

Example – Cricket Chirps

(Use the temperature and cricket chirp data from Section 4.1.)

Example – Bears

|Length (cm) |Weight |

|139 |110 |

|138 |60 |

|139 |90 |

|120.5 |60 |

|149 |85 |

|141 |100 |

|141 |95 |

|150 |85 |

|166 |155 |

|151.5 |140 |

|129.5 |105 |

|150 |110 |

Interpret the slope and y-intercept of the least-squares regression line found earlier.

-----------------------

Calculator Instructions for r

First enable diagnostics by selecting the catalog (2nd 0). Scroll down and select DiagnosticOn. Hit ENTER twice to activate diagnostics. You only have to do this once.

1. Enter the explanatory variable in L1

2. Enter the response variable in L2

3. Press STAT, CALC, and select 4: LinReg (ax+b)

4. Press ENTER

How to test for a linear relation between two variables

1. Determine the absolute value of r.

2. Find the critical value in Table II (Appendix A) for the given sample size.

3. If the absolute value of r exceeds the critical value, then a linear relation exists between the two variables.

To show a scatter plot on the calculator

1. Enter data into L1 and L2

2. 2nd Y= (to invoke STAT PLOT)

3. Select 1: Plot1, select On, ensure Type is the scatter plot. If not, arrow down to the next line and highlight scatter plot (the first image). Press ENTER

4. Ensure XList is L1 and YList is L2. If not, highlight XList and press 2nd 1, then highlight YList and press 2nd 2.

5. Press ZOOM then 9:ZoomStat

Properties of the Linear Correlation Coefficient

1. –1 ≤ r ≤ 1.

2. If r = + 1, then a perfect positive linear relation exists between the two variables.

3. If r = –1, then a perfect negative linear relation exists between the two variables.

4. The closer r is to +1, the stronger is the evidence of positive association between the two variables.

5. The closer r is to –1, the stronger is the evidence of negative association between the two variables.

6. If r is close to 0, then little or no evidence exists of a linear relation between the two variables. So r close to 0 does not imply no relation, just no linear relation.

7. r is unitless. So the unit of measure for x and y plays no role in the interpretation of r.

8. r is not resistant. Therefore, an observation that does not follow the overall pattern of the data could affect the value of the linear correlation coefficient.
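Properties 7 and 8 can be illustrated numerically with the cricket data from Section 4.1. This Python sketch assumes the temperatures are in degrees Fahrenheit, and the extra observation (95, 300) is made up purely for illustration. It uses the algebraically equivalent form r = Sxy / √(Sxx · Syy).

```python
import math

def correlation(xs, ys):
    """Sample linear correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

temp_f = [83, 72, 88, 84, 80, 76, 70, 93]  # ASSUMPTION: degrees Fahrenheit
chirps = [1025, 960, 1200, 1100, 900, 860, 880, 1180]
temp_c = [(f - 32) * 5 / 9 for f in temp_f]

r_f = correlation(temp_f, chirps)
r_c = correlation(temp_c, chirps)  # property 7: changing units leaves r unchanged

# Property 8: one made-up observation far from the pattern changes r a lot.
r_outlier = correlation(temp_f + [95], chirps + [300])
print(round(r_f, 3), round(r_c, 3), round(r_outlier, 3))
```

The first two values match, while the single unusual observation pulls r far below its original value.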

Hypothesis test for determining whether a linear relationship exists between two variables

1. Determine the null and alternative hypotheses. For this class the alternative hypothesis will always be two-tailed

a. H0: ρ = 0 (ρ is rho)

b. H1: ρ ≠ 0

2. Select a level of significance, α. Remember to use 0.05 if not specifically listed

3. Calculate the test statistic and P-value from the calculator using LinRegTTest

4. Compare the P-value to α.

a. If P–value < α, Reject H0

b. If P–value > α, Do Not Reject H0

5. State conclusion. There will either be a significant (positive or negative) linear correlation or no significant linear correlation

Calculator instructions for LinRegTTest (These instructions are not listed in the book)

1. Enter explanatory data into L1 and response data into L2

2. Press STAT, TESTS, and select E: LinRegTTest

3. For XList select L1 (the explanatory data)

4. For YList select L2 (the response data)

5. Set Freq to 1

6. β and ρ: select ≠ 0 (we are only concerned with ρ)

7. Leave RegEQ blank

8. Press Calculate
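For readers who want to see where the test statistic and P-value come from, the sketch below computes them in plain Python for the cricket data from Section 4.1. It uses the standard statistic t = r·√((n − 2) / (1 − r²)) with n − 2 degrees of freedom and approximates the two-tailed P-value by numerical integration. It is an illustration only and does not claim to reproduce the calculator's internal method; all function names are our own.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p_value(t, df, steps=10_000):
    """P(|T| >= |t|), using Simpson's rule on the density between 0 and |t|."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 1 - 2 * (total * h / 3)

def lin_reg_t_test(xs, ys):
    """Return (r, t, p) for H0: rho = 0 against H1: rho != 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    r = s_xy / math.sqrt(s_xx * s_yy)
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))  # undefined when r is exactly +1 or -1
    return r, t, two_tailed_p_value(t, df)

temperature = [83, 72, 88, 84, 80, 76, 70, 93]
chirps = [1025, 960, 1200, 1100, 900, 860, 880, 1180]

r, t, p = lin_reg_t_test(temperature, chirps)
alpha = 0.05
print(round(r, 3), round(t, 2), round(p, 4))
print("Reject H0" if p < alpha else "Do not reject H0")
```

For this data the P-value is below 0.01, so at α = 0.05 the sketch rejects H0, indicating a significant positive linear correlation.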

Least-Squares Regression Criterion

The least-squares regression line is the line that minimizes the sum of the squared errors (or residuals). This line minimizes the sum of the squared vertical distances between the observed values of y and those predicted by the line, ŷ (“y-hat”). We represent this as

“minimize Σ residuals²”.

The Least-Squares Regression Line

The equation of the least-squares regression line is given by

ŷ = b1x + b0

where

b1 = r · (sy / sx) is the slope of the least-squares regression line

b0 = ȳ − b1x̄ is the y-intercept of the least-squares regression line

Using Regression Lines to Make Predictions:

When predicting a y-value based on some value of x

• If there is a linear relation between the variables, use the regression line, ŷ, to make your prediction. (i.e., substitute the x value into the regression equation)

• If there is no linear relation between the variables, use ȳ for your prediction. (i.e., the average of the y-values)

• When using the regression line to predict, stay within the scope of the sample data.
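The three rules above can be sketched as a single function. In this Python sketch the caller supplies the outcome of the hypothesis test as the flag `linear_relation`, and "scope" is simplified to the range between the smallest and largest observed x. Both the function name `best_prediction` and that simplification are our own assumptions.

```python
from statistics import mean

def best_prediction(x, xs, ys, linear_relation):
    """Best predicted y at x, given whether a significant linear relation was found."""
    if not linear_relation:
        return mean(ys)  # no linear relation: predict the average y-value
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the scope of the sample data")
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
             / sum((a - x_bar) ** 2 for a in xs))
    intercept = y_bar - slope * x_bar
    return slope * x + intercept  # substitute x into the regression equation

temperature = [83, 72, 88, 84, 80, 76, 70, 93]
chirps = [1025, 960, 1200, 1100, 900, 860, 880, 1180]

print(best_prediction(85, temperature, chirps, linear_relation=True))
print(best_prediction(85, temperature, chirps, linear_relation=False))
```

With a linear relation the prediction comes from the regression line; without one, every x gets the same prediction, the mean number of chirps.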
