Correlation and Regression



Unit 8 Chapter 9 Correlation and Regression

Scatter Diagram and Linear Correlation

A scatter diagram is a graph in which data pairs (x, y) are plotted as individual points on a grid with horizontal axis x and vertical axis y. The x variable is called the explanatory variable, and the y variable is called the response variable.

Looking at the scatter diagram shows whether there may be a linear relationship between the x and y values. Correlation gives us tools to determine whether a linear relationship exists and, if it does, how strong it is.

A veterinary science study was conducted on the weight of Shetland ponies. The question posed was “How much should a healthy Shetland pony weigh?” The following data were observed, and the table was expanded (with x^2, y^2, and xy columns) to develop a correlation for the situation. A line of best fit was then constructed for the data.

|Weight of Shetland Ponies |

|x = age of the pony (in months) y = average weight of the pony (in kilograms) |

| |x |y |x^2 |y^2 |xy |

| |3 |60 |9 |3600 |180 |

|n = 5 |6 |95 |36 |9025 |570 |

| |12 |140 |144 |19600 |1680 |

| |18 |170 |324 |28900 |3060 |

| |24 |185 |576 |34225 |4440 |

|Totals |63 |650 |1089 |95350 |9930 |

A scatter diagram shows the points observed in the study. The points follow a close-to-linear pattern, with y increasing as x increases.

[Figure: scatter diagram of average weight y (in kilograms) against age x (in months)]

The sample correlation coefficient r gives a numerical measure of the strength of the linear association between the two variables.

1) The calculated r is always between -1 and 1, inclusive.

2) If r = -1, there is a perfect negative correlation: as the x variable increases, the y variable decreases.

3) If r = 1, there is a perfect positive correlation: as the x variable increases, the y variable increases.

4) If r = 0, there is no linear correlation.

5) The closer r is to -1 or 1, the stronger the linear relationship.

Correlation Coefficient

$$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$$

Use Excel to construct a table to calculate these totals.

|x = age of the pony (in months) y = average weight of the pony (in kilograms) |

| | | | | | | |

| |x |y |x^2 |y^2 |xy | |

| |3 |60 |9 |3600 |180 | |

|n = 5 |6 |95 |36 |9025 |570 | |

| |12 |140 |144 |19600 |1680 | |

| |18 |170 |324 |28900 |3060 | |

| |24 |185 |576 |34225 |4440 | |

| | | | | | | |

|Totals |63 |650 |1089 |95350 |9930 | |

n = 5, Σxy = 9930, Σx = 63, Σy = 650, Σx^2 = 1089, Σy^2 = 95350
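As a rough sketch of what the spreadsheet does, the Python snippet below rebuilds the x^2, y^2, and xy columns from the five data pairs and adds them up. The names `ages`, `weights`, and `column_totals` are illustrative choices, not part of the original worksheet.

```python
# Pony data from the table: x = age in months, y = average weight in kilograms.
ages = [3, 6, 12, 18, 24]
weights = [60, 95, 140, 170, 185]


def column_totals(x, y):
    """Return n and the five sums used in the correlation and regression formulas."""
    return {
        "n": len(x),
        "sum_x": sum(x),
        "sum_y": sum(y),
        "sum_x2": sum(v * v for v in x),             # total of the x^2 column
        "sum_y2": sum(v * v for v in y),             # total of the y^2 column
        "sum_xy": sum(a * b for a, b in zip(x, y)),  # total of the xy column
    }


totals = column_totals(ages, weights)
print(totals["sum_xy"])  # 9930
```

Each dictionary entry corresponds to one cell of the Totals row, so the result can be checked directly against the table.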

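$$r = \frac{5(9930) - (63)(650)}{\sqrt{5(1089) - (63)^2}\,\sqrt{5(95350) - (650)^2}} = \frac{8700}{\sqrt{1476}\,\sqrt{54250}} \approx 0.972$$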

Since r = 0.972 is close to 1, there is a very high positive linear correlation.
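The value of r can be checked with a few lines of Python. This is a minimal sketch that assumes the usual computation formula, r = [nΣxy − (Σx)(Σy)] / [√(nΣx^2 − (Σx)^2) · √(nΣy^2 − (Σy)^2)]; the function name `correlation_from_sums` is hypothetical.

```python
import math


def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Sample correlation coefficient r computed from the column totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Totals taken from the pony table.
r = correlation_from_sums(n=5, sum_x=63, sum_y=650,
                          sum_x2=1089, sum_y2=95350, sum_xy=9930)
print(round(r, 3))  # 0.972
```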

|Strength of Correlation |

|Size of r |Interpretation |

|Note: These values could be positive or negative. Only positive numbers are shown. |

|0.90 to 1.00 |very high |

|0.70 to 0.89 |high |

|0.50 to 0.69 |moderate |

|0.30 to 0.49 |low |

|0.00 to 0.29 |little, if any |
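The table can be read as a simple lookup on the size of r. The sketch below is one possible encoding; the function name `correlation_strength` is hypothetical. Because the table lists ranges only to two decimal places, values that fall in the gaps (for example 0.895) are assigned to the lower category here, which is an assumption rather than something the table states.

```python
def correlation_strength(r):
    """Return the table's interpretation for a correlation coefficient r."""
    size = abs(r)  # the table applies to positive and negative r alike
    if size > 1:
        raise ValueError("r must be between -1 and 1")
    if size >= 0.90:
        return "very high"
    if size >= 0.70:
        return "high"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "low"
    return "little, if any"


print(correlation_strength(0.972))  # very high
```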

Linear Regression and the Coefficient of Determination

The scatter diagram below has a least-squares line overlaid on the grid. Excel's Trendline option can produce this line, but you should use the formulas given here to calculate the equation of the line yourself.

[Figure: scatter diagram of the pony data with the least-squares line overlaid]

Least-squares line: ŷ = a + bx, where a is the intercept and b is the slope.

ŷ is pronounced “y-hat.”

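Using the values from the Excel sheet:

First find the sample mean for x: $\bar{x} = \frac{\sum x}{n} = \frac{63}{5} = 12.6$ and

the sample mean for y: $\bar{y} = \frac{\sum y}{n} = \frac{650}{5} = 130$

Slope: $b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{5(9930) - (63)(650)}{5(1089) - (63)^2} = \frac{8700}{1476} \approx 5.89$

Intercept: $a = \bar{y} - b\bar{x} = 130 - 5.89(12.6) \approx 55.79$

Therefore the regression line is $\hat{y} = 55.79 + 5.89x$.

(Note that the values in the Excel trendline equation may vary slightly due to rounding.)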
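To see where the rounding difference comes from, here is a small Python sketch that computes the slope and intercept from the table totals. It assumes the standard least-squares formulas b = [nΣxy − (Σx)(Σy)] / [nΣx^2 − (Σx)^2] and a = ȳ − b·x̄; the function name `least_squares_line` is hypothetical.

```python
def least_squares_line(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)  # a = y-bar - b * x-bar
    return a, b


# Totals taken from the pony table.
a, b = least_squares_line(n=5, sum_x=63, sum_y=650, sum_x2=1089, sum_xy=9930)
print(round(b, 2))  # 5.89
print(round(a, 2))  # 55.73 (full-precision slope)

# Rounding the slope to 5.89 before computing the intercept gives the
# hand-calculated value instead.
print(round(130 - 5.89 * 12.6, 2))  # 55.79
```

The intercept differs in the second decimal place depending on whether the slope is rounded first, which is the kind of small discrepancy the note about Excel refers to.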

Using the least-squares line for prediction:

Making predictions is the main application of linear regression. The least-squares line can be used to predict ŷ values for corresponding x values. There are two types of predictions.

1) Interpolation: Predicting ŷ values for x values that are between observed x values in the data set.

For example, find ŷ for a 10-month-old pony.

ŷ = 55.79 + 5.89(10) = 114.69 kg

2) Extrapolation: Predicting ŷ values for x values that are beyond the observed x values in the data set. Extrapolating too far beyond the observed x values may give unreasonable results.

For example, find ŷ for a 30-month-old pony.

ŷ = 55.79 + 5.89(30) = 232.49 kg
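The two kinds of prediction can be sketched in Python as follows. The snippet assumes the rounded line ŷ = 55.79 + 5.89x, with x in months and ŷ in kilograms as defined in the data table, and labels a prediction as interpolation when x lies within the observed range of 3 to 24 months. The function name `predict_weight` and its parameters are hypothetical.

```python
def predict_weight(age_months, a=55.79, b=5.89, observed_ages=(3, 6, 12, 18, 24)):
    """Return (y_hat, kind) for a pony of the given age in months."""
    y_hat = a + b * age_months
    # Inside the observed x range is interpolation; outside is extrapolation.
    if min(observed_ages) <= age_months <= max(observed_ages):
        kind = "interpolation"
    else:
        kind = "extrapolation"
    return round(y_hat, 2), kind


print(predict_weight(10))  # (114.69, 'interpolation')
print(predict_weight(30))  # flagged as extrapolation
```

Returning the label alongside the prediction is one way to remind the user when a value lies outside the data and should be treated with more caution.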

The coefficient of determination r² is formed by squaring the correlation coefficient r.

r ≈ 0.972, r² ≈ 0.945

The coefficient of determination measures the proportion of the variation in y that is explained by the regression line, using x as the explanatory variable.

Since r² ≈ 0.945, about 94.5% of the variation in y can be explained by x if we use the regression line. The remaining 5.5% of the variation is due to random chance or possibly a lurking variable.
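The "proportion of variation explained" reading can be checked numerically. The sketch below fits the least-squares line to the pony data, then compares the variation left over around the line with the total variation in y, using the standard identity r² = 1 − SSE/SST. The variable names are illustrative only.

```python
# Pony data: x = age in months, y = average weight in kilograms.
ages = [3, 6, 12, 18, 24]
weights = [60, 95, 140, 170, 185]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(weights) / n

# Least-squares slope and intercept at full precision.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, weights))
s_xx = sum((x - mean_x) ** 2 for x in ages)
b = s_xy / s_xx
a = mean_y - b * mean_x

# SSE: variation left unexplained by the line; SST: total variation in y.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(ages, weights))
sst = sum((y - mean_y) ** 2 for y in weights)

r_squared = 1 - sse / sst
print(round(r_squared, 3))              # 0.945
print(round(100 * (1 - r_squared), 1))  # 5.5 (percent unexplained)
```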
