Correlation and Regression
Unit 8 Chapter 9 Correlation and Regression
Scatter Diagram and Linear Correlation
A scatter diagram is a graph in which data pairs (x, y) are plotted as individual points on a grid with horizontal axis x and vertical axis y. The x variable is called the explanatory variable. The y variable is called the response variable.
Inspecting the scatter diagram shows whether there may be a linear relationship between the x and y values. Correlation gives us tools to determine whether a linear relationship exists and, if it does, how strong it is.
A veterinary science study was conducted on the weight of Shetland ponies. The question posed was “How much should a healthy Shetland pony weigh?” The following data were observed and expanded to develop a correlation for the situation. A line of best fit was then constructed for the data.
|Weight of Shetland Ponies |
|x = age of the pony (in months) y = average weight of the pony (in kilograms) |
| |x |y |x^2 |y^2 |xy |
| |3 |60 |9 |3600 |180 |
|n = 5 |6 |95 |36 |9025 |570 |
| |12 |140 |144 |19600 |1680 |
| |18 |170 |324 |28900 |3060 |
| |24 |185 |576 |34225 |4440 |
|Totals |63 |650 |1089 |95350 |9930 |
A scatter diagram shows the points observed in the application. The points follow a close-to-linear pattern, with y increasing as x increases.
[pic]
The sample correlation coefficient r can be calculated to measure the strength of the linear association between the two variables.
1) The calculated r is always between -1 and 1, inclusive.
2) If r = -1, there is a perfect negative correlation: as the x variable increases, the y variable decreases.
3) If r = 1, there is a perfect positive correlation: as the x variable increases, the y variable increases.
4) If r = 0, there is no linear correlation.
5) The closer r is to -1 or 1, the stronger the linear relationship.
Correlation Coefficient
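$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$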
Use Excel to construct a table to calculate these totals.
|x = age of the pony (in months) y = average weight of the pony (in kilograms) |
| |x |y |x^2 |y^2 |xy |
| |3 |60 |9 |3600 |180 |
|n = 5 |6 |95 |36 |9025 |570 |
| |12 |140 |144 |19600 |1680 |
| |18 |170 |324 |28900 |3060 |
| |24 |185 |576 |34225 |4440 |
|Totals |63 |650 |1089 |95350 |9930 |
n = 5, $\sum xy = 9930$, $\sum x = 63$, $\sum y = 650$, $\sum x^2 = 1089$, $\sum y^2 = 95350$
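The same columns and totals can also be produced outside Excel. The following is a minimal Python sketch, assuming only the five (x, y) pairs from the table; the variable names are illustrative and not part of the original handout.

```python
# Observed data from the Shetland pony table (names are illustrative).
ages_months = [3, 6, 12, 18, 24]        # x
weights_kg = [60, 95, 140, 170, 185]    # y

# One row per observation: (x, y, x^2, y^2, xy).
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(ages_months, weights_kg)]

# Column totals, in the same order as the table's "Totals" row.
n = len(rows)
sum_x, sum_y, sum_x2, sum_y2, sum_xy = (sum(column) for column in zip(*rows))

for row in rows:
    print(*row)
print("Totals", sum_x, sum_y, sum_x2, sum_y2, sum_xy)
# Last line printed: Totals 63 650 1089 95350 9930
```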
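$$ r = \frac{5(9930) - (63)(650)}{\sqrt{5(1089) - (63)^2}\,\sqrt{5(95350) - (650)^2}} = \frac{8700}{\sqrt{1476}\,\sqrt{54250}} \approx 0.972 $$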
Since r = 0.972 is close to 1, there is a very high positive linear correlation.
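As a cross-check on the hand calculation, r can be computed directly from the six totals. This sketch uses the standard computational formula for the sample correlation coefficient, which is assumed here to be the one the handout intends; the function name `correlation_from_sums` is hypothetical.

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Sample correlation coefficient r from the table totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Totals taken from the Shetland pony table.
r = correlation_from_sums(n=5, sum_x=63, sum_y=650,
                          sum_x2=1089, sum_y2=95350, sum_xy=9930)
print(round(r, 3))  # 0.972
```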
|Strength of Correlation |
|Size of r |Interpretation |
|0.90 to 1.00 |very high |
|0.70 to 0.89 |high |
|0.50 to 0.69 |moderate |
|0.30 to 0.49 |low |
|0.00 to 0.29 |little, if any |
|Note: The same ranges apply to negative values of r. Only positive values are shown. |
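The table can be applied mechanically by looking at the absolute value of r. The sketch below is one possible reading of it: the function name `describe_strength` is hypothetical, and treating each range's lower bound as a cut point (so that a value such as 0.895 counts as "high") is an assumption, since the table lists ranges to two decimals only.

```python
def describe_strength(r):
    """Return the table's interpretation for a correlation coefficient r."""
    size = abs(r)  # the sign gives direction, not strength
    if size > 1:
        raise ValueError("r must be between -1 and 1")
    if size >= 0.90:
        return "very high"
    if size >= 0.70:
        return "high"
    if size >= 0.50:
        return "moderate"
    if size >= 0.30:
        return "low"
    return "little, if any"

print(describe_strength(0.972))   # very high
print(describe_strength(-0.75))   # high
```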
Linear Regression and the Coefficient of Determination
The scatter diagram below has a least-squares line overlaid on the grid. Excel's Trendline option produces the line, but you should use the formulas given to calculate the equation of the line.
[pic]
Least-squares line $\hat{y} = a + bx$, where a is the intercept and b is the slope.
$\hat{y}$ is pronounced "y-hat".
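Using the totals from the Excel sheet:
First find the sample mean for x: $\bar{x} = \frac{\sum x}{n} = \frac{63}{5} = 12.6$ and
the sample mean for y: $\bar{y} = \frac{\sum y}{n} = \frac{650}{5} = 130$
Slope $b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{5(9930) - (63)(650)}{5(1089) - (63)^2} = \frac{8700}{1476} \approx 5.89$
Intercept $a = \bar{y} - b\bar{x} = 130 - 5.89(12.6) \approx 55.79$
Therefore the regression line is $\hat{y} = 55.79 + 5.89x$.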
(Note that the values in the Excel trendline may vary slightly due to rounding.)
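The slope and intercept can be checked with a short Python sketch. It assumes the usual least-squares formulas, b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = ȳ − b·x̄, and uses illustrative variable names. It also shows where the rounding difference comes from: carrying the unrounded slope gives an intercept near 55.73, while rounding the slope to 5.89 first gives 55.79.

```python
# Data from the Shetland pony table.
ages_months = [3, 6, 12, 18, 24]        # x
weights_kg = [60, 95, 140, 170, 185]    # y

n = len(ages_months)
sum_x = sum(ages_months)
sum_y = sum(weights_kg)
sum_x2 = sum(x * x for x in ages_months)
sum_xy = sum(x * y for x, y in zip(ages_months, weights_kg))

x_bar = sum_x / n   # sample mean of x
y_bar = sum_y / n   # sample mean of y

# Least-squares slope and intercept (unrounded).
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
intercept = y_bar - slope * x_bar

# Intercept obtained when the slope is rounded to two decimals first.
intercept_rounded_slope = y_bar - round(slope, 2) * x_bar

print(round(slope, 2), round(intercept, 2))   # 5.89 55.73
print(round(intercept_rounded_slope, 2))      # 55.79
```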
Using the least-squares line for prediction:
Making predictions is the main application of linear regression. The least-squares line can be used to predict $\hat{y}$ values for given x values. There are two types of predictions.
1) Interpolation: Predicting $\hat{y}$ values for x values that lie between the observed x values in the data set.
For example, find $\hat{y}$ for a 10-month-old pony.
$\hat{y}$ = 55.79 + 5.89 (10) = 114.69 kg
2) Extrapolation: Predicting $\hat{y}$ values for x values that lie beyond the observed x values in the data set. Extrapolating too far beyond the observed x values may give unreasonable results.
For example, find $\hat{y}$ for a 30-month-old pony.
$\hat{y}$ = 55.79 + 5.89 (30) = 232.49 kg
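The distinction between the two kinds of prediction can be made explicit in code. This is a small sketch, assuming the rounded line ŷ = 55.79 + 5.89x, with x in months and ŷ in kilograms as defined in the data table; the function name `predict_weight` and the constants are illustrative.

```python
INTERCEPT = 55.79                     # a, from the regression line
SLOPE = 5.89                          # b, from the regression line
OBSERVED_AGES = [3, 6, 12, 18, 24]    # x values in the data set (months)

def predict_weight(age_months):
    """Return (predicted weight in kg, kind of prediction)."""
    y_hat = INTERCEPT + SLOPE * age_months
    inside = min(OBSERVED_AGES) <= age_months <= max(OBSERVED_AGES)
    kind = "interpolation" if inside else "extrapolation"
    return round(y_hat, 2), kind

print(predict_weight(10))   # (114.69, 'interpolation')
print(predict_weight(30))   # (232.49, 'extrapolation')
```

Flagging extrapolated values this way is only a reminder; it does not say how far beyond the data the line remains reasonable.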
The coefficient of determination $r^2$ is formed by squaring the correlation coefficient r.
$r \approx 0.972$, $r^2 \approx 0.945$
The coefficient of determination measures the proportion of the variation in y that is explained by the regression line, using x as the explanatory variable.
Since $r^2 \approx 0.945$, about 94.5% of the variation in y can be explained by x if we use the regression line. The remaining 5.5% of the variation is due to random chance or possibly a lurking variable.
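The "proportion of variation explained" reading can be verified numerically. The sketch below, with illustrative variable names, fits the least-squares line to the pony data and computes 1 − (unexplained variation / total variation). For a least-squares line with one explanatory variable this equals the square of r.

```python
# Data from the Shetland pony table.
ages_months = [3, 6, 12, 18, 24]        # x
weights_kg = [60, 95, 140, 170, 185]    # y

n = len(ages_months)
x_bar = sum(ages_months) / n
y_bar = sum(weights_kg) / n

# Least-squares slope and intercept (deviation form, unrounded).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(ages_months, weights_kg))
s_xx = sum((x - x_bar) ** 2 for x in ages_months)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Total variation in y, and variation left over after fitting the line.
total_variation = sum((y - y_bar) ** 2 for y in weights_kg)
unexplained = sum((y - (intercept + slope * x)) ** 2
                  for x, y in zip(ages_months, weights_kg))

r_squared = 1 - unexplained / total_variation
print(round(r_squared, 3))                # 0.945
print(round(100 * (1 - r_squared), 1))    # 5.5 (percent unexplained)
```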