Correlation and the Analysis of Variance Approach to Simple Linear Regression
Biometry 755 Spring 2009
Correlation review
Correlation quantifies the direction and strength of the linear association between two random variables. Consider the scatterplot of risk of nosocomial infection by routine culturing ratio. There appears to be a strong positively sloped linear relationship between the two variables. We would like a single index to quantify both features of this apparent relationship.
[Figure: scatterplot of risk of nosocomial infection (x 100) versus routine culturing ratio.]
Quantifying linear association
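[Figure: scatterplot of risk of nosocomial infection (x 100) versus routine culturing ratio, divided into four quadrants by a vertical line at Avg CULT = 15.8 and a horizontal line at Avg INFRISK = 4.4. Quadrant A is upper left, B is upper right, C is lower right, and D is lower left.]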
Quantifying linear association
Consider data points $(x, y)$ in each of the four quadrants, A, B, C and D, formed by drawing a vertical line at the average culturing ratio ($\bar{x}$) and a horizontal line at the average nosocomial infection risk ($\bar{y}$).
Sign of each quantity, by region:

Region                               A    B    C    D
$x - \bar{x}$                        -    +    +    -
$y - \bar{y}$                        +    +    -    -
$(x - \bar{x})(y - \bar{y})$         -    +    -    +
The sample correlation coefficient
[Figure: the quadrant scatterplot again, with Avg CULT = 15.8, Avg INFRISK = 4.4, and quadrants A (upper left), B (upper right), C (lower right), D (lower left).]
For points in quadrants A and C, $(x - \bar{x})(y - \bar{y})$ will be negative. For points in quadrants B and D, $(x - \bar{x})(y - \bar{y})$ will be positive. If a strong linear association exists, then the sum of this product across all data points,
$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),$$
will be dominated by either positive or negative terms.
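The sign argument can be checked numerically. The sketch below uses a small made-up data set (not the hospital data) with an upward trend and counts how many cross-product terms are positive and how many are negative. The function name `cross_products` and the data values are illustrative assumptions only.

```python
from statistics import mean

def cross_products(xs, ys):
    """Return the terms (x_i - x_bar) * (y_i - y_bar), one per data point."""
    x_bar, y_bar = mean(xs), mean(ys)
    return [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]

# Hypothetical data with an upward trend (NOT the hospital data).
xs = [5, 10, 15, 20, 30, 40]
ys = [3.0, 5.0, 3.5, 4.0, 5.0, 6.5]

terms = cross_products(xs, ys)
n_pos = sum(t > 0 for t in terms)   # points in quadrants B and D
n_neg = sum(t < 0 for t in terms)   # points in quadrants A and C
total = sum(terms)

print("positive terms:", n_pos)
print("negative terms:", n_neg)
print("sum of cross products:", total)
```

With an upward trend, most points fall in quadrants B and D, so the positive terms dominate the sum. A point lying exactly on one of the two average lines contributes zero.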
The sample correlation coefficient (cont.)
The quantity
$$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1} \qquad (1)$$
is an estimate of the covariance between X and Y, which measures the strength of their association. (Population covariance is denoted by $\sigma_{xy}$.)
Equation (1) seems to be a good choice for assessing both the direction and the strength of linear association, but it has one drawback: its value can be large because of the scale on which the variables are measured, rather than because the linear association is strong. Therefore, we scale Equation (1) by dividing by estimates of the standard deviations of X and Y.
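A short numerical sketch of the scale problem: the same made-up data are expressed once with risk as a percentage and once as a proportion. The covariance changes by a factor of 100, while the covariance divided by both sample standard deviations does not. The data and variable names are hypothetical; `statistics.stdev` uses the $n - 1$ denominator.

```python
from statistics import mean, stdev

def sample_cov(xs, ys):
    """Sample covariance s_xy with the n - 1 denominator of Equation (1)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical values, NOT the hospital data.
cult = [5, 10, 15, 20, 30, 40]                     # culturing ratio
risk_percent = [3.0, 5.0, 3.5, 4.0, 5.0, 6.5]      # risk x 100
risk_proportion = [y / 100 for y in risk_percent]  # same risk, different units

cov_percent = sample_cov(cult, risk_percent)
cov_proportion = sample_cov(cult, risk_proportion)  # 100 times smaller

# Dividing by both standard deviations removes the dependence on units.
scaled_percent = cov_percent / (stdev(cult) * stdev(risk_percent))
scaled_proportion = cov_proportion / (stdev(cult) * stdev(risk_proportion))

print("covariance (percent):   ", cov_percent)
print("covariance (proportion):", cov_proportion)
print("scaled (percent):       ", scaled_percent)
print("scaled (proportion):    ", scaled_proportion)
```

The two scaled values agree up to floating-point rounding, which is the property that motivates the standardization.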
The sample correlation coefficient (cont.)
Recall that
$$\hat{\sigma}_x = s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
and
$$\hat{\sigma}_y = s_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}.$$
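Then our "standardized" index of linear association is
$$\frac{s_{xy}}{s_x s_y} = \frac{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}}{\sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}}.$$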
Definition of sample correlation coefficient
This leads to the following definition of the sample correlation coefficient, r. It is also known as the Pearson correlation coefficient.
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}.$$
- r ranges from -1 to 1.
- r = 1: all observations lie on a positively sloped line.
- r = -1: all observations lie on a negatively sloped line.
- r is a dimensionless measure.
- r measures the strength of the linear association.
- r tends to be close to zero if there is no linear association.
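The definition and the properties listed above can be illustrated with a direct implementation. This is a minimal sketch on made-up data; the function name `pearson_r` is an assumption chosen for illustration, not a routine from the course.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient, computed directly from its definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data chosen to illustrate the listed properties.
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])    # points on a positively sloped line
r_down = pearson_r(xs, [10 - 3 * x for x in xs]) # points on a negatively sloped line
r_vee = pearson_r(xs, [2, 1, 0, 1, 2])           # symmetric V shape, no linear trend

print("positively sloped line:", r_up)
print("negatively sloped line:", r_down)
print("V-shaped pattern:      ", r_vee)
```

The V-shaped example is a reminder that r near zero indicates the absence of a linear association, not necessarily the absence of any association.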
What does r estimate?
r is an index obtained from a sample of n observations and is an estimator for an unknown population parameter. The parameter is called the population correlation coefficient, and is defined as
$$\rho = \frac{\mathrm{Cov}(x, y)}{\sqrt{\mathrm{Var}(x)\,\mathrm{Var}(y)}} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}.$$
In other words,
$$r = \hat{\rho}.$$
Picturing $\rho$ and r
Each graph depicts a sample of 30 data points, $(x, y)$, drawn from a population with the specified value of $\rho$; r is calculated from the 30 data points.
[Figure: four scatterplots with the following panel titles.]
rho = -0.6; r = -0.691
rho = -0.05; r = -0.201
rho = 0.4; r = 0.556
rho = 0.9; r = 0.892
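A simulation in the same spirit can be sketched as follows. It assumes the population is standard bivariate normal, which the slides do not state, and uses an arbitrary seed, so the values of r it prints will differ from those in the panels. The function names are hypothetical.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient from its definition."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate_sample(rho, n, rng):
    """Draw n pairs from a standard bivariate normal with correlation rho.

    Assumption: normality is chosen here for illustration only.
    """
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        xs.append(z1)
        # Mixing z1 and z2 this way gives Corr(x, y) = rho in the population.
        ys.append(rho * z1 + sqrt(1 - rho ** 2) * z2)
    return xs, ys

rng = random.Random(755)  # arbitrary seed, for reproducibility only
results = {}
for rho in (-0.6, -0.05, 0.4, 0.9):
    xs, ys = simulate_sample(rho, 30, rng)
    results[rho] = pearson_r(xs, ys)
    print(f"rho = {rho}; r = {results[rho]:.3f}")
```

Rerunning with different seeds shows how much r can vary around $\rho$ in samples of size 30.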
Inference about $\rho$
When r is non-zero, does that imply that $\rho$ is non-zero? Not necessarily. We need a method that accounts for sampling variability in order to make rigorous inference about whether $\rho$ differs from zero.
We use the following hypothesis testing procedure.
$H_0: \rho = 0$ versus $H_A: \rho \neq 0$.
The test statistic is
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}},$$
where r is the sample correlation coefficient and $t \sim t_{n-2}$ under $H_0$.
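As a worked illustration, the sketch below evaluates the test statistic using r = 0.55916 and N = 113, the values reported for INFRISK and CULT in the PROC CORR output in these notes. The function name `correlation_t_statistic` is an assumption for illustration. The code stops at the statistic itself, since the p-value requires the t distribution function from a statistics package.

```python
from math import sqrt

def correlation_t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), for testing H0: rho = 0."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

r = 0.55916   # sample correlation of INFRISK and CULT (from the PROC CORR output)
n = 113       # number of observations (from the PROC CORR output)

t_stat = correlation_t_statistic(r, n)
df = n - 2    # degrees of freedom of the reference t distribution

print(f"t = {t_stat:.3f} on {df} degrees of freedom")
```

The statistic comes out at roughly 7.1 on 111 degrees of freedom, far out in the tail of the reference distribution, so the data are hard to reconcile with $\rho = 0$.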
Correlation analysis in SAS
SAS's PROC CORR computes the sample correlation, r, and conducts a two-sided $\alpha = 0.05$-level test of the null hypothesis $H_0: \rho = 0$.

    /* Pearson correlation between infection risk and culturing ratio */
    proc corr data = one;
        var infrisk cult;
    run;
PROC CORR output
Pearson Correlation Coefficients, N = 113
Prob > |r| under H0: Rho=0

                        INFRISK       CULT
INFRISK                 1.00000    0.55916
RISK OF INFECTION
...