The Correlation Coefficient




1. Bivariate data

Univariate data = a single list of numbers.

Bivariate data = paired data.

Chapter 5 is bivariate data land: pairs like (length of hair, length of nose) or (age, height). We’ll start by making scatterplots.

Generally, we’re interested in whether there is a relationship between the two sets of numbers.

- Is there a relationship between the length of a person’s hair and the length of their nose?

- Is there a relationship between a person’s age and their height?

Remember: We already know how to describe a scatterplot: “There is a strong, positive, linear relationship between the # of bowls of Cheerios you eat and the amount you can bench press.”

2. Is there a relationship?

Again, the first thing we do is to make a scatterplot and look for relationships. By the way, we wouldn’t have collected the data in pairs in the first place if we didn’t think there was a relationship!

(graphs)

3. The correlation coefficient, r.

Computing the correlation coefficient r is usually the second thing we do. It measures the strength and direction of a linear relationship (and it’s good for linear relationships only!).

Facts about r:

a) It’s between -1 and 1. When r = 1 you have a perfect, positive, linear relationship.

b) When r = -1, you have a perfect, negative, linear relationship.

c) When r = 0, you have no linear relationship.

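d) Strength depends on the absolute value of r, so r = -0.9 is just as strong as r = 0.9 (it’s just negative). As a rule of thumb:

If |r| is between 0.8 and 1, that’s a strong linear relationship.

If |r| is between 0.5 and 0.8, that’s a moderate linear relationship.

If |r| is less than 0.5, that’s a weak linear relationship.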

e) The value of r doesn’t depend on which variable is x and which is y.

f) The value of r doesn’t depend on the units of measurement (for example, converting everything from years to seconds wouldn’t change r).

g) The sample correlation coefficient is r. The population correlation coefficient is ρ (“rho”).

h) The formula for r is:

r = Σ(z_x · z_y) / (n − 1)

where z_x and z_y are the z-scores for the x’s and y’s, and n is the number of data pairs. You don’t need to know the formula, though. The calculator can do it for you.

j) r can only tell you if there is a LINEAR relationship. There could be non-linear relationships (like quadratics, for example, that look like “U”s) that the correlation coefficient can’t tell you about.
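To make the facts above concrete, here is a minimal Python sketch that computes r from z-scores (sum of the products of the z-scores, divided by n - 1) and then checks facts (e), (f), and (j). The data values and the function names (`z_scores`, `correlation`) are made up for illustration and are not part of the original notes.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divides by n - 1).
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def z_scores(values):
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    # r = sum of (z_x * z_y) over all pairs, divided by (n - 1).
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical data: ages in years, heights in inches.
ages = [2, 4, 6, 8, 10]
heights = [34, 40, 45, 50, 54]

r = correlation(ages, heights)

# Fact (e): swapping x and y leaves r unchanged.
r_swapped = correlation(heights, ages)

# Fact (f): changing units (years to seconds) leaves r unchanged.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60
r_seconds = correlation([a * SECONDS_PER_YEAR for a in ages], heights)

# Fact (j): a perfect "U" (y = x squared) still gives r of about 0,
# because the relationship is not linear.
u_x = [-2, -1, 0, 1, 2]
u_y = [x ** 2 for x in u_x]
r_u = correlation(u_x, u_y)

print("r (age, height):   ", round(r, 4))
print("r (height, age):   ", round(r_swapped, 4))
print("r (age in seconds):", round(r_seconds, 4))
print("r (U-shaped data): ", round(r_u, 4))
```

The first three printed values should agree with each other, while the last should be essentially zero even though y is completely determined by x.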

4. The Calculator Doing It For You

Step 1: Enter the x-values under L1 and the y-values under L2.

Step 2: STAT, CALC, 8:LinReg(a+bx)

Step 3: Type L1, L2 after LinReg(a+bx), so that the line reads LinReg(a+bx) L1,L2.

Step 4: Press ENTER. The output includes r² and r.

(If r and r² aren’t showing, go to MODE and turn STAT DIAGNOSTICS ON. On older models without that option, press 2nd, CATALOG, select DiagnosticOn, and press ENTER twice.)
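If you don’t have the calculator handy, the same numbers can be reproduced in a few lines of Python. This is only a sketch of a least-squares fit that reports a, b, r², and r in the spirit of the LinReg(a+bx) screen; the function name `lin_reg` and the data in L1 and L2 are hypothetical, chosen just for illustration.

```python
import math

def lin_reg(l1, l2):
    # Least-squares line y = a + b*x, plus r and r squared.
    n = len(l1)
    x_bar = sum(l1) / n
    y_bar = sum(l2) / n
    sxx = sum((x - x_bar) ** 2 for x in l1)
    syy = sum((y - y_bar) ** 2 for y in l2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(l1, l2))
    b = sxy / sxx                  # slope
    a = y_bar - b * x_bar          # intercept
    r = sxy / math.sqrt(sxx * syy)
    return {"a": a, "b": b, "r2": r * r, "r": r}

# Hypothetical lists, standing in for the calculator's L1 and L2.
L1 = [1, 2, 3, 4, 5]
L2 = [2, 4, 5, 4, 5]

result = lin_reg(L1, L2)
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

Note that r² here is literally r multiplied by itself, which is why the calculator reports both together.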

5. This Doesn’t Imply That

Finally, a strong correlation does not imply cause-and-effect. Cause-and-effect can only be established by a well-designed experiment. Even if you find a perfect linear relationship between the time people spend praying and their life expectancies, it’s not necessarily true that praying causes longer lives.

In other words: Correlation does not imply causation.
