Correlation - Weebly



SCATTERPLOTS (Teacher)

When we want to know if there is some sort of relationship between 2 numerical variables, we can use a scatterplot. It gives a visual display of the relationship between the 2 variables.

Graphing a scatterplot:

1. Decide which variable is the independent variable (IV) and which is the dependent variable (DV)

2. Independent variable = horizontal axis

3. Dependent variable = vertical axis

4. Make sure scales are appropriate & accurate

Interpreting Scatterplots

The terms correlation, relationship and association between two variables all mean the same thing.

Step 1: Look to see if there is a clear pattern. If so, proceed to Step 2.

Step 2: Look for DIRECTION (or Polarity) – i.e. whether it is positive or negative.

[pic: Positive Relationship | Negative Relationship | Randomly Scattered or No Association]

Step 3: Make a judgement about the STRENGTH of the relationship between the variables. This is based on the spread of the points: the more closely they cluster around the pattern, the stronger the relationship.

Step 4: Observe whether the pattern of the points appears to be LINEAR (in a straight line) or not. This is also referred to as ‘linearity’ or ‘form’.

Linear: [pic]

Non-linear: [pic]

Step 5: Identify and investigate any outliers. Sometimes they are mistakes; sometimes they are genuinely extraordinary data and should be included.

EXAMPLES

Look at the following 2 scatterplots. Does a relationship exist?

If so, describe the relationship in terms of:

• Linearity

• Strength

• Polarity

Scatterplot 1:

• Linearity - linear

• Strength - moderate

• Polarity - positive

Scatterplot 2:

• Linearity - no relationship

• Strength - zero

• Polarity - N/A

Pearson’s Product-Moment Correlation Coefficient

Also called the Correlation Coefficient, or ‘r’. It is a more precise tool than visually inspecting a scatterplot: it measures the tendency of the data to lie on a straight line.

The value of ‘r’ ranges from -1 to 1 i.e. -1 ≤ r ≤ 1.

Following are examples of scatterplots:

The value of the Correlation Coefficient indicates the strength (its size) and the direction (its sign) of the linear relationship between 2 variables. The following diagram gives a guide to the strength of the correlation based on the value of ‘r’.

EXAMPLES

Estimate r & comment on strength & direction of the relationship between variables.

r ≈ 0.9: strong, positive relationship

r ≈ -0.7: moderate, negative relationship

r ≈ -0.1: no linear relationship
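The guide diagram for reading strength from ‘r’ is not reproduced in these notes. The sketch below turns an r value into a verbal description using cut-offs of 0.75, 0.5 and 0.25. These cut-offs are an assumption: they are a commonly used guide and they agree with the three estimates above, but they may not match the missing diagram exactly. The function name describe_r is hypothetical.

```python
# Hypothetical helper: describe a correlation coefficient in words.
# ASSUMPTION: the cut-offs below (0.75 / 0.5 / 0.25) are a commonly used
# guide; they are not taken from the missing diagram in these notes.

def describe_r(r: float) -> str:
    """Return the strength and direction of a linear relationship from r."""
    if not -1 <= r <= 1:
        raise ValueError("r must satisfy -1 <= r <= 1")
    size = abs(r)  # strength depends only on the size of r
    if size < 0.25:
        return "no linear relationship"
    if size < 0.5:
        strength = "weak"
    elif size < 0.75:
        strength = "moderate"
    else:
        strength = "strong"
    direction = "positive" if r > 0 else "negative"  # direction comes from the sign
    return f"{strength}, {direction} relationship"


for r in (0.9, -0.7, -0.1):
    print(f"r = {r}: {describe_r(r)}")
```

With these assumed cut-offs, the three estimates above come out as strong positive, moderate negative and no linear relationship respectively.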

A set of data relating the variables x and y is found to have an r value of -0.83. The scatterplot that could represent this data set is:

A formula can be used to calculate ‘r’:

[pic]
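The formula image is missing. One standard form of Pearson's formula, for n pairs (xᵢ, yᵢ) with means x̄ and ȳ and sample standard deviations s_x and s_y, is given below. The original image may have shown a different but equivalent form; the two expressions here are equal because s_x² = Σ(xᵢ − x̄)²/(n − 1), and likewise for s_y.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
  = \frac{1}{n-1} \sum_{i=1}^{n}
    \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)
```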

Or, alternatively, use your calculator:

• Calc – Lin Reg – 2 variable
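The calculator does this arithmetic for you. As a cross-check outside the calculator, the sketch below computes r directly from r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²) in plain Python. The function name pearson_r is hypothetical. Python 3.10 and later also provide statistics.correlation, which gives the same value.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of the deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # prints 1.0 (points on a rising line)
```

Points that lie exactly on a rising straight line give r = 1, and points on a falling line give r = -1, matching the range -1 ≤ r ≤ 1.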

Coefficient of Determination

The degree to which one variable can be predicted from another, linearly related variable is given by the Coefficient of Determination, r² (the square of the correlation coefficient r). Its values range from 0 to 1, i.e. 0 ≤ r² ≤ 1.

NB: The answer is normally expressed as a percentage, e.g. if r² = 0.58, the Coefficient of Determination is 58%.

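NB: When calculating r from r², there are 2 possible answers, one positive and one negative. If no further information is given, both answers must be given. Always check for a graph or equation (e.g. the sign of the slope of a regression line) that shows the direction of the relationship.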

Example: Calculate r given r² = 0.64. r = ±√0.64 = ±0.8
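The rule above can be written as a small function: take the square root of r², then attach a sign only when the direction is known (for example from the scatterplot or the sign of a regression slope). The function name r_from_r2 and its direction parameter are hypothetical.

```python
import math
from typing import Optional

def r_from_r2(r2: float, direction: Optional[str] = None) -> tuple[float, ...]:
    """Recover r from the coefficient of determination r^2.

    direction: "positive", "negative", or None when the direction is unknown.
    """
    if not 0 <= r2 <= 1:
        raise ValueError("r^2 must satisfy 0 <= r^2 <= 1")
    if direction not in (None, "positive", "negative"):
        raise ValueError('direction must be "positive", "negative" or None')
    size = math.sqrt(r2)  # the square root itself is never negative
    if direction == "positive":
        return (size,)
    if direction == "negative":
        return (-size,)
    # No graph or equation to show the direction: both answers must be given
    return (size, -size)


print(r_from_r2(0.64))              # both possible values, about +0.8 and -0.8
print(r_from_r2(0.64, "negative"))  # only the negative value
```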

Interpreting r2

Always use the following sentence:

The Coefficient of Determination tells us that (r² × 100%) of the variation in the DEPENDENT variable is explained by the variation in the INDEPENDENT variable.

Example: The coefficient of determination for a set of data relating age and pulse rate is 0.7. This means that:

70% of the variation in pulse rate (DV) can be explained by the variation in age (IV).
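Because the interpretation always follows the same sentence, it can be filled in mechanically. The sketch below formats r² as a percentage and inserts the dependent and independent variables; interpret_r2 is a hypothetical name.

```python
def interpret_r2(r2: float, dependent: str, independent: str) -> str:
    """Fill in the standard sentence for interpreting r^2."""
    if not 0 <= r2 <= 1:
        raise ValueError("r^2 must satisfy 0 <= r^2 <= 1")
    # The ".0%" format multiplies by 100, rounds to a whole number and adds "%",
    # e.g. 0.7 -> "70%"
    return (f"{r2:.0%} of the variation in {dependent} can be explained "
            f"by the variation in {independent}.")


print(interpret_r2(0.7, "pulse rate", "age"))
# prints: 70% of the variation in pulse rate can be explained by the variation in age.
```

Note that the dependent variable always goes first in the sentence, so the order of the two arguments matters.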

Correlation and Causation

The value of r was calculated for the following 2 variables: the height of a footballer and the number of marks he takes. It was 0.86, which entitles us to say there is a strong, positive association between the variables. We CANNOT, however, assert that the height of a footballer causes him to take a lot of marks. Being tall might assist in taking marks, but many other factors come into play, such as skill level and accuracy of delivery. A strong correlation on its own does not establish causation.

Which Graph?

EXAMPLE (completed in books)

The table below gives data relating the percentage of lectures attended by students in a semester and the corresponding mark for each student in the exam for that subject.

a) Construct a scatterplot for the data on your calculator & then sketch it.

b) Comment on the correlation between the lectures attended and the examination results and make an estimate of r.

c) Calculate r.

d) Calculate the coefficient of determination.

e) Interpret the coefficient of determination.


4D

[pic: guide to the strength of r, with Strong, Moderate and Weak bands for both positive and negative values]

Only use ‘r’ when the data are:

• Numeric

• Linear

• Free of outliers (check by creating a graph)

4E, 4F

4G

4H

4I
