CHAPTER 10




Correlation and Regression

Objectives

Draw a scatter plot for a set of ordered pairs.

Compute the correlation coefficient.

Test the hypothesis H0: ρ = 0.

Compute the equation of the regression line.

Compute the coefficient of determination.

Section 10.1 Introduction

In addition to hypothesis testing and confidence intervals, inferential statistics involves determining whether a relationship between two or more numerical or quantitative variables exists.

Statistical Methods

Correlation is a statistical method used to determine whether a relationship between variables exists.

Regression is a statistical method used to describe the nature of the relationship between variables—that is, positive or negative, linear or nonlinear.

Statistical Questions

Are two or more variables related?

If so, what is the strength of the relationship?

What type of relationship exists?

What kind of predictions can be made from the relationship?

Section 10.2 Correlation

I. Scatter Plots

A scatter plot is a graph of the ordered pairs (x,y) of numbers consisting of the independent variable, x, and the dependent variable, y.

A scatter plot is a visual way to describe the nature of the relationship between the independent and dependent variables.

Example: Construct a scatter plot for the data obtained in a study of the number of hours of sleep and performance.

II. Correlation Coefficient

A correlation coefficient is a measure to determine the strength of the relationship between two variables.

In a simple relationship, there are only two types of variables under study.

In multiple relationships, many variables are under study.

Correlation Coefficient

The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two variables.

The symbol for the sample correlation coefficient is r.

The symbol for the population correlation coefficient is ρ (Greek letter rho).

The range of the correlation coefficient is from −1 to +1.

If there is a strong positive linear relationship between the variables, the value of r will be close to +1.

If there is a strong negative linear relationship between the variables, the value of r will be close to −1.

When there is no linear relationship between the variables or only a weak relationship, the value of r will be close to 0.

Range of Values for the Correlation Coefficient

[pic]

In general, if r > 0 (or r < 0), there is a positive (or negative) linear correlation between x and y.

Relationship Between the Correlation Coefficient and the Scatter Plot

[pic]

Formula for the Correlation Coefficient r

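$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{\left[n(\sum x^2) - (\sum x)^2\right]\left[n(\sum y^2) - (\sum y)^2\right]}}$$

where n is the number of data pairs.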

Example: Compute the value of the correlation coefficient for the data obtained in the study of age and blood pressure.

|Age x |Pressure y |xy |x² |y² |
|43 |128 | | | |
|48 |120 | | | |
|56 |135 | | | |
|61 |143 | | | |
|67 |141 | | | |
|70 |152 | | | |
| | | | | |
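The blank columns of the table can be filled in programmatically. The following is a minimal sketch in Python (standard library only), assuming the usual computational formula for r built from the column sums Σx, Σy, Σxy, Σx², and Σy². The function names `column_sums` and `correlation_coefficient` are illustrative and do not come from the original notes.

```python
import math

# Data from the age and blood pressure table
ages = [43, 48, 56, 61, 67, 70]               # x
pressures = [128, 120, 135, 143, 141, 152]    # y


def column_sums(xs, ys):
    """Return (sum x, sum y, sum xy, sum x^2, sum y^2) for paired data."""
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    return sum_x, sum_y, sum_xy, sum_x2, sum_y2


def correlation_coefficient(xs, ys):
    """Sample correlation coefficient r from the column sums."""
    n = len(xs)  # number of data pairs
    sum_x, sum_y, sum_xy, sum_x2, sum_y2 = column_sums(xs, ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


print(column_sums(ages, pressures))   # the five column totals
print(round(correlation_coefficient(ages, pressures), 3))
```

For these data the column totals are 345, 819, 47634, 20399, and 112443, and r rounds to 0.897, which indicates a strong positive linear relationship.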

Possible Relationships Between Variables

There is a direct cause-and-effect relationship between the variables: that is, x causes y.

There is a reverse cause-and-effect relationship between the variables: that is, y causes x.

The relationship between the variables may be caused by a third variable: that is, y may appear to cause x but in reality z causes x.

There may be a complexity of interrelationships among many variables; that is, x may cause y but w, t, and z fit into the picture as well.

The relationship may be coincidental: although a researcher may find a relationship between x and y, common sense may suggest otherwise.

Interpretation of Relationships

When the null hypothesis is rejected, the researcher must consider all possibilities and select the appropriate relationship between the variables as determined by the study. Remember, correlation does not necessarily imply causation.

Population Correlation Coefficient

Formally defined, the population correlation coefficient ρ is the correlation computed by using all possible pairs of data values (x, y) taken from a population.

Hypothesis Testing

In hypothesis testing, one of the following is true:

H0: ρ = 0 This null hypothesis means that there is no correlation between the x and y variables in the population.

H1: ρ ≠ 0 This alternative hypothesis means that there is a significant correlation between the variables in the population.

Formula for the t Test for the Correlation Coefficient

Formula for the t test for the correlation coefficient:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$

with degrees of freedom equal to n − 2.

Example: Test the significance of the correlation coefficient found for the data obtained in the study of age and blood pressure. Use [pic] and [pic]
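A short sketch of this test in Python follows. The significance level and the value of r for this example are not legible in the notes, so the code makes two labeled assumptions: r = 0.897 (the rounded value the age and blood pressure data yield) and a two-tailed test at α = 0.05, for which the standard t table gives a critical value of 2.776 at 4 degrees of freedom. The function name `t_statistic` is illustrative.

```python
import math


def t_statistic(r, n):
    """t test value for a sample correlation coefficient r from n data pairs."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = 0.897            # ASSUMPTION: rounded r for the age and blood pressure data
n = 6                # number of data pairs in the table
t_critical = 2.776   # ASSUMPTION: two-tailed, alpha = 0.05, d.f. = n - 2 = 4

t = t_statistic(r, n)
reject_h0 = abs(t) > t_critical   # True means r is significant at this level

print(round(t, 3), reject_h0)
```

Under these assumptions the test value is about 4.06, which exceeds 2.776, so the null hypothesis H0: ρ = 0 would be rejected.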

Testing the significance of r using Table I

Table I shows the values of the correlation coefficient that are significant for a specific α level and a specific number of degrees of freedom.

Any value of r greater than a positive critical value or less than a negative critical value will be significant, and the null hypothesis will be rejected.

Example: Using Table I, test the significance of the correlation coefficient r = 0.0667, at , and sample size is 9.
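Table I itself is not reproduced in these notes, but a critical value of r can be recovered from a critical t value by solving the t test formula for r, which gives r = t / sqrt(t² + d.f.). The sketch below applies that relationship. Because the α level for this example is missing from the notes, the code assumes a two-tailed test at α = 0.05, where the standard t table value for 7 degrees of freedom is 2.365. The function name `critical_r` is illustrative.

```python
import math


def critical_r(t_critical, df):
    """Critical value of r equivalent to a critical t value with df degrees of freedom."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


n = 9                 # sample size from the example
df = n - 2            # degrees of freedom
t_critical = 2.365    # ASSUMPTION: two-tailed, alpha = 0.05, d.f. = 7
r = 0.0667            # correlation coefficient from the example

r_crit = critical_r(t_critical, df)
significant = abs(r) > r_crit   # reject H0 only if |r| exceeds the critical value

print(round(r_crit, 3), significant)
```

Under this assumption the critical value is about ±0.666. Since r = 0.0667 falls well inside that interval, the null hypothesis would not be rejected.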

Section 10.3 Regression

Regression is a statistical method used to describe the nature of the relationship between variables—that is, positive or negative, linear or nonlinear.

Types of Regression and Correlation

[pic]

Linear Regression

If the value of the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the data’s line of best fit.

Best fit means that the sum of the squares of the vertical distances from each point to the line is at a minimum.
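The "best fit" criterion can be checked numerically. The sketch below computes the sum of squared vertical distances for three candidate lines through the age and blood pressure data used in this chapter. The first pair of coefficients is the least-squares line for those data, rounded to three decimal places; the other two lines are hypothetical alternatives chosen only for comparison.

```python
def sum_squared_vertical_distances(xs, ys, a, b):
    """Sum of squared vertical distances from each point to the line y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


ages = [43, 48, 56, 61, 67, 70]               # x
pressures = [128, 120, 135, 143, 141, 152]    # y

# (intercept a, slope b) for each candidate line
candidates = {
    "least squares": (81.048, 0.964),        # rounded least-squares coefficients
    "steeper (hypothetical)": (70.0, 1.2),   # ASSUMPTION: arbitrary comparison line
    "flatter (hypothetical)": (100.0, 0.6),  # ASSUMPTION: arbitrary comparison line
}

totals = {
    name: sum_squared_vertical_distances(ages, pressures, a, b)
    for name, (a, b) in candidates.items()
}
best = min(totals, key=totals.get)

for name, total in totals.items():
    print(f"{name}: {total:.2f}")
print("smallest:", best)
```

The least-squares line gives the smallest total of the three, which is what "best fit" means in this section.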

[Figures: "Scatter Plot with Three Lines" and "A Linear Relation"]

Equation of a Line

In algebra, the equation of a line is usually given as y = mx + b, where m is the slope of the line and b is the y intercept.

In statistics, the equation of the regression line is written as y' = a + bx, where a is the y' intercept and b is the slope of the line.

Formulas for the Regression Line

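Formulas for the regression line y' = a + bx:

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \qquad b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

where a is the y' intercept and b is the slope of the line.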

Rounding Rule

When calculating the values of a and b, round to three decimal places.

Example: Find the equation of the regression line for the data obtained in the study of age and blood pressure.

Example: Using the equation of the regression line, predict the blood pressure for a person who is 50 years old.
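Both examples can be worked in a few lines of Python. This is a minimal sketch that assumes the standard least-squares formulas for a and b expressed through the column sums, and applies the three-decimal rounding rule stated above. The function name `regression_line` is illustrative.

```python
def regression_line(xs, ys):
    """Return (a, b) for the regression line y' = a + b*x, rounded to three decimals."""
    n = len(xs)  # number of data pairs
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y' intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator        # slope
    return round(a, 3), round(b, 3)


ages = [43, 48, 56, 61, 67, 70]               # x
pressures = [128, 120, 135, 143, 141, 152]    # y

a, b = regression_line(ages, pressures)
predicted = a + b * 50    # predicted blood pressure at age 50

print(f"y' = {a} + {b}x")
print(round(predicted, 3))
```

For these data the line is y' = 81.048 + 0.964x, and the predicted blood pressure for a 50-year-old is about 129.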

Procedure

Finding the correlation coefficient and the regression line equation

Step 1 Make a table with columns for subject, x, y, xy, x², and y².

Step 2 Find the values of xy, x², and y². Place them in the appropriate columns, and sum each column.

Step 3 Substitute in the formula to find the value of r.

Step 4 When r is significant, substitute in the formulas to find the values of a and b for the regression line equation y' = a + bx.

Summary

The strength and direction of the linear relationship between variables is measured by the value of the correlation coefficient r.

r can assume values between and including −1 and +1.

The closer the value of the correlation coefficient is to +1 or −1, the stronger the linear relationship is between the variables.

A value of +1 or −1 indicates a perfect linear relationship.

Relationships can be linear or curvilinear.

To determine the shape, one draws a scatter plot of the variables.

If the relationship is linear, the data can be approximated by a straight line, called the regression line or the line of best fit.

Conclusion

Many relationships among variables exist in the real world. One way to determine whether a relationship exists is to use the statistical techniques known as correlation and regression.

[Figure labels: no correlation; negative correlation (as x increases, y decreases); positive correlation (as x increases, y increases); perfect correlation; exponential regression]
