Lecture 16 - Correlation and Regression - Duke University

Statistics 102

Colin Rundel

April 1, 2013

Modeling numerical variables

So far we have worked with single numerical and categorical variables, and explored relationships between a numerical and a categorical variable, and between two categorical variables.

This week we will learn to quantify the relationship between two numerical variables, and to model a numerical response variable using a numerical or categorical explanatory variable.

Next week we will learn to model numerical variables using many explanatory variables at once.

Poverty vs. HS graduate rate

The scatterplot below shows the relationship between the high school graduation rate in all 50 US states and DC and the percentage of residents who live below the poverty line (income below $23,050 for a family of 4 in 2012).

[Figure: scatterplot of % in poverty (vertical axis, ticks from 6 to 18) against % HS grad (horizontal axis, ticks from 80 to 90), one point per state and DC.]

Response?

Explanatory?

Relationship?

Correlation

Quantifying the relationship

Correlation describes the strength of the linear association between two variables.

It takes values between -1 (perfect negative) and +1 (perfect positive).

A value of 0 indicates no linear association.

We use ρ to indicate the population correlation coefficient, and R or r to indicate the sample correlation coefficient.
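
The properties above can be checked numerically. The sketch below is not from the lecture: it assumes the usual Pearson definition of the sample correlation, r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² · Σ(y - ȳ)²), and the function name `sample_correlation` and all data values are made up for illustration.

```python
import math


def sample_correlation(x, y):
    """Pearson sample correlation coefficient r (assumed definition)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up illustrative values, not the poverty data from the slides
x = [1, 2, 3, 4, 5]
r_pos = sample_correlation(x, [2, 4, 6, 8, 10])   # points on an increasing line
r_neg = sample_correlation(x, [10, 8, 6, 4, 2])   # points on a decreasing line
r_zero = sample_correlation(x, [4, 1, 0, 1, 4])   # y = (x - 3)^2, a curved pattern

print(r_pos, r_neg, r_zero)
```

The third case is the one worth noting: y is completely determined by x, yet r is 0, because r measures only linear association.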

Correlation Examples

From
