
Quantitative approaches

Lesson 10: Bivariate regression


Contents

1. What is (bivariate) linear regression?
2. Example: Size of dwarfs and the influence of food
3. How to do it in SPSS


1. What is (bivariate) linear regression?


What is (bivariate) linear regression?

Bivariate linear regression = Statistical method that relates an independent variable to a dependent (or response) variable by modeling the relationship as a straight line.

Regression analysis is used when both variables are continuous, i.e. measured on an interval or metric scale.


The basic model

The basic model we fit in a bivariate linear regression is a straight line:

y = a + b*x

y = response variable (= dependent)
x = explanatory variable (= independent)
a = intercept
b = slope
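As a minimal illustration (not part of the original slides), the model can be written as a small Python prediction function; the coefficient values used here are the ones derived later in this lesson (a ≈ 2.02, b ≈ 1.22):

def predict_size(food, a=2.02, b=1.22):
    # Predicted size on the regression line y = a + b*x (coefficients from this lesson).
    return a + b * food

print(predict_size(4))   # predicted size for 4 units of food: about 6.9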


What is (bivariate) linear regression?

[Figure: a straight line with intercept a (the value of y where the line crosses the y-axis) and slope b = Δy/Δx.]

2. Example: Size of dwarfs and the influence of food


Size of dwarfs and influence of food: data

Food (X):  8  7  6  5  4  3  2  1  0
Size (Y): 12 10  8 11  6  7  2  3  3
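As an aside, a short Python sketch (matplotlib assumed available; not part of the original SPSS workflow) that enters these nine observations and draws the scatterplot shown on the next slide:

import matplotlib.pyplot as plt

food = [8, 7, 6, 5, 4, 3, 2, 1, 0]     # X: units of food
size = [12, 10, 8, 11, 6, 7, 2, 3, 3]  # Y: size of dwarf (cm)

plt.scatter(food, size)
plt.xlabel("Food (X)")
plt.ylabel("Size (Y)")
plt.title("Food and size of dwarfs")
plt.show()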


Food and size of dwarfs: Scatterplot


The meaning of the slope b

Slope b = ΔY / ΔX = Δ Size / Δ Food

Slope b = change in Y that accompanies a unit change in X

In our example: Adding one unit of food causes a dwarf to grow 1.22 cm on average.
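To make this unit-change reading concrete, a tiny check (coefficients a and b as fitted later in this lesson; purely illustrative):

a, b = 2.02, 1.22   # intercept and slope fitted from the dwarf data

# The difference between predictions one food unit apart equals the slope b.
print((a + b * 5) - (a + b * 4))   # 1.22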


Scatterplot and regression line

The regression line is our "model" for the data. For every value of "food", the model predicts the value of "size" on the regression line.


Errors (or: residuals)

error e = Y − Ŷ

Since the prediction is rarely completely accurate, we get for every value of "food" an "error" e, that is, the distance between the actual value of "size" (Y) and the predicted value of "size" (Ŷ).

We also get an "explained part of the variance", R².

[Figure: scatterplot with regression line; the vertical distances between the observed values Y and the predicted values Ŷ are the errors e.]


The Least Squares Criterion

error e = Y − Ŷ

We look for the line that minimizes the sum of the squared residuals e (SSE). This is called the "least squares criterion":

minimize SSE = Σ e² = Σ (Y − Ŷ)²
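A small illustrative check (not in the original slides): computing SSE for the least-squares line and for a slightly different candidate line shows that the fitted coefficients do give the smaller sum of squared residuals.

food = [8, 7, 6, 5, 4, 3, 2, 1, 0]
size = [12, 10, 8, 11, 6, 7, 2, 3, 3]

def sse(a, b):
    # Sum of squared residuals for the line y = a + b*x.
    return sum((y - (a + b * x)) ** 2 for x, y in zip(food, size))

print(sse(2.02, 1.22))   # least-squares line: about 20.1
print(sse(3.00, 1.00))   # another candidate line: 23.0, a larger SSE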


Explained Variance

[Figure: two scatterplots, one in which all the variance is explained through the model and one in which no variance is explained through the model.]


Degree of fit: R-square

It is not enough to know the value of slope b. Very different relationships between X and Y may have the same slope b. We therefore calculate R-square (= explained variance / total variance) in order to measure the "fit" of the model. R-square ranges from 0 to 1.

[Figure: three scatterplots with similar slopes but different degrees of fit: b = 1.163, R² = 0.979; b = 1.483, R² = 0.877; b = 1.521, R² = 0.589.]


Degree of fit: R-square

By introducing the regression line, we divide the total variation of "size" into a regression variation SSR (explained) and an error variation SSE (unexplained).

Explained variance = R-square = explained variation / total variation

R² = SSR / SSY


Formula (1)

sum of squares in Y:         SSY = Σ(y − ȳ)²
sum of squares in X:         SSX = Σ(x − x̄)²
sum of products of X and Y:  SSXY = Σ(x − x̄)(y − ȳ)

slope of regression line:     b = SSXY / SSX

intercept of regression line: a = ȳ − b*x̄ = Σy/n − b*(Σx/n)


Calculating intercept a and slope b

SSY = Σ(y − ȳ)² = 108.8889
SSX = Σ(x − x̄)² = 60
SSXY = Σ(x − x̄)(y − ȳ) = 73

b = SSXY / SSX = 73 / 60 = 1.22

a = Σy/n − b*(Σx/n) = 62/9 − 1.22 * (36/9) = 2.02

y = a + b*x
y = 2.02 + 1.22 * x
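For readers who want to verify this arithmetic, a short Python sketch (not part of the original SPSS workflow) that reproduces the sums of squares and the coefficients from the raw data:

food = [8, 7, 6, 5, 4, 3, 2, 1, 0]
size = [12, 10, 8, 11, 6, 7, 2, 3, 3]
n = len(food)

x_bar = sum(food) / n    # 4.0
y_bar = sum(size) / n    # 6.8889 (= 62/9)

SSX = sum((x - x_bar) ** 2 for x in food)                          # 60
SSY = sum((y - y_bar) ** 2 for y in size)                          # 108.8889
SSXY = sum((x - x_bar) * (y - y_bar) for x, y in zip(food, size))  # 73

b = SSXY / SSX           # 1.2167
a = y_bar - b * x_bar    # 2.0222
print(a, b)

The sketches on the following slides continue from these variables.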


Formula (2)

total variation (sum of squares):  SSY = Σ(y − ȳ)²
regression variation (explained):  SSR = SSXY² / SSX
error variation (unexplained):     SSE = SSY − SSR
explained variance:                R² = SSR / SSY


Calculating explained variation, residual variation and explained variance

SSY = Σ(y − ȳ)² = 108.8889
SSX = Σ(x − x̄)² = 60
SSXY = Σ(x − x̄)(y − ȳ) = 73

Regression variation:  SSR = SSXY² / SSX = 73² / 60 = 88.8166
Error variation:       SSE = SSY − SSR = 108.8889 − 88.8166 = 20.0723
Explained variance:    R² = SSR / SSY = 88.8166 / 108.8889 = 0.8157 = 81.6%
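Continuing the Python sketch above (SSX, SSY and SSXY as already computed there), these three quantities follow directly:

SSR = SSXY ** 2 / SSX   # 88.8167  (explained variation)
SSE = SSY - SSR         # 20.0722  (unexplained variation)
R2 = SSR / SSY          # 0.8157, i.e. about 81.6% explained variance
print(SSR, SSE, R2)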


Calculating error variance (ANOVA-table)

Source       Sum of squares   df   Mean squares                F ratio
Regression   88.817 (SSR)      1   88.817 / 1 = 88.817         88.817 / 2.86746 = 30.974
Error        20.072 (SSE)      7   20.072 / 7 = s² = 2.86746
Total        108.889 (SSY)     8

Critical F-value for df = 1/7: 5.591

Since the F-ratio (30.974) is greater than the critical F-value for df = 1/7 (5.591), we reject the null hypothesis that the true b in the population is equal to 0. The ANOVA table of the regression tells us whether the explanatory variables together have a significant effect on the variance of Y.

The error variance s² will be used to calculate the standard errors of b and a.
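A hedged check (SciPy assumed available; continuing the sketch above) that reproduces the F-test from the sums of squares; the degrees of freedom are 1 for the regression and n − 2 = 7 for the error:

from scipy import stats

df_reg, df_err = 1, n - 2           # 1 and 7
MSR = SSR / df_reg                  # 88.817
MSE = SSE / df_err                  # error variance s² = 2.867
F = MSR / MSE                       # 30.97

p_value = stats.f.sf(F, df_reg, df_err)         # about 0.0008
critical_F = stats.f.ppf(0.95, df_reg, df_err)  # about 5.59
print(F, p_value, critical_F)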


Calculating the p-value of intercept a

Coefficients:
             Estimate   Std. Error   t value   p value
(Intercept)    2.0222       1.0408     1.943   0.093129
food           1.2167       0.2186     5.565   0.000846 ***

t value = Estimate / Std. Error:  2.0222 / 1.0408 = 1.943

The t-value ±1.943 cuts off two areas of the t-distribution with df = 7 (n − 2), one on the left-hand side and one on the right-hand side. The total of these two areas is the p-value 0.0931. -> In 9.3% of cases an intercept of this size or bigger could have come up even if the real intercept were 0. -> The intercept is not significantly different from 0.


Calculating the standard errors of the intercept a and the slope b

We can now use the error variance s² from the ANOVA table to calculate the standard errors of the intercept a and the slope b.

standard error of b = √(s² / SSX) = √(2.867 / 60) = 0.2186

standard error of a = √(s² * Σx² / (n * SSX)) = √(2.867 * 204 / (9 * 60)) = 1.0408
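Continuing the Python sketch, the same standard errors (Σx² is the raw sum of squared x-values, 204 in this example):

import math

s2 = SSE / (n - 2)                   # error variance s², about 2.867
sum_x2 = sum(x ** 2 for x in food)   # Σx² = 204

se_b = math.sqrt(s2 / SSX)                 # 0.2186
se_a = math.sqrt(s2 * sum_x2 / (n * SSX))  # 1.0408
print(se_a, se_b)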


Calculating the p-value of slope b

Coefficients:
             Estimate   Std. Error   t value   p value
(Intercept)    2.0222       1.0408     1.943   0.093129
food           1.2167       0.2186     5.565   0.000846 ***

t value = Estimate / Std. Error:  1.2167 / 0.2186 = 5.565

The t-value ±5.565 cuts off two areas of the t-distribution with df = 7 (n − 2), one on the left-hand side and one on the right-hand side. The total of these two areas is the p-value 0.000846. -> In fewer than 0.1% of cases a slope of this size or bigger could have come up even if the real slope were 0. -> The slope is significantly different from 0.
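A hedged check of both p-values with SciPy (continuing the sketch; t-distribution with df = n − 2 = 7):

from scipy import stats

t_a = a / se_a    # 1.943
t_b = b / se_b    # 5.565

p_a = 2 * stats.t.sf(abs(t_a), df=n - 2)   # about 0.093
p_b = 2 * stats.t.sf(abs(t_b), df=n - 2)   # about 0.00085
print(p_a, p_b)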


Calculating the p-value of slope b

[Figure: t-distribution with the two tail areas beyond t = −5.565 and t = +5.565 shaded; together they form the p-value.]

3. How to do it in SPSS


Regression (1): get data

File -> Open -> Data
Click on FoodSize.sav
Open


Regression (2)

Analyze -> Regression -> Linear
Put "size" into "Dependent"
Put "food" into "Independent(s)"

Statistics -> Regression Coefficients:
- Estimates
- Confidence intervals
Continue
OK
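For readers without SPSS, a hedged Python equivalent (assuming the statsmodels package is installed) that reproduces the coefficient table shown earlier:

import statsmodels.api as sm

food = [8, 7, 6, 5, 4, 3, 2, 1, 0]
size = [12, 10, 8, 11, 6, 7, 2, 3, 3]

X = sm.add_constant(food)        # adds the intercept term
model = sm.OLS(size, X).fit()    # ordinary least squares fit
print(model.summary())           # coefficients, std. errors, t and p values, R², ANOVA F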


Regression (3): Results
