
Takashi Yamano

Lecture Notes on Advanced Econometrics

Lecture 4: Multivariate Regression Model in Matrix Form

In this lecture, we rewrite the multiple regression model in matrix form. A general multiple-regression model can be written as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + u_i \qquad \text{for } i = 1, \dots, n.$$

In matrix form, we can rewrite this model as

y1 y2

yn

=

1 1 1

x11 x21

xn1

x12 x22

xn2

... ...

...

x1k x2k

xnk

0

1

2

k

+

uu12

un

n x 1 n x (k+1) (k+1) x 1 n x 1

Y = X + u

We want to estimate .

Least Squared Residual Approach in Matrix Form (Please see Lecture Note A1 for details)

The strategy in the least squared residual approach is the same as in the bivariate linear regression model: first, we calculate the sum of squared residuals and, second, we find the set of estimators that minimizes that sum. Thus, the minimization problem for the sum of squared residuals in matrix form is

$$\min_{\beta}\; u'u = (Y - X\beta)'(Y - X\beta)$$

Notice here that $u'u$ is a scalar (a single number, such as 10,000) because $u'$ is a $1 \times n$ matrix and $u$ is an $n \times 1$ matrix, so the product of these two matrices is a $1 \times 1$ matrix. We can then take the first derivative of this objective function in matrix form. First, we expand the matrices:

$$u'u = (Y - X\beta)'(Y - X\beta)$$
$$= Y'Y - \beta'X'Y - Y'X\beta + \beta'X'X\beta$$
$$= Y'Y - 2\beta'X'Y + \beta'X'X\beta$$

(The two middle terms can be combined because $Y'X\beta$ is a scalar and therefore equals its own transpose, $\beta'X'Y$.)

Then, by taking the first derivative with respect to $\beta$, we have:

$$\frac{\partial(u'u)}{\partial \beta} = -2X'Y + 2X'X\beta$$

From the first order condition (F.O.C.), we have

$$-2X'Y + 2X'X\hat\beta = 0$$
$$X'X\hat\beta = X'Y$$

Notice that I have replaced $\beta$ with $\hat\beta$ because $\hat\beta$ satisfies the F.O.C. by definition. Multiplying both sides by the inverse matrix $(X'X)^{-1}$, we have:

$$\hat\beta = (X'X)^{-1}X'Y \qquad (1)$$

This is the least squares estimator for the multivariate linear regression model in matrix form. We call it the Ordinary Least Squares (OLS) estimator.
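As a quick illustration (my own sketch, not part of the original note), equation (1) can be evaluated directly with STATA's matrix commands on simulated data; all variable and matrix names below are made up for this example.

* Minimal sketch: evaluate beta-hat = (X'X)^-1 X'Y on simulated data.
clear
set seed 101
set obs 200
gen x1 = rnormal()
gen x2 = rnormal()
gen y  = 1 + 0.5*x1 - 2*x2 + rnormal()     // true coefficients: 1, 0.5, -2
gen cons = 1                                // column of ones for the intercept
mkmat cons x1 x2, matrix(X)
mkmat y, matrix(Y)
matrix bhat = inv(X'*X) * X'*Y              // equation (1)
matrix list bhat
regress y x1 x2                             // built-in OLS should give the same estimates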

Note that the first order conditions above can be written in matrix form as

$$X'(Y - X\hat\beta) = 0$$

$$
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
x_{11} & x_{21} & \cdots & x_{n1} \\
\vdots & \vdots & & \vdots \\
x_{1k} & x_{2k} & \cdots & x_{nk}
\end{bmatrix}
\left(
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
-
\begin{bmatrix}
1 & x_{11} & \cdots & x_{1k} \\
1 & x_{21} & \cdots & x_{2k} \\
\vdots & \vdots & & \vdots \\
1 & x_{n1} & \cdots & x_{nk}
\end{bmatrix}
\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{bmatrix}
\right)
= 0
$$

$$
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
x_{11} & x_{21} & \cdots & x_{n1} \\
\vdots & \vdots & & \vdots \\
x_{1k} & x_{2k} & \cdots & x_{nk}
\end{bmatrix}
\begin{bmatrix}
y_1 - \hat\beta_0 - \hat\beta_1 x_{11} - \cdots - \hat\beta_k x_{1k} \\
y_2 - \hat\beta_0 - \hat\beta_1 x_{21} - \cdots - \hat\beta_k x_{2k} \\
\vdots \\
y_n - \hat\beta_0 - \hat\beta_1 x_{n1} - \cdots - \hat\beta_k x_{nk}
\end{bmatrix}
= 0
$$

where the first matrix is $(k+1) \times n$ and the second is $n \times 1$.

This is the same set of k+1 first order conditions we derived in the previous lecture note (on the simple regression model):

$$\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}\right) = 0$$
$$\sum_{i=1}^{n} x_{i1}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}\right) = 0$$
$$\vdots$$
$$\sum_{i=1}^{n} x_{ik}\left(y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \hat\beta_2 x_{i2} - \dots - \hat\beta_k x_{ik}\right) = 0$$
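One way to see these conditions at work (my own illustration, not from the note) is to check that the OLS residuals sum to zero and are orthogonal to every regressor in the sample. The sketch below uses STATA's bundled auto data, so the variables are just an assumed example.

* Sketch: verify the k+1 first order conditions X'(Y - X beta-hat) = 0 numerically.
sysuse auto, clear
regress price weight length
predict uhat, residuals
gen uw = uhat*weight                 // residual times first regressor
gen ul = uhat*length                 // residual times second regressor
summarize uhat uw ul                 // all three means (hence sums) should be ~0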

Example 4-1: A bivariate linear regression (k=1) in matrix form

As an example, let's consider a bivariate model in matrix form. A bivariate model is

$$y_i = \beta_0 + \beta_1 x_{i1} + u_i \qquad \text{for } i = 1, \dots, n.$$

In matrix form, this is $Y = X\beta + u$:

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
=
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}
+
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}
$$

From (1), we have

$$\hat\beta = (X'X)^{-1}X'Y \qquad (2)$$

Let's consider each component in (2).

$$X'X =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
= \begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix}
= \begin{bmatrix} n & n\bar x \\ n\bar x & \sum_{i=1}^{n} x_i^2 \end{bmatrix}$$

This is a 2 x 2 square matrix. Thus, the inverse matrix of $X'X$ is

$$(X'X)^{-1} = \frac{1}{n\sum_{i=1}^{n} x_i^2 - n^2\bar x^2}
\begin{bmatrix} \sum_{i=1}^{n} x_i^2 & -n\bar x \\ -n\bar x & n \end{bmatrix}
= \frac{1}{n\sum_{i=1}^{n} (x_i - \bar x)^2}
\begin{bmatrix} \sum_{i=1}^{n} x_i^2 & -n\bar x \\ -n\bar x & n \end{bmatrix}$$

The second term is


$$X'Y =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{bmatrix}
= \begin{bmatrix} n\bar y \\ \sum_{i=1}^{n} x_i y_i \end{bmatrix}$$

Thus the OLS estimators are:

$$\hat\beta = (X'X)^{-1}X'Y
= \frac{1}{n\sum_{i=1}^{n}(x_i - \bar x)^2}
\begin{bmatrix} \sum_{i=1}^{n} x_i^2 & -n\bar x \\ -n\bar x & n \end{bmatrix}
\begin{bmatrix} n\bar y \\ \sum_{i=1}^{n} x_i y_i \end{bmatrix}$$

$$= \frac{1}{n\sum_{i=1}^{n}(x_i - \bar x)^2}
\begin{bmatrix} n\bar y \sum_{i=1}^{n} x_i^2 - n\bar x \sum_{i=1}^{n} x_i y_i \\ -n^2\bar x\bar y + n\sum_{i=1}^{n} x_i y_i \end{bmatrix}
= \frac{1}{\sum_{i=1}^{n}(x_i - \bar x)^2}
\begin{bmatrix} \bar y \sum_{i=1}^{n} x_i^2 - \bar x \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n} x_i y_i - n\bar x\bar y \end{bmatrix}$$

$$= \frac{1}{\sum_{i=1}^{n}(x_i - \bar x)^2}
\begin{bmatrix} \bar y \sum_{i=1}^{n} x_i^2 - n\bar y\bar x^2 + n\bar y\bar x^2 - \bar x \sum_{i=1}^{n} x_i y_i \\ \sum_{i=1}^{n}(x_i y_i - \bar x\bar y) \end{bmatrix}
= \frac{1}{\sum_{i=1}^{n}(x_i - \bar x)^2}
\begin{bmatrix} \bar y\left(\sum_{i=1}^{n} x_i^2 - n\bar x^2\right) - \bar x\left(\sum_{i=1}^{n} x_i y_i - n\bar x\bar y\right) \\ \sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y) \end{bmatrix}$$

$$= \begin{bmatrix}
\bar y - \hat\beta_1 \bar x \\[4pt]
\dfrac{\sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^{n}(x_i - \bar x)^2}
\end{bmatrix}
= \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix}$$

where the last step uses $\sum_{i=1}^{n} x_i^2 - n\bar x^2 = \sum_{i=1}^{n}(x_i - \bar x)^2$ and $\sum_{i=1}^{n} x_i y_i - n\bar x\bar y = \sum_{i=1}^{n}(x_i - \bar x)(y_i - \bar y)$.


This is what you studied in the previous lecture note.

End of Example 4-1
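To connect these closed-form expressions to standard software output (my own sketch, not part of the note), one can compute $\hat\beta_1$ as the ratio of the sample covariance of $y$ and $x$ to the sample variance of $x$ and compare it with the output of regress; the auto data and variable names below are only an assumed example.

* Sketch: check beta1-hat = sum(xi - xbar)(yi - ybar) / sum(xi - xbar)^2 against -regress-.
sysuse auto, clear
quietly summarize weight
scalar xbar = r(mean)
quietly summarize price
scalar ybar = r(mean)
quietly correlate price weight, covariance
scalar b1 = r(cov_12) / r(Var_2)     // the (n-1) factors cancel in the ratio
scalar b0 = ybar - b1*xbar
display "b1 = " b1 "    b0 = " b0
regress price weight                  // coefficients should match b1 and b0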

Unbiasedness of OLS

In this sub-section, we show the unbiasedness of OLS under the following assumptions.

Assumptions:

E1 (Linear in parameters): $Y = X\beta + u$
E2 (Zero conditional mean): $E(u \mid X) = 0$
E3 (No perfect collinearity): $X$ has full column rank.

From (2), we know the OLS estimator is $\hat\beta = (X'X)^{-1}X'Y$.

We can replace $Y$ with the population model (E1):

$$\hat\beta = (X'X)^{-1}X'(X\beta + u) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u = \beta + (X'X)^{-1}X'u$$

By taking the expectation of both sides of the equation, conditional on $X$, we have:

$$E(\hat\beta \mid X) = \beta + (X'X)^{-1}X'E(u \mid X)$$

From E2, we have $E(u \mid X) = 0$. Thus, $E(\hat\beta) = \beta$.

Under the assumptions E1-E3, the OLS estimators are unbiased.
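Unbiasedness is a statement about repeated sampling, so a small Monte Carlo experiment can make it concrete. The following is my own sketch (not from the note): it draws many samples from a model that satisfies E1-E3 and checks that the OLS estimates average out close to the true coefficients; all names and numbers are assumptions made for the example.

* Monte Carlo sketch: average of OLS estimates across repeated samples.
clear all
set seed 12345
local reps = 500
matrix sims = J(`reps', 2, .)
forvalues r = 1/`reps' {
    quietly {
        drop _all
        set obs 100
        gen x = rnormal()
        gen y = 1 + 2*x + rnormal()        // true beta0 = 1, beta1 = 2
        regress y x
        matrix sims[`r', 1] = _b[_cons]
        matrix sims[`r', 2] = _b[x]
    }
}
drop _all
set obs `reps'
svmat sims, names(b)                        // b1 holds beta0-hat draws, b2 holds beta1-hat draws
summarize b1 b2                             // means should be close to 1 and 2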

The Variance of OLS Estimators


Next, we consider the variance of the estimators. We need one more assumption.

E4 (Homoskedasticity): $Var(u_i \mid X) = \sigma^2$ and $Cov(u_i, u_j \mid X) = 0$ for $i \neq j$; thus $Var(u \mid X) = \sigma^2 I$.

Because of this assumption, we have

$$E(uu') = E\left(
\begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}
\begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix}
\right)
=
\begin{bmatrix}
E(u_1u_1) & E(u_1u_2) & \cdots & E(u_1u_n) \\
E(u_2u_1) & E(u_2u_2) & \cdots & E(u_2u_n) \\
\vdots & \vdots & & \vdots \\
E(u_nu_1) & E(u_nu_2) & \cdots & E(u_nu_n)
\end{bmatrix}
=
\begin{bmatrix}
\sigma^2 & 0 & \cdots & 0 \\
0 & \sigma^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma^2
\end{bmatrix}
= \sigma^2 I$$

where $u$ is $n \times 1$, $u'$ is $1 \times n$, and each of the matrices above is $n \times n$.

Therefore,

$$Var(\hat\beta) = Var\left[\beta + (X'X)^{-1}X'u\right]$$
$$= Var\left[(X'X)^{-1}X'u\right]$$
$$= E\left[(X'X)^{-1}X'uu'X(X'X)^{-1}\right]$$
$$= (X'X)^{-1}X'E(uu')X(X'X)^{-1}$$
$$= (X'X)^{-1}X'\,\sigma^2 I\,X(X'X)^{-1} \qquad \text{(E4: Homoskedasticity)}$$
$$= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1}$$

$$Var(\hat\beta) = \sigma^2 (X'X)^{-1} \qquad (3)$$

GAUSS-MARKOV Theorem: Under assumptions E1 through E4, $\hat\beta$ is the Best Linear Unbiased Estimator (BLUE).
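In practice $\sigma^2$ is unknown and is usually replaced by the estimate $s^2 = \hat u'\hat u/(n - k - 1)$. The sketch below is my own illustration (not from the note): it builds $s^2 (X'X)^{-1}$ by hand on STATA's bundled auto data and compares it with the variance matrix that regress reports; the variable and matrix names are assumed.

* Sketch: construct s^2 (X'X)^-1 by hand and compare with e(V) from -regress-.
sysuse auto, clear
gen cons = 1
mkmat cons weight, matrix(X)
mkmat price, matrix(Y)
matrix XXinv = inv(X'*X)
matrix bhat  = XXinv * X'*Y
matrix uhat  = Y - X*bhat                              // residual vector
matrix ss    = uhat'*uhat
scalar s2    = ss[1,1] / (rowsof(X) - colsof(X))       // s^2 = u'u / (n - k - 1)
matrix V     = s2 * XXinv                              // equation (3) with sigma^2 replaced by s2
matrix list V
quietly regress price weight
matrix Vreg = e(V)
matrix list Vreg                                       // same values; -regress- lists _cons last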


Example 4-2: Step by Step Regression Estimation by STATA

In this sub-section, I would like to show you how the matrix calculations we have studied are used in econometrics packages. Of course, in practice you do not write matrix programs yourself: econometrics packages already have built-in programs.

The following are matrix calculations with STATA, using a data set called NFIncomeUganda.dta. Here we want to estimate the following model:

$$y_i = \ln(income_i) = \beta_0 + \beta_1\, female_i + \beta_2\, edu_i + \beta_3\, edusq_i + u_i$$

All the variables are defined in Example 3-1. Descriptive statistics for the variables are given below:

. su;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      female |       648    .2222222    .4160609          0          1
         edu |       648    6.476852    4.198633         -8         19
       edusq |       648    59.55093    63.28897          0        361
   ln_income |       648    12.81736    1.505715   7.600903   16.88356

First, we need to define the matrices. In STATA, you can load specific variables (data) into matrices; the command is called mkmat. Here we create a matrix called y containing the dependent variable, ln_nfincome, and a matrix called x containing the independent variables female, educ, and educsq.

. mkmat ln_nfincome, matrix(y)

. mkmat female educ educsq, matrix(x)

Then, we create some components: $X'X$, $(X'X)^{-1}$, and $X'Y$:
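The rest of this example is cut off in this copy of the note, so the commands below are only an assumed continuation (my sketch, not necessarily the note's original code). Note that a column of ones has to be included so that $X$ is $n \times (k+1)$ and the intercept is estimated.

* Assumed continuation: build X'X, (X'X)^-1, and X'Y, then beta-hat from equation (1).
gen cons = 1
mkmat female educ educsq cons, matrix(x)
matrix xx    = x' * x                 // X'X, (k+1) x (k+1)
matrix xxinv = inv(xx)                // (X'X)^-1
matrix xy    = x' * y                 // X'Y, (k+1) x 1
matrix bhat  = xxinv * xy             // OLS estimates, equation (1)
matrix list bhat
* The built-in command should reproduce the same estimates:
* regress ln_nfincome female educ educsq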
