


Linear Algebra

Take the model with L endogenous and K exogenous variables. Then the jth equation for the ith observation is

γ_1j y_i1 + γ_2j y_i2 + ... + γ_Lj y_iL + β_1j x_i1 + ... + β_Kj x_iK = ε_ij

We can write this (for j = 1,...,L) as

y_i'Γ + x_i'B = ε_i'

where y_i = (y_i1,...,y_iL)' and x_i = (x_i1,...,x_iK)' are observation vectors. Γ and B are coefficient matrices of orders LxL and KxL:

Γ = [γ_lj]  (l, j = 1,...,L)        B = [β_kj]  (k = 1,...,K; j = 1,...,L)

where there are N observations, i = 1,...,N. We combine all N observations into

YΓ + XB = E    (1)

where Y and X are of order NxL and NxK, with

Y = [y_1' ; y_2' ; ... ; y_N'],    X = [x_1' ; ... ; x_N'],    E = [ε_1' ; ... ; ε_N']

(the ith rows are y_i', x_i' and ε_i'). If Γ is nonsingular, we can postmultiply (1) by Γ^{-1}

Y = -XBΓ^{-1} + EΓ^{-1}    (2)

and get the reduced form, while (1) is the structural form.
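
As a numerical illustration of (1) and (2), the following minimal numpy sketch builds Y from the reduced form and checks the structural form; the sizes L, K, N and the particular Γ, B, X, E are arbitrary choices for illustration only.

    # Illustrative sketch of (1) and (2); sizes and matrices are assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    L, K, N = 2, 3, 5
    Gamma = np.array([[1.0, 0.4],
                      [-0.5, 1.0]])          # L x L, nonsingular
    B = rng.standard_normal((K, L))          # K x L
    X = rng.standard_normal((N, K))          # N x K exogenous data
    E = rng.standard_normal((N, L))          # N x L disturbances

    # Reduced form (2): Y = -X B Gamma^{-1} + E Gamma^{-1}
    Gamma_inv = np.linalg.inv(Gamma)
    Y = -X @ B @ Gamma_inv + E @ Gamma_inv

    # Check the structural form (1): Y Gamma + X B = E
    print(np.allclose(Y @ Gamma + X @ B, E))   # True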

Variance Matrix

Let r be a column vector of random variables, r = (r_1, r_2, ..., r_n)'. Its variance (covariance) matrix is

V(r) = E[(r - Er)(r - Er)']

an nxn matrix with the variances on the diagonal and the covariances off the diagonal.

Least Squares

y = Xβ + ε

b = (X'X)^{-1}X'y    (3)

because (y - Xb)'(y - Xb) is minimized by this choice of b.

b = β + (X'X)^{-1}X'ε

If E(ε) = 0 and X is nonrandom, then b is unbiased.

var(b) = σ²(X'X)^{-1}

since var(ε) = σ²I. Substitution of (3) into e = y - Xb yields e = My with the projection matrix M = I - X(X'X)^{-1}X', where M is symmetric and idempotent.

e = My = M(Xβ + ε) = Mε

since MX = 0. Also M'M = M, so the residual sum of squares is

e'e = (Mε)'(Mε) = ε'M'Mε = ε'Mε
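
A minimal numpy sketch of (3), M and the residuals on simulated data (the data-generating choices below are assumptions for illustration only):

    # Least-squares sketch: b, the projection matrix M, and e'e.
    import numpy as np

    rng = np.random.default_rng(1)
    N, K = 50, 3
    X = rng.standard_normal((N, K))
    beta = np.array([1.0, -2.0, 0.5])
    eps = rng.standard_normal(N)
    y = X @ beta + eps

    b = np.linalg.solve(X.T @ X, X.T @ y)               # (X'X)^{-1} X'y
    M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)   # I - X(X'X)^{-1}X'

    print(np.allclose(M, M.T))                          # M symmetric
    print(np.allclose(M @ M, M))                        # M idempotent
    print(np.allclose(M @ X, 0))                        # MX = 0
    e = y - X @ b
    print(np.allclose(e, M @ y), np.allclose(e, M @ eps))  # e = My = M eps
    print(np.isclose(e @ e, eps @ M @ eps))                # e'e = eps'M eps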

Trace:

The trace of a square matrix is the sum of its diagonal elements

tr(AB) = tr(BA) if both exist

tr(A+B) = tr(A)+tr(B) if they are of same order

example: the expected residual sum of squares from least squares above is

E(e'e) = E(ε'Mε) = E[tr(ε'Mε)] = E[tr(Mεε')] = tr(M E(εε')) = σ² tr(M)

tr(M) = tr(I_N) - tr(X(X'X)^{-1}X') = N - tr((X'X)^{-1}X'X) = N - tr(I_K) = N - K

so E(e'e) = σ²(N - K).
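
A quick numerical check of the trace rules and of tr(M) = N - K (the matrices below are arbitrary examples):

    # Trace rules and tr(M) = N - K on arbitrary illustrative matrices.
    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 6))
    B = rng.standard_normal((6, 4))
    print(np.isclose(np.trace(A @ B), np.trace(B @ A)))   # tr(AB) = tr(BA)

    N, K = 50, 3
    X = rng.standard_normal((N, K))
    M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)
    print(np.isclose(np.trace(M), N - K))                 # tr(M) = N - K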

Generalized Least Squares

If cov(ε) = σ²V rather than σ²I, where V is nonsingular, then

β̂ = (X'V^{-1}X)^{-1}X'V^{-1}y

and

var(β̂) = σ²(X'V^{-1}X)^{-1}

(derived later in Aitken's theorem)
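
A minimal sketch of the GLS formula on simulated data, assuming an arbitrary positive definite V for illustration:

    # GLS estimator with cov(eps) = sigma^2 V; V and the data are arbitrary.
    import numpy as np

    rng = np.random.default_rng(3)
    N, K = 40, 2
    X = rng.standard_normal((N, K))
    y = rng.standard_normal(N)
    P = rng.standard_normal((N, N))
    V = P @ P.T + N * np.eye(N)            # nonsingular, positive definite

    V_inv = np.linalg.inv(V)
    b_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
    b_ls  = np.linalg.solve(X.T @ X, X.T @ y)
    print(b_gls, b_ls)                     # generally differ unless V = I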

Partitioned matrices:

Take the system in (2) and rewrite it as a product of partitioned matrices,

Y = [X  E] [-BΓ^{-1} ; Γ^{-1}]

We partition by sets of columns.

Rules (for conformably partitioned matrices):

[A_11 A_12 ; A_21 A_22] + [B_11 B_12 ; B_21 B_22] = [A_11+B_11  A_12+B_12 ; A_21+B_21  A_22+B_22]

[A_11 A_12 ; A_21 A_22][B_11 B_12 ; B_21 B_22] = [A_11B_11+A_12B_21  A_11B_12+A_12B_22 ; A_21B_11+A_22B_21  A_21B_12+A_22B_22]

Inverse of a symmetric partitioned matrix:

[A  B ; B'  C]^{-1} = [E^{-1}  -E^{-1}BC^{-1} ; -C^{-1}B'E^{-1}  C^{-1}+C^{-1}B'E^{-1}BC^{-1}]

= [A^{-1}+A^{-1}BD^{-1}B'A^{-1}  -A^{-1}BD^{-1} ; -D^{-1}B'A^{-1}  D^{-1}]

where D = C - B'A^{-1}B and E = A - BC^{-1}B'. We use the first form when C is nonsingular and the second when A is nonsingular.

Application: deviation from mean

X = [ι  Z], where ι is a column of N ones and Z contains the other regressors, so that

X'X = [N  ι'Z ; Z'ι  Z'Z]

The bottom right block of the inverse is

D^{-1} = (Z'Z - Z'ι(ι'ι)^{-1}ι'Z)^{-1} = (Z'AZ)^{-1},   where A = I - (1/N)ιι'

i.e. Z'AZ contains the sums of squares and cross products of the regressors measured as deviations from their means.
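
A numerical check of this application (Z and the sample size are arbitrary choices):

    # Bottom-right block of (X'X)^{-1}, X = [iota Z], equals (Z'AZ)^{-1}.
    import numpy as np

    rng = np.random.default_rng(4)
    N, k = 30, 3
    Z = rng.standard_normal((N, k))
    iota = np.ones((N, 1))
    X = np.hstack([iota, Z])

    bottom_right = np.linalg.inv(X.T @ X)[1:, 1:]
    A = np.eye(N) - iota @ iota.T / N            # deviation-from-mean matrix
    print(np.allclose(bottom_right, np.linalg.inv(Z.T @ A @ Z)))   # True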

Kronecker Product

A special form of partitioning when all submatrices are scalar multiples of the same matrix.

A⊗B = [a_11 B  a_12 B ... a_1n B ; a_21 B ... a_2n B ; ... ; a_m1 B ... a_mn B]

We refer to this as the Kronecker product of A (of order mxn) and B. If B is pxq, then the order of A⊗B is mpxnq.

(A⊗B)(C⊗D) = AC⊗BD    (when the ordinary products exist)

(A⊗B)' = A'⊗B'        (A⊗B)^{-1} = A^{-1}⊗B^{-1}   (A, B square and nonsingular)
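
These rules can be checked numerically with numpy's kron (the matrices below are arbitrary examples):

    # Kronecker-product rules on arbitrary small matrices.
    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.standard_normal((2, 2))
    B = rng.standard_normal((3, 3))
    C = rng.standard_normal((2, 2))
    D = rng.standard_normal((3, 3))

    print(np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D)))
    print(np.allclose(np.kron(A, B).T, np.kron(A.T, B.T)))
    print(np.allclose(np.linalg.inv(np.kron(A, B)),
                      np.kron(np.linalg.inv(A), np.linalg.inv(B))))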

Application: Joint GLS

y_j = X_j β_j + ε_j,    j = 1,...,L    (each equation has n observations)

Stacking the L equations gives

[y_1 ; y_2 ; ... ; y_L] = [X_1 0 ... 0 ; 0 X_2 ... 0 ; ... ; 0 0 ... X_L][β_1 ; β_2 ; ... ; β_L] + [ε_1 ; ε_2 ; ... ; ε_L]

or, compactly, y = Xβ + ε with X block diagonal.

If we assume the n disturbances of each of the L equations have equal variance and are uncorrelated over observations, so that E(ε_j ε_j') = σ_jj I_n, and for j ≠ l the matrix E(ε_j ε_l') = σ_jl I_n contains the contemporaneous covariances, then the full covariance matrix is

V(ε) = Σ⊗I_n,   where Σ = [σ_jl] is the LxL contemporaneous covariance matrix.

Assuming Σ is nonsingular, application of GLS results in

β̂ = [X'(Σ^{-1}⊗I_n)X]^{-1} X'(Σ^{-1}⊗I_n)y

and

var(β̂) = [X'(Σ^{-1}⊗I_n)X]^{-1}

β̂ is superior to Least Squares unless the X_j are all identical (X_1 = ... = X_L = X_0), causing

X = I_L⊗X_0

β̂ = [Σ^{-1}⊗(X_0'X_0)]^{-1}(Σ^{-1}⊗X_0')y = [I_L⊗(X_0'X_0)^{-1}X_0']y

i.e. equation-by-equation least squares, or unless Σ is diagonal, indicating no covariance between equations j and l.
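
A minimal sketch of the identical-regressors case, assuming arbitrary simulated data: joint GLS then coincides with equation-by-equation least squares.

    # Joint GLS with identical X_j collapses to ordinary least squares.
    import numpy as np

    rng = np.random.default_rng(6)
    n, k, L = 20, 2, 3
    X0 = rng.standard_normal((n, k))
    y = rng.standard_normal(n * L)                 # stacked (y_1', ..., y_L')'
    S = rng.standard_normal((L, L))
    Sigma = S @ S.T + L * np.eye(L)                # contemporaneous covariance

    X = np.kron(np.eye(L), X0)                     # identical regressors per equation
    W = np.kron(np.linalg.inv(Sigma), np.eye(n))   # Sigma^{-1} kron I_n
    b_gls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    b_ols = np.linalg.solve(X.T @ X, X.T @ y)      # equation-by-equation LS
    print(np.allclose(b_gls, b_ols))               # True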

Vectorization

Sometimes we need to work with vectors rather than matrices, e.g. when finding the variance of the joint GLS estimator above, whose coefficient vectors β_j form the columns of a matrix. We therefore vectorize the parameters. Let A = [a_1 a_2 ... a_q] be a pxq matrix, a_i being the i'th column of A.

vec(A) = [a_1 ; a_2 ; ... ; a_q], which is a pq element column vector.

vec(A+B) = vec(A)+vec(B)

vec(AB) = (B'⊗I)vec(A) = (I⊗A)vec(B)
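
A numerical check of the vec rules (note that vec stacks columns, so numpy arrays must be flattened in column-major order); the matrices are arbitrary examples:

    # vec(AB) = (B' kron I) vec(A) = (I kron A) vec(B) on arbitrary matrices.
    import numpy as np

    rng = np.random.default_rng(7)
    A = rng.standard_normal((3, 4))
    B = rng.standard_normal((4, 2))

    vec = lambda M: M.flatten(order='F')      # column stacking
    lhs = vec(A @ B)
    print(np.allclose(lhs, np.kron(B.T, np.eye(3)) @ vec(A)))   # True
    print(np.allclose(lhs, np.kron(np.eye(2), A) @ vec(B)))     # True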

Definiteness

If the quadratic form x'Ax is positive for any x ≠ 0, A is said to be positive definite. The covariance matrix V(r) of any random vector r is always positive semidefinite.

Example: b being BLUE implies that var(β̃) - var(b) is positive semidefinite for any other linear unbiased estimator β̃ of β.

Proof: β̃ is linear in y, so β̃ = By for some matrix B. Define C = B - (X'X)^{-1}X'

to write β̃ as

β̃ = [C + (X'X)^{-1}X']y = [C + (X'X)^{-1}X'](Xβ + ε)

Unbiasedness implies CX = 0.

var(β̃) = σ²[C + (X'X)^{-1}X'][C + (X'X)^{-1}X']' = σ²[CC' + (X'X)^{-1}]

(the cross terms vanish because CX = 0). The difference var(β̃) - var(b) = σ²CC' is positive semidefinite because x'CC'x is the non-negative squared length of the vector C'x (inner product). We use definiteness in the evaluation of minima and maxima; e.g. for maximization of utility the Hessian should be negative definite. If a matrix is definite and block diagonal, the principal submatrices are also definite.
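
A minimal numerical sketch of the proof: any C with CX = 0 (here built from the residual maker M, an illustrative choice) gives a positive semidefinite CC'.

    # For any C with CX = 0, CC' has no negative eigenvalues.
    import numpy as np

    rng = np.random.default_rng(8)
    N, K = 15, 3
    X = rng.standard_normal((N, K))
    M = np.eye(N) - X @ np.linalg.solve(X.T @ X, X.T)   # MX = 0

    C = rng.standard_normal((K, N)) @ M                 # any such C has CX = 0
    print(np.allclose(C @ X, 0))                        # True
    diff = C @ C.T                                      # var difference up to sigma^2
    print(np.all(np.linalg.eigvalsh(diff) >= -1e-9))    # positive semidefinite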

Diagonalization

For an nxn matrix A, we seek a vector x so that Ax equals λx (λ a scalar). This is trivially satisfied by x = 0, so we impose the normalization x'x = 1, implying x ≠ 0.

(A - λI)x = 0    (4)

A - λI is singular

|A - λI| = 0    (characteristic equation)

If A is diagonal, then its diagonal elements are the solutions of the characteristic equation. The determinant is a polynomial of degree n and has λ_1,...,λ_n as its roots, the latent roots of A. The product of the λ_i equals the determinant of A, and their sum equals the trace of A. The vectors x are called characteristic (eigen)vectors.

If A is symmetric (A = A'), assume λ_i and λ_j are two different roots with vectors x_i and x_j.

Premultiplying Ax_i = λ_i x_i by x_j' and Ax_j = λ_j x_j by x_i' and then subtracting gives

x_j'Ax_i - x_i'Ax_j = (λ_i - λ_j) x_i'x_j

For a symmetric matrix the left side vanishes and λ_i - λ_j is nonzero, implying orthogonality of the characteristic vectors: x_i'x_j = 0.

X = [x_1 x_2 ... x_n],   where X'X = I

AX = XΛ

A[x_1 ... x_n] = [λ_1 x_1 ... λ_n x_n]

where Λ is diagonal with λ_1,...,λ_n on the diagonal. Premultiply by X':

X'AX = Λ

This double multiplication diagonalizes the symmetric matrix A. Postmultiplying AX = XΛ by X' also gives

A = XΛX'

Special cases: If we square A,

A²x = A(Ax) = λAx = λ²x

which shows that the latent roots of A² are the squares of the latent roots of A. For symmetric and nonsingular A, premultiply both sides of Ax = λx by (1/λ)A^{-1} to obtain

A^{-1}x = (1/λ)x

so the roots of A^{-1} are the reciprocals of the roots of A.

All latent roots of a positive definite matrix are positive.
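
A numerical illustration of these results for an arbitrary symmetric A:

    # Diagonalization of a symmetric matrix: X'X = I, X'AX = Lambda,
    # product of roots = det(A), sum of roots = tr(A), roots of A^2 = lambda^2.
    import numpy as np

    rng = np.random.default_rng(9)
    S = rng.standard_normal((4, 4))
    A = S + S.T                                    # symmetric

    lam, X = np.linalg.eigh(A)                     # roots and orthonormal vectors
    print(np.allclose(X.T @ X, np.eye(4)))         # X'X = I
    print(np.allclose(X.T @ A @ X, np.diag(lam)))  # X'AX = Lambda
    print(np.isclose(np.prod(lam), np.linalg.det(A)))
    print(np.isclose(np.sum(lam), np.trace(A)))
    print(np.allclose(np.linalg.eigvalsh(A @ A), np.sort(lam ** 2)))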

Aitken’s theorem:

Any symmetric positive definite matrix A can be written as A = Q'Q where Q is some non-singular matrix. For example, consider the covariance matrix σ²V. This matrix is positive definite, so its inverse is as well, and we can decompose it as V^{-1} = Q'Q. We premultiply both sides of

y = Xβ + ε

by Q:

Qy = QXβ + Qε

var(Qε) = E[Qεε'Q'] = σ²QVQ' = σ²I

so least squares applied to the transformed equation gives

β̂ = [(QX)'QX]^{-1}(QX)'Qy = (X'V^{-1}X)^{-1}X'V^{-1}y    (GLS)

var(β̂) = σ²[(QX)'QX]^{-1} = σ²(X'V^{-1}X)^{-1}
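
A minimal numpy sketch of Aitken's construction, assuming an arbitrary positive definite V; the decomposition V^{-1} = Q'Q is obtained here via a Cholesky factor, which is one possible choice of Q.

    # Transforming by Q with V^{-1} = Q'Q and applying LS reproduces GLS.
    import numpy as np

    rng = np.random.default_rng(10)
    N, K = 25, 2
    X = rng.standard_normal((N, K))
    y = rng.standard_normal(N)
    P = rng.standard_normal((N, N))
    V = P @ P.T + N * np.eye(N)            # positive definite "covariance"

    V_inv = np.linalg.inv(V)
    Lchol = np.linalg.cholesky(V_inv)      # V^{-1} = Lchol Lchol'
    Q = Lchol.T                            # so V^{-1} = Q'Q

    Xs, ys = Q @ X, Q @ y
    b_transformed = np.linalg.solve(Xs.T @ Xs, Xs.T @ ys)
    b_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
    print(np.allclose(b_transformed, b_gls))    # True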

Cholesky decomposition

Rather than using an orthogonal matrix X as in the previous diagonalization, it is also possible to use a triangular matrix. For instance, consider a diagonal matrix D and an upper triangular matrix C with units on the diagonal,

D = [d_1 0 0 ; 0 d_2 0 ; 0 0 d_3]        C = [1 c_12 c_13 ; 0 1 c_23 ; 0 0 1]

yielding the symmetric matrix

C'DC = [d_1  d_1c_12  d_1c_13 ; d_1c_12  d_1c_12²+d_2  d_1c_12c_13+d_2c_23 ; d_1c_13  d_1c_12c_13+d_2c_23  d_1c_13²+d_2c_23²+d_3]

Any symmetric positive definite matrix A can be uniquely written as A = C'DC, with C upper triangular with unit diagonal and D diagonal with positive diagonal elements, which is referred to as the Cholesky decomposition.
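
numpy's cholesky returns the factorization A = LL' with L lower triangular; rescaling gives the unit-diagonal C'DC form used here (A below is an arbitrary positive definite example):

    # Cholesky: A = L L' rescaled into the C'DC form with unit diagonal C.
    import numpy as np

    rng = np.random.default_rng(11)
    S = rng.standard_normal((4, 4))
    A = S @ S.T + 4 * np.eye(4)            # symmetric positive definite

    Lchol = np.linalg.cholesky(A)          # A = Lchol Lchol'
    d = np.diag(Lchol) ** 2                # diagonal of D
    C = (Lchol / np.diag(Lchol)).T         # upper triangular, unit diagonal
    print(np.allclose(A, C.T @ np.diag(d) @ C))   # A = C'DC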

Simultaneous diagonalization of two matrices

(A - λB)x = 0    (5)

where A and B are symmetric nxn matrices, B being positive definite. B^{-1} is then also positive definite, so we can write B^{-1} = QQ' with Q nonsingular.

(A - λB)x = 0   is equivalent to   (Q'AQ - λI)(Q^{-1}x) = 0

This shows that (5) can be reduced to (4) when A in (4) is interpreted as Q'AQ and x as Q^{-1}x. If A is symmetric, so is Q'AQ. Hence (5) has n solutions λ_1,...,λ_n, and if they are distinct, the corresponding vectors x_1,...,x_n are unique (up to scale).

We write the n equations Ax_i = λ_i Bx_i as AX = BXΛ, where X = [x_1 ... x_n],

and normalize the vectors so that

X'BX = I

Premultiplication of AX = BXΛ by X' gives

X'AX = X'BXΛ = Λ

Therefore

X'AX = Λ   and   X'BX = I

which shows that the two matrices are diagonalized together, A into the latent root matrix Λ and B into the identity matrix I.
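
A minimal numerical sketch of the construction above (A and B are arbitrary examples; Q is taken as a Cholesky factor of B^{-1}):

    # Simultaneous diagonalization: X'AX = Lambda and X'BX = I.
    import numpy as np

    rng = np.random.default_rng(12)
    S = rng.standard_normal((3, 3))
    A = S + S.T                                  # symmetric
    P = rng.standard_normal((3, 3))
    B = P @ P.T + 3 * np.eye(3)                  # symmetric positive definite

    Q = np.linalg.cholesky(np.linalg.inv(B))     # B^{-1} = Q Q'
    lam, Y = np.linalg.eigh(Q.T @ A @ Q)         # reduced problem (4)
    X = Q @ Y                                    # characteristic vectors of (5)

    print(np.allclose(X.T @ A @ X, np.diag(lam)))    # X'AX = Lambda
    print(np.allclose(X.T @ B @ X, np.eye(3)))       # X'BX = I
    print(np.allclose(A @ X, B @ X @ np.diag(lam)))  # (A - lam B)x = 0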

Example: Constrained extremum

Maximize x'Ax subject to x'Bx = 1 with respect to x.

Lagrangian: x'Ax - λ(x'Bx - 1)

Setting the derivative with respect to x equal to zero: 2Ax - 2λBx = 0, i.e. (A - λB)x = 0

which shows that λ must be a root of (5). Next, premultiplying by x' shows that x'Ax = λx'Bx = λ, so the largest root λ_1 is the maximum of x'Ax subject to x'Bx = 1.
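
A quick numerical illustration: the largest root bounds x'Ax over all x with x'Bx = 1 (random feasible points, arbitrary A and B):

    # x'Ax over random x with x'Bx = 1 never exceeds the largest root of (5).
    import numpy as np

    rng = np.random.default_rng(13)
    S = rng.standard_normal((3, 3)); A = S + S.T
    P = rng.standard_normal((3, 3)); B = P @ P.T + 3 * np.eye(3)

    Q = np.linalg.cholesky(np.linalg.inv(B))         # B^{-1} = QQ'
    lam_max = np.linalg.eigvalsh(Q.T @ A @ Q).max()  # largest root of (5)

    vals = []
    for _ in range(1000):
        x = rng.standard_normal(3)
        x = x / np.sqrt(x @ B @ x)                   # impose x'Bx = 1
        vals.append(x @ A @ x)
    print(max(vals) <= lam_max + 1e-9)               # True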

Principal components

Consider an nxK observation matrix Z. The objective is to approximate Z by a matrix of unit rank, vc', where v is an n element vector and c is a K element coefficient vector. We minimize the discrepancy matrix Z - vc' by minimizing the sum of squares of all Kn discrepancies, and we also impose v'v = 1 to be able to solve for c. The solution, written v_1 and c_1, satisfies

c_1 = Z'v_1,    (ZZ' - λ_1 I)v_1 = 0

So λ_1 is the largest latent root of ZZ'. Next, to approximate Z by v_1c_1' + v_2c_2', we again minimize the sum of squared discrepancies subject to v_2'v_2 = 1 and v_1'v_2 = 0. The solution again is

c_2 = Z'v_2,    (ZZ' - λ_2 I)v_2 = 0

which involves the second largest root λ_2 and the corresponding vector. This generalizes to i roots.

Derivation for 1 root

Since the sum of squares of the elements of any matrix A is equal to tr(A'A), the discrepancy sum of squares is

tr[(Z - vc')'(Z - vc')]

= tr(Z'Z) - 2v'Zc + c'c    (using v'v = 1)

The derivative with respect to c is -2Z'v + 2c = 0, so c = Z'v.

Substituting this back in, we get tr(Z'Z) - v'ZZ'v, so to minimize the discrepancy we maximize v'ZZ'v subject to v'v = 1:

(ZZ' - λI)v = 0

v'ZZ'v = λv'v = λ

This shows that the maximum root λ_1 minimizes the discrepancy sum of squares. This extends to i components.

Z'Z need not be diagonal; the observed variables can be correlated. But the principal components are all uncorrelated because v_i'v_j = 0 for i ≠ j. Therefore, these components can be viewed as "uncorrelated linear combinations of correlated variables".
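
A minimal numpy sketch of the first principal component (Z is an arbitrary simulated matrix): the rank-one fit v_1c_1' leaves a discrepancy sum of squares of tr(Z'Z) - λ_1, and components belonging to different roots are orthogonal.

    # First principal component of Z via the largest root of ZZ'.
    import numpy as np

    rng = np.random.default_rng(14)
    n, k = 20, 4
    Z = rng.standard_normal((n, k))

    lam, V = np.linalg.eigh(Z @ Z.T)        # roots ascending, vectors orthonormal
    v1 = V[:, -1]                           # vector of the largest root
    c1 = Z.T @ v1
    fit_rank1 = np.outer(v1, c1)

    ss = lambda M: np.sum(M ** 2)
    print(np.isclose(ss(Z - fit_rank1), np.trace(Z.T @ Z) - lam[-1]))  # tr(Z'Z) - lam_1
    v2 = V[:, -2]                           # second component
    print(np.isclose(v1 @ v2, 0.0))         # orthogonal, hence uncorrelated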
