Objective of this course: introduce basic concepts and skills in matrix algebra. In addition, some applications of matrix algebra in statistics are described.

Section 1. Matrix Operations

1.1 Basic matrix operations

Definition of an r × c matrix

An r × c matrix A is a rectangular array of rc real numbers arranged in r horizontal rows and c vertical columns:

[pic].

The i’th row of A is

[pic],

and the j’th column of A is

[pic]

We often write A as

[pic].

Matrix addition:

Let

[pic],

[pic],

[pic].

Then,

[pic],

[pic]

and the transpose of A is denoted as

[pic]

Example 1:

Let

[pic] and [pic].

Then,

[pic],

[pic]

and

[pic].
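
As a quick numerical sketch of the operations above (the entries of Example 1 are not reproduced here, so the matrices below are hypothetical), entrywise addition, scalar multiplication, and the transpose can be checked in Python with NumPy:

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])        # a hypothetical 2 x 3 matrix
B = np.array([[0., 1., 0.],
              [2., 0., 1.]])        # another hypothetical 2 x 3 matrix

print(A + B)    # entrywise sum: (A + B)_ij = a_ij + b_ij
print(3 * A)    # scalar multiple: each entry multiplied by 3
print(A.T)      # transpose: the 3 x 2 matrix with rows and columns interchanged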

1.2 Matrix multiplication

We first define the dot product or inner product of n-vectors.

Definition of dot product:

The dot product or inner product of the n-vectors

[pic] and [pic],

is

[pic].

Example 1:

Let [pic] and [pic]. Then, [pic].

Definition of matrix multiplication:

[pic]

[pic]

[pic]

That is,

[pic]

Example 2:

[pic].

Then,

[pic]

since

[pic], [pic]

[pic], [pic]

[pic], [pic].

Example 3:

[pic]
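
A small NumPy sketch of the definition (with hypothetical matrices, since the numbers in Examples 2 and 3 are not reproduced here): the (i, j) entry of AB is the dot product of the i'th row of A with the j'th column of B.

import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])            # hypothetical 3 x 2 matrix
B = np.array([[1., 0., 2.],
              [0., 1., 3.]])        # hypothetical 2 x 3 matrix

C = A @ B                           # the 3 x 3 product AB
# the (0, 1) entry equals the dot product of row 0 of A and column 1 of B
print(C[0, 1], np.dot(A[0, :], B[:, 1]))    # both print 2.0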

Another expression of matrix multiplication:

[pic]

where [pic] are [pic] matrices.

Example 2 (continued):

[pic]

Note:

Heuristically, the matrices A and B, [pic] and [pic], can be thought of as [pic] and [pic] vectors. Thus,

[pic]

can be thought of as the multiplication of [pic] and [pic] vectors. Similarly,

[pic]

can be thought of as the multiplication of [pic] and [pic] vectors.

Note:

I. [pic] is not necessarily equal to [pic]. For instance, [pic]

[pic].

II. [pic] might not be equal to [pic]. For instance,

[pic]

[pic]

III. If [pic], it is not necessarily true that [pic] or [pic] (see the numerical check after note V below). For instance,

[pic]

[pic]

IV. [pic], [pic], [pic]

p factors

Also, [pic] is not necessarily equal to [pic].

V. [pic].
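
The following small numerical check illustrates notes I and III with hypothetical matrices: matrix multiplication is not commutative, and AB can be the zero matrix even though neither A nor B is the zero matrix.

import numpy as np

A = np.array([[1., 1.],
              [0., 0.]])
B = np.array([[1., 0.],
              [-1., 0.]])

print(A @ B)      # the zero matrix, although A != 0 and B != 0
print(B @ A)      # nonzero, so AB != BA in general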

1.3 Trace

Definition of the trace of a matrix:

The sum of the diagonal elements of a [pic] square matrix is called the trace of the matrix, written [pic], i.e., for

[pic],

[pic].

Example 4:

Let [pic]. Then, [pic].
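
A quick check of the trace with a hypothetical 3 x 3 matrix (the entries of Example 4 are not reproduced above):

import numpy as np

A = np.array([[2., 1., 0.],
              [3., 5., 4.],
              [0., 1., 7.]])

print(np.trace(A))            # 2 + 5 + 7 = 14.0
print(A.diagonal().sum())     # the same value, summing the diagonal directly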

Section 2 Special Matrices

2.1 Symmetric matrices

Definition of symmetric matrix:

A [pic] matrix [pic] is defined as symmetric if [pic]. That is,

[pic].

Example 1:

[pic] is symmetric since [pic].

Example 2:

Let [pic] be random variables. Then,

[pic] [pic] … [pic]

[pic]

is called the covariance matrix, where [pic] is the covariance of the random variables [pic] and [pic], and [pic] is the variance of [pic]. V is a symmetric matrix. The correlation matrix for [pic] is defined as

[pic] [pic] … [pic]

[pic]

where [pic] is the correlation of [pic] and [pic]. R is also a symmetric matrix. For instance, let [pic] be the random variable representing the sales amount of some product and [pic] be the random variable representing the cost spent on advertising. Suppose

[pic]

Then,

[pic]

and

[pic]
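
Since the numerical covariance matrix of the sales/advertising illustration is not reproduced above, the sketch below uses hypothetical data for two variables to show how a sample covariance matrix V and the corresponding correlation matrix R are related (each correlation is the covariance divided by the product of the two standard deviations):

import numpy as np

# hypothetical paired observations: x1 = sales amount, x2 = advertising cost
x1 = np.array([10., 12., 15., 11., 18.])
x2 = np.array([ 2.,  3.,  5.,  2.,  6.])

V = np.cov(x1, x2)           # 2 x 2 sample covariance matrix (symmetric)
d = np.sqrt(np.diag(V))      # standard deviations
R = V / np.outer(d, d)       # correlation matrix: r_ij = v_ij / (s_i * s_j)

print(V)
print(R)
print(np.corrcoef(x1, x2))   # agrees with R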

Example 3:

Let [pic] be a [pic] matrix. Then, both [pic] and [pic] are symmetric since

[pic] and [pic].

[pic] is a [pic] symmetric matrix while [pic] is a [pic] symmetric matrix.

[pic]

Also,

[pic]

Similarly,

[pic]

and

[pic]

For instance, let

[pic] and [pic].

Then,

[pic]

In addition,

[pic]

Note:

Let A and B be symmetric matrices. Then AB is not necessarily equal to [pic]; that is, AB might not be a symmetric matrix.

Example:

[pic] and [pic].

Then,

[pic]

Properties of [pic] and [pic]:

(a)

[pic]

(b)

[pic]

[proof]

(a)

Let

[pic]

[pic].

Thus, for [pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

(b)

Since [pic]

[pic]

By (a),

[pic]

Note:

A [pic] matrix [pic] is defined as skew-symmetric if [pic]. That is,

[pic].

Example:

[pic]

Thus,

[pic][pic]

2.2 Idempotent matrices

Definition of idempotent matrices:

A square matrix K is said to be idempotent if

[pic]

Properties of idempotent matrices:

1. [pic] for any positive integer r.

2. [pic] is idempotent.

3. If [pic] and [pic] are idempotent matrices and [pic], then

[pic] is idempotent.

[proof:]

1.

For [pic] [pic].

Suppose [pic] is true; then [pic].

By induction, [pic] holds for any positive integer r.

2.

[pic]

3.

[pic]

Example:

Let [pic] be a [pic] matrix. Then,

[pic] is an idempotent matrix since

[pic].
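
One idempotent matrix that occurs frequently in statistics has the form H = X(X'X)^(-1)X' for a matrix X of full column rank (this may or may not be the matrix intended in the example above, whose entries are not reproduced here). A numerical check of idempotency:

import numpy as np

X = np.array([[1., 0.],
              [1., 1.],
              [1., 2.],
              [1., 3.]])                    # hypothetical 4 x 2 matrix of full column rank

H = X @ np.linalg.inv(X.T @ X) @ X.T        # H = X (X'X)^{-1} X'
print(np.allclose(H @ H, H))                # True: H is idempotent
I4 = np.eye(4)
print(np.allclose((I4 - H) @ (I4 - H), I4 - H))   # True: I - H is also idempotent (property 2)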

Note:

A matrix A satisfying [pic] is called nilpotent, and one for which [pic] is sometimes called unipotent.

Example:

[pic] [pic] A is nilpotent.

[pic] [pic] B is unipotent.

Note:

Suppose [pic] is an idempotent matrix. Then [pic] might not be idempotent.

2.3 Orthogonal matrices

Definition of orthogonality:

Two [pic] vectors u and v are said to be orthogonal if

[pic]

A set of [pic] vectors [pic] is said to be orthonormal if

[pic]

Definition of orthogonal matrix:

A [pic] square matrix P is said to be orthogonal if

[pic].

Note:

[pic]

[pic] [pic]

Thus,

[pic] and [pic]

are both orthonormal sets!!

Example:

(a) Helmert Matrices:

The Helmert matrix of order n has the first row

[pic],

and each of the other n-1 rows ([pic]) has the form

[pic]

(i-1) items n-i items

For example, as [pic], then

[pic]

In statistics, we can use H to find a set of uncorrelated random variables. Suppose [pic] are random variables with

[pic]

Let

[pic]

Then,

[pic]

since [pic] is an orthonormal set of vectors. That is, [pic] are uncorrelated random variables. Also,

[pic],

where

[pic].

[pic]
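
A sketch of constructing the Helmert matrix of order n, under the standard convention in which row i (for i >= 2) has i-1 entries equal to 1/sqrt(i(i-1)), one entry equal to -(i-1)/sqrt(i(i-1)), and zeros elsewhere, followed by a check that the rows are orthonormal:

import numpy as np

def helmert(n):
    """Return the n x n Helmert matrix (standard convention)."""
    H = np.zeros((n, n))
    H[0, :] = 1.0 / np.sqrt(n)                 # first row: 1/sqrt(n) in every position
    for i in range(2, n + 1):                  # rows 2, ..., n
        c = 1.0 / np.sqrt(i * (i - 1))
        H[i - 1, :i - 1] = c                   # (i-1) entries equal to c
        H[i - 1, i - 1] = -(i - 1) * c         # one entry equal to -(i-1)c
    return H

H = helmert(4)
print(H)
print(np.allclose(H @ H.T, np.eye(4)))         # True: H is an orthogonal matrix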

(b) Givens Matrices:

Let the orthogonal matrix be

[pic]

G is referred to as a Givens matrix of order 2. For a Givens matrix of order 3, there are [pic] different forms,

1 2 3 1 2 3

[pic].

The general form of a Givens matrix [pic] of order 3 is an identity matrix except for 4 elements: [pic] and [pic] lie in the i'th and j'th rows and columns. Similarly, for a Givens matrix of order 4, there are [pic] different forms,

1 2 3 4 1 2 3 4

[pic]

1 2 3 4 1 2 3 4

[pic]

1 2 3 4 1 2 3 4

[pic].

For the Givens matrix of order n, there are [pic] different forms. The general form of [pic] is an identity matrix except for 4 elements,

[pic].
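
A sketch of the general Givens matrix of order n: an identity matrix except for the four entries in rows and columns i and j, which hold cos(theta) and plus/minus sin(theta) (the sign convention used here is one common choice):

import numpy as np

def givens(n, i, j, theta):
    """Givens rotation of order n acting on coordinates i and j (0-based)."""
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = s        # one common sign convention
    G[j, i] = -s
    return G

G = givens(4, 0, 2, 0.3)
print(np.allclose(G @ G.T, np.eye(4)))     # True: a Givens matrix is orthogonal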

2.4 Positive definite matrices:

Definition of positive definite matrix:

A symmetric [pic] matrix A satisfying

[pic] for all [pic],

is referred to as a positive definite (p.d.) matrix.

Intuition:

If [pic] for all real numbers x, [pic], then the real number a is positive. Similarly, if x is a [pic] vector, A is a [pic] matrix, and [pic], then the matrix A is “positive”.

Note:

A symmetric [pic] matrix A satisfying

[pic] for all [pic],

is referred to as a positive semidefinite (p.s.d.) matrix.

Example:

Let

[pic] and [pic].

Thus,

[pic]

Let [pic]. Then, A is positive semidefinite since for [pic]

[pic].
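
A numerical sketch of checking definiteness (with hypothetical matrices): the quadratic form x'Ax can be evaluated directly, and for a symmetric matrix positive definiteness is equivalent to all eigenvalues being positive (positive semidefiniteness to all eigenvalues being nonnegative), a fact related to the eigen-analysis of Section 5.

import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                 # hypothetical symmetric matrix
B = np.array([[1., 1.],
              [1., 1.]])                 # x'Bx = (x1 + x2)^2 >= 0, and equals 0 when x1 = -x2

x = np.array([3., -1.])
print(x @ A @ x)                         # 14.0: the quadratic form is positive for this x

print(np.linalg.eigvalsh(A))             # [1. 3.]  -> all positive: A is positive definite
print(np.linalg.eigvalsh(B))             # [0. 2.]  -> nonnegative: B is positive semidefinite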

Section 3 Determinants

Calculation of Determinants:

There are several ways to obtain the determinant of a matrix. The determinant can be obtained by:

(a) Using the definition of the determinant.

(b) Using the cofactor expansion of a matrix.

(c) Using the properties of the determinant.

3.1 Definition

Definition of permutation:

Let [pic] be the set of integers from 1 to n. A rearrangement [pic] of the elements of [pic] is called a permutation of [pic].

Example 1:

Let [pic]. Then, 123, 231, 312, 132, 213, 321 are 6 permutations of [pic].

Note: there are [pic] permutations of [pic].

Example 1 (continued):

[pic] no inversion. [pic] 1 inversion (32)

[pic] 1 inversion (21) [pic] 2 inversions (21, 31)

[pic] 2 inversions (31, 32) [pic] 3 inversions (21, 32, 31)

Definition of even and odd permutations:

When the total number of inversions of [pic] is even, [pic] is called an even permutation. When the total number of inversions of [pic] is odd, [pic] is called an odd permutation.

Definition of n-order determinant:

Let [pic] be an [pic] square matrix. We define the determinant of A (written as [pic] or [pic]) by [pic]

[pic],

where [pic] is a permutation of [pic].

As [pic] is an even permutation, [pic]. As [pic] is an odd permutation, [pic].

Note: [pic]. No two of [pic] are in the same row or in the same column.

Example:

[pic].

Then, there are 6 terms in the determinant of A,

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

For instance,

[pic]

3.2 Cofactor expansion

Definition of cofactor:

Let [pic] be [pic] matrix. The cofactor of [pic] is defined as

[pic],

where [pic] is the [pic] submatrix of A obtained by deleting the i'th row and the j'th column of A.

Example:

Let

[pic]

Then,

[pic],

[pic],

[pic]

Thus,

[pic]

Important result:

Let [pic] be an [pic] matrix. Then,

[pic]

In addition,

[pic]

Example (continued):

[pic]

Thus,

[pic]

Also,

[pic]

In addition,

[pic]

Similarly,

[pic]
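
A sketch of computing a determinant by cofactor expansion along the first row, implementing the expansion above recursively for a hypothetical 3 x 3 matrix:

import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # delete row 1 and column j+1
        total += ((-1) ** j) * A[0, j] * det_cofactor(minor)    # cofactor = (-1)^(1+j) times the minor determinant
    return total

A = [[1., 2., 3.],
     [4., 5., 6.],
     [7., 8., 10.]]
print(det_cofactor(A), np.linalg.det(A))      # both -3.0 (up to rounding)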

3.3 Properties of determinant

Let A be a [pic] matrix.

(a) [pic]

(b) If two rows (or columns) of A are equal, then [pic].

(c) If a row (or column) of A consists entirely of 0, then [pic]

Example:

Let [pic]. Then,

[pic]

[pic].

[pic]

(d) If B results from the matrix A by interchanging two rows (or columns) of A, then [pic].

(e) If B results from A by multiplying a row (or column) of A by a real number c, [pic] for some i, then [pic].

(f) If B results from A by adding [pic] to

[pic], i.e., [pic]

(or [pic]), then [pic]

Example:

Let

[pic]

Since B results from A by interchanging the first two rows of A,

[pic]

Example:

Let

[pic].

[pic],

since [pic]

Example:

Let

[pic].

[pic],

since [pic]

(g) If a matrix [pic] is upper triangular (or lower triangular), then

[pic].

(h) [pic]

If A is nonsingular, then [pic].

(i). [pic]

Example:

Let

[pic].

[pic]

Example:

Let

[pic].

Then,

[pic].

property (g)

Example:

Let

[pic].

[pic]

Thus,

[pic].

and

[pic].

Example:

Let

[pic]

[pic]

property (i)

Example:

Let

[pic] if [pic].

Compute (i) [pic] (ii) [pic].

[solution]

(i)

[pic]

(ii)

[pic]

(j) For [pic] square matrices P, Q, and X,

[pic]

and

[pic],

where I is an identity matrix.

Example:

Let

[pic]

Then,

[pic].

property (j)

Efficient method to compute determinant:

To calculate the determinant of a complicated matrix A, a more efficient method is to transform the matrix into an upper triangular (or lower triangular) matrix via elementary row operations, keeping track of how each operation changes the determinant (properties (d), (e), and (f) above). The determinant of A is then recovered from the product of the diagonal elements of the triangular matrix.

Example:

[pic]

[pic]
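
A sketch of this triangularization idea in Python: reduce the matrix to upper triangular form by Gaussian elimination, track the row interchanges (each interchange flips the sign of the determinant, by property (d)), and multiply the diagonal entries.

import numpy as np

def det_by_elimination(A):
    """Determinant via reduction to an upper triangular matrix."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for k in range(n):
        p = np.argmax(np.abs(U[k:, k])) + k          # choose a pivot row (partial pivoting)
        if U[p, k] == 0.0:
            return 0.0                               # the column is zero below the diagonal
        if p != k:
            U[[k, p]] = U[[p, k]]                    # row interchange flips the sign
            sign = -sign
        U[k + 1:] -= np.outer(U[k + 1:, k] / U[k, k], U[k])   # eliminate entries below the pivot
    return sign * np.prod(np.diag(U))

A = [[2., 1., 1.],
     [4., -6., 0.],
     [-2., 7., 2.]]
print(det_by_elimination(A), np.linalg.det(A))       # both about -16.0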

Note:

[pic] is not necessarily equal to [pic]. For example,

[pic].

3.4 Applications of determinant

(a) Inverse Matrix:

Definition of adjoint:

The [pic] matrix [pic], called the adjoint of A, is

[pic].

Important result:

[pic]

and

[pic]

Example (continued):

[pic]

and

[pic].

(b) Cramer’s Rule:

For linear system [pic], if [pic], then the system has the unique solution,

[pic],

where [pic] is the matrix obtained by replacing the i’th column of A by b.

Example:

Please solve for the following system of linear equations by Cramer’s rule,

[pic]

[solution:]

The coefficient matrix A and the vector b are

[pic],

respectively. Then,

[pic]

Thus, [pic].
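
A sketch of Cramer's rule as stated above, applied to a hypothetical 2 x 2 system (the system solved in the example is not reproduced here):

import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule (assumes det(A) != 0)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                      # replace the i'th column of A by b
        x[i] = np.linalg.det(Ai) / d      # x_i = det(A_i) / det(A)
    return x

A = [[2., 1.],
     [1., 3.]]
b = [3., 5.]
print(cramer(A, b), np.linalg.solve(A, b))    # both [0.8 1.4]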

Note:

Determinant plays a key role in the study of eigenvalues and eigenvectors which will be introduced later.

3.5 Diagonal expansion

Let

[pic].

Then,

[pic]

where [pic]. Note that

[pic].

Similarly,

[pic].

Then,

[pic]

where [pic]. Note that

[pic]

In the above two expansions, we can obtain the determinant of A+D by the following steps:

1. Expand the product of the diagonal elements of A+D,

[pic] or [pic]

2. Replace [pic] by [pic] or [pic] by [pic].

In general, denote [pic],

[pic]

Then, for

[pic],

the determinant of A+D can be obtained by the following steps:

1. Expand the product of the diagonal elements of A+D,

[pic]

2. Replace [pic] by [pic].

Example:

For

[pic],

[pic]

Section 4 Inverse Matrix

4.1 Definition

Definition of inverse matrix:

An [pic] matrix A is called nonsingular or invertible if there exists an [pic] matrix B such that

[pic],

where [pic] is a [pic] identity matrix. The matrix B is called an inverse of A. If there exists no such matrix B, then A is called singular or noninvertible.

Theorem:

If A is an invertible matrix, then its inverse is unique.

[proof:]

Suppose B and C are inverses of A. Then,

[pic].

Note:

Since the inverse of a nonsingular matrix A is unique, we denote the inverse of A as [pic].

Note:

If A is not a square matrix, then

• there might be more than one matrix L such that

[pic].

• there might be some matrix U such that

[pic]

Example:

Let

[pic].

Then,

• there are an infinite number of matrices L such that [pic], for example

[pic] or [pic].

• As [pic],

[pic] but [pic].

4.2 Calculation of inverse matrix

1. Using Gauss-Jordan reduction:

The procedure for computing the inverse of a [pic] matrix A:

1. Form the [pic] augmented matrix

[pic]

and transform the augmented matrix to the matrix

[pic]

in reduced row echelon form via elementary row operations.

2. If

(a) [pic], then [pic].

(b) [pic], then [pic] is singular and [pic] does not exist.

Example:

To find the inverse of [pic], we can employ the procedure introduced above.

1.

[pic].

[pic] [pic]

[pic] [pic]

[pic] [pic]

[pic] [pic]

2. The inverse of A is

[pic].
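
A sketch of the Gauss-Jordan procedure itself: form the augmented matrix [A | I], row-reduce it to reduced row echelon form, and read the inverse off the right half (a hypothetical 2 x 2 matrix is used; a singular A is detected when no nonzero pivot can be found):

import numpy as np

def inverse_gauss_jordan(A):
    """Invert A by row-reducing the augmented matrix [A | I]."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])                   # the n x 2n augmented matrix
    for k in range(n):
        p = np.argmax(np.abs(M[k:, k])) + k         # pivot row
        if np.isclose(M[p, k], 0.0):
            raise np.linalg.LinAlgError("matrix is singular")
        M[[k, p]] = M[[p, k]]                       # move the pivot row into place
        M[k] /= M[k, k]                             # scale the pivot row
        for r in range(n):
            if r != k:
                M[r] -= M[r, k] * M[k]              # clear the rest of column k
    return M[:, n:]                                 # the right half is the inverse

A = [[2., 1.],
     [5., 3.]]
print(inverse_gauss_jordan(A))       # [[ 3. -1.] [-5.  2.]]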

Example:

Find the inverse of [pic] if it exists.

[solution:]

1. Form the augmented matrix

[pic].

And the transformed matrix in reduced row echelon form is

[pic]

2. The inverse of A is

[pic].

Example:

Find the inverse of [pic] if it exists.

[solution:]

1. Form the augmented matrix

[pic].

And the transformed matrix in reduced row echelon form is

[pic]

2. A is singular!!

2. Using the adjoint [pic] of a matrix:

As [pic], then

[pic].

Note:

[pic] is always true.

Note:

As [pic] [pic] A is nonsingular.

4.3 Properties of the inverse matrix

The inverse matrix of an [pic] nonsingular matrix A has the following important properties:

1. [pic].

2. [pic]

3. If A is symmetric, so is its inverse.

4. [pic]

5. If C is an invertible matrix, then

1. [pic]

2. [pic].

6. As [pic] exists, then

[pic].

[proof of 2]

[pic]

similarly,

[pic].

[proof of 3:]

By property 2,

[pic].

[proof of 4:]

[pic].

Similarly,

[pic].

[proof of 5:]

Multiplying by the inverse of C, we have

[pic].

Similarly,

[pic].

[proof of 6:]

[pic]

[pic].

Multiplying by [pic] on both sides, we have

[pic].

[pic]

can be obtained by using similar procedure.

Example:

Prove that [pic].

[proof:]

[pic]

Similar procedure can be used to obtain

[pic]

4.4 Left and right inverses:

Definition of left inverse:

For a matrix A,

[pic],

with more than one such L. Then, the matrices L are called left inverses of A.

Definition of right inverse:

For a matrix A,

[pic],

with more than one such R. Then, the matrices R are called right inverses of A.

Theorem:

A [pic] matrix [pic] has left inverses only if [pic].

[proof:]

We show that a contradiction arises if [pic] and [pic] has a left inverse. For [pic], let

[pic]

Then, suppose

[pic]

is the left inverse of [pic]. Then,

[pic].

Thus,

[pic]

Since [pic] and both M and X are square matrices, [pic].

Therefore,

[pic].

However,

[pic].

This is a contradiction. Therefore, as [pic], [pic] has no left inverse.

Theorem:

A [pic] matrix [pic] has right inverses only if [pic].

Section 5 Eigen-analysis

5.1 Definition:

Let A be an [pic] matrix. The real number [pic] is called an eigenvalue of A if there exists a nonzero vector x in [pic] such that

[pic].

The nonzero vector x is called an eigenvector of A associated with the eigenvalue [pic].

Example 1:

Let

[pic].

As [pic], then [pic].

Thus, [pic] is the eigenvector of A associated with the eigenvalue [pic].

Similarly,

As [pic], then [pic]

Thus, [pic] is the eigenvector of A associated with the eigenvalue [pic].

Note: Let x be the eigenvector of A associated with some eigenvalue [pic]. Then, [pic], [pic], is also the eigenvector of A associated with the same eigenvalue [pic] since

[pic].

5.2 Calculation of eigenvalues and eigenvectors:

Motivating Example:

Let

[pic].

Find the eigenvalues of A and their associated eigenvectors.

[solution:]

Let [pic] be the eigenvector associated with the eigenvalue [pic]. Then,

[pic].

Thus,

[pic] is the nonzero (nontrivial) solution of the homogeneous linear system [pic]. [pic] [pic] is singular [pic] [pic].

Therefore,

[pic]

[pic].

1. As [pic],

[pic].

[pic]

[pic]

2. As [pic],

[pic].

[pic]

[pic]

Note:

In the above example, the eigenvalues of A satisfy the following equation

[pic].

After finding the eigenvalues, we can further solve the associated homogeneous system to find the eigenvectors.

Definition of the characteristic polynomial:

Let [pic]. The determinant

[pic],

is called the characteristic polynomial of A.

[pic],

is called the characteristic equation of A.

Theorem:

A is singular if and only if 0 is an eigenvalue of A.

[proof:]

[pic].

A is singular [pic] [pic] has non-trivial solution [pic] There exists a nonzero vector x such that

[pic].

[pic] x is the eigenvector of A associated with eigenvalue 0.

[pic]

0 is an eigenvalue of A [pic] There exists a nonzero vector x such that

[pic].

[pic] The homogeneous system [pic] has nontrivial (nonzero) solution.

[pic] A is singular.

Theorem:

The eigenvalues of A are the real roots of the characteristic polynomial of A.

[pic]

Let [pic] be an eigenvalue of A associated with eigenvector u. Also, let [pic] be the characteristic polynomial of A. Then,

[pic] [pic] [pic] [pic] The homogeneous system has nontrivial (nonzero) solution x [pic] [pic] is singular [pic]

[pic].

[pic] [pic] is a real root of [pic].

[pic]

Let [pic] be a real root of [pic] [pic] [pic] [pic] [pic] is a singular matrix [pic] There exists a nonzero vector (nontrivial solution) v such that

[pic].

[pic] v is the eigenvector of A associated with the eigenvalue [pic].



Procedure of finding the eigenvalues and eigenvectors of A:

1. Solve for the real roots of the characteristic equation [pic]. These real roots [pic] are the eigenvalues of A.

2. Solve for the homogeneous system [pic] or [pic], [pic]. The nontrivial (nonzero) solutions are the eigenvectors associated with the eigenvalues [pic].
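
In practice the eigenvalues and eigenvectors are computed numerically rather than by factoring the characteristic polynomial; a short sketch with a hypothetical symmetric matrix:

import numpy as np

A = np.array([[4., 1.],
              [1., 4.]])

eigenvalues, eigenvectors = np.linalg.eig(A)     # the columns of `eigenvectors` are eigenvectors
print(eigenvalues)                               # 5.0 and 3.0 (in some order)
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))           # True: A v = lambda v for each pair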

Example:

Find the eigenvalues and eigenvectors of the matrix

[pic].

[solution:]

[pic]

[pic] and 10.

1. As [pic],

[pic].

[pic]

Thus,

[pic],

are the eigenvectors associated with eigenvalue [pic].

2. As [pic],

[pic].

[pic]

Thus,

[pic],

are the eigenvectors associated with eigenvalue [pic].

Example:

[pic].

Find the eigenvalues and the eigenvectors of A.

[solution:]

[pic]

[pic] and 6.

1. As [pic],

[pic].

[pic]

Thus,

[pic],

are the eigenvectors associated with eigenvalue [pic].

2. As [pic],

[pic].

[pic]

Thus,

[pic],

are the eigenvectors associated with eigenvalue [pic].

Note:

In the above example, there are at most 2 linearly independent eigenvectors [pic] and [pic] for [pic] matrix A.

The following theorem and corollary concern the independence of the eigenvectors:

Theorem:

Let [pic] be the eigenvectors of a [pic] matrix A associated with distinct eigenvalues [pic], respectively, [pic]. Then, [pic] are linearly independent.

[proof:]

Assume [pic] are linearly dependent. Then, suppose the dimension of the vector space V generated by [pic] is [pic]

(i.e., the dimension of V [pic] the vector space generated by [pic]). There exist j linearly independent vectors among [pic] which also generate V. Without loss of generality, let [pic] be the j linearly independent vectors which generate V (i.e., [pic] is a basis of V). Thus,

[pic],

where [pic] are some real numbers. Then,

[pic]

Also,

[pic]

Thus,

[pic].

Since [pic] are linearly independent,

[pic].

Furthermore,

[pic] are distinct, [pic]

[pic]

This is a contradiction!!

Corollary:

If a [pic] matrix A has n distinct eigenvalues, then A has n linearly independent eigenvectors.

5.3 Properties of eigenvalues and eigenvectors:

(a)

Let [pic] be the eigenvector of [pic] associated with the eigenvalue [pic]. Then, the eigenvalue of

[pic],

associated with the eigenvector [pic] is

[pic],

where [pic] are real numbers and [pic] is a positive integer.

[proof:]

[pic]

since

[pic].

Example:

[pic],

what are the eigenvalues of [pic]?

[solution:]

The eigenvalues of A are -5 and 7. Thus, the eigenvalues of [pic] are

[pic]

and

[pic].

Example:

Let [pic] be the eigenvalue of A. Then, we denote

[pic].

Then, [pic] has eigenvalue

[pic].

Note:

Let [pic] be the eigenvector of A associated with the eigenvalue [pic]. Then, [pic] is the eigenvector of [pic] associated with the eigenvalue [pic].

[proof:]

[pic].

Therefore, [pic] is the eigenvector of [pic] associated with the eigenvalue [pic].

(b)

Let [pic] be the eigenvalues of A ([pic] are not necessarily distinct). Then,

[pic] and [pic].

[proof:]

[pic].

Thus,

[pic]

Therefore,

[pic].

Also, by diagonal expansion on the following determinant

[pic],

and by the expansion of

[pic],

therefore,

[pic].

Example:

[pic],

The eigenvalues of A are [pic] and [pic]. Then,

[pic]

and

[pic].

5.4 Diagonalization of a matrix

(a) Definition and procedure for diagonalization of a matrix

Definition:

A matrix A is diagonalizable if there exists a nonsingular matrix P and a diagonal matrix D such that

[pic].

Example:

Let

[pic].

Then,

[pic]

where

[pic].

Theorem:

An [pic] matrix A is diagonalizable if and only if it has n linearly independent eigenvectors.

[proof:]

[pic]

A is diagonalizable. Then, there exists a nonsingular matrix P and a diagonal matrix

[pic],

such that

[pic].

Then,

[pic]

That is,

[pic]

are eigenvectors associated with the eigenvalues [pic].

Since P is nonsingular, [pic] are linearly independent.

[pic]

Let [pic] be n linearly independent eigenvectors of A associated with the eigenvalues [pic]. That is,

[pic]

Thus, let

[pic]

and

[pic].

Since [pic],

[pic].

Thus,

[pic],

[pic] exists because [pic] are linearly independent and thus P is nonsingular.

Important result:

An [pic] matrix A is diagonalizable if all the roots of its characteristic equation are real and distinct.

Example:

Let

[pic].

Find the nonsingular matrix P and the diagonal matrix D such that

[pic]

and find [pic] for any positive integer.

[solution:]

We need to find the eigenvalues and eigenvectors of A first. The characteristic equation of A is

[pic].

[pic].

By the above important result, A is diagonalizable. Then,

1. As [pic],

[pic]

2. As [pic],

[pic]

Thus,

[pic] and [pic]

are two linearly independent eigenvectors of A.

Let

[pic] and [pic].

Then, by the above theorem,

[pic].

To find [pic],

[pic]

n times

Multiplying by [pic] and [pic] on both sides,

[pic]
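
A numerical sketch of this computation: with the diagonalization A = P D P^(-1), the power A^n equals P D^n P^(-1), and D^n is obtained by raising the diagonal entries to the n'th power (a hypothetical diagonalizable matrix is shown).

import numpy as np

A = np.array([[1., 1.],
              [-2., 4.]])                    # hypothetical matrix with distinct eigenvalues 2 and 3

eigenvalues, P = np.linalg.eig(A)            # columns of P are eigenvectors, so A P = P D
n = 5
An = P @ np.diag(eigenvalues ** n) @ np.linalg.inv(P)    # A^n = P D^n P^{-1}
print(np.allclose(An, np.linalg.matrix_power(A, n)))     # True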

Note:

For any [pic] diagonalizable matrix A, [pic] then

[pic]

where

[pic].

Example:

Is [pic] diagonalizable?

[solution:]

[pic].

Then, [pic].

As [pic]

[pic]

Therefore, all the eigenvectors are spanned by [pic]. There do not exist two linearly independent eigenvectors. By the previous theorem, A is not diagonalizable.

Note:

An [pic] matrix may fail to be diagonalizable since

• Not all roots of its characteristic equation are real numbers.

• It does not have n linearly independent eigenvectors.

Note:

The set [pic] consisting of all eigenvectors of an [pic] matrix A associated with the eigenvalue [pic], together with the zero vector 0, is a subspace of [pic]. [pic] is called the eigenspace associated with [pic].

5.5 Diagonalization of symmetric matrix

Theorem:

If A is an [pic] symmetric matrix, then the eigenvectors of A associated with distinct eigenvalues are orthogonal.

[proof:]

Let [pic] and [pic] be eigenvectors of A associated with distinct eigenvalues [pic] and [pic], respectively, i.e.,

[pic] and [pic].

Thus,

[pic]

and

[pic].

Therefore,

[pic].

Since [pic], [pic].

Example:

Let

[pic].

A is a symmetric matrix. The characteristic equation is

[pic].

The eigenvalues of A are [pic]. The eigenvectors associated with these eigenvalues are

[pic].

Thus,

[pic] are orthogonal.

Very Important Result:

If A is an [pic] symmetric matrix, then there exists an orthogonal matrix P such that

[pic],

where [pic] are n linearly independent eigenvectors of A and the diagonal elements of D are the eigenvalues of A associated with these eigenvectors.

Example:

Let

[pic].

Please find an orthogonal matrix P and a diagonal matrix D such that [pic].

[solution:]

We need to find the orthonormal eigenvectors of A and the associated eigenvalues first. The characteristic equation is

[pic].

Thus, [pic]

1. As [pic], solve for the homogeneous system

[pic].

The eigenvectors are

[pic]

[pic] and [pic] are two eigenvectors of A. However, the two eigenvectors are not orthogonal. We can obtain two orthonormal eigenvectors via the Gram-Schmidt process. The orthogonal eigenvectors are

[pic].

Normalizing these two eigenvectors results in

[pic].

2. As [pic], solve for the homogeneous system

[pic].

The eigenvectors are

[pic].

[pic] is an eigenvector of A. Normalizing the eigenvector results in

[pic].

Thus,

[pic],

[pic],

and [pic].

Note:

For a set of vectors [pic], we can find a set of orthogonal vectors [pic] via the Gram-Schmidt process:

[pic]
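
A sketch of the Gram-Schmidt process described above: subtract from each vector its projections onto the previously constructed vectors, then normalize (the input vectors below are hypothetical and assumed linearly independent).

import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for u in basis:
            w -= (w @ u) * u               # remove the component of v along u
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

V = [[1., 1., 0.],
     [1., 0., 1.],
     [0., 1., 1.]]
Q = gram_schmidt(V)
print(np.allclose(Q @ Q.T, np.eye(3)))     # True: the resulting rows are orthonormal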

Section 6 Applications

6.1 Differential Operators

Definition of differential operator:

Let

[pic]

Then,

[pic]

Example 1:

Let

[pic]

Then,

[pic]

Example 2:

Let

[pic]

Then,

[pic]

Note:

In example 2,

[pic],

where

[pic].

Then,

[pic].

Theorem:

[pic]

Theorem:

Let A be an [pic] matrix and x be a [pic] vector. Then,

[pic]

[proof:]

[pic]

Then, the k’th element of [pic] is

[pic][pic]

while the k’th element of [pic] is

[pic].

Therefore,

[pic].

Corollary:

Let A be an [pic] symmetric matrix. Then,

[pic].

Example 3:

[pic].

Then,

[pic]

[pic]

Example 4:

For standard linear regression model

[pic]

The least squares estimate b is the minimizer of

[pic].

To find b, we need to solve

[pic].

Thus,

[pic]
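
A numerical sketch of the least squares estimate obtained from the normal equations above, b = (X'X)^(-1)X'y, with a hypothetical design matrix and response vector:

import numpy as np

X = np.array([[1., 0.],
              [1., 1.],
              [1., 2.],
              [1., 3.]])                      # hypothetical design matrix (intercept and one covariate)
y = np.array([1., 2., 2., 4.])                # hypothetical response vector

b = np.linalg.solve(X.T @ X, X.T @ y)         # solves (X'X) b = X'y
print(b)                                      # [0.9 0.9]
print(np.linalg.lstsq(X, y, rcond=None)[0])   # the same estimate from a library least squares routine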

Theorem:

[pic]

Then,

[pic]

where

[pic]

Note:

Let [pic] be a function of x. Then,

[pic].

Example 5:

Let

[pic],

where X is an [pic] matrix, I is an [pic] identity matrix, and [pic] is a constant. Then,

[pic]

6.2 Vectors of random variables

In this section, the following topics will be discussed:

• Expectation and covariance of vectors of random variables

• Mean and variance of quadratic forms

• Independence of random variables and chi-square distribution

Expectation and covariance

Let [pic] be random variables. Let

[pic]

be the random matrix.

Definition:

[pic].

Let [pic] and [pic] be the [pic] and [pic] random vectors, respectively. The covariance matrix is

[pic]

and the variance matrix is

[pic]

Theorem:

If [pic] are two matrices, then

[pic].

[proof:]

Let

[pic]

where

[pic]

Thus,

[pic]

Let

[pic]

where

[pic]

Thus,

[pic]

Since [pic] for every [pic], then [pic].

Results:

• [pic]

• [pic]

Mean and variance of quadratic forms

Theorem:

Let [pic] be an [pic] vector of random variables and [pic] be an [pic] symmetric matrix. If [pic] and

[pic], then

[pic],

where [pic] is the sum of the diagonal elements of the matrix M.

[proof:]

[pic]

Then,

[pic]

On the other hand,

[pic]

Then,

[pic]

Thus,

[pic]
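
A Monte Carlo sketch of the standard identity E(x'Ax) = tr(A V) + mu'A mu for a random vector x with mean mu and variance matrix V (the theorem above states a version of this identity under its assumptions); the mean vector, variance matrix, and symmetric A below are hypothetical, and the simulated average should be close to the formula.

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1., 2.])
V = np.array([[2., 0.5],
              [0.5, 1.]])
A = np.array([[1., 0.3],
              [0.3, 2.]])                 # symmetric

X = rng.multivariate_normal(mu, V, size=200_000)
simulated = np.mean(np.einsum('ni,ij,nj->n', X, A, X))   # average of x'Ax over the draws
formula = np.trace(A @ V) + mu @ A @ mu

print(simulated, formula)                 # the two numbers agree up to simulation error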

Theorem:

[pic],

where [pic] and [pic].

Note:

For a random variable X, [pic] and [pic]. Then

[pic].

Corollary:

If [pic] are independently normally distributed with common variance [pic], then

[pic].

Theorem:

If [pic] are independently normally distributed with common variance [pic], then

[pic].

Independence of random variables and chi-square distribution

Definition of Independence:

Let [pic] and [pic] be the [pic] and [pic] random vectors, respectively. Let [pic] and [pic] be the density functions of X and Y, respectively. Two random vectors X and Y are said to be (statistically) independent if the joint density function

[pic]

Chi-Square Distribution:

[pic] has the density function

[pic],

where [pic] is the gamma function. Then, the moment generating function is

[pic]

and the cumulant generating function is

[pic].

Thus,

[pic]

and

[pic]

Theorem:

If [pic] and [pic] is statistically independent of [pic], then [pic].

[proof:]

[pic]

Thus,

[pic]

Therefore, [pic].

6.3 Multivariate normal distribution

In this chapter, the following topics will be discussed:

• Definition

• Moment generating function and independence of normal variables

• Quadratic forms in normal variables

Definition

Intuition:

Let [pic]. Then, the density function is

[pic]

Definition (Multivariate Normal Random Variable):

A random vector

[pic]

with [pic] has the density function

[pic]

Theorem:

[pic]

[proof:]

Since [pic] is positive definite, [pic], where [pic] is a real orthogonal matrix ([pic]) and [pic]. Then,

[pic]. Thus,

[pic]

where [pic]. Further,

[pic]

Therefore, if we can prove [pic] and [pic] are mutually independent, then

[pic].

The joint density function of [pic] is

[pic],

where

[pic]

Therefore, the density function of [pic]

[pic]

Therefore, [pic] and [pic] are mutually independent.

Moment generating function and independence of normal random variables

Moment Generating Function of Multivariate Normal Random Variable:

Let

[pic].

Then, the moment generating function for Y is

[pic]

Theorem:

If [pic] and C is a [pic] matrix of rank p, then

[pic].

[proof:]

Let [pic]. Then,

[pic]

Since [pic] is the moment generating function of [pic],

[pic]. ◆

Corollary:

If [pic] then

[pic],

where T is an orthogonal matrix.

Theorem:

If [pic], then the marginal distribution of any subset of the elements of Y is also multivariate normal.

[pic]

[pic], then [pic], where

[pic]

Theorem:

Y has a multivariate normal distribution if and only if [pic] is univariate normal for all real vectors a.

[proof:]

[pic]

Suppose [pic]. [pic] is univariate normal. Also,

[pic].

Then, [pic]. Since

[pic]

and since

[pic]

is the moment generating function of [pic], Y has a multivariate normal distribution [pic].

[pic]

By the previous theorem. ◆

Quadratic form in normal variables

Theorem:

If [pic] and let P be an [pic] symmetric matrix of rank r. Then,

[pic]

is distributed as [pic] if and only if [pic] (i.e., P is idempotent).

[proof]

[pic]

Suppose [pic] and [pic]. Then, P has r eigenvalues equal to 1 and [pic] eigenvalues equal to 0. Thus, without loss of generality,

[pic], where T is an orthogonal matrix. Then,

[pic]

Since [pic] and [pic], thus

[pic].

[pic] are i.i.d. normal random variables with common variance [pic]. Therefore,

[pic]

[pic]

Since P is symmetric, [pic], where T is an orthogonal matrix and [pic] is a diagonal matrix with elements [pic]. Thus, let [pic]. Since [pic],

[pic].

That is, [pic] are independent normal random variables with variance [pic]. Then,

[pic]

The moment generating function of [pic] is

[pic][pic]

Also, since Q is distributed as [pic], the moment generating function is also equal to [pic]. Thus, for every t,

[pic]

Further,

[pic].

By the uniqueness of polynomial roots, we must have [pic]. Then, [pic] by the following result:

if a matrix P is symmetric, then P is idempotent of rank r if and only if it has r eigenvalues equal to 1 and n-r eigenvalues equal to 0. ◆

Important Result:

Let [pic] and let [pic] and [pic] both be distributed as chi-square. Then, [pic] and [pic] are independent if and only if [pic].

Useful Lemma:

If [pic], [pic] and [pic] is positive semidefinite, then

• [pic]

• [pic] is idempotent.

Theorem:

If [pic] and let

[pic]

If [pic], then [pic] and [pic] are independent and [pic].

[proof:]

We first prove [pic]. [pic], thus

[pic]

Since [pic], where [pic] is any vector in [pic], [pic] is positive semidefinite. By the above useful lemma, [pic] is idempotent. Further, by the previous theorem,

[pic]

since

[pic]

We now prove [pic] and [pic] are independent. Since

[pic]

By the previous important result, the proof is complete. ◆

6.4 Linear regression

Let

[pic].

Denote

[pic].

In linear algebra,

[pic]

is the linear combination of the column vector of [pic]. That is,

[pic].

Then,

[pic]

The least squares method is to find the [pic] such that the distance between [pic] and [pic] is smaller than the distance between [pic] and any other linear combination of the column vectors of [pic], for example, [pic]. Intuitively, [pic] is the information provided by the covariates [pic] to interpret the response [pic]. Thus, [pic] is the information which interprets [pic] most accurately. Further,

[pic]

If we choose the estimate [pic] of [pic] such that [pic] is orthogonal to every vector in [pic], then [pic]. Thus,

[pic].

That is, if we choose [pic] satisfying [pic], then

[pic]

and for any other estimate [pic] of [pic],

[pic].

Thus, [pic] satisfying [pic] is the least squares estimate. Therefore,

[pic]

Since

[pic],

[pic] is called the projection matrix or hat matrix. [pic] projects the response vector [pic] onto the space spanned by the covariate vectors. The vector of residuals is

[pic].
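
A small numerical sketch of the projection (hat) matrix and the residual vector for a hypothetical design matrix, checking the idempotency and orthogonality properties used in the theorems below:

import numpy as np

X = np.array([[1., 0.],
              [1., 1.],
              [1., 2.],
              [1., 3.]])                  # hypothetical design matrix of full column rank
y = np.array([1., 2., 2., 4.])            # hypothetical response vector

H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix H = X (X'X)^{-1} X'
fitted = H @ y                            # projection of y onto the column space of X
residuals = y - fitted                    # (I - H) y

print(np.allclose(H @ H, H))              # True: H is idempotent
print(np.allclose(X.T @ residuals, 0))    # True: the residuals are orthogonal to the columns of X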

We have the following two important theorems.

Theorem:

1. [pic] and [pic] are idempotent.

2. [pic]

3. [pic]

4.

[pic]

[proof:]

1.

[pic]

and

[pic].

2.

Since [pic] is idempotent, [pic]. Thus,

[pic]

Similarly,

[pic]

3.

[pic]

4.

[pic]

Thus,

[pic]

Therefore,

[pic]

Theorem:

If [pic] where [pic] is a [pic] matrix of rank [pic]. Then,

1. [pic]

2. [pic]

3. [pic]

4. [pic] is independent of

[pic].

[proof:]

1.

Since for a normal random variable [pic],

[pic]

thus for [pic]

[pic]

2.

[pic]

Thus,

[pic].

3.

[pic] and [pic], thus

[pic] Since

[pic],

[pic].

4.

Let

[pic]

where

[pic]

and

[pic]

Since

[pic]

and by the previous result,

[pic],

therefore,

[pic]

is independent of

[pic].

[pic]

6.5 Principal component analysis

Definition:

Suppose the data [pic] are generated by the random variables [pic]. Suppose the covariance matrix of Z is

[pic]

Let [pic] [pic] be the linear combination of [pic]. Then,

[pic]

and

[pic],

where [pic].

The principal components are those uncorrelated linear combinations [pic] whose variances [pic] are as large as possible, where [pic] are [pic] vectors.

The procedure to obtain the principal components is as follows:

First principal component [pic] the linear combination [pic] that maximizes

[pic]

subject to [pic] and [pic] [pic] for any [pic].

Second principal component [pic] the linear combination [pic] that maximizes [pic] subject to [pic], [pic] and [pic]. [pic] [pic] maximizes [pic] and is also uncorrelated with the first principal component.

[pic]

[pic]

At the i'th step, the i'th principal component [pic] the linear combination [pic] that maximizes [pic] subject to [pic], [pic] and [pic]. [pic] [pic] maximizes [pic] and is also uncorrelated with the first (i-1) principal components.

Intuitively, the principal components with large variance contain “important” information. On the other hand, those principal components with small variance might be “redundant”. For example, suppose we have 4 variables, [pic] and [pic]. Let [pic] and [pic]. Also, suppose [pic] are mutually uncorrelated. Thus, among these 4 variables, only 3 of them are required since two of them are the same. Using the procedure to obtain the principal components above, the first principal component is

[pic],

the second principal component is

[pic],

the third principal component is

[pic],

and the fourth principal component is

[pic].

Therefore, the fourth principal component is redundant. That is, only 3 “important” pieces of information are hidden in [pic] and [pic].

Theorem:

[pic] are the eigenvectors of [pic] corresponding to eigenvalues

[pic]. In addition, the variances of the principal components are the eigenvalues [pic]. That is, [pic].

[justification:]

Since [pic] is symmetric and nonsingular, [pic], where P is an orthogonal matrix, [pic] is a diagonal matrix with diagonal elements [pic], the i'th column of P is the orthonormal vector [pic] ([pic]), and [pic] is the eigenvalue of [pic] corresponding to [pic]. Thus,

[pic].

For any unit vector [pic] ([pic]

is a basis of [pic]),

[pic][pic], [pic],

[pic],

and

[pic].

Thus, [pic] is the first principal component and [pic].

Similarly, for any vector c satisfying [pic], then

[pic]

where [pic] and [pic]. Then,

[pic]

and

[pic].

Thus, [pic] is the second principal component and [pic].

The other principal components can be justified similarly.

Estimation:

The above principal components are the theoretical principal components. To find the “estimated” principal components, we estimate the theoretical variance-covariance matrix [pic] by the sample variance-covariance matrix [pic],

[pic],

where

[pic][pic],

and where [pic]. Then, suppose [pic] are orthonormal eigenvectors of [pic] corresponding to the eigenvalues [pic]. Thus, the i’th estimated principal component is [pic] and the estimated variance of the i’th estimated principal component is [pic].
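
A sketch of the estimated principal components: eigendecompose the sample covariance matrix and project the centered data onto the orthonormal eigenvectors (the data below are hypothetical):

import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(100, 3)) @ np.array([[2., 0., 0.],
                                          [1., 1., 0.],
                                          [0., 0., 0.2]])   # hypothetical correlated data, n x p

S = np.cov(Z, rowvar=False)                    # p x p sample covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(S)  # eigenvalues ascending, orthonormal eigenvector columns
order = np.argsort(eigenvalues)[::-1]          # sort from largest to smallest variance
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

scores = (Z - Z.mean(axis=0)) @ eigenvectors   # the estimated principal components
print(eigenvalues)                             # estimated variances of the components
print(np.cov(scores, rowvar=False).round(6))   # approximately diagonal: the components are uncorrelated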

6.6 Discriminant Analysis:

Suppose we have two populations. Let [pic] be the [pic] observations from population 1 and let [pic] be the [pic] observations from population 2. Note that [pic], [pic] are [pic] vectors. Fisher's discriminant method is to project these [pic] vectors to real values via a linear function [pic] and to separate the two populations as much as possible, where a is some [pic] vector.

|Fisher’s discriminant method is as follows: |

|Find the vector [pic] maximizing the separation function [pic], |

|[pic], |

|where [pic] and |

|[pic] |

Intuition of Fisher’s discriminant method:

[Figure: the two samples are projected onto the real line R via [pic]; [pic] is chosen so that the projected samples are separated as far as possible.]

Intuitively, [pic] measures the difference between the transformed means [pic] relative to the sample standard deviation [pic]. If the transformed observations [pic] and [pic] are completely separated, [pic] should be large, since the random variation of the transformed data, reflected by [pic], is also taken into account.

Important result:

The vector [pic] maximizing the separation [pic] is

[pic]

where

[pic],

[pic],

and where [pic]

[pic] and [pic].

Justification:

[pic].

Similarly, [pic].

Also,

[pic]

[pic].

Similarly,

[pic]

Thus,

[pic]

[pic]

[pic]

[pic]

Thus,

[pic]

[pic] can be found by solving the equation based on the first derivative of [pic],

[pic]

Further simplification gives

[pic].

Multiplying both sides by the inverse of the matrix [pic] gives

[pic],[pic]

Since [pic] is a real number,

[pic].
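
A sketch of the standard Fisher discriminant direction, a proportional to S_pooled^(-1)(xbar1 - xbar2) (which the result above should reduce to), using hypothetical two-sample data; projecting the observations onto a separates the two groups:

import numpy as np

rng = np.random.default_rng(2)
X1 = rng.normal(loc=[0., 0.], scale=1.0, size=(50, 2))     # hypothetical sample from population 1
X2 = rng.normal(loc=[3., 1.], scale=1.0, size=(60, 2))     # hypothetical sample from population 2

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
n1, n2 = len(X1), len(X2)
S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
            (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)   # pooled sample covariance

a = np.linalg.solve(S_pooled, xbar1 - xbar2)               # discriminant direction
print(a)
print((X1 @ a).mean(), (X2 @ a).mean())                    # the projected group means are far apart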
