Objective of this course: introduce basic concepts and skills in matrix algebra. In addition, some applications of matrix algebra in statistics are described.
Section 1 Matrix Operations
1.1 Basic matrix operations
Definition of [pic] matrix
An [pic] matrix A is a rectangular array of rc real numbers arranged in r horizontal rows and c vertical columns:
[pic].
The i’th row of A is
[pic],
and the j’th column of A is
[pic]
We often write A as
[pic].
Matrix addition:
Let
[pic],
[pic],
[pic].
Then,
[pic],
[pic]
and the transpose of A is denoted as
[pic]
Example 1:
Let
[pic] and [pic].
Then,
[pic],
[pic]
and
[pic].
1.2 Matrix multiplication
We first define the dot product or inner product of n-vectors.
Definition of dot product:
The dot product or inner product of the n-vectors
[pic] and [pic]
is
[pic].
Example 1:
Let [pic] and [pic]. Then, [pic].
Definition of matrix multiplication:
[pic]
[pic]
[pic]
That is,
[pic]
Example 2:
[pic].
Then,
[pic]
since
[pic], [pic]
[pic], [pic]
[pic], [pic].
Example 3
[pic]
Another expression of matrix multiplication:
[pic]
where [pic] are [pic] matrices.
Example 2 (continued):
[pic]
Note:
Heuristically, the matrices A and B, [pic] and [pic], can be thought of as [pic] and [pic] vectors. Thus,
[pic]
can be thought of as the multiplication of [pic] and [pic] vectors. Similarly,
[pic]
can be thought of as the multiplication of [pic] and [pic] vectors.
Note:
I. [pic] is not necessarily equal to [pic]. For instance, [pic]
[pic].
II. [pic] might not be equal to [pic]. For instance,
[pic]
[pic]
III. If [pic], it is not necessarily true that [pic] or [pic]. For instance,
[pic]
[pic]
IV. [pic], [pic], [pic] (p factors).
Also, [pic] is not necessarily equal to [pic].
V. [pic].
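These facts can be checked numerically. Below is a small sketch in Python/NumPy with arbitrary 2x2 matrices (not the matrices denoted by the placeholders above); it shows that AB and BA differ in general and that AB can be the zero matrix even though neither factor is zero.

import numpy as np

# Arbitrary matrices chosen only for illustration.
A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[0., 1.],
              [1., 0.]])
print(A @ B)            # in general different from B @ A
print(B @ A)

# AB = 0 does not force A = 0 or B = 0:
C = np.array([[1., 0.],
              [0., 0.]])
D = np.array([[0., 0.],
              [0., 1.]])
print(C @ D)            # the zero matrix, although C != 0 and D != 0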
1.3 Trace
Definition of the trace of a matrix:
The sum of the diagonal elements of a [pic] square matrix is called the trace of the matrix, written [pic], i.e., for
[pic],
[pic].
Example 4:
Let [pic]. Then, [pic].
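As a quick numerical illustration of the trace (the matrix below is arbitrary, not the one in Example 4):

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
print(np.trace(A))      # 1 + 5 + 9 = 15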
Section 2 Special Matrices
2.1 Symmetric matrices
Definition of symmetric matrix:
A [pic] matrix [pic] is defined as symmetric if [pic]. That is,
[pic].
Example 1:
[pic] is symmetric since [pic].
Example 2:
Let [pic] be random variables. Then,
[pic] [pic] … [pic]
[pic]
is called the covariance matrix, where [pic] is the covariance of the random variables [pic] and [pic], and [pic] is the variance of [pic]. V is a symmetric matrix. The correlation matrix for [pic] is defined as
[pic] [pic] … [pic]
[pic]
where [pic] is the correlation of [pic] and [pic]. R is also a symmetric matrix. For instance, let [pic] be the random variable representing the sales amount of some product and [pic] be the random variable representing the cost spent on advertisement. Suppose
[pic]
Then,
[pic]
and
[pic]
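The relation between a covariance matrix and the corresponding correlation matrix can be illustrated numerically. The variances and covariance below are hypothetical values for (sales, advertising cost), not the values hidden by the placeholders.

import numpy as np

V = np.array([[4.0, 1.5],
              [1.5, 1.0]])                  # hypothetical covariance matrix

s = np.sqrt(np.diag(V))                     # standard deviations
R = V / np.outer(s, s)                      # correlation r_ij = v_ij / (s_i s_j)
print(R)
print(np.allclose(V, V.T), np.allclose(R, R.T))   # both symmetric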
Example 3:
Let [pic] be a [pic] matrix. Then, both [pic] and [pic] are symmetric since
[pic] and [pic].
[pic] is a [pic] symmetric matrix while [pic] is a [pic] symmetric matrix.
[pic]
Also,
[pic]
Similarly,
[pic]
and
[pic]
For instance, let
[pic] and [pic].
Then,
[pic]
In addition,
[pic]
Note:
Let A and B be symmetric matrices. Then AB is not necessarily equal to [pic]; that is, AB might not be a symmetric matrix.
Example:
[pic] and [pic].
Then,
[pic]
Properties of [pic] and [pic]:
(a)
[pic]
(b)
[pic]
[proof]
(a)
Let
[pic]
[pic].
Thus, for [pic]
[pic]
[pic]
[pic]
[pic]
[pic]
[pic]
(b)
Since [pic]
[pic]
By (a),
[pic]
Note:
A [pic] matrix [pic] is defined as skew-symmetric if [pic]. That is,
[pic].
Example:
[pic]
Thus,
[pic][pic]
2.2 Idempotent matrices
Definition of idempotent matrices:
A square matrix K is said to be idempotent if
[pic]
Properties of idempotent matrices:
1. [pic] for r being a positive integer.
2. [pic] is idempotent.
3. If [pic] and [pic] are idempotent matrices and [pic], then
[pic] is idempotent.
[proof:]
1.
For [pic], [pic].
Suppose [pic] is true; then [pic].
By induction, [pic] for r being any positive integer.
2.
[pic]
3.
[pic]
Example:
Let [pic] be a [pic] matrix. Then,
[pic] is an idempotent matrix since
[pic].
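One standard example of an idempotent matrix built from an n x p matrix X with full column rank (possibly what this example intends) is the projection matrix X(X'X)^{-1}X'. A sketch, with an arbitrary X:

import numpy as np

X = np.array([[1., 0.],
              [1., 1.],
              [1., 2.],
              [1., 3.]])                    # arbitrary full-column-rank matrix

H = X @ np.linalg.inv(X.T @ X) @ X.T        # projection matrix
print(np.allclose(H @ H, H))                # True: H is idempotent
print(np.allclose((np.eye(4) - H) @ (np.eye(4) - H), np.eye(4) - H))   # I - H is idempotent (property 2)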
Note:
A matrix A satisfying [pic] is called nilpotent, and that for which [pic] could be called unipotent.
Example:
[pic] [pic] A is nilpotent.
[pic] [pic] B is unipotent.
Note:
Suppose [pic] is an idempotent matrix. Then [pic] might not be idempotent.
2.3 Orthogonal matrices
Definition of orthogonality:
Two [pic] vectors u and v are said to be orthogonal if
[pic]
A set of [pic] vectors [pic] is said to be orthonormal if
[pic]
Definition of orthogonal matrix:
A [pic] square matrix P is said to be orthogonal if
[pic].
Note:
[pic]
[pic] [pic]
Thus,
[pic] and [pic]
are both orthonormal sets!!
Example:
(a) Helmert Matrices:
The Helmert matrix of order n has the first row
[pic],
and each of the other n-1 rows ([pic]) has the form
[pic]
((i-1) items followed by n-i items).
For example, as [pic], then
[pic]
In statistics, we can use H to find a set of uncorrelated random variables. Suppose [pic] are random variables with
[pic]
Let
[pic]
Then,
[pic]
since [pic] is an orthonormal set of vectors. That is, [pic] are uncorrelated random variables. Also,
[pic],
where
[pic].
[pic]
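A sketch that constructs the Helmert matrix of order n as described (a constant first row, and an i'th row with i-1 equal entries followed by a single negative entry and zeros) and verifies that the rows are orthonormal. The construction follows the standard Helmert form; the normalizing constants below are an assumption if the notes use a different scaling.

import numpy as np

def helmert(n):
    # Helmert matrix of order n with orthonormal rows.
    H = np.zeros((n, n))
    H[0, :] = 1.0 / np.sqrt(n)
    for i in range(2, n + 1):               # rows 2, ..., n (1-based)
        d = np.sqrt(i * (i - 1))
        H[i - 1, :i - 1] = 1.0 / d          # (i-1) equal entries
        H[i - 1, i - 1] = -(i - 1) / d      # one negative entry; the rest stay zero
    return H

H = helmert(4)
print(np.allclose(H @ H.T, np.eye(4)))      # True: H is orthogonal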
(b) Givens Matrices:
Let the orthogonal matrix be
[pic]
G is referred to as a Givens matrix of order 2. For a Givens matrix of order 3, there are [pic] different forms,
1 2 3 1 2 3
[pic].
The general form of a Givens matrix [pic] of order 3 is an identity matrix except for 4 elements: [pic] and [pic] are in the i’th and j’th rows and columns. Similarly, for a Givens matrix of order 4, there are [pic] different forms,
1 2 3 4 1 2 3 4
[pic]
1 2 3 4 1 2 3 4
[pic]
1 2 3 4 1 2 3 4
[pic].
For the Givens matrix of order n, there are [pic] different forms. The general form of [pic] is an identity matrix except for 4 elements,
[pic].
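A sketch of the general Givens matrix of order n: it is an identity matrix except that the (i,i), (i,j), (j,i), (j,j) entries carry cos(theta) and plus/minus sin(theta), and the result is orthogonal. The sign convention below is one common choice.

import numpy as np

def givens(n, i, j, theta):
    # Givens rotation of order n acting on coordinates i and j (0-based).
    G = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = s
    G[j, i] = -s
    return G

G = givens(4, 1, 3, 0.7)
print(np.allclose(G @ G.T, np.eye(4)))      # True: G is orthogonal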
2.4 Positive definite matrices:
Definition of positive definite matrix:
A symmetric [pic] matrix A satisfying
[pic] for all [pic],
is referred to as a positive definite (p.d.) matrix.
Intuition:
If [pic] for all real numbers x, [pic], then the real number a is positive. Similarly, if x is a [pic] vector, A is a [pic] matrix, and [pic], then the matrix A is “positive”.
Note:
A symmetric [pic] matrix A satisfying
[pic] for all [pic],
is referred to as a positive semidefinite (p.s.d.) matrix.
Example:
Let
[pic] and [pic].
Thus,
[pic]
Let [pic]. Then, A is positive semidefinite since for [pic]
[pic].
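For a symmetric matrix, positive definiteness can be checked numerically through its eigenvalues (all strictly positive) or through a Cholesky factorization. A sketch with an arbitrary symmetric matrix:

import numpy as np

A = np.array([[2., -1.],
              [-1., 2.]])                   # arbitrary symmetric matrix

print(np.linalg.eigvalsh(A))                # all eigenvalues > 0  =>  positive definite
try:
    np.linalg.cholesky(A)                   # succeeds only when A is positive definite
    print("positive definite")
except np.linalg.LinAlgError:
    print("not positive definite")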
Section 3 Determinants
Calculation of Determinants:
There are several ways to obtain the determinant of a matrix. The determinant can be obtained:
(a) Using the definition of the determinant.
(b) Using the cofactor expansion of a matrix.
(c) Using the properties of the determinant.
3.1 Definition
Definition of permutation:
Let [pic] be the set of integers from 1 to n. A rearrangement [pic] of the elements of [pic] is called a permutation of [pic].
Example 1:
Let [pic]. Then, 123, 231, 312, 132, 213, 321 are 6 permutations of [pic].
Note: there are [pic] permutations of [pic].
Example 1 (continued):
[pic]: no inversions. [pic]: 1 inversion (32).
[pic]: 1 inversion (21). [pic]: 2 inversions (21, 31).
[pic]: 2 inversions (31, 32). [pic]: 3 inversions (21, 32, 31).
Definition of even and odd permutations:
When the total number of inversions of [pic] is even, [pic] is called an even permutation. When the total number of inversions of [pic] is odd, [pic] is called an odd permutation.
Definition of n-order determinant:
Let [pic] be an [pic] square matrix. We define the determinant of A (written as [pic] or [pic]) by [pic]
[pic],
where [pic] is a permutation of [pic].
If [pic] is an even permutation, then [pic]. If [pic] is an odd permutation, then [pic].
Note: [pic]. No two of [pic] are in the same row or in the same column.
Example:
[pic].
Then, there are 6 terms in the determinant of A,
[pic]
[pic]
[pic]
[pic]
[pic]
[pic]
[pic]
For instance,
[pic]
3.2 Cofactor expansion
Definition of cofactor:
Let [pic] be an [pic] matrix. The cofactor of [pic] is defined as
[pic],
where [pic] is the [pic] submatrix of A obtained by deleting the i’th row and j’th column of A.
Example:
Let
[pic]
Then,
[pic],
[pic],
[pic]
Thus,
[pic]
Important result:
Let [pic] be an [pic] matrix. Then,
[pic]
In addition,
[pic]
Example (continue):
[pic]
Thus,
[pic]
Also,
[pic]
In addition,
[pic]
Similarly,
[pic]
3.3 Properties of determinant
Let A be a [pic] matrix.
(a) [pic]
(b) If two rows (or columns) of A are equal, then [pic].
(c) If a row (or column) of A consists entirely of 0, then [pic]
Example:
Let [pic]. Then,
[pic]
[pic].
[pic]
(d) If B results from the matrix A by interchanging two rows (or columns) of A, then [pic].
(e) If B results from A by multiplying a row (or column) of A by a real number c, [pic] for some i, then [pic].
(f) If B results from A by adding [pic] to
[pic], i.e., [pic]
(or [pic]), then [pic]
Example:
Let
[pic]
Since B results from A by interchanging the first two rows of A,
[pic]
Example:
Let
[pic].
[pic],
since [pic]
Example:
Let
[pic].
[pic],
since [pic]
(g) If a matrix [pic] is upper triangular (or lower triangular), then
[pic].
(h) [pic]
If A is nonsingular, then [pic].
(i). [pic]
Example:
Let
[pic].
[pic]
Example:
Let
[pic].
Then,
[pic].
property (g)
Example:
Let
[pic].
[pic]
Thus,
[pic].
and
[pic].
Example:
Let
[pic]
[pic]
property (i)
Example:
Let
[pic] if [pic].
Compute (i) [pic] (ii) [pic].
[solution]
(i)
[pic]
(ii)
[pic]
(j) For [pic] square matrices P, Q, and X,
[pic]
and
[pic],
where I is an identity matrix.
Example:
Let
[pic]
Then,
[pic].
property (j)
Efficient method to compute determinant:
To calculate the determinant of a complex matrix A, a more efficient method is to transform the matrix into an upper triangular matrix (or a lower triangular matrix) via elementary row operations. The determinant of A is then the product of the diagonal elements of the triangular matrix, with the sign adjusted for any row interchanges (property (d)) and with any factors taken out of rows accounted for (property (e)).
Example:
[pic]
[pic]
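A sketch of this method: reduce the matrix to upper triangular form by elementary row operations, keep track of the sign change caused by each row interchange, and multiply the diagonal elements. The example matrix is arbitrary.

import numpy as np

def det_by_elimination(A):
    # Determinant via reduction to an upper triangular matrix.
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for k in range(n):
        p = k + np.argmax(np.abs(U[k:, k]))         # pivot row (partial pivoting)
        if U[p, k] == 0.0:
            return 0.0                              # zero pivot column => det = 0
        if p != k:
            U[[k, p]] = U[[p, k]]                   # row interchange flips the sign (property (d))
            sign = -sign
        U[k + 1:] -= np.outer(U[k + 1:, k] / U[k, k], U[k])   # eliminate below the pivot (property (f))
    return sign * np.prod(np.diag(U))

A = np.array([[2., 1., 3.],
              [4., 1., 7.],
              [-2., 5., 1.]])
print(det_by_elimination(A), np.linalg.det(A))      # the two values agree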
Note:
[pic] is not necessarily equal to [pic]. For example,
[pic].
3.4 Applications of determinant
(a) Inverse Matrix:
Definition of adjoint:
The [pic] matrix [pic], called the adjoint of A, is
[pic].
Important result:
[pic]
and
[pic]
Example (continue):
[pic]
and
[pic].
(b) Cramer’s Rule:
For linear system [pic], if [pic], then the system has the unique solution,
[pic],
where [pic] is the matrix obtained by replacing the i’th column of A by b.
Example:
Solve the following system of linear equations by Cramer's rule:
[pic]
[solution:]
The coefficient matrix A and the vector b are
[pic],
respectively. Then,
[pic]
Thus, [pic].
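A sketch of Cramer's rule in code; the system below is arbitrary, not the one in the notes.

import numpy as np

def cramer(A, b):
    # Solve Ax = b by Cramer's rule (requires det(A) != 0).
    detA = np.linalg.det(A)
    if np.isclose(detA, 0.0):
        raise ValueError("det(A) = 0: Cramer's rule does not apply")
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                        # replace the i'th column of A by b
        x[i] = np.linalg.det(Ai) / detA
    return x

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([3., 5.])
print(cramer(A, b), np.linalg.solve(A, b))  # both give the same solution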
Note:
Determinant plays a key role in the study of eigenvalues and eigenvectors which will be introduced later.
3.5 Diagonal expansion
Let
[pic].
Then,
[pic]
where [pic]. Note that
[pic].
Similarly,
[pic].
Then,
[pic]
where [pic]. Note that
[pic]
In the above two expansions, we can obtain the determinant of A+D by the following steps:
1. Expand the products of the diagonal elements of A+D,
[pic] or [pic]
2. Replace [pic] by [pic] or [pic] by [pic].
In general, denote [pic],
[pic]
Then, for
[pic],
the determinant of A+D can be obtained by the following steps:
1. Expand the products of the diagonal elements of A+D,
[pic]
2. Replace [pic] by [pic].
Example:
For
[pic],
[pic]
Section 4 Inverse Matrix
4.1 Definition
Definition of inverse matrix:
An [pic] matrix A is called nonsingular or invertible if there exists an [pic] matrix B such that
[pic],
where [pic] is a [pic] identity matrix. The matrix B is called an inverse of A. If there exists no such matrix B, then A is called singular or noninvertible.
Theorem:
If A is an invertible matrix, then its inverse is unique.
[proof:]
Suppose B and C are inverses of A. Then,
[pic].
Note:
Since the inverse of a nonsingular matrix A is unique, we denote the inverse of A as [pic].
Note:
If A is not a square matrix, then
• there might be more than one matrix L such that
[pic].
• there might be some matrix U such that
[pic]
Example:
Let
[pic].
Then,
• there are infinitely many matrices L such that [pic], for example
[pic] or [pic].
• As [pic],
[pic] but [pic].
4.2 Calculation of inverse matrix
1. Using Gauss-Jordan reduction:
The procedure for computing the inverse of a [pic] matrix A:
1. Form the [pic] augmented matrix
[pic]
and transform the augmented matrix to the matrix
[pic]
in reduced row echelon form via elementary row operations.
2. If
(a) [pic], then [pic].
(b) [pic], then [pic] is singular and [pic] does not exist.
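The same procedure can be carried out numerically by row-reducing the augmented matrix [A | I]; a sketch with an arbitrary matrix (partial pivoting is added for numerical stability):

import numpy as np

def inverse_gauss_jordan(A):
    # Invert A by reducing [A | I] to reduced row echelon form.
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])           # augmented matrix [A | I]
    for k in range(n):
        p = k + np.argmax(np.abs(M[k:, k])) # pivot row
        if np.isclose(M[p, k], 0.0):
            raise ValueError("A is singular; the inverse does not exist")
        M[[k, p]] = M[[p, k]]
        M[k] /= M[k, k]                     # scale the pivot row to get a leading 1
        for r in range(n):
            if r != k:
                M[r] -= M[r, k] * M[k]      # clear the remaining entries of the column
    return M[:, n:]                         # the right half is the inverse

A = np.array([[1., 2.],
              [3., 7.]])
print(inverse_gauss_jordan(A))
print(np.linalg.inv(A))                     # agrees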
Example:
To find the inverse of [pic], we can employ the procedure introduced above.
1.
[pic].
[pic] [pic]
[pic] [pic]
[pic] [pic]
[pic] [pic]
2. The inverse of A is
[pic].
Example:
Find the inverse of [pic] if it exists.
[solution:]
1. Form the augmented matrix
[pic].
And the transformed matrix in reduced row echelon form is
[pic]
2. The inverse of A is
[pic].
Example:
Find the inverse of [pic] if it exists.
[solution:]
1. Form the augmented matrix
[pic].
And the transformed matrix in reduced row echelon form is
[pic]
2. A is singular!!
2. Using the adjoint [pic] of a matrix:
As [pic], then
[pic].
Note:
[pic] is always true.
Note:
As [pic] [pic] A is nonsingular.
4.3 Properties of the inverse matrix
The inverse matrix of an [pic] nonsingular matrix A has the following important properties:
1. [pic].
2. [pic]
3. If A is symmetric, so is its inverse.
4. [pic]
5. If C is an invertible matrix, then
1. [pic]
2. [pic].
6. If [pic] exists, then
[pic].
[proof of 2]
[pic]
similarly,
[pic].
[proof of 3:]
By property 2,
[pic].
[proof of 4:]
[pic].
Similarly,
[pic].
[proof of 5:]
Multiplying by the inverse of C gives
[pic].
Similarly,
[pic].
[proof of 6:]
[pic]
[pic].
Multiplying both sides by [pic], we have
[pic].
[pic]
can be obtained by using similar procedure.
Example:
Prove that [pic].
[proof:]
[pic]
A similar procedure can be used to obtain
[pic]
4.4 Left and right inverses:
Definition of left inverse:
For a matrix A, suppose
[pic],
possibly with more than one such L. Then each such matrix L is called a left inverse of A.
Definition of right inverse:
For a matrix A, suppose
[pic],
possibly with more than one such R. Then each such matrix R is called a right inverse of A.
Theorem:
A [pic] matrix [pic] has left inverses only if [pic].
[proof:]
We show that a contradiction arises when [pic] and [pic] has a left inverse. For [pic], let
[pic]
Then, suppose
[pic]
is the left inverse of [pic]. Then,
[pic].
Thus,
[pic]
Since [pic] and both M and X are square matrices, [pic].
Therefore,
[pic].
However,
[pic].
This is a contradiction. Therefore, if [pic], then [pic] has no left inverse.
Theorem:
A [pic] matrix [pic] has right inverses only if [pic].
Section 5 Eigen-analysis
5.1 Definition:
Let A be an [pic] matrix. The real number [pic] is called an eigenvalue of A if there exists a nonzero vector x in [pic] such that
[pic].
The nonzero vector x is called an eigenvector of A associated with the eigenvalue [pic].
Example 1:
Let
[pic].
As [pic], then [pic].
Thus, [pic] is the eigenvector of A associated with the eigenvalue [pic].
Similarly,
As [pic], then [pic]
Thus, [pic] is the eigenvector of A associated with the eigenvalue [pic].
Note: Let x be the eigenvector of A associated with some eigenvalue [pic]. Then, [pic], [pic], is also the eigenvector of A associated with the same eigenvalue [pic] since
[pic].
5.2 Calculation of eigenvalues and eigenvectors:
Motivating Example:
Let
[pic].
Find the eigenvalues of A and their associated eigenvectors.
[solution:]
Let [pic] be the eigenvector associated with the eigenvalue [pic]. Then,
[pic].
Thus,
[pic] is the nonzero (nontrivial) solution of the homogeneous linear system [pic]. [pic] [pic] is singular [pic] [pic].
Therefore,
[pic]
[pic].
1. As [pic],
[pic].
[pic]
[pic]
2. As [pic],
[pic].
[pic]
[pic]
Note:
In the above example, the eigenvalues of A satisfy the following equation
[pic].
After finding the eigenvalues, we can further solve the associated homogeneous system to find the eigenvectors.
Definition of the characteristic polynomial:
Let [pic]. The determinant
[pic],
is called the characteristic polynomial of A.
[pic],
is called the characteristic equation of A.
Theorem:
A is singular if and only if 0 is an eigenvalue of A.
[proof:]
[pic].
A is singular [pic] [pic] has non-trivial solution [pic] There exists a nonzero vector x such that
[pic].
[pic] x is the eigenvector of A associated with eigenvalue 0.
[pic]
0 is an eigenvalue of A [pic] There exists a nonzero vector x such that
[pic].
[pic] The homogeneous system [pic] has nontrivial (nonzero) solution.
[pic] A is singular.
Theorem:
The eigenvalues of A are the real roots of the characteristic polynomial of A.
[pic]
Let [pic] be an eigenvalue of A associated with eigenvector u. Also, let [pic] be the characteristic polynomial of A. Then,
[pic] [pic] [pic] [pic] The homogeneous system has nontrivial (nonzero) solution x [pic] [pic] is singular [pic]
[pic].
[pic] [pic] is a real root of [pic].
[pic]
Let [pic] be a real root of [pic] [pic] [pic] [pic] [pic] is a singular matrix [pic] There exists a nonzero vector (nontrivial solution) v such that
[pic].
[pic] v is the eigenvector of A associated with the eigenvalue [pic].
◆
Procedure of finding the eigenvalues and eigenvectors of A:
1. Solve for the real roots of the characteristic equation [pic]. These real roots [pic] are the eigenvalues of A.
2. Solve the homogeneous system [pic] or [pic], [pic]. The nontrivial (nonzero) solutions are the eigenvectors associated with the eigenvalues [pic].
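Numerically, both steps are usually carried out at once by a library routine. A sketch with an arbitrary matrix, verifying that Ax equals the eigenvalue times x for every returned pair:

import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                    # arbitrary matrix

eigenvalues, eigenvectors = np.linalg.eig(A)    # columns of `eigenvectors` are eigenvectors
for k in range(len(eigenvalues)):
    lam = eigenvalues[k]
    v = eigenvectors[:, k]
    print(lam, np.allclose(A @ v, lam * v))     # A v = lambda v holds for each pair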
Example:
Find the eigenvalues and eigenvectors of the matrix
[pic].
[solution:]
[pic]
[pic]and 10.
1. As [pic],
[pic].
[pic]
Thus,
[pic],
are the eigenvectors associated with eigenvalue [pic].
2. As [pic],
[pic].
[pic]
Thus,
[pic],
are the eigenvectors associated with eigenvalue [pic].
Example:
[pic].
Find the eigenvalues and the eigenvectors of A.
[solution:]
[pic]
[pic]and 6.
1. As [pic],
[pic].
[pic]
Thus,
[pic],
are the eigenvectors associated with eigenvalue [pic].
2. As [pic],
[pic].
[pic]
Thus,
[pic],
are the eigenvectors associated with eigenvalue [pic].
Note:
In the above example, there are at most 2 linearly independent eigenvectors [pic] and [pic] for [pic] matrix A.
The following theorem and corollary concern the linear independence of eigenvectors:
Theorem:
Let [pic] be the eigenvectors of a [pic] matrix A associated with distinct eigenvalues [pic], respectively, [pic]. Then, [pic] are linearly independent.
[proof:]
Assume [pic] are linearly dependent. Then, suppose the dimension of the vector space V generated by [pic] is [pic]
(i.e., the dimension of V [pic] the vector space generated by [pic]). There exist j linearly independent vectors among [pic] which also generate V. Without loss of generality, let [pic] be the j linearly independent vectors which generate V (i.e., [pic] is a basis of V). Thus,
[pic],
[pic] are some real numbers. Then,
[pic]
Also,
[pic]
Thus,
[pic].
Since [pic] are linearly independent,
[pic].
Furthermore,
[pic] are distinct, [pic]
[pic]
This is a contradiction!!
Corollary:
If a [pic] matrix A has n distinct eigenvalues, then A has n linearly independent eigenvectors.
5.3 Properties of eigenvalues and eigenvectors:
(a)
Let [pic] be the eigenvector of [pic] associated with the eigenvalue [pic]. Then, the eigenvalue of the corresponding polynomial in A,
associated with the eigenvector [pic], is
[pic],
where [pic] are real numbers and [pic] is a positive integer.
[proof:]
[pic]
since
[pic].
Example:
[pic],
what are the eigenvalues of [pic]?
[solution:]
The eigenvalues of A are -5 and 7. Thus, the corresponding eigenvalues are
[pic]
and
[pic].
Example:
Let [pic] be the eigenvalue of A. Then, we denote
[pic].
Then, [pic] has eigenvalue
[pic].
Note:
Let [pic] be the eigenvector of A associated with the eigenvalue [pic]. Then, [pic] is the eigenvector of [pic] associated with the eigenvalue [pic].
[proof:]
[pic].
Therefore, [pic] is the eigenvector of [pic] associated with the eigenvalue [pic].
(b)
Let [pic] be the eigenvalues of A ([pic] need not be distinct). Then,
[pic] and [pic].
[proof:]
[pic].
Thus,
[pic]
Therefore,
[pic].
Also, by diagonal expansion on the following determinant
[pic],
and by the expansion of
[pic],
therefore,
[pic].
Example:
[pic],
The eigenvalues of A are [pic] and [pic]. Then,
[pic]
and
[pic].
5.4 Diagonalization of a matrix
(a) Definition and procedure for diagonalization of a matrix
Definition:
A matrix A is diagonalizable if there exists a nonsingular matrix P and a diagonal matrix D such that
[pic].
Example:
Let
[pic].
Then,
[pic]
where
[pic].
Theorem:
An [pic] matrix A is diagonalizable if and only if it has n linearly independent eigenvectors.
[proof:]
[pic]
A is diagonalizable. Then, there exists a nonsingular matrix P and a diagonal matrix
[pic],
such that
[pic].
Then,
[pic]
That is,
[pic]
are eigenvectors associated with the eigenvalues [pic].
Since P is nonsingular, [pic] are linearly independent.
[pic]
Let [pic] be n linearly independent eigenvectors of A associated with the eigenvalues [pic]. That is,
[pic]
Thus, let
[pic]
and
[pic].
Since [pic],
[pic].
Thus,
[pic],
[pic] exists because [pic] are linearly independent and thus P is nonsingular.
Important result:
An [pic] matrix A is diagonalizable if all the roots of its characteristic equation are real and distinct.
Example:
Let
[pic].
Find the nonsingular matrix P and the diagonal matrix D such that
[pic]
and find [pic], where the exponent is any positive integer.
[solution:]
We need to find the eigenvalues and eigenvectors of A first. The characteristic equation of A is
[pic].
[pic].
By the above important result, A is diagonalizable. Then,
1. As [pic],
[pic]
2. As [pic],
[pic]
Thus,
[pic] and [pic]
are two linearly independent eigenvectors of A.
Let
[pic] and [pic].
Then, by the above theorem,
[pic].
To find [pic],
[pic]
(n times)
Multiplying both sides by [pic] and [pic],
[pic]
Note:
For any [pic] diagonalizable matrix A, [pic] then
[pic]
where
[pic].
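A numerical sketch of the diagonalization and of computing a matrix power through it (the matrix below is arbitrary, with distinct real eigenvalues):

import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])                    # eigenvalues 5 and 2

eigenvalues, P = np.linalg.eig(A)           # columns of P are eigenvectors
D = np.diag(eigenvalues)
P_inv = np.linalg.inv(P)
print(np.allclose(A, P @ D @ P_inv))        # A = P D P^{-1}

n = 5
A_n = P @ np.diag(eigenvalues ** n) @ P_inv # A^n = P D^n P^{-1}
print(np.allclose(A_n, np.linalg.matrix_power(A, n)))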
Example:
Is [pic] diagonalizable?
[solution:]
[pic].
Then, [pic].
As [pic]
[pic]
Therefore, all the eigenvectors are spanned by [pic]. There do not exist two linearly independent eigenvectors. By the previous theorem, A is not diagonalizable.
Note:
An [pic] matrix may fail to be diagonalizable since
• Not all roots of its characteristic equation are real numbers.
• It does not have n linearly independent eigenvectors.
Note:
The set [pic] consisting of all eigenvectors of an [pic] matrix A associated with the eigenvalue [pic], together with the zero vector 0, is a subspace of [pic]. [pic] is called the eigenspace associated with [pic].
5.5 Diagonalization of symmetric matrix
Theorem:
If A is an [pic] symmetric matrix, then the eigenvectors of A associated with distinct eigenvalues are orthogonal.
[proof:]
Let [pic] and [pic] be eigenvectors of A associated with distinct eigenvalues [pic] and [pic], respectively, i.e.,
[pic] and [pic].
Thus,
[pic]
and
[pic].
Therefore,
[pic].
Since [pic], [pic].
Example:
Let
[pic].
A is a symmetric matrix. The characteristic equation is
[pic].
The eigenvalues of A are [pic]. The eigenvectors associated with these eigenvalues are
[pic].
Thus,
[pic] are orthogonal.
Very Important Result:
If A is an [pic] symmetric matrix, then there exists an orthogonal matrix P such that
[pic],
where [pic] are n linearly independent eigenvectors of A and the diagonal elements of D are the eigenvalues of A associated with these eigenvectors.
Example:
Let
[pic].
Please find an orthogonal matrix P and a diagonal matrix D such that [pic].
[solution:]
We need to find the orthonormal eigenvectors of A and the associated eigenvalues first. The characteristic equation is
[pic].
Thus, [pic]
1. As [pic], solve the homogeneous system
[pic].
The eigenvectors are
[pic]
[pic] and [pic] are two eigenvectors of A. However, the two eigenvectors are not orthogonal. We can obtain two orthonormal eigenvectors via the Gram-Schmidt process. The orthogonal eigenvectors are
[pic].
Standardizing these two eigenvectors results in
[pic].
2. As [pic], solve the homogeneous system
[pic].
The eigenvectors are
[pic].
[pic] is an eigenvector of A. Standardizing the eigenvector results in
[pic].
Thus,
[pic],
[pic],
and [pic].
Note:
For a set of vectors [pic], we can find a set of orthogonal vectors [pic] via the Gram-Schmidt process:
[pic]
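A sketch of the Gram-Schmidt process for a list of vectors (the three vectors below are arbitrary):

import numpy as np

def gram_schmidt(vectors):
    # Orthogonalize a list of vectors by the Gram-Schmidt process.
    orthogonal = []
    for v in vectors:
        w = v.astype(float)
        for u in orthogonal:
            w = w - (u @ v) / (u @ u) * u   # subtract the projection of v onto u
        orthogonal.append(w)
    return orthogonal

vs = [np.array([1., 1., 0.]), np.array([1., 0., 1.]), np.array([0., 1., 1.])]
us = gram_schmidt(vs)
print(us[0] @ us[1], us[1] @ us[2], us[0] @ us[2])   # all (numerically) zero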
Section 6 Applications
6.1 Differential Operators
Definition of differential operator:
Let
[pic]
Then,
[pic]
Example 1:
Let
[pic]
Then,
[pic]
Example 2:
Let
[pic]
Then,
[pic]
Note:
In example 2,
[pic],
where
[pic].
Then,
[pic].
Theorem:
[pic]
Theorem:
Let A be an [pic] matrix and x be a [pic] vector. Then,
[pic]
[proof:]
[pic]
Then, the k’th element of [pic] is
[pic][pic]
while the k’th element of [pic] is
[pic].
Therefore,
[pic].
Corollary:
Let A be an [pic] symmetric matrix. Then,
[pic].
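The corollary (in its standard form, namely that the derivative of x'Ax with respect to x is 2Ax when A is symmetric, which is what the placeholder appears to state) can be checked against finite differences. The matrix and the point below are arbitrary.

import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])                    # arbitrary symmetric matrix
x = np.array([0.5, -1.0])
f = lambda z: z @ A @ z                     # the quadratic form z'Az

eps = 1e-6
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(2)])
print(numeric)                              # central finite differences
print(2 * A @ x)                            # the formula 2Ax; the two agree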
Example 3:
[pic].
Then,
[pic]
[pic]
Example 4:
For the standard linear regression model
[pic]
The least squares estimate b is the minimizer of
[pic].
To find b, we need to solve
[pic].
Thus,
[pic]
Theorem:
[pic]
Then,
[pic]
where
[pic]
Note:
Let [pic] be a function of x. Then,
[pic].
Example 5:
Let
[pic],
where X is an [pic] matrix, I is an [pic] identity matrix, and [pic] is a constant. Then,
[pic]
6.2 Vectors of random variables
In this section, the following topics will be discussed:
• Expectation and covariance of vectors of random variables
• Mean and variance of quadratic forms
• Independence of random variables and chi-square distribution
Expectation and covariance
Let [pic] be random variables. Let
[pic]
be the random matrix.
Definition:
[pic].
Let [pic] and [pic] be the [pic] and [pic] random vectors, respectively. The covariance matrix is
[pic]
and the variance matrix is
[pic]
Theorem:
[pic] are two matrices, then
[pic].
[proof:]
Let
[pic]
where
[pic]
Thus,
[pic]
Let
[pic]
where
[pic]
Thus,
[pic]
Since [pic] for every [pic], then [pic].
Results:
• [pic]
• [pic]
Mean and variance of quadratic forms
Theorem:
Let [pic] be an [pic] vector of random variables and [pic] be an [pic] symmetric matrix. If [pic] and
[pic], then
[pic],
where [pic] is the sum of the diagonal elements of the matrix M.
[proof:]
[pic]
Then,
[pic]
On the other hand,
[pic]
Then,
[pic]
Thus,
[pic]
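The theorem (in its standard form E[x'Ax] = tr(AV) + mu'A mu, where V is the covariance matrix of x, which is what the placeholders appear to state) can be checked by simulation; a sketch with arbitrary values:

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -0.5])
V = np.array([[1.0, 0.3],
              [0.3, 2.0]])                  # covariance matrix of x
A = np.array([[2.0, 1.0],
              [1.0, 1.0]])                  # symmetric matrix

x = rng.multivariate_normal(mu, V, size=200000)
quad = np.einsum('ij,jk,ik->i', x, A, x)    # x_i' A x_i for each simulated x_i
print(quad.mean())                          # Monte Carlo estimate
print(np.trace(A @ V) + mu @ A @ mu)        # tr(AV) + mu'A mu; the two are close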
Theorem:
[pic],
where [pic] and [pic].
Note:
For a random variable X, [pic] and [pic]. Then
[pic].
Corollary:
If [pic] are independently normally distributed with common variance [pic], then
[pic].
Theorem:
If [pic] are independently normally distributed with common variance [pic], then
[pic].
Independence of random variables and chi-square distribution
Definition of Independence:
Let [pic] and [pic] be the [pic] and [pic] random vectors, respectively. Let [pic] and [pic] be the density functions of X and Y, respectively. Two random vectors X and Y are said to be (statistically) independent if the joint density function
[pic]
Chi-Square Distribution:
[pic] has the density function
[pic],
where [pic] is the gamma function. Then, the moment generating function is
[pic]
and the cumulant generating function is
[pic].
Thus,
[pic]
and
[pic]
Theorem:
If [pic] and [pic] is statistically independent of [pic], then [pic].
[proof:]
[pic]
Thus,
[pic]
Therefore, [pic].
6.3 Multivariate normal distribution
In this chapter, the following topics will be discussed:
• Definition
• Moment generating function and independence of normal variables
• Quadratic forms in normal variables
Definition
Intuition:
Let [pic]. Then, the density function is
[pic]
Definition (Multivariate Normal Random Variable):
A random vector
[pic]
with [pic] has the density function
[pic]
Theorem:
[pic]
[proof:]
Since [pic] is positive definite, [pic], where [pic] is a real orthogonal matrix ([pic]) and [pic]. Then,
[pic]. Thus,
[pic]
where [pic]. Further,
[pic]
Therefore, if we can prove [pic] and [pic] are mutually independent, then
[pic].
The joint density function of [pic] is
[pic],
where
[pic]
Therefore, the density function of [pic]
[pic]
Therefore, [pic] and [pic] are mutually independent.
Moment generating function and independence of normal random variables
Moment Generating Function of Multivariate Normal Random Variable:
Let
[pic].
Then, the moment generating function for Y is
[pic]
Theorem:
If [pic] and C is a [pic] matrix of rank p, then
[pic].
[proof:]
Let [pic]. Then,
[pic]
Since [pic] is the moment generating function of [pic],
[pic]. ◆
Corollary:
If [pic] then
[pic],
where T is an orthogonal matrix.
Theorem:
If [pic], then the marginal distribution of any subset of the elements of Y is also multivariate normal.
[pic]
[pic], then [pic], where
[pic]
Theorem:
Y has a multivariate normal distribution if and only if [pic] is univariate normal for all real vectors a.
[proof:]
[pic]
Suppose [pic]. [pic] is univariate normal. Also,
[pic].
Then, [pic]. Since
[pic]
and since
[pic]
is the moment generating function of [pic], Y has a multivariate normal distribution [pic].
[pic]
By the previous theorem. ◆
Quadratic form in normal variables
Theorem:
Let [pic] and let P be an [pic] symmetric matrix of rank r. Then,
[pic]
is distributed as [pic] if and only if [pic] (i.e., P is idempotent).
[proof]
[pic]
Suppose [pic] and [pic]. Then, P has r eigenvalues equal to 1 and [pic] eigenvalues equal to 0. Thus, without loss of generality,
[pic]where T is an orthogonal matrix. Then,
[pic]
Since [pic] and [pic], thus
[pic].
[pic] are i.i.d. normal random variables with common variance [pic]. Therefore,
[pic]
[pic]
Since P is symmetric, [pic], where T is an orthogonal matrix and [pic] is a diagonal matrix with elements [pic]. Thus, let [pic]. Since [pic],
[pic].
That is, [pic] are independent normal random variables with variance [pic]. Then,
[pic]
The moment generating function of [pic] is
[pic][pic]
Also, since Q is distributed as [pic], the moment generating function is also equal to [pic]. Thus, for every t,
[pic]
Further,
[pic].
By the uniqueness of polynomial roots, we must have [pic]. Then, [pic] by the following result:
if a matrix P is symmetric, then P is idempotent of rank r if and only if it has r eigenvalues equal to 1 and n-r eigenvalues equal to 0. ◆
Important Result:
Let [pic] and let [pic] and [pic] both be distributed as chi-square. Then, [pic] and [pic] are independent if and only if [pic].
Useful Lemma:
If [pic], [pic] and [pic] is positive semidefinite, then
• [pic]
• [pic] is idempotent.
Theorem:
If [pic] and let
[pic]
If [pic], then [pic] and [pic] are independent and [pic].
[proof:]
We first prove [pic]. [pic], thus
[pic]
Since [pic], [pic] is any vector in [pic]. Therefore, [pic] is positive semidefinite. By the above useful lemma, [pic] is idempotent. Further, by the previous theorem,
[pic]
since
[pic]
We now prove [pic] and [pic] are independent. Since
[pic]
By the previous important result, the proof is complete. ◆
6.4 Linear regression
Let
[pic].
Denote
[pic].
In linear algebra,
[pic]
is a linear combination of the column vectors of [pic]. That is,
[pic].
Then,
[pic]
The least squares method is to find the appropriate [pic] such that the distance between [pic] and [pic] is smaller than the distance between [pic] and any other linear combination of the column vectors of [pic], for example [pic]. Intuitively, [pic] is the information provided by the covariates [pic] to interpret the response [pic]. Thus, [pic] is the information which interprets [pic] most accurately. Further,
[pic]
If we choose the estimate [pic] of [pic] such that [pic] is orthogonal to every vector in [pic], then [pic]. Thus,
[pic].
That is, if we choose [pic] satisfying [pic], then
[pic]
and for any other estimate [pic] of [pic],
[pic].
Thus, [pic] satisfying [pic] is the least squares estimate. Therefore,
[pic]
Since
[pic],
[pic] is called the projection matrix or hat matrix. [pic] projects the response vector [pic] onto the space spanned by the covariate vectors. The vector of residuals is
[pic].
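A numerical sketch of the least squares fit, the hat matrix, and the orthogonality of the residuals to the columns of X (the data are simulated, not taken from the notes):

import numpy as np

rng = np.random.default_rng(1)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept plus one covariate
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)       # least squares estimate (X'X)^{-1} X'y
H = X @ np.linalg.inv(X.T @ X) @ X.T        # projection (hat) matrix
e = y - X @ b                               # residual vector

print(b)
print(np.allclose(H @ H, H))                # H is idempotent
print(np.allclose(X.T @ e, 0))              # the residuals are orthogonal to the columns of X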
We have the following two important theorems.
Theorem:
1. [pic] and [pic] are idempotent.
2. [pic]
3. [pic]
4.
[pic]
[proof:]
1.
[pic]
and
[pic].
2.
Since [pic] is idempotent, [pic]. Thus,
[pic]
Similarly,
[pic]
3.
[pic]
4.
[pic]
Thus,
[pic]
Therefore,
[pic]
Theorem:
If [pic], where [pic] is a [pic] matrix of rank [pic], then
1. [pic]
2. [pic]
3. [pic]
4. [pic] is independent of
[pic].
[proof:]
1.
Since for a normal random variable [pic],
[pic]
thus for [pic]
[pic]
2.
[pic]
Thus,
[pic].
3.
[pic] and [pic], thus
[pic] Since
[pic],
[pic].
4.
Let
[pic]
where
[pic]
and
[pic]
Since
[pic]
and by the previous result,
[pic],
therefore,
[pic]
is independent of
[pic].
[pic]
6.5 Principal component analysis
Definition:
Suppose the data [pic] are generated by the random variable [pic]. Suppose the covariance matrix of Z is
[pic]
Let [pic] [pic] be a linear combination of [pic]. Then,
[pic]
and
[pic],
where [pic].
The principal components are those uncorrelated linear combinations [pic] whose variances [pic] are as large as possible, where [pic] are [pic] vectors.
The procedure to obtain the principal components is as follows:
First principal component [pic] linear combination [pic] that maximizes
[pic]
subject to [pic] and [pic] [pic] for any [pic].
Second principal component [pic] linear combination [pic] that maximizes [pic] subject to [pic], [pic] and [pic]. [pic] [pic] maximizes [pic] and is also uncorrelated with the first principal component.
[pic]
[pic]
At the i’th step, the i’th principal component [pic] linear combination [pic] that maximizes [pic] subject to [pic], [pic] and [pic]. [pic] [pic] maximizes [pic] and is also uncorrelated with the first (i-1) principal components.
Intuitively, these principal components with large variance contain “important” information. On the other hand, those principal components with small variance might be “redundant”. For example, suppose we have 4 variables, [pic] and [pic]. Let [pic] and [pic]. Also, suppose [pic] are mutually uncorrelated. Thus, among these 4 variables, only 3 of them are required since two of them are the same. Using the above procedure to obtain the principal components, the first principal component is
[pic],
the second principal component is
[pic],
the third principal component is
[pic],
and the fourth principal component is
[pic].
Therefore, the fourth principal component is redundant. That is, only 3 “important” pieces of information are hidden in [pic] and [pic].
Theorem:
[pic] are the eigenvectors of [pic] corresponding to eigenvalues
[pic]. In addition, the variances of the principal components are the eigenvalues [pic]. That is, [pic].
[justification:]
Since [pic] is symmetric and nonsingular, [pic], where P is an orthogonal matrix, [pic] is a diagonal matrix with diagonal elements [pic], the i’th column of P is the orthonormal vector [pic] ([pic]), and [pic] is the eigenvalue of [pic] corresponding to [pic]. Thus,
[pic].
For any unit vector [pic] ([pic] is a basis of [pic]),
[pic] [pic], [pic],
[pic],
and
[pic].
Thus, [pic] is the first principal component and [pic].
Similarly, for any vector c satisfying [pic], then
[pic]
where [pic] and [pic]. Then,
[pic]
and
[pic].
Thus, [pic] is the second principal component and [pic].
The other principal components can be justified similarly.
Estimation:
The above principal components are the theoretical principal components. To find the “estimated” principal components, we estimate the theoretical variance-covariance matrix [pic] by the sample variance-covariance [pic],
[pic],
where
[pic][pic],
and where [pic]. Then, suppose [pic] are orthonormal eigenvectors of [pic] corresponding to the eigenvalues [pic]. Thus, the i’th estimated principal component is [pic] and the estimated variance of the i’th estimated principal component is [pic].
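A sketch of the estimation step: compute the sample covariance matrix, take its eigenvalues and eigenvectors, and form the estimated principal components. The data are simulated; the covariance used to generate them is arbitrary.

import numpy as np

rng = np.random.default_rng(2)
Z = rng.multivariate_normal([0., 0., 0.],
                            [[4.0, 1.0, 0.5],
                             [1.0, 2.0, 0.3],
                             [0.5, 0.3, 1.0]], size=500)   # 500 observations on 3 variables

S = np.cov(Z, rowvar=False)                 # sample variance-covariance matrix
eigenvalues, P = np.linalg.eigh(S)          # eigh handles the symmetric S
order = np.argsort(eigenvalues)[::-1]       # sort the eigenvalues in decreasing order
eigenvalues, P = eigenvalues[order], P[:, order]

scores = (Z - Z.mean(axis=0)) @ P           # estimated principal components
print(eigenvalues)                          # estimated variances of the components
print(np.var(scores, axis=0, ddof=1))       # matches the eigenvalues above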
6.6 Discriminant Analysis:
Suppose we have two populations. Let [pic] be the [pic] observations from population 1 and let [pic] be [pic] observations from population 2. Note that [pic], [pic] are [pic] vectors. Fisher’s discriminant method is to project these [pic] vectors to real values via a linear function [pic] and to try to separate the two populations as much as possible, where a is some [pic] vector.
Fisher’s discriminant method is as follows:
Find the vector [pic] maximizing the separation function [pic],
[pic],
where [pic] and
[pic]
Intuition of Fisher’s discriminant method:
[Diagram: the observations from the two populations are projected onto the real line R by [pic]; [pic] is chosen so that the two projected groups are separated as far as possible.]
Intuitively, [pic] measures the difference between the transformed means [pic] relative to the sample standard deviation [pic]. If the transformed observations [pic] and [pic] are completely separated, [pic] should be large as the random variation of the transformed data reflected by [pic] is also considered.
Important result:
The vector [pic] maximizing the separation [pic] is
[pic],
where
[pic],
[pic],
and where [pic]
[pic] and [pic].
Justification:
[pic].
Similarly, [pic].
Also,
[pic]
[pic].
Similarly,
[pic]
Thus,
[pic]
[pic]
[pic]
[pic]
Thus,
[pic]
[pic] can be found by solving the equation based on the first derivative of [pic],
[pic]
Further simplification gives
[pic].
Multiplying both sides by the inverse of the matrix [pic] gives
[pic], [pic]
Since [pic] is a real number,
[pic].
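Numerically, the maximizing direction is the familiar Fisher solution, a proportional to S_pooled^{-1}(xbar_1 - xbar_2), which is presumably what the boxed result states. A sketch with simulated samples from the two populations:

import numpy as np

rng = np.random.default_rng(3)
X1 = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=60)   # population 1
X2 = rng.multivariate_normal([2.0, 1.0], [[1.0, 0.3], [0.3, 1.0]], size=40)   # population 2

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
n1, n2 = len(X1), len(X2)
S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
            (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)   # pooled sample covariance

a = np.linalg.solve(S_pooled, m1 - m2)      # Fisher's discriminant direction
print(a)
print((X1 @ a).mean(), (X2 @ a).mean())     # the projected group means are well separated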