Lecture 1. Random vectors and multivariate normal distribution

1.1 Moments of a random vector

A random vector $X$ of size $p$ is a column vector consisting of $p$ random variables $X_1, \dots, X_p$, written $X = (X_1, \dots, X_p)^\top$. The mean or expectation of $X$ is defined by the vector of expectations,
\[
E(X) = \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_p) \end{pmatrix},
\]
which exists if $E|X_i| < \infty$ for all $i = 1, \dots, p$.

Lemma 1. Let $X$ be a random vector of size $p$ and $Y$ be a random vector of size $q$. For any non-random matrices $A$ ($m \times p$), $B$ ($m \times q$), $C$ ($1 \times n$), and $D$ ($m \times n$),
\[
E(AX + BY) = AE(X) + BE(Y),
\]
\[
E(AXC + D) = AE(X)C + D.
\]

For a random vector $X$ of size $p$ satisfying $E(X_i^2) < \infty$ for all $i = 1, \dots, p$, the variance-covariance matrix (or just covariance matrix) of $X$ is
\[
\Sigma = \mathrm{Cov}(X) = E[(X - EX)(X - EX)^\top].
\]
The covariance matrix of $X$ is a $p \times p$ square, symmetric matrix. In particular, $\sigma_{ij} = \mathrm{Cov}(X_i, X_j) = \mathrm{Cov}(X_j, X_i) = \sigma_{ji}$.

Some properties:

1. $\mathrm{Cov}(X) = E(XX^\top) - E(X)E(X)^\top$.

2. If $c$ ($p \times 1$) is a constant vector, then $\mathrm{Cov}(X + c) = \mathrm{Cov}(X)$.

3. If $A$ ($m \times p$) is a constant matrix, then $\mathrm{Cov}(AX) = A\,\mathrm{Cov}(X)A^\top$.

Lemma 2. A $p \times p$ matrix $\Sigma$ is a covariance matrix if and only if it is non-negative definite.
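These properties are easy to check numerically. Below is a minimal simulation sketch using numpy; the particular matrix $A$, shift $c$, and sampling distribution are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw n samples of a p-variate random vector X (rows are observations).
n, p, m = 100_000, 3, 2
L = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.3, 1.0]])
X = rng.standard_normal((n, p)) @ L.T     # correlated coordinates

A = rng.standard_normal((m, p))           # arbitrary constant m x p matrix
c = np.array([1.0, -2.0, 3.0])            # arbitrary constant p x 1 vector

cov_X = np.cov(X, rowvar=False)           # sample estimate of Cov(X)

# Property 2: a constant shift leaves the covariance unchanged.
print(np.allclose(np.cov(X + c, rowvar=False), cov_X))

# Property 3: Cov(AX) = A Cov(X) A^T, up to sampling error.
print(np.max(np.abs(np.cov(X @ A.T, rowvar=False) - A @ cov_X @ A.T)))
```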

1.2 Multivariate normal distribution - nonsingular case

Recall that the univariate normal distribution with mean $\mu$ and variance $\sigma^2$ has density
\[
f(x) = (2\pi\sigma^2)^{-\frac{1}{2}} \exp\left[-\frac{1}{2}(x - \mu)\sigma^{-2}(x - \mu)\right].
\]

Similarly, the multivariate normal distribution for the special case of a nonsingular covariance matrix is defined as follows.

Definition 1. Let $\mu \in \mathbb{R}^p$ and $\Sigma$ ($p \times p$) $> 0$. A random vector $X \in \mathbb{R}^p$ has the $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if it has probability density function
\[
f(x) = |2\pi\Sigma|^{-\frac{1}{2}} \exp\left[-\frac{1}{2}(x - \mu)^\top \Sigma^{-1}(x - \mu)\right], \tag{1}
\]
for $x \in \mathbb{R}^p$. We use the notation $X \sim N_p(\mu, \Sigma)$.

Theorem 3. If $X \sim N_p(\mu, \Sigma)$ for $\Sigma > 0$, then

1. $Y = \Sigma^{-\frac{1}{2}}(X - \mu) \sim N_p(0, I_p)$,

2. $X =_L \Sigma^{\frac{1}{2}} Y + \mu$ where $Y \sim N_p(0, I_p)$,

3. $E(X) = \mu$ and $\mathrm{Cov}(X) = \Sigma$,

4. for any fixed $v \in \mathbb{R}^p$, $v^\top X$ is univariate normal,

5. $U = (X - \mu)^\top \Sigma^{-1}(X - \mu) \sim \chi^2(p)$.
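Items 2, 3, and 5 suggest a direct recipe for sampling and sanity-checking. A minimal sketch, assuming numpy and scipy are available (the $\mu$ and $\Sigma$ below are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# Item 2: X = Sigma^{1/2} Y + mu with Y ~ N(0, I); here Sigma^{1/2} is
# the symmetric square root from the spectral decomposition.
lam, U = np.linalg.eigh(Sigma)
Sigma_half = U @ np.diag(np.sqrt(lam)) @ U.T

n = 100_000
Y = rng.standard_normal((n, 2))
X = Y @ Sigma_half + mu                    # rows are draws from N_2(mu, Sigma)

# Item 3: sample moments should be close to mu and Sigma.
print(X.mean(axis=0), np.cov(X, rowvar=False), sep="\n")

# Item 5: (X - mu)' Sigma^{-1} (X - mu) should be chi^2(2).
W = np.linalg.solve(Sigma, (X - mu).T).T
U2 = np.einsum('ij,ij->i', X - mu, W)
print(stats.kstest(U2, 'chi2', args=(2,)).statistic)   # near 0
```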

Example 1 (Bivariate normal).

1.2.1 Geometry of multivariate normal

The multivariate normal distribution has location parameter $\mu$ and shape parameter $\Sigma > 0$. In particular, let us look into the contour of equal density,
\[
E_c = \{x \in \mathbb{R}^p : f(x) = c_0\} = \{x \in \mathbb{R}^p : (x - \mu)^\top \Sigma^{-1}(x - \mu) = c^2\}.
\]
Moreover, consider the spectral decomposition $\Sigma = U \Lambda U^\top$, where $U = [u_1, \dots, u_p]$ is orthogonal and $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_p)$ with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p > 0$. The set $E_c$, for any $c > 0$, is an ellipsoid centered at $\mu$ with principal axes $u_i$ whose lengths are proportional to $\sqrt{\lambda_i}$. If $\Sigma = I_p$, the ellipsoid is the surface of a sphere of radius $c$ centered at $\mu$.

As an example, consider a bivariate normal distribution $N_2(0, \Sigma)$ with
\[
\Sigma =
\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}
=
\begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos(\pi/4) & -\sin(\pi/4) \\ \sin(\pi/4) & \cos(\pi/4) \end{pmatrix}^\top.
\]
The location of the distribution is the origin ($\mu = 0$), and the shape ($\Sigma$) of the distribution is determined by the ellipse given by the two principal axes (one along the 45-degree line, the other along the $-45$-degree line). Figure 1 shows the density function and the corresponding $E_c$ for $c = 0.5, 1, 1.5, 2, \dots$.


Figure 1: Bivariate normal density and its contours. Notice that an ellipse in the plane can represent a bivariate normal distribution. In higher dimensions $d > 2$, ellipsoids play a similar role.
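The decomposition above can be reproduced numerically; a short numpy sketch (the contour plotting itself is omitted):

```python
import numpy as np

Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

# Spectral decomposition Sigma = U Lambda U^T; eigh returns eigenvalues
# in ascending order, so reverse to get lambda_1 >= lambda_2.
lam, U = np.linalg.eigh(Sigma)
lam, U = lam[::-1], U[:, ::-1]

print(lam)          # [3. 1.]
print(U[:, 0])      # +/- (1, 1)/sqrt(2): axis along the 45-degree line
print(U[:, 1])      # +/- (1, -1)/sqrt(2): axis along the -45-degree line

# The semi-axes of the contour E_c end at mu + c * sqrt(lambda_i) * u_i
# (here mu = 0).
c = 1.0
for i in range(2):
    print(c * np.sqrt(lam[i]) * U[:, i])
```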

1.3 General multivariate normal distribution

The characteristic function of a random vector $X$ is defined as
\[
\varphi_X(t) = E(e^{i t^\top X}), \quad \text{for } t \in \mathbb{R}^p.
\]

Note that the characteristic function is $\mathbb{C}$-valued and always exists. We collect some important facts.

1. $\varphi_X(t) = \varphi_Y(t)$ for all $t$ if and only if $X =_L Y$.

2. If $X$ and $Y$ are independent, then $\varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t)$.

3. $X_n \Rightarrow X$ if and only if $\varphi_{X_n}(t) \to \varphi_X(t)$ for all $t$.

An important corollary follows from the uniqueness of the characteristic function.

Corollary 4 (Cramér-Wold device). If $X$ is a $p \times 1$ random vector, then its distribution is uniquely determined by the distributions of the linear functions $t^\top X$, for every $t \in \mathbb{R}^p$.

Corollary 4 paves the way to the definition of the (general) multivariate normal distribution.

Definition 2. A random vector $X \in \mathbb{R}^p$ has a multivariate normal distribution if $t^\top X$ is univariate normal for all $t \in \mathbb{R}^p$.

The definition says that $X$ is MVN if every projection of $X$ onto a one-dimensional subspace is normal, with the convention that a degenerate distribution concentrated at $c$ is normal with variance 0, i.e., $c \sim N(c, 0)$. The definition does not require that $\mathrm{Cov}(X)$ be nonsingular.


Theorem 5. The characteristic function of a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma \ge 0$ is, for $t \in \mathbb{R}^p$,
\[
\varphi(t) = \exp\left[i t^\top \mu - \frac{1}{2} t^\top \Sigma t\right].
\]
If $\Sigma > 0$, then the pdf exists and is the same as (1).

In the following, the notation $X \sim N(\mu, \Sigma)$ is valid for a non-negative definite $\Sigma$. However, whenever $\Sigma^{-1}$ appears in a statement, $\Sigma$ is assumed to be positive definite.

Proposition 6. If $X \sim N_p(\mu, \Sigma)$ and $Y = AX + b$ for $A$ ($q \times p$) and $b$ ($q \times 1$), then $Y \sim N_q(A\mu + b, A\Sigma A^\top)$.
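A quick simulation sketch of Proposition 6 (the particular $A$, $b$, $\mu$, and $\Sigma$ below are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.0,  2.0, 1.0]])        # arbitrary 2 x 3 matrix
b = np.array([1.0, -1.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b

# Moments should match A mu + b and A Sigma A^T.
print(Y.mean(axis=0), A @ mu + b, sep="\n")
print(np.cov(Y, rowvar=False), A @ Sigma @ A.T, sep="\n")

# Each coordinate of Y should be univariate normal.
z = (Y[:, 0] - (A @ mu + b)[0]) / np.sqrt((A @ Sigma @ A.T)[0, 0])
print(stats.kstest(z, 'norm').statistic)   # near 0
```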

The next two results concern independence and conditional distributions of normal random vectors. Let $X_1$ and $X_2$ be a partition of $X$ with dimensions $r$ and $s$, $r + s = p$, and suppose $\mu$ and $\Sigma$ are partitioned accordingly. That is,
\[
X = \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N_p\left( \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \right).
\]

Proposition 7. The normal random vectors $X_1$ and $X_2$ are independent if and only if $\mathrm{Cov}(X_1, X_2) = \Sigma_{12} = 0$.

Proposition 8. The conditional distribution of $X_1$ given $X_2 = x_2$ is
\[
N_r\left(\mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}\right).
\]

Proof. Consider the new random vectors $X_1^* = X_1 - \Sigma_{12}\Sigma_{22}^{-1}X_2$ and $X_2^* = X_2$,
\[
X^* = \begin{pmatrix} X_1^* \\ X_2^* \end{pmatrix} = AX, \qquad A = \begin{pmatrix} I_r & -\Sigma_{12}\Sigma_{22}^{-1} \\ 0_{(s \times r)} & I_s \end{pmatrix}.
\]
By Proposition 6, $X^*$ is multivariate normal. An inspection of the covariance matrix of $X^*$ shows that $X_1^*$ and $X_2^*$ are independent. The result follows by writing $X_1 = X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2$, so that the distribution (law) of $X_1$ given $X_2 = x_2$ is
\[
\mathcal{L}(X_1 \mid X_2 = x_2) = \mathcal{L}(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}X_2 \mid X_2 = x_2) = \mathcal{L}(X_1^* + \Sigma_{12}\Sigma_{22}^{-1}x_2 \mid X_2 = x_2),
\]
which is an MVN of dimension $r$; its mean is $E(X_1^*) + \Sigma_{12}\Sigma_{22}^{-1}x_2 = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2)$ and its covariance is $\mathrm{Cov}(X_1^*) = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.
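Proposition 8's formulas are straightforward to compute; a minimal numpy sketch (the partition sizes, $\mu$, $\Sigma$, and $x_2$ below are arbitrary illustrations):

```python
import numpy as np

# Partitioned parameters: X = (X1, X2) with r = 2, s = 2.
mu = np.array([0.0, 1.0, -1.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.3, 0.0],
                  [0.5, 1.5, 0.2, 0.4],
                  [0.3, 0.2, 1.0, 0.1],
                  [0.0, 0.4, 0.1, 2.5]])
r = 2

mu1, mu2 = mu[:r], mu[r:]
S11, S12 = Sigma[:r, :r], Sigma[:r, r:]
S21, S22 = Sigma[r:, :r], Sigma[r:, r:]

x2 = np.array([0.5, 1.0])               # conditioning value for X2

# Conditional mean and covariance of X1 | X2 = x2 (Proposition 8).
cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean)
print(cond_cov)
```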


1.4 Multivariate Central Limit Theorem

If $X_1, X_2, \dots \in \mathbb{R}^p$ are i.i.d. with $E(X_i) = \mu$ and $\mathrm{Cov}(X_i) = \Sigma$, then
\[
n^{-\frac{1}{2}} \sum_{j=1}^n (X_j - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]
or equivalently,
\[
n^{\frac{1}{2}} (\bar{X}_n - \mu) \Rightarrow N_p(0, \Sigma) \quad \text{as } n \to \infty,
\]
where $\bar{X}_n = \frac{1}{n}\sum_{j=1}^n X_j$.
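A small Monte Carlo sketch of the theorem, using deliberately non-normal coordinates (the construction from correlated exponentials is an arbitrary choice):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Non-normal i.i.d. vectors: (E1, E1 + E2) with E1, E2 ~ Exp(1),
# so mu = (1, 2) and Sigma = [[1, 1], [1, 2]].
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 1.0],
                  [1.0, 2.0]])

p, n, reps = 2, 200, 5_000
E = rng.exponential(1.0, size=(reps, n, 2))
X = np.stack([E[:, :, 0], E[:, :, 0] + E[:, :, 1]], axis=-1)

# sqrt(n) (X-bar_n - mu) should be approximately N_p(0, Sigma).
Z = np.sqrt(n) * (X.mean(axis=1) - mu)
print(np.cov(Z, rowvar=False))           # approx Sigma

# Mahalanobis distances of Z should be approximately chi^2(p).
D = np.einsum('ij,ij->i', Z, np.linalg.solve(Sigma, Z.T).T)
print(stats.kstest(D, 'chi2', args=(p,)).statistic)   # small
```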

The delta method can be used to establish the asymptotic normality of $h(\bar{X}_n)$ for a smooth function $h : \mathbb{R}^p \to \mathbb{R}$. In particular, denote by $\nabla h(x)$ the gradient of $h$ at $x$. Using the first two terms of the Taylor series,
\[
h(\bar{X}_n) = h(\mu) + (\nabla h(\mu))^\top (\bar{X}_n - \mu) + O_p(\|\bar{X}_n - \mu\|_2^2).
\]
Then Slutsky's theorem gives the result:
\[
\sqrt{n}\,(h(\bar{X}_n) - h(\mu)) = (\nabla h(\mu))^\top \sqrt{n}\,(\bar{X}_n - \mu) + O_p(\sqrt{n}\,(\bar{X}_n - \mu)^\top (\bar{X}_n - \mu)) \Rightarrow N(0, (\nabla h(\mu))^\top \Sigma\, \nabla h(\mu)) \quad \text{as } n \to \infty,
\]
since the remainder is $O_p(n^{-1/2})$ and therefore vanishes, while the leading term converges in distribution to $(\nabla h(\mu))^\top N_p(0, \Sigma)$.
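For concreteness, a minimal simulation sketch with the (arbitrarily chosen) function $h(x) = x_1/x_2$, comparing the simulated variance of $\sqrt{n}(h(\bar{X}_n) - h(\mu))$ to the delta-method value:

```python
import numpy as np

rng = np.random.default_rng(4)

mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])

# h(x) = x1 / x2, with gradient (1/x2, -x1/x2^2) evaluated at mu.
h = lambda x: x[..., 0] / x[..., 1]
grad_h = np.array([1.0 / mu[1], -mu[0] / mu[1] ** 2])

n, reps = 400, 10_000
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))
xbar = X.mean(axis=1)                       # shape (reps, p)

# Simulated vs. asymptotic variance of sqrt(n)(h(xbar) - h(mu)).
print(np.var(np.sqrt(n) * (h(xbar) - h(mu))))
print(grad_h @ Sigma @ grad_h)
```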

1.5 Quadratic forms in normal random vectors

Let $X \sim N_p(\mu, \Sigma)$. A quadratic form in $X$ is a random variable of the form
\[
Y = X^\top A X = \sum_{i=1}^p \sum_{j=1}^p X_i a_{ij} X_j,
\]
where $A$ is a $p \times p$ symmetric matrix and $X_i$ is the $i$th element of $X$. We are interested in the distribution of quadratic forms and the conditions under which two quadratic forms are independent.

Example 2. A special case: if $X \sim N_p(0, I_p)$ and $A = I_p$,
\[
Y = X^\top A X = X^\top X = \sum_{i=1}^p X_i^2 \sim \chi^2(p).
\]

Fact 1. Recall the following:

1. A $p \times p$ matrix $A$ is idempotent if $A^2 = A$.

2. If $A$ is symmetric, then $A = \Gamma \Lambda \Gamma^\top$, where $\Lambda = \mathrm{diag}(\lambda_i)$ and $\Gamma$ is orthogonal.

3. If $A$ is symmetric and idempotent, then its eigenvalues are all either 0 or 1.
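These facts lead to the standard result (not restated above) that if $X \sim N_p(0, I_p)$ and $A$ is symmetric idempotent with $\mathrm{rank}(A) = r$, then $X^\top A X \sim \chi^2(r)$. A quick simulation sketch, where the projection matrix $A$ is an arbitrary construction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

p, r = 5, 3
# A symmetric idempotent A of rank r: the orthogonal projection onto
# the column space of a random p x r matrix B.
B = rng.standard_normal((p, r))
A = B @ np.linalg.solve(B.T @ B, B.T)

print(np.allclose(A @ A, A), np.allclose(A, A.T))   # idempotent, symmetric
print(np.round(np.linalg.eigvalsh(A), 8))           # eigenvalues are 0 or 1

# For X ~ N_p(0, I_p), the quadratic form X'AX should be chi^2(r).
X = rng.standard_normal((100_000, p))
Y = np.einsum('ij,jk,ik->i', X, A, X)
print(stats.kstest(Y, 'chi2', args=(r,)).statistic)  # small
```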

