Estimating an ARMA Process

Statistics 910, #12

Overview

1. Main ideas
2. Fitting autoregressions
3. Fitting with moving average components
4. Standard errors
5. Examples
6. Appendix: Simple estimators for autoregressions

Main ideas

Efficiency Maximum likelihood is nice if you know the right distribution. For time series, it serves mostly as motivation for least squares. If the data are Gaussian, then ML is efficient; if the data are not Gaussian, then ML amounts to a complicated weighted least squares.

Gaussian log-likelihood for the stationary process {Xt} that generates X = (X1, . . . , Xn) is (minus twice the log of the likelihood)

    -2 ℓ(μ, φ, θ, σ²) = n log 2π + log |Γn| + (X - μ)′ Γn^{-1} (X - μ) .      (1)

Think of the covariances γ(k) as functions of the parameters of the process,

    γ(k) = σ² g(k; φ, θ) .      (2)

To find the maximum likelihood estimates of μ, φ, and θ for an ARMA(p, q) process is "simply a numerical minimization" of the negative log likelihood.

"All you need to do" is express the covariances in (1) as functions of the unknown parameters. For example, for the AR(1) process Xt = φ1 X_{t-1} + wt with μ = 0 (given), γ(0) = σ²/(1 - φ1²), and γ(h) = φ1^{|h|} γ(0).
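To make "simply a numerical minimization" concrete, here is a small sketch in Python (not part of the original notes; it assumes numpy and scipy are available, and the function and variable names are ours) that builds Γn from the AR(1) covariances just given and minimizes (1) with μ = 0:

import numpy as np
from scipy.optimize import minimize

def neg2_loglik(params, x):
    """Minus twice the Gaussian log-likelihood (1) for an AR(1) with mu = 0."""
    phi, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)                       # keep sigma^2 > 0
    n = len(x)
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Gamma = sigma2 * phi**lags / (1.0 - phi**2)       # gamma(h) = sigma^2 phi^|h| / (1 - phi^2)
    _, logdet = np.linalg.slogdet(Gamma)
    quad = x @ np.linalg.solve(Gamma, x)
    return n * np.log(2 * np.pi) + logdet + quad

# Simulate an AR(1) with phi = 0.6, sigma = 1, then minimize numerically.
rng = np.random.default_rng(0)
n, phi_true = 200, 0.6
x = np.zeros(n)
x[0] = rng.normal(0, 1 / np.sqrt(1 - phi_true**2))    # stationary start
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

fit = minimize(neg2_loglik, x0=np.array([0.0, 0.0]), args=(x,),
               bounds=[(-0.99, 0.99), (None, None)])
phi_hat, sigma2_hat = fit.x[0], np.exp(fit.x[1])

Parameterizing σ² through its logarithm keeps the variance positive without constraints; the numerical minimum should land near the simulated φ = 0.6.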


Recursion The models we consider are causal, with time "flowing" in one direction. Hence, it is useful to decompose the joint distribution of X in the log-likelihood (1) as a sequence of one-sided conditional distributions:

    f(x1, . . . , xn) = f(x1) f(x2 | x1) f(x3 | x1, x2) · · · f(xn | x1, . . . , x_{n-1}) .

MLEs for AR(1) It's useful to solve for the MLE in closed form for the simplest of models. Using the factorization above, the log-likelihood simplifies for a Gaussian AR(1) process:

    ℓ(φ, σ²) = log f(X1) f(X2 | X1) · · · f(Xn | X_{n-1})

             = log N(0, σ²/(1 - φ²)) f(X2 | X1) · · · f(Xn | X_{n-1})

             = - (n/2) log(2πσ²) + (1/2) log(1 - φ²) - SS/(2σ²) ,

where  SS = X1²(1 - φ²) + Σ_{t=2}^n (Xt - φ X_{t-1})² .

The derivatives that give the MLE's for σ² and φ are:

    ∂ℓ/∂σ² = - n/(2σ²) + SS/(2σ⁴)

and

    ∂ℓ/∂φ = [ φ X1² + Σ_{t=2}^n (Xt - φ X_{t-1}) X_{t-1} ] / σ² - φ/(1 - φ²) ,

where SS denotes the sum of squares from the exponential term in the likelihood. Setting these to zero and solving gives:

    σ̂² = SS/n    and    φ̂ ( σ̂²/(1 - φ̂²) - X1² ) = Σ_{t=2}^n (Xt - φ̂ X_{t-1}) X_{t-1} .

Since the l.h.s. of this expression has approximate expected value zero (note the expression for the variance of Xt: E X1² = γ(0) = σ²/(1 - φ²)), the MLE can be seen to be quite similar to the usual LS estimator, which sets the sum on the r.h.s. exactly to zero.
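As a quick check on these equations, here is a sketch (again not from the notes; names are illustrative) that profiles σ² out of the AR(1) log-likelihood via σ̂² = SS/n, maximizes over φ numerically, and compares the answer with the least-squares slope:

import numpy as np
from scipy.optimize import minimize_scalar

def profile_neg2_loglik(phi, x):
    """-2 log-likelihood with sigma^2 profiled out as SS/n (mu = 0)."""
    n = len(x)
    ss = x[0] ** 2 * (1 - phi ** 2) + np.sum((x[1:] - phi * x[:-1]) ** 2)   # SS
    sigma2 = ss / n
    return n * np.log(2 * np.pi * sigma2) - np.log(1 - phi ** 2) + ss / sigma2

rng = np.random.default_rng(1)
n, phi_true = 200, 0.6
x = np.zeros(n)
x[0] = rng.normal(0, 1 / np.sqrt(1 - phi_true ** 2))     # stationary start
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

phi_mle = minimize_scalar(profile_neg2_loglik, bounds=(-0.99, 0.99),
                          args=(x,), method="bounded").x
phi_ls = np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)    # least-squares slope
# phi_mle and phi_ls typically agree to two or three decimals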


Initial values Again thinking recursively, the likelihood looks a lot like that in the normal linear model if we only knew enough to get started. If we condition on X1, X2, . . . , Xp and w1, w2, . . . , wq (i.e., assume we know these), then

    -2 log f(x_{p+1}, . . . , xn | x1, . . . , xp, w1, . . . , wq) = c + (n - p) log σ² + (1/σ²) Σ_{t=p+1}^n wt² ,

where

    wt = xt - φ1 x_{t-1} - · · · - φp x_{t-p} - θ1 w_{t-1} - · · · - θq w_{t-q} ,

by the change of variables from xt to wt, as in the normal linear model. This expression becomes amenable to least squares. This approach is called "conditional least squares" in R.
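A minimal sketch of this conditional sum of squares in Python (our own illustration; the function name and the choice to set the needed pre-sample w's to zero are assumptions, not a description of R's internals):

import numpy as np

def conditional_ss(x, phi, theta, mu=0.0):
    """Sum of squared w_t, t = p+1, ..., n, conditioning on the first p x's
    and setting the needed pre-sample w's to zero."""
    p, q = len(phi), len(theta)
    xc = np.asarray(x, dtype=float) - mu
    n = len(xc)
    w = np.zeros(n)
    for t in range(p, n):
        ar = sum(phi[j] * xc[t - 1 - j] for j in range(p))
        ma = sum(theta[j] * w[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        w[t] = xc[t] - ar - ma
    return np.sum(w[p:] ** 2)

# Minimizing conditional_ss over (phi, theta, mu) gives the conditional LS fit;
# sigma^2 is then estimated by conditional_ss / (n - p).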

What does this leave out? Conditioning on the initial values does not leave out too much, really. We still

• Need values for w1, . . . , wq, and

• Would like to gain further information from X1, . . . , Xp.

Autoregressions

Why start with autoregressions? Several reasons:

• These often fit quite well (don't need the MA terms) because we know that we can approximate Xt ≈ Σ_j πj X_{t-j}. This 'Markovian' approach has become more popular because of its speed.

• Estimation is fast (MLEs require some iteration).

• Avoid estimation of initial wt's.

• Sampling properties are well-known (essentially those of the normal linear model with stochastic explanatory variables).

Backwards Even if we don't want the AR model itself, these are often used to estimate the initial errors w1, w2, . . . , wq. By fitting an autoregression backwards in time, we can use the fit to estimate, say,

    ŵt(m) = Xt - Σ_{j=1}^m φ̂_j X_{t+j}

(if we assume normality, the process is reversible).
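A rough sketch of this backward fit (plain least squares of Xt on the m future values; the function name and the choice of m are ours):

import numpy as np

def backward_ar_residuals(x, m):
    """Regress X_t on X_{t+1}, ..., X_{t+m} and return (coefficients, residuals);
    the residuals play the role of w-hat_t(m) for the early time points."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = x[:n - m]                                                   # responses X_t
    Z = np.column_stack([x[j:n - m + j] for j in range(1, m + 1)])  # X_{t+1}, ..., X_{t+m}
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef                                            # w-hat_t(m)
    return coef, resid

# e.g., use the first q residuals as starting values w-hat_1, ..., w-hat_q.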


MLE for autoregression In the AR(p) case,

    f(x1, . . . , xn) = f(x1, x2, . . . , xp) f(x_{p+1} | x1, . . . , xp) · · · f(xn | x_{n-p}, . . . , x_{n-1})

                      = f(x1, x2, . . . , xp) (2πσ²)^{-(n-p)/2} exp( - Σ_{t=p+1}^n wt² / (2σ²) ) ,

where wt = (xt - μ) - Σ_j φj (x_{t-j} - μ) for t = p + 1, . . . , n. The first factor, f(x1, . . . , xp), is the "messy" part; the product of conditionals is the "simple" part. But for the initial terms (the "messy" part), we have the same likelihood as in the normal linear model, and the MLEs are those that we would get from the least squares regression of xt on x_{t-1}, . . . , x_{t-p} and a constant. (If we call the constant in that regression φ0, then μ̂ = φ̂0/(1 - φ̂1 - · · · - φ̂p).) But for ignoring the contribution from x1, . . . , xp, least squares matches maximum likelihood in the AR(p) case. Hence, maximum likelihood cannot improve the estimates much unless p is large relative to n.
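A short sketch of that least-squares regression (the helper name and the use of numpy's lstsq are our choices for illustration), including the recovery of μ̂ from the fitted constant:

import numpy as np

def fit_ar_ols(x, p):
    """Regress x_t on a constant and p lags; recover mu-hat from the constant."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = x[p:]
    Z = np.column_stack([np.ones(n - p)] +
                        [x[p - j:n - j] for j in range(1, p + 1)])   # lags 1..p
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    phi0, phi = coef[0], coef[1:]
    resid = y - Z @ coef
    sigma2_hat = np.sum(resid ** 2) / (n - p)
    mu_hat = phi0 / (1 - np.sum(phi))     # mu-hat = phi_0 / (1 - phi_1 - ... - phi_p)
    return phi, mu_hat, sigma2_hat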

Recursion = triangular factorization A recursion captures the full likelihood. For an AR(p) model with coefficients φp = (φ_{p1}, φ_{p2}, . . . , φ_{pp}), express the lower-order coefficients as functions of φp (e.g., find γ(0) and φ_{11} = Corr(Xt, X_{t-1}) in terms of φp). If we can do that, it is simple to model

    f(x1, . . . , xp) = f(x1) f(x2 | x1) · · · f(xp | x1, . . . , x_{p-1}) .

The prior AR(1) example shows this.

In general, use the Levinson recursion to obtain a triangular decomposition of the covariance matrix Γn. This is done by converting the correlated variables X1, . . . , Xn into a collection, say U1, U2, . . . , Un, of uncorrelated variables. One has many ways of doing this, such as the Gram-Schmidt or Cholesky factorization. In the following, let P_{1···j} denote the projection onto the random variables X1, . . . , Xj (as in fitting a regression).

Following the Cholesky factorization, construct

    U1 = X1
    U2 = X2 - P1 X2 = X2 - φ_{1,1} X1
    U3 = X3 - P_{12} X3 = X3 - φ_{2,2} X1 - φ_{2,1} X2
    U4 = X4 - P_{123} X4 = X4 - φ_{3,3} X1 - φ_{3,2} X2 - φ_{3,1} X3
      ...
    Uj = Xj - Σ_{k=1}^{j-1} φ_{j-1,k} X_{j-k}

(This sequence of projections differs from those used in the numerically superior modified Gram-Schmidt method. GS sweeps X1 from all of the others first, filling the first column of L rather than recursively one row at a time.) Let L denote a lower triangular matrix that begins

        ⎡    1        0        0        0    0   · · · ⎤
        ⎢ -φ_{11}     1        0        0    0   · · · ⎥
    L = ⎢ -φ_{22}  -φ_{21}     1        0    0   · · · ⎥                (3)
        ⎢ -φ_{33}  -φ_{32}  -φ_{31}     1    0   · · · ⎥
        ⎣    ⋮        ⋮        ⋮                  ⋱    ⎦

with diagonal elements L_{kk} = 1 and off-diagonal elements L_{kj} = -φ_{k-1,k-j}, j = 1, . . . , k - 1. Since the Uj are uncorrelated, we have

    Dn = Var(U) = Var(LX) = L Γn L′ ,   equivalently   Γn^{-1} = L′ Dn^{-1} L ,      (4)

where Dn is the diagonal matrix with the conditional variances

    σ_k² = Var( Xt - Σ_{j=1}^k φ_{kj} X_{t-j} )

along the diagonal.

Comments

• It follows that σ² = lim_n |Γ_{n+1}| / |Γn|. (Where have we seen this ratio of determinants before? Toeplitz matrices, Szegő's theorem, and Levinson's recursion itself.)

• If the process is indeed AR(p), the lower triangular matrix L is banded, with p subdiagonal stripes: the element L_{kj} = 0 for j < k - p.
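To tie these pieces together, here is a sketch (our own illustration, using AR(1) autocovariances with φ = 0.6 and σ² = 1 as a test case) of the Levinson recursion producing the φ_{k,j} and σ_k², building L as in (3), and checking that L Γn L′ is the diagonal matrix Dn; because the test process is AR(1), L has a single subdiagonal stripe, illustrating the last comment above:

import numpy as np

def levinson(gamma):
    """Levinson recursion: gamma[0..m] -> lists of coefficient vectors
    phi_k = (phi_{k,1}, ..., phi_{k,k}) and prediction variances sigma_k^2."""
    m = len(gamma) - 1
    phi = [np.array([gamma[1] / gamma[0]])]
    sig2 = [gamma[0] * (1 - phi[0][0] ** 2)]
    for k in range(2, m + 1):
        prev = phi[-1]
        refl = (gamma[k] - prev @ gamma[k - 1:0:-1]) / sig2[-1]   # phi_{k,k}
        phi.append(np.concatenate([prev - refl * prev[::-1], [refl]]))
        sig2.append(sig2[-1] * (1 - refl ** 2))
    return phi, sig2

# AR(1) test case: gamma(h) = sigma^2 phi^|h| / (1 - phi^2)
n, phi1, s2 = 6, 0.6, 1.0
gamma = s2 * phi1 ** np.arange(n) / (1 - phi1 ** 2)
coefs, sig2 = levinson(gamma)

Gamma = s2 * phi1 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n))) / (1 - phi1 ** 2)
L = np.eye(n)
for k in range(1, n):                     # row k uses the order-k coefficients
    L[k, :k] = -coefs[k - 1][::-1]        # L_{kj} = -phi_{k, k-j} (0-indexed rows)
D = L @ Gamma @ L.T
print(np.round(np.diag(D), 6))            # gamma(0), sigma_1^2, sigma_2^2, ...
print(np.round(sig2, 6))                  # matches the recursion's variances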
