Statistics 910, #12

Estimating an ARMA Process

Overview

1. Main ideas
2. Fitting autoregressions
3. Fitting with moving average components
4. Standard errors
5. Examples
6. Appendix: Simple estimators for autoregressions
Main ideas
Efficiency Maximum likelihood is nice if you know the right distribution; for time series, it is mostly motivation for least squares. If the data are Gaussian, then ML is efficient; if the data are not Gaussian, then ML amounts to a complicated weighted least squares.
The Gaussian log-likelihood for the stationary process {X_t} that generates X = (X_1, \ldots, X_n)' is (minus twice the log of the likelihood)

    -2\,\ell(\mu, \phi, \theta, \sigma^2) = n \log 2\pi + \log |\Gamma_n| + (X - \mu)' \Gamma_n^{-1} (X - \mu) .    (1)

Think of the covariances \gamma(k) as functions of the parameters of the process,

    \gamma(k) = \sigma^2 g(k; \phi, \theta) .    (2)
To find the maximum likelihood estimates of \mu, \phi, and \theta for an ARMA(p, q) process is "simply a numerical minimization" of the negative log-likelihood. "All you need to do" is express the covariances in (1) as functions of the unknown parameters. For example, for the AR(1) process X_t = \phi_1 X_{t-1} + w_t with \mu = 0 (given), \gamma(0) = \sigma^2/(1 - \phi_1^2) and \gamma(h) = \phi_1^{|h|} \gamma(0).
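As a concrete check on (1), here is a minimal sketch (function names are mine) that builds the Toeplitz covariance matrix of an AR(1) from \gamma(h) = \phi^{|h|} \sigma^2/(1-\phi^2) and evaluates minus twice the log-likelihood directly:

```python
import numpy as np

def ar1_gamma(phi, sigma2, max_lag):
    """Autocovariances gamma(0..max_lag) of an AR(1):
    gamma(h) = phi^|h| * sigma2 / (1 - phi^2)."""
    g0 = sigma2 / (1.0 - phi**2)
    return g0 * phi ** np.arange(max_lag + 1)

def neg2_loglik(x, mu, phi, sigma2):
    """-2 log-likelihood (1) for a Gaussian AR(1), computed from the
    full n-by-n Toeplitz covariance matrix Gamma_n."""
    x = np.asarray(x, float)
    n = len(x)
    g = ar1_gamma(phi, sigma2, n - 1)
    # Gamma_n[i, j] = gamma(|i - j|)
    Gamma = g[np.abs(np.subtract.outer(np.arange(n), np.arange(n)))]
    e = x - mu
    _, logdet = np.linalg.slogdet(Gamma)
    return n * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(Gamma, e)
```

For an AR(1) this matrix expression agrees exactly with the factorized form derived below, n log(2\pi\sigma^2) - log(1-\phi^2) + SS/\sigma^2, which is a useful sanity check.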
Recursion The models we consider are causal, with time "flowing" in one direction. Hence, it is useful to decompose the joint distribution of X in the log-likelihood (1) as a sequence of one-sided conditional distributions:

    f(x_1, \ldots, x_n) = f(x_1) f(x_2|x_1) f(x_3|x_1, x_2) \cdots f(x_n|x_1, \ldots, x_{n-1}) .
MLEs for AR(1) It's useful to solve for the MLE in closed form for the simplest of models. The log-likelihood simplifies for a Gaussian AR(1) process:

    \ell(\phi, \sigma^2) = \log f(X_1) f(X_2|X_1) \cdots f(X_n|X_{n-1})
                         = \log N\left(0, \frac{\sigma^2}{1-\phi^2}\right) f(X_2|X_1) \cdots f(X_n|X_{n-1})
                         = -\frac{n}{2} \log(2\pi\sigma^2) + \frac{1}{2} \log(1-\phi^2)
                           - \underbrace{\left[ X_1^2 (1-\phi^2) + \sum_{t=2}^n (X_t - \phi X_{t-1})^2 \right]}_{SS} \Big/ (2\sigma^2) .
The derivatives that give the MLEs for \sigma^2 and \phi are:

    \frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{SS}{2\sigma^4}
and
    \frac{\partial \ell}{\partial \phi} = \frac{\phi X_1^2 + \sum_{t=2}^n (X_t - \phi X_{t-1}) X_{t-1}}{\sigma^2} - \frac{\phi}{1-\phi^2} ,

where SS denotes the sum of squares from the exponential term in the likelihood. Setting these to zero and solving gives:
    \hat\sigma^2 = SS/n
and
    \hat\phi \left( \frac{\hat\sigma^2}{1-\hat\phi^2} - X_1^2 \right) = \sum_{t=2}^n (X_t - \hat\phi X_{t-1}) X_{t-1} .

Since the left-hand side of this expression has approximate expected value zero (note that \sigma^2/(1-\phi^2) is the variance of X_t), the MLE can be seen to be quite similar to the usual LS estimator.
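To see this numerically, the sketch below (my own function names) profiles \sigma^2 = SS/n out of the log-likelihood, maximizes over \phi by a simple grid search, and compares the result with the least squares estimator:

```python
import numpy as np

def ar1_profile_neg2ll(x, phi):
    """Profile -2 log-likelihood of a Gaussian AR(1), after substituting
    the conditional maximizer sigma^2 = SS(phi)/n."""
    n = len(x)
    SS = x[0]**2 * (1 - phi**2) + np.sum((x[1:] - phi * x[:-1])**2)
    return n * np.log(2 * np.pi * SS / n) + n - np.log(1 - phi**2)

def ar1_mle(x):
    """Exact Gaussian MLE of (phi, sigma^2) by grid search over (-1, 1)."""
    grid = np.linspace(-0.999, 0.999, 9999)
    vals = [ar1_profile_neg2ll(x, p) for p in grid]
    phi_hat = grid[int(np.argmin(vals))]
    SS = x[0]**2 * (1 - phi_hat**2) + np.sum((x[1:] - phi_hat * x[:-1])**2)
    return phi_hat, SS / len(x)

def ar1_ls(x):
    """Least squares (conditional) estimator: regress x_t on x_{t-1}."""
    return np.sum(x[1:] * x[:-1]) / np.sum(x[:-1]**2)
```

On a simulated series of moderate length the two estimates of \phi differ only by a term of order 1/n, as the score equation above suggests.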
Initial values Again thinking recursively, the likelihood looks a lot like that in the normal linear model if we only knew enough to get started. If we condition on X_1, X_2, \ldots, X_p and w_1, w_2, \ldots, w_q (i.e., assume we know these), then

    -2 \log f(x_{p+1}^n \mid x_1^p, w_1^q) = c + (n-p) \log \sigma^2
        + \frac{1}{\sigma^2} \sum_{t=p+1}^{n} \underbrace{(x_t - \phi_1 x_{t-1} - \cdots - \phi_p x_{t-p} - \theta_1 w_{t-1} - \cdots - \theta_q w_{t-q})}_{w_t}{}^2

by the change of variables from x_t to w_t, as in the normal linear model. This expression becomes amenable to least squares. This approach is called "conditional least squares" in R.
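A minimal sketch of this conditional sum of squares (the function name is mine): given \phi and \theta, build the w_t recursively, conditioning on w_t = 0 for t \le q and on the first p observations, and sum their squares.

```python
import numpy as np

def css(x, phi, theta, mu=0.0):
    """Conditional sum of squares for an ARMA(p, q): recursively recover
    the innovations w_t, treating the initial w's as zero and conditioning
    on the first p observations."""
    p, q = len(phi), len(theta)
    xc = np.asarray(x, float) - mu
    w = np.zeros(len(xc))           # w_t = 0 for the conditioned-on start-up terms
    for t in range(p, len(xc)):
        ar = sum(phi[j] * xc[t - 1 - j] for j in range(p))
        ma = sum(theta[j] * w[t - 1 - j] for j in range(q))
        w[t] = xc[t] - ar - ma
    return np.sum(w[p:] ** 2)
```

Minimizing this over (\phi, \theta, \mu) is nonlinear in \theta because each w_t feeds into later ones, which is why MA components require iteration while pure autoregressions reduce to linear least squares.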
What does this leave out? Conditioning on the initial values does not leave out too much, really. We still

? Need values for w_1, \ldots, w_q, and
? Would like to gain further information from X_1, \ldots, X_p.
Autoregressions
Why start with autoregressions? Several reasons:

? These often fit quite well (we may not need the MA terms) because we know that we can approximate X_t \approx \sum_j \phi_j X_{t-j}. This 'Markovian' approach has become more popular because of its speed.
? Estimation is fast, requiring no iteration (MLEs for models with MA components require some iteration).
? We avoid estimating the initial w_t's.
? Sampling properties are well-known (essentially those of the normal linear model with stochastic explanatory variables).
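The least squares fit amounts to one regression of x_t on a constant and p lags. A sketch (function name is mine), which also recovers \hat\mu from the fitted constant via \hat\mu = \hat\theta_0/(1 - \hat\phi_1 - \cdots - \hat\phi_p):

```python
import numpy as np

def fit_ar_ols(x, p):
    """Fit an AR(p) by least squares: regress x_t on (1, x_{t-1}, ..., x_{t-p}),
    t = p+1, ..., n.  Returns (mu_hat, phi_hat)."""
    x = np.asarray(x, float)
    n = len(x)
    # column j holds x_{t-1-j}; rows run over t = p, ..., n-1 (0-indexed)
    X = np.column_stack([np.ones(n - p)] +
                        [x[p - 1 - j : n - 1 - j] for j in range(p)])
    y = x[p:]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    theta0, phi = beta[0], beta[1:]
    mu_hat = theta0 / (1.0 - phi.sum())   # mu = theta0 / (1 - phi_1 - ... - phi_p)
    return mu_hat, phi
```

No iteration is needed: one `lstsq` call does the whole fit, which is the speed advantage cited above.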
Backwards Even if we don't want the AR model itself, autoregressions are often used to estimate the initial errors w_1, w_2, \ldots, w_q. By fitting an autoregression backwards in time, we can use the fit to estimate, say,

    \hat{w}_t(m) = X_t - \sum_{j=1}^m \hat\phi_j X_{t+j}

(if we assume normality, the process is reversible).
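A sketch of this backcasting idea (function name is mine; the mean is assumed zero): reversing the series turns the backward fit into an ordinary forward least squares fit, and the backward residuals estimate the initial errors.

```python
import numpy as np

def backcast_errors(x, m, q):
    """Estimate w_1, ..., w_q by fitting an order-m autoregression backwards
    in time: w_hat_t = x_t - sum_j phihat_j x_{t+j}.  Under Gaussianity the
    process is reversible, so the backward AR shares the forward coefficients.
    Assumes a mean-zero series."""
    x = np.asarray(x, float)
    xr = x[::-1]                     # reverse time: backward fit = forward fit on xr
    X = np.column_stack([xr[m - 1 - j : len(xr) - 1 - j] for j in range(m)])
    y = xr[m:]
    phi = np.linalg.lstsq(X, y, rcond=None)[0]
    # residuals at the *start* of the original series, t = 1, ..., q (0-indexed)
    w_hat = np.array([x[t] - phi @ x[t + 1 : t + 1 + m] for t in range(q)])
    return w_hat, phi
```

These \hat{w}_t(m) can then seed the conditional least squares recursion above in place of the cruder choice w_t = 0.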
MLE for autoregression In the AR(p) case,

    f(x_1, \ldots, x_n) = \underbrace{f(x_1, x_2, \ldots, x_p)}_{\text{messy}} \, \underbrace{f(x_{p+1} | x_1^p) \cdots f(x_n | x_{n-p}^{n-1})}_{\text{simple}}
                        = f(x_1, x_2, \ldots, x_p) \, (2\pi\sigma^2)^{-(n-p)/2} e^{-\frac{1}{2} \sum_{p+1}^{n} w_t^2 / \sigma^2} ,

where w_t = (x_t - \mu) - \sum_j \phi_j (x_{t-j} - \mu) for t = p+1, \ldots, n. Except for the initial terms (the "messy" part), we have the same likelihood as in the normal linear model, and the MLEs are those that we would get from the least squares regression of x_t on x_{t-1}, \ldots, x_{t-p} and a constant. (If we call the constant in that regression \theta_0, then \hat\mu = \hat\theta_0/(1 - \hat\phi_1 - \cdots - \hat\phi_p).) But for ignoring the contribution from x_1, \ldots, x_p, least squares matches maximum likelihood in the AR(p) case. Hence, maximum likelihood cannot improve the estimates much unless p is large relative to n.
Recursion = triangular factorization A recursion captures the full likelihood. For an AR(p) model with coefficients \phi_p = (\phi_{p1}, \phi_{p2}, \ldots, \phi_{pp}), express the lower-order coefficients as functions of \phi_p (e.g., find \gamma(0) and \phi_{11} = Corr(X_t, X_{t-1}) in terms of \phi_p). If we can do that, it is simple to model

    f(x_1, \ldots, x_p) = f(x_1) f(x_2|x_1) \cdots f(x_p|x_1^{p-1}) .

The prior AR(1) example shows this.
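A sketch of this reduction (function names are mine): first solve the Yule-Walker equations to get \gamma(0), \ldots, \gamma(p) from \phi_p and \sigma^2, then run the Levinson-Durbin recursion on those autocovariances to recover all the lower-order coefficients \phi_{k} = (\phi_{k1}, \ldots, \phi_{kk}) and innovation variances \sigma_k^2.

```python
import numpy as np

def ar_autocov(phi, sigma2, max_lag):
    """Autocovariances gamma(0..max_lag) of an AR(p) with coefficients phi,
    obtained by solving the Yule-Walker system
    gamma(k) - sum_j phi_j gamma(|k-j|) = sigma2 * 1{k=0}, k = 0..p."""
    p = len(phi)
    A = np.zeros((p + 1, p + 1))
    b = np.zeros(p + 1); b[0] = sigma2
    for k in range(p + 1):
        A[k, k] += 1.0
        for j in range(1, p + 1):
            A[k, abs(k - j)] -= phi[j - 1]
    g = list(np.linalg.solve(A, b))
    for k in range(p + 1, max_lag + 1):          # extend recursively if needed
        g.append(sum(phi[j] * g[k - 1 - j] for j in range(p)))
    return np.array(g[: max_lag + 1])

def levinson(gamma, pmax):
    """Levinson-Durbin recursion: coefficients phi[k] = (phi_{k1},...,phi_{kk})
    and innovation variances sig[k] = sigma_k^2 for k = 1..pmax."""
    phi = {1: np.array([gamma[1] / gamma[0]])}
    sig = {1: gamma[0] * (1 - phi[1][0] ** 2)}
    for k in range(2, pmax + 1):
        a = (gamma[k] - phi[k - 1] @ gamma[k - 1:0:-1]) / sig[k - 1]
        phi[k] = np.concatenate([phi[k - 1] - a * phi[k - 1][::-1], [a]])
        sig[k] = sig[k - 1] * (1 - a ** 2)
    return phi, sig
```

Running the recursion up to order p on the exact autocovariances returns the original \phi_p and \sigma_p^2 = \sigma^2, along with every intermediate \phi_k needed for the "messy" factor f(x_1, \ldots, x_p).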
In general, use the Levinson recursion to obtain a triangular decomposition of the covariance matrix \Gamma_n. This is done by converting the correlated variables X_1, \ldots, X_n into a collection, say U_1, U_2, \ldots, U_n, of uncorrelated variables. There are many ways of doing this, such as Gram-Schmidt or the Cholesky factorization. In the following, let P_{1 \cdots j} denote the projection onto the random variables X_1, \ldots, X_j (as in fitting a regression).
Following the Cholesky factorization, construct

    U_1 = X_1
    U_2 = X_2 - P_1 X_2 = X_2 - \phi_{1,1} X_1
    U_3 = X_3 - P_{12} X_3 = X_3 - \phi_{2,2} X_1 - \phi_{2,1} X_2
    U_4 = X_4 - P_{123} X_4 = X_4 - \phi_{3,3} X_1 - \phi_{3,2} X_2 - \phi_{3,1} X_3
    \vdots
    U_j = X_j - \sum_{k=1}^{j-1} \phi_{j-1,k} X_{j-k}
(This sequence of projections differs from those used in the numerically superior modified Gram-Schmidt method. GS sweeps X1 from all of the others first, filling the first column of L rather than recursively one row at a time.) Let L denote a lower triangular matrix that begins
    L = \begin{pmatrix}
          1          & 0          & 0          & 0 & 0 & \cdots \\
          -\phi_{11} & 1          & 0          & 0 & 0 & \cdots \\
          -\phi_{22} & -\phi_{21} & 1          & 0 & 0 & \cdots \\
          -\phi_{33} & -\phi_{32} & -\phi_{31} & 1 & 0 & \cdots \\
          \vdots     &            &            &   &   & \ddots
        \end{pmatrix}    (3)

with diagonal elements L_{kk} = 1 and off-diagonal elements L_{kj} = -\phi_{k-1,k-j}, j = 1, \ldots, k-1. Since the U_j are uncorrelated, we have
    D_n = Var(U) = Var(LX) = L \Gamma_n L' \quad \Longrightarrow \quad \Gamma_n^{-1} = L' D_n^{-1} L ,    (4)

where D_n is the diagonal matrix with the conditional variances

    \sigma_k^2 = Var\left( X_t - \sum_{j=1}^{k} \phi_{kj} X_{t-j} \right)

along the diagonal.
Comments

? It follows that \sigma^2 = \lim_n |\Gamma_{n+1}|/|\Gamma_n|. (Where have we seen this ratio of determinants before? Toeplitz matrices, Szegő's theorem, and Levinson's recursion itself.)
? If the process is indeed AR(p), the lower triangular matrix L is banded, with p subdiagonal stripes: the element L_{kj} = 0 for j < k - p.
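Both comments are easy to verify numerically for an AR(1), where the factorization is explicit: \phi_{k,1} = \phi for every k \ge 1 and all other \phi_{k,j} = 0, so L carries a single subdiagonal stripe (p = 1) and D_n = diag(\gamma(0), \sigma^2, \ldots, \sigma^2). A short check (variable names are mine):

```python
import numpy as np

# Parameters of an AR(1) example: X_t = phi X_{t-1} + w_t
phi, sigma2, n = 0.7, 2.0, 8
g0 = sigma2 / (1 - phi**2)                     # gamma(0)

# Toeplitz covariance matrix: Gamma[i, j] = gamma(|i - j|) = phi^|i-j| * g0
Gamma = g0 * phi ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

# L from (3): ones on the diagonal, -phi on the single subdiagonal stripe
L = np.eye(n) - phi * np.eye(n, k=-1)

# Equation (4): L Gamma L' should be the diagonal matrix D_n
D = L @ Gamma @ L.T
```

Here `D` comes out diagonal with entries (\gamma(0), \sigma^2, \ldots, \sigma^2), since U_1 = X_1 and U_t = X_t - \phi X_{t-1} = w_t for t \ge 2, and `L` has no entries below its first subdiagonal, matching the banded structure.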