Maximum Likelihood Estimation
Eric Zivot
May 14, 2001. This version: November 15, 2009
1 Maximum Likelihood Estimation
1.1 The Likelihood Function
Let $X_1, \ldots, X_n$ be an iid sample with probability density function (pdf) $f(x_i; \theta)$, where $\theta$ is a $(k \times 1)$ vector of parameters that characterize $f(x_i; \theta)$. For example, if $X_i \sim N(\mu, \sigma^2)$ then
$$f(x_i; \theta) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right)$$
and $\theta = (\mu, \sigma^2)'$. The joint density of the sample is, by independence, equal to the product of the marginal densities
$$f(x_1, \ldots, x_n; \theta) = f(x_1; \theta) \cdots f(x_n; \theta) = \prod_{i=1}^n f(x_i; \theta).$$
The joint density is an $n$-dimensional function of the data $x_1, \ldots, x_n$ given the parameter vector $\theta$. The joint density$^1$ satisfies
$$f(x_1, \ldots, x_n; \theta) \geq 0$$
$$\int \cdots \int f(x_1, \ldots, x_n; \theta)\, dx_1 \cdots dx_n = 1.$$
The likelihood function is defined as the joint density treated as a function of the parameters $\theta$:
$$L(\theta|x_1, \ldots, x_n) = f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^n f(x_i; \theta).$$
Notice that the likelihood function is a $k$-dimensional function of $\theta$ given the data $x_1, \ldots, x_n$. It is important to keep in mind that the likelihood function, being a function of $\theta$ and not the data, is not a proper pdf. It is always positive but
$$\int \cdots \int L(\theta|x_1, \ldots, x_n)\, d\theta_1 \cdots d\theta_k \neq 1.$$
$^1$If $X_1, \ldots, X_n$ are discrete random variables, then $f(x_1, \ldots, x_n; \theta) = \Pr(X_1 = x_1, \ldots, X_n = x_n)$ for a fixed value of $\theta$.
To simplify notation, let the vector $x = (x_1, \ldots, x_n)$ denote the observed sample. Then the joint pdf and likelihood function may be expressed as $f(x; \theta)$ and $L(\theta|x)$.
Example 1 Bernoulli Sampling
Let $X_i \sim \mathrm{Bernoulli}(\theta)$. That is, $X_i = 1$ with probability $\theta$ and $X_i = 0$ with probability $1 - \theta$, where $0 \leq \theta \leq 1$. The pdf for $X_i$ is
$$f(x_i; \theta) = \theta^{x_i}(1 - \theta)^{1 - x_i}, \quad x_i = 0, 1$$
Let $X_1, \ldots, X_n$ be an iid sample with $X_i \sim \mathrm{Bernoulli}(\theta)$. The joint density/likelihood function is given by
$$f(x; \theta) = L(\theta|x) = \prod_{i=1}^n \theta^{x_i}(1 - \theta)^{1 - x_i} = \theta^{\sum_{i=1}^n x_i}(1 - \theta)^{n - \sum_{i=1}^n x_i}$$
For a given value of $\theta$ and observed sample $x$, $f(x; \theta)$ gives the probability of observing the sample. For example, suppose $n = 5$ and $x = (0, \ldots, 0)$. Now some values of $\theta$ are more likely to have generated this sample than others. In particular, it is more likely that $\theta$ is close to zero than one. To see this, note that the likelihood function for this sample is
$$L(\theta|(0, \ldots, 0)) = (1 - \theta)^5$$
This function is illustrated in figure xxx. The likelihood function has a clear maximum at $\theta = 0$. That is, $\theta = 0$ is the value of $\theta$ that makes the observed sample $x = (0, \ldots, 0)$ most likely (highest probability).
Similarly, suppose $x = (1, \ldots, 1)$. Then the likelihood function is
$$L(\theta|(1, \ldots, 1)) = \theta^5$$
which is illustrated in figure xxx. Now the likelihood function has a maximum at $\theta = 1$.
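These two likelihood shapes are easy to verify numerically. The following sketch is not part of the original notes; it assumes NumPy is available, and the function name is mine. It evaluates $L(\theta|x) = \theta^{\sum x_i}(1 - \theta)^{n - \sum x_i}$ on a grid of $\theta$ values and reports where it is largest.

import numpy as np

def bernoulli_likelihood(theta, x):
    """L(theta|x) = theta^sum(x) * (1 - theta)^(n - sum(x))."""
    x = np.asarray(x)
    s = x.sum()
    return theta**s * (1.0 - theta)**(len(x) - s)

theta_grid = np.linspace(0.0, 1.0, 101)

# Sample of all zeros: likelihood (1 - theta)^5, largest at theta = 0.
L_zeros = bernoulli_likelihood(theta_grid, np.zeros(5))
print(theta_grid[np.argmax(L_zeros)])   # 0.0

# Sample of all ones: likelihood theta^5, largest at theta = 1.
L_ones = bernoulli_likelihood(theta_grid, np.ones(5))
print(theta_grid[np.argmax(L_ones)])    # 1.0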
Example 2 Normal Sampling
Let $X_1, \ldots, X_n$ be an iid sample with $X_i \sim N(\mu, \sigma^2)$. The pdf for $X_i$ is
$$f(x_i; \theta) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right), \quad -\infty < \mu < \infty,\ \sigma^2 > 0,\ -\infty < x < \infty$$
so that $\theta = (\mu, \sigma^2)'$. The likelihood function is given by
$$L(\theta|x) = \prod_{i=1}^n (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu)^2\right) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\right)$$
Figure xxx illustrates the normal likelihood for a representative sample of size $n = 25$. Notice that the likelihood has the same bell shape as a bivariate normal density.
Suppose $\sigma^2 = 1$. Then
$$L(\theta|x) = L(\mu|x) = (2\pi)^{-n/2} \exp\left(-\frac{1}{2}\sum_{i=1}^n (x_i - \mu)^2\right)$$
Now
$$\sum_{i=1}^n (x_i - \mu)^2 = \sum_{i=1}^n (x_i - \bar{x} + \bar{x} - \mu)^2 = \sum_{i=1}^n \left[(x_i - \bar{x})^2 + 2(x_i - \bar{x})(\bar{x} - \mu) + (\bar{x} - \mu)^2\right] = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2$$
(the cross-product term vanishes because $\sum_{i=1}^n (x_i - \bar{x}) = 0$)
so that
$$L(\mu|x) = (2\pi)^{-n/2} \exp\left(-\frac{1}{2}\left[\sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2\right]\right)$$
Since $\sum_{i=1}^n (x_i - \bar{x})^2$ does not depend on $\mu$ and $n(\bar{x} - \mu)^2 \geq 0$ is minimized at $\mu = \bar{x}$, it is clear that $L(\mu|x)$ is maximized at $\mu = \bar{x}$. This is illustrated in figure xxx.
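This claim is easy to check numerically. The sketch below is not from the original notes; it assumes NumPy and uses a simulated sample. It evaluates the log of $L(\mu|x)$ with $\sigma^2 = 1$ over a grid of $\mu$ values and compares the grid maximizer with $\bar{x}$.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=25)   # representative sample of size n = 25

def log_lik_mu(mu, x):
    """log L(mu|x) with sigma^2 = 1: -(n/2) log(2 pi) - (1/2) sum (x_i - mu)^2."""
    return -0.5 * len(x) * np.log(2 * np.pi) - 0.5 * np.sum((x - mu) ** 2)

mu_grid = np.linspace(x.mean() - 2, x.mean() + 2, 2001)
log_L = np.array([log_lik_mu(m, x) for m in mu_grid])
print(mu_grid[np.argmax(log_L)], x.mean())    # grid maximizer is (essentially) x-bar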
Example 3 Linear Regression Model with Normal Errors
Consider the linear regression
$$y_i = x_i'\beta + \varepsilon_i, \quad i = 1, \ldots, n, \qquad \varepsilon_i | x_i \sim \text{iid } N(0, \sigma^2)$$
where $x_i'$ is $(1 \times k)$ and $\beta$ is $(k \times 1)$.
The pdf of $\varepsilon_i | x_i$ is
$$f(\varepsilon_i | x_i; \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}\varepsilon_i^2\right)$$
The Jacobian of the transformation from $\varepsilon_i$ to $y_i$ is one, so the pdf of $y_i | x_i$ is normal with mean $x_i'\beta$ and variance $\sigma^2$:
$$f(y_i | x_i; \theta) = (2\pi\sigma^2)^{-1/2} \exp\left(-\frac{1}{2\sigma^2}(y_i - x_i'\beta)^2\right)$$
where $\theta = (\beta', \sigma^2)'$. Given an iid sample of $n$ observations, $y$ and $X$, the joint density of the sample is
$$f(y|X; \theta) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - x_i'\beta)^2\right) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right)$$
The log-likelihood function is then
$$\ln L(\theta|y, X) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)$$
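As a concrete illustration, the matrix form of this log-likelihood can be evaluated directly. The sketch below is not part of the original notes; it assumes NumPy, uses simulated data, and the function name is mine.

import numpy as np

def reg_log_lik(beta, sigma2, y, X):
    """ln L = -(n/2) ln(2 pi) - (n/2) ln(sigma2) - (1/(2 sigma2)) (y - X beta)'(y - X beta)."""
    n = len(y)
    resid = y - X @ beta
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * n * np.log(sigma2)
            - 0.5 * resid @ resid / sigma2)

# Small simulated data set just to show the call.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + rng.normal(scale=0.5, size=50)
print(reg_log_lik(beta_true, 0.25, y, X))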
Example 4 AR(1) model with Normal Errors
To be completed.
1.2 The Maximum Likelihood Estimator
Suppose we have a random sample from the pdf $f(x_i; \theta)$ and we are interested in estimating $\theta$. The previous examples motivate an estimator of $\theta$ as the value of $\theta$ that makes the observed sample most likely. Formally, the maximum likelihood estimator, denoted $\hat{\theta}_{mle}$, is the value of $\theta$ that maximizes $L(\theta|x)$. That is, $\hat{\theta}_{mle}$ solves
$$\max_{\theta} L(\theta|x)$$
It is often quite difficult to directly maximize $L(\theta|x)$. It is usually much easier to maximize the log-likelihood function $\ln L(\theta|x)$. Since $\ln(\cdot)$ is a monotonic function, the value of $\theta$ that maximizes $\ln L(\theta|x)$ will also maximize $L(\theta|x)$. Therefore, we may also define $\hat{\theta}_{mle}$ as the value of $\theta$ that solves
$$\max_{\theta} \ln L(\theta|x)$$
With random sampling, the log-likelihood has the particularly simple form
$$\ln L(\theta|x) = \ln\left(\prod_{i=1}^n f(x_i; \theta)\right) = \sum_{i=1}^n \ln f(x_i; \theta)$$
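In code this additivity means the sample log-likelihood is just the sum of $n$ per-observation terms. A minimal sketch for the normal model (assuming NumPy; not from the original notes):

import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(loc=1.0, scale=2.0, size=30)
mu, sigma2 = 1.0, 4.0

# Per-observation log density ln f(x_i; theta) for the N(mu, sigma^2) model.
log_f = -0.5 * np.log(2 * np.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)

# Under random sampling, ln L(theta|x) is the sum of the per-observation terms.
log_L = log_f.sum()
print(log_L)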
Since the MLE is defined as a maximization problem, we would like to know the conditions under which we may determine the MLE using the techniques of calculus. A regular pdf $f(x; \theta)$ provides a sufficient set of such conditions. We say that $f(x; \theta)$ is regular if
1. The support of the random variables $X$, $S_X = \{x : f(x; \theta) > 0\}$, does not depend on $\theta$
2. $f(x; \theta)$ is at least three times differentiable with respect to $\theta$
3. The true value of $\theta$ lies in a compact set
If $f(x; \theta)$ is regular then we may find the MLE by differentiating $\ln L(\theta|x)$ and solving the first order conditions
$$\frac{\partial \ln L(\hat{\theta}_{mle}|x)}{\partial \theta} = 0$$
Since $\theta$ is $(k \times 1)$, the first order conditions define $k$, potentially nonlinear, equations in $k$ unknown values:
$$\frac{\partial \ln L(\hat{\theta}_{mle}|x)}{\partial \theta} = \begin{pmatrix} \frac{\partial \ln L(\hat{\theta}_{mle}|x)}{\partial \theta_1} \\ \vdots \\ \frac{\partial \ln L(\hat{\theta}_{mle}|x)}{\partial \theta_k} \end{pmatrix}$$
The vector of derivatives of the log-likelihood function is called the score vector and is denoted
$$S(\theta|x) = \frac{\partial \ln L(\theta|x)}{\partial \theta}$$
By definition, the MLE satisfies
$$S(\hat{\theta}_{mle}|x) = 0$$
Under random sampling the score for the sample becomes the sum of the scores for each observation $x_i$:
$$S(\theta|x) = \sum_{i=1}^n \frac{\partial \ln f(x_i; \theta)}{\partial \theta} = \sum_{i=1}^n S(\theta|x_i)$$
where $S(\theta|x_i) = \partial \ln f(x_i; \theta)/\partial \theta$ is the score associated with $x_i$.
Example 5 Bernoulli example continued
The log-likelihood function is
$$\ln L(\theta|x) = \ln\left(\theta^{\sum_{i=1}^n x_i}(1 - \theta)^{n - \sum_{i=1}^n x_i}\right) = \sum_{i=1}^n x_i \ln(\theta) + \left(n - \sum_{i=1}^n x_i\right)\ln(1 - \theta)$$
The score function for the Bernoulli log-likelihood is
$$S(\theta|x) = \frac{\partial \ln L(\theta|x)}{\partial \theta} = \frac{1}{\theta}\sum_{i=1}^n x_i - \frac{1}{1 - \theta}\left(n - \sum_{i=1}^n x_i\right)$$
The MLE satisfies $S(\hat{\theta}_{mle}|x) = 0$, which, after a little algebra, produces the MLE
$$\hat{\theta}_{mle} = \frac{1}{n}\sum_{i=1}^n x_i.$$
Hence, the sample average is the MLE for $\theta$ in the Bernoulli model.
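This result can be confirmed numerically. The sketch below (assuming NumPy; not part of the original notes) maximizes the Bernoulli log-likelihood by grid search and compares the maximizer with the sample average.

import numpy as np

rng = np.random.default_rng(2)
x = rng.binomial(1, 0.3, size=200)             # iid Bernoulli(0.3) sample

def bernoulli_log_lik(theta, x):
    """ln L(theta|x) = sum(x) ln(theta) + (n - sum(x)) ln(1 - theta)."""
    s = x.sum()
    return s * np.log(theta) + (len(x) - s) * np.log(1.0 - theta)

theta_grid = np.linspace(0.001, 0.999, 999)
log_L = bernoulli_log_lik(theta_grid, x)
print(theta_grid[np.argmax(log_L)], x.mean())  # grid maximizer is (close to) the sample mean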
Example 6 Normal example continued
Since the normal pdf is regular, we may determine the MLE for $\theta = (\mu, \sigma^2)'$ by maximizing the log-likelihood
$$\ln L(\theta|x) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2.$$
The sample score is a $(2 \times 1)$ vector given by
$$S(\theta|x) = \begin{pmatrix} \frac{\partial \ln L(\theta|x)}{\partial \mu} \\ \frac{\partial \ln L(\theta|x)}{\partial \sigma^2} \end{pmatrix}$$
where
$$\frac{\partial \ln L(\theta|x)}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu)$$
$$\frac{\partial \ln L(\theta|x)}{\partial \sigma^2} = -\frac{n}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}\sum_{i=1}^n (x_i - \mu)^2$$
Note that the score vector for an observation is
$$S(\theta|x_i) = \begin{pmatrix} \frac{\partial \ln f(x_i; \theta)}{\partial \mu} \\ \frac{\partial \ln f(x_i; \theta)}{\partial \sigma^2} \end{pmatrix} = \begin{pmatrix} (\sigma^2)^{-1}(x_i - \mu) \\ -\frac{1}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}(x_i - \mu)^2 \end{pmatrix}$$
so that $S(\theta|x) = \sum_{i=1}^n S(\theta|x_i)$.
Solving $S(\hat{\theta}_{mle}|x) = 0$ gives the normal equations
$$\frac{\partial \ln L(\hat{\theta}_{mle}|x)}{\partial \mu} = \frac{1}{\hat{\sigma}^2_{mle}}\sum_{i=1}^n (x_i - \hat{\mu}_{mle}) = 0$$
$$\frac{\partial \ln L(\hat{\theta}_{mle}|x)}{\partial \sigma^2} = -\frac{n}{2}(\hat{\sigma}^2_{mle})^{-1} + \frac{1}{2}(\hat{\sigma}^2_{mle})^{-2}\sum_{i=1}^n (x_i - \hat{\mu}_{mle})^2 = 0$$
Solving the first equation for $\hat{\mu}_{mle}$ gives
$$\hat{\mu}_{mle} = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}$$
Hence, the sample average is the MLE for $\mu$. Using $\hat{\mu}_{mle} = \bar{x}$ and solving the second equation for $\hat{\sigma}^2_{mle}$ gives
$$\hat{\sigma}^2_{mle} = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2.$$
Notice that $\hat{\sigma}^2_{mle}$ is not equal to the sample variance, which divides by $n - 1$ rather than $n$.
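The distinction is easy to see in code. A short sketch (assuming NumPy; not part of the original notes): the MLE divides by $n$, while the usual sample variance divides by $n - 1$.

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=5.0, scale=2.0, size=25)

mu_mle = x.mean()                          # MLE of mu is the sample average
sigma2_mle = np.mean((x - mu_mle) ** 2)    # MLE of sigma^2 divides by n
s2 = np.var(x, ddof=1)                     # sample variance divides by n - 1

print(sigma2_mle, s2)
print(np.isclose(sigma2_mle, (len(x) - 1) / len(x) * s2))   # True: they differ by the factor (n-1)/n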
Example 7 Linear regression example continued
The log-likelihood is
$$\ln L(\theta|y, X) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)$$
The MLE of $\theta$ satisfies $S(\hat{\theta}_{mle}|y, X) = 0$, where $S(\theta|y, X) = \partial \ln L(\theta|y, X)/\partial \theta$ is the score vector. Now
$$\frac{\partial \ln L(\theta|y, X)}{\partial \beta} = -\frac{1}{2\sigma^2}\frac{\partial}{\partial \beta}\left[y'y - 2y'X\beta + \beta'X'X\beta\right] = -(\sigma^2)^{-1}\left[-X'y + X'X\beta\right]$$
$$\frac{\partial \ln L(\theta|y, X)}{\partial \sigma^2} = -\frac{n}{2}(\sigma^2)^{-1} + \frac{1}{2}(\sigma^2)^{-2}(y - X\beta)'(y - X\beta)$$
Solving $\partial \ln L(\theta|y, X)/\partial \beta = 0$ for $\beta$ gives
$$\hat{\beta}_{mle} = (X'X)^{-1}X'y = \hat{\beta}_{OLS}$$
Next, solving $\partial \ln L(\theta|y, X)/\partial \sigma^2 = 0$ for $\sigma^2$ gives
$$\hat{\sigma}^2_{mle} = \frac{1}{n}(y - X\hat{\beta}_{mle})'(y - X\hat{\beta}_{mle}) \neq \hat{\sigma}^2_{OLS} = \frac{1}{n - k}(y - X\hat{\beta}_{OLS})'(y - X\hat{\beta}_{OLS})$$
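A short sketch (assuming NumPy; variable names are mine, not from the notes) verifying that the MLE of $\beta$ coincides with OLS, while the two variance estimators differ only in the divisor:

import numpy as np

rng = np.random.default_rng(4)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -0.25])
y = X @ beta + rng.normal(scale=0.7, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y, the MLE / OLS estimate
resid = y - X @ beta_hat

sigma2_mle = resid @ resid / n                 # MLE divides by n
sigma2_ols = resid @ resid / (n - k)           # OLS divides by n - k
print(beta_hat, sigma2_mle, sigma2_ols)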
1.3 Properties of the Score Function
The matrix of second derivatives of the log-likelihood is called the Hessian
$$H(\theta|x) = \frac{\partial^2 \ln L(\theta|x)}{\partial \theta \partial \theta'} = \begin{pmatrix} \frac{\partial^2 \ln L(\theta|x)}{\partial \theta_1^2} & \cdots & \frac{\partial^2 \ln L(\theta|x)}{\partial \theta_1 \partial \theta_k} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 \ln L(\theta|x)}{\partial \theta_k \partial \theta_1} & \cdots & \frac{\partial^2 \ln L(\theta|x)}{\partial \theta_k^2} \end{pmatrix}$$
The information matrix is defined as minus the expectation of the Hessian
$$I(\theta|x) = -E[H(\theta|x)]$$
If we have random sampling then
$$H(\theta|x) = \sum_{i=1}^n \frac{\partial^2 \ln f(x_i; \theta)}{\partial \theta \partial \theta'} = \sum_{i=1}^n H(\theta|x_i)$$
and
$$I(\theta|x) = -\sum_{i=1}^n E[H(\theta|x_i)] = -nE[H(\theta|x_i)] = nI(\theta|x_i)$$
The last result says that the sample information matrix is equal to n times the information matrix for an observation.
The following proposition relates some properties of the score function to the information matrix.
Proposition 8 Let $f(x_i; \theta)$ be a regular pdf. Then
1. $E[S(\theta|x_i)] = \int S(\theta|x_i) f(x_i; \theta)\, dx_i = 0$
2. If $\theta$ is a scalar then
$$\mathrm{var}(S(\theta|x_i)) = E[S(\theta|x_i)^2] = \int S(\theta|x_i)^2 f(x_i; \theta)\, dx_i = I(\theta|x_i)$$
If $\theta$ is a vector then
$$\mathrm{var}(S(\theta|x_i)) = E[S(\theta|x_i)S(\theta|x_i)'] = \int S(\theta|x_i)S(\theta|x_i)' f(x_i; \theta)\, dx_i = I(\theta|x_i)$$
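Before turning to the proof, both properties can be checked for the Bernoulli model by direct enumeration, since $x_i$ takes only the values 0 and 1. A sketch (assuming NumPy; not part of the original notes) confirming that the score has mean zero and variance $1/(\theta(1-\theta))$, which equals $I(\theta|x_i)$ for this model:

import numpy as np

theta = 0.3

def score(theta, x):
    """S(theta|x) = d/dtheta [x ln(theta) + (1 - x) ln(1 - theta)] = x/theta - (1 - x)/(1 - theta)."""
    return x / theta - (1 - x) / (1 - theta)

# Enumerate x in {0, 1} with probabilities f(x; theta) = theta^x (1 - theta)^(1 - x).
probs = np.array([1 - theta, theta])          # P(X = 0), P(X = 1)
values = np.array([score(theta, 0), score(theta, 1)])

mean_score = probs @ values                   # should be 0
var_score = probs @ values**2                 # should equal 1/(theta*(1 - theta))
print(mean_score, var_score, 1 / (theta * (1 - theta)))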
Proof. For part 1, we have
$$E[S(\theta|x_i)] = \int S(\theta|x_i) f(x_i; \theta)\, dx_i = \int \frac{\partial \ln f(x_i; \theta)}{\partial \theta} f(x_i; \theta)\, dx_i = \int \frac{1}{f(x_i; \theta)}\frac{\partial f(x_i; \theta)}{\partial \theta} f(x_i; \theta)\, dx_i$$
$$= \int \frac{\partial f(x_i; \theta)}{\partial \theta}\, dx_i = \frac{\partial}{\partial \theta}\int f(x_i; \theta)\, dx_i = \frac{\partial}{\partial \theta} 1 = 0.$$
The key step in the proof is the ability to interchange the order of differentiation and integration.
For part 2, consider the scalar case for simplicity. Now, proceeding as above, we get
$$E[S(\theta|x_i)^2] = \int S(\theta|x_i)^2 f(x_i; \theta)\, dx_i = \int \left(\frac{\partial \ln f(x_i; \theta)}{\partial \theta}\right)^2 f(x_i; \theta)\, dx_i$$
$$= \int \left(\frac{1}{f(x_i; \theta)}\frac{\partial f(x_i; \theta)}{\partial \theta}\right)^2 f(x_i; \theta)\, dx_i = \int \frac{1}{f(x_i; \theta)}\left(\frac{\partial f(x_i; \theta)}{\partial \theta}\right)^2 dx_i$$