6 Finite Sample Theory of Order Statistics and Extremes


The ordered values of a sample of observations are called the order statistics of the sample, and the smallest and the largest are called the extremes. Order statistics and extremes are among the most important functions of a set of random variables that we study in probability and statistics. There is natural interest in studying the highs and lows of a sequence, and the other order statistics help in understanding the concentration of probability in a distribution, or equivalently, the diversity in the population represented by the distribution. Order statistics are also useful in statistical inference, where estimates of parameters are often based on suitable functions of the order statistics. In particular, the median is of very special importance. There is a well developed theory of the order statistics of a fixed number n of observations from a fixed distribution, as well as an asymptotic theory in which n goes to infinity. We will discuss the case of fixed n in this chapter. Distribution theory for order statistics when the observations are from a discrete distribution is complex, both notationally and algebraically, because several observations may be exactly equal. These ties among the sample values make the distribution theory cumbersome. We therefore concentrate on the continuous case.

Principal references for this chapter are the books by David (1980), Reiss (1989), Galambos (1987), Resnick (2007), and Leadbetter, Lindgren, and Rootzén (1983). Other specific references are given in the sections.

6.1 Basic Distribution Theory

Definition 6.1. Let $X_1, X_2, \ldots, X_n$ be any n real valued random variables. Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote the ordered values of $X_1, X_2, \ldots, X_n$. Then, $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ are called the order statistics of $X_1, X_2, \ldots, X_n$.

Remark: Thus, the minimum among $X_1, X_2, \ldots, X_n$ is the first order statistic, and the maximum is the nth order statistic. The middle value among $X_1, X_2, \ldots, X_n$ is called the median. But it needs to be defined precisely, because there is really no middle value when n is an even integer. Here is our definition.

Definition 6.2. Let $X_1, X_2, \ldots, X_n$ be any n real valued random variables. Then, the median of $X_1, X_2, \ldots, X_n$ is defined to be $M_n = X_{(m+1)}$ if $n = 2m + 1$ (an odd integer), and $M_n = X_{(m)}$ if $n = 2m$ (an even integer). That is, in either case, the median is the order statistic $X_{(k)}$ where k is the smallest integer $\ge \frac{n}{2}$.

Example 6.1. Suppose .3, .53, .68, .06, .73, .48, .87, .42, .89, .44 are ten independent observations from the U[0, 1] distribution. Then, the order statistics are .06, .3, .42, .44, .48, .53, .68, .73, .87, .89. Thus, $X_{(1)} = .06$, $X_{(n)} = .89$, and since $\frac{n}{2} = 5$, $M_n = X_{(5)} = .48$.
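The computation in Example 6.1 can be reproduced with a short script. This is only an illustrative sketch: the function names `order_statistics` and `sample_median` are hypothetical helpers introduced here, and the median follows Definition 6.2 (the order statistic $X_{(k)}$ with k the smallest integer $\ge n/2$), which differs from the averaging convention used by many software libraries for even n.

```python
def order_statistics(sample):
    """Return X_(1) <= X_(2) <= ... <= X_(n) as a list."""
    return sorted(sample)


def sample_median(sample):
    """Median as in Definition 6.2: X_(k), k = smallest integer >= n/2."""
    ordered = order_statistics(sample)
    n = len(ordered)
    k = (n + 1) // 2          # equals m + 1 if n = 2m + 1, and m if n = 2m
    return ordered[k - 1]     # Python lists are 0-indexed


# The ten observations of Example 6.1.
data = [.3, .53, .68, .06, .73, .48, .87, .42, .89, .44]

ordered = order_statistics(data)
print("order statistics:", ordered)
print("minimum X_(1):", ordered[0])
print("maximum X_(n):", ordered[-1])
print("median M_n:", sample_median(data))
```

Using integer arithmetic for k avoids any floating point rounding in the index.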

Order statistics have an important connection with the empirical CDF, a function of immense theoretical and methodological importance in both probability and statistics.

Definition 6.3. Let $X_1, X_2, \ldots, X_n$ be any n real valued random variables. The empirical CDF of $X_1, X_2, \ldots, X_n$, also called the empirical CDF of the sample, is the function

$$F_n(x) = \frac{\#\{X_i : X_i \le x\}}{n},$$

i.e., $F_n(x)$ measures the proportion of sample values that are $\le x$ for a given x.

Remark: Therefore, by its definition, $F_n(x) = 0$ whenever $x < X_{(1)}$, and $F_n(x) = 1$ whenever $x \ge X_{(n)}$. It is also a constant, namely $\frac{k}{n}$, for all x-values in the interval $[X_{(k)}, X_{(k+1)})$. So $F_n$ satisfies all the properties of being a valid CDF. Indeed, it is the CDF of a discrete distribution, which puts an equal probability of $\frac{1}{n}$ at the sample values $X_1, X_2, \ldots, X_n$. This calls for another definition.

Definition 6.4. Let $P_n$ denote the discrete distribution which assigns probability $\frac{1}{n}$ to each $X_i$. Then, $P_n$ is called the empirical measure of the sample.
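The empirical CDF of Definition 6.3 is straightforward to evaluate numerically. The following is a minimal sketch, not a reference implementation; `empirical_cdf` is a hypothetical helper name, and the data are the ten observations of Example 6.1.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return the function F_n(x) = #{X_i : X_i <= x} / n for the sample."""
    ordered = sorted(sample)
    n = len(ordered)

    def F_n(x):
        # bisect_right gives the number of sorted values that are <= x.
        return bisect_right(ordered, x) / n

    return F_n


data = [.3, .53, .68, .06, .73, .48, .87, .42, .89, .44]
F_n = empirical_cdf(data)

print(F_n(0.05))   # below X_(1): no sample value is <= x
print(F_n(0.50))   # five of the ten values are <= 0.5
print(F_n(0.89))   # at X_(n): every sample value is <= x
```

Because $F_n$ jumps by $\frac{1}{n}$ at each distinct sample value, it is the CDF of the empirical measure $P_n$.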

Definition 6.5. Let $Q_n(p) = F_n^{-1}(p)$ be the quantile function corresponding to $F_n$. Then, $Q_n = F_n^{-1}$ is called the quantile function of $X_1, X_2, \ldots, X_n$, or the empirical quantile function.

We can now relate the median and the order statistics to the quantile function $F_n^{-1}$.

Proposition. Let $X_1, X_2, \ldots, X_n$ be n random variables. Then,

(a) $X_{(i)} = F_n^{-1}\left(\frac{i}{n}\right)$;

(b) $M_n = F_n^{-1}\left(\frac{1}{2}\right)$.
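Both parts of the proposition can be checked numerically on the data of Example 6.1. The sketch below assumes the usual left-continuous inverse, $F_n^{-1}(p) = \inf\{x : F_n(x) \ge p\}$ for $0 < p \le 1$, which equals $X_{(k)}$ with k the smallest integer $\ge np$; this convention is an assumption, since the section does not spell out the definition of the inverse. The name `empirical_quantile` is hypothetical, and exact fractions are used for p to avoid floating point rounding in $np$.

```python
import math
from fractions import Fraction


def empirical_quantile(sample, p):
    """Assumed convention: F_n^{-1}(p) = inf{x : F_n(x) >= p}, 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    ordered = sorted(sample)
    n = len(ordered)
    k = math.ceil(n * p)      # smallest integer k with k/n >= p
    return ordered[k - 1]     # X_(k), using 0-based list indexing


data = [.3, .53, .68, .06, .73, .48, .87, .42, .89, .44]
ordered = sorted(data)
n = len(data)

# Part (a): X_(i) = F_n^{-1}(i/n) for i = 1, ..., n.
part_a = all(
    empirical_quantile(data, Fraction(i, n)) == ordered[i - 1]
    for i in range(1, n + 1)
)
print("part (a) holds on this sample:", part_a)

# Part (b): M_n = F_n^{-1}(1/2).
print("F_n^{-1}(1/2) =", empirical_quantile(data, Fraction(1, 2)))
```

With $p = \frac{1}{2}$ the index k is the smallest integer $\ge \frac{n}{2}$, which is exactly the index used in Definition 6.2.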

We now specialize to the case where $X_1, X_2, \ldots, X_n$ are independent random variables with a common density function $f(x)$ and CDF $F(x)$, and work out the fundamental distribution theory of the order statistics $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$.

Theorem 6.1. (Joint Density of All the Order Statistics) Let $X_1, X_2, \ldots, X_n$ be independent random variables with a common density function $f(x)$. Then, the joint density function of $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ is given by

$$f_{1,2,\ldots,n}(y_1, y_2, \ldots, y_n) = n!\, f(y_1) f(y_2) \cdots f(y_n)\, I_{\{y_1 < y_2 < \cdots < y_n\}}.$$
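As a sanity check on Theorem 6.1, consider the special case of n = 2 observations from U[0, 1], so that $f = 1$ on $[0, 1]$. Assuming, as is standard, that the indicator restricts the density to ordered arguments, the joint density of $(X_{(1)}, X_{(2)})$ is the constant $2! = 2$ on the triangle $0 < y_1 < y_2 < 1$, and it should integrate to 1. The short calculation below is an illustration added for this special case only.

```latex
\int_0^1 \int_0^{y_2} 2 \, dy_1 \, dy_2
  = \int_0^1 2 y_2 \, dy_2
  = \left[ y_2^2 \right]_0^1
  = 1 .
```

The same reasoning works for general n: the ordered region $0 < y_1 < \cdots < y_n < 1$ has volume $\frac{1}{n!}$, which the factor $n!$ exactly offsets.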
