Probability Distributions

CEE 201L. Uncertainty, Design, and Optimization Department of Civil and Environmental Engineering Duke University

Philip Scott Harvey, Henri P. Gavin and Jeffrey T. Scruggs Spring 2022

In the context of random variables, capital italics (X) represent an uncertain quantity (a random variable) and lower case italics (x) represent a particular value of that random variable. Random variables can be discrete or continuous.

• Discrete random variables can take on values that are members of a (finite or infinite) set of discrete values. If X can take on only positive whole numbers (e.g., the number of times a team can win over all time), then X is a discrete random variable with an infinitely large population. If X can take on only whole numbers between 0 and 23 (e.g., the hour of a day), then X is a discrete random variable with a finite population.

• Continuous random variables can take on any value within finite or infinite bounds. The population of potential values of any continuous random variable is infinitely large.

This document focuses on continuous random variables.

1 Probability distributions of continuous random variables

The properties of a random variable (rv) X distributed over the domain x̌ ≤ X ≤ x̂ are fully described by its probability density function or its cumulative distribution function.

The probability density function (PDF) of X is the function f_X(x) such that for any two numbers a and b within the domain x̌ ≤ a ≤ b ≤ x̂,

    P[a < X ≤ b] = ∫_a^b f_X(x) dx

For f_X(x) to be a proper distribution, it must satisfy the following two conditions:

• The PDF is non-negative: f_X(x) ≥ 0 for all values of x between x̌ and x̂.

• The rule of total probability holds; the total area under f_X(x) is 1: ∫_{x̌}^{x̂} f_X(x) dx = 1.

The cumulative distribution function (CDF) of X is the function F_X(x) that gives, for any specified value b between x̌ and x̂, the probability P[X ≤ b] that the random variable X is less than or equal to b. The CDF is defined by

    F_X(x) = P[X ≤ x] = ∫_{-∞}^{x} f_X(s) ds ,


where s is a dummy variable of integration. So, P[a < X ≤ b] = F_X(b) - F_X(a).
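To make this PDF-CDF relationship concrete, here is a short Python sketch (Python rather than Matlab, purely for illustration; the exponential distribution with rate λ = 2 is an assumed example, not one from this handout). It computes P[a < X ≤ b] both by differencing the CDF and by numerically integrating the PDF:

```python
import math

# Assumed example: exponential distribution with rate lam,
#   f_X(x) = lam*exp(-lam*x),  F_X(x) = 1 - exp(-lam*x),  x >= 0.
lam = 2.0
f = lambda x: lam * math.exp(-lam * x)   # PDF
F = lambda x: 1.0 - math.exp(-lam * x)   # CDF

a, b = 0.5, 1.5

# P[a < X <= b] from the CDF difference ...
p_cdf = F(b) - F(a)

# ... and from the PDF, by the trapezoid rule
n = 10000
h = (b - a) / n
p_pdf = sum(0.5 * (f(a + i*h) + f(a + (i+1)*h)) * h for i in range(n))

print(p_cdf, p_pdf)   # the two agree to within the quadrature error
```

Both routes give P[0.5 < X ≤ 1.5] = e^{-1} - e^{-3} ≈ 0.318.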

By the first fundamental theorem of calculus, the functions f_X(x) and F_X(x) are related as

    f_X(x) = (d/dx) F_X(x)

Some important characteristics of CDFs of X are:

• CDFs, F_X(x), are monotonic non-decreasing functions of x.

• For any number a, P[X > a] = 1 - P[X ≤ a] = 1 - F_X(a)

• For any two numbers a and b, with a ≤ b, P[a < X ≤ b] = F_X(b) - F_X(a) = ∫_a^b f_X(x) dx

(CC) BY-NC-ND April 27, 2022 PSH, HPG, JTS


2 Statistics of random variables

The expected or mean value of a continuous random variable X with PDF f_X(x) is the centroid of the probability density:

    μ_X = E[X] = ∫_{-∞}^{∞} x f_X(x) dx

The expected value of an arbitrary function of X, g(X), with respect to the PDF f_X(x) is

    μ_{g(X)} = E[g(X)] = ∫_{-∞}^{∞} g(x) f_X(x) dx

The variance of a continuous random variable X with PDF f_X(x) and mean μ_X gives a quantitative measure of how much spread or dispersion there is in the distribution of x values. The variance is the expectation of (X - μ_X)²:

    σ_X² ≡ V[X] = E[(X - μ_X)²]
                = ∫_{-∞}^{∞} (x - μ_X)² f_X(x) dx
                = ∫_{-∞}^{∞} (x² - 2 μ_X x + μ_X²) f_X(x) dx
                = ∫_{-∞}^{∞} x² f_X(x) dx - 2 μ_X ∫_{-∞}^{∞} x f_X(x) dx + μ_X² ∫_{-∞}^{∞} f_X(x) dx
                = E[X²] - 2 μ_X E[X] + μ_X²   ... but μ_X = E[X], so ...
                = E[X²] - μ_X²                ... the mean of the square minus the square of the mean
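The identity V[X] = E[X²] - μ_X² can be checked numerically. The following Python sketch (an assumed example using a Uniform(0, 2) sample, with expectations approximated by sample averages) computes the variance both ways:

```python
import random

random.seed(1)
xs = [random.uniform(0.0, 2.0) for _ in range(200000)]  # X ~ Uniform(0,2), assumed
m = len(xs)

mu = sum(xs) / m                                 # estimate of E[X] (exact value: 1)
var_direct   = sum((x - mu)**2 for x in xs) / m  # E[(X - mu)^2]
var_identity = sum(x*x for x in xs) / m - mu**2  # E[X^2] - mu^2

print(mu, var_direct, var_identity)
# the two variance estimates agree, near the exact value (2-0)^2/12 = 1/3
```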

The standard deviation (s.d.) of X is σ_X = √(V[X]). The coefficient of variation (c.o.v.) of X is the standard deviation as a fraction of the mean:

    c_X = σ_X / μ_X   ... for μ_X ≠ 0

The c.o.v. is a normalized measure of dispersion and is dimensionless.

A mode of a probability density function, f_X(x), is a value of x such that the PDF is maximized:

    (d/dx) f_X(x) |_{x = x_mode} = 0 .

A multi-modal distribution is a distribution with multiple modes.

The median value, x_med, is the value of x such that

    P[X ≤ x_med] = P[X > x_med] = 0.5 , i.e., F_X(x_med) = 1 - F_X(x_med) = 0.5 .
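Since F_X(x_med) = 0.5, the median of any continuous distribution can be found by root-finding on the CDF. A Python sketch (bisection on an assumed exponential CDF with rate λ = 2, an illustration rather than a distribution from this handout) recovers the closed-form median ln(2)/λ:

```python
import math

lam = 2.0                                # assumed rate parameter
F = lambda x: 1.0 - math.exp(-lam * x)   # exponential CDF (assumed example)

# bisection on F(x) - 0.5 over a bracket that contains the median
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if F(mid) < 0.5:
        lo = mid
    else:
        hi = mid
x_med = 0.5 * (lo + hi)

print(x_med, math.log(2.0) / lam)   # both approximately 0.3466
```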



3 Statistics from a sample of values of a random variable (Sample Statistics)

Consider a fixed sample of m specific observed numerical values {x_1, ..., x_m} drawn from a population with CDF F_X(x). If X is a continuous random variable, it can take on any value within potentially infinite bounds. In such cases the population is infinitely large, and it is impossible to know its distribution F_X(x) exactly. A random sample of the population can, however, be used to estimate the population statistics. A few sample statistics are:

• x_max and x_min: the maximum and minimum values of the sample {x_1, ..., x_m}

    x_min = min_i (x_i) ,  x_max = max_i (x_i) ,  i = 1, ..., m

• x_avg: the arithmetic average of values of the sample {x_1, ..., x_m}; it is the estimate of the population mean, μ̂_X:

    μ̂_X = x_avg = (1/m) Σ_{i=1}^{m} x_i

• x_gm: the geometric average of values of the sample {x_1, ..., x_m}

    x_gm = ( Π_{i=1}^{m} x_i )^{1/m}

• x_hm: the harmonic average of values of the sample {x_1, ..., x_m}

    x_hm = ( (1/m) Σ_{i=1}^{m} 1/x_i )^{-1}

• x_med: the median value of the sample, for which half of the sample is greater than x_med.

• x_mad: the average absolute deviation of the sample,

    x_mad = (1/m) Σ_{i=1}^{m} |x_i - x_avg|

• x_sd: the standard deviation of values in the sample; it is the estimate of the population standard deviation, σ̂_X:

    σ̂_X² = x_sd² = (1/(m-1)) Σ_{i=1}^{m} (x_i - x_avg)²

• x_cov: the coefficient of variation of the sample

    x_cov = x_sd / x_avg

• Sample statistics of a function of a sample {g(x_1), g(x_2), ..., g(x_m)} are defined analogously:

    g(x)_min = min_i (g(x_i)) ,  i = 1, ..., m
    g(x)_max = max_i (g(x_i)) ,  i = 1, ..., m
    g(x)_avg = (1/m) Σ_{i=1}^{m} g(x_i)
    g(x)_sd² = (1/(m-1)) Σ_{i=1}^{m} (g(x_i) - g(x)_avg)²

Importantly, note that in general, g(x)_min ≠ g(x_min), g(x)_avg ≠ g(x_avg), et cetera.
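The sample statistics above, and the caution that g(x)_avg ≠ g(x_avg), can be illustrated in a few lines of Python (the sample values are made up for the example):

```python
import math

xs = [1.2, 0.8, 2.5, 1.7, 0.9, 3.1]     # made-up sample
m = len(xs)
s = sorted(xs)

x_avg = sum(xs) / m                                   # arithmetic average
x_gm  = math.prod(xs) ** (1.0 / m)                    # geometric average
x_hm  = m / sum(1.0 / x for x in xs)                  # harmonic average
x_med = 0.5*(s[m//2 - 1] + s[m//2]) if m % 2 == 0 else s[m//2]
x_mad = sum(abs(x - x_avg) for x in xs) / m           # average absolute deviation
x_sd  = math.sqrt(sum((x - x_avg)**2 for x in xs) / (m - 1))
x_cov = x_sd / x_avg

# g(x)_avg versus g(x_avg) for g(x) = x^2
g = lambda x: x * x
gx_avg = sum(g(x) for x in xs) / m       # g(x)_avg
g_xavg = g(x_avg)                        # g(x_avg) -- not the same number

print(x_avg, x_gm, x_hm, gx_avg, g_xavg)
```

For any all-positive sample, the three averages satisfy x_hm ≤ x_gm ≤ x_avg.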



4 Empirical PDFs, CDFs, and exceedance rates

A PDF and a CDF of a sample of values can be computed directly from the sample, without assuming any particular probability distribution.

A sample of m random values can be sorted into increasing numerical order, so that

    x_1 ≤ x_2 ≤ ... ≤ x_{i-1} ≤ x_i ≤ x_{i+1} ≤ ... ≤ x_{m-1} ≤ x_m .

In the ordered sample there are i data points less than or equal to x_i. So, if the sample is representative of the population, and the sample is "big enough," the probability that a random X is less than or equal to the i-th ordered value is i/m. In other words, P[X ≤ x_i] ≈ i/m. Unless we know that no value of X can exceed x_m, we must accept some probability that X > x_m. So, P[X ≤ x_m] should be less than 1. In such cases, the unbiased estimate¹ E[F_X(x_i)] for P[X ≤ x_i] is i/(m + 1).

The empirical CDF computed from an ordered sample of m values is

    F̂_X(x_i) = i / (m + 1)

The empirical PDF is basically a histogram of the data. The following Matlab lines plot empirical CDFs and PDFs from a vector of random data, x.

m     = length(x);                     % number of values in the sample
x     = sort(x);                       % sort the sample
x_avg = sum(x)/m;                      % average value of the sample
x_med = x(round(m/2));                 % median value of the sample
x_sd  = sqrt(var(x));                  % standard deviation of the sample
x_cov = abs(x_sd/x_avg);               % coefficient of variation of the sample
nBins = floor(m/50);                   % number of bins in the histogram
[fx,xx] = hist(x,nBins);               % compute the histogram
fx  = fx/m * nBins/(max(x)-min(x));    % scale the histogram to a PDF
F_x = ([1:m])/(m+1);                   % empirical CDF
subplot(121); bar(xx,fx);              % plot empirical PDF
subplot(122); stairs(x,F_x);           % plot empirical CDF
probability_X_gt_1 = sum(x>1)/(m+1)    % fraction of the sample for which X > 1

[Figure: empirical PDF f_X(x) (left) and empirical CDF F_X(x) (right) of the sample, annotated with x_avg = 0.786, x_sd = 0.387, x_cov = 0.492, x_med = 0.703, and P[X > 1] = 0.237.]

The number of values in the sample greater than x_i is (m - i). If the sample is representative, the probability of a value exceeding x_i is P[X > x_i] = 1 - F_X(x_i) ≈ 1 - i/(m + 1). If the m observations were collected over a period of time T, the average exceedance rate (number of events greater than x_i per unit time) is ν(x_i) = (1 - F_X(x_i))(m/T) ≈ (1 - i/(m + 1))(m/T).
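A Python sketch of this exceedance-rate calculation (the data values, and the observation period T = 5 years, are assumed for illustration):

```python
# m made-up observations, sorted into increasing order
xs = sorted([3.1, 4.7, 2.2, 5.9, 3.8, 4.1, 2.9, 6.4, 3.3, 4.9])
m = len(xs)
T = 5.0                                          # observation period, years (assumed)

F_hat = [i / (m + 1) for i in range(1, m + 1)]   # empirical CDF at each x_i
nu = [(1.0 - F) * (m / T) for F in F_hat]        # average exceedance rate, per year

for xi, F, n in zip(xs, F_hat, nu):
    print(f"x = {xi:4.1f}   F = {F:.3f}   nu = {n:.3f} / yr")
```

The rate decreases with x_i: the largest observations are exceeded least often.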

¹ E.J. Gumbel, Statistics of Extremes, Columbia Univ. Press, 1958; Lasse Makkonen, "Problems in the extreme value analysis," Structural Safety, 2008;30:405-419.

