Stochastic Processes



An Introductory Example

Van Bael, S. and S. Pruett-Jones. 1996. Exponential population growth on Monk Parakeets in the United States. Wilson Bulletin 108(3):584-588.

Abstract:[pic]

[pic] [pic]

Van Bael and Pruett-Jones used records from the National Audubon Society Christmas Bird Count (CBC) during the winters from 1971-72 to 1994-95. The CBC is a citizen science effort where thousands of citizens count birds in designated 15-mile diameter sample areas (). This event has taken place for over one hundred years during the winter holiday season (December 14 to January 5), and has resulted in a large amount of data.

IN-CLASS ACTIVITY

The following data set from the CBC was used in Van Bael and Pruett-Jones (1996) as a proxy for population size. We will use this data set to build a model that describes the population growth of this species.

|Year |NumberOfMonkParakeetsPerPartyHour(effor|

| |t) |

|76 |0.030 |

|77 |0.031 |

|78 |0.035 |

|79 |0.027 |

|80 |0.041 |

|81 |0.048 |

|82 |0.026 |

|83 |0.048 |

|84 |0.064 |

|85 |0.083 |

|86 |0.177 |

|87 |0.121 |

|88 |0.234 |

|89 |0.200 |

|90 |0.393 |

|91 |0.414 |

|92 |0.436 |

|93 |0.387 |

|94 |0.387 |

|95 |0.501 |

a) Graph the data in EXCEL. Set year 1976 equal to 76 on your graph. Describe the growth. (Tab “Parakeet Data”)

b) The accompanying paper mentions “exponential population growth” in its title. How can you decide whether an exponential function is a suitable description? (We will talk more about this in the second module.)

c) What assumptions are made to infer population size from the CBC data?

Exponential Growth

Deterministic Models

Many species reproduce in distinct seasons. Populations of such species are then best modeled using discrete time models. In their simplest form where the population size at generation t+1, denoted by Nt+1, only depends on the population size at time t, denoted by Nt, we can model this recursively as

[pic]

As the initial condition, we need to specify the population size at time 0, N0. If the population size from one generation to the next is multiplied by a constant factor, exponential growth results. The recursive equation for this type of growth is

(1) [pic]

where R is a nonnegative constant, called the growth parameter. We assume that [pic]. To use this recursion, we need to specify the population size at time 0, N0. This model can be solved explicitly, that is, we can find a function g(t) that describes the population size explicitly as a function of t. Here is how to find the function:

[pic]

Thus,

(2) [pic]

is the solution of (1). It is an exponential function. Figure 1 shows the behavior of the graph of this function.

[pic]

Figure 1: Exponential growth (R=1.1) and decay (R=0.9)

Note that for [pic], the population size is decreasing exponentially (exponential decay), whereas for [pic], it is increasing exponentially (exponential growth). We can express the long-term behavior formally using limits. Namely,

[pic]

In this model, the population size is viewed as a continuous variable. This is an approximation since a population size is measured in number of individuals, which is a discrete variable. In the model, if the initial population size is positive, the population size will remain positive for all times. As a consequence, if a population is declining in size, the population size will approach zero but never be equal to zero, and population sizes that are unrealistic, namely positive but less than one, will result. Modeling a population size with a continuous variable has the advantage that analytical tools are better developed for continuous variables than for discrete variables. Having tools for rigorous analysis allows us to go beyond simulations, which makes the results more robust.

A Very Brief Introduction to EXCEL and MATLAB

IN-CLASS ACTIVITY

EXCEL

We will use an EXCEL spreadsheet (Tab: “Logistic Growth”) to code exponential growth for [pic] and determine the population size for t=1,2,…,20 when the population size is 1 at time 0. Let’s take the recursive equation [Equation (1)] and the explicit function [Equation (2)] and set up a spreadsheet in the following way.

|  |A |B |C |D |

|1 |Discrete Time Logistic Growth |

|2 | | | | |

|3 |Parameters | | | |

|4 |R |N_0 | | |

|5 |0.5 |1 | | |

|6 | | | | |

|7 |Time |Equation (1) |Equation (2) | |

|8 |0 |1 |1 | |

|9 |1 |0.5 |0.5 | |

|10 |2 |0.25 |0.25 | |

|11 |3 |0.125 |0.125 | |

|12 |4 |0.0625 |0.0625 | |

• In Row 5, enter the parameters.

• In Cell A8, enter 0; in Cell A9, enter 1. Highlight both Cells A8 and A9 and drag them down to time 20.

• In Cell B8, enter “=$B$5”, in Cell B9, enter “=$A$5*B8”. Explain why.

• In Cell C8, enter “=$B$5*$A$5^A8”. Explain why.

• Drag Cell B9 down to time 20; drag C8 down to time 20. Note that the two equations yield the same answers (which you should expect).

• You can now change the parameter R in cell A5 and explore the behavior.

Office 2003: Graphical explorations are very valuable and we will graph Nt as a function of t using Equations (1). To graph this, highlight the array you want to include in the graph (i.e., cells A7-A28 and B7-B28), and click on the Chart Wizard on your spreadsheet (follow instructions during lecture). Choose XY (Scatter) from the options and click on Next. The Wizard will guide you through the steps to label the graph. It is important to choose the appropriate chart type. Since this is a discrete time model, you need to plot the values at the discrete times 0,1,2,…, which you can do by choosing the appropriate Chart sub-type. Sometimes it is useful to connect the dots but you should make sure that the markers remain visible.

Office 2007: Graphical explorations are very valuable and we will graph Nt as a function of t using Equations (1). To graph this, highlight the array you want to include in the graph (i.e., cells A7-A28 and B7-B28), and click on the Insert tab. Choose Scatter in the Charts group, and click on the Scatter with only Markers option. This will plot population size as a function of time. It is important to choose the appropriate chart type. Since this is a discrete time model, you need to plot the values at the discrete times 0,1,2,…, which you can do by choosing the appropriate Chart sub-type. Sometimes it is useful to connect the dots but you should make sure that the markers remain visible. Label the graph.

MATLAB

MATLAB is a much more sophisticated software program than EXCEL. We will use both of these programs. EXCEL is very easy to use and gives you immediate feedback. It is excellent for exploring models. However, if you work on a research project, MATLAB is typically more appropriate (unless the model is really simple).

In MATLAB, you write your programs in m-files. We’ll use exponential growth to explain how you get a program running, including creating a plot. Write the following into an m-file. Save the m-file as “expgrowth.”

%exponential growth

clear

r=1.5; %parameter for growth

endtime=21; %end time

popsize=zeros(1,endtime); %initializing the population row vector

popsize(1)=20; %initializing the population size at time 0

generation=0:1:endtime-1; %specifying the generation vector

for i=1:endtime-1

popsize(i+1)=r*popsize(i); %recursive form

end

plot(generation, popsize,'kx')

xlabel('Generation')

ylabel('Population Size')

title(['Exponential Growth with parameter R=', num2str(r)])

To run this in MATLAB, you need to go to the Command Window and type the file name, i.e., expgrowth. This will generate a graph. Make sure that MATLAB looks for the file in the appropriate directory. You might need to change the directory. The command for changing directories is cd(‘directory name’)

Stochastic Models

Stochastic or random processes are ubiquitous in biology. Yet, stochasticity is rarely included in models of population or community dynamics. The theory of stochastic processes is quite well developed but is in general not as straightforward as the analysis of deterministic processes. Simulations can help a great deal to get a better sense of the effects of stochasticity. Let’s again look at the simplest population growth model, exponential growth in discrete time:

[pic]

The parameter R, which is assumed to be positive, is the growth parameter. We investigated the behavior of this model and found that the solution is

[pic]

We can define a critical value where the behavior changes, namely [pic]. For [pic], the population goes extinct, whereas for [pic], the population will grow indefinitely provided [pic]. We assumed in this model that the parameter R is the same in each generation. This, of course, need not be. There are many factors that may change the parameter R from time step to time step. In Figure 2, we compare deterministic exponential growth and stochastic exponential growth. In this figure, we set R=1.05 in the deterministic case and let R vary randomly between 0.85 and 1.25 (i.e., [pic]) in the stochastic case (we will give a precise definition below of “vary randomly”). Figure 2 shows a single realization of this stochastic process and compares it to a deterministic process with the same value of R.

[pic]

Figure 2: Stochastic and deterministic exponential growth.

The simulation of the exponential growth was done in EXCEL. EXCEL has a random number generator that generates pseudo-random variables that are uniformly distributed in the interval (0,1). The function is RAND() and when entered this way into a cell will generate a pseudo-random number. Since this is a function you need to type ‘=RAND()’ into the cell (without the quotation marks). Try it! Use the F9 key to get a different realization. To get an idea of how a population behaves in a random environment, we need to repeat this simulation many times. This is better done in MATLAB.

Before we can code up exponential growth in a random environment in either EXCEL or MATLAB, we need to develop the theory further. We will begin our discussion with defining the uniform distribution and then explain how a computer can generate random numbers that follow this distribution. We will then introduce a number of other distributions and explain how to simulate these.

The Uniform Distribution

A random variable X is uniformly distributed over the interval (a,b) if the probability that the random variable falls into a subinterval of a given length that is contained in (a,b) is proportional to the length of the subinterval.

[pic]

Figure 3: The uniform distribution over the interval (a,b).

[pic]

Another way to describe this distribution is to use the distribution function. A distribution function of a random variable X is defined as [pic]. A distribution function is a nondecreasing function between 0 and 1. It has the property

[pic] and [pic]

If X is uniformly distributed over the interval (0,1), then the distribution function is given by

[pic]

Figure 4: The distribution function of a uniform distribution.

The distribution function of this random variable is a continuous function (Figure 4). We therefore call the random variable a continuous random variable. We can use the distribution function to compute probabilities, such as [pic]. Namely,

[pic]

The uniform distribution can also be described by a density function. A density function [pic] is defined as

[pic]

and has the properties

[pic]

It follows from the fundamental theorem of calculus that [pic] where [pic] is differentiable. Therefore, the density function of a uniformly distributed random variable over the interval (0,1) is given by

[pic]

Task 1

The density function of a uniformly distributed random variable over the interval (a,b), [pic], is given by

[pic]

Find the distribution function and verify that [pic], provided [pic].

Below, we will need the expectation of a random variable. The expected value of a random variable X that is distributed according to a probability distribution with density [pic] is given by

[pic]

if the integral is defined.

Example: Find the expected value of a uniformly distributed random variable over the interval [pic] with [pic].

Solution: Using the definition [pic] and the result of Task 1, we need to evaluate

[pic]

We find that the expected value of X is the midpoint of the interval [pic].

To define the expected value of a function of a random variable, let X be a random variable that is distributed according to a probability distribution with density [pic] and let [pic] be a function. Then

[pic]

if the integral is defined.

Task 2

Let X be a uniformly distributed random variable on the interval (a,b), [pic], with density function

[pic]

(a) Find [pic] for general values of a and b. (b) Find [pic] for general values of a and b. (c) Compare your results in (a) and (b) for a=1 and b=2.

Random Number Generator

Generating pseudo-random numbers on a computer is surprisingly simple. Here is an algorithm that generates pseudo-random variables that are uniformly distributed over (0,1). We define three numbers, a, c, and m, and a positive number, called seed, [pic]. The following algorithm defines a sequence of numbers that generates both a new seed and the pseudo-random number:

(3) [pic]

Here is an example with [pic] and [pic]:

[pic]

The new seed is [pic] and the pseudo-random variable is [pic].

The numbers that are generated this way are not truly random. In fact, with [pic], there are at most 6075 different seeds this algorithm can produce, namely all the integers between 0 and 6074. Once the algorithm produces an integer that has already come up, the pseudo-random numbers will repeat the previous numbers. The shorter the cycle length, the worse the algorithm. In fact, the algorithm we gave above is a really bad random number generator since its cycle length is quite short, less than [pic].

There are tables that list combinations of numbers for a, c, and m that produce better sequences of pseudo-random numbers. There are also more sophisticated algorithms. Researchers have developed criteria to check whether a random number generator is “good.”

IN-CLASS ACTIVITY

Set up a spreadsheet (Tabb: “Random Number Generator”) to generate 10 pseudo-random variables using the algorithm provided in Equation (3) with [pic] and [pic].

|  |A |B |C |D |E |

|1 | | | | | |

|2 | |a |c |m |I_0 |

|3 | |106 |1283 |6075 |4567 |

|4 | | | | | |

|5 | |j |aI_j+c |seed |random number |

|6 | |  |  |4567 |  |

|7 | |0 |485385 |5460 |0.898765432 |

|8 | |1 |  |  |  |

|9 | |2 |  |  |  |

|10 | |3 |  |  |  |

|11 | |4 |  |  |  |

|12 | |5 |  |  |  |

|13 | |6 |  |  |  |

|14 | |7 |  |  |  |

|15 | |8 |  |  |  |

|16 | |9 |  |  |  |

|17 | |10 |  |  |  |

Cell C7: =($B$3*D6+$C$3)

Cell D7: =MOD(C7,$D$3)

Cell E7: =D7/$D$3

EXCEL can produce pseudo-random numbers as well. The random number generator in EXCEL can be called using the command RAND(). It is known to be a bad random number generator since it has a short cycle length. Even though the random number generator in EXCEL should never be used for scientific research, it is good enough for educational purposes to illustrate effects of stochasticity, provided not too many pseudo-random variables are called. If you enter “=RAND()” into a cell in an EXCEL spreadsheet and hit Enter, you will see a number between 0 and 1 appearing in the cell. Every time you hit the F9 key, EXCEL will generate a new pseudo-random number. Try it!

Generating Other Random Variables

Discussion[1]: Suppose you wanted to generate pseudo-random variables that come from a uniform distribution over the interval (0,2) but you only have access to a random number generator that generates uniform random variables over the interval (0,1). How would you proceed?

We wish to generalize this approach to arbitrary continuous distributions. The following result will help us (Figure 5).

Theorem. If X is distributed according to a distribution given by the distribution function [pic], then [pic] is uniformly distributed over the interval (0,1).

Proof. The proof is very short: For [pic],

[pic].

How do we use this theorem? We generate a random variable that is uniformly distributed over (0,1). Call this random variable U. We next compute the inverse function of [pic], denoted by [pic]. The random variable we seek is then [pic].

[pic]

Figure 5: Finding random variables with arbitrary, continuous distribution functions.

Example: Suppose a random number generator that generates uniformly distributed pseudo-random numbers in the interval between (0,1) generated the following two pseudo-random variables: 0.3858 and 0.8627. Use these two pseudo-random variables to generate uniformly distributed pseudo-random variables in the interval (0,2).

Solution: Informally, we already know that we need to multiply each of the pseudorandom variables by 2 since we “stretch” the interval (0,1) by 2 to get a uniform distribution on (0,2). But let’s see how we would proceed more formally. The distribution function of a uniform random variable in the interval (0,2) is given by [pic] for [pic]. To invert the distribution function, we compute

[pic]

which means that [pic] (after interchanging the roles of x and y). Hence, we find

[pic]

The Bernoulli Distribution

The Bernoulli distribution is a discrete distribution. We say that a random variable X is Bernoulli distributed with parameter p if X takes on two values, 0 and 1, with probabilities p and 1-p, respectively. That is,

[pic]

We can use a random number generator to generate Bernoulli distributed random variables as follows. Let U denote the random variable that is uniform on (0,1) and by X the Bernoulli random variable with parameter p. Generate a uniform random variable and assume that the outcome is u. If u is between 0 and p, then assign the value 1 to X, otherwise, assign the value 0.

IN-CLASS ACTIVITY

We will use the random number generator RAND() in EXCEL to generate Bernoulli pseudo-random variables with parameter [pic]. Suppose the parameter p is stored in the cell C3, then the command “=IF(RAND() ................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download