Module II



Module II

Lecture 4

Special Probability Distributions

Certain probability distributions occur with such regularity in real-life situations that they have been given their own names and it is worth studying their properties.

In this section, we look at three probability distributions that arise in almost every aspect of business.

The Binomial Distribution

Consider the following situations:

a) You audit a transaction, it is either in compliance with procedures or it is not;

b) You hire a person, that person is either a female or a male;

c) You visit a customer, it either leads to a sale or it doesn’t;

d) You lower the price of a product, sales either increase or they don’t;

e) You have a model for the stock market, you predict that it will go up at least 30 points, it either goes up 30 or more points or it doesn’t;

f) You have an intermittent problem on your company’s network, on any given day the problem either appears or doesn’t appear.

All of these situations are ones where the Binomial Distribution may be applicable.

There is a canonical definition for the binomial distribution. This is, a set of assumptions, which, if they hold, indicate that the binomial distribution may be applied to a particular situation.

Let us suppose that in a given situation only one of two possible things can occur. For example, if we flip a fair coin then we can only get the outcomes heads or tails. Flipping the coin is called an experiment in statistical jargon, and heads or tails are called the possible outcomes of the experiment. Each repetition of the experiment is called a trial.

The binomial distribution applies in the following situation:

a) The outcome of any trial can only take on two possible values, say success and failure;

b) There is a constant probability p of success on each trial;

c) The experiment is repeated n times (i.e. n trials are conducted);

d) The trials are statistically independent (i.e. the outcome of past trials does not affect subsequent trials);

then if x equals the number of successes in the n trials, we have:

for x = 0, 1, 2, …… n.

For example, if we flipped a fair coin ten times, and let x equal the number of heads, the above formula would give the following probabilities:

[pic]

Graphically, the probability distribution looks like:

[pic]

If we used a biased coin so that the probability of getting a head is only .3, then the probability distribution would look like:

[pic]

If the coin were extremely biased so that the probability of heads was only .05, then the distribution would look like:

[pic]

EXCEL allows one to compute the binomial probability distribution directly. The form of the function is:

=binomdist(x, n, p, condition),

where x is the value of interest, n is the number of trials, p is the probability of success and condition is either “false” or “true”.

If you specify the following command,

=binomdist(3, 10, .50, false),

then EXCEL will compute the probability that x = 3.

If you use the command,

=binomdist(3, 10, .50, true),

then Excel will compute the probability that [pic]. In other words EXCEL will accumulate the probabilities for x = 0, x = 1, x = 2, and x = 3 and report the total.

The following table shows the use of both conditions in the case where n = 10, and p = .5:

[pic]

One can show that,

E(x) = np,

and,

If instead of x, the number of successes, we are interested in

that is the proportion of successes in n trials, then one can show that

and,

In our case,

E(x) = 10 * .5 = 5,

and,

Let us apply the binomial distribution to a more practical problem then flipping coins.

Suppose that you are going to hire 10 persons from a pool of qualified candidates which is 30 % women. You find that only 1 woman, and 9 men were hired. Is this evidence that the firm is discriminating against women in hiring?

The first question is to determine if the binomial distribution is applicable.

Clearly each hire can only be a man or woman so there are only two possible outcomes.

The hires are probably independent of one another.

The major problem is whether or not the probability of success, p, is constant from trial to trial. If there were only a total of 20 applicants, 6 women and 14 men, then if you hired one of the women on the first hire, that would leave 5 women and 14 men which would mean the probability of hiring a woman for the second hire could only be

5 / 19 = .2632,

which is very large change from the initial probability of .30. On the other hand if the hiring pool consisted of 100 applicants, 30 women and 70 men, then if you hired one of the women on the first hire, that would leave 29 women and 70 men, which would mean the probability of hiring a woman for the second hire would be

29 / 99 = .2929

which is a very small change.

The actual probability distribution to use is called the hypergeometric distribution. However it is well know that if the probability p in the binomial distribution does not change much from trial to trial, then the results from the hypergeometric distribution and the binomial distribution are almost identical.

Assuming that the probability of hiring a woman does not change much over the 10 hires, then we can reasonably assume that the probability is approximately constant over the 10 hires and the assumptions of the binomial distribution are approximately fulfilled.

The next problem we face is that any value between 0 and 10 can possibly occur. Indeed the probability distribution for this situation, i.e. binomial with n = 10, and p = .3, is given in the table below:

[pic]

Notice that although any value is possible the values are not all equally probable. For example it would not at all seem odd if we hired 3 women or 2 women or 4 women since these values all have reasonably high probabilities. On the other hand, it would seem odd if we hired 10 women since the probability of this outcome is approximately 6 chances in 1, 000, 000. Almost as rare as winning the Texas Lottery!!

Statistical logic works like this:

a) define what you think is a rare event (most users of statistics define rare as 1 chance in 20 [.05] or 1 chance in 100 [.01]);

b) if the probability of the observed result or anything more extreme is less than what you define as rare, then the assumed value of p is suspect.

In our case we observed 1 female hire. More extreme is to hire 0 women. Therefore we want the probability of observing 1 or fewer women. This can be obtained directly from the above table or by using the =binomdist(1, 10, .3, true) command.

The result is a probability of .149308. This is roughly a chance of 1 in 7 which most people would not think is rare. Accordingly this data would not be suggestive of disproportionate hiring of women. Of course if it happened more than once, it might be indicative. Suppose a month later the same thing happens. The probability of hiring 1 woman in 10 hires from a pool that is 30% women twice would be:

(.149308) * (.149308) = .0223

which for many people would make one suspect of a fair hiring environment.

Graphically, the probability distribution we would expect for x = the number of women hired when you are hiring for 10 positions (n) from a pool of qualified applicants which is 30% female ( p = .30) is given below with the observed and more extreme value highlighted:

Now let us leave the percentage of women in the qualified pool the same as previously (i.e. p =.3) but now hire for 50 positions (n=50). And assume again we only hire 10% women (x=5). Then the probability distribution would look like:

As can be seen, hiring only 5 women in 50 hires from a pool of 30 % women, is a relatively rare event. Using the binomdist function I can compute:

P(x 5.

For example in the case discussed previously with n = 50 and p = .3, we would have

( = 50 * .3 = 15

[pic]

and in this case np = 50(.3) = 15 and n(1-p) = 50 (.7) = 35 so that the approximation should be good.

The two curves are plotted below:

[pic]

Notice however that the normal curve is continuous while the binomial distribution is discrete with nothing between the values of say 17 and 18.

This distinction between discrete and continuous is important. For the binomial distribution

[pic]

so that a distinction must be made between "less than or equal" and "less than". For the normal distribution however,

[pic]

sin

ce there is no probability that x will exactly equal 10 (by exactly we mean to an infinite number of decimal places).

We can get around this problem of continuous versus discrete by the use of what is called the "continuity correction". Simply it says that when using the normal distribution to approximate a discrete distribution (such as the Binomial or Poisson), assume that any discrete value, say 10, actually goes half way between the previous discrete value and the subsequent discrete value. In other words when using the normal distribution we assume that 10 actually goes from 9.5 to 10.5; 29 would go from 28.5 to 29.5, etc. If we let k be one of the discrete values, then the following five relationships illustrate the use of the continuity correction in all possible cases:

[pic]

[pic]

[pic]

[pic]

[pic]

Fortunately, in EXCEL we can use the binomdist function for most cases and need to use the normal approximation to the Binomial only infrequently.

The Poisson distribution can also be approximated by the normal distribution if the Poisson parameter ( > 5. By taking ( as given and taking the standard deviation as the square root of (, we can approximate the Poisson distribution with the normal distribution.

The following graph shows the Poisson distribution with mean of 5 and the normal distribution with a mean of 5 and standard deviation equal to 2.2361 (the square root of 5):

[pic]

Again if using the normal distribution to compute probabilities for the Poisson distribution one must correct for continuity in the same way as was done for the binomial distribution.

It is very easy to simulate data which follows the normal distribution using the norminv function in EXCEL.

The first step is to generate a set of random numbers using the rand() function as we have done before. Some sample data is shown below:

[pic]

Now suppose we were looking at an investment with a mean return of 8% with a risk (sd) of 2%. Then we could generate 50 normally distributed returns by applying the function

=norminv(random number, .08, .02)

to the 50 random numbers previously generated to get:

[pic]

Finally, one should check that the simulation is approximately on target.

For this data, the sample mean of the simulated values is .0748 compared to the theoretical value of .08. The standard deviation of the simulated values if .0210 compared to the theoretical value of .02. Finally, I have done a histogram of the simulated values which is shown below:

[pic]

As can be seen it is approximately normally distributed.

What if we wanted to simulate correlated investments as we studied earlier in this module? Specifically suppose we wanted to simulate 25 years of returns on two investments. The first having a mean return of .08 and a standard deviation of .02 and the second having a mean return of .12 with a standard deviation of .05. And suppose that the investments are correlated with a correlation coefficient of -.4.

The difficult part is generating the values so they are correlated. Fortunately there is a theoretical result that says if z1 and z2 are two independent random variables each with mean 0 and standard deviation 1, and we define two new variables x1 and x2 with the equations:

[pic]

[pic]

then x1 and x2 will both still have mean 0 and standard deviation 1, but now the x's will be correlated with correlation coefficient r.

Let us illustrate this procedure in steps. First generate two columns of 25 random numbers using the =rand() EXCEL function. The data would look like:

[pic]

Next use the EXCEL function =normsinv(random number) to generate two columns of uncorrelated normal random variables with mean zero and standard deviation 1. The data would look something like this:

[pic]

Using the =correl(z1, z2) we get a correlation of .017693 compared to the theoretical value of 0.

Next we implement the formula given above to induce the appropriate correlation, in this case r = -.4. The formula would look like:

[pic]

The actual formula contained in the cell for x1 in shown on the following screen shot:

[pic]

The actual formula for the cell for x2 is shown in the screen shot below:

[pic]

After transforming all the pairs, one would get the following result:

[pic]

Finally we need to adjust the generated values to have the appropriate means and standard deviations. The first investment is supposed to have a mean of .08 and a standard deviation of .02, therefore we create the new variable

.08 + .02 * x1

The second investment is supposed to have a mean of .12 and a standard deviation of .05, so we create the new variable

.12 + .05 * x2

The final results are shown below:

[pic]

As can be seen the simulation results agree reasonably with the theoretical values.

I could now simulate what would happen for a 25 year period into the future if I invested $10,000 in each investment. I just need to add one to the simulated returns and cumulate the investment history as shown below:

[pic]

By simply pressing the F9 key, I would recompute all of the values in the above simulation to get results such as:

[pic]

And another realization of a 25 year investment as:

[pic]

By repeatedly recording the Total Portfolio Value after 25 years for each simulation, I could obtain the mean and variation of the results. This is illustrated in the table below for ten simulations of the twenty five year period:

[pic]

Notice that the SD (risk) of the return after 25 years of this portfolio is very high. By Chebyshev's rule the 25 year return would vary between approximately $153,200 and $280,200 at least 75 % of the time.

Note: 10 is a very small number of simulations. 100 would be much better

Simulating a Decision Problem

In Lecture 3 of this module we discussed how to use decision trees to help make decisions in an uncertain environment. It is possible to simulate the problem outlined by the decision tree as an alternative to the formal procedures outlined earlier. Below is decision tree of that problem.

[pic]

The first step in simulating the above tree is to devise a way to simulate the uncertain outcomes, that is, the outcomes over which we have no control. In the above tree there are three such outcomes. These are identified by branches labeled with probabilities. Looking at the tree from left to right, we first encounter the uncertainty in what the consultants will say. That is:

[pic]

To simulate this distribution we need only simulate a random number using the =rand() function in EXCEL. If the random number is between 0.00 and 0.22, then the outcome corresponds to the Consultants saying “Low”. If it is between 0.22 and 0.66, then the outcome corresponds to the Consultants saying “Medium” and if the outcome is greater than .66, the outcome corresponds to the Consultants saying “High”.

Again moving from left to right, we encounter the two possible outcomes, A and B. For Investment 1, the probability table would look like:

[pic]

In order to simulate this outcome, we need to generate one random number and compare it to 0.50. If it is less than or equal to 0.50, then we indicate that Outcome A has occurred, otherwise outcome B occurs.

Simulation of Investment 2 is more complicated since the probability distribution depends on what the consultants say. For example if the Consultants say “Low”, then we simulate the Actual State from the following table:

[pic]

If the generated random number is between 0.0000 and 0.8182, then we record that the Actual State was Low; if the random number is between 0.8182 and 0.9318, we record that the Actual State was Medium; and if the random number is greater than 0.9318 we record that the Actual State was High.

If the Consultants say Medium, then the appropriate distribution to simulate is:

[pic]

If the generated random number is between 0.0000 and 0.0227, then we record that the Actual State was Low; if the random number is between 0.0227 and 0.9318, we record that the Actual State was Medium; and if the random number is greater than 0.9318 we record that the Actual State was High.

Finally, if the consultants say High, then the appropriate distribution for the simulation is given by:

[pic]

If the generated random number is between 0.0000 and 0.0294, then we record that the Actual State was Low; if the random number is between 0.0294 and 0.2500, we record that the Actual State was Medium; and if the random number is greater than 0.2500 we record that the Actual State was High.

In the EXCEL file “simulatedecision.xls”, the above procedures are implemented. The results for ten simulations are shown below:

[pic]

We have now finished with all of the random components of the tree, that is, we can now simulate the outcomes over which the manager has no control. Now it is necessary to list the possible decision that the manager can make.

Again looking at the decision tree from left to right, the first major decision is whether or not to hire the Consultants. Following the “Don’t Hire Consultants” branch, the manager must next decide whether to pick Investment 1 or Investment 2.

On the other hand, if one decides to hire the consultants, the consultants will give their report and then the manager must decide between Investment 1 and Investment 2. Since we do not know what the consultant’s will say, a strategy must specify what investment the manager will pick for every possible prediction the consultant’s could make. That is, we must specify an investment choice for the three possible predictions of Low, Medium, and High. Since there are two possible choices for each of the consultant’s possible predictions, there are [pic] possible strategies that must be specified. These are shown below:

[pic]

In total there are 10 possible strategies that the manager can make, two if one doesn’t hire the consultants and eight if one does.

One now adds ten columns, one for each strategy, to the random columns that we specified before. The results would look like:

[pic]

Examine “Repeat 1”. For the No Consultant Investment 1 strategy, the simulation indicated that outcome B occurred for investment 1, so are gain would have been 7 (million dollars). For the No Consultant Investment 2 strategy, the actual state was Medium so we would gain 6 (million dollars). The other eight strategies all involved hiring the consultants at a cost of 1 (million dollars). The simulation indicates that the Consultants said Medium so for Hire Consultant Strategies I, IV, VI and VIII we would gain 6 (million dollars) since the Investment 1 result was B which paid 7 (million dollars) but we paid one million for the consultants. For the Hire Consultant Strategies II, III, V, and VII we would gain 5 (million dollars) since we chose Investment 2 which pays 6 (million dollars) for the Medium actual state but the cost of the consultants was one million dollars. Try working out the results for “Repeat 5”.

Below is an example of fifty repetitions of the simulation along with the average, standard deviation and ratio of mean to standard deviation for all ten strategies.

[pic]

Notice that the highest average of 7.26 million is given by the strategy Don’t

Hire Consultants and go with Investment 2. This agrees with our earlier theoretical results. Similarly, the highest ratio is achieved by the strategy Don’t Hire Consultants and go with Investment 1, which also agrees with our earlier theoretical results.

A simulation of fifty cases is quite small and sometimes the results deviate from theory. For example in the fifty cases simulated below, the highest ratio is computed for Strategy VII rather than the strategy Don’t Hire Consultants and go with Investment I. However, most of the time, Strategy VII does not have the highest ratio. It is important, when simulating data, to make sure that you simulate the situation several hundred times so that random deviations from theory do not mislead decision making.

[pic]

-----------------------

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

[pic]

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download