Hypergeometric Probability Distribution



Hypergeometric Distribution: Examples and Formula

In real-life situations, we study probability from different perspectives. Often the problem is not a straightforward choice of items from a lot: we may have to choose under certain guidelines, and the items we choose may not be replaced, which affects the probability of subsequent choices. For example, from a lot of 20 marbles, of which 15 are blue and 5 are red, you may have to draw 10 marbles. The probability distribution over the possible color combinations of the drawn marbles is an example of a hypergeometric distribution.

The hypergeometric distribution is a probability distribution that is very similar to the binomial distribution. In fact, the binomial distribution is a very good approximation of the hypergeometric distribution as long as you are sampling 5% or less of the population. Therefore, in order to understand the hypergeometric distribution, you should be very familiar with the binomial distribution. You should also be fairly comfortable with the combinations formula.

Hypergeometric Probability Distribution

The hypergeometric probability distribution is similar to the binomial distribution, but the sampling is ‘without replacement’. In general, a probability distribution is hypergeometric when:

- The sample drawn from a population of N items is not replaced.
- All the items in the sample are open for trial.
- The N items are classified into two groups, marked (M) and unmarked (N - M).
- The values of M and N are known and fixed.
- The sample of size n is drawn from the N items, and ‘x’ is the number of successes (marked items) among the n items drawn.

From these conditions it follows that ‘x’ must be less than or equal to the minimum of the two values M and n, and greater than or equal to the maximum of the two values 0 and (M + n - N).

Now let us look more closely at how a hypergeometric probability is computed.
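The limits on ‘x’, namely max(0, M + n - N) at the low end and min(M, n) at the high end, can be checked by brute force on the marble example. The sketch below is illustrative only: it enumerates every possible 10-marble sample from 15 blue and 5 red marbles and tallies how many blue marbles each sample contains. The names `marbles`, `counts`, `lower` and `upper` are assumptions introduced here, not part of the original text.

```python
from collections import Counter
from itertools import combinations

# Marble example: N = 20 marbles, M = 15 blue ("marked"), sample size n = 10.
N, M, n = 20, 15, 10
marbles = [1] * M + [0] * (N - M)   # 1 = blue, 0 = red

# Enumerate every distinct 10-marble sample and count the blue marbles in it.
counts = Counter(sum(sample) for sample in combinations(marbles, n))
total = sum(counts.values())        # number of possible samples

# Bounds on the number of blue marbles in a sample.
lower, upper = max(0, M + n - N), min(M, n)

print("possible numbers of blue marbles:", sorted(counts))
print("bounds from the formula:", lower, "to", upper)
for x in sorted(counts):
    print(f"{x} blue, {n - x} red: probability {counts[x] / total:.4f}")
```

Because only 5 marbles are red, a sample of 10 can never contain fewer than 5 blue marbles, which is what the lower bound expresses.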
First, you must correctly identify the value of M, because in some cases it is not stated explicitly. Sampling in this distribution is ‘without replacement’. The ‘x’ successes can be drawn from the ‘M’ marked items in a number of different combinations, and the remaining (n - x) items must then come from the (N - M) unmarked items. Therefore, the number of favorable outcomes is the product of the number of combinations of M items taken ‘x’ at a time and the number of combinations of (N - M) items taken (n - x) at a time. The total number of possible outcomes is the number of combinations of N items taken ‘n’ at a time. If you need a brush-up, see: What are Combinations? and Binomial Distributions.

Hypergeometric Distribution Formula

The (somewhat formal) definition for the hypergeometric distribution, where X is a random variable, is:

$$P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

Where:

- K is the number of successes in the population (called M above)
- k is the number of observed successes (called x above)
- N is the population size
- n is the number of draws (sample size)

You could just plug your values into the formula, but a much easier way is to think through the problem, using your knowledge of combinations.

Hypergeometric Distribution Example 1

A deck of cards contains 20 cards: 6 red cards and 14 black cards. 5 cards are drawn randomly without replacement. What is the probability that exactly 4 red cards are drawn?

The probability of choosing exactly 4 red cards is:

P(4 red cards) = (# of samples with 4 red cards and 1 black card) / (# of possible 5-card samples)

Using the combinations formula, the problem becomes:

$$P(4 \text{ red cards}) = \frac{\binom{6}{4}\binom{14}{1}}{\binom{20}{5}}$$

In shorthand, the above formula can be written as:

(6C4 × 14C1) / 20C5

where

- 6C4 means that out of 6 possible red cards, we are choosing 4.
- 14C1 means that out of a possible 14 black cards, we are choosing 1.

Solution = (6C4 × 14C1) / 20C5 = (15 × 14) / 15504 = 0.0135

The binomial distribution doesn’t apply here, because the cards are not replaced once they are drawn.
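The arithmetic in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch, assuming Python 3.8+ for `math.comb`; the function name `hypergeom_pmf` and its argument order are choices made here for illustration.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement from a
    population of N items that contains K successes."""
    # Outside the feasible range the probability is zero.
    if k < max(0, n - (N - K)) or k > min(K, n):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Example 1: 20 cards, 6 red, 5 drawn, exactly 4 red.
p_four_red = hypergeom_pmf(4, N=20, K=6, n=5)
print(round(p_four_red, 4))  # 0.0135
```

Summing `hypergeom_pmf(k, 20, 6, 5)` over k = 0, ..., 5 gives 1, a quick sanity check that the counts in the numerator cover every possible 5-card sample exactly once.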
In other words, when cards are drawn without replacement the trials are not independent events, because each card drawn changes the composition of the deck. For example, the probability of drawing a red card is 6/20 on the first draw. If that card is red, the probability of drawing another red card falls to 5/19.

Hypergeometric Distribution Example 2

A small voting district has 101 female voters and 95 male voters. A random sample of 10 voters is drawn. What is the probability that exactly 7 of the voters will be female?

(101C7 × 95C3) / 196C10 = (17199613200 × 138415) / 18257282924056176 = 0.130

Where:

- 101C7 is the number of ways of choosing 7 female voters from 101
- 95C3 is the number of ways of choosing 3 male voters from 95
- 196C10 is the number of ways of choosing 10 voters from the total of 196

Question 1: A box of 20 marbles contains 15 blue and 5 red. You need to draw a lot of 10 marbles at random. Find the probability of drawing 6 blue marbles in the lot drawn.

Question 2: In the same box described in Question 1, 10 green marbles are added. Suppose 10 marbles are drawn at random. Find the probability of drawing a combination of 5 blue marbles, 2 red marbles and 3 green marbles.

Question 3: A public accounts committee of 5 persons is to be formed from 12 members of the ruling party and 8 members of the opposition party. What is the probability that 3 members of the opposition party are selected?

The Binomial Distribution

The binomial distribution is a probability distribution that summarizes the likelihood that a variable will take one of two possible values under a given set of parameters or assumptions. The underlying assumptions of the binomial distribution are that each trial has only two possible outcomes, that each trial has the same probability of success, and that the trials are independent of one another.

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
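With both distributions now described, the earlier remark that the binomial approximates the hypergeometric when about 5% or less of the population is sampled can be tried on Example 2, where 10 of 196 voters (roughly 5.1%) are drawn. The sketch below is only an illustration; `hypergeom_pmf` and `binom_pmf` are helper names assumed here, and using p = 101/196 as the binomial success probability is the natural but assumed choice.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """Exact probability of k successes in n draws without replacement."""
    if k < max(0, n - (N - K)) or k > min(K, n):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    """Probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example 2: 196 voters, 101 female, sample of 10, exactly 7 female.
exact = hypergeom_pmf(7, N=196, K=101, n=10)
approx = binom_pmf(7, n=10, p=101 / 196)

print(f"hypergeometric (exact):   {exact:.4f}")
print(f"binomial (approximation): {approx:.4f}")
```

The two values agree to about two decimal places, which is consistent with the rule of thumb; the gap would widen if a larger fraction of the district were sampled.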
A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution remains a good approximation, and is widely used.

In many cases, it is appropriate to summarize a group of independent observations by the number of observations in the group that represent one of two outcomes. For example, the proportion of individuals in a random sample who support one of two political candidates fits this description. In this case, the statistic $\hat{p}$ is the count X of voters who support the candidate divided by the total number of individuals in the group n. This provides an estimate of the parameter p, the proportion of individuals who support the candidate in the entire population.

The binomial distribution describes the behavior of a count variable X if the following conditions apply:

1: The number of observations n is fixed.
2: Each observation is independent.
3: Each observation represents one of two outcomes ("success" or "failure").
4: The probability of "success" p is the same for each observation.

If these conditions are met, then X has a binomial distribution with parameters n and p, abbreviated B(n,p).

Example

Suppose individuals with a certain gene have a 0.70 probability of eventually contracting a certain disease.
If 100 individuals with the gene participate in a lifetime study, then the random variable describing the number of individuals who will contract the disease is distributed B(100,0.7).

Note: The sampling distribution of a count variable is only well-described by the binomial distribution in cases where the population size is significantly larger than the sample size. As a general rule, the binomial distribution should not be applied to observations from a simple random sample (SRS) unless the population size is at least 10 times larger than the sample size.

To find probabilities from a binomial distribution, one may either calculate them directly, use a binomial table, or use a computer. The number of sixes rolled by a single die in 20 rolls has a B(20,1/6) distribution. The probability of rolling more than 2 sixes in 20 rolls, P(X>2), is equal to 1 - P(X≤2) = 1 - (P(X=0) + P(X=1) + P(X=2)). Using the MINITAB command "cdf" with subcommand "binomial n=20 p=0.166667" gives the cumulative distribution function as follows:

Binomial with n = 20 and p = 0.166667

x    P(X ≤ x)
0    0.0261
1    0.1304
2    0.3287
3    0.5665
4    0.7687
5    0.8982
6    0.9629
7    0.9887
8    0.9972
9    0.9994

[Figure: probability mass function and cumulative distribution function of the B(20,1/6) distribution]

Since the probability of 2 or fewer sixes is equal to 0.3287, the probability of rolling more than 2 sixes = 1 - 0.3287 = 0.6713.

The probability that a random variable X with binomial distribution B(n,p) is equal to the value k, where k = 0, 1, ..., n, is given by

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad \text{where} \quad \binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$

The latter expression is known as the binomial coefficient, stated as "n choose k," or the number of possible ways to choose k "successes" from n observations. For example, the number of ways to achieve 2 heads in a set of four tosses is "4 choose 2", or 4!/(2!2!) = (4*3) / (2*1) = 6.
The possibilities are {HHTT, HTHT, HTTH, TTHH, THHT, THTH}, where "H" represents a head and "T" represents a tail. The binomial coefficient multiplies the probability of one of these possibilities (which is (1/2)²(1/2)² = 1/16 for a fair coin) by the number of ways the outcome may be achieved, for a total probability of 6/16.

Mean and Variance of the Binomial Distribution

The binomial distribution for a random variable X with parameters n and p represents the sum of n independent variables Z which may assume the values 0 or 1. If the probability that each Z variable assumes the value 1 is equal to p, then the mean of each variable is equal to 1*p + 0*(1-p) = p, and the variance is equal to p(1-p). By the addition properties for independent random variables, the mean and variance of the binomial distribution are equal to the sum of the means and variances of the n independent Z variables, so

$$\mu_X = np, \qquad \sigma_X^2 = np(1-p).$$

These definitions are intuitively logical. Imagine, for example, 8 flips of a coin. If the coin is fair, then p = 0.5. One would expect the mean number of heads to be half the flips, or np = 8*0.5 = 4. The variance is equal to np(1-p) = 8*0.5*0.5 = 2.

Sample Proportions

If we know that the count X of "successes" in a group of n observations with success probability p has a binomial distribution with mean np and variance np(1-p), then we are able to derive information about the distribution of the sample proportion, the count of successes X divided by the number of observations n. By the multiplicative properties of the mean, the mean of the distribution of X/n is equal to the mean of X divided by n, or np/n = p. This proves that the sample proportion $\hat{p}$ is an unbiased estimator of the population proportion p. The variance of X/n is equal to the variance of X divided by n², or (np(1-p))/n² = (p(1-p))/n. This formula indicates that as the size of the sample increases, the variance decreases.
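The binomial results quoted so far (6/16 for two heads in four tosses, the 0.6713 tail probability for the die, and the mean 4 and variance 2 for eight coin flips) can be checked directly from the probability formula. The following is a minimal sketch; `binom_pmf` and `binom_cdf` are helper names assumed for this illustration, and the mean and variance are computed by summing over the distribution rather than from the closed forms np and np(1-p).

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

# Two heads in four tosses of a fair coin: 6/16.
two_heads = binom_pmf(2, 4, 0.5)

# More than 2 sixes in 20 rolls of a die: 1 - P(X <= 2).
more_than_two_sixes = 1 - binom_cdf(2, 20, 1 / 6)

# Mean and variance of the number of heads in 8 fair coin flips,
# computed by direct summation over k = 0, ..., 8.
n, p = 8, 0.5
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
variance = sum((k - mean) ** 2 * binom_pmf(k, n, p) for k in range(n + 1))

print(two_heads)                      # 0.375
print(round(more_than_two_sixes, 4))  # 0.6713
print(mean, variance)
```

Printing `binom_cdf(x, 20, 1/6)` for x = 0, ..., 9 should reproduce the cumulative table quoted from MINITAB to four decimal places.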
In the example of rolling a six-sided die 20 times, the probability p of rolling a six on any roll is 1/6, and the count X of sixes has a B(20, 1/6) distribution. The mean of this distribution is 20/6 = 3.33, and the variance is 20*1/6*5/6 = 100/36 = 2.78. The mean of the proportion of sixes in the 20 rolls, X/20, is equal to p = 1/6 = 0.167, and the variance of the proportion is equal to (1/6*5/6)/20 = 0.007.

Normal Approximations for Counts and Proportions

For large values of n, the distributions of the count X and the sample proportion $\hat{p}$ are approximately normal. This result follows from the Central Limit Theorem. The mean and variance for the approximately normal distribution of X are np and np(1-p), identical to the mean and variance of the binomial(n,p) distribution. Similarly, the mean and variance for the approximately normal distribution of the sample proportion are p and (p(1-p)/n).

Note: Because the normal approximation is not accurate for small values of n, a good rule of thumb is to use the normal approximation only if np>10 and np(1-p)>10.

For example, consider a population of voters in a given state. The true proportion of voters who favor candidate A is equal to 0.40. Given a sample of 200 voters, what is the probability that more than half of the voters support candidate A?

The count X of voters in the sample of 200 who support candidate A is distributed B(200,0.4). The mean of the distribution is equal to 200*0.4 = 80, and the variance is equal to 200*0.4*0.6 = 48. The standard deviation is the square root of the variance, 6.93. The probability that more than half of the voters in the sample support candidate A is equal to the probability that X is greater than 100, which is equal to 1 - P(X ≤ 100).

To use the normal approximation to calculate this probability, we should first acknowledge that the normal distribution is continuous and apply the continuity correction.
This means that the probability for a single discrete value, such as 100, is extended to the probability of the interval (99.5, 100.5). Because we are interested in the probability that X is less than or equal to 100, the normal approximation applies to the upper limit of the interval, 100.5. If we were interested in the probability that X is strictly less than 100, then we would apply the normal approximation to the lower end of the interval, 99.5.

So, applying the continuity correction and standardizing the variable X gives the following:

1 - P(X ≤ 100)
≈ 1 - P(X < 100.5)
= 1 - P(Z < (100.5 - 80)/6.93)
= 1 - P(Z < 20.5/6.93)
= 1 - P(Z < 2.96) = 1 - 0.9985 = 0.0015.

Since the value 100 is nearly three standard deviations away from the mean 80, the probability of observing a count this high is extremely small.

Median of the Binomial Distribution

- If np is an integer, then the mean, median, and mode coincide and equal np.
- Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.
- A median m cannot lie too far away from the mean: |m - np| ≤ min{ln 2, max{p, 1 - p}}.
- The median is unique and equal to m = round(np) in cases when either p ≤ 1 - ln 2 or p ≥ ln 2 or |m - np| ≤ min{p, 1 - p} (except for the case when p = 1/2 and n is odd).
- When p = 1/2 and n is odd, any number m in the interval (n - 1)/2 ≤ m ≤ (n + 1)/2 is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.

The Poisson Probability Distribution

The Poisson distribution was developed by the French mathematician Siméon Denis Poisson in 1837.
The Poisson random variable satisfies the following conditions:

1: The number of successes in two disjoint time intervals is independent.
2: The probability of a success during a small time interval is proportional to the entire length of the time interval.

Apart from disjoint time intervals, the Poisson random variable also applies to disjoint regions of space.

Applications

- the number of deaths by horse kicking in the Prussian army (first application)
- birth defects and genetic mutations
- rare diseases (like leukemia, but not AIDS because it is infectious and so not independent), especially in legal cases
- car accidents
- traffic flow and ideal gap distance
- number of typing errors on a page
- hairs found in McDonald's hamburgers
- spread of an endangered animal in Africa
- failure of a machine in one month

The probability distribution of a Poisson random variable X representing the number of successes occurring in a given time interval or a specified region of space is given by the formula:

$$P(x; \mu) = \frac{e^{-\mu}\,\mu^{x}}{x!}$$

where

- x = 0, 1, 2, 3, ...
- e = 2.71828 (but use your calculator's e button)
- μ = mean number of successes in the given time interval or region of space

Mean and Variance of Poisson Distribution

If μ is the average number of successes occurring in a given time interval or region in the Poisson distribution, then the mean and the variance of the Poisson distribution are both equal to μ.

E(X) = μ and V(X) = σ² = μ

Note: In a Poisson distribution, only one parameter, μ, is needed to determine the probability of an event.

Example 1

A life insurance salesman sells on average 3 life insurance policies per week.
Use Poisson's law to calculate the probability that in a given week he will sell

(a) some policies
(b) 2 or more policies but less than 5 policies.
(c) Assuming that there are 5 working days per week, what is the probability that in a given day he will sell one policy?

Example 2

Twenty sheets of aluminum alloy were examined for surface flaws. The frequency of the number of sheets with a given number of flaws per sheet was as follows:

Number of flaws    Frequency
0                  4
1                  3
2                  5
3                  2
4                  4
5                  1
6                  1

What is the probability of finding a sheet chosen at random which contains 3 or more surface flaws?

Example 3

If electricity power failures occur according to a Poisson distribution with an average of 3 failures every twenty weeks, calculate the probability that there will not be more than one failure during a particular week.

Example 4

Vehicles pass through a junction on a busy road at an average rate of 300 per hour.

(a) Find the probability that none passes in a given minute.
(b) What is the expected number passing in two minutes?
(c) Find the probability that this expected number actually passes through in a given two-minute period.

Example 5

A company makes electric motors. The probability an electric motor is defective is 0.01. What is the probability that a sample of 300 electric motors will contain exactly 5 defective motors?
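The Poisson examples above are stated without solutions. The sketch below shows one way they might be worked with the Poisson formula; it is not taken from the original text. Two modelling choices are assumptions made here: in Example 2 the mean μ is estimated as the average number of flaws per sheet in the table, and in the electric-motor problem the Poisson distribution with μ = np is used as an approximation to the binomial. The helper name `poisson_pmf` is also introduced only for this illustration.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(x; mu) = e^(-mu) * mu^x / x!"""
    return exp(-mu) * mu**x / factorial(x)

# Example 1: mu = 3 policies per week.
p_some = 1 - poisson_pmf(0, 3)                         # at least one policy
p_2_to_4 = sum(poisson_pmf(x, 3) for x in (2, 3, 4))   # 2 <= X < 5
p_one_in_a_day = poisson_pmf(1, 3 / 5)                 # mu = 3/5 per day

# Example 2: mu estimated from the frequency table (assumption).
flaws = {0: 4, 1: 3, 2: 5, 3: 2, 4: 4, 5: 1, 6: 1}
mu_flaws = sum(k * f for k, f in flaws.items()) / sum(flaws.values())
p_3_or_more = 1 - sum(poisson_pmf(x, mu_flaws) for x in range(3))

# Example 3: 3 failures per 20 weeks, so mu = 0.15 per week.
p_at_most_one = poisson_pmf(0, 3 / 20) + poisson_pmf(1, 3 / 20)

# Example 4: 300 vehicles per hour = 5 per minute.
per_minute = 300 / 60
p_none_in_minute = poisson_pmf(0, per_minute)
expected_two_minutes = 2 * per_minute
p_expected = poisson_pmf(int(expected_two_minutes), expected_two_minutes)

# Electric motors: Poisson approximation with mu = n * p (assumption).
p_five_defective = poisson_pmf(5, 300 * 0.01)

for name, value in [
    ("Ex 1 some policies", p_some),
    ("Ex 1 two to four", p_2_to_4),
    ("Ex 1 one in a day", p_one_in_a_day),
    ("Ex 2 three or more flaws", p_3_or_more),
    ("Ex 3 at most one failure", p_at_most_one),
    ("Ex 4 none in a minute", p_none_in_minute),
    ("Ex 4 expected count occurs", p_expected),
    ("Motors, exactly 5 defective", p_five_defective),
]:
    print(f"{name}: {value:.4f}")
```

For the motors, the exact binomial probability B(300, 0.01) could be computed instead; the Poisson value is used here because n is large and p is small, which is the usual setting for this approximation.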