Joel Anderson

ST 371-002

Lecture Summary for 2/15/2011

Homework 0

First, the definition of a probability mass function p(x) and a cumulative distribution function

F(x) is reviewed:
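In standard notation for a discrete random variable X (a reconstruction that assumes the lecture followed Devore's conventions), the two definitions can be written as:

```latex
p(x) = P(X = x), \qquad F(x) = P(X \le x) = \sum_{y \,:\, y \le x} p(y)
```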

Graphically, the drawings of a pmf and a cdf (for discrete random variables) resemble histograms and step functions, respectively. The pmf consists of vertical lines drawn at each possible value, with the height of each line equal to the probability of that value (similar to a histogram). A cdf starts at zero, jumps upward by the probability of each value as it reaches that value on the x axis, and then remains constant until it reaches the next value, creating a step-like shape. For examples of the two functions, see Figure 3.3 on page 93 and Figure 3.6 on page 97 of the Devore textbook.

We now turn to finding different characteristics of probability distributions that are described by pmf's and/or cdf's. First, to find the median of such a distribution, use the cdf to find where F(x_p) = 0.5, where x_p is the median. If an exact value cannot be found, then it must be estimated through interpolation. An example is shown below:

To find the median, find the 40th percentile and the 70th percentile (which are easily found at y = 1 and y = 2, respectively). Then we interpolate:
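As a rough sketch of that step, assuming the interpolation is linear between the points (1, 0.4) and (2, 0.7) on the cdf, the calculation can be written in Python. The function name `interpolate_quantile` is made up for this illustration.

```python
def interpolate_quantile(x_lo, F_lo, x_hi, F_hi, target=0.5):
    """Linearly interpolate the x value at which the cdf would reach `target`.

    (x_lo, F_lo) and (x_hi, F_hi) are the two known cdf points that
    bracket the target probability.
    """
    fraction = (target - F_lo) / (F_hi - F_lo)
    return x_lo + fraction * (x_hi - x_lo)


# 40th percentile at y = 1, 70th percentile at y = 2
median = interpolate_quantile(1, 0.4, 2, 0.7)
print(round(median, 3))  # 1.333
```

The median lands one third of the way from y = 1 to y = 2 because 0.5 is one third of the way from 0.4 to 0.7.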

If a probability mass function is not known, it can be derived from the cdf by measuring the jumps between the "steps" of the cdf. For example, the above function would yield the following pmf, written as a table for continued use:

y      0     1     2     3     4     5     6
p(y)   0     0.4   0.3   0.2   0.1   0     0

We can also check our resulting pmf by making sure that all of the values add to 1: 0 + 0.4 + 0.3 + 0.2 + 0.1 + 0 + 0 = 1.

Using this method, we can go back to a pmf from a cdf, or rederive a cdf from a pmf by summing the probabilities cumulatively (the discrete counterpart of integration). In short, having one of the two makes it possible to get the other.
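A small Python sketch of the round trip may help. The cdf values below are an assumption: F(1) = 0.4 and F(2) = 0.7 come from the median example, and the rest are filled in to match the pmf table.

```python
from itertools import accumulate

ys = [0, 1, 2, 3, 4, 5, 6]
# Assumed cdf values F(y) at each y, consistent with the pmf table
cdf = [0.0, 0.4, 0.7, 0.9, 1.0, 1.0, 1.0]

# cdf -> pmf: each probability is the size of the jump from the previous step
previous_steps = [0.0] + cdf[:-1]
pmf = [round(curr - prev, 10) for prev, curr in zip(previous_steps, cdf)]

# pmf -> cdf: a running total of the probabilities
rebuilt_cdf = [round(total, 10) for total in accumulate(pmf)]

for y, p, F in zip(ys, pmf, rebuilt_cdf):
    print(y, p, F)
```

The rounding only hides floating-point noise; it is not part of the method itself.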

Next, we look at the expected value of a given pmf, which is symbolized by E(X) or μ_X (remember that Greek letters correspond to the actual population, while Roman letters correspond to measurements regarding the sample). The expected value of a probability function can be thought of as the center of mass or balancing point of the function: it is the average of the possible values, with each value weighted by its probability. We calculate it with the following formula:
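In standard notation (assumed here to match Devore's, with D denoting the set of possible values of X), the formula reads:

```latex
E(X) = \mu_X = \sum_{x \in D} x \, p(x)
```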

As an example, we can calculate the expected value of the pmf used earlier:

y      p(y)   y*p(y)
1      0.4    0.4
2      0.3    0.6
3      0.2    0.6
4      0.1    0.4
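Adding up the y*p(y) column gives the expected value. A minimal Python check of that sum, using only the values from the table (the variable names are chosen for illustration):

```python
ys = [1, 2, 3, 4]
probs = [0.4, 0.3, 0.2, 0.1]

# E(Y) is the sum of each value times its probability
expected_value = sum(y * p for y, p in zip(ys, probs))
print(round(expected_value, 10))  # 2.0
```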

For more examples of this calculation or pictures showing the balancing point analogy, see the

section in the Devore textbook starting at page 103.

The variance of a probability function carries the same meaning as it does for other data sets; however, it can be calculated using a shortcut formula that uses the expected value of the function. This makes calculations of the variance much easier.

We can derive the shortcut formula as shown below:
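One standard way to write the derivation (a reconstruction, with μ = E(X) and using the fact that the probabilities sum to 1) is:

```latex
\begin{aligned}
V(X) &= E\big[(X-\mu)^2\big] = \sum_x (x-\mu)^2 \, p(x) \\
     &= \sum_x x^2 \, p(x) - 2\mu \sum_x x \, p(x) + \mu^2 \sum_x p(x) \\
     &= E(X^2) - 2\mu^2 + \mu^2 \\
     &= E(X^2) - \mu^2
\end{aligned}
```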

To demonstrate that using these two formulas gives the same result, we calculate the variance of

the previous function:

y          1     2     3     4
p(y)       0.4   0.3   0.2   0.1
y^2*p(y)   0.4   1.2   1.8   1.6

Variance using shortcut method: 0.4 + 1.2 + 1.8 + 1.6 - 2^2 = 5 - 4 = 1

y                1     2     3     4
y - μ            -1    0     1     2
(y - μ)^2        1     0     1     4
(y - μ)^2*p(y)   0.4   0     0.2   0.4

Variance using original method: 0.4 + 0 + 0.2 + 0.4 = 1

We see that the two methods yield the same result, but that the shortcut significantly decreases

the amount of work involved (and in turn decreases the opportunity for error in calculation).
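The same comparison can be sketched in a few lines of Python, using the pmf from the tables. The variable names are illustrative only.

```python
ys = [1, 2, 3, 4]
probs = [0.4, 0.3, 0.2, 0.1]

mu = sum(y * p for y, p in zip(ys, probs))

# Original method: probability-weighted squared deviations from the mean
var_definition = sum((y - mu) ** 2 * p for y, p in zip(ys, probs))

# Shortcut method: E(Y^2) minus the square of the mean
var_shortcut = sum(y ** 2 * p for y, p in zip(ys, probs)) - mu ** 2

print(round(var_definition, 10), round(var_shortcut, 10))  # 1.0 1.0
```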

Next, we look at what happens to the values when the function is changed in a linear fashion (i.e. X becomes aX + b). For example, suppose that the function y represented the number of Skittles each child has in his or her possession. Next, all children are given one more Skittle (a = 1, b = 1), or perhaps each one is given as many Skittles as they already have (doubling their Skittles, so a = 2, b = 0). Is there an easy way to find the expected value and variance of the new data set without calculating them afresh?

Of course there is, or I wouldn't be discussing this. For the expected value, E(aX + b) = aE(X) + b.


This seems too simple to be true. Why would teachers allow crib sheets on tests if formulas were

ever this easy to remember? We can prove that this transformation is correct using the following

derivation:
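A reconstruction of that derivation, which needs only the definition of the expected value and the fact that the probabilities sum to 1:

```latex
\begin{aligned}
E(aX+b) &= \sum_x (ax+b) \, p(x) \\
        &= a \sum_x x \, p(x) + b \sum_x p(x) \\
        &= a \, E(X) + b
\end{aligned}
```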

Unfortunately, the transformations for the variance and standard deviation are not quite as

simple. They are shown below.
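In standard notation (assumed to match what was presented in class), the two results are:

```latex
V(aX+b) = a^2 \, V(X) = a^2 \sigma_X^2, \qquad \sigma_{aX+b} = |a| \, \sigma_X
```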

Note that b has no effect on the end result. This is because each of the values will still vary from the expected value (which has shifted by b units, remember) by the same amount that it did before, meaning that b does not appear in the variance or standard deviation.
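To see the rules at work on the Skittles example, here is a short Python sketch that recomputes the mean and variance from scratch after each transformation. The helper names `mean_and_variance` and `transformed` are made up for this illustration.

```python
def mean_and_variance(values, probs):
    """Return (expected value, variance) of a discrete distribution."""
    mu = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mu) ** 2 * p for v, p in zip(values, probs))
    return mu, var


ys = [1, 2, 3, 4]
probs = [0.4, 0.3, 0.2, 0.1]
mu, var = mean_and_variance(ys, probs)


def transformed(a, b):
    """Mean and variance of aY + b, computed directly from the new values."""
    return mean_and_variance([a * y + b for y in ys], probs)


plus_one = transformed(1, 1)  # every child is given one more Skittle
doubled = transformed(2, 0)   # every child's Skittles are doubled

print(plus_one)  # mean shifts by 1, variance unchanged
print(doubled)   # mean doubles, variance is multiplied by 4
```

The direct calculations agree with a*E(Y) + b for the mean and a^2 * V(Y) for the variance.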

To illustrate the concepts of the cdf further, number 23 on page 99 of the Devore text was done

in class. The problem as well as the solutions may be found in the attached handout passed out at

the end of class.

The final topic covered in this lecture is the Bernoulli variable, as well as the binomial formula that typically accompanies this class of variables. A Bernoulli variable is one that may take only one of two values: success or failure, 0 or 1, pass or fail, etc. A Bernoulli variable may be used in the binomial formula when the probability of success (how success is defined is left to the user) is given as p. Bernoulli variables have special properties regarding variance and the expected value:
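For a single Bernoulli variable X that equals 1 with probability p and 0 otherwise, the standard results (offered as a reconstruction; the exact form shown in lecture is an assumption) follow directly from the formulas above:

```latex
E(X) = 0 \cdot (1-p) + 1 \cdot p = p, \qquad
V(X) = E(X^2) - \mu^2 = p - p^2 = p(1-p)
```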

And, in general:

With n being the number of trials the variable is given, we may calculate the probability of

having x successes using this formula:
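The binomial formula, written in standard notation (assuming n independent trials that each succeed with probability p), is:

```latex
P(X = x) = \binom{n}{x} \, p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n
```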

This formula comes from the fact that you can model n trials as a tree with n levels and 2^n terminal nodes. To find the probability of having exactly x successes, we label each edge with the respective probability, find the probability of each path that contains x successes, and add them together. Each such path has probability p^x (1-p)^(n-x), and the binomial coefficient counts how many such paths there are. That is what this formula does. For more example problems, see numbers 27, 37, 39, 46, 47, and 71 in Chapter 3 of the Devore text.
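The tree argument can be checked by brute force. The Python sketch below walks all 2^n paths, adds up the probabilities of the paths with x successes, and compares the total with the binomial formula. The function names and the values n = 4, p = 0.3 are chosen only for illustration.

```python
from itertools import product
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials, via the formula."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def tree_probability(x, n, p):
    """Same probability, found by summing over every path in the tree."""
    total = 0.0
    for path in product([0, 1], repeat=n):  # 2**n root-to-leaf paths
        if sum(path) == x:
            path_prob = 1.0
            for outcome in path:
                # Multiply the edge probabilities along this path
                path_prob *= p if outcome == 1 else 1 - p
            total += path_prob
    return total


n, p = 4, 0.3  # illustrative values, not from the lecture
for x in range(n + 1):
    print(x, round(binomial_pmf(x, n, p), 6), round(tree_probability(x, n, p), 6))
```

Enumerating the tree takes time proportional to 2^n, so it is only practical for small n; the formula gives the same answer directly.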
