Iowa State University



CHAPTER 6 Point Estimation

6.1 Introduction

Statistics may be defined as the science of using data to better understand specific traits of chosen variables. In Chapters 1 through 5 we have encountered many such traits. For example, in relation to a 2-D random variable [pic], classic traits include: [pic] (marginal traits for X), [pic] (marginal traits for Y), and [pic] (joint traits for [pic]). Notice that all of these traits are simply parameters. This includes even a trait such as [pic]. For each ordered pair [pic], the quantity [pic] is a scalar-valued parameter.

Definition 1.1 Let [pic] denote a parameter associated with a random variable, X, and let [pic] denote a collection of associated data collection variables. Then the estimator [pic] is said to be a point estimator of [pic].

Remark 1.1 The term ‘point’ refers to the fact that we obtain a single number as an estimate; as opposed to, say, an interval estimate.

Much of the material in this chapter will refer to material in the course textbook, so that the reader may go to that material for further clarification and examples.

6.2 The Central Limit Theorem (CLT) and Related Topics (mainly Ch.8 of the textbook)

Even though we have discussed the CLT on numerous occasions, because it plays such a central role in point estimation we present it again here.

Theorem 2.1 (Theorem 8.3 of the textbook) (Central Limit Theorem) Suppose that [pic] and that [pic], and suppose that [pic]. Define [pic] where [pic], with [pic]. Then [pic].

Remark 2.1 In relation to point estimation of [pic] for large n, this theorem states that [pic].

Remark 2.2 The authors state (p.269) that “In practice, this approximation [i.e. that [pic]] is used when [pic], regardless of the actual shape of the population sampled.” Now, while this is, indeed, a ‘rule of thumb’, that does not mean that it is always justified. There are two key issues that must be reckoned with, as will be illustrated in the following example.

Example 2.1 Suppose that [pic] with X ~ Bernoulli(p). Then [pic] and [pic]. In fact, [pic]. Specifically, [pic].

Case 1: n = 30 and p = 0.5. Clearly, the shape is that of a bell curve.

[pic]

Figure 2.1 The pdf of [pic]

Clearly, [pic] is a discrete random variable, whereas to claim that [pic] is to claim that it is a continuous random variable. To obtain this continuous pdf, it is only necessary to scale Figure 2.1 to have total area equal to one (i.e., divide each number by the spacing 1/30, and use a bar plot instead of a stem plot). The result is given in Figure 2.2.

[pic]

[pic]

Figure 2.2 Comparison of the staircase pdf and the normal pdf. The lower plot shows the error (green) associated with using the normal pdf to estimate the probability of the double arrow region.

Case 2: n = 30 and p = 0.1.

[pic]

[pic]

Figure 2.3 The pdf of [pic](TOP), and the continuous approximations (BOTTOM).

The problem here is that, by using the normal approximation, we see that [pic] is significantly less than one! □
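The two cases in this example can be checked numerically. Below is a minimal Python sketch (using SciPy; the function name and parameter choices are ours) that computes the total probability mass the normal approximation N(p, p(1-p)/n) assigns to [0, 1], the support of the sample mean of n Bernoulli(p) trials:

```python
import numpy as np
from scipy import stats

def normal_approx_mass(n, p):
    """Mass that the CLT approximation N(p, p(1-p)/n) assigns to [0, 1],
    the support of the sample mean of n Bernoulli(p) trials."""
    mu, sd = p, np.sqrt(p * (1 - p) / n)
    return stats.norm.cdf(1, mu, sd) - stats.norm.cdf(0, mu, sd)

m1 = normal_approx_mass(30, 0.5)   # Case 1: essentially all mass is captured
m2 = normal_approx_mass(30, 0.1)   # Case 2: a visible chunk of mass lies below 0
print(m1, m2)
```

For p = 0.1 roughly 3% of the approximating mass falls below zero, which is the "significantly less than one" effect noted above.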

We now address some sampling distribution results given in Chapter 8 of the text.

Theorem 2.2 (Theorem 8.8 in the textbook) For [pic], [pic].

Proof: The proof of this theorem relies on two ‘facts’:

(i) For [pic], the random variable [pic].

(ii) For [pic], the random variable [pic].

Claim 2.1 For [pic].

The proof of the claim [pic] follows directly from the linearity of E(·):

[pic]

The proof of the claim [pic] requires a little more work.

Example 2.2 Suppose that [pic]. Obtain the pdf for the point estimator of [pic] given by [pic].

Solution: Write [pic].

Even though [pic], it is sometimes easier (and more insightful) to say that [pic]. In either case, we have

(i) [pic], and

(ii) [pic].

Finally, if n is sufficiently large that the CLT holds, then [pic]. □

Proof: [pic]. □

Theorem 2.3 (Theorem 8.11 of the textbook) Given [pic], suppose that [pic] is not known. Let [pic] and [pic]. Then (i) [pic] and [pic] are independent, and (ii) [pic]. [The above should read [pic] for the related result.]

Proof: [This is one reason that a proof can be insightful, and not simply an exercise in mathematical torture.]

[pic]

But [pic]. And so, we have:

[pic]. Dividing both sides by [pic] gives [pic], where the rightmost equality follows from the independence result (not proved).

Hence, we have shown that [pic]. From this result, we have

[pic] and [pic]

Remark 2.3 A comparison of Theorems 2.2 and 2.3 reveals the effect of replacing the mean [pic] by its estimator [pic]; namely, one loses one degree of freedom in relation to the (scaled) chi-squared distribution for the estimator of the variance [pic]. It is not uncommon for persons not well-versed in probability and statistics to use [pic] in estimating the variance, even when the true mean is known! The price paid for this is that the variance estimator will not be as accurate.
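This remark can be illustrated by simulation. The sketch below (Python; the parameter values are arbitrary choices of ours) compares the known-mean estimator, the (n-1)-divisor estimator, and the naive n-divisor estimator for σ² = 4; the known-mean estimator is both unbiased and less variable:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2, n, ntrials = 5.0, 4.0, 10, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(ntrials, n))
xbar = x.mean(axis=1, keepdims=True)

# Known mean: divide by n (the setting of Theorem 2.2 / Example 2.2).
v_known = ((x - mu) ** 2).mean(axis=1)
# Estimated mean, correct divisor n-1 (the setting of Theorem 2.3).
v_nminus1 = ((x - xbar) ** 2).sum(axis=1) / (n - 1)
# Estimated mean, naive divisor n: biased low by the factor (n-1)/n.
v_naive = ((x - xbar) ** 2).sum(axis=1) / n

print(v_known.mean(), v_nminus1.mean(), v_naive.mean())
print(v_known.var(), v_nminus1.var())
```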

Theorem 2.4 (Theorem 8.12 in the textbook) Suppose that [pic] and [pic] are independent. Define [pic]. Then the probability density function for T is:

[pic].

This pdf is called the (Student's) t distribution with v degrees of freedom and is denoted as tv.

From Theorems 2.3 and 2.4 we immediately have

Theorem 2.5 (Theorem 8.13 in the textbook) If [pic] and [pic] are estimates of [pic] and [pic], respectively, based on [pic], then

[pic].


Theorem 2.6 (Theorem 8.14 in the textbook) Suppose that [pic] and [pic] are independent. Then [pic] has an F distribution, and is denoted as [pic].

As an immediate consequence, we have

Theorem 2.7 (Theorem 8.15 in the textbook) Suppose that [pic] are sample variances of [pic], obtained using [pic] and [pic], respectively, that the elements in each sample are independent, and that the samples are also independent. Then

[pic] has an F distribution with [pic] and [pic] degrees of freedom.

Remark 2.4 It should be clear that this statistic plays a major role in determining whether two variances are equal or not.
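As a sketch of how this statistic is used (Python/SciPy; the sample sizes and significance level are hypothetical choices of ours), the two-sided rejection thresholds for the ratio of sample variances follow directly from the F quantiles:

```python
from scipy import stats

# Two samples of sizes n1 and n2 (values are hypothetical).
n1, n2, alpha = 16, 21, 0.05

# Two-sided rejection thresholds for F = S1^2/S2^2 under H0: equal variances.
f_lo = stats.f.ppf(alpha / 2, n1 - 1, n2 - 1)
f_hi = stats.f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)
print(f_lo, f_hi)   # announce H1 if the observed ratio falls outside [f_lo, f_hi]
```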

Q: Suppose that [pic] is large enough to invoke the CLT. How could you use this to arrive at another test statistic for deciding whether two variances are equal? A: Use their DIFFERENCE

6.3 A Summary of Useful Point Estimation Results

The majority of the results of the last section pertained to the parameters [pic] and [pic]. For this reason, the following summary will address each one of these parameters, individually, as the parameter of primary concern.

Results when µ is of Primary Concern - Suppose that [pic], and that [pic] are iid data collection variables that are to be used to investigate the unknown parameter [pic]. For such an investigation, we will use [pic].

Case 1: [pic]

(Remark 2.1): [pic] for any n, when [pic] is known.

(Theorem 2.5): [pic] for any n, when [pic] is estimated by [pic].

Case 2: [pic]

For n sufficiently large (i.e., when the CLT is a good approximation):

(Remark 2.1): [pic] when [pic] is known.

(Theorem 2.5): [pic] when [pic] is estimated by [pic].

Results when σ2 is of Primary Concern - Suppose that [pic], and that [pic] are iid data collection variables that are to be used to investigate the unknown parameter [pic]. For such an investigation, we will use [pic], when appropriate.

Case 1: [pic]

(Example 2.2): [pic], when [pic] is known and [pic].

(Theorem 2.3): [pic] for any n, when [pic] is estimated by [pic].

(Theorem 2.7): [pic]when [pic].

Case 2: [pic]

For n sufficiently large (i.e., when the CLT is a good approximation):

(Example 2.2): [pic]

Application of the above results to (X,Y)-

For a 2-D random variable [pic], suppose that its components are independent. Then we can define the random variable [pic]. Regardless of the independence assumption, we have [pic]. In view of that assumption, we have [pic]. Hence, we are now in the setting of a 1-D random variable, W, and so many of the above results apply. Even if we drop the independence assumption, we can still apply them. The only difference is that in this situation [pic]. Hence, we need to realize that a larger value of n may be needed to achieve an acceptable level of uncertainty in relation to, for example, [pic]. Specifically, if we assume the data collection variables [pic] are iid, then

[pic].
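The influence of a covariance term on the variance of a sum can be verified by simulation. A minimal Python sketch (our own parameter values) checks Var(X+Y) = Var(X) + Var(Y) + 2 Cov(X,Y):

```python
import numpy as np

rng = np.random.default_rng(1)
var_x, var_y, cov_xy, nsamp = 2.0, 3.0, 1.0, 500_000

cov = np.array([[var_x, cov_xy], [cov_xy, var_y]])
xy = rng.multivariate_normal([0.0, 0.0], cov, size=nsamp)
w = xy[:, 0] + xy[:, 1]

# Var(W) = Var(X) + Var(Y) + 2 Cov(X,Y) = 2 + 3 + 2*1 = 7
print(w.var())
```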

The perhaps more interesting situation is where we are concerned with either [pic] or [pic].

Of these two parameters, [pic] is the more tractable one to address using the above results. To see this, consider the estimator:

[pic].

Define the random variable [pic]. Then we have

[pic].

And so, we are now in the setting where we are concerned with [pic]. In relation to [pic], we have:

[pic].

We also have [pic]. The trick now is to obtain an expression for [pic]. To this end, we begin with:

[pic].

This can be written as:

[pic].

And so, we are led to ask the

Question: Suppose that [pic] and [pic]have [pic]. Then what is the variance of [pic]?

Answer: As noted above, the mean of W is [pic]. Before getting too involved in the mathematics, let's run some simulations. The following plot is for [pic]:

[pic]

Figure 3.1 Plot of the simulation-based estimate of [pic] for [pic]. The simulations also resulted in [pic] and [pic].

The Matlab code is given below.

% PROGRAM name: z1z2.m
% This code uses simulations to investigate the pdf of W = Z1*Z2,
% where Z1 & Z2 are unit normal r.v.s with covariance r.
nsim = 10000;
r = 0.5;
mu = [0 0]; Sigma = [1 r; r 1];
Z = mvnrnd(mu, Sigma, nsim);            % nsim draws of (Z1, Z2)
plot(Z(:,1), Z(:,2), '.');
pause
W = Z(:,1).*Z(:,2);
Wmax = max(W); Wmin = min(W);
db = (Wmax - Wmin)/50;                  % bin width (50 bins)
bctr = Wmin + db/2 : db : Wmax - db/2;  % bin centers
fw = hist(W, bctr);
fw = (nsim*db)^-1 * fw;                 % scale counts to a density estimate
bar(bctr, fw)
title('Simulation-based pdf for W with r = 0.5')

Conclusion: The simulations revealed a number of interesting things. For one, the mean was [pic]. One would have thought that for 10,000 simulations it would have been closer to the true mean 0.5. Also, and not unrelated to this, is the fact that [pic] has very long tails. Based on this simple simulation-based analysis, one might be thankful that the mathematical pursuit was not readily undertaken, as it would appear that it would be a formidable undertaking!

QUESTION: How can knowledge of [pic]be used in relation to estimating ρXY from [pic]?

Computation of fXY(w):

Method 1: Use THEOREM 7.2 on p.248 of the book. [A nice example of its application.]

Let [pic] be the joint pdf of the 2-D continuous random variable [pic]. Let [pic] and [pic]. Then [pic] and [pic]. And so, the Jacobian is:

[pic].

Hence, the pdf for [pic] is:

[pic].

We now apply this to: [pic] [see DEFINITION 6.8 on p.220.]

[pic].

Hence,

[pic] (***)

I will proceed with a possibly useful integral, but this will not, in itself, give a closed form for (***). [Table of Integrals, Series, and Products by Gradshteyn & Ryzhik p.307 #3.325]

[pic].

Method 2 (use characteristic functions):

Assume that X and Y are two independent standard normal random variables, and let us compute the characteristic function of W = XY.

One knows that [pic]. Hence, [pic]. And so:

[pic].

Hence,

[pic]

where, from Gradshteyn & Ryzhik (p.419, entry 3.754.2):

[pic] (see also G&R p.952, entry 8.407.1)

where [pic]is a Bessel function of the third kind, also called a Hankel function. Yuck!!!
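In fact, for independent standard normal X and Y, the pdf of W = XY has the known closed form f_W(w) = K0(|w|)/π, with K0 the modified Bessel function referred to above (naming conventions for "third kind" vary across references). A Python sketch (our own variable names) checking this against simulation:

```python
import numpy as np
from scipy.special import k0

rng = np.random.default_rng(2)
nsim = 400_000
w = rng.standard_normal(nsim) * rng.standard_normal(nsim)

# Empirical density of W near w = 1, from a narrow bin.
lo, hi = 0.95, 1.05
dens_emp = ((w > lo) & (w < hi)).mean() / (hi - lo)
# Closed form for independent standard normals: f_W(w) = K_0(|w|)/pi.
dens_k0 = k0(1.0) / np.pi
print(dens_emp, dens_k0)
```

The long tails observed in the earlier simulation are consistent with the logarithmic singularity of K0 at zero and its slow exponential decay.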

For my own possible benefit: Consider the characteristic function of the product of two independent Brownian motions, say the processes (Xt)t and (Yt)t. The above yields the distribution of Z1=X1Y1 and, by homogeneity, of Zt=XtYt, which is distributed like tZ1 for every t⩾0. However, this does not determine the distribution of the process (Zt)t. For example, to compute the distribution of (Zt,Zs) for 0⩽t⩽s, one could write the increment Zs−Zt as Zs−Zt=Xt(Ys−Yt)+Yt(Xs−Xt)+(Xs−Xt)(Ys−Yt), but the terms Xt(Ys−Yt) and Yt(Xs−Xt) show that Zs−Zt is not independent of Zt, and that (Zt)t is probably not Markov. [NOTE: Take X and Y to be two Gaussian random variables with mean 0 and variance 1. Since they have the same variance, X−Y and X+Y are independent.]□

6.4 Simple Hypothesis Testing

Example 4.1 (Book 8.78 on p.293): A random sample of size 25 from a normal population had mean value 47 and standard deviation 7. If we base our decision on Theorem 2.5 (t-distribution), can we say that this information supports the claim that the population mean is 42?

Solution: While not formally stated as such, this is a problem in hypothesis testing. Specifically,

[pic] versus [pic].

The natural rationale for deciding which hypothesis to announce is simple:

If [pic] is 'sufficiently close' to 42, then we will announce H0.

Regardless of whether or not H0 is true, we know that [pic].

Assuming that, indeed, H0 is true (i.e., [pic]), then [pic]. A plot of [pic] is shown below.

[pic]

Figure 4.1 A plot of [pic] for [pic].

Based on our above rationale for announcing [pic] versus [pic], if a measurement of this T is sufficiently large, we should announce [pic], even if [pic]is, in fact, true. Suppose that we were to use a threshold value [pic]. Then our decision rule is:

[pic] = The event that we will announce [pic]. Or, equivalently:

[pic] = The event that we will announce [pic].

If, in fact, [pic] is true, then our false alarm probability is:

[pic] where [pic].

There are a number of ways to proceed from here.

Case 1: Specify the false alarm probability, [pic], that we are willing to risk. Suppose, for example, that we choose [pic] (i.e., we are willing to risk wrongly announcing [pic] with a 5% probability). Then

[pic] where [pic] results in a threshold [pic] = tinv(.975,24) = 2.064.

From the above data, we have [pic]. Since this value is greater than 2.064, we should announce [pic].

Case 2: Use the data to first compute [pic], and then use this as our threshold, [pic]. Then because our value was [pic], this threshold has been met, and we should announce [pic]. In this case, the probability that we are wrong becomes:

[pic] = 2*tcdf(-3.53,24) = 0.0017.

QUESTION: Are we willing to risk announcing [pic] with the probability of being wrong equal to .0017?

ANSWER: If we were willing to risk a 5% false alarm probability, then surely we would risk an even lower probability of being wrong in announcing [pic]. Hence, we should announce [pic].

Comment: In both cases we ended up announcing [pic]. So, what does it matter which case we abide by? The answer, with a little thought, should be obvious. In case 1 we are announcing [pic] with a 5% probability of being wrong; whereas in case 2 we are announcing [pic] with a 0.17% probability of being wrong. In other words, we were willing to take much less risk, and even then we announced [pic]. □
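Both cases can be reproduced in a few lines of Python (SciPy). Note that (47 − 42)/(7/√25) ≈ 3.57, slightly different from the 3.53 used in the notes, so the p-value differs in the last decimal places:

```python
import numpy as np
from scipy import stats

n, xbar, s, mu0, alpha = 25, 47.0, 7.0, 42.0, 0.05

t_obs = (xbar - mu0) / (s / np.sqrt(n))       # observed t statistic
t_thr = stats.t.ppf(1 - alpha / 2, n - 1)     # two-sided threshold (Case 1)
p_val = 2 * stats.t.cdf(-abs(t_obs), n - 1)   # p-value (Case 2)
print(t_obs, t_thr, p_val)
```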

The false alarm probability obtained in case 2 of the above example has a name. It is called the p-value of the test. It is the smallest false alarm probability that we can achieve in using the given data to announce [pic].

As far as the problem goes, as stated in the textbook, we are finished. However, we have ignored the second type of error that we could make; namely, announcing [pic] true, when, in fact, [pic] is true. And there is a reason that this error (called the type-2 error) is often ignored. It is because in the situation where [pic] is true, we have a numerical value for [pic](in this case, [pic]). However, if [pic] is true, then all we know is that [pic]. And so, to compute a type-2 error probability, we need to specify a value for [pic].

Suppose, for example, that we assume [pic]. Suppose further that we maintain the threshold [pic]. Now we have [pic], while our chosen test statistic remains [pic]. And so, the question becomes: How does the random variable [pic] relate to the random variable [pic]? To answer this question, write:

[pic] (4.1)

where [pic] is a random variable; and not a pleasant one, at that! To see why this point is important, let’s presume for the moment that we actually know [pic]. In this case, we would have

[pic]. Hence, [pic]. In words, [pic] is a normal random variable with mean [pic] and with variance equal to one. Using the above data, assume [pic]. Then [pic]. This pdf is shown below.

[pic]

Figure 4.2 The pdf for [pic].

Now let’s return to the above two cases.

Case 1 (continued): Our decision rule for announcing [pic] with a 5% false alarm probability relied on the event [pic]. This event is shown as the green double arrow in Figure 4.2. The probability of this event is computed as

[pic] = normcdf(2.064,2.143,1) – normcdf(-2.064,2.143,1) = 0.47.

Hence, if, indeed, the true mean is [pic], then under the assumption that the CLT is valid, we have a 47% chance of announcing that [pic] using this decision rule.

Case 2 (continued): Our decision rule for announcing [pic] with false alarm probability 0.17% relied on the event [pic]. This event is shown as the red double arrow in Figure 4.2. The probability of this event is computed as

[pic] = normcdf(3.53,2.143,1) – normcdf(-3.53,2.143,1) = 0.92.

Hence, if, indeed, the true mean is [pic], then under the assumption that the CLT is valid, we have a 92% chance of announcing that [pic] using this decision rule.

In conclusion, we see that the smaller that we make the false alarm probability, the greater the type-2 error will be. Taken to the extreme, suppose that no matter what the data says, we will simply not announce [pic]. Then, of course our false alarm probability is zero, since we will never sound that alarm. However, by never sounding the alarm, we are guaranteed that with probability one, we will announce [pic] when [pic] is true.
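The two type-2 error probabilities above can be reproduced directly (Python/SciPy), using the notes' approximation that under the alternative the test statistic is normal with mean 2.143 and unit variance:

```python
from scipy import stats

shift = 2.143   # assumed mean of the statistic under the alternative

def type2(c):
    """P(|T| <= c) when T ~ N(shift, 1): the type-2 error for threshold c."""
    return stats.norm.cdf(c, shift, 1) - stats.norm.cdf(-c, shift, 1)

beta1 = type2(2.064)   # Case 1 threshold
beta2 = type2(3.53)    # Case 2 threshold
print(beta1, beta2)
```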

A reasonable question to ask is: How can one select acceptable false alarm and type-2 error probabilities? One answer is given by the Neyman-Pearson Lemma, which we will not go into in these notes.



Finally, we are in a position to return to the problem entailed in (4.1), where we see that we are not subtracting a number (which would merely shift the mean); rather, we are subtracting a random variable. Hence, one cannot simply say that (4.1) is a mean-shifted [pic] random variable. We will leave it to the interested reader to pursue this further. One simple approach to begin with would use simulations to arrive at the pdf for (4.1). □

Example 4.2 (Problem 8.79 on p.293 of the textbook) A random sample of size n=12 from a normal population has (sample) mean [pic] and (sample) variance [pic]. If we base our decision on the statistic of Theorem 8.13 (i.e. Theorem 2.5 of these notes), can we say that the given information supports the claim that the mean of the population is [pic]?

Solution: The statistic to be used is [pic]. And so, the corresponding measurement of T obtained from the data is [pic]. Since the sample mean [pic] is less than the speculated true mean [pic] under [pic], a reasonable hypothesis setting would ask the question: Is 27.8 really that much smaller than 28.5? This leads to the decision rule: [pic].

In the event that, indeed, [pic], [pic]. In words, there is a 10% chance that a measurement of T would be contained in the interval [pic] if, indeed, [pic].

And so now, one needs to decide whether or not one should announce [pic]. On the one hand, one could announce [pic] with a reported p-value of 0.10 and let management make the final announcement. On the other hand, if management has, for example, already 'laid down the rule' that it will not accept a false alarm probability greater than, say, 0.05, then one should announce [pic]. □

We now proceed to some examples concerning tests for an unknown variance.

Example 4.3 (EXAMPLE 13.6 on p.410 of the textbook) Let X denote the thickness of a part used in a semiconductor. Using the iid data collection variables [pic], it was found that [pic]. The manufacturing process is considered to be in control if [pic]. Assuming X is normally distributed, test the null hypothesis [pic] against the alternative hypothesis [pic] at a 5% significance level (i.e., a 5% false alarm probability).

Solution: From Theorem 2.3 we have [pic]. From the data, assuming [pic] is true, we have [pic]. The threshold value [pic] is obtained from [pic].

Hence, [pic] = chi2inv(.95,17) = 27.587, and so we will announce [pic].

Now let's obtain the p-value for this test. If we use [pic], then [pic] is the p-value.

Next, suppose that instead of using the sample mean (which is, by the way, not reported) one had used the design mean thickness. Then we have [pic]. In this case, [pic]is the p-value. And so, at a 5% significance level, we would still announce [pic]. □
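The threshold computation in this example can be sketched as follows (Python/SciPy). The observed statistic depends on data not reproduced here, so the value used in the p-value illustration is hypothetical:

```python
from scipy import stats

n, alpha = 18, 0.05                            # 17 degrees of freedom
chi2_thr = stats.chi2.ppf(1 - alpha, n - 1)    # chi2inv(.95,17)
print(chi2_thr)

# p-value for an observed value of (n-1)*S^2/sigma0^2 (upper-tailed test);
# the value 32.0 is hypothetical, for illustration only.
stat_hyp = 32.0
p_val = 1 - stats.chi2.cdf(stat_hyp, n - 1)
print(p_val)
```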

Example 4.4 Suppose that X = The act of recording the number of false statements of a certain news network in any given day. We will assume that X has a Poisson distribution, with mean [pic]. Let [pic]be the number of false statements throughout any 5-day work week. In order to determine whether or not the network has decided to increase the number of false statements, we will test

[pic] versus [pic], where [pic] is the average number of false statements in any given week. We desire a false alarm probability of ~10%.

QUESTION: How should we proceed?

ANSWER: Simulations!

% PROGRAM NAME: avgdefects.m
% Simulation-based pmf of the sample mean of n iid Poisson(mu) counts.
nsim = 40000;
n = 5;
mu = 2;
x = poissrnd(mu, n, nsim);
muhat = mean(x);                        % one sample mean per column
db = 1/n;                               % spacing of possible values of muhat
bctr = 0:db:5;                          % bin centers
fmuhat = nsim^-1 * hist(muhat, bctr);   % relative frequencies (pmf estimate)
figure(1)
stem(bctr, fmuhat)
hold on
pause
for m = 1:10                            % overlay 10 repeated simulations
    x = poissrnd(mu, n, nsim);
    muhat = mean(x);
    fmuhat = nsim^-1 * hist(muhat, bctr);
    stem(bctr, fmuhat)
end

6.5 Confidence Intervals

The concept of a confidence interval for an unknown parameter, θ, is intimately connected to the concept of simple hypothesis testing. To see this, consider the following example.

Example 5.1 Let X = The act of recording the body temperature of a person who is in deep meditation. We will assume that [pic]. The data collection random variables [pic] are to be used to investigate the unknown parameter [pic]. We will assume that [pic] is known and that a sample size of n=25 will be used. The estimator [pic] will be used in the investigation.

Hypothesis Testing:

In the hypothesis testing portion of the investigation we begin by assuming that, under [pic], the parameter [pic] is a known quantity. Let's assume that it is 98.6 °F. In other words, we are assuming that deep meditation has no influence on [pic]. Since we are investigating whether or not meditation has an effect on [pic] in any way, our test, formally, is:

[pic] versus [pic].

Our decision rule for this test is, in words: If [pic] is sufficiently close to [pic], then we will announce [pic]. The event, call it A, that is described by these words then has the form:

[pic].

Here, the number [pic] is the threshold temperature difference. Suppose that we require that, in the event that [pic], we have

[pic]. (5.1)

In other words, we require a false alarm probability (or significance level) of size 0.10.

In the last section, we used the standard test statistic [pic]. Then, assuming that [pic] is, indeed, true, it follows that [pic]. To convert (5.1) into an event involving this Z, we use the concept of equivalent events:

[pic]. (5.2)

From (5.2) we obtain

[pic] = norminv(0.95, 0, 1) = 1.645. (5.3)

Hence, if the magnitude of

[pic] exceeds 1.645, then we will announce [pic].

Now, instead of going through the procedure of using the standard test statistic, Z, let’s use (5.1) more directly. Specifically,

[pic]. (5.4)

Because [pic], from (5.3) we obtain

[pic] = norminv(0.95, 98.6, 0.21) = 98.945 °F. (5.5)

Hence, there was no need to use the standard statistic! Furthermore, the threshold (5.5) is in units of °F, making it much more attractive than the dimensionless units of (5.3).

QUESTION: Given that using (5.1) to arrive at (5.5) is much more direct than using (5.2) and (5.3), then why in the world did we not use the more direct method in the last section?

ANSWER: The method used in the last section has been used for well over 100 years. It was the only method available when only a unit-normal z-table could be consulted. Only in recent years has statistical software made such tables unnecessary. However, there are random variables other than the normal type for which a standardized table must still be used. One example is [pic]. A linear function of T is NOT a t random variable, whereas a linear function of a normal random variable IS a normal random variable.
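The equivalence of the standardized threshold (5.3) and the direct threshold (5.5) is easy to verify numerically (Python/SciPy). The notes use a standard error of 0.21, which with n = 25 corresponds to σ = 1.05; that σ is our inference:

```python
import numpy as np
from scipy import stats

mu0, sigma, n = 98.6, 1.05, 25          # sigma inferred so sigma/sqrt(n) = 0.21
se = sigma / np.sqrt(n)

z_thr = stats.norm.ppf(0.95)            # dimensionless threshold, as in (5.3)
x_thr = stats.norm.ppf(0.95, mu0, se)   # direct threshold in deg F, as in (5.5)
print(z_thr, x_thr, mu0 + z_thr * se)   # the last two agree exactly
```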

Confidence Interval Estimation:

We begin here with the same event that we began hypothesis testing with; namely

[pic]

We then form the same probability that we formed in hypothesis testing; namely

[pic].

A close inspection of equations (5.1) and (5.6) will reveal a slight difference. Specifically, in (5.1) we have the parameter [pic], which is the known value of [pic] (= 98.6 °F) under [pic]. In contrast, the parameter [pic] in (5.6) is not assumed to be known. The goal of this investigation is to arrive at a [pic] 2-sided confidence interval for this unknown [pic]. From (5.2) we have:

[pic].

Finally, using the concept of equivalent events, we have

[pic]. (5.6)

Notice that the event A has been manipulated into an equivalent event that is

[pic] (5.7)

It is the interval (5.7) that is called the 90% 2-sided Confidence Interval (CI) for the unknown mean [pic].

Remark 5.1 The interval (5.7) is an interval whose endpoints are random variables. In other words, the CI described by (5.7) is a random endpoint interval. It is NOT a numerical interval. Even though (5.7) is a random endpoint interval, the interval width is 0.69; that is, it is not random.

To complete this example, suppose that after collecting the data, we arrive at [pic]. Inserting this number into (5.7) gives the numerical interval [96.855, 97.545]. This interval is not a CI. Rather, it is an estimate of the CI. Furthermore, one cannot say that [pic]. Why? Because even though [pic] is not known, it is a number, not a random variable. Hence, this equation makes no sense. Either the number [pic] is in the interval [96.855, 97.545] or it isn't. Period! For example, what would you say to someone who asked you the probability that the number 2 is in the interval [1, 3]? Hopefully, you would be kind in your response to such a person, when saying "Um… the number 2 is in the interval [1, 3] with probability one." In summary, a 90% two-sided CI estimate should not be taken to mean that the probability that [pic] is in that interval is 90%. Rather, it should be viewed as one measurement of a random endpoint interval, such that, were repeated measurements to be conducted again and again, then overall, one should expect that 90% of those numerical intervals will include the unknown parameter [pic]. □
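This "repeated measurements" interpretation can be demonstrated by simulation. The sketch below (Python; the true mean and σ are hypothetical choices of ours) constructs many 90% CIs and counts how often they cover the true mean:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu_true, sigma, n, nrep = 97.2, 1.05, 25, 20_000
z = stats.norm.ppf(0.95)                   # 90% two-sided CI multiplier
se = sigma / np.sqrt(n)

xbar = rng.normal(mu_true, se, size=nrep)  # one sample mean per repetition
covered = (xbar - z * se <= mu_true) & (mu_true <= xbar + z * se)
print(covered.mean())                      # close to 0.90
```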

A General Procedure for Constructing a CI:

Let [pic] be an unknown parameter for which a [pic] 2-sided CI is desired, and let [pic] be the associated point estimator of [pic].

Step 1: Identify a standard random variable [pic] that is a function of both [pic] and [pic]; say, [pic].

Step 2: Write: [pic]. Then, from this, find the numerical values of [pic] and [pic].

Step 3: Use the concept of equivalent events to arrive at an expression of the form:

[pic].

The desired CI is then the random endpoint interval [pic]. □

The above procedure is now demonstrated in a number of simple examples.

Example 5.2 Arrive at a 98% two-sided CI estimate for [pic], based on [pic], [pic] and n=10. Assume that the data collection variables [pic] are iid [pic].

Solution: Even though the sample mean [pic], we do not have knowledge of [pic], and so will need to use [pic]. Since n=10 is too small to invoke the CLT, we will use the test statistic

Step 1: [pic].

Step 2: [pic] = tinv(.99,9) = 2.82 and t1 = -2.82.

Step 3: [pic]. Hence, the

desired CI is the random endpoint interval [pic]. Using the above data, we arrive at the CI estimate: [pic]. □
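The three steps can be sketched in Python (SciPy). The sample mean and standard deviation are elided in the notes, so the values below are hypothetical placeholders:

```python
import numpy as np
from scipy import stats

n = 10
t99 = stats.t.ppf(0.99, n - 1)     # tinv(.99,9) in the notes
print(t99)

# With sample mean xbar and sample std s (hypothetical values),
# the 98% two-sided CI estimate is:
xbar, s = 50.0, 4.0
ci = (xbar - t99 * s / np.sqrt(n), xbar + t99 * s / np.sqrt(n))
print(ci)
```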

Example 5.3 In an effort to estimate the variability of the temperature, T, of a pulsar, physicists used n=100 random measurements. The data gave the following statistics:

[pic] and [pic].

Obtain a 95% 2-sided CI estimate for [pic].

Solution: We will begin by using the point estimator of [pic]: [pic] where [pic].

Step 1: The most common test statistic in relation to a single variance is [pic] (cf. Theorem 2.3).

Step 2: [pic] gives [pic] = chi2inv(.025,99) = 73.36

and [pic] = chi2inv(.975,99) = 128.42.

Step 3: [pic].

Hence, the 95% 2-sided CI is the random endpoint interval [pic]. From this and the above data, we obtain the CI estimate:

[pic].
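This solution can be sketched as follows (Python/SciPy). The sample variance appears in the notes only as [pic], so the value below is a hypothetical placeholder:

```python
from scipy import stats

n = 100
c_lo = stats.chi2.ppf(0.025, n - 1)   # chi2inv(.025,99)
c_hi = stats.chi2.ppf(0.975, n - 1)   # chi2inv(.975,99)
print(c_lo, c_hi)

# With sample variance s2 (hypothetical value), the 95% two-sided CI
# estimate for sigma^2 is:
s2 = 2.5
ci = ((n - 1) * s2 / c_hi, (n - 1) * s2 / c_lo)
print(ci)
```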

Now, because [pic] is an average of iid random variables, and because n=100 would seem to be large enough to invoke the CLT, let us repeat the above solution with this knowledge.

Solution:

[pic] has mean [pic] and variance [pic]. Writing [pic], it follows that

[pic]

and

[pic].

And so, from the CLT, we have

Step 1: [pic].

Step 2: [pic] and [pic].

Step 3:

[pic]

Hence, the desired CI is the random endpoint interval

[pic].

And so the CI estimate is: [pic].

This interval is almost exactly equal to the interval obtained above.

Example 5.4 Suppose that it is desired to investigate the influence of the addition of background music in drawing customers into your new café during the evening hours. To this end, you will observe pedestrians who walk past your café while no music is playing, and while music is playing. Let X = The act of noting whether a pedestrian enters (1) or doesn’t enter (0) your café when no music is playing. Let Y = The act of noting whether a pedestrian enters (1) or doesn’t enter (0) your café when music is playing.

Suppose that you obtained the following data-based results: [pic] for sample size [pic], and [pic] for sample size [pic]. Obtain a 95% 1-sided (upper) CI for [pic].

Solution: We will assume that the actions of the pedestrians are mutually independent. Then we have

[pic] is independent of [pic].

Assumption: For a first analysis, we will assume that the CLT holds in relation to both [pic] and [pic]. This would seem to be reasonable, based on the above estimated values of p and the chart included in Homework #6, Problem 1.

Then [pic] and [pic]. Hence

[pic]

where we defined [pic]. The standard approach [See Theorem 11.8 on p.365 of the text] is to use:

[pic].

Then [pic], and so [pic], and with [pic].

Hence, we have the following equivalent events:

[pic].

Recall that the smallest possible value for [pic] is -1. Hence, the CI estimate is: [-1, .05+.105] = [-1, .155].

Conclusion: Were you to repeat this experiment many times, you would expect that 95% of the time, the increase in clientele due to music would be no more than ~15%.
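The upper endpoint can be checked using the two numbers that survive in the text, the estimated difference 0.05 and its estimated standard error 0.064 (Python/SciPy; the underlying proportions and sample sizes are elided in the notes):

```python
from scipy import stats

# Surviving summary numbers from the notes (the underlying phat values and
# sample sizes are not reproduced here).
diff_hat, se_hat = 0.05, 0.064

z = stats.norm.ppf(0.95)       # one-sided 95% multiplier
upper = diff_hat + z * se_hat
print(upper)                   # ~0.155, matching the notes

# Since the difference of two proportions can be no smaller than -1,
# the one-sided CI estimate is [-1, upper].
```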

Investigation of [pic] and its influence on the CI [pic]-

Note: The primary intent of this section is to illustrate that: (i) the CI is a random variable (i.e., the obtained CI estimate is simply one measurement of this variable), and (ii) the CLT assumption depends upon the size of the events being considered in relation to the size of the increments of the actual sample space for the variable. This section can be skipped with no loss of basic understanding of the above. This material could serve as the basis of a project, as it demonstrates a number of major concepts that could be elaborated upon (e.g., sample spaces, events, estimators, simulation-based pdfs of more complicated estimators).

In the above analysis we used estimates of the unknown parameters [pic] and [pic] to arrive at an estimate of the standard deviation [pic]. The value used, .064, is a single measurement of the random variable [pic]. In this section we will investigate the properties of this random variable using simulations. Specifically, we will use [pic] and [pic], and 10,000 simulations of [pic]. The resulting simulation-based estimate of the pdf [pic] is shown in Figure 5.1.

[pic]

Figure 5.1 Simulation-based (nsim = 10,000) estimate of [pic].

In relation to this figure, we have [pic], which is the true value of [pic]. We also have [pic]. From Figure 5.1, however, we see that [pic] is notably skewed. Hence, this should be noted if one chooses to use [pic] as a measure of the level of uncertainty of [pic].

We now proceed to investigate the random variable that is the upper random endpoint of the CI [pic]. To this end, we begin by addressing the probability structure of [pic]. Because [pic] is the difference between two averages, and because it would appear that [pic] and [pic] are sufficiently large for the CLT to hold, [pic] should be expected to have a [pic] that is bell-shaped (i.e., normal). In any case, we have:

[pic] and [pic]

The simulations that resulted in Figure 5.1 also resulted in Figure 5.2 below.

[pic]

Figure 5.2 Simulation-based (nsim = 10,000) estimate of [pic] using a 10-bin scaled histogram.

Now, based on Figure 5.2, one could reasonably argue that the pdf, [pic], is that of a normal distribution- RIGHT?

Well, let's use a 50-bin histogram instead of a 10-bin histogram and see what we get. We are, after all, using 10,000 simulations, and so we can afford finer resolution. The result is shown in Figure 5.3.

[pic]

Figure 5.3 Simulation-based (nsim = 10,000) estimate of [pic] using a 50-bin scaled histogram.

There are two peaks that clearly stand out relative to an assumed bell curve shape. Furthermore, they are real, in the sense that they appear in repeated simulations!

Conclusion 1: If one is interested in events that are much larger than the bin width resolution, which in Figure 5.3 is ~0.011, then the normal approximation in Figure 5.2 may be acceptable. However, if one is interested in events on the order of 0.011, then Figure 5.3 suggests otherwise.

The fact of the matter is that [pic] is a discrete random variable. To obtain its sample space, write

[pic].

The smallest element of [pic]is -1. The event {-1} is equivalent to[pic].

The next smallest event is {-0.99}. This event is equivalent to [pic].
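For reference, the first few elements of this sample space can also be enumerated by machine. The following is a hypothetical sketch, assuming the sample sizes nx = 100 and ny = 50 used in the Appendix code:

```python
import numpy as np

# Hypothetical enumeration of the sample space of thetahat = my - mx,
# with mx in {0, 1/100, ..., 1} and my in {0, 1/50, ..., 1}
# (nx = 100, ny = 50, as in the Appendix code).
nx, ny = 100, 50
mx = np.arange(nx + 1) / nx
my = np.arange(ny + 1) / ny

# All attainable differences my - mx, de-duplicated and sorted
theta_vals = np.unique(np.round(np.subtract.outer(my, mx), 10))

print(theta_vals[:3])   # smallest elements: -1, -0.99, -0.98
```

The enumeration confirms the hand computation above: the smallest element is -1 and the next smallest is -0.99.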

Clearly, the explicit computation of such equivalent events becomes more cumbersome as we proceed. As an alternative to this cumbersome approach, we will instead simply use a bin width that is much smaller than 0.01 in computing the estimate of [pic]. Using not 50 but 500 bins resulted in Figure 5.4 below.

[pic]

[pic]

Figure 5.4 Simulation-based (nsim = 10,000) estimate of [pic] using a 500-bin scaled histogram. The lower plot includes the zoomed region near 0.08.

What we see from Figure 5.4 is that the elements of [pic]are not spaced at uniform 0.01 increments throughout. At the value [pic]we see that the spacing is much smaller than 0.01. Hence, it is very likely that the largest peak in Figure 5.3 is due to this unequal spacing in [pic].

Now that we have an idea as to the structures of the marginal pdf’s [pic] and [pic], we need to investigate their joint probability structure. To this end, the place to begin is with a scatter plot of [pic] versus [pic]. This is shown in Figure 5.6.

[pic]

Figure 5.6 A scatter plot of [pic] versus [pic] for nsim =10,000 simulations.

Clearly, Figure 5.6 suggests that [pic] and [pic] are correlated. We obtained [pic] and [pic].
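This correlation is easy to reproduce in simulation. The following is a hypothetical Python sketch using the same setup as the Appendix code; the specific numerical value printed is a simulation result, not a value taken from the text:

```python
import numpy as np

# Hypothetical check of the sample correlation between thetahat and its
# estimated standard deviation S (parameters from the Appendix code).
rng = np.random.default_rng(2)
px, nx, py, ny, nsim = 0.13, 100, 0.18, 50, 10_000

mx = rng.binomial(nx, px, nsim) / nx
my = rng.binomial(ny, py, nsim) / ny
thetahat = my - mx
sth = np.sqrt(mx * (1 - mx) / nx + my * (1 - my) / ny)

r = np.corrcoef(thetahat, sth)[0, 1]
print(round(float(r), 2))   # a clearly positive correlation
```

The positive correlation is plausible on delta-method grounds: since both proportions are below 0.5, S increases with each sample proportion, and the larger variance contribution comes from the Y term that enters thetahat with a plus sign.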

Finally, since [pic], we have:

(i) [pic],

and

(ii) [pic]

or, [pic]
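These mean and variance identities for the endpoint [pic] can be verified numerically. The sketch below uses hypothetical stand-in samples (not the chapter's data); the identities hold for any paired samples:

```python
import numpy as np

# Numerical check of identities (i) and (ii): with U = T + 1.645*S,
#   E[U]   = E[T] + 1.645 E[S]
#   Var(U) = Var(T) + 1.645^2 Var(S) + 2(1.645) Cov(T, S).
# T and S below are hypothetical correlated stand-ins for thetahat and
# its estimated standard deviation.
rng = np.random.default_rng(0)
t = rng.normal(0.05, 0.064, 10_000)                  # stand-in for thetahat
s = 0.06 + 0.05 * t + rng.normal(0, 0.003, 10_000)   # stand-in for S, correlated with t

u = t + 1.645 * s
mean_ok = abs(u.mean() - (t.mean() + 1.645 * s.mean())) < 1e-12
cov_ts = np.cov(t, s, ddof=0)[0, 1]                  # ddof=0 to match u.var()
var_ok = abs(u.var() - (t.var() + 1.645**2 * s.var() + 2 * 1.645 * cov_ts)) < 1e-12
print(mean_ok, var_ok)   # → True True
```

Note that a consistent divisor (here ddof=0 throughout) is needed for the variance identity to hold exactly.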

The pdf for the assumed [pic] is overlaid on the simulation-based pdf in the figure below.

[pic]

Figure 5.7 Overlaid plots of the pdf of [pic] and the corresponding simulation-based pdf, [pic], for [pic].

Comment: Now, in the chosen setting we actually know the truth; that is the value of simulations! Specifically, [pic]. While this increase due to playing music may seem minor, it is, in fact, an increase of .05/.13, or ~38%, in likely customer business. This is significant by any business standard. But is the (upper) random endpoint interval [pic], in fact, a 95% CI estimator? Using the normal approximation in Figure 5.7, the probability to the left of the dashed line is norminv(.05,.15,.07) = 0.035. And so we actually have a 96.5% CI estimator. While this is only 1.5 percentage points above the nominal 95%, it is an increase in the level of confidence.
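The attained confidence level can also be estimated directly from simulation, without appealing to the normal approximation at all. The following is a hypothetical Python sketch using the Appendix parameters, where the true difference theta = 0.05 is known by construction; the printed value is a simulation result and may differ somewhat from the normal-approximation figure:

```python
import numpy as np

# Hypothetical direct estimate of the attained confidence level of the
# one-sided interval (-inf, thetahat + 1.645*S], with the true difference
# theta = 0.18 - 0.13 = 0.05 known by construction (Appendix parameters).
rng = np.random.default_rng(3)
px, nx, py, ny, nsim = 0.13, 100, 0.18, 50, 10_000
theta = py - px

mx = rng.binomial(nx, px, nsim) / nx
my = rng.binomial(ny, py, nsim) / ny
sth = np.sqrt(mx * (1 - mx) / nx + my * (1 - my) / ny)
upper = (my - mx) + 1.645 * sth          # simulated upper CI endpoints

coverage = float(np.mean(upper >= theta))  # fraction of intervals covering theta
print(round(coverage, 3))
```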

Summary & Conclusion This example illustrates how one typically accommodates the unknown standard deviation associated with the estimated difference between two proportions. Because this standard deviation is a function of the unknown proportions, one typically uses the numerical estimates of those proportions to arrive at a numerical value for it. Hence, the estimator of the standard deviation is itself a random variable. Rather than using a direct mathematical approach to investigate its influence on the CI, it was easier to resort to simulations. In the course of this simulation-based study, it was discovered that the actual pdf of [pic] can definitely NOT be assumed to be normal if one is interested in event sizes on the order of 0.01. It was also found that [pic] and [pic] are correlated; as opposed to the case of normal random variables, where the sample mean and sample standard deviation are independent. For larger event sizes, it was found that the random variable is well approximated by a normal pdf. It was also found that the interval corresponds not to a 95% CI but to a 96.5% CI. A one-sided CI was chosen simply to provide contrast to prior examples that involved 2-sided CIs. It would be interesting to revisit this example not only with other types of CIs but also in relation to hypothesis testing. Finally, to tie this example to hypothesis testing, consider the hypothesis test:

[pic] versus [pic].

Furthermore, suppose that we choose a test threshold [pic]. Then if, in fact, [pic], the type-2 error is 0.5 (see Figure 5.4). [Note: The Matlab code used in relation to this example is included in the Appendix.] □
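This type-2 error claim can be probed by simulation. The sketch below is hypothetical and ASSUMES that the test threshold equals the true difference 0.05 (the threshold value itself is not stated explicitly above); under that assumption roughly half of the simulated estimates fall below the threshold:

```python
import numpy as np

# Hypothetical probe of the type-2 error remark, ASSUMING the test
# threshold equals the true difference theta = 0.05.
rng = np.random.default_rng(4)
px, nx, py, ny, nsim = 0.13, 100, 0.18, 50, 10_000

thetahat = rng.binomial(ny, py, nsim) / ny - rng.binomial(nx, px, nsim) / nx
beta = float(np.mean(thetahat < 0.05))   # accept H0 even though theta = 0.05
print(round(beta, 2))                    # in the vicinity of 0.5
```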

6.6 Summary

This chapter is concerned with estimating parameters associated with a random variable. Section 6.2 included a number of random variables associated with this estimation problem. Many of these random variables (in this context they are also commonly referred to as statistics) relied on the CLT. Section 6.3 addressed some commonly estimated parameters and the probabilistic structure of the corresponding estimators. The results in sections 6.2 and 6.3 were brought to bear on the topics of hypothesis testing and confidence interval estimation in sections 6.4 and 6.5, respectively. In both topics the distribution of the estimator plays the central role. The only real difference between the two topics is that in confidence interval estimation the parameter of interest is unknown, whereas in hypothesis testing it is known under H0.

Appendix Matlab code used in relation to Example 5.4

%PROGRAM NAME: example54.m
%Simulation study for Example 5.4: one-sided CI for the difference of
%two proportions, theta = py - px.
px=0.13; nx = 100;                  %true proportion and sample size for X
py=0.18; ny = 50;                   %true proportion and sample size for Y
nsim = 10000;                       %number of simulated experiments
x = ceil(rand(nx,nsim) - (1-px));   %nx-by-nsim Bernoulli(px) samples
y = ceil(rand(ny,nsim) - (1-py));   %ny-by-nsim Bernoulli(py) samples
mx = mean(x);                       %sample proportions, one per simulation
my = mean(y);
thetahat = my - mx;                 %estimator of theta = py - px
%vx = nx^-1 *mx.*(1-mx);            %plug-in variance estimates (alternative)
%vy = ny^-1 *my.*(1-my);
vx = var(x)/nx;                     %variance estimates via sample variances
vy = var(y)/ny;
vth = vx + vy;                      %estimated variance of thetahat
sth = vth.^0.5;                     %estimated std deviation of thetahat
msth = mean(sth)                    %mean and standard deviation of sth
ssth = std(sth)                     %over the nsim simulations
vsth = ssth^2;
figure(1)                           %Figure 5.1: scaled histogram of sth
db = (.09 - .03)/50;                %bin width: 50 bins on [.03, .09]
bvec = .03+db/2 : db : .09-db/2;    %bin centers
fsth = (nsim*db)^-1 *hist(sth,bvec);%scale counts to a pdf estimate
bar(bvec,fsth)
title('Simulation-Based Estimate of the pdf of std(thetahat)')
grid
pause
figure(2)                           %scaled histogram of thetahat
db = (.3 +.25)/500;                 %bin width: 500 bins on [-.25, .3]
bvec = -.25+db/2 : db : .3-db/2;
fth = (nsim*db)^-1 *hist(thetahat,bvec);
bar(bvec,fth)
title('Simulation-Based Estimate of the pdf of thetahat')
grid
mthetahat = mean(thetahat)
sthetahat = std(thetahat)
vthetahat = sthetahat^2
varthetahat = px*(1-px)/nx + py*(1-py)/ny   %true variance, for comparison
pause
figure(3)                           %Figure 5.6: scatter plot of sth vs thetahat
plot(thetahat,sth,'*')
title('Scatter Plot of thetahat vs std(thetahat)')
grid
pause
%Compute mean and std dev of the upper CI endpoint
cmat = cov(thetahat,sth);
cov_th_sth = cmat(1,2)
mCI = mthetahat + 1.645*msth
vCI = vthetahat + 1.645^2*vsth + 2*1.645*cov_th_sth;
sCI = vCI^.5
pause
xx = -.1:.0001:.4;                  %grid for the overlaid normal pdf
fn = normpdf(xx,mCI,sCI);
CIsim = thetahat + 1.645*sth;       %simulated upper CI endpoints
db = (.4+.1)/50;                    %bin width: 50 bins on [-.1, .4]
bvec = -.1+db/2:db:.4-db/2;
fCIsim = (db*nsim)^-1 *hist(CIsim,bvec);
figure(4)                           %Figure 5.7: histogram with normal overlay
bar(bvec,fCIsim)
hold on
plot(xx,fn,'r','LineWidth',2)
grid
