STAT 101, Module 9: Statistical Inference for Mean Comparisons

(Book: chapter 11)

Motivations:

• In the previous two modules we learnt

1. how to construct confidence intervals for population means and population proportions (probabilities), and

2. how to test simple null hypotheses about population means and population proportions.

Now we will learn about confidence intervals and hypothesis testing of differences of means. In the final Module 10, we will do the same with slopes in simple linear regression.

• Examples:

o Is plant manager A more efficient than plant manager B in running the manufacture of the company’s product?

o Of two new types of surgery, does one of them have patients stay fewer days at the hospital?

o Do people who watch more television have a greater tendency toward violence?

o Are people who pop vitamins healthier than those who do not?

o Do people like Coke more than Pepsi?

o Do female and male Penn students differ in their perceptions?

Natural language is not good at casting these questions precisely. The question cannot be whether all people like Coke better than Pepsi; it is about which product has more people preferring it. Similarly, a superior surgical technique will not be superior in every patient, but it may be in a majority of patients. Hence all questions are really about averages: means and proportions, and comparisons thereof.

• The standard question asked about groups: Is there a mean difference? We can therefore anticipate that the null hypothesis is that of equal means. Remember the devil’s advocate view of hypothesis testing?

Standard errors for differences of means

• Given two groups, we have a population as well as a sample of each.

o Recall: A sample is a finite collection of observed or measured cases, and the corresponding population is the limiting situation of infinitely many or all cases observed or measured. As always, the population is an idealization that will never be realized, but we need it as the unlimited pool from which samples are drawn and as the target that samples are intended to approximate.

o For example, we will have a population of patients subjected to new surgery type 1 and another subjected to new surgery type 2, and correspondingly we will have samples of each.

• Corresponding to groups 1 and 2, we will have population means μ1 and μ2 and sample means X̄(1) and X̄(2). The two samples are generally of different sizes, which we will denote by N1 and N2.

o Notation: We use superscripts in parentheses to denote the group for observations and their means. The reason is that we already use subscripts to indicate the case/row number. The i’th case of group 2 would therefore be written Xi(2), for example.

o The random variable may be the number of days of hospitalization for a patient undergoing either type of surgery. Hence μ1 and μ2 are the population means of days of hospitalization, while X̄(1) and X̄(2) are the sample means of days of hospitalization.

• Both confidence intervals and statistical tests provide statements about population quantities based on sample quantities. The population quantity of interest is now μ1–μ2, and the sample quantity that estimates it is X̄(1) – X̄(2).

• At the root of both confidence intervals and hypothesis tests are standard errors. Hence we need to know what the standard error of X̄(1) – X̄(2) is and subsequently estimate it. Unfortunately, there are several proposals for standard errors, depending on the situation. We start with the following general formula that motivates all standard error estimates:

σ( X̄(1) – X̄(2) ) = √( σ(X̄(1))² + σ(X̄(2))² )

If this formula reminds you of Pythagoras, c = √(a² + b²), it is for a good reason: this is Pythagoras. In some sense, X̄(1) and X̄(2) are orthogonal, namely, uncorrelated. The assumption is that the data of the two groups were collected independently, hence X̄(1) and X̄(2) are uncorrelated, hence V(X̄(1) – X̄(2)) = V(X̄(1)) + V(X̄(2)) – 2C(X̄(1), X̄(2)) = V(X̄(1)) + V(X̄(2)). Finally, take square roots. The simulation sketch below illustrates this variance addition.
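
A minimal simulation sketch in Python (ours, not part of the notes), with made-up group sizes and population SDs. Over many replications, the observed variance of X̄(1) – X̄(2) should match σ1²/N1 + σ2²/N2:

    # Simulation check of V(mean1 - mean2) = V(mean1) + V(mean2)
    # for independently collected groups (hypothetical sizes and SDs).
    import numpy as np

    rng = np.random.default_rng(0)
    N1, N2 = 40, 60            # hypothetical group sizes
    sigma1, sigma2 = 2.0, 3.0  # hypothetical population SDs

    diffs = [rng.normal(0, sigma1, N1).mean() - rng.normal(0, sigma2, N2).mean()
             for _ in range(100_000)]

    print(np.var(diffs))                  # simulated variance of the mean difference
    print(sigma1**2/N1 + sigma2**2/N2)    # theoretical value: 0.25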

The formula for the standard error of the mean difference suggests the following standard error estimate:

stderr( X̄(1) – X̄(2) ) = √( stderr(X̄(1))² + stderr(X̄(2))² )

We abbreviate the three terms as stderr1-2, stderr1, and stderr2, respectively. This formula is the basis of standard error estimates, but the question is how stderr1 and stderr2 are to be calculated. This leads to the next question:

• Standard error estimates are calculated from standard deviations according to the root-N law. A complication with two groups is that there are two ways to estimate a standard deviation:

o Most obviously, we estimate separate SDs in the two groups: estimate σ1 by s1 and σ2 by s2 .

o Less obviously, but most popularly, one assumes that the two groups have the same population SDs: σ1 = σ2. One can then calculate a pooled sample SD for the two groups as follows:

spooled = √( ( (N1 – 1) s1² + (N2 – 1) s2² ) / ( N1 + N2 – 2 ) )

So this is the sample SD for both groups!

The reason for going this route is that if it is true that σ1 = σ2 , then the pooled sample SD spooled is more efficient.
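
Here is a small Python sketch (ours, with hypothetical sample SDs and sizes) of the pooled-SD computation:

    # Pooled sample SD under the assumption sigma1 = sigma2
    # (hypothetical sample SDs and sizes).
    import math

    def pooled_sd(s1, s2, N1, N2):
        return math.sqrt(((N1 - 1)*s1**2 + (N2 - 1)*s2**2) / (N1 + N2 - 2))

    print(pooled_sd(2.1, 2.4, 30, 45))  # about 2.29, a compromise between 2.1 and 2.4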

Depending on whether one assumes unequal or equal population standard deviations, one will calculate the standard error estimates differently.

How do we know whether to assume equal standard deviations in the two groups or not? There are several possible answers:

o Compare the sample SDs of the two groups and decide not to assume equal population SDs if, for example, s1 and s2 differ by more than 50% from each other. This is a crude rule, but simple.

o There exists a hypothesis test for the null hypothesis σ1 = σ2. If it rejects, do not assume equal standard deviations, otherwise do. JMP provides this test, but we will not look into it.

o Run both versions of the t-test (assuming equal and unequal SDs) and hope that they agree at the intended significance level. If they don’t, check whether the two sample SDs differ by a lot.

• Finally, we can put together standard error estimates depending on whether equal SDs are assumed for the two groups or not:

o If one assumes σ1 ≠ σ2 , one uses s1 and s2 as sample SDs:

stderr1 = s1 / √N1 ,   stderr2 = s2 / √N2 .

o If one assumes σ1 = σ2 , one uses spooled as the sample SD for both groups, hence

stderr1 = spooled / √N1 ,   stderr2 = spooled / √N2 .

As shown above, the standard error estimate for the mean difference follows in either case from the Pythagorean formula:

stderr1-2 = √( stderr1² + stderr2² ) .

One could prettify the formulas somewhat, but that’s rather useless.
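
For illustration, a short Python sketch (ours, with hypothetical s1, s2, N1, N2) of both recipes; note that the two standard error estimates come out close when the sample SDs are similar:

    # Standard error of the mean difference, both ways
    import math

    def stderr_diff_unequal(s1, s2, N1, N2):
        # Separate SDs: stderr_i = s_i / sqrt(N_i), then Pythagoras.
        return math.sqrt(s1**2/N1 + s2**2/N2)

    def stderr_diff_pooled(s1, s2, N1, N2):
        # Equal-SD assumption: s_pooled used in both groups.
        sp2 = ((N1 - 1)*s1**2 + (N2 - 1)*s2**2) / (N1 + N2 - 2)
        return math.sqrt(sp2/N1 + sp2/N2)

    print(stderr_diff_unequal(2.1, 2.4, 30, 45))  # about 0.52
    print(stderr_diff_pooled(2.1, 2.4, 30, 45))   # about 0.54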

• With these formulas for the standard error estimate of the mean difference, we can carry out both kinds of statistical inference: confidence intervals and statistical testing:

o A confidence interval for μ1–μ2 with a coverage probability of about 0.95 is

CI = ( X̄(1) – X̄(2) – 2 stderr1-2 ,  X̄(1) – X̄(2) + 2 stderr1-2 )

o A test of the null hypothesis H0: μ1 – μ2 = 0 rejects approximately at the 5% significance level if

t = ( X̄(1) – X̄(2) ) / stderr1-2

falls outside the ±2 interval.

Recall that the test of most interest is that of no mean difference between groups. It is extremely rare that one needs to test differences other than zero.
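
To make the recipe concrete, here is a sketch in Python (ours, with made-up summary statistics) that computes the approximate 95% confidence interval and the t statistic for H0: μ1 – μ2 = 0, using the unequal-SD standard error:

    # CI and t statistic from summary statistics (hypothetical numbers)
    import math

    mean1, mean2 = 5.2, 6.1   # sample means, e.g. days of hospitalization
    s1, s2 = 2.1, 2.4         # sample SDs
    N1, N2 = 30, 45           # sample sizes

    stderr12 = math.sqrt(s1**2/N1 + s2**2/N2)  # unequal-SD version
    diff = mean1 - mean2

    ci = (diff - 2*stderr12, diff + 2*stderr12)
    t = diff / stderr12

    print(ci)  # about (-1.95, 0.15): contains 0
    print(t)   # about -1.72: inside the +-2 interval, so H0 is not rejected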

Group comparisons: Statistical inference in practice

• A practical problem when comparing groups is how to represent the groups in the data. In JMP, we could guess the answer readily: one needs a categorical variable that takes on exactly two labels. Examples would be labels for the two sexes, two surgery types, two managers, two political candidates, …

• JMP: Group comparison looks just like regression, but the predictor variable must be “binary”, that is, categorical with two labels. JMP will then automatically decide that you want to compare groups.

Analyze > Fit Y by X > (select the quantitative variables) > Y, Response; (select the binary grouping variable) > X, Factor > OK

The output will show dot plots, that is, plots of response values on the vertical axis, and the two groups as two different values on the horizontal axis.

Next, to perform the mean comparisons with two-sample t-tests, do:

(click on the tiny red triangle in the top left) > t Test

In order to perform mean comparisons simultaneously for a collection of quantitative variables, hold down the ctrl key while clicking the red triangle and selecting t Test.

The resulting outputs will show the comparison allowing unequal population SDs in the two groups. We omit the more popular comparison with the equal-SD assumption.
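
For reference outside JMP: the unequal-SD comparison that JMP reports is Welch’s two-sample t-test, which can be reproduced in Python with scipy. A sketch with made-up response data:

    # Welch's two-sample t-test (unequal population SDs allowed)
    import numpy as np
    from scipy import stats

    group1 = np.array([4.0, 6.5, 5.0, 7.0, 4.5, 6.0])  # hypothetical responses
    group2 = np.array([6.0, 7.5, 8.0, 6.5, 9.0, 7.0])

    t, p = stats.ttest_ind(group1, group2, equal_var=False)
    print(t, p)  # compare with the t Ratio and Prob > |t| in JMP's t Test report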

• Examples: The Penn Student Survey (PennStudents.JMP) based on the questionnaire PennStudentsSurvey.htm offers plenty of two-group comparisons. We will compare the sexes with regard to the variables Desired Income, How Athletic, How Popular, How Attractive, Wife Not Work, and Number Kids. Sex is coded as 1=male and 2=female, and the differences in JMP are shown as female–male.

o Desired Income: Below is what is relevant from the output, with everything else removed (a quick arithmetic check of these numbers appears after the examples):

Difference     -0.54913
Std Err Dif     0.08842
Upper CL Dif   -0.37529
Lower CL Dif   -0.72297
Confidence      0.95
t Ratio        -6.21059
Prob > |t|      0.0244

o How Popular: Are you surprised? Females and males are almost completely at the same mean level. The CI contains zero; the t-ratio is within ±2; the p-value is close to 1.

Difference      0.02472
Std Err Dif     0.13340
Upper CL Dif    0.28700
Lower CL Dif   -0.23756
Confidence      0.95
t Ratio         0.185321
Prob > |t|      0.8531

o How Attractive: A statistically insignificant difference.

Difference      0.11624
Std Err Dif     0.11130
Upper CL Dif    0.33507
Lower CL Dif   -0.10259
Confidence      0.95
t Ratio         1.044384
Prob > |t|      0.2970

o Wife Not Work: Again, a 7-point scale, 7=definitely not work…

Difference     -0.7083
Std Err Dif     0.1644
Upper CL Dif   -0.3849
Lower CL Dif   -1.0316
Confidence      0.95
t Ratio        -4.30768
Prob > |t|      0.0023
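
As promised above, a quick arithmetic check of these JMP reports (our addition): the t Ratio is simply Difference divided by Std Err Dif, and the confidence limits are approximately Difference ± 2 × Std Err Dif. JMP uses the exact t quantile (a bit below 2) rather than 2, which explains the small discrepancies. For the Desired Income report:

    # Arithmetic check of JMP's Desired Income report
    print(-0.54913 / 0.08842)      # about -6.2105, the reported t Ratio up to rounding
    print(-0.54913 - 2*0.08842)    # -0.72597, close to Lower CL Dif (-0.72297)
    print(-0.54913 + 2*0.08842)    # -0.37229, close to Upper CL Dif (-0.37529)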
