


Chapter 11: Chi-Square and F Distributions

Chapter Notes

§11.1 Chi-Square: Tests of Independence & Homogeneity

§11.2 Chi-Square: Goodness of Fit

§11.3 Testing & Estimating a Single Variance or Standard Deviation

§11.5 One-Way ANOVA: Comparing Several Sample Means

§11.1 Chi Square: Tests of Independence & Homogeneity

This section and the next deal with qualitative data. This first section deals with two-way tables, where rows and columns are used to classify two characteristics of the data. Our two goals will be to see whether the row and column variables are independent, or to see whether different populations have different characteristics (for instance men vs. women; homogeneity).

First, let's discuss a new distribution: the Chi-Square distribution, written χ². The Chi-Square distribution takes only positive values and is right-skewed. It becomes more symmetric as the sample size, which determines the degrees of freedom, increases. For the tests of independence and homogeneity we will be considering right-tailed χ² tests. The picture is a right-skewed curve starting at 0, with the rejection region of area α in the right tail beyond χ²α,df.

Focus #1 – Independence

Assumptions

1. Random sample

2. Row and column variables are independent under the null and dependent under the alternative hypotheses

3. Expected frequency is ≥ 5 (observed can be anything)

Goal of a Test of Independence

See if the rows and columns are dependent. If they are, then finding probabilities works just as we discussed in §4.2 when finding probabilities from a contingency table!

Philosophy Behind the Test

The closer the observed and expected counts are, the smaller the squared deviations and thus the smaller the test statistic. In a Chi-Square distribution, smaller values are further to the left and thus have larger probabilities (p-values). As such, we need a right-tailed test: we reject when there is a large difference between observed and expected, which is evidence of dependence. Recall that the expected counts are computed under the assumption of independence, so if the observed counts differ greatly from them, we conclude the variables are dependent!

Hypotheses

H0: Rows & Columns are independent

HA (or H1): Rows & Columns are dependent (not independent)

Critical Value

χ²α, (r–1)(c–1) with confidence level 1 – α and d.f. = (# rows – 1)(# columns – 1)

Test Statistic

χ² = Σ (O – E)² / E, summed over all cells of the table

Notation Used for Test Statistic

O Observed values in frequency distribution (always whole numbers)

E Expected values: E = (row total)(column total) / (grand total) (not necessarily a whole number). This is the expected count under the assumption of independence. Recall our study in §4.2: if we can make the assumption of independence, which we generally cannot in a two-way table, then this is the expected value.

r The number of rows

c The number of columns

Decision

Reject H0 for a p-value less than alpha, or when the test statistic exceeds the critical value.

Using the TI-83/84

Input

1) Find the expected values.

2) Enter the Observed values in Matrix "A" (MATRIX → Edit → r x c).

a) Enter the observed counts.

3) Set up Matrix "B" as an r x c matrix to accept the Expected matrix (the calculator fills it in).

4) STAT → TESTS → χ²-Test → Observed: [A] (get by MATRIX → NAMES) & Expected: [B] (get by MATRIX → NAMES) → Calculate → Enter.

Output

1) Test Statistic

2) P-Value of the test statistic

3) Degrees of Freedom (r–1)(c–1)

4) In Matrix B you will find each expected value, should you need them [each is (row total × column total) ÷ grand total].

Example: Recall the data from §3.3 on marital status and whether or not a person smokes. Let's see if smoking and marital status are independent at the α = 0.05 level.

| |Married |Divorced |Single |Total |

|Smokes |54 |38 |11 |103 |

|Does Not Smoke |146 |62 |39 |247 |

|Total |200 |100 |50 |350 |
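Should you want to check the calculator's output with software, here is a minimal sketch in Python (assuming the scipy library is available; the variable names are mine, not part of the notes):

```python
# Test of independence for the smoking / marital-status table above.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[54, 38, 11],      # Smokes
                     [146, 62, 39]])    # Does Not Smoke

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print("Expected counts:\n", expected)   # (row total)(column total) / grand total
print(f"chi2 = {chi2:.3f}, df = {dof}, p-value = {p:.4f}")
# Reject H0 (independence) at alpha = 0.05 when p < 0.05.
```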

You may recall there were instances when the probability of an intersection from a contingency table was extremely close to the probability you would get by incorrectly assuming independence; well, that's because the variables may well have been independent! However, that assumption can't be made just because the information is listed in a two-way table!

Focus #2 – Homogeneity (Gender influence is a popular homogeneity test)

H0: Proportions are the same for populations

HA: Proportions are different for the populations

The critical value, expected values, and test statistic are exactly the same for this test as for the test of independence. It is just the question that we are asking that makes the difference. The focus is on the fact that the population can be broken into subpopulations, and that may make a difference in proportions. If there are differences in proportions due to differing subpopulations, then there is dependence; refer back to the hypotheses for dependence.

Example: The claim has been made that males do better in math courses than females. Test this claim at the α = 0.01 level.

| |Low |Average |High |Total |

|Male |15 |60 |25 |100 |

|Female |10 |35 |5 |50 |

|Total |25 |95 |30 |150 |
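The computation is identical to the independence test, so the same software sketch applies (again assuming scipy):

```python
# Test of homogeneity for the math-grade table above; the arithmetic is the
# same as a test of independence, only the question asked differs.
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[15, 60, 25],    # Male
                     [10, 35, 5]])    # Female

chi2, p, dof, _ = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.3f}, df = {dof}, p-value = {p:.4f}")
# Reject H0 (same proportions for each gender) at alpha = 0.01 when p < 0.01.
```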

§11.2 Chi-Square: Goodness of Fit

This is essentially testing the same thing that a two-sample proportions test does, except it deals with more than two categories. It boils down to seeing whether the observed distribution fits a claimed multinomial distribution.

Characteristics of Multinomial Experiment

1. Fixed # of trials

2. Trials are independent

3. Outcomes are classified into one of k ≥ 2 categories

4. Probability of categories remain constant

*Note: This should sound very familiar, as a binomial is a type of multinomial experiment where k = 2!

Goal of a Goodness of Fit Test

See how well an observed frequency distribution conforms to a claimed distribution.

Philosophy Behind the Test

The closer the observed and expected counts are, the smaller the squared deviations and thus the smaller the test statistic. In a Chi-Square distribution, smaller values are further to the left and thus have larger probabilities (p-values). As such, we need a right-tailed test: we reject when there is a large difference between observed and expected, which indicates the observed distribution does not conform to the claimed one.

Notation Used for Test Statistic

O Observed values in frequency distribution (always whole numbers)

E Expected values under the claimed distribution: E = np (not necessarily whole #'s)

k The number of categories

n The total number of trials (equals the sum of the observed)

*Note: If the probability of each category is equal then the expected values are n/k. Computation does not need to change, but realizing that only one computation is necessary can save time.

Hypotheses

H0: p1, p2, …, pk equal the claimed proportions (for equally likely categories, p1 = p2 = … = pk = 1/k)

HA (or H1): At least one difference

Critical Value

χ²α, k–1 with d.f. = k – 1. This is always a right-tailed test. Alpha is not split, ever!

Test Statistic

χ² = Σ (O – E)² / E, summed over all k categories

Decision

Reject H0 for a p-value less than alpha, or when the test statistic exceeds the critical value.

Using the TI-83/84

1) Find the expected values.

2) Enter the Observed and Expected values in a Matrix (MATRIX → Edit → 2 x k).

a) Observed counts go in the first row.

b) Expected values × 10^30 go in the second row (the huge multiplier makes the χ²-Test reproduce the goodness-of-fit statistic, with d.f. = k – 1, on calculators without a built-in χ²GOF-Test).

3) STAT → TESTS → χ²-Test → Observed: [A] (get by MATRIX → NAMES) → Calculate → Enter.

Example: If workers are really sick when they take a sick day, then there should be no difference in the frequency of absences across the days of the week. Do you think that workers really only call in sick when they are sick? Test the claim that workers only call in sick when they are sick at the α = 0.01 level, using the following information from a sample of 100 workers.

|Day |Mon |Tues |Wed |Thurs |Fri |

|# Absent |27 |19 |22 |20 |12 |
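With equal claimed probabilities, each expected count is n/k = 100/5 = 20. A minimal sketch in Python (assuming scipy):

```python
# Goodness-of-fit test for the sick-day data above; f_exp defaults to
# equal expected counts, which matches the "really sick" claim here.
from scipy.stats import chisquare

observed = [27, 19, 22, 20, 12]
result = chisquare(observed)
print(f"chi2 = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")
# df = k - 1 = 4; reject H0 at alpha = 0.01 when p < 0.01.
```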

Example: Test a car manufacturer's claim, at the 90% confidence level, that 30% of buyers prefer red, 10% prefer white, 15% green, 25% blue, 5% brown and 15% yellow. The following data come from 200 buyers.

|Color |Red |White |Green |Blue |Brown |Yellow |

|# Prefer |64 |14 |38 |49 |6 |29 |
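Here the claimed proportions are unequal, so each expected count is E = np. A sketch in Python (assuming scipy):

```python
# Goodness-of-fit test for the car-color data above, with the claimed
# (unequal) proportions supplied explicitly.
from scipy.stats import chisquare

observed = [64, 14, 38, 49, 6, 29]
claimed_p = [0.30, 0.10, 0.15, 0.25, 0.05, 0.15]
n = sum(observed)                        # 200 buyers
expected = [n * p for p in claimed_p]    # E = np for each color

result = chisquare(observed, f_exp=expected)
print(f"chi2 = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")
# df = k - 1 = 5; 90% confidence means alpha = 0.10; reject when p < 0.10.
```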

§11.3 Testing a Claim About a Standard Deviation or Variance

The assumptions are the same made for building a confidence interval about the standard deviation.

1) Simple random sample

2) The population must have a normal distribution

The test statistic is the Chi-Square statistic that we are familiar with from CI's as well:

χ² = (n – 1)s² / σ0²

Because the Chi-Square distribution is not symmetric, we still need to pay attention to both the right and left χ² values when finding the critical values. In a table, χ² is given as a right-tailed value only; hence for the left tail we need to look up 1 – α or 1 – α/2, with n – 1 degrees of freedom.
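A sketch in Python of finding the two critical values (assuming scipy; note that chi2.ppf takes the area to the LEFT, unlike right-tail tables):

```python
# Left- and right-tail chi-square critical values for a two-tailed test
# about a standard deviation, e.g. alpha = 0.05 with n = 81.
from scipy.stats import chi2

alpha, n = 0.05, 81
df = n - 1
left = chi2.ppf(alpha / 2, df)         # area alpha/2 to the left
right = chi2.ppf(1 - alpha / 2, df)    # area alpha/2 to the right
print(f"Reject H0 if chi2 < {left:.2f} or chi2 > {right:.2f}")
```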

Example: The claim is that peanut M&M's have weights that vary more than the weights of plain M&M's. The weights of plain M&M's have σ = 0.04 g. A sample of 40 peanut M&M's has weights with s = 0.31 g. (#5 p. 424 Triola)

Step 1: State the hypotheses. Step 2: Find the critical value(s) & draw the picture.

Step 3: Find the test statistic & locate it on the diagram from Step 2.

Step 4: State the conclusion.
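Since the claim is "varies more," this is a right-tailed test. A sketch in Python (assuming scipy; the α = 0.05 here is my assumption, as the example does not state one):

```python
# Right-tailed test of H0: sigma = 0.04 vs Ha: sigma > 0.04 for the
# peanut M&M data above. alpha = 0.05 is assumed, not given in the notes.
from scipy.stats import chi2

n, s, sigma0, alpha = 40, 0.31, 0.04, 0.05
stat = (n - 1) * s**2 / sigma0**2       # chi2 = (n - 1) s^2 / sigma0^2
crit = chi2.ppf(1 - alpha, n - 1)       # right-tail critical value
p_value = chi2.sf(stat, n - 1)          # area to the right of the statistic
print(f"chi2 = {stat:.1f}, critical value = {crit:.2f}, p-value = {p_value:.4g}")
# Reject H0 when the statistic exceeds the critical value (p < alpha).
```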

Example: The Stewart Aviation Products Company uses a new production method to manufacture aircraft altimeters. A simple random sample of 81 altimeters is tested in a pressure chamber, and the errors in altitude are recorded as positive values or negative values. The sample has s = 52.3 ft. At the α = 0.05 level, test the claim that the new production line has errors with a standard deviation different from the old line, which had σ = 43.7 ft.

Step 1: State the hypotheses. Step 2: Find the critical value(s) & draw the picture.

Step 3: Find the test statistic & locate it on the diagram from Step 2.

Step 4: State the conclusion.
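"Different from" makes this a two-tailed test, splitting α between the tails. A sketch in Python (assuming scipy):

```python
# Two-tailed test of H0: sigma = 43.7 vs Ha: sigma != 43.7 for the
# altimeter data above.
from scipy.stats import chi2

n, s, sigma0, alpha = 81, 52.3, 43.7, 0.05
df = n - 1
stat = df * s**2 / sigma0**2            # chi2 = (n - 1) s^2 / sigma0^2
left = chi2.ppf(alpha / 2, df)
right = chi2.ppf(1 - alpha / 2, df)
print(f"chi2 = {stat:.1f}; reject H0 if below {left:.2f} or above {right:.2f}")
```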


§11.5 One-Way ANOVA: Comparing Several Means

Recall the tests for comparing two means from §9.5. ANOVA (Analysis of Variance) is a method for comparing more than two sample means. As its name implies, the analysis of variance tests whether the variation among the sample means is great enough, relative to the variation within the samples, to conclude that the populations from which the means come are indeed different.

To discuss ANOVA we will again need a new distribution. This new distribution, a quotient of two Chi-Squared distributions (each divided by its degrees of freedom), is called the F-distribution. Here are the F-distribution's characteristics:

1) It only has positive values

2) It is right-skewed

3) The degrees of freedom are paired, one relating to the numerator χ² and one to the denominator χ²

We are going to give this a VERY short look. There is a lot to discuss in ANOVA, but I’m going to make sure that you can read output and conduct a test using that output and know vaguely where that output comes from.

We need to know 2 basic things:

Factors are the characteristics by which we distinguish the subpopulations

Error comes from the within-group variability

Philosophy Behind the Test

We are comparing the variability between the factors to the variability within the entire group. If there is not much difference, the ratio of the two measures of variability will be small; if there is a great degree of variability between factors in comparison to the overall variability, the ratio will be large and we will get a test statistic in the rejection region.

Hypotheses

H0: μ1=μ2=μ3=…=μk

HA (or H1): At least one difference

Critical Value

Fα, dfT, dfE (numerator d.f. first). Your book will be discussing BET & W where I will use T & E: BET stands for between treatments, and W stands for within the whole.

Test Statistic

We need quite a bit of information to find the test statistic, so let’s discuss the pieces and how to put them together into a nice table to conduct the test.

SSBET (SST) = Σ[(Σx)²/n for each factor] – (sum of all x's)²/(total number of observations); that is, take each factor's (Σx)² divided by its sample size, sum these across factors, then subtract the square of the grand total divided by N. This is the basis for the estimate of the common population variance built from the variation among sample means.

SSW (SSE) = the sum, over the factors, of each factor's Σx² – (Σx)²/n. This essentially pools the sample variances of the factors, measuring the within-group variability.

SSTot = SSBET + SSW

DFnumerator = DF for Factors = k – 1, where k = # of factors (treatments)

DFdenominator = DF for Within = N – k, where N = total # of observations across all factors

*Note: N–1=DFbet + DFw

MSBET(MST) = SSBET/DFbet

MSW(MSE) = SSW/DFw

F = MSBET / MSW
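Putting the pieces together, a minimal sketch in Python (assuming scipy; the three groups are hypothetical illustration data, not from these notes):

```python
# One-way ANOVA on three made-up sample groups. f_oneway computes
# F = MSBET / MSW and its right-tail p-value.
from scipy.stats import f_oneway

group1 = [85, 90, 88, 75, 95]
group2 = [70, 65, 80, 72, 68]
group3 = [82, 79, 88, 84, 77]

result = f_oneway(group1, group2, group3)
print(f"F = {result.statistic:.3f}, p-value = {result.pvalue:.4f}")
# df numerator = k - 1 = 2, df denominator = N - k = 12.
# Reject H0 (all means equal) when p < alpha.
```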

ANOVA TABLE

|Source |Sum of Sq |DF |Mean Sq |F Ratio |P-Value |

|Between | | | | | |

|Within | | | | | |

Example: a) For each treatment group, record the following sums.

|Σxi | | | | |

|Σxi2 | | | | |

b) Verify that SSBET is correct

Recall: SSBET = Σ[(Σxi)²/ni] – (Σxtot)²/N

c) Complete the ANOVA table for the data given here. Show calculations.

|Source |DF |SS |MS |F |P |

|Trmt |3 |88425 | | |0.422 |

|Error | |475323 | | | |

|Total |19 |563748 | | | |

d) Give the hypotheses and conduct the test for the question above, using the table given in c).

e) Confirm these calculations using your calculator (a software sketch follows below).
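As a check on part c), a sketch in Python that fills in the missing entries of the table (assuming scipy for the p-value; the arithmetic itself needs nothing external):

```python
# Complete the ANOVA table from part c): SS and df for Error, the mean
# squares, the F ratio, and the p-value.
from scipy.stats import f as f_dist

ss_trmt, df_trmt = 88425, 3
ss_total, df_total = 563748, 19
ss_error = ss_total - ss_trmt           # should match the 475323 given
df_error = df_total - df_trmt           # 19 - 3 = 16

ms_trmt = ss_trmt / df_trmt             # MST = SSBET / DFbet
ms_error = ss_error / df_error          # MSE = SSW / DFw
f_ratio = ms_trmt / ms_error            # F = MST / MSE
p_value = f_dist.sf(f_ratio, df_trmt, df_error)   # right-tail area

print(f"MS(Trmt) = {ms_trmt:.1f}, MS(Error) = {ms_error:.1f}")
print(f"F = {f_ratio:.3f}, p-value = {p_value:.3f}")  # table gives P = 0.422
```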

[Figures: right-skewed density curves starting at 0, each shading a right-tail rejection region of area α beyond the critical value: χ²α,df in general, χ²α,(r–1)(c–1) for the independence test, χ²α,k–1 for goodness of fit, and Fα,n1–1,n2–1 for the F-distribution.]