


CHAPTER 14

EXPERIMENTAL DESIGN AND ANOVA: THE AIRSPACE PROBLEM

14.1 CHAPTER OBJECTIVES

• Motivation for Using a Designed Experiment

• Analysis of Data from One-Way Designs

• Assumptions of ANOVA

• Analysis of Data from Blocked Designs

• Analysis of Data from Two-Way Designs

• Other Types of Experimental Designs

14.3 ANALYSIS OF DATA FROM ONE-WAY DESIGNS

14.3.1 One-Way Designs: The Basics

A factor is a variable that can be used to differentiate one group or population from another. It is a variable that may be related to the variable of interest.

A level is one of several possible values or settings that the factor can assume.

The response variable is a quantitative variable that you are measuring or observing.

Experiments that compare groups defined by the levels of a single factor are examples of one-way or completely randomized designs.

An experiment has a one-way or completely randomized design if there are several different levels of one factor being studied and the objects or people being observed/measured are randomly assigned to one of the levels of the factor.

The term one-way refers to the fact that the groups differ with regard to the one factor being studied.

The term completely randomized refers to the fact that individual observations are assigned to the groups in a random manner.

14.3.2 Understanding the Total Variation

Analysis of variance (ANOVA) is the technique used to analyze the variation in the data to determine whether more than two population means are equal.

A treatment is a particular setting or combination of settings of the factor(s).

The grand mean or the overall mean is the sample average of all the observations in the experiment. It is labeled $\bar{\bar{x}}$ (x-bar-bar). Now we can rewrite the variance calculation using the grand mean as follows:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{\bar{x}})^2}{n - 1}$$

The total variation or sum of squares total (SST) is a measure of the variability in the entire data set considered as a whole.

SST is calculated as follows:

$$SST = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (x_{ij} - \bar{\bar{x}})^2$$

where $c$ is the number of groups and $n_j$ is the number of observations in group $j$.

14.3.3 Components of Total Variation

The between groups variation is also called the Sum of Squares Between or the Sum of Squares Among and it measures how much of the total variation comes from actual differences in the treatments.

The dot-plot shown in Figure 14.3 displays the sample average for each of the four time treatments. These are called treatment means.

[Figure 14.3: Dot-plot of the sample average for each of the four time treatments]

A treatment mean is the average of the response variable for a particular treatment.

Between Groups Variation measures how different the individual treatment means are from the overall grand mean. It is often called the sum of squares between or the sum of squares among (SSA).

The formula for sum of squares among (SSA) is

$$SSA = \sum_{j=1}^{c} n_j (\bar{x}_j - \bar{\bar{x}})^2$$

where $\bar{x}_j$ is the treatment mean of the $j$th group.

Within groups variation measures the variability in the measurements within the groups. It is often called sum of squares within or sum of squares error (SSE).

$$SSE = \sum_{j=1}^{c} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2$$

Together these two components account for all of the variation in the data:

$$SST = SSA + SSE$$

These quantities are organized in the one-way ANOVA table:

Source           df      SS    MS                 F
Among groups     c - 1   SSA   MSA = SSA/(c - 1)  MSA/MSE
Within groups    n - c   SSE   MSE = SSE/(n - c)
Total            n - 1   SST

14.3.4 The Mean Square Terms in the ANOVA Table

The mean square among is labeled MSA. The mean square error is labeled MSE and the mean square total is labeled MST.

The formulas for the mean squares are

$$MSA = \frac{SSA}{c - 1} \qquad MSE = \frac{SSE}{n - c} \qquad MST = \frac{SST}{n - 1}$$

14.3.5 Testing the Hypothesis of Equal Means

In general, the null and alternative hypotheses for a one-way designed experiment are shown below:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_c$$

HA: At least one of the population means is different from the others.

The formula for the F test statistic from Section 13.9 is calculated by taking the ratio of the two sample variances:

$$F = \frac{s_1^2}{s_2^2}$$

In ANOVA, MSA and MSE are our two sample variances. So the F statistic is calculated as:

$$F = \frac{MSA}{MSE}$$
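To make the computation concrete, here is a minimal Python sketch that computes SSA, SSE, the mean squares, and the F statistic by hand and then cross-checks the result with scipy's built-in one-way ANOVA. The airspace-style measurements are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical airspace measurements for c = 4 time treatments (made-up data)
groups = [
    np.array([6.5, 6.2, 6.8, 6.4]),
    np.array([5.9, 6.1, 5.8, 6.0]),
    np.array([5.5, 5.4, 5.7, 5.6]),
    np.array([5.1, 5.3, 5.0, 5.2]),
]

n = sum(len(g) for g in groups)             # total number of observations
c = len(groups)                             # number of treatment levels
grand_mean = np.concatenate(groups).mean()  # the grand mean, x-bar-bar

# Between-groups variation: SSA = sum over groups of n_j (xbar_j - xbarbar)^2
ssa = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-groups variation: SSE = sum of squared deviations within each group
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

msa = ssa / (c - 1)   # mean square among
mse = sse / (n - c)   # mean square error
f_stat = msa / mse

print(f"SSA = {ssa:.4f}, SSE = {sse:.4f}, F = {f_stat:.3f}")

# Cross-check with scipy's one-way ANOVA
f_check, p_value = stats.f_oneway(*groups)
print(f"scipy: F = {f_check:.3f}, p = {p_value:.4f}")
```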

14.4 ASSUMPTIONS OF ANOVA

The three major assumptions of ANOVA are as follows:

1. The errors are random and independent of each other.

2. Each population has a normal distribution.

3. All of the populations have the same variance.
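The text does not prescribe specific diagnostics, but these assumptions can be checked informally from the sample data. Here is a hedged sketch using scipy's Shapiro-Wilk test to probe normality (assumption 2) and Levene's test to probe equal variances (assumption 3); the group data are made up.

```python
import numpy as np
from scipy import stats

# Made-up measurements for three groups (illustrative only)
groups = [
    np.array([6.5, 6.2, 6.8, 6.4, 6.6]),
    np.array([5.9, 6.1, 5.8, 6.0, 6.2]),
    np.array([5.5, 5.4, 5.7, 5.6, 5.8]),
]

# Assumption 2: each population is normal (Shapiro-Wilk per group;
# a small p-value is evidence against normality)
for j, g in enumerate(groups, start=1):
    w, p = stats.shapiro(g)
    print(f"group {j}: Shapiro-Wilk p = {p:.3f}")

# Assumption 3: equal variances across populations (Levene's test;
# a small p-value is evidence against the common-variance assumption)
w, p = stats.levene(*groups)
print(f"Levene p = {p:.3f}")
```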

14.5 ANALYSIS OF DATA FROM BLOCKED DESIGNS

A block is a group of objects or people that have been matched. An object or person can be matched with itself, meaning that repeated observations are taken on that object or person and these observations form a block.

If the realities of data collection lead you to use blocks, then you must take this into account in your analysis. Your experimental design is called a randomized block design. Instead of using a one-way ANOVA you must use a block ANOVA.

An experiment has a randomized block design if several different levels of one factor are being studied and the objects or people being observed/measured have been matched. Each object or person is randomly assigned to one of the c levels of the factor.

14.5.1 Partitioning the Total Variation

Like the approach we took with data from a one-way design, the idea is to take the total variability as measured by SST and break it down into its components.

With a block design there is one additional component: the variability between the blocks. It is called the sum of squares blocks and is labeled SSBL.

For a block design, the variation we see in the data is due to one of three things: the level of the factor, the block, or the error.

Thus, the total variation is divided into three components:

SST = SSA + SSBL + SSE

14.5.2 Using the ANOVA Table in a Block Design

The ANOVA table for such a block design looks just like the ANOVA table for a one-way design with an additional row. It is shown below.

Source          df               SS     MS                           F
Among groups    c - 1            SSA    MSA = SSA/(c - 1)            MSA/MSE
Among blocks    r - 1            SSBL   MSBL = SSBL/(r - 1)          MSBL/MSE
Error           (r - 1)(c - 1)   SSE    MSE = SSE/[(r - 1)(c - 1)]
Total           n - 1            SST

where r is the number of blocks and c is the number of levels of the factor.
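For illustration, here is a minimal numpy sketch of the block partition, assuming made-up data with r = 3 blocks and c = 4 levels of the factor, one observation per block-treatment combination.

```python
import numpy as np

# Hypothetical randomized block data: rows = blocks (r = 3),
# columns = treatment levels (c = 4); all numbers are made up
x = np.array([
    [6.5, 6.0, 5.6, 5.1],
    [6.3, 5.9, 5.5, 5.2],
    [6.7, 6.2, 5.8, 5.0],
])
r, c = x.shape
grand = x.mean()

ssa  = r * ((x.mean(axis=0) - grand) ** 2).sum()   # treatment (column) effect
ssbl = c * ((x.mean(axis=1) - grand) ** 2).sum()   # block (row) effect
sst  = ((x - grand) ** 2).sum()
sse  = sst - ssa - ssbl                            # SST = SSA + SSBL + SSE

msa  = ssa / (c - 1)
msbl = ssbl / (r - 1)
mse  = sse / ((r - 1) * (c - 1))

print(f"F (treatments) = {msa / mse:.3f}")
print(f"F (blocks)     = {msbl / mse:.3f}")
```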

14.6 ANALYSIS OF DATA FROM TWO-WAY DESIGNS

14.6.1 Motivation for a Factorial Design Model

An experimental design is called a factorial design with two factors if there are several different levels of two factors being studied. The first factor is called factor A and there are r levels of factor A. The second factor is called factor B and there are c levels of factor B.

The design is said to have equal replication if the same number of objects or people being observed/measured are randomly selected from each population. Each population is described by a specific level of each of the two factors. Each observation is called a replicate. There are n' observations or replicates observed from each population, so there are n = n'rc observations in total.

14.6.2 Partitioning the Variation

The sum of squares due to factor A is labeled SSA. It measures the squared differences between the mean of each level of factor A and the grand mean.

The sum of squares due to factor B is labeled SSB. It measures the squared differences between the mean of each level of factor B and the grand mean.

The sum of squares due to the interacting effect of A and B is labeled SSAB. It measures the effect of combining factor A and factor B.

The sum of squares error is labeled SSE. It measures the variability in the measurements within the groups.

Thus, the total variation is divided into four components:

SST = SSA + SSB + SSAB + SSE
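As a concrete illustration of this partition, here is a minimal numpy sketch assuming made-up data with r = 2 levels of factor A, c = 3 levels of factor B, and n' = 2 replicates per cell.

```python
import numpy as np

# Hypothetical two-way data with equal replication:
# shape (r, c, n') = (levels of A, levels of B, replicates per cell)
x = np.array([
    [[6.1, 6.3], [5.8, 5.6], [5.2, 5.4]],
    [[6.6, 6.4], [5.5, 5.7], [4.9, 5.1]],
])
r, c, n_rep = x.shape
grand = x.mean()

a_means    = x.mean(axis=(1, 2))   # mean at each level of factor A
b_means    = x.mean(axis=(0, 2))   # mean at each level of factor B
cell_means = x.mean(axis=2)        # mean of each (A, B) cell

ssa  = c * n_rep * ((a_means - grand) ** 2).sum()
ssb  = r * n_rep * ((b_means - grand) ** 2).sum()
ssab = n_rep * ((cell_means
                 - a_means[:, None]
                 - b_means[None, :]
                 + grand) ** 2).sum()
sse  = ((x - cell_means[:, :, None]) ** 2).sum()
sst  = ((x - grand) ** 2).sum()
assert np.isclose(sst, ssa + ssb + ssab + sse)   # the four-way partition

mse = sse / (r * c * (n_rep - 1))
print(f"F_A  = {(ssa  / (r - 1)) / mse:.3f}")
print(f"F_B  = {(ssb  / (c - 1)) / mse:.3f}")
print(f"F_AB = {(ssab / ((r - 1) * (c - 1))) / mse:.3f}")
```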

14.6.3 Using the ANOVA Table in a Two-Way Design

The ANOVA table for such a design looks just like the ANOVA table for a one-way design with two additional rows. It is shown below:

Source            df               SS     MS                             F
Factor A          r - 1            SSA    MSA = SSA/(r - 1)              MSA/MSE
Factor B          c - 1            SSB    MSB = SSB/(c - 1)              MSB/MSE
Interaction AB    (r - 1)(c - 1)   SSAB   MSAB = SSAB/[(r - 1)(c - 1)]   MSAB/MSE
Error             rc(n' - 1)       SSE    MSE = SSE/[rc(n' - 1)]
Total             n - 1            SST

In a two-way ANOVA, three hypothesis tests should be done.

(1) To test the hypothesis of no difference due to factor A we would have the following null and alternative hypotheses:

Ho: There is no difference in the population means due to factor A.

HA: There is a difference in the population means due to factor A.

This hypothesis test is easily done by determining if the F value in the row of the ANOVA table corresponding to factor A is in the rejection region.

(2) To test the hypothesis of no difference due to factor B we would have the following null and alternative hypotheses:

Ho: There is no difference in the population means due to factor B.

HA: There is a difference in the population means due to factor B.

(3) To test the hypothesis of no difference due to the interaction of factors A and B we would have the following null and alternative hypotheses:

Ho: There is no difference in the population means due to the interaction of factors A and B.

HA: There is a difference in the population means due to the interaction of factors A and B.

This hypothesis test is easily done by determining if the F value in the row of the ANOVA table corresponding to the interaction of factors A and B is in the rejection region.

14.6.4 Understanding the Interaction Effect

The easiest way to understand this effect is to look at a graph of the sample averages for each of the possible combinations of the two factors.

The line graph shown in Figure 14.7 displays the 20 sample means for airspace.

[Figure 14.7: Line graph of the sample mean airspace for each combination of time on the shelf and position in the hardroll]

From this graph you can see that the mean airspace decreases the longer the box sits on the shelf, regardless of the position in the hardroll from which the box was made.

The airspace behavior is affected by the interaction of the time on the shelf and the position in the hardroll from which it was made. This is what we expected based on our hypothesis test.

If there were no interaction effect, the lines connecting the sample means would be parallel as in Figure 14.8.

[Figure 14.8: Line graph showing parallel lines, the pattern expected when there is no interaction effect]
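For illustration, the following matplotlib sketch draws this kind of interaction plot from hypothetical cell means: roughly parallel lines suggest little or no interaction, while crossing or diverging lines suggest an interaction effect.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical cell means (rows = levels of factor A, cols = levels of factor B)
cell_means = np.array([
    [6.2, 5.7, 5.3],
    [6.5, 5.6, 5.0],
])

# One line per level of factor A, plotted across the levels of factor B
for i, row in enumerate(cell_means, start=1):
    plt.plot([1, 2, 3], row, marker="o", label=f"factor A, level {i}")
plt.xlabel("Factor B level (time on shelf)")
plt.ylabel("Mean airspace")
plt.title("Interaction plot: non-parallel lines suggest interaction")
plt.legend()
plt.show()
```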

CHAPTER 17

THE ANALYSIS OF QUALITATIVE DATA

17.1 CHAPTER OBJECTIVES

In this chapter you will learn to use chi-square tests for

• testing whether a particular probability model fits a set of data (goodness of fit test)

• testing equality of proportions for more than two populations

• testing whether two qualitative variables are dependent or independent

17.2 TESTS FOR GOODNESS OF FIT

One method of testing to see if data come from a population with a certain distribution is to perform a test called a chi-square goodness of fit test.

The chi-square goodness of fit test checks to see how well a set of data fit the model for a particular probability distribution.

17.2.1 The Chi-Square Test

For a chi-square goodness of fit test, the hypotheses are:

Ho: The data come from a population with a specific probability distribution (normal, binomial, etc.).

HA: The data do not come from a population with the specified distribution.

The test procedure is to collect a set of sample data and to create a frequency histogram for the data. The test then compares the observed frequency distribution of the data to the frequency distribution that would be expected if the null hypothesis were true.

The observed frequencies are the actual number of observations that fall into each class in a frequency distribution or histogram. The expected frequencies are the number of observations that should fall into each class in a frequency distribution under the hypothesized probability distribution.

A uniform distribution is one in which each outcome or class of outcomes is equally likely to occur.

The chi-square goodness of fit test will give information about the distribution of the number of defective cases in a sample of size 5, which may enable the software manufacturer to obtain additional information about when or how the population changed.

In a chi-square goodness of fit test, we are always trying to decide whether the data we observed fit a particular probability distribution model.

17.2.2 The Chi-Square Statistic

The question is, how can we quantify the "deviation" from what is expected, and how will we know when what we observe deviates by more than it should or more than is likely by pure chance?

To answer these questions we need to define a test statistic that measures the deviation of interest, and has a sampling distribution whose behavior we understand. This test statistic is known as the chi-square statistic and is calculated as

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

where

$o_i$ = observed frequency in the $i$th class of the frequency distribution

$e_i$ = expected frequency in the $i$th class of the frequency distribution

$k$ = the number of classes in the frequency distribution

This test statistic has a chi-square distribution with k-p-1 degrees of freedom, where p = the number of parameters of the theoretical distribution that were estimated from the data.

17.2.3 The Critical Value and the Decision Rule

To find the appropriate critical value we will need a level of significance for the test, α. Chi-square goodness of fit tests are always one-sided, upper-tailed tests. This is because we are looking at the total deviation and trying to see if that exceeds some reasonable level.

To find the critical value we look up $\chi^2_{\alpha}$ for the appropriate values of $\alpha$ and $k - p - 1$ degrees of freedom, as shown in Figure 17.1.

[Figure 17.1: Chi-square distribution with the upper-tail rejection region of area α]
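Putting the statistic, the degrees of freedom, and the decision rule together, here is a minimal Python sketch of a goodness of fit test for a uniform distribution; the observed counts are made up for illustration.

```python
import numpy as np
from scipy import stats

# Made-up observed counts for k = 6 outcome classes
observed = np.array([18, 24, 19, 25, 16, 18])      # o_i
# Under a uniform model every class is equally likely, so each
# expected frequency is the total count divided by the number of classes
expected = np.full(6, observed.sum() / 6)           # e_i

chi2_stat = ((observed - expected) ** 2 / expected).sum()

# No parameters were estimated from the data (p = 0), so df = k - p - 1 = 5
alpha, df = 0.05, len(observed) - 0 - 1
critical = stats.chi2.ppf(1 - alpha, df)

print(f"chi-square = {chi2_stat:.3f}, critical value = {critical:.3f}")
print("reject H0" if chi2_stat > critical else "fail to reject H0")
```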

17.2.4 Testing for Normality and Other Considerations

When you are using a chi-square test to test the assumption of normality, you will either decide that the assumption of normality is not appropriate, or that since there is no evidence to the contrary, the assumption is a reasonable one.

17.3 TESTING PROPORTIONS FROM MORE THAN TWO POPULATIONS

17.3.1 Testing Proportions for More Than Two Populations

We want to determine whether the proportion of successes in each population is the same. That is, the general set of hypotheses we wish to test is:

$$H_0: \pi_1 = \pi_2 = \cdots = \pi_c$$

$H_A$: At least one of the population proportions is different from the others.

For this test the two variables are the population that the sample item comes from (i = 1, 2, ..., c) and whether the item is a success (the characteristic is present) or a failure (the characteristic is not present). An example of the contingency table is shown in Table 17.1.

[Table 17.1: Contingency table of success and failure counts for each of the c populations]

The proportion of successes for each population can be calculated as:

$$\hat{p}_i = \frac{x_i}{n_i}$$

where $x_i$ is the number of successes in the sample of $n_i$ items taken from population $i$.

17.3.2 Description of the Test

In general, the estimate for π is calculated by counting all of the successes observed and dividing by the total number of objects sampled. That is, the overall proportion of successes is given by:

$$\bar{p} = \frac{x_1 + x_2 + \cdots + x_c}{n_1 + n_2 + \cdots + n_c}$$

And the expected number of successes in the sample from population i is

$$e_i = n_i \bar{p}$$

17.3.3 Performing the Test

The application of the chi-square statistic to testing proportions is exactly the same:

$$\chi^2 = \sum_{\text{all cells}} \frac{(o_i - e_i)^2}{e_i}$$

where $o_i$ is the observed frequency in the $i$th cell of the table and $e_i$ is the expected frequency in the $i$th cell of the table.

The degrees of freedom for a chi-square test involving a table are given by:

Degrees of freedom = (r - 1)(c - 1)

where r is the number of rows in the table and c is the number of columns.
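As a sketch, scipy's chi2_contingency function carries out exactly this computation on a success/failure table; the counts below are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical success/failure counts for c = 3 populations
#             pop 1  pop 2  pop 3
table = np.array([
    [ 42,    55,    38],   # successes
    [158,   145,   162],   # failures
])

chi2_stat, p_value, df, expected = stats.chi2_contingency(table)
print(f"chi-square = {chi2_stat:.3f}, df = {df}, p = {p_value:.4f}")
# df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2, matching the formula above
```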

17.3.4 Using the Results of the Chi-Square Test for Proportions

You might be wondering (and with good reason) just how you can use the chi-square test for equality of proportions to gain useful information or to make decisions.

When the test does not lead to rejection of the null hypothesis, there is not really anything else to do. However, when the test leads to rejection of the null hypothesis, you conclude that at least one of the populations is different from the others.

If you have done a chi-square test for proportions, then it is likely that you suspect that something is going on that is causing the populations to be different. The results of the test verify that your suspicions are justified. The test result tells you that further investigation is warranted.

17.4 THE CHI-SQUARE TEST FOR INDEPENDENCE

17.4.1 Probability and Independence

Two events are independent if the probability that one event occurs in any given trial of an experiment is not affected or changed by the occurrence of the other event.

In probability, two events, A and B, are independent exactly when

$$P(A \text{ and } B) = P(A) \times P(B)$$

17.4.2 Description of the Test

We would like to decide whether or not two qualitative variables are related. The hypotheses that we will test are:

Ho: The two variables are independent of each other.

HA: The two variables are not independent.

A general example of the contingency table for two variables is shown in Table 17.2.

[Table 17.2: General contingency table for two qualitative variables with r rows and c columns]

To do a chi-square test, we must find the expected frequency for each cell in the table and compare it to the observed frequency.

For the test of independence, the null hypothesis is that the two variables are independent. To find the probabilities for each value of a variable, you use the row or column totals. That is, for row i,

$$P(\text{row } i) = \frac{\text{row } i \text{ total}}{n}$$

and for column j,

$$P(\text{column } j) = \frac{\text{column } j \text{ total}}{n}$$

When we have independence for each cell in the contingency table, the probability that variable 1 will have category i and variable 2 will have category j is

$$P(\text{row } i \text{ and column } j) = P(\text{row } i) \times P(\text{column } j)$$

To find the expected frequency for any cell we multiply the probability by the sample size, so:

$$e_{ij} = n \times P(\text{row } i) \times P(\text{column } j) = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n}$$

17.4.3 Performing the Test for Independence

Once we have the expected and observed frequencies for each cell in the table, nothing new needs to be learned to perform the test. The chi-square test statistic is calculated in the same way as for the test of proportions:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

And the degrees of freedom are calculated as:

$$\text{Degrees of freedom} = (r - 1)(c - 1)$$
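Here is a minimal numpy sketch of the independence test, computing the expected frequencies from the row and column totals exactly as described above; the observed counts are made up.

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table for two qualitative variables
# (rows = categories of variable 1, columns = categories of variable 2)
observed = np.array([
    [30, 20, 10],
    [20, 30, 40],
])
n = observed.sum()

# Expected frequencies under independence:
# e_ij = (row i total)(column j total) / n
row_tot = observed.sum(axis=1, keepdims=True)
col_tot = observed.sum(axis=0, keepdims=True)
expected = row_tot * col_tot / n

chi2_stat = ((observed - expected) ** 2 / expected).sum()
r, c = observed.shape
df = (r - 1) * (c - 1)
p_value = stats.chi2.sf(chi2_stat, df)   # upper-tail p-value
print(f"chi-square = {chi2_stat:.3f}, df = {df}, p = {p_value:.4f}")
```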
