ANALYSIS OF VARIANCE




One-way ANOVA

The concept of an ANOVA

• The one-way ANOVA model

Assumptions

The F test

CI estimate of the ith mean

CI estimate of the difference of 2 means

Bonferroni and Tukey estimates

Two-way ANOVA

• The two-way ANOVA model

Assumptions

The F test

CI estimates

ANALYSIS OF VARIANCE: ONE-WAY

Purpose

Compares multiple means: Tests for differences in the means for more than 2 groups

Examples of Applications

Compare mean strength of concrete developed using 6 different forms

Determine the variety of beans producing the greatest mean yield

How about multiple t-tests?

Ease in interpretation: The number of t-tests increases rapidly as the number of groups to be compared increases, and thus, analysis becomes difficult.

Error Reduction: In completing many analyses, the probability of committing at least one type I error somewhere in the process increases with the number of tests that are completed. The probability of committing at least one type I error in an analysis is called the experiment-wise error rate.
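The growth of the experiment-wise error rate can be seen with a quick calculation. A minimal Python sketch (assuming independent tests, each run at α = 0.05):

```python
# Experiment-wise error rate for k independent tests at per-test alpha.
# With 6 groups there are C(6,2) = 15 pairwise t-tests.
from math import comb

def experimentwise_error(alpha, k):
    """P(at least one Type I error) across k independent tests."""
    return 1 - (1 - alpha) ** k

k = comb(6, 2)          # 15 pairwise comparisons among 6 groups
rate = experimentwise_error(0.05, k)
print(f"{k} tests at alpha=0.05 -> family error rate {rate:.3f}")  # ~0.537
```

So with 6 groups, running all pairwise t-tests gives roughly an even chance of at least one false rejection, which is the motivation for a single ANOVA F-test.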

Initial Investigation - Descriptive Comparisons

Look at the means of each process or group (treatment); compare the variances (are they approximately the same?).

Examine side-by-side box plots by treatment. Consider the spread and center points.

Model

Suppose r samples of sizes n1, n2, …, nr are independent samples from normal distributions with a common variance σ2.

Notation

yij is the jth observation in the ith sample

j is the observation number

i is the sample number

|Sample 1 |Sample 2 |… |Sample r |

|y11, y12, …, y1n1 |y21, y22, …, y2n2 |… |yr1, yr2, …, yrnr |

The one-way model is:

yij = μi + εij where μi is the mean of group i

[An observation = population mean + random error]

Residuals: The error terms are independent random variables with mean 0 and variance σ2. They should look like a random sample from a normal distribution. To check, use a normal plot of the residuals.

eij = yij - μi, estimated by eij = yij - ȳi

eij : residual corresponding to yij

yij is the jth observation from the ith sample (observation number is j and sample number is i)

ȳi = the fitted value corresponding to yij. It is the ith sample mean.

Assumptions

r samples of sizes n1, n2, …, nr are independently and randomly selected

1. Normality: the values in each group are normally distributed (note that the test is robust relative to this assumption, i.e., modest departures will not adversely affect the results).

2. Homogeneity of Variance: the variances within each population are equal (if sample sizes are equal, modest departures are not serious). Look at side-by-side box plots and values of si.

3. Independence of Error: Residuals (εij: independent N(0, σ2)) are independent for each value. (The residuals for one observation are not related to the residuals for another.) Construct a normal plot of the residuals or plot the residuals vs. the fits.
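These residual checks can also be sketched outside Minitab. A minimal Python version (scipy assumed available), using the machine-times data from the example below:

```python
# Residual diagnostics sketch for the one-way model, using the
# machine-times data (3 machines, 5 observations each).
import numpy as np
from scipy import stats

groups = [np.array([25, 26, 24, 24, 26], float),   # machine I
          np.array([23, 22, 24, 24, 22], float),   # machine II
          np.array([21, 23, 20, 21, 20], float)]   # machine III

# e_ij = y_ij - ybar_i : residual = observation minus its group mean
residuals = np.concatenate([g - g.mean() for g in groups])

# Normality check: Shapiro-Wilk test on the pooled residuals
w, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")

# Residuals within each group sum to zero by construction
print("residual sum:", round(residuals.sum(), 10))
```

A normal probability plot (e.g., `scipy.stats.probplot`) would complement the test with a visual check, mirroring the Minitab normal plot of residuals.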

Example for Discussion

Consider the following times to complete a task using 3 different machines.

I II III

25 23 21

26 22 23

24 24 20

24 24 21

26 22 20

Compare the time to complete a task using three different machines.
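As a preview of the F-test developed below, the one-way ANOVA for these data can be sketched in Python (scipy assumed available; this is a cross-check, not part of the Minitab workflow), with the sums of squares also computed directly:

```python
# One-way ANOVA for the machine-times data, done two ways:
# scipy's f_oneway and the sums of squares by hand.
import numpy as np
from scipy import stats

m1 = [25, 26, 24, 24, 26]   # machine I   (mean 25)
m2 = [23, 22, 24, 24, 22]   # machine II  (mean 23)
m3 = [21, 23, 20, 21, 20]   # machine III (mean 21)

f, p = stats.f_oneway(m1, m2, m3)
print(f"F = {f:.2f}, p = {p:.4f}")

# Hand computation: SSTr = sum n_i*(ybar_i - grand)^2, SSE = within-group SS
data = [np.array(g, dtype=float) for g in (m1, m2, m3)]
grand = np.concatenate(data).mean()
sstr = sum(len(g) * (g.mean() - grand) ** 2 for g in data)   # 40.0
sse = sum(((g - g.mean()) ** 2).sum() for g in data)         # 14.0
r, n = 3, 15
f_hand = (sstr / (r - 1)) / (sse / (n - r))
print(f"SSTr = {sstr}, SSE = {sse}, F = {f_hand:.2f}")       # F ~ 17.14
```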

The Hypothesis

Ho: μ1 = μ2 = … = μr

(All group means are equal)

Ha: not all means are equal

(At least one group mean is not equal to another)

Source of Variation (why do means vary?)

Between Group Variation (treatment) - The values may vary because the treatments are different. The bigger the effect of the treatment, the more variation in group means we will find.

Within Group (error) - People and specimens are different; they differ whether we treat them the same or not. There are differences in means because there is variability in samples.

NOTE:

Even if Ho is true (i.e. all group means are equal), there will be differences in the sample means; variability in the samples will create the differences.

If Ho is true, the between group variation will estimate the underlying variability (σ2) just as the within group variation does.

If Ho is false, the between group variation will be larger than the within group variation.

The F-Test

The variation between or within groups is the basis for the F-test. Consider the following sums of squares.

Total Variation - includes both between group and within group variation (variation without regard to treatment)

SST - Total sum of squares (squared differences between each observation and the grand mean)

SSB = SSTr = Sum of Squares Between groups - Variation from Treatment

SSW = SSE = Random variation-random error - Sum of Squares Within groups

SST = SSB + SSW = SSTr + SSE

MSTr = SSTr/(r-1)

MSE = SSE/(n-r)

F = MSTr/MSE

df (numerator) = r -1

df(denominator) = n-r

If Ho is false, the Between-Group variation will be larger than the Within-Group variation.

If Ho is true, the treatment variation and random variation are about the same.

ANOVA Table

|Source |SS |df |MS |F |

|Treatment (Between) |SSTr |r - 1 |SSTr/(r - 1) = MSTr |F = MSTr/MSE |

|Error (Within) |SSE |n - r |SSE/(n - r) = MSE | |

|Total |SST |n - 1 | | |

Pooled Sample Variance: Estimates the baseline variation present in a response variable.

sp2 may be thought of as an “average” variance of the groups, or the weighted average of the sample variances.

(Note that the ANOVA assumes equal variances, and equal variances are required to find the pooled sd.)

r = number of samples

ni = sample sizes

si2 = sample variances

sp2 = MSE

sp2 = [(n1 - 1)s12 + (n2 - 1)s22 + … + (nr - 1)sr2] / (n - r)

R2: fraction of the raw variability in y accounted for by fitting the model to the data

R2 = SSTr/SST = 1 - SSE/SST
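A sketch tying these quantities together for the machine-times example (Python with numpy assumed; not part of the Minitab workflow):

```python
# Pooled variance and R^2 for the machine-times data: sp^2 computed as the
# weighted average of the sample variances equals MSE, and R^2 = SSTr/SST.
import numpy as np

data = [np.array([25, 26, 24, 24, 26], float),
        np.array([23, 22, 24, 24, 22], float),
        np.array([21, 23, 20, 21, 20], float)]
n = sum(len(g) for g in data)            # 15
r = len(data)                            # 3

# sp^2 = sum (n_i - 1) s_i^2 / (n - r)
sp2 = sum((len(g) - 1) * g.var(ddof=1) for g in data) / (n - r)

grand = np.concatenate(data).mean()
sstr = sum(len(g) * (g.mean() - grand) ** 2 for g in data)
sse = sum(((g - g.mean()) ** 2).sum() for g in data)
mse = sse / (n - r)
r2 = sstr / (sstr + sse)

print(f"sp^2 = {sp2:.4f}  (MSE = {mse:.4f})")   # identical
print(f"R^2 = {r2:.3f}")                        # 40/54 ~ 0.741
```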

Standardized Residuals – see page 458

Confidence Interval Estimate for the ith Population Mean – estimates a cell mean

ȳi ± t sp/√ni with df = n - r (total sample size - number of groups)

Two-Sided Confidence Interval Estimate of the Difference in Population Means

(ȳi - ȳj) ± t sp √(1/ni + 1/nj) with df = n - r

where n is the total sample size and r is the number of groups or cells
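Using the machine-times data from the earlier example, both intervals can be sketched in Python (scipy assumed available):

```python
# Sketch: 95% t-intervals for the machine-times data (df = n - r = 12),
# for one group mean and for a difference of two group means.
import numpy as np
from scipy import stats

data = [np.array([25, 26, 24, 24, 26], float),   # machine I
        np.array([23, 22, 24, 24, 22], float),   # machine II
        np.array([21, 23, 20, 21, 20], float)]   # machine III
n, r = 15, 3
sse = sum(((g - g.mean()) ** 2).sum() for g in data)
sp = np.sqrt(sse / (n - r))                      # pooled sd
t = stats.t.ppf(0.975, n - r)                    # 95% two-sided, df = 12

m1, m3 = data[0], data[2]
hw_mean = t * sp / np.sqrt(len(m1))                    # CI for mu_1
hw_diff = t * sp * np.sqrt(1 / len(m1) + 1 / len(m3))  # CI for mu_1 - mu_3
print(f"mu_1: {m1.mean():.2f} +/- {hw_mean:.2f}")
print(f"mu_1 - mu_3: {m1.mean() - m3.mean():.2f} +/- {hw_diff:.2f}")
```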

Prediction Interval of a Single Observation

This interval predicts or locates a single additional observation from a particular one of the r distributions.

Tolerance Interval

The tolerance interval locates most of a large number of values within one of the r distributions.

Simultaneous Confidence Levels

Bonferroni Inequality

Tukey’s Comparison

EXAMPLE OF ONE WAY ANOVA (Completely Randomized Design)

An experiment was conducted to compare the wearing qualities of three types of paint when subjected to the abrasive action of a slowly rotating cloth-surfaced wheel. Ten paint specimens were tested for each paint type and the number of hours until visible abrasion was apparent was recorded for each specimen.

WEAR DATA FOR 3 TYPES OF PAINT (in hours)

|Type |Hours of Wear |Totals |

|1 |148 76 393 520 236 134 55 166 415 153 |2296 |

|2 |513 264 433 94 535 327 214 165 280 304 |3129 |

|3 |335 643 216 536 128 723 258 380 594 465 |4278 |

Note that the numbers of the following items are used on the attached Minitab printout. Items in italics are my notes regarding the interpretation of the data. For XL, see:

Preliminaries

1. Run the descriptive statistics on the data.

2. Explore the test assumptions

Three (3) samples of size 10 are independently and randomly selected.

This assumption is satisfied with a proper experimental design.

Normality: values in each group are normally distributed (the model is robust relative to this assumption, i.e., modest departures will not adversely affect the results).

Homogeneity of Variance: the variance within each group should be equal for all populations (σ12= σ 22= σ 32).

Look at the variances found in calculating the descriptive statistics.

Do a dot plot or side-by-side box plots.

My print-out does not provide one of these plots, but you should run one to check the spread of the values.

There is a test called Hartley’s test for Homogeneity of Population Variances that can be run to check this assumption. It is not routinely run; it is extremely sensitive to departures from normality of the populations. A second test is Bartlett’s Test, also sensitive to departures from normality.

If population variances are extremely different, it may be desirable to do a transformation of the data. For example, transform the data by finding the square root (often useful for count or percentage data) or the logarithm of each value. The type of transformation depends upon the type of data. Once the data are transformed, perform the ANOVA on the transformed data. Test for differences in the transformed treatment means and apply the conclusions to the original treatment means. It is often difficult to assign a practical interpretation to the transformed variable and the various treatment means, and some applied statisticians do not use transformations unless there is evidence to suggest sizable differences among the treatment population variances (Mendenhall and Beavers).
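As an illustration only, a square-root transformation applied to the paint-wear data can be sketched in Python (scipy assumed; whether a transformation is appropriate here is a judgment call, and these data do not obviously require one):

```python
# Sketch of a variance-stabilizing transformation: take square roots of the
# paint-wear times, compare the spread of the groups, then redo the ANOVA
# on the transformed scale.
import numpy as np
from scipy import stats

groups = [np.array([148, 76, 393, 520, 236, 134, 55, 166, 415, 153], float),
          np.array([513, 264, 433, 94, 535, 327, 214, 165, 280, 304], float),
          np.array([335, 643, 216, 536, 128, 723, 258, 380, 594, 465], float)]

transformed = [np.sqrt(g) for g in groups]
for label, gs in (("raw", groups), ("sqrt", transformed)):
    sds = [g.std(ddof=1) for g in gs]
    print(f"{label}: sd ratio max/min = {max(sds) / min(sds):.2f}")

f, p = stats.f_oneway(*transformed)
print(f"ANOVA on sqrt scale: F = {f:.2f}, p = {p:.3f}")
```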

Independence of Error: Residuals are independent for each value.

Check by doing a normal probability plot of the residuals or plot the residuals vs. fits. If the residuals are normally distributed then the resulting plot will be linear. There is a simple correlation test that can be used to test the hypothesis that the values are correlated; you need a table that I will distribute in class.

The normal probability plot is linear and indicates that this assumption is satisfied.

Note: The ANOVA has been shown to be ‘robust’ to violations of these assumptions.

3. Test the hypotheses:

Ho: μ1 = μ2 = μ3 (Population means are equal)

Ha: At least two of the population means are not the same.

The test is an F-test. It looks at the ratio of the mean square between samples (MSB) to the mean square error, random error within samples (MSE). If the null hypothesis is true, we would expect these values to be approximately the same and the ratio to be one; if Ho is not true, then MSB should be larger than MSE.

In this problem the F ratio is 3.51. With 2 and 27 degrees of freedom, the F-table can be used to find a critical value of 3.35 when α is 0.05. (Recall, α is the probability of rejecting a true Ho.) Since the calculated F exceeds the table value of F, reject Ho and conclude that at least two of the population means are not equal. This conclusion is also verified by p = 0.044 < 0.05.
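The same F statistic and p-value can be reproduced outside Minitab. A minimal Python sketch (scipy assumed available):

```python
# Reproducing the Minitab one-way ANOVA for the paint-wear data.
from scipy import stats

paint1 = [148, 76, 393, 520, 236, 134, 55, 166, 415, 153]
paint2 = [513, 264, 433, 94, 535, 327, 214, 165, 280, 304]
paint3 = [335, 643, 216, 536, 128, 723, 258, 380, 594, 465]

f, p = stats.f_oneway(paint1, paint2, paint3)
print(f"F = {f:.2f}, p = {p:.3f}")   # F = 3.51, p = 0.044 as in the printout
```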

4. If the null hypothesis is rejected then we need to determine which means are different. This is typically determined by calculating confidence intervals for the differences in population means (μ1 - μ2, μ2 - μ3, μ1 - μ3).

We have discussed two procedures for determining simultaneous confidence intervals for multiple confidence intervals (there are others).

5. Bonferroni Inequality

(ȳi - ȳj) ± t(α/(2k)) sp √(1/ni + 1/nj), where k is the number of intervals in the family

This process involves calculating a t-interval for the difference in means where the value of t is adjusted to allow for an acceptable family confidence level. When many intervals are constructed, this procedure can be inefficient because the confidence level for each individual interval would be extremely large.
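A sketch of these Bonferroni intervals for the paint-wear data (Python with scipy assumed; k = 3 pairwise intervals at a 95% family level):

```python
# Bonferroni-adjusted t intervals for the three pairwise differences in the
# paint-wear example: each interval uses t at alpha/(2k), with k = 3.
import numpy as np
from scipy import stats
from itertools import combinations

groups = {1: [148, 76, 393, 520, 236, 134, 55, 166, 415, 153],
          2: [513, 264, 433, 94, 535, 327, 214, 165, 280, 304],
          3: [335, 643, 216, 536, 128, 723, 258, 380, 594, 465]}

n, r, k, alpha = 30, 3, 3, 0.05
sse = sum(((np.array(g) - np.mean(g)) ** 2).sum() for g in groups.values())
sp = np.sqrt(sse / (n - r))                      # pooled sd, ~167.9
t = stats.t.ppf(1 - alpha / (2 * k), n - r)      # Bonferroni-adjusted t

for i, j in combinations(groups, 2):
    d = np.mean(groups[j]) - np.mean(groups[i])
    hw = t * sp * np.sqrt(1 / 10 + 1 / 10)
    print(f"mu{j} - mu{i}: {d:.1f} +/- {hw:.1f}")
```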

6. Tukey’s Method involves the use of Studentized range distribution. (See Tables D-10-A and D-10-B)

(ȳi - ȳj) ± (q*/√2) sp √(1/ni + 1/nj), where q* is the upper-α point of the Studentized range distribution with parameters r and n - r

These intervals can be most easily computed using computer packages such as Minitab.

7. If there is only one difference, a simple t confidence interval could be calculated for the difference between (usually the largest and smallest) means.

8. It is sometimes desirable to calculate a confidence interval for one of the means; this CI is calculated using the t-interval. The Bonferroni Inequality can be applied in determining t.

ȳi ± t sp/√ni with df = n - r

On the print-out, the graph shown following the ANOVA summary table indicates that there is a difference between Paint 1 and Paint 3. This conclusion is verified by the Tukey pairwise comparisons. (Tukey’s method has a 95% family confidence rate with individual comparisons at about 98%.) With 95% confidence, it is asserted that the wear time for Paint 3 is from 12 to 385 hours longer than for Paint 1; thus, Paint 3 appears to possess somewhat greater wearing quality than Paint 1. There is no statistically significant difference in the wear time between Paints 1 and 2 or between Paints 2 and 3.
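The Tukey intervals can be rebuilt from the Studentized range distribution. A minimal Python sketch (scipy’s `studentized_range` distribution assumed available, SciPy 1.7+):

```python
# Tukey pairwise interval for the paint-wear data, built from the
# Studentized range distribution (5% family rate), shown here for mu3 - mu1.
import numpy as np
from scipy import stats

g1 = np.array([148, 76, 393, 520, 236, 134, 55, 166, 415, 153], float)
g2 = np.array([513, 264, 433, 94, 535, 327, 214, 165, 280, 304], float)
g3 = np.array([335, 643, 216, 536, 128, 723, 258, 380, 594, 465], float)

n, r, ni = 30, 3, 10
sse = sum(((g - g.mean()) ** 2).sum() for g in (g1, g2, g3))
sp = np.sqrt(sse / (n - r))

q = stats.studentized_range.ppf(0.95, r, n - r)    # critical value ~3.51
hw = (q / np.sqrt(2)) * sp * np.sqrt(1 / ni + 1 / ni)

d31 = g3.mean() - g1.mean()                        # 198.2
# Compare with Minitab's interval of roughly (12, 385)
print(f"mu3 - mu1: [{d31 - hw:.0f}, {d31 + hw:.0f}]")
```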

9. sp2 is needed for the calculations described in item 4. It is the pooled variance (sp is the pooled standard deviation) and may be thought of as an “average” of the group variances, or the weighted average of the sample variances.

sp2 estimates the baseline variation that is present in a response variable (hours until apparent abrasion).

(Note that the ANOVA assumes equal variances and equal variances are required to find the pooled sd.)

From the ANOVA summary table, find the MSE and take the square root to get sp = 167.88. This value is the pooled standard deviation, which can be thought of as the average (or base) variation of the hours of wear for any of the three treatments, i.e., paint types.

|C1 |C2 |C3 | |C6 |C7 |

|T1 |T2 |T3 | |Hrs |Paint Type |

|148 |513 |335 | |148 |1 |

|76 |264 |643 | |76 |1 |

|393 |433 |216 | |393 |1 |

|520 |94 |536 | |520 |1 |

|236 |535 |128 | |236 |1 |

|134 |327 |723 | |134 |1 |

|55 |214 |258 | |55 |1 |

|166 |165 |380 | |166 |1 |

|415 |280 |594 | |415 |1 |

|153 |304 |465 | |153 |1 |

| | | | |513 |2 |

| | | | |264 |2 |

| | | | |433 |2 |

| | | | |94 |2 |

| | | | |535 |2 |

| | | | |327 |2 |

| | | | |214 |2 |

| | | | |165 |2 |

| | | | |280 |2 |

| | | | |304 |2 |

| | | | |335 |3 |

| | | | |643 |3 |

| | | | |216 |3 |

| | | | |536 |3 |

| | | | |128 |3 |

| | | | |723 |3 |

| | | | |258 |3 |

| | | | |380 |3 |

| | | | |594 |3 |

| | | | |465 |3 |

| | | | | | |

Worksheet size: 100000 cells

MTB > describe c1-c3

Descriptive Statistics

Variable N Mean Median TrMean StDev SE Mean

T1 10 229.6 159.5 215.1 158.2 50.0

T2 10 312.9 292.0 312.5 144.2 45.6

T3 10 427.8 422.5 428.4 196.8 62.2

Variable Minimum Maximum Q1 Q3

T1 55.0 520.0 119.5 398.5

T2 94.0 535.0 201.7 453.0

T3 128.0 723.0 247.5 606.2

MTB > Boxplot c1 c2 c3;

SUBC> Box;

SUBC> Symbol;

SUBC> Outlier;

SUBC> Overlay;

SUBC> ScFrame;

SUBC> ScAnnotation.

MTB >

Note: The Following is obtained from the menu using:

Stat/Anova/One-Way (Unstacked)

Responses in separate columns: C1 C2 C3

MTB > AOVOneway c1 c2 c3.

One-way Analysis of Variance

Analysis of Variance

Source DF SS MS F P

Factor 2 198080 99040 3.51 0.044

Error 27 760987 28185

Total 29 959067

Individual 95% CIs For Mean

Based on Pooled StDev

Level N Mean StDev ----------+---------+---------+------

T1 10 229.6 158.2 (--------*--------)

T2 10 312.9 144.2 (--------*--------)

T3 10 427.8 196.8 (--------*--------)

----------+---------+---------+------

Pooled StDev = 167.9 240 360 480

MTB >

Note: The Following is obtained from the menu using:

Stat/Anova/One-Way

Response is stacked in C6; I have factors in C7.

(See p2)

I used the following from this dialog box.

Comparisons: Tukey's Family Error Rate

Graphs: Dotplot, Boxplot, Normal Plot of Residuals, Residuals vs Fits

MTB > Name c11 = 'RESI1' c12 = 'FITS1'

MTB > Oneway c6 c7 'RESI1' 'FITS1';

SUBC> Tukey 5;

SUBC> GDotplot;

SUBC> GBoxplot;

SUBC> GNormalplot;

SUBC> GFits.

One-way Analysis of Variance

Analysis of Variance for Hrs

Source DF SS MS F P

Paint Ty 2 198080 99040 3.51 0.044

Error 27 760987 28185

Total 29 959067

Individual 95% CIs For Mean Based on Pooled StDev

Level N Mean StDev ----------+---------+---------+------

1 10 229.6 158.2 (--------*--------)

2 10 312.9 144.2 (--------*--------)

3 10 427.8 196.8 (--------*--------)

----------+---------+---------+------

Pooled StDev = 167.9 240 360 480

Tukey's pairwise comparisons

Family error rate = 0.0500

Individual error rate = 0.0196

Critical value = 3.51

Intervals for (column level mean) - (row level mean)

1 2

2 -270

103

3 -385 -301

-12 71

MTB >

EXAMPLE OF TWO-WAY ANOVA (a Factorial Experiment)

An experiment was conducted to determine the effect of sintering time (two levels) on the compressive strength of two different metals. Five test specimens were sintered for each metal at each of two sintering times. The data (in thousands of pounds per square inch) are shown in the following table.

| |100 minutes |200 minutes |

|Metal 1 |17.1 15.2 16.5 16.7 14.9 |19.4 17.2 18.9 20.7 20.1 |

|Metal 2 |12.3 13.8 10.8 11.6 12.1 |15.6 17.2 16.7 16.1 18.3 |

This experiment represents a factorial experiment; every level of every variable is paired with every level of every other variable. To collect these data, we could process a fixed quantity of metal 1 for 100 minutes and the same quantity for 200 minutes, then measure the compressive strength of the processed samples. This procedure is duplicated for metal 2.

Note this problem involves two factors: metal type and sintering time. The different settings for a given factor are called levels. In this problem, there are two levels of each factor, i.e., two types of metal and two different times. By completing the ANOVA, we can determine 1) if there is a difference in strength based on metal type, 2) if there is a difference in strength based on sintering time, or 3) if there is a difference in strength based on a particular combination of metal type and sintering time (an interaction).

The main effect of an independent variable is the effect of that variable averaged over all levels of the other variables in the experiment. For example, the main effect of metal type involves a comparison of the means for metal 1 and metal 2. If the effect of metal type depends on sintering time, then there is an interaction effect.

Note that the numbers of the following items are used on the attached Minitab printout. Items in italics are my notes regarding the interpretation of the data.

Preliminaries

1. Run the descriptive statistics on the data.

2. Plot the data.

In this example, strength is plotted as the dependent variable on the vertical axis; time is plotted on the horizontal axis. There should be a separate set of points for each metal type, producing two ‘lines’. It can be noted that as the sintering time increases from 100 to 200 minutes, the compressive strength increases for both metal 1 and metal 2. Since these lines are approximately ‘parallel,’ we can speculate that there is no significant interaction, and if there are statistically significant differences in compressive strength, they will be differences resulting from metal type or sintering time. (Note: it would enhance the graph to add error bars that reflect CI estimates of each sample mean; this step is not done automatically by the software.) There is an interaction when the magnitude of an effect is greater at one level of a variable than at another level of the variable.
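The ‘parallel lines’ reading can be checked numerically by comparing the strength gain from 100 to 200 minutes within each metal. A minimal Python sketch:

```python
# Numeric check of the 'parallel lines' reading of the interaction plot:
# the strength gain from 100 to 200 minutes, computed separately per metal.
import numpy as np

cells = {  # (metal, time) -> five strength observations
    (1, 100): [17.1, 15.2, 16.5, 16.7, 14.9],
    (1, 200): [19.4, 17.2, 18.9, 20.7, 20.1],
    (2, 100): [12.3, 13.8, 10.8, 11.6, 12.1],
    (2, 200): [15.6, 17.2, 16.7, 16.1, 18.3],
}
means = {k: np.mean(v) for k, v in cells.items()}

gain1 = means[(1, 200)] - means[(1, 100)]   # 19.26 - 16.08 = 3.18
gain2 = means[(2, 200)] - means[(2, 100)]   # 16.78 - 12.12 = 4.66
print(f"gain for metal 1: {gain1:.2f}, metal 2: {gain2:.2f}")
# Similar gains -> roughly parallel lines -> little evidence of interaction
```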

3. Explore the test assumptions

Independence of Error: Residuals are independent for each value.

Check by doing a normal probability plot of the residuals; plot the residuals vs. fits.

The normal plot is approximately linear and thus, indicates no major problems with the normal distribution assumption. Similarly, the plot of the residuals vs. the fits (sample means) does not indicate a trend in σ as a function of mean response.

Common Variance: Variances are the same for each population group.

Check by doing a dot plot of the residuals vs. sample means and check the spread (variability).

4. Test the hypotheses. There are three tests of significance.

Test for Main Effects, Metal Type

Ho: μ.1 = μ.2 or αi = 0 (There are no differences in population means for the metal types)

Ha: At least two of the population means are not the same.

In this problem the F ratio is 41.15. With 1 and 16 degrees of freedom, the F-table can be used to find a critical value of F when α is 0.05. From the summary table, p < 0.001; thus, we can reject Ho and conclude that the population compressive strengths for metal 1 and metal 2 differ; the compressive strength for metal 1 appears to be greater than that for metal 2.

Test for Main Effects, Sintering Time

Ho: μ1. = μ2. or βj = 0 (There are no differences in population means for the two sintering times)

Ha: At least two of the population means are not the same.

In this problem the F ratio is 60.99. With 1 and 16 degrees of freedom, the F-table can be used to find a critical value of F when α is 0.05. From the summary table, p < 0.001; thus, we can reject Ho and conclude that the population compressive strengths for sintering for 100 and 200 minutes differ; sintering for 200 minutes appears to result in a greater compressive strength.

Test for Interaction Effects

Ho: αβij = 0 (Metal type and sintering time do not interact to produce different compressive strengths)

Ha: Metal type and sintering time interact.

In this problem the F ratio is 2.17. With 1 and 16 degrees of freedom, the F-table can be used to find a critical value of F when α is 0.05. From the summary table, p = 0.16; thus, we cannot reject Ho. We conclude that there is no interaction between metal type and sintering time, i.e., the effect of sintering time on compressive strength is about the same for both metal types.
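The three F statistics above can be reproduced by computing the sums of squares directly for this balanced design. A Python sketch (numpy assumed; not part of the Minitab workflow):

```python
# Reproducing the two-way ANOVA table by hand for the balanced 2x2 design:
# SS for each main effect, the interaction, and error.
import numpy as np

cells = {(1, 1): np.array([17.1, 15.2, 16.5, 16.7, 14.9]),   # metal 1, 100 min
         (1, 2): np.array([19.4, 17.2, 18.9, 20.7, 20.1]),   # metal 1, 200 min
         (2, 1): np.array([12.3, 13.8, 10.8, 11.6, 12.1]),   # metal 2, 100 min
         (2, 2): np.array([15.6, 17.2, 16.7, 16.1, 18.3])}   # metal 2, 200 min
m = 5                                         # observations per cell
grand = np.concatenate(list(cells.values())).mean()

metal_mean = {i: np.mean([cells[(i, j)].mean() for j in (1, 2)]) for i in (1, 2)}
time_mean = {j: np.mean([cells[(i, j)].mean() for i in (1, 2)]) for j in (1, 2)}

ss_metal = 2 * m * sum((metal_mean[i] - grand) ** 2 for i in (1, 2))
ss_time = 2 * m * sum((time_mean[j] - grand) ** 2 for j in (1, 2))
ss_int = m * sum((cells[(i, j)].mean() - metal_mean[i] - time_mean[j] + grand) ** 2
                 for i in (1, 2) for j in (1, 2))
ss_err = sum(((c - c.mean()) ** 2).sum() for c in cells.values())

mse = ss_err / 16
print(f"F(time) = {ss_time / mse:.2f}")        # ~60.99
print(f"F(metal) = {ss_metal / mse:.2f}")      # ~41.15
print(f"F(interaction) = {ss_int / mse:.2f}")  # ~2.17
```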

5. If a null hypothesis is rejected then we need to determine which means are different. This is typically determined by calculating confidence intervals for the differences in population means.

We have discussed two procedures for determining simultaneous confidence intervals for multiple confidence intervals (there are others).

Bonferroni Inequality

(ȳi - ȳj) ± t(α/(2k)) sp √(1/ni + 1/nj), where k is the number of intervals in the family

Tukey’s Method involves the use of Studentized range distribution

See equation 8-17 on page 456; use Table D-10.

Difference in strength based on metal type (rows)

(ȳ1. - ȳ2.) ± t sp √(1/(Jm) + 1/(Jm))

df = n - IJ = 16; I = 2; J = 2; m = common cell size = 5; sp = √MSE = √1.26 ≈ 1.12

(17.67 - 14.45) ± 1.6

[1.6,4.8] lbs. per sq. inch

With 99% confidence, it is asserted that the compressive strength of metal 1 is from 1.6 to 4.8 pounds per square inch greater than for Metal 2.

Difference in strength based on sintering time (columns)

(ȳ.2 - ȳ.1) ± t sp √(1/(Im) + 1/(Im))

(18.02 - 14.10) ± 1.6

[2.3,5.5] lbs. per sq. inch

With 99% confidence, it is asserted that the compressive strength for the 200-minute sintering time is from 2.3 to 5.5 pounds per square inch greater than for the 100-minute sintering time.

If there is only one difference, a simple t confidence interval could be calculated for the difference between (usually the largest and smallest) means.

It is sometimes desirable to calculate a confidence interval for one of the cell means; this CI is calculated using the t-interval. The Bonferroni Inequality can be applied in determining t.

ȳij ± t sp/√m with df = n - IJ

Example of Two-Way ANOVA

Effect of the sintering time on the compressive strength of two different metals

1. Run the descriptive statistics on the data.

MTB > print c1-c3

Data Display

Row Strength Metal Time

C1 C2 C3

1 17.1 1 1

2 15.2 1 1

3 16.5 1 1

4 16.7 1 1

5 14.9 1 1

6 12.3 2 1

7 13.8 2 1

8 10.8 2 1

9 11.6 2 1

10 12.1 2 1

11 19.4 1 2

12 17.2 1 2

13 18.9 1 2

14 20.7 1 2

15 20.1 1 2

16 15.6 2 2

17 17.2 2 2

18 16.7 2 2

19 16.1 2 2

20 18.3 2 2

MTB > table c2 c3;

SUBC> data c1;

SUBC> means c1;

SUBC> standard deviation c1.

Tabulated Statistics

Rows: Metal Columns: Time

1 2 All

1 17.100 19.400 --

15.200 17.200

16.500 18.900

16.700 20.700

14.900 20.100

16.080 19.260 17.670

0.971 1.339 2.006

2 12.300 15.600 --

13.800 17.200

10.800 16.700

11.600 16.100

12.100 18.300

12.120 16.780 14.450

1.103 1.043 2.656

All -- -- --

14.100 18.020 16.060

2.306 1.729 2.824

Cell Contents --

Strength:Data

Mean

StDev

2. Plot the data and test the hypotheses.

MTB > %Main c2 c3;

SUBC> Response c1.

Minitab Plot [Use: Stat/ANOVA/Twoway/Interaction Plot]

MTB > Name c4 = 'RESI1' c5 = 'FITS1'

MTB > Twoway c1 c3 c2 'RESI1' 'FITS1';

SUBC> Means c3 c2;

SUBC> GNormalplot;

SUBC> GFits.

Two-way Analysis of Variance

Analysis of Variance for Strength

Source DF SS MS F P

Time 1 76.83 76.83 60.99 0.000

Metal 1 51.84 51.84 41.15 0.000

Interaction 1 2.74 2.74 2.17 0.160

Error 16 20.16 1.26

Total 19 151.57

Individual 95% CI

Time Mean --+---------+---------+---------+---------

1 14.10 (----*----)

2 18.02 (----*----)

--+---------+---------+---------+---------

13.50 15.00 16.50 18.00

Individual 95% CI

Metal Mean ------+---------+---------+---------+-----

1 17.67 (-----*------)

2 14.45 (-----*------)

------+---------+---------+---------+-----

14.40 15.60 16.80 18.00

MTB >

|C1 |C2 |C3 |C4 |C5 |

|Strength |Metal |Time |RESI1 |FITS1 |

|17.1 |1 |1 |1.02 |16.08 |

|15.2 |1 |1 |-0.88 |16.08 |

|16.5 |1 |1 |0.42 |16.08 |

|16.7 |1 |1 |0.62 |16.08 |

|14.9 |1 |1 |-1.18 |16.08 |

|12.3 |2 |1 |0.18 |12.12 |

|13.8 |2 |1 |1.68 |12.12 |

|10.8 |2 |1 |-1.32 |12.12 |

|11.6 |2 |1 |-0.52 |12.12 |

|12.1 |2 |1 |-0.02 |12.12 |

|19.4 |1 |2 |0.14 |19.26 |

|17.2 |1 |2 |-2.06 |19.26 |

|18.9 |1 |2 |-0.36 |19.26 |

|20.7 |1 |2 |1.44 |19.26 |

|20.1 |1 |2 |0.84 |19.26 |

|15.6 |2 |2 |-1.18 |16.78 |

|17.2 |2 |2 |0.42 |16.78 |

|16.7 |2 |2 |-0.08 |16.78 |

|16.1 |2 |2 |-0.68 |16.78 |

|18.3 |2 |2 |1.52 |16.78 |
