


Trinity College, Dublin

Generic Skills Programme

Statistics for Research Students

Laboratory 6: Analysis of Variance

To complete the laboratory exercise, work your way through this handout, which is self-contained and self-explanatory. Work in pairs (two per machine), and learn from each other. Keep separate logs of your work. The tutor is available to help with technicalities and discuss substantive issues.

Invitations to consider the results of Minitab analysis and their statistical and substantive interpretations are printed in italics. Take some time for this; consult your neighbour or tutor. Enter your responses in a Word document, as if draft contributions to a report on the experiment and its analysis.

Topics:

1. Testing filter membranes

2. A study of river pollution

3. Two sample t-tests and ANOVA

4. Review Exercise

Learning Objectives:

Be able to

– formulate a substantive problem in statistical terms

– conduct an initial data analysis using appropriate numerical and graphical summaries of raw data

– summarise the results of an initial data analysis in management report format

– use Minitab to implement a one-way analysis of variance and explain and interpret the results

– use Minitab to implement Tukey's and Fisher's methods of comparison of several means

– interpret the results of multiple comparisons of means

– compare and contrast Tukey's and Fisher's approaches

– use Minitab to produce standard diagnostic plots for the analysis of variance and explain and interpret the results

– use Minitab to test the statistical significance of differences between several standard deviations and explain and interpret the results

– explain the rationale for weighted least squares in one-way analysis of variance and derive suitable weights

– use Minitab to implement weighted one-way analysis of variance and explain and interpret the results

– implement a comprehensive one-way analysis of variance using the Minitab General Linear Model feature and explain and interpret the results

– demonstrate the correspondence between two-sample t-tests and analysis of variance applied to the comparison of two means

Refresher

To start Minitab,

• click Start, Programs, Minitab 15 for Windows, Minitab.

To access the Excel data files,

• click on the Start button in the bottom left-hand corner and choose Run...,

• in the dialog box, type \\tholos\shared and click OK,

• in the window that opens double click on the ST1001 folder,

• double click on the GET folder,

• double click on the GenericSkillsData folder.

The data sets for today's Laboratory are in files

MembraneStrength.xls,

IQ Scores.xls (from Laboratory 4) and

RiverPollution.xls.

To access the membrane strengths data:

• click on MembraneStrength.xls, then Open

• copy the two data columns,

• in the Minitab active data window, click in the Name cell for Column 1 (C1),

• from the Minitab Edit menu select Paste Cells.

1 Testing filter membranes

A company that manufactures liquid filters is concerned with improving the burst strength of the membranes that constitute the critical part of the filter. They have conducted a study of four types of filter membrane, labelled A, B, C and D. Membrane A is the standard type currently used by the company. Membrane B is an alternative developed by the company using a new material. Membranes C and D were supplied by other manufacturers. Following a review of historical data, it was decided to test sample membranes, one from each of 10 batches of each membrane type. The standard measure of burst strength involved increasing the pressure of liquid through the filter until the filter failed and recording the pressure at failure. The measurement unit was the kilopascal (kPa; thousands of pascals). The resulting data follow.

|Membrane A |Membrane B |Membrane C |Membrane D |
|---|---|---|---|
|95.5 |90.5 |86.3 |89.5 |
|103.2 |98.1 |84.0 |93.4 |
|93.1 |97.8 |86.2 |87.5 |
|89.3 |97.0 |80.2 |89.4 |
|90.4 |98.0 |83.7 |87.9 |
|92.1 |95.2 |93.4 |86.2 |
|93.1 |95.3 |77.1 |89.9 |
|91.9 |97.1 |86.8 |89.5 |
|95.3 |90.5 |83.7 |90.0 |
|84.5 |101.3 |84.9 |95.6 |

1.1 Initial data analysis

Having already copied the data to Minitab:

• use the Dotplot command in Minitab's Graph menu, One Y, With Groups, to make standard dotplots of the data,

• use the Individual Value Plot command in Minitab's Graph menu, One Y, With Groups, to make vertical dotplots of the data.

Which plot do you prefer? Why?

Interpret the output. Make tentative conclusions regarding comparisons of strengths of different membrane types, with a corresponding recommendation to the company, keeping in mind the origins of the four membrane types.

1.2 Formal analysis

The formal comparison of membrane strengths may be accomplished by an application of the Analysis of Variance (ANOVA) which, effectively, compares the four sample means. Minitab may be used to achieve this, as follows:

• from the Stat menu, select ANOVA, then One-Way...,

• select Strength as the Response and Membrane as the Factor,

• click on Graphs..., uncheck any checked residual plots (we will return to these),

• click OK, OK.

Report on the statistical significance of the differences between the sample means.

Explain the entries in the DF column of the ANOVA table.

Using the Minitab Calc menu (Probability Distributions, then F), confirm the p-value for F and calculate the critical value of F.

Note that the Cumulative probability function is used to calculate probabilities from F values while the Inverse cumulative probability function is used to calculate F values from probabilities.
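The following Python sketch shows where the entries of the One-Way ANOVA table come from. It recomputes the sums of squares, degrees of freedom, mean squares and F ratio from the membrane data in the table above. It is an illustration only and uses just the standard library. The function name `one_way_anova` is hypothetical, and this is not Minitab's own implementation. The p-value is left to the Calc menu because Python's standard library has no F distribution.

```python
from statistics import fmean

# Burst strengths (kPa) from the table in Section 1, one list per membrane type.
membranes = {
    "A": [95.5, 103.2, 93.1, 89.3, 90.4, 92.1, 93.1, 91.9, 95.3, 84.5],
    "B": [90.5, 98.1, 97.8, 97.0, 98.0, 95.2, 95.3, 97.1, 90.5, 101.3],
    "C": [86.3, 84.0, 86.2, 80.2, 83.7, 93.4, 77.1, 86.8, 83.7, 84.9],
    "D": [89.5, 93.4, 87.5, 89.4, 87.9, 86.2, 89.9, 89.5, 90.0, 95.6],
}


def one_way_anova(groups):
    """Return (df_factor, df_error, ss_factor, ss_error, f_ratio) for a one-way layout."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = fmean(all_values)
    means = {name: fmean(ys) for name, ys in groups.items()}
    # Factor (between-group) SS: group size times squared deviation of each group mean.
    ss_factor = sum(len(ys) * (means[name] - grand_mean) ** 2 for name, ys in groups.items())
    # Error (within-group) SS: squared deviations of observations from their group mean.
    ss_error = sum((y - means[name]) ** 2 for name, ys in groups.items() for y in ys)
    df_factor = len(groups) - 1                # number of groups minus one
    df_error = len(all_values) - len(groups)   # observations minus number of groups
    f_ratio = (ss_factor / df_factor) / (ss_error / df_error)
    return df_factor, df_error, ss_factor, ss_error, f_ratio


df_f, df_e, ss_f, ss_e, f = one_way_anova(membranes)
print(f"Membrane  DF={df_f}  SS={ss_f:.2f}  MS={ss_f / df_f:.2f}  F={f:.2f}")
print(f"Error     DF={df_e}  SS={ss_e:.2f}  MS={ss_e / df_e:.2f}")
print(f"Total     DF={df_f + df_e}  SS={ss_f + ss_e:.2f}")
```

The DF entries follow directly from the counts of groups and observations. The F ratio is the factor mean square divided by the error mean square.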

1.3 Make pairwise comparisons of membrane strengths

To assist in understanding the pattern of differences found between means of different membrane types, multiple comparisons of the several means may be undertaken. Two of the approaches Minitab provides are Tukey's HSD (honestly significant difference) and Fisher's LSD (least significant difference)[1]. They may be implemented as follows:

• edit the previous ANOVA dialog (CTRL+E) and click the Comparisons button,

• check the Tukey's and Fisher's boxes,

• click OK, OK.

The ANOVA table is repeated, followed by the multiple comparisons.

Report on the statistical significance of the differences between the sample means, pairwise, using both Tukey's and Fisher's methods. Display using the underline format (see, e.g., Course Manual, Figure 4.13, p. 21 or Figure 4.16, p. 28).

Compare the width of the Tukey intervals and the corresponding Fisher intervals. By how much do they differ?

Explain the differences between the two methods, having regard to simultaneous and individual confidence levels.
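For reference, the standard textbook forms of the two sets of intervals are given below. They are not copied from the Course Manual, whose notation may differ. Here MSE is the error mean square on ν = N − k degrees of freedom (N observations, k groups), t is the Student t critical value and q is the studentized range critical value. The two intervals differ only in their multiplier:

```latex
\begin{align*}
\text{Fisher LSD:}\quad & (\bar{y}_i - \bar{y}_j) \pm t_{\alpha/2,\,\nu}\,\sqrt{\mathrm{MSE}\,(1/n_i + 1/n_j)} \\
\text{Tukey HSD:}\quad & (\bar{y}_i - \bar{y}_j) \pm \frac{q_{\alpha;\,k,\,\nu}}{\sqrt{2}}\,\sqrt{\mathrm{MSE}\,(1/n_i + 1/n_j)} \\
\frac{\text{Tukey width}}{\text{Fisher width}} &= \frac{q_{\alpha;\,k,\,\nu}}{\sqrt{2}\;t_{\alpha/2,\,\nu}}, \qquad \nu = N - k
\end{align*}
```

Fisher's multiplier holds the individual confidence level of each interval at 1 − α. Tukey's multiplier holds the simultaneous confidence level for all k(k − 1)/2 pairwise intervals at 1 − α. Because q/√2 exceeds t, the Tukey intervals are always the wider of the two.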

1.4 Diagnostic analysis

The validity of the statistical inferences rests on key assumptions, specifically a common (homogeneous) standard deviation across groups and Normally distributed errors. These assumptions may be assessed using standard diagnostic plots, produced as follows:

• edit the previous dialog (CTRL+E) and click the Graphs button,

• check the Normal plots of residuals and Residual versus fits boxes,

• click OK,

• click the Comparisons button,

• uncheck the Tukey's and Fisher's boxes, click OK,

• click OK.
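Both diagnostic plots are built from two quantities. The fitted values are each observation's group mean, and the residuals are observation minus fitted value. The Python sketch below computes both for the membrane data using the standard library. It pairs the sorted residuals with Normal scores for a Normal probability plot, and it prints the residual standard deviation at each fitted value as a rough numerical counterpart to the residuals-versus-fits plot. The Blom plotting positions (i − 3/8)/(n + 1/4) are an assumption here; Minitab's Normal plot may use a different convention.

```python
from statistics import NormalDist, fmean

# Burst strengths (kPa) from the table in Section 1.
membranes = {
    "A": [95.5, 103.2, 93.1, 89.3, 90.4, 92.1, 93.1, 91.9, 95.3, 84.5],
    "B": [90.5, 98.1, 97.8, 97.0, 98.0, 95.2, 95.3, 97.1, 90.5, 101.3],
    "C": [86.3, 84.0, 86.2, 80.2, 83.7, 93.4, 77.1, 86.8, 83.7, 84.9],
    "D": [89.5, 93.4, 87.5, 89.4, 87.9, 86.2, 89.9, 89.5, 90.0, 95.6],
}

# Fitted value for every case is its group mean; residual = observed - fitted.
fits, residuals = [], []
for ys in membranes.values():
    group_mean = fmean(ys)
    for y in ys:
        fits.append(group_mean)
        residuals.append(y - group_mean)

# Normal scores for a Normal probability plot (Blom plotting positions, assumed).
n = len(residuals)
std_normal = NormalDist()
normal_scores = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
plot_pairs = list(zip(normal_scores, sorted(residuals)))  # roughly linear if Normal

# Rough spread check behind the residuals-versus-fits plot: residual SD per fit.
for fit in sorted(set(fits)):
    group_res = [r for f, r in zip(fits, residuals) if f == fit]
    spread = (sum(r * r for r in group_res) / (len(group_res) - 1)) ** 0.5
    print(f"fitted value {fit:.2f}: residual SD {spread:.2f}")
```

If the assumptions hold, the plotted pairs should lie close to a straight line, and the residual spread should not change systematically with the fitted value.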

Provide interpretations of the diagnostic plots.

What course of action is suggested?

Follow the suggested course of action. Proceed to a conclusion.

Prepare a short report for management with your final recommendation(s).

2 A study of river pollution

Jaffe, Parker and Wilson (1982)[2] have investigated the concentration of several hydrophobic (i.e., not dissolving in, absorbing, or mixing easily with water) organic substances (such as hexachlorobenzene, chlordane, heptachlor, aldrin, dieldrin, endrin) in a river, downstream of an abandoned dump site that had previously been used by the pesticide industry to dispose of its waste products.

It was expected that these hydrophobic substances might have a nonhomogeneous vertical distribution in the river because of differences in density between these compounds and water and because of the adsorption of these compounds on sediments, which could lead to higher concentrations on the bottom. It is important to check this hypothesis because the standard procedure of sampling at six-tenths of the depth could miss the bulk of these pollutants if the distribution were not uniform.

Grab samples were taken at various depths of the river. Ten surface, 10 mid-depth and 10 bottom samples were collected, all within a relatively short period. A gas chromatograph measurement procedure was used to establish the concentrations of a range of pollutants in each sample. The average aldrin and hexachlorobenzene (HCB) concentrations (in nanograms per litre) in the 30 samples are shown below.

|Aldrin: Surface |Aldrin: Middepth |Aldrin: Bottom |HCB: Surface |HCB: Middepth |HCB: Bottom |
|---|---|---|---|---|---|
|3.08 |5.17 |4.81 |3.74 |6.03 |5.44 |
|3.58 |6.17 |5.71 |4.61 |6.55 |6.88 |
|3.81 |6.26 |4.90 |4.00 |3.55 |5.37 |
|4.31 |4.26 |5.35 |4.67 |4.59 |5.44 |
|4.35 |3.17 |5.26 |4.87 |3.77 |5.03 |
|4.40 |3.76 |6.26 |5.12 |4.81 |6.48 |
|3.67 |4.76 |3.76 |4.52 |5.85 |3.89 |
|5.17 |4.90 |8.07 |5.29 |5.74 |5.85 |
|5.17 |6.57 |8.79 |5.74 |6.77 |6.85 |
|4.35 |5.17 |7.30 |5.48 |5.64 |7.16 |

2.1 Initial data analysis: Aldrin

Copy the River Pollution data into Minitab columns and conduct an initial data analysis:

• make dotplots and numerical summaries of the Aldrin levels, stratified by depth,

• use the Value Order command in the Editor menu to arrange the depths in order of their typical Aldrin concentrations; refer to Laboratory 1, Part 2, to see how to do this.

Provide a detailed interpretation of the output, as if for an interim management report, including tentative responses to the questions raised above.
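As a cross-check on Minitab's numerical summaries, the Python sketch below computes the size, mean, median, standard deviation and range of the Aldrin concentrations at each depth, using the standard library. It then lists the depths in order of mean concentration, which mimics the order you would set manually with Value Order. It is a sketch for checking purposes and does not reproduce Minitab's full Display Descriptive Statistics output.

```python
from statistics import fmean, median, stdev

# Aldrin concentrations (ng/L) by sampling depth, from the table in Section 2.
aldrin = {
    "Surface": [3.08, 3.58, 3.81, 4.31, 4.35, 4.40, 3.67, 5.17, 5.17, 4.35],
    "Middepth": [5.17, 6.17, 6.26, 4.26, 3.17, 3.76, 4.76, 4.90, 6.57, 5.17],
    "Bottom": [4.81, 5.71, 4.90, 5.35, 5.26, 6.26, 3.76, 8.07, 8.79, 7.30],
}

summary = {
    depth: {
        "n": len(ys),
        "mean": fmean(ys),
        "median": median(ys),
        "sd": stdev(ys),  # sample standard deviation (divisor n - 1)
        "min": min(ys),
        "max": max(ys),
    }
    for depth, ys in aldrin.items()
}

# Arrange the depths by typical (mean) concentration.
order = sorted(summary, key=lambda depth: summary[depth]["mean"])
for depth in order:
    s = summary[depth]
    print(f"{depth:9s} n={s['n']} mean={s['mean']:.3f} median={s['median']:.3f} "
          f"sd={s['sd']:.3f} range=({s['min']}, {s['max']})")
```

The standard deviations computed here are the ones needed later for the weights in 2.6.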

2.2 ANOVA

The formal comparison of pollutant levels at the three depths may be accomplished by an application of the Analysis of Variance (ANOVA) which, effectively, compares the three sample means. Minitab may be used to achieve this, as follows:

• from the Stat menu, select ANOVA, then One-Way...,

• select Aldrin as the Response and Depth as the Factor,

• click on Graphs..., uncheck any checked residual plots,

• click OK, OK.

Report on the statistical significance of the differences between the Aldrin sample means.
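With three depths the factor has 2 degrees of freedom. For an F distribution on 2 and ν degrees of freedom, the upper-tail probability has the closed form P(F > f) = (1 + 2f/ν)^(−ν/2). The Python sketch below uses this fact to reproduce the Aldrin F ratio and its p-value with the standard library alone. The shortcut applies only to a factor with exactly three levels; for the four membranes of Section 1, use Minitab's Calc menu instead. The helper name `f_upper_tail_df1_2` is hypothetical.

```python
from statistics import fmean

# Aldrin concentrations (ng/L) by sampling depth, from the table in Section 2.
aldrin = {
    "Surface": [3.08, 3.58, 3.81, 4.31, 4.35, 4.40, 3.67, 5.17, 5.17, 4.35],
    "Middepth": [5.17, 6.17, 6.26, 4.26, 3.17, 3.76, 4.76, 4.90, 6.57, 5.17],
    "Bottom": [4.81, 5.71, 4.90, 5.35, 5.26, 6.26, 3.76, 8.07, 8.79, 7.30],
}


def f_upper_tail_df1_2(f, df_error):
    """P(F > f) for an F distribution on 2 and df_error degrees of freedom."""
    return (1.0 + 2.0 * f / df_error) ** (-df_error / 2.0)


groups = list(aldrin.values())
values = [y for ys in groups for y in ys]
grand = fmean(values)
means = [fmean(ys) for ys in groups]
ss_depth = sum(len(ys) * (m - grand) ** 2 for ys, m in zip(groups, means))
ss_error = sum((y - m) ** 2 for ys, m in zip(groups, means) for y in ys)
df_depth, df_error = len(groups) - 1, len(values) - len(groups)
assert df_depth == 2, "the closed-form tail needs exactly three groups"

f_ratio = (ss_depth / df_depth) / (ss_error / df_error)
p_value = f_upper_tail_df1_2(f_ratio, df_error)
print(f"F = {f_ratio:.2f} on ({df_depth}, {df_error}) DF, p = {p_value:.4f}")
```

The same formula can be used to check the p-value Minitab reports for the HCB data in the Review Exercise.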

2.3 Make pairwise comparisons of pollutant levels

To assist in understanding the pattern of differences found between pollutant levels, make multiple comparisons of the several means:

• edit the previous ANOVA dialog and click the Comparisons button,

• check the Tukey's box,

• click OK, OK.

Report on the statistical significance of the differences between the Aldrin sample means, pairwise, using Tukey's method. Report specifically on differences from standard (Middepth).

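Since Middepth is the "standard" depth, there may be particular interest in comparing the responses at the other two depths with those at the standard. Dunnett's method is an alternative approach to multiple comparisons designed for this purpose: it compares each group with a single control group, rather than making all pairwise comparisons. To implement it,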

• edit the previous dialog (CTRL+E) and click the Comparisons button,

• uncheck the Tukey's box,

• check the Dunnett's box, enter "Middepth" (including quotation marks) in the Control group level box,

• click OK, OK.

Report on the statistical significance of the differences of the Surface and Bottom Aldrin sample means from standard (Middepth).

Compare with the corresponding Tukey differences; explain any variations.
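The standard forms of the two sets of intervals show why they can differ even though the estimated differences are the same. The notation below is textbook notation, assumed rather than taken from Minitab's output. Here c denotes the control group (Middepth), d is Dunnett's critical value, and the other symbols are as for Tukey's method:

```latex
\begin{align*}
\text{Tukey (all } k(k-1)/2 \text{ pairs):}\quad & (\bar{y}_i - \bar{y}_j) \pm \frac{q_{\alpha;\,k,\,\nu}}{\sqrt{2}}\,\sqrt{\mathrm{MSE}\,(1/n_i + 1/n_j)} \\
\text{Dunnett (} k-1 \text{ comparisons with control } c\text{):}\quad & (\bar{y}_i - \bar{y}_c) \pm d_{\alpha;\,k-1,\,\nu}\,\sqrt{\mathrm{MSE}\,(1/n_i + 1/n_c)}
\end{align*}
```

Dunnett's family contains only the k − 1 comparisons with the control, rather than all k(k − 1)/2 pairs. Its critical value d is therefore smaller than q/√2 for the same α and ν, so with k = 3 Dunnett's intervals are narrower than the corresponding Tukey intervals.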

2.4 Diagnostic analysis

Implement the standard diagnostic analysis:

• edit the previous dialog (CTRL+E) and click the Comparisons button,

• uncheck the Dunnett's box, click OK,

• click the Graphs button,

• check the Normal plots of residuals and Residual versus fits boxes,

• click OK, OK.

Provide interpretations of the diagnostic plots.

2.5 Formal comparison of spreads

The residuals-versus-fits plot suggests that the spread of Aldrin values increases with depth, in parallel with the Aldrin levels. This merely echoes the initial data analysis. Minitab provides formal tests of the statistical significance of such variation in spread, specifically, of differences between standard deviations. These may be carried out as follows:

• from the Stat menu, select ANOVA, then Test for Equal Variances,

• select Aldrin as the Response and Depth as the Factor,

• click OK.

The results appear in a graph window, which shows two significance tests, Bartlett's test and Levene's test. For help on these, edit the last dialog (CTRL+E), click the Help button, then click the "Bartlett's and Levene's tests" link. For more detail, click the Back button, then click "see also", "Methods and formulas".

Interpret the results, recalling that the Normal diagnostic plot supported Normality.

What do you conclude?
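With three groups, both tests can be reproduced with Python's standard library. Bartlett's statistic is referred to a chi-square distribution with k − 1 = 2 degrees of freedom, whose upper tail is simply exp(−x/2). The Levene-type test is a one-way ANOVA on absolute deviations from each group's median (the Brown-Forsythe variant). It gives an F statistic on (2, 27) degrees of freedom, whose upper tail is (1 + 2F/27)^(−27/2). Whether Minitab 15 centres Levene's test on the median rather than the mean is an assumption; check it against the "Methods and formulas" page mentioned above. The helper names `anova_f` and `bartlett` are hypothetical.

```python
from math import exp, log
from statistics import fmean, median, variance

# Aldrin concentrations (ng/L) by sampling depth, from the table in Section 2.
aldrin = {
    "Surface": [3.08, 3.58, 3.81, 4.31, 4.35, 4.40, 3.67, 5.17, 5.17, 4.35],
    "Middepth": [5.17, 6.17, 6.26, 4.26, 3.17, 3.76, 4.76, 4.90, 6.57, 5.17],
    "Bottom": [4.81, 5.71, 4.90, 5.35, 5.26, 6.26, 3.76, 8.07, 8.79, 7.30],
}


def anova_f(groups):
    """One-way ANOVA F ratio with its numerator and denominator degrees of freedom."""
    values = [y for ys in groups for y in ys]
    grand = fmean(values)
    means = [fmean(ys) for ys in groups]
    ss_between = sum(len(ys) * (m - grand) ** 2 for ys, m in zip(groups, means))
    ss_within = sum((y - m) ** 2 for ys, m in zip(groups, means) for y in ys)
    df1, df2 = len(groups) - 1, len(values) - len(groups)
    return (ss_between / df1) / (ss_within / df2), df1, df2


def bartlett(groups):
    """Bartlett's statistic and its chi-square degrees of freedom (k - 1)."""
    k = len(groups)
    dfs = [len(ys) - 1 for ys in groups]
    variances = [variance(ys) for ys in groups]
    df_total = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / df_total
    numerator = df_total * log(pooled) - sum(d * log(v) for d, v in zip(dfs, variances))
    correction = 1 + (sum(1 / d for d in dfs) - 1 / df_total) / (3 * (k - 1))
    return numerator / correction, k - 1


groups = list(aldrin.values())

# Bartlett: chi-square on 2 DF has upper tail exp(-x/2); valid shortcut for three groups only.
bartlett_stat, chi_df = bartlett(groups)
assert chi_df == 2, "closed-form tail assumes exactly three groups"
bartlett_p = exp(-bartlett_stat / 2)

# Levene (Brown-Forsythe variant, assumed): ANOVA on absolute deviations from group medians.
abs_devs = [[abs(y - median(ys)) for y in ys] for ys in groups]
levene_f, df1, df2 = anova_f(abs_devs)
assert df1 == 2, "closed-form tail assumes exactly three groups"
levene_p = (1 + 2 * levene_f / df2) ** (-df2 / 2)  # upper tail of F(2, df2)

print(f"Bartlett: statistic = {bartlett_stat:.3f}, p = {bartlett_p:.3f}")
print(f"Levene:   F = {levene_f:.3f} on ({df1}, {df2}) DF, p = {levene_p:.3f}")
```

Bartlett's test relies on Normality, while the Levene-type test is designed to be less sensitive to departures from it. This is why the Normal diagnostic plot matters when interpreting the two results together.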

2.6 Use weighted least squares to adjust for unequal standard deviations

The estimation process may be adjusted to allow for the possibly unequal standard deviations. The basic principle is that cases with smaller standard deviation should contribute more, and cases with larger standard deviation less, to the estimation process. This is achieved by dividing each case's residual by the relevant standard deviation before squaring and summing. Each squared residual is therefore weighted by the reciprocal of the relevant variance.

In Minitab, the weighting is implemented as part of the Least Squares estimation process; the weights used there are the reciprocals of the relevant variances. The variances for Surface, Middepth and Bottom are the squares of the standard deviations shown in the numerical summaries of 2.1 above. To calculate the Least Squares weights, use the formula

$\text{weight} = 1/(\text{standard deviation})^2$

What are the weights for Surface, Middepth, Bottom?

Next, the weights must be entered in a column:

• name C4 as Weights,

• from the Calc menu, select Make Patterned Data, then Arbitrary Set of Numbers,

• select Weights as the column in which to "Store patterned data",

• enter the calculated weights in the "Arbitrary set of numbers" box,

• enter 10 as the "Number of times to list each value",

• click OK.

Check the assignment of weights in the worksheet and ensure that each weight lines up with the correct depth.

Now, we are ready to implement the weighted ANOVA. The weighting option is not available using the simple One-Way command; use the General Linear Model command instead:


• from Stat, select ANOVA, then General Linear Model,

• select Aldrin as Response,

• enter Depth as Model,

• click on the Options button and enter Weights in the appropriate box, click OK,

• click on the Comparisons button, enter Depth in the Terms box, check Tukey, click OK,

• click on the Graphs button, select Deleted (deleted residuals), and check Normal plot of residuals and Residuals versus fits,

• click OK, OK.

Review the output and compare it point by point with the unweighted output.

Note any qualitative correspondences and differences in results.

Note any quantitative correspondences and differences in results.

Prepare a short report on the effects of weighting, with a final conclusion.
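The arithmetic of this section can be sketched in Python. The sketch computes each depth's weight as 1/s², builds the Weights column as Make Patterned Data would, and forms the weighted sums of squares for the one-way model. For the Weights column it assumes the stacked worksheet lists the depths in the order Surface, Middepth, Bottom, with each value repeated 10 times. It also assumes that Minitab's weighted General Linear Model minimises the weighted residual sum of squares. Under that assumption, Minitab's Depth and Error rows should agree with this sketch up to the rounding of the weights you enter. A useful check: with weights 1/s², the weighted error sum of squares equals Σ(nᵢ − 1) = 27 exactly, so the weighted error mean square is 1.

```python
from statistics import fmean, variance

# Aldrin concentrations (ng/L) by sampling depth, from the table in Section 2.
aldrin = {
    "Surface": [3.08, 3.58, 3.81, 4.31, 4.35, 4.40, 3.67, 5.17, 5.17, 4.35],
    "Middepth": [5.17, 6.17, 6.26, 4.26, 3.17, 3.76, 4.76, 4.90, 6.57, 5.17],
    "Bottom": [4.81, 5.71, 4.90, 5.35, 5.26, 6.26, 3.76, 8.07, 8.79, 7.30],
}

# Weight for each depth: reciprocal of that depth's sample variance.
weights_by_depth = {depth: 1 / variance(ys) for depth, ys in aldrin.items()}

# Weights column as Make Patterned Data builds it: each weight listed 10 times,
# assuming the stacked worksheet lists depths as Surface, Middepth, Bottom.
weights_column = [w for w in weights_by_depth.values() for _ in range(10)]

# With weights constant within a group, each fitted value is still the group mean;
# under "no depth effect" the common mean is the weighted mean of the group means.
means = {d: fmean(ys) for d, ys in aldrin.items()}
total_w = sum(len(ys) * weights_by_depth[d] for d, ys in aldrin.items())
weighted_grand = sum(len(ys) * weights_by_depth[d] * means[d] for d, ys in aldrin.items()) / total_w

ss_depth_w = sum(len(ys) * weights_by_depth[d] * (means[d] - weighted_grand) ** 2
                 for d, ys in aldrin.items())
ss_error_w = sum(weights_by_depth[d] * (y - means[d]) ** 2 for d, ys in aldrin.items() for y in ys)
df_depth = len(aldrin) - 1
df_error = sum(len(ys) for ys in aldrin.values()) - len(aldrin)

f_weighted = (ss_depth_w / df_depth) / (ss_error_w / df_error)
p_weighted = (1 + 2 * f_weighted / df_error) ** (-df_error / 2)  # F(2, df) upper tail

for depth, w in weights_by_depth.items():
    print(f"{depth:9s} weight = {w:.4f}")
print(f"Weighted F = {f_weighted:.2f} on ({df_depth}, {df_error}) DF, p = {p_weighted:.4f}")
```

Comparing `f_weighted` with the unweighted F ratio for Aldrin is one concrete way into the point-by-point comparison requested above.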

3 Two sample t-tests and ANOVA

Recall the comparison of boys' and girls' IQ scores in Laboratory 4. As part of a larger study of academic progress by males and females, IQ scores of samples of seventh-grade boys and girls in a Mid-West USA school district were measured. Assuming that these samples were representative of all the seventh graders, male and female, in the school district, a basic question is: is there evidence of a difference in IQ scores between boys and girls?

The data are available in the Week 1 Day 4 section of the Moodle. Copy the data to Minitab and repeat the 2-sample t-test of Laboratory 4; use the "Assume equal variances" option.

One-way ANOVA is designed to compare several sample means, so it should also apply to the comparison of just two. Minitab allows ANOVA to be applied to two samples held in separate columns, without stacking them in a single column with group identifiers in another column:

• from the Stat menu, select ANOVA, then One-Way (Unstacked),

• select Boys, Girls as the Responses,

• click the Graphs button, check Normal plot of residuals, Residuals versus fits,

• click OK, OK.

To facilitate establishing correspondences,

• calculate the square roots of F and MS(Error) in the ANOVA table.

Prepare short reports on both sets of results. Include commentary on the residual analysis incorporated in the ANOVA command.

How many correspondences can you establish between the two sets of results?

Explain the DF in the first row of the ANOVA table.
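The correspondences asked for above can be checked numerically. The IQ data are not reproduced in this handout, so the Python sketch below uses membranes A and B from Section 1 purely as stand-in samples. For any two samples, three relationships hold. The pooled (equal-variance) t statistic squared equals the one-way ANOVA F. The square root of MS(Error) equals the pooled standard deviation. The error DF equals n₁ + n₂ − 2. Substituting the Boys and Girls columns should show the same relationships in the Minitab output.

```python
from math import sqrt
from statistics import fmean

# Stand-in samples: membranes A and B from Section 1 (the IQ data are not listed here).
sample_1 = [95.5, 103.2, 93.1, 89.3, 90.4, 92.1, 93.1, 91.9, 95.3, 84.5]
sample_2 = [90.5, 98.1, 97.8, 97.0, 98.0, 95.2, 95.3, 97.1, 90.5, 101.3]

n1, n2 = len(sample_1), len(sample_2)
m1, m2 = fmean(sample_1), fmean(sample_2)
ss1 = sum((y - m1) ** 2 for y in sample_1)
ss2 = sum((y - m2) ** 2 for y in sample_2)

# Two-sample t-test with the "Assume equal variances" (pooled) option.
df_error = n1 + n2 - 2
pooled_sd = sqrt((ss1 + ss2) / df_error)
t_stat = (m1 - m2) / (pooled_sd * sqrt(1 / n1 + 1 / n2))

# One-way ANOVA on the same two samples; the factor has 2 - 1 = 1 DF.
grand = fmean(sample_1 + sample_2)
ss_factor = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
ms_error = (ss1 + ss2) / df_error
f_ratio = (ss_factor / 1) / ms_error

print(f"t = {t_stat:.4f}, t squared = {t_stat ** 2:.4f}, F = {f_ratio:.4f}")
print(f"pooled SD = {pooled_sd:.4f}, sqrt(MS Error) = {sqrt(ms_error):.4f}, DF = {df_error}")
```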

4 Review Exercise

Carry out a careful analysis of the HCB data in the River Pollution study.

Conclusion

This concludes Laboratory 6. The learning objectives listed at the outset are reproduced here. Check them individually and ensure that you have achieved each one; seek help from the Tutor if necessary.

Learning Objectives:

Be able to

– formulate a substantive problem in statistical terms

– conduct an initial data analysis using appropriate numerical and graphical summaries of raw data

– summarise the results of an initial data analysis in management report format

– use Minitab to implement a one-way analysis of variance and explain and interpret the results

– use Minitab to implement Tukey's and Fisher's methods of comparison of several means

– interpret the results of multiple comparisons of means

– compare and contrast Tukey's and Fisher's approaches

– use Minitab to produce standard diagnostic plots for the analysis of variance and explain and interpret the results

– use Minitab to test the statistical significance of differences between several standard deviations and explain and interpret the results

– explain the rationale for weighted least squares in one-way analysis of variance and derive suitable weights

– use Minitab to implement weighted one-way analysis of variance and explain and interpret the results

– implement a comprehensive one-way analysis of variance using the Minitab General Linear Model feature and explain and interpret the results

– demonstrate the correspondence between two-sample t-tests and analysis of variance applied to the comparison of two means

-----------------------

[1] See Course Manual, Chapter 4, §4.3, pp. 20-28.

[2] Jaffe, P. R., Parker, F. L., and Wilson, D. J. (1982). Distribution of toxic substances in rivers. Journal of the Environmental Engineering Division, Proceedings of the American Society of Civil Engineers, Vol. 108, No. EE4, pp. 639-649.
