TRANSFORMATIONS - NDSU - North Dakota State University

TRANSFORMATIONS

One of the assumptions of using ANOVA to test for significance is that the errors should be independently and normally distributed.

Randomization is used to break up any correlation of experimental units.

A problem that may affect this assumption is that the error variances may be heterogeneous, i.e. they may differ among treatments.

There are two types of heterogeneity.

1. Irregular: certain treatments possess considerably more variability than others. e.g. In insecticide trials, the checks may contain considerably more insects than the treated experimental units; therefore, the checks contribute to the Error MS to a larger degree than the treated units. Consequently, the standard deviation will be too large for comparisons among treated experimental units.

This portion of the experiment is not under statistical control.

The best procedure to compensate for this problem is to omit certain portions of the data from the analysis or use orthogonal contrasts.

2. Regular: arises from some type of non-normality of the data in the experiment.

This non-normality is caused by a relationship between the variances and the means of the treatments.

To correct the problem, the data can be transformed such that the transformed errors are normally distributed.

Ways the Mean and Variance Can Be Related

1. Count data: e.g. number of infested plants per plot, number of lesions per leaf, etc. These types of data may follow a Poisson distribution, where the mean equals the variance.

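2. Binomial data: data in which only two outcomes are possible. For example, susceptible vs. non-susceptible, present vs. not present, etc. For binomial data the variance depends on the mean: with n units and probability p, the mean is np and the variance is np(1 - p).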

Detecting the Presence of Variability Heterogeneity for a CRD

Step 1. For each treatment, compute the variance and the mean across replicates.

Step 2. Plot a scatter diagram of the treatment variances vs. the treatment means. The number of points in the scatter diagram equals the number of treatments.

Step 3. Visually examine the scatter diagram to identify the pattern of relationship, if any.
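The three steps above can be sketched in SAS roughly as follows. This is only an illustration: the data set name crd, the variables trt, rep, and y, and all data values are made up for the example and do not come from this handout.

```sas
/* Hypothetical CRD count data: 4 treatments (TRT), 4 replicates (REP). */
/* All values are invented for illustration only.                       */
data crd;
  input trt rep y;
  datalines;
1 1 2
1 2 4
1 3 3
1 4 1
2 1 8
2 2 12
2 3 6
2 4 10
3 1 18
3 2 25
3 3 14
3 4 21
4 1 35
4 2 48
4 3 29
4 4 40
;
run;

/* Step 1: mean and variance of Y for each treatment across replicates. */
proc means data=crd noprint nway;
  class trt;
  var y;
  output out=trtstats mean=ybar var=s2;
run;

/* Step 2: scatter diagram with one point per treatment. */
proc sgplot data=trtstats;
  scatter x=ybar y=s2;
  xaxis label="Treatment mean";
  yaxis label="Treatment variance";
run;
```

Step 3 remains a visual judgment: the plot from PROC SGPLOT is inspected for a trend such as the variance rising with the mean.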

[Scatter diagram: treatment variance (Y-axis) vs. treatment mean (X-axis)]

Heterogeneous variance where the variance is proportional to the mean.

[Scatter diagram: treatment variance (Y-axis) vs. treatment mean (X-axis)]

Homogeneous variance

[Scatter diagram: treatment variance (Y-axis) vs. treatment mean (X-axis)]

Outliers: transformation will not work.

Causes of outliers

1. Mean(s) have high variability.

2. Errors in collecting data.

To determine if transformations are necessary for other designs, plot the residuals on the Y-axis against the predicted values on the X-axis.
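A residual plot of this kind might be produced along the following lines. This is a minimal sketch, assuming a randomized complete block design; the data set rcbd, the variables block, trt, and y, and the data values are hypothetical.

```sas
/* Hypothetical RCBD: 4 treatments (TRT) in 3 blocks (BLOCK). */
/* All values are invented for illustration only.             */
data rcbd;
  input block trt y;
  datalines;
1 1 12
1 2 25
1 3 51
1 4 98
2 1 14
2 2 31
2 3 64
2 4 120
3 1 15
3 2 22
3 3 47
3 4 85
;
run;

/* Fit the ANOVA model and save predicted values and residuals. */
proc glm data=rcbd;
  class block trt;
  model y = block trt;
  output out=resids predicted=yhat residual=resid;
run;
quit;

/* Residuals on the Y-axis, predicted values on the X-axis. */
proc sgplot data=resids;
  scatter x=yhat y=resid;
  refline 0 / axis=y;
  xaxis label="Predicted value";
  yaxis label="Residual";
run;
```

A funnel shape in this plot (residuals spreading out as the predicted value increases) would suggest that the variance is related to the mean.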

Data Transformations

1. Logarithmic (Log10) transformation

Appropriate for data where the standard deviation is proportional to the mean.

Helpful when the data are expressed as a percentage of change.

These types of data may follow a multiplicative model instead of an additive model.

If the data set includes small values (e.g. less than 10), use the transformation Log(Y + 1) instead of Log Y, where Y is the original data.

2. Square root transformation

Useful for count data (data that follow a Poisson distribution).

Appropriate for data consisting of small whole numbers.

In both these cases the mean may be proportional to the variance.

Examples are the number of infested plants per plot, the number of insects caught in a trap, the number of weeds per plot (i.e. data obtained in counting rare events).

This transformation also may be appropriate for percentage data where the range is between 0 and 20% or between 80 and 100%.

If most of the values in the data set are less than 10, especially if zeros are present, the transformation to use is (Y + 0.5)^(1/2) instead of Y^(1/2).

3. Arc sine square root transformation: Arc Sine (Y)^(1/2)

Appropriate for data on proportions, binomial data, and data expressed as percent of control.

The value of 0% should be substituted by 1/(4n) and the value of 100% by 100 - 1/(4n), where n is the number of units on which the percentage data were based (i.e. the denominator used in computing the percentage).
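The three transformations could be computed in a SAS data step roughly as shown below. The data set raw, the variables count, pct, and n, and the data values are hypothetical. Expressing the arc sine result in degrees is a choice made for this sketch; ARSIN itself returns radians.

```sas
/* Hypothetical input: COUNT is a raw count, PCT a percentage (0-100), */
/* and N the number of units on which each percentage is based.        */
data raw;
  input count pct n;
  datalines;
0 0 50
3 12 50
8 46 50
15 100 50
;
run;

data transformed;
  set raw;
  /* 1. Logarithmic: Log(Y + 1), since small values are present. */
  log_count = log10(count + 1);
  /* 2. Square root: (Y + 0.5)^(1/2), since zeros are present. */
  sqrt_count = sqrt(count + 0.5);
  /* 3. Arc sine square root: substitute for 0% and 100% first. */
  pct_adj = pct;
  if pct = 0 then pct_adj = 1 / (4 * n);
  else if pct = 100 then pct_adj = 100 - 1 / (4 * n);
  /* ARSIN takes a proportion and returns radians; convert to degrees. */
  asin_pct = arsin(sqrt(pct_adj / 100)) * 180 / constant('pi');
run;

proc print data=transformed;
run;
```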

The following rules may be useful in choosing the proper transformation scale for the percentage data derived from count data.

Rule 1. For percentage data lying within the range of 20 - 80%, no transformation is needed.

Rule 2. For percentage data lying within a range of either 0 - 20% or 80 - 100%, but not both, the square root transformation could be useful.

Rule 3. For percentage data that do not follow the ranges specified in either Rule 1 or Rule 2 (e.g. percent control data), the Arc Sine square root transformation may be useful.
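One way to apply these three rules mechanically is sketched below. The data set pctdata, the variable pct, the macro variables pmin and pmax, and the data values are hypothetical. The sketch assumes that any 0% and 100% values have already been substituted as described for the arc sine transformation, and it uses the (Y + 0.5)^(1/2) form of the square root because the made-up data contain small values.

```sas
/* Hypothetical percentage data derived from counts (values invented). */
data pctdata;
  input pct @@;
  datalines;
2 5 11 1 17 8 14 3
;
run;

/* Find the range of the observed percentages. */
proc sql noprint;
  select min(pct), max(pct) into :pmin trimmed, :pmax trimmed
  from pctdata;
quit;

data pct_scaled;
  set pctdata;
  length rule $ 24;
  if &pmin >= 20 and &pmax <= 80 then do;       /* Rule 1 */
    rule = "none";
    y_t = pct;
  end;
  else if &pmax <= 20 or &pmin >= 80 then do;   /* Rule 2 */
    rule = "square root";
    y_t = sqrt(pct + 0.5);
  end;
  else do;                                      /* Rule 3 */
    rule = "arc sine square root";
    y_t = arsin(sqrt(pct / 100));               /* result in radians */
  end;
run;

proc print data=pct_scaled;
run;
```

With the values shown, all percentages fall between 0 and 20%, so the Rule 2 branch is the one applied.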

Determining if a Transformation is Needed

Perform the ANOVA on untransformed data.

Check the residual vs. predicted value plots to determine if a transformation is needed.

If a transformation is needed, transform the data using the appropriate method.

Determine if the transformation corrected the problem of non-normality of the errors.

If the transformation did not correct the problem, then analyze and discuss the untransformed data.
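The check in the fourth step could be carried out as in the sketch below, which refits the model on log-transformed data and examines the new residuals. The data set trial, its variables, and its values are hypothetical, and the use of PROC UNIVARIATE for normality tests and a normal quantile plot is one possible choice, not a procedure prescribed by this handout.

```sas
/* Hypothetical RCBD whose spread grows with the mean (values invented). */
data trial;
  input block trt y;
  logy = log10(y);   /* all values exceed 10, so Log Y is used */
  datalines;
1 1 12
1 2 25
1 3 51
1 4 98
2 1 14
2 2 31
2 3 64
2 4 120
3 1 15
3 2 22
3 3 47
3 4 85
;
run;

/* ANOVA on the transformed data, saving predicted values and residuals. */
proc glm data=trial;
  class block trt;
  model logy = block trt;
  output out=r_log predicted=yhat residual=resid;
run;
quit;

/* Residual vs. predicted plot on the transformed scale. */
proc sgplot data=r_log;
  scatter x=yhat y=resid;
  refline 0 / axis=y;
run;

/* Normality tests and a normal quantile plot of the residuals. */
proc univariate data=r_log normal;
  var resid;
  qqplot resid / normal(mu=est sigma=est);
run;
```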

Performing an ANOVA Using Transformed Data

Perform the ANOVA using the transformed data.

The LSD used to compare differences should be calculated using the transformed error mean square.

Mean separation should be done using the LSD calculated from the transformed data.

When presenting means, untransformed means can be used. However, somewhere in your presentation or paper, it should be mentioned that transformed data were used to perform the ANOVA.
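The points above might look like this in SAS. This is a sketch under assumptions: the data set counts, its variables, and its values are hypothetical, and a randomized complete block design with a square root transformation is assumed. The MEANS statement in PROC GLM computes the LSD from the error mean square of the fitted model, which here is the transformed error mean square.

```sas
/* Hypothetical RCBD count data with zeros present (values invented). */
data counts;
  input block trt y;
  sqrty = sqrt(y + 0.5);
  datalines;
1 1 0
1 2 3
1 3 9
1 4 16
2 1 1
2 2 5
2 3 7
2 4 20
3 1 0
3 2 2
3 3 11
3 4 14
;
run;

/* ANOVA and LSD mean separation on the transformed scale. */
proc glm data=counts;
  class block trt;
  model sqrty = block trt;
  means trt / lsd;
run;
quit;

/* Untransformed treatment means for presentation. */
proc means data=counts mean maxdec=2;
  class trt;
  var y;
run;
```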

Example of Producing Residual Plots Using SAS

SAS Commands

options pageno=1;
data example;
input PLOT BLOC ENTRY HDDT HT LODG YIELD MOIST;
datalines;
3501 1  8 32  84.5 3 46.8 21.5
3502 1 30 32  82.5 3 78.9 19.8
3503 1  7 31  73.5 3 68.5 20.7
3504 1 26 36  64.5 1 64.5 16.7
3505 1 13 33  75.5 2 69.8 16.4
3506 1 18 31  76.0 2 74.5 16.7
3507 1 27 31  78.0 2 85.5 18.0
3508 1  4 34  81.5 4 67.3 16.4
3509 1 29 30  72.0 2 77.4 16.2
3510 1 19 30  80.5 2 55.5 15.3
3511 1 23 34  70.0 4 68.4 15.0
3512 1  5 33  85.0 5 64.2 13.8
3513 1  2 32  87.0 4 71.2 14.7
3514 1 22 35  79.0 4 62.0 13.3
3515 1 25 36  89.5 4 66.0 14.8
3516 1  6 31  83.0 2 77.3 13.6
3517 1 17 33  83.5 3 99.4 16.6
3518 1  3 32  84.0 3 79.9 16.2
3519 1 24 33  82.5 1 83.1 18.0
3520 1 21 32  80.5 1 84.1 19.4
3521 1 20 31  80.5 2 81.5 19.6
3522 1 16 31  75.5 1 65.1 16.0
3523 1  1 32  88.0 3 67.3 19.4
3524 1 14 33  82.0 2 72.9 18.5
3525 1 28 31  79.5 1 89.9 20.6
3526 1 10 31  85.0 3 82.6 20.8
3527 1 11 31  89.0 2 70.0 16.1
3528 1 15 33  75.5 1 77.3 18.9
3529 1  9 33  79.0 2 82.5 16.1
3530 1 12 32 105.0 2 57.4 16.2
3531 2 23 34  72.0 5 69.7 16.5
3532 2  8 32  88.5 3 79.7 16.4
3533 2 28 31  81.0 1 87.6 16.0
3534 2  6 32  81.5 2 70.3 15.8
3535 2 18 30  81.0 2 74.6 13.7
3536 2  2 33  84.5 3 70.7 15.3
3537 2 20 31  86.5 5 79.5 16.6
3538 2  9 32  79.5 3 80.0 13.4
3539 2 13 33  79.0 2 68.7 14.6
3540 2 27 31  73.5 3 78.2 15.4
3541 2 30 32  82.5 2 74.7 16.4
3542 2 21 33  77.0 2 72.5 16.1
3543 2 25 35  84.0 2 72.6 16.0
3544 2 12 32  93.5 3 60.1 16.0
3545 2 16 31  69.5 3 65.2 15.8
3546 2 29 30  72.5 1 82.7 18.4
3547 2 17 33  82.5 4 79.1 17.1
3548 2 14 33  80.0 2 74.5 19.4

................