
TECHNOMETRICS, VOL. 10, No. 4, NOVEMBER 1968

Errors of Measurement in Statistics*

W. G. COCHRAN

Harvard University

In this review of some of the recent work in the study of errors of measurement, attention is centered on the type of mathematical model used to represent errors of measurement, on the extent to which standard techniques of analysis become erroneous and misleading if certain types of errors are present (and the possible remedial procedures), and on the techniques that are available for the numerical study of errors of measurement.

1. INTRODUCTION.

This paper reviews some of the recent work in the study of errors of measurement as they affect data analysis. This is a difficult field, as evidenced by the prolonged struggles of the econometricians to find satisfactory methods for coping with such errors in their investigations of relationships between variables, and by the slow rate of progress that has rewarded major efforts to study errors of measurement in sample surveys. It is also a field of growing importance. For example, the recent rapid increase in data-gathering projects in the social and medical sciences is producing large bodies of data containing variables obviously difficult to measure, such as people's behavior, opinions, feelings, and motivations. Concurrently, there are signs of a rise in research interest in addition to that stimulated by problems in econometrics and sample surveys, and one of the objectives of this review is to encourage research in this area.

Attention will be centered on four aspects of the problem.

(1) The types of mathematical model that have been used to represent errors of measurement.

(2) The extent to which errors of measurement are automatically taken into account in the standard techniques of analysis, and the extent to which these methods become erroneous and misleading if certain types of errors are present.

(3) The amount of harm done by errors of measurement, from the viewpoint of the investigator, in producing unsuspected biases or reduced precision, and the remedial procedures that are available to avoid these undesirable consequences.

(4) Techniques for the study of errors of measurement.

Before looking at mathematical models that attempt to describe the statistical properties of errors of measurement, a reminder about some of the different situations in which measurements are made may be useful. The quantity being measured may be fixed and static through time. It may be fixed at a point in time, but vary with time. Sometimes we deliberately measure the wrong quantity, using a substitute measurement that is cheaper or more convenient, as for example in the standard manometer used for measuring blood pressure by a sleeve wrapped round the arm instead of a direct measurement of intra-arterial blood pressure. Sometimes we measure the wrong quantity because we don't know how to measure the right quantity. I once spent some time trying to devise a practical method of measuring the strength of maternal affection in the dairy cow, but did not get far. When the quantity being measured varies with time, we may in effect be using the wrong measurement if we treat measurements made some months earlier as if they applied to the situation today. Sometimes the error of measurement is actually a sampling error. The amount of insect infestation on a plot in a field experiment may be estimated by taking from the plot a small sample of plants and recording the mean number of insects per plant in the sample. As Mahalanobis, 1946, has pointed out, there may be some margin of physical uncertainty in what is being measured even at a fixed point in time; the yield of wheat may increase by 4-5% owing to moisture when the rainy season begins in parts of India.

Received June, 1968.

* This work was facilitated by a grant from the National Science Foundation (C&341), and by the Office of Naval Research, Contract Nonr 1866(37). This paper is based on the Rietz lecture of the Institute of Mathematical Statistics, Washington, D.C., December, 1967.

To turn to the measuring instrument, measurement and recording may be completely automatic. Some human action may be involved, but of a simple and easily checkable type, as in reading a clear, stationary dial. The human action may involve complex subjective judgment to a greater or less degree. Several different types of human action may enter. In many sample surveys it is recognized that errors of measurement can arise from the person being interviewed, from the interviewer, from the supervisor or leader of a team of interviewers, and from the processor who transmits the information from the recorded interview on to the punched cards or tapes that will be analysed.

The preceding examples, though incomplete, suggest that a variety of mathematical models, some simple, some quite complex, may be needed to describe realistically the types of measurement error relevant to different measurement problems. Since the introduction of terms representing the presence of errors of measurement nearly always seems to complicate the mathematical analysis, two obvious maxims in this field are to use as simple a model as will reasonably represent the facts and to keep oneself aware of other people's work. The same model may turn up in very diverse applications, and if the consequences of a certain model have already been worked out, one might as well take advantage of this. Concurrently, the investigator has a responsibility for producing evidence that his model does fit adequately. From the literature my impression is that investigators have needed no urging to use simple models, but are not so well provided with data that justify the assumptions made in the model, and have not always worked out the full consequences of the errors of measurement. For instance, most textbooks on sample survey theory, including my own, present all the standard theory on the assumption that there are no errors of measurement, and merely indicate some of the disturbances caused by errors of measurement briefly in a later chapter.

One common type of error of measurement, the occasional gross error, will not be discussed in this paper. Since computers can print and examine residuals from a fitted model, there is current interest in developing programs by which any likely gross errors are automatically signposted for the investigator to examine and consider. In my opinion, much work of practical interest remains to be done, for instance, more testing of the performance of such methods in the presence of more than one gross error, and further study of the merits of methods like censoring that are less affected by the presence of gross errors.

PART I. SOME SIMPLE MODELS

2. Continuous variates.

The subscript $u = 1, 2, \ldots, n$ will denote the member of the sample, while the symbol $Y_u$ refers to the recorded measurement. The symbol $y_u$ denotes the correct or true measurement. For exposition of theory, the symbol $y_u$ is sometimes introduced even when we do not know how to make a correct measurement, though some workers are unhappy with this practice.

The error of the measurement on the $u$th unit is $d_u = Y_u - y_u$. With some measurement processes we can conceive of repeated measurements of the same unit, and with some simple non-destructive processes we can actually carry out such repeated measurements. The subscript $t$ will refer to the $t$th trial or repeated measurement. Thus we write

$$Y_{ut} = y_u + d_{ut} \qquad (1)$$

In this representation, both $Y_{ut}$ and $d_{ut}$ have a frequency distribution for each member of the population (i.e., for fixed $u$), whereas $y_u$ is assumed fixed for any specific member of the population.
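
A minimal simulation can make the notation of model (1) concrete. It is a sketch under assumptions chosen purely for illustration, not data from any study:

- the correct values $y_u$ are normal with mean 50 and standard deviation 10;
- each unit is measured on the same number of trials;
- the errors $d_{ut}$ are normal with a constant standard deviation $\sigma$ and are uncorrelated between trials.

The function names `simulate_model_1` and `pooled_within_unit_variance` are hypothetical. Because $y_u$ is fixed for each unit, the spread of the repeated readings $Y_{ut}$ on one unit reflects only the errors. Pooling the within-unit variances therefore estimates $\sigma^2$.

```python
import random
import statistics

# Sketch of model (1), Y_ut = y_u + d_ut, with hypothetical settings.
def simulate_model_1(n=200, trials=5, sigma=2.0, seed=1):
    rng = random.Random(seed)
    # Correct values y_u: fixed for each unit once drawn.
    y = [rng.gauss(50.0, 10.0) for _ in range(n)]
    # Y[u][t] is the recorded value for unit u on trial t.
    Y = [[y_u + rng.gauss(0.0, sigma) for _ in range(trials)] for y_u in y]
    return y, Y

def pooled_within_unit_variance(Y):
    """Average of the per-unit sample variances of the repeated readings."""
    # With equal numbers of trials per unit, this is the pooled estimate of sigma^2.
    return statistics.fmean([statistics.variance(row) for row in Y])

y, Y = simulate_model_1()
# The errors d_ut = Y_ut - y_u are observable here only because y_u was simulated.
d = [[Y_ut - y_u for Y_ut in row] for y_u, row in zip(y, Y)]
print(round(pooled_within_unit_variance(Y), 2))  # close to sigma**2 = 4
```

In practice only the $Y_{ut}$ are available. The list `d` can be formed here only because the simulation knows the $y_u$.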

The simplest model is one in which

$$E(d_{ut} \mid u) = 0; \quad E(d_{ut}^2 \mid u) = \sigma^2; \quad E(d_{ut}\, d_{vt} \mid u, v) = 0 \ (u \neq v); \quad E(d_{ut}\, d_{ut'} \mid u) = 0 \ (t \neq t'). \qquad (2.1)$$

In this model the errors are unbiased and have constant variance. They are uncorrelated with the correct values, with one another on different units, and on different trials for the same unit. This model may be expected to apply to some of the simplest measurement processes. It has been used even without evidence that it really applies, because it is sometimes the only model for which the consequences of the errors have been successfully worked out. The additional assumptions that the $d_{ut}$ are normal and independent in the probability sense are often adopted when needed. Sometimes we write $E(d_{ut}^2 \mid u) = \sigma_u^2$, because some units are more difficult to measure precisely than others.
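
Under (2.1) the error on a unit is uncorrelated with its correct value. For a single trial, the population variance of the recorded values is therefore the variance of the $y_u$ plus the average error variance. The sketch below checks this numerically and allows for the unit-specific variances $\sigma_u^2$ just mentioned. The distributions are arbitrary assumptions for illustration: normal $y_u$ with standard deviation 10, and $\sigma_u$ drawn uniformly between 1 and 4. The function name `simulate_single_trial` is also hypothetical.

```python
import random
import statistics

def simulate_single_trial(n=20000, sd_y=10.0, seed=2):
    """One recorded value per unit under (2.1), with a unit-specific error s.d."""
    rng = random.Random(seed)
    y, Y, sigma2 = [], [], []
    for _ in range(n):
        y_u = rng.gauss(50.0, sd_y)            # correct value
        sigma_u = rng.uniform(1.0, 4.0)        # some units are harder to measure
        Y.append(y_u + rng.gauss(0.0, sigma_u))  # E(d_u1 | u) = 0
        y.append(y_u)
        sigma2.append(sigma_u ** 2)
    return y, Y, sigma2

y, Y, sigma2 = simulate_single_trial()
excess = statistics.pvariance(Y) - statistics.pvariance(y)
# excess should be close to the average error variance, about 7 for these settings
print(round(excess, 1), round(statistics.fmean(sigma2), 1))
```

For $\sigma_u$ uniform on (1, 4), the mean of $\sigma_u^2$ is $(4^3 - 1^3)/9 = 7$. That is the value the excess variance should approach.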

The next stage in the model is to recognize that the measuring instrument may have an overall bias of amount $\alpha$, writing

$$Y_{ut} = y_u + \alpha + d_{ut}, \qquad (2)$$

where assumptions (2.1) about the $d_{ut}$ still apply.
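
A practical consequence of the constant bias $\alpha$ in model (2) is that repetition does not remove it. Averaging $T$ trials on a unit shrinks the random part of the error, whose standard deviation falls as $\sigma/\sqrt{T}$. The average reading, however, still centers on $y_u + \alpha$. A short simulation shows both effects under assumed values: $\alpha = 1.5$, $\sigma = 2$, normal errors, and a hypothetical function name.

```python
import random
import statistics

def averaged_error(alpha, sigma, trials, n=2000, seed=3):
    """Mean and s.d. over units of (average of `trials` readings) - y_u under model (2)."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        y_u = rng.gauss(50.0, 10.0)
        readings = [y_u + alpha + rng.gauss(0.0, sigma) for _ in range(trials)]
        errors.append(statistics.fmean(readings) - y_u)
    return statistics.fmean(errors), statistics.pstdev(errors)

results = {T: averaged_error(alpha=1.5, sigma=2.0, trials=T) for T in (1, 4, 16)}
for T, (bias, spread) in results.items():
    # bias stays near alpha = 1.5; spread shrinks roughly like 2 / sqrt(T)
    print(T, round(bias, 2), round(spread, 2))
```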

As an introduction to the third stage, I would like to jump back historically to Karl Pearson's 1902 paper on the mathematical theory of errors of measurement. Pearson was interested in the nature of errors of measurement when the quantity being measured is fixed and definite, while the measuring instrument is a human being. He conducted two experiments, each having three persons as measurers. In the first experiment, each person was presented with the same set of lines of unequal lengths drawn on paper, and was asked to bisect each line freehand. The exact middle of each line and the error of measurement were recorded. In the second experiment the task of the measurer was a little harder. A bright line of light slowly traversed a white strip on a black screen. When a bell sounded, each observer drew a mark across a line lying before him on a sheet of paper, to mark the proportional distance of the beam across its strip when the bell sounded. Again, the exact position of the bright line and the error of measurement were carefully noted. The sample sizes were 500 in the bisection series and 520 in the bright line series.

The principal conclusions drawn by Pearson from his analysis of the results were as follows.

(1) In 5 of the 6 cases (3 measurers, 2 experiments), the mean errors differed significantly from zero, the sixth being almost significant at the 5% level. This is the $\alpha$ term in model (2). These overall biases varied in size and direction from one person to another.

(2) For a given measurer, the size of the bias varied throughout the series of trials when the errors were grouped in successive sets of 25. This suggests that with a human measurer the model may need a term $a_u$ representing a bias that varies from one sample member to another.

(3) The errors were not in general normally distributed, but exhibited both some skewness and some kurtosis.

(4) To come to a result that startled Pearson, the errors of two apparently independent observers in measuring the same quantity were positively correlated in 5 cases out of 6. This phenomenon is well known in interview surveys, though the explanations given there do not apply to Pearson's experiments. When two independent interviewers question the same person, the respondent may give an erroneous answer to the first interviewer and simply repeat the same erroneous answer to the second through a conscious or unconscious effect of memory. Alternatively, on a delicate question of opinion, both interviewers may have the same point of view and may induce the same erroneous answer because of the way in which they ask the questions. Pearson's result can have a simple explanation even when the quantity being measured is fixed and objective. If two measurers both tend to underestimate high values and overestimate low values, these negative correlations between $d_{ut}$ and $y_u$ induce a positive correlation between the errors $d_{u1}$, $d_{u2}$ of the two measurers. Pearson does not mention this possibility in connection with his results and does not present the results in a way which enables one to examine whether an effect of this type was present. Instead, he uses some uncharacteristically vague language, such as (p. 412) "certain factors affected by the immediate atmosphere seem to be common elements of two or more personalities, and there results from this a tendency in each pair of observers to judge in the same manner", and later (p. 433), "this psychological or organic correlation."
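
The simple explanation offered above for Pearson's positive correlations can be checked with a few lines of simulation. This is not a reanalysis of Pearson's data. Suppose, as an illustrative assumption, that measurer $k$ ($k = 1, 2$) has error

$$d_{uk} = -\lambda_k (y_u - c) + e_{uk},$$

where:

- $c$ is the middle of the range of $y_u$;
- $\lambda_k > 0$ expresses the measurer's tendency to pull readings toward the middle;
- the $e_{uk}$ are independent between the two measurers.

Then $\operatorname{cov}(d_{u1}, d_{u2}) = \lambda_1 \lambda_2 \operatorname{var}(y_u) > 0$, even though neither measurer influences the other. The values of $\lambda_k$, the uniform distribution of $y_u$, and the sample size of 500 (chosen to resemble Pearson's series) are all assumptions.

```python
import random

def correlation(x, z):
    """Sample product-moment correlation coefficient."""
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx = sum((a - mx) ** 2 for a in x)
    szz = sum((b - mz) ** 2 for b in z)
    return sxz / (sxx * szz) ** 0.5

def two_measurers(lams=(0.10, 0.15), noise_sd=2.0, n=500, seed=4):
    """Errors of two measurers who both pull readings toward the center of the range."""
    rng = random.Random(seed)
    y = [rng.uniform(0.0, 100.0) for _ in range(n)]  # fixed, objective quantities
    center = 50.0
    d1 = [-lams[0] * (y_u - center) + rng.gauss(0.0, noise_sd) for y_u in y]
    d2 = [-lams[1] * (y_u - center) + rng.gauss(0.0, noise_sd) for y_u in y]
    return y, d1, d2

y, d1, d2 = two_measurers()
print(round(correlation(d1, y), 2))   # negative: errors fall as y rises
print(round(correlation(d1, d2), 2))  # positive, though the e_uk are independent
```

Setting both `lams` to zero removes the shared dependence on $y_u$. The correlation between the two measurers' errors then fluctuates around zero.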

Pearson's results suggest several possible elaborations of model (2). The simplest is to introduce a "variable bias" term $a_u$, and to make the additional assumption that the $a_u$ are uncorrelated with the correct values $y_u$; that is,

$$Y_{ut} = y_u + a_u + d_{ut}, \qquad \operatorname{cov}(a_u, y_u) = 0. \qquad (3)$$

Alternatively, we may be unwilling to assume that $a_u$ and $y_u$ are uncorrelated, leading to

$$Y_{ut} = y_u + a_u + d_{ut}, \qquad \operatorname{cov}(a_u, y_u) \neq 0. \qquad (4)$$

Further, Pearson's results indicate that intercorrelations in errors of measurement within the sample may occur. In sample surveys the most prominent effect of this kind, and the one most studied, arises from the personal biases of the interviewers, which produce an intra-interviewer covariance term that is positive. Pearson's results also constitute a warning that when we come to study errors of measurement by having two different observers measure the same unit, their errors may not be independent.
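
A simple assumed structure illustrates the positive intra-interviewer covariance. Each interviewer $i$ carries a personal bias $b_i$ with variance $\sigma_b^2$, shared by every respondent that interviewer questions. Each response also has an independent error with variance $\sigma_e^2$. Errors for two respondents of the same interviewer then have covariance $\sigma_b^2$ and correlation $\sigma_b^2/(\sigma_b^2 + \sigma_e^2)$.

The sketch below estimates that correlation with the usual one-way analysis-of-variance estimator. The function names are hypothetical, and normal distributions and equal interviewer workloads are assumed.

```python
import random
import statistics

def simulate_interviewer_errors(m=100, k=20, sd_bias=1.0, sd_noise=2.0, seed=5):
    """Errors for m interviewers, each questioning k respondents."""
    rng = random.Random(seed)
    groups = []
    for _ in range(m):
        b_i = rng.gauss(0.0, sd_bias)  # personal bias shared by this interviewer's respondents
        groups.append([b_i + rng.gauss(0.0, sd_noise) for _ in range(k)])
    return groups

def intraclass_correlation(groups):
    """One-way ANOVA estimate of the within-interviewer correlation (equal workloads)."""
    m, k = len(groups), len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)
    msb = k * sum((mu - grand) ** 2 for mu in means) / (m - 1)
    msw = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g) / (m * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

rho = intraclass_correlation(simulate_interviewer_errors())
print(round(rho, 3))  # near 1 / (1 + 4) = 0.2 for these settings
```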

The introduction of these correlation terms into the mathematical model complicates the subsequent analysis, and will be postponed until section 17. Much of the work that has been done outside of sample surveys assumes, rightly or wrongly, that errors on different members of the sample are uncorrelated. For the present, therefore, we revert to model (4), with the conditions

$$E(d_{ut} \mid u) = 0; \quad E(d_{ut}^2 \mid u) = \sigma^2; \quad E(d_{ut}\, d_{vt} \mid u, v) = 0 \ (u \neq v); \quad E(d_{ut}\, d_{ut'} \mid u) = 0 \ (t \neq t'). \qquad (2.1)$$

For some applications it is convenient to amalgamate the terms $y_u$ and $a_u$ by writing $y'_u = y_u + a_u$, particularly when no feasible method of measuring $y_u$ itself is known. The symbol $y'_u$ might be called the operationally correct value for the $u$th unit. Model (4) then reduces to the simpler form of model (1), i.e.

$$Y_{ut} = y'_u + d_{ut}, \qquad (4')$$

though with the above difference in interpretation.
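
The difference in interpretation in model (4)' can be made concrete. However many trials are averaged, the mean of a unit's recorded values converges to the operationally correct value $y'_u = y_u + a_u$, not to $y_u$. In this sketch the $a_u$ are drawn independently of $y_u$ for simplicity; (4)' holds whether or not they are correlated. The distributions, parameter values, and function name are hypothetical.

```python
import random
import statistics

def simulate_model_4prime(n=50, trials=400, sd_a=1.5, sigma=2.0, seed=6):
    """Return (y_u, y'_u, mean of the recorded values) for each unit."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y_u = rng.gauss(50.0, 10.0)   # correct value
        a_u = rng.gauss(0.0, sd_a)    # variable bias, fixed for this unit across trials
        readings = [y_u + a_u + rng.gauss(0.0, sigma) for _ in range(trials)]
        rows.append((y_u, y_u + a_u, statistics.fmean(readings)))
    return rows

rows = simulate_model_4prime()
# Largest distance of a unit's averaged reading from its operationally correct value
gap_operational = max(abs(m - y_op) for _, y_op, m in rows)
# Average distance of the averaged reading from the correct value y_u
gap_correct = statistics.fmean([abs(m - y_u) for y_u, _, m in rows])
print(round(gap_operational, 2), round(gap_correct, 2))
```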

One less general form of model (4) is worth mention, since it has already proved useful. If the relation between the variable bias $a_u$ and the correct value $y_u$ is expressible as a linear regression of $a_u$ on $y_u$ with regression coefficient $\gamma$, we may rewrite model (4) as

$$Y_{ut} = \alpha + \beta y_u + a_u + d_{ut}, \qquad E(a_u) = 0, \qquad \operatorname{cov}(a_u, y_u) = 0, \qquad (4a)$$

where $\beta = 1 + \gamma$. In (4a) I have reused the symbol $a_u$ to represent the deviation from this regression, so that the covariance of $a_u$ and $y_u$ is now zero. This model is the basis of Mandel's, 1959, theory of errors of measurement as applied to the analysis of interlaboratory tests. In Mandel's application, equation (4a) applies to a single laboratory, the values of $\alpha$, $\beta$, $\sigma_a^2$, and $\sigma^2$ possibly varying from laboratory to laboratory. This model may also be appropriate when different judges are scaling the same set of subjects on a 0-10 scale. If the scaling involves a subjective element, some judges may be reluctant, while others are willing, to assign very high or very low values. Thus $\beta$ will vary from judge to judge.
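
Suppose correct values $y_u$ are available for a set of reference items, an assumption that holds only in some applications. Then $\alpha$ and $\beta$ in (4a) for one laboratory or judge could be estimated by the least squares regression of the recorded values on $y_u$. This illustrates the model only; it is not offered as Mandel's own procedure.

The sketch simulates two hypothetical judges on a 0-10 scale:

- one willing to use the whole scale ($\alpha = 0$, $\beta = 1$);
- one reluctant to assign extreme values ($\alpha = 1.5$, $\beta = 0.7$), which compresses the scale to the range 1.5 to 8.5.

Readings are not clipped to the scale, and all parameter values and function names are assumptions.

```python
import random

def least_squares(x, z):
    """Intercept and slope of the least squares regression of z on x."""
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    slope = sxz / sxx
    return mz - slope * mx, slope

def simulate_judge(alpha, beta, n=300, sd_a=0.3, sigma=0.5, seed=7):
    """One score per item under (4a); returns (correct values, recorded scores)."""
    rng = random.Random(seed)
    y = [rng.uniform(0.0, 10.0) for _ in range(n)]  # correct values on a 0-10 scale
    # a_u has mean 0 and is generated independently of y_u, as (4a) requires.
    Y = [alpha + beta * y_u + rng.gauss(0.0, sd_a) + rng.gauss(0.0, sigma) for y_u in y]
    return y, Y

willing = least_squares(*simulate_judge(alpha=0.0, beta=1.0))
reluctant = least_squares(*simulate_judge(alpha=1.5, beta=0.7))
print([round(v, 2) for v in willing], [round(v, 2) for v in reluctant])
```

With a single trial per item, $a_u$ and $d_{ut}$ cannot be separated in this fit. Both simply add to the scatter about the fitted line.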
