ICOTS 2, 1986: Terry Speed

QUESTIONS, ANSWERS AND STATISTICS

Terry Speed
CSIRO Division of Mathematics and Statistics
Canberra, Australia

A major point, on which I cannot yet hope for universal agreement, is that our focus must be on questions, not models. ... Models can - and will - get us in deep troubles if we expect them to tell us what the unique proper questions are.

J.W. Tukey (1977)

1. Introduction

In my view the value of statistics, by which I mean both data and the techniques we use to analyse data, stems from its use in helping us to give answers of a special type to more or less well defined questions. This is hardly a radical view, and not one with which many would disagree violently, yet I believe that much of the teaching of statistics and not a little statistical practice goes on as if something quite different was the value of statistics. Just what the other thing is I find a little hard to say, but it seems to be something like this: to summarise, display and otherwise analyse data, or to construct, fit, test and evaluate models for data, presumably in the belief that if this is done well, all (answerable) questions involving the data can then be answered. Whether this is a fair statement or not, it is certainly true that statistics and other graduates who find themselves working with statistics in government or semi-government agencies, business or industry, in areas such as health, education, welfare, economics, science and technology, are usually called upon to answer questions, not to analyse or model data, although of course the latter will in general be part of their approach to providing the answers. The interplay between questions, answers and statistics seems to me to be something which should interest teachers of statistics, for if students have a good appreciation of this interplay, they will have learned some statistical thinking, not just some statistical methods. Furthermore, I believe that a good understanding of this interplay can help resolve many of the difficulties commonly encountered in making inferences from data.

My primary aim in this paper is quite simple. I would like to encourage you to seek out or attempt to discern the main question of interest associated with any given set of data, expressing this question in the (usually nonstatistical) terminology of the subject area from whence the data came, before you even think of analysing or modelling the data. Having done this, I would also like to encourage you to view analyses, models etc. simply as means towards the end of providing an answer to the question, where again the answer should be expressed in the terminology of the subject area, although there will always be the associated statement of uncertainty which characterises statistical answers. Finally, and regrettably this last point is by no means superfluous, I would then encourage you to ask yourself whether the answer you gave really did answer the question originally posed, and not some other question.

A secondary aim, which I cannot hope to achieve in the time permitted to me, would be to show you how many common difficulties experienced in attempting to draw inferences from data can be resolved by carefully framing the question of interest and the form of answer sought. A few remarks on this aspect are made in Section 6 below.

2. Why speak on this topic?

Over the years I have had many experiences which have led me to think that the interplay between questions, answers and statistics is worthy of consideration. Let me briefly mention four, each of a different type.

The first experience is a common one for me. Someone is describing an application of statistics in some area, say biology. The speaker usually begins with an outline of the background science and goes on to give an often detailed description of the data and how they were collected. This part is new and interesting to any statisticians listening, most of whom will be unfamiliar with that particular part of biology. Sometimes the biologist who collected the data is present and contributes to the explanation, but at a certain stage the statistician starts to explain what she/he did with the data, how they were "analysed". By now the biologist is quiet, deferring to the statistician on all matters statistical, and terms like main effects, regression lines, homoscedasticity, interactions, and covariates fly around the room. Sooner or later I find myself thinking "Here are the answers, but what was the question?" All too frequently in such presentations neither the statistician nor the biologist has posed the main question of biological interest in non-statistical terms, that is, in terms which are independent of analyses or models which may or may not be appropriate for the data, and I can certainly remember occasions when the analysis presented was seen to be inappropriate once the forgotten question was formulated. Of course many scientific questions can be translated into statements about parameters in a statistical model, so that I am not condemning all instances of the above practice.

A similar sort of experience is surely familiar to all who have helped people with their statistical problems. This time a scientist, say a psychologist, comes to me with a set of data and one or more questions. She/he knows some statistics, or at least some of the jargon. After being briefed on the background psychology and the mode of collection of the data I usually say something like "What questions do you want to answer with these data?", implicitly meaning "What psychological questions ...?" Not infrequently the answer comes back "Is the difference between such and such significant?" meaning, of course, statistically significant. [In my perversity I often think to myself: "Well, you should know; it's your data and you are the psychologist!"] Another similar query might concern interactions, or regression coefficients of covariates etc. What this has in common with the previous example is the unwillingness or inability of the psychologist to state her/his questions of interest in nonstatistical terms. We should all be familiar with the idea that scientific (e.g. psychological) significance and statistical significance are not necessarily the same thing, but how many of us keep in mind the fact that the latter involves an analysis or a statistical model, and that there may be as many answers to this question as there are analyses or models? Surely much of the blame for such thinking rests with us, the teachers of statistics, who never fail to popularize the rigid formalism of Neyman-Pearson testing theory.

My third type of experience concerns recent graduates in statistics, students I and my colleagues have taught and whom we believe should be able to operate independently as statisticians. Many of these graduates go into jobs in big public enterprises: railways, agriculture bureaux, mining companies, government departments and so on, and a few - far too many for comfort - get in touch with us when they meet a difficulty in their new job. It is not the fact that they get in touch which is discomfiting, but the questions they ask! For we then learn how little they have grasped. They have questions in abundance, often important policy questions, access to lots of data, or at least the possibility of collecting any data that they deem necessary, but they are quite unsure how to proceed, how to answer the questions. Out there in the world there are "populations" of real trains, field plots, cubic metres of ore or people, and even the simplest question relating to a mean or a proportion or a sample size can be forbidding. Perhaps they should standardize something to compare it with something else, perhaps include the variability of one factor when analysing another, or something else again, all things which we feel that a graduate of our course should be able to cope with unaided. But how well did we train them for this experience?

Finally, and briefly, let me castigate my professional colleagues and myself - since I am no exception - for allowing ourselves to forget the fundamental importance of the interplay of questions, answers and statistics, for in so many of our professional interactions we act as if it is irrelevant. How many times have we presented new statistical techniques to one another, illustrated on sets of "real" data, drawing conclusions about those data concerning questions no one ever asked, or is ever likely to ask? And how often do we derive statistical models or demonstrate properties of models which are unrelated to any set of data collected so far, and certainly not to any questions from a substantive field of human endeavour? We are, so we tell ourselves, simply adding to the stock of statistical methods and models, for possible later use. Is it any wonder that we or our coworkers then find ourselves using these models and methods in practice, regardless of whether or not they help us to answer the main questions of interest? For a discussion of some closely related issues of great relevance to teachers of statistics, see the two excellent articles Preece (1982, 1986).

3. Why this audience?

I don't think I will be very wide of the mark if I assume that most of you - at least the active teachers of statistics amongst you - have come from a background of mathematics rather than statistics, and that few of you have actually been statisticians before you started teaching the subject. I would further guess that many of you still teach mathematics, and perhaps at the school level, statistics within a mathematics curriculum. It is on this assumption that I have chosen to focus on non-mathematical aspects of our subject, ones with which I feel you will generally be less familiar. As I said in the introduction, I hope that my talk will encourage you to give more attention to the non-mathematical aspects of statistics in your teaching, in particular to spend more time considering real questions of interest with real sets of data.

It is a curious thing that interest in the teaching of statistics in schools, colleges and universities has sprung up worldwide as an extension of mathematics teaching, because I certainly feel that the practice of statistics is no closer to mathematics than cooking is to chemistry. Both mathematics and chemistry are reasonably precise subjects in their own ways, and in general what goes on in them both is repeatable; perhaps they are true sciences. On the other hand, statistics and cooking are as much arts as they are science, although both have strong links to their corresponding science: mathematics in the case of statistics, and chemistry in the case of cooking. Who would recommend that a chemistry teacher with no cooking experience be appointed as cooking teacher as well? If I can convey to you some of the enjoyment and intellectual challenge that lies in my particular variety of cooking, and encourage you to try it yourself, I will have succeeded in my aims.

4. Two further examples

In this necessarily too brief section I offer two more concrete illustrations of the interplay of questions, answers and statistics. The first one is a very simple paraphrase of Neyman's classic illustration of hypothesis testing involving X-ray screening for tuberculosis, and I refer you to Neyman (1950, Section 5.2.1) for a fuller background and further details.

You have a single X-ray examination and, after the photograph has been read by the radiologist, you are given a clean bill of health, that is, you are told that there is no indication that you are affected by tuberculosis. With Neyman we will assume that previous experience has led to

    pr(clean bill | no TB) = 0.99
    pr(clean bill | TB) = 0.40

You now ask the radiologist "What are the chances that I have TB?" She says "I can't answer that question but I can say this: Of the people with TB who are examined in this way, 60% are correctly identified as having TB, and of ..." You interrupt her. "Doctor, I know the procedure is imperfect, but you have just examined my X-ray ... What are the chances that I have TB?"

If your radiologist is sufficiently flexible and well informed, she will answer "Well, that depends on the prevalence of TB in your population, that is, on the proportion of people affected by TB in the (a?) population from which you may be regarded as a typical individual". Indeed a simple application of Bayes' theorem yields:

    pr(TB | clean bill) = pr(clean bill | TB) pr(TB) / pr(clean bill)

At last you see how to get an answer to your question. It may not be easy to obtain a value for pr(TB): your smoking habits, the location of your home, your occupation, your ancestry ... may all play a part in defining "your population", but this is what is needed to answer the question and it is far better to recognise this than to fob you off with the answer to another question not of interest to you.
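To make the dependence on prevalence concrete, the calculation can be sketched in a few lines of Python. The two conditional probabilities are the ones quoted above; the prevalence values are purely hypothetical and are not taken from Neyman or from any real population. The denominator pr(clean bill) is expanded by the law of total probability, a step the text leaves implicit.

```python
# Conditional probabilities quoted in the text (after Neyman).
P_CLEAN_GIVEN_NO_TB = 0.99
P_CLEAN_GIVEN_TB = 0.40


def pr_tb_given_clean_bill(prevalence: float) -> float:
    """Return pr(TB | clean bill) for a given pr(TB), via Bayes' theorem."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a probability between 0 and 1")
    # Law of total probability: pr(clean bill) summed over TB / no TB.
    p_clean = (P_CLEAN_GIVEN_TB * prevalence
               + P_CLEAN_GIVEN_NO_TB * (1.0 - prevalence))
    return P_CLEAN_GIVEN_TB * prevalence / p_clean


if __name__ == "__main__":
    # Hypothetical prevalences, chosen only to show how far the answer moves.
    for prevalence in (0.001, 0.01, 0.10):
        posterior = pr_tb_given_clean_bill(prevalence)
        print(f"pr(TB) = {prevalence:.3f} -> "
              f"pr(TB | clean bill) = {posterior:.5f}")
```

Under an assumed prevalence of 1 in 100, for instance, the chance of TB after a clean bill comes out at roughly 0.4%. The point of the sketch is only that the answer cannot be stated until "your population" has been chosen.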

If this example smacks of Bayesian statistics it is not entirely accidental, for there are many occasions where the Bayesian view (which is certainly not necessary in this example) helps answer the question of interest, whereas classical statistics refuses, frequently answering another, unasked, question instead. For a more complex, explicitly Bayesian example, see the very fine paper Smith and West (1983) concerning the monitoring of renal transplants.

My second example concerns the determination of the age of dingos, Australia's wild native dogs. A statistician was given a large body of data relating the age of a number of dingos to a set of physical measurements including head length. The data concerned both males and females, a number of breeds and animals from a number of locations, but for this discussion we will restrict ourselves to a single combination of sex, breed and location. The question, or at least the task, to be addressed was the following: produce an age calibration curve for dingos based upon the most suitable physical measurement, that is, produce a curve so that the age of a dingo may be predicted by reading off the curve at the value of the physical measurement. This curve was for use in the field and it was taken for granted that an estimate of the precision of any age so predicted would also be obtained.

It was found that a curve of the general form h = a + b[1 - exp(-ct)], where h and t are head length and age, respectively, and a, b and c are parameters of the curve, fitted the data from each dingo extremely well over the range of ages used. This was an exercise in non-linear regression with which the statistician took great care, special concern being given to the different possible parametrizations of the curve, the convergence of the numerical algorithm used, the residuals about the fitted line and to the validity of the resulting confidence intervals for a, b and c. The parameters estimated for different dingos naturally differed, although, not surprisingly, the values of a (head length at birth) showed less variation than those of b (ultimate head length - a) and c (a growth rate parameter).
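"Reading off the curve" amounts to inverting the growth equation: solving h = a + b[1 - exp(-ct)] for t gives t = -ln(1 - (h - a)/b)/c, valid for a <= h < a + b. The following Python sketch illustrates that inversion only. The function names and the parameter values are hypothetical, chosen for illustration and not taken from the dingo data, and the sketch assumes b > 0 and c > 0. It returns a bare point prediction and says nothing about the precision of the predicted age.

```python
import math


def head_length(t: float, a: float, b: float, c: float) -> float:
    """Growth curve from the text: h = a + b * (1 - exp(-c * t))."""
    return a + b * (1.0 - math.exp(-c * t))


def predicted_age(h: float, a: float, b: float, c: float) -> float:
    """Invert the curve: t = -ln(1 - (h - a) / b) / c.

    Only defined for a <= h < a + b (assuming b > 0 and c > 0);
    outside that range the curve cannot be read off.
    """
    fraction = (h - a) / b
    if not 0.0 <= fraction < 1.0:
        raise ValueError("head length lies outside the range of the curve")
    return -math.log(1.0 - fraction) / c


if __name__ == "__main__":
    # Hypothetical parameter values and units, for illustration only.
    a, b, c = 100.0, 90.0, 0.15
    h = head_length(12.0, a, b, c)
    print(f"head length {h:.1f} -> predicted age {predicted_age(h, a, b, c):.2f}")
```

Because the curve flattens as h approaches a + b, small changes in head length near the asymptote translate into large changes in predicted age, which is one reason a point prediction alone is of limited use in the field.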

All this seems fine, and you might wonder why I am mentioning this example at all in the present context. My answer is as follows. The statistician in question knew, or knew where to find, lots of information about the fitting of individual growth curves, and so he focussed on this aspect of the problem. To answer the original question, however, his attention should have been pointed in quite a different direction, towards: the calculation of a population or group growth curve for the calibration procedure; features of the sample of dingos measured that may affect the use of their measurements as a basis for the prediction of the age of a new dingo; properties of the parameters which are relevant to this question; and, finally, towards obtaining a realistic assessment of the prediction error inherent in the use of the curve in the field.

In summary, he was willing and able to spend a lot of time on the individual animals' curves; he was less willing and less able to focus on the issues demanded by the question, those concerning population parameters, population variability, problems of selection, unrepresentativeness, and other issues including the use of normal theory, with real but not very well defined populations.

5. What is the problem?

Let me oversimplify and put my message like this. In the beginning we taught mathematics and called it statistics; much of this was probability, a quite different subject. Then, with the help of computers, we started to teach data analysis and statistical modelling; this was fine apart from one feature: it was largely context-free. The real interest (for others and many statisticians), the important difficulties and the whole point of statistics lies in the interplay between the context and the statistics, that is, in the interplay between the items of my title.

Let me offer a few similar views. A.T. James (1977, p. 157) said in the discussion of a paper on statistical inference:

The determination of what information in the data is relevant can only be made by a precise formulation of the question which the inference is designed to answer. ... If one wants statistical methods to prove reliable when important practical issues are at stake, the question which the inference is to answer should be formulated in relation to these issues.

Cox (1984, p. 309) makes the following characteristically brief contribution to our discussion:

It is trite that specification of the purpose of a statistical analysis is important.

Dawid (1986) is even more to the point:

Fitting models is one thing; interpreting and using them is another ... If the model is correct and we know the parameters, how ought we to compare [schools]? ... There is in fact no unique answer; it all depends on our purpose ... there remains a strong need for a careful prestatistical analysis of just what is required: following which it may well be found that it is conceptually impossible to estimate it!

Tukey and Mosteller (1977, p. 268) offer seven purposes of regression, or, as I would paraphrase it, seven types of questions which regression analysis may help answer. Summarized, these seven purposes are:

1. to get a summary;
2. to set aside the effect of a variable;
3. as a contribution to an attempt at causal analysis;
4. to measure the size of an effect;
5. to try to discover a mathematical or empirical law;
6. for prediction;
7. to get a variable out of the way.

Similarly, Tukey (1980, pp. 10-11) gives the following six aims of time series analysis:

1. Discovery of phenomena.
2. "Modelling".
3. Preparation for further inquiry.
4. Reaching conclusions.
5. Assessment of predictability.
6. Description of variability.

Similar numbers of aims, purposes, or types of questions could be given for the analysis of variance, the analysis of contingency tables, multivariate analysis, sampling and most other major areas of statistics. Yet how often do our students meet these techniques in context with even one of these aims, much less the full range? And how else are they going to learn to cope with the special difficulties which arise when questions are asked of them in context whose answers require statistics? This is the problem.

6. Some General Comments

In this section I will mention a few difficulties which I believe can be resolved in a given case when the relation between the questions asked, the form of the answers desired and the statistical analysis to be conducted is carefully considered. A full discussion of any one of the difficulties is out of the question, and even if that had been given, there would probably remain an element of controversy, something which would be out of place in a talk like this. The section closes with some further general comments about questions.

Some elementary difficulties which I think arise include

- What is the population?
- When are population characteristics (e.g. proportions) relevant?
- What is the "correct" variance to attach to a mean or proportion?
- When should we standardize (for comparison)?

I have found that the relations between statistical models and analyses on the one hand, and populations and samples on the other, with parameters playing a role in both, are something which puzzles many students of our subject. The former play a big role in standard statistics courses whereas the latter are prominent in applications. Just how they connect is not a trivial matter.

A few somewhat more advanced difficulties include

- Which regression: y on x, x on y or some other?
- When should we use correlation and when regression analysis?
- When can/should we adjust y for x?
- Which error terms do we compare (in anova)?
- Should we regard a given effect as fixed or random?
- Which classifications (of a multiway table) correspond to factors and which to responses?
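The first of these difficulties is easy to exhibit numerically. The sketch below, using a small set of hypothetical data invented for illustration, computes the least-squares slope of y on x and of x on y. The two fitted lines coincide only when the correlation is perfect, since the product of the two slopes equals the squared correlation coefficient. The helper names `slope` and `correlation` are my own.

```python
import math


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def correlation(xs, ys):
    """Pearson correlation coefficient of xs and ys."""
    # r is the geometric mean of the two slopes, carrying their common sign.
    b_yx = slope(xs, ys)
    b_xy = slope(ys, xs)
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)


# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 2.9, 4.2, 4.8, 6.5]

slope_y_on_x = slope(x, y)          # predicts y from x
slope_x_on_y = slope(y, x)          # predicts x from y
implied_slope = 1.0 / slope_x_on_y  # the x-on-y line redrawn with y vertical

if __name__ == "__main__":
    print(f"y on x slope:                  {slope_y_on_x:.4f}")
    print(f"x on y line, as dy/dx:         {implied_slope:.4f}")
    print(f"squared correlation (product): {slope_y_on_x * slope_x_on_y:.4f}")
```

Which line is wanted depends on whether the question is about predicting y from x or x from y; the data alone do not decide it.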

More subtle difficulties are associated with general questions such as

- Should we do a joint, marginal or conditional analysis?

I believe that in all of the above cases the difficulties arise because insufficient attention has been given to the nonstatistical context in which the discussion is taking place, and that when the question of interest is clarified and the form of answer sought understood, the difficulty either disappears completely or is readily resolved. Of course doing so takes some experience. Note that many of the difficulties listed involve, implicitly or explicitly, the notion of conditioning, or its less probabilistic forms, standardizing or adjusting. Just what we regard as being "held fixed" and what we "average over" in any given context is crucial, and here our questions and answers determine everything. The simplest form of this issue is usually: "Are we interested in just these units (the ones we have seen), or in some population of units from which these may be regarded as a (random?) sample, or both?" Models are no help here.

A simple but easy to forget aspect of the use of a statistical method is that not all questions which could be asked and answered by that method are necessarily appropriate in a particular context. Lord's paradox, see Cox & McCullagh (1982) and references therein, provides a good example here.
