FROM STATISTICAL SIGNIFICANCE TO EFFECT ESTIMATION:

STATISTICAL REFORM IN PSYCHOLOGY, MEDICINE AND ECOLOGY

Fiona Fidler

Submitted in total fulfilment of the requirements of the degree of Doctor of Philosophy

November 2005

Department of History and Philosophy of Science

The University of Melbourne

ABSTRACT

Compelling criticisms of statistical significance testing (or Null Hypothesis Significance Testing, NHST) can be found in virtually all areas of the social and life sciences--including economics, sociology, ecology, biology, education and psychology. Because NHST is the overwhelmingly dominant statistical method in these sciences, these criticisms need to be taken seriously. Yet, after half a century of cogent arguments against NHST and calls to adopt alternative practices, some disciplines, such as psychology, show little sign of change. One obvious question is `why?' Why are psychological researchers so unwilling to abandon this flawed practice? In this thesis I attempt to answer this question, and compare their practice with that of other disciplines.

In medicine, effect estimation (in the form of confidence intervals, CIs) was institutionalised in the 1980s through strict and enforced journal editorial policy. It was facilitated by the timely rewriting of textbooks and statistics curricula. The transition was perhaps straightforward, given the interaction between medical researchers and statisticians, and the processes of statistical editing and reviewing in the discipline. Whilst medicine remains far from a perfect paradigm of statistical practice, it has, on this narrow criterion--deemphasising statistical significance in favour of effect estimation--progressed further than psychology. Ecology too seems to have made some recent progress, though reform remains in its nascent stages.

In the absence of adequate guidance from institutions such as the American Psychological Association, and the absence of appropriate editorial pressure, statistical reform in psychology has an uncertain future. What is lacking in psychology (and other disciplines) is an evidence base for statistical reform. This will entail providing empirical justification for adopting alternatives to NHST, and evidence-based guidance for implementing and interpreting those alternatives. The preliminary empirical work in this thesis suggests that CIs do indeed have the necessary cognitive advantage.

DECLARATION

This is to certify that (i) the thesis comprises only my original work towards the PhD except where indicated in the Introduction; (ii) due acknowledgement has been made in the text to all other material used; (iii) the thesis is less than 100,000 words in length, exclusive of tables, maps, bibliographies and appendices.

................................................................................. Fiona Fidler

PREFACE

During the time I have been writing this PhD, I have had the opportunity to work on two relevant Australian Research Council funded projects--one on statistical reform in psychology, the other on statistical reform in ecology--which have resulted in a number of joint publications. In the chapter outline that follows I explain which sections of the thesis form those publications.

Chapter One documents the uptake of NHST in the three disciplines, exploring in particular the early and strong attachment psychology had to the methods. Results of the journal survey presented in this chapter, and some of the surrounding discussion, have been published as Fidler, Cumming, Burgman and Thomason (2004).

Chapter Two has two distinct parts. First, I catalogue the many criticisms that have been made of NHST over the last half a century, in several disciplines. None of these criticisms are original; they have all been documented before. This part of the chapter, then, is merely a literature review. Second, I review and evaluate defences of NHST that have appeared relatively recently in psychological literature.

In Chapter Three, I argue that typical NHST practices have damaged the progress of these three sciences. As evidence, I provide a series of case studies of particular research programs in psychology, medicine and ecology that have been led astray or otherwise disrupted by problems associated with NHST.

In Chapter Four I turn specifically to statistical reform in psychology. Here I provide a history of particular events--published criticisms, editorial and institutional interventions--aimed at improving statistical practice in psychology, and evaluate the impact of these events. Part of this chapter, specifically the survey of the effects of Philip Kendall's editorial intervention at the Journal of Consulting and Clinical Psychology, has been published. The results and a more extensive discussion of the findings can be found in Fidler, Cumming, Thomason et al. (2005). In this chapter, I demonstrate that to date there has been limited response to reformers' calls for a change in psychology. I also introduce an extensive series of interviews with advocates of reform and members of the APA Board of Scientific Affairs' Task Force on Statistical Inference (TFSI), which provide material for several subsequent chapters. (A full list of interviewees and correspondents follows my Acknowledgements.)

Chapter Five critically evaluates attempts to reform the statistical guidelines in the fifth edition of the APA Publication Manual (2001), pointing to many reasons why new recommendations are likely to be unsuccessful at motivating change. This chapter is a very slightly modified version of Fidler (2002) and relies heavily on the interviews I described above.

Chapter Six chronicles reform events in medicine (as Chapter Four did for psychology), demonstrating a dramatic shift from NHST to CIs in the mid 1980s. One section of this chapter reports a survey of Ken Rothman's editorial reforms at The American Journal of Public Health and Epidemiology. This survey and some of the surrounding discussion were published as Fidler, Thomason, Cumming, Finch and Leeman (2004).

Chapter Seven compares reform in psychology and reform in medicine and asks why medicine was able to institute changes to reporting practices when psychology has largely failed to. It also acknowledges that medicine is far from a perfect paradigm of practice, and that both disciplines have some way to go. Some of the discussion here was also published in Fidler, Cumming, Burgman and Thomason (2004).

The focus of Chapter Eight is ecology. Reform in ecology has progressed quite differently from reform in either psychology or medicine. It is far more focused on Bayesian and information theoretic methods than on effect sizes and CIs. Results from the journal survey presented, and surrounding discussion, have been accepted for publication in Conservation Biology (Fidler, Burgman, Cumming, Buttrose & Thomason, 2005).

Chapter Nine introduces the notion of evidence-based statistical reform, based on statistical cognition research. As I have explained, an estimation approach has been advocated (as a supplement or replacement to NHST) in many sciences for decades. Yet, few empirical questions have been asked about whether this new approach will be better understood, alleviate widespread misconceptions or lead to more substantial interpretations of research findings. Chapters Nine and Ten present my preliminary empirical efforts to establish such a research program.

Results from studies presented in Chapter Nine provide empirical evidence that CIs help alleviate a particularly serious misconception associated with NHST, namely that statistical non-significance is equivalent to evidence of `no effect'. However, there is less positive news in Chapter Ten. Studies in Chapter Ten reveal that CIs themselves are prone to a new set of unexpected and uncontroversial misconceptions. It is too early to tell whether this is simply because CIs are unfamiliar. Perhaps with adequate training, better presentations and appropriate guidelines, such misconceptions would simply disappear? Whilst this question remains largely unanswered, these two chapters highlight the remarkable fact that, to date, statistical reform has been advocated--and in some disciplines even instituted--without an evidence base. One of the most compelling arguments against NHST is its tendency to be misinterpreted. If it is to be abandoned largely because of this, then surely the onus is on us to provide some evidence that whatever replaces it will be less frequently misunderstood. I conclude this thesis with some brief thoughts about future research directions.

Note: For empirical studies reported in this thesis I have calculated 95% CIs for proportions. Some CIs have been calculated according to the method recommended for proportions by Newcombe and Altman (2000); others have been calculated using the standard non-corrected formula for CIs for proportions, $p \pm z\sqrt{p(1-p)/n}$. Those calculated using the standard formula are therefore prone to the problems Newcombe and Altman describe. These CIs were calculated and published before I became aware of Newcombe and Altman's recommendations. I have chosen to leave them in their original form so that they correspond with already published figures. In cases where I have used Newcombe and Altman's method (for work already published using this method or unpublished material) this is indicated in text, tables and/or figure captions.
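The difference between the two kinds of interval mentioned in this note can be illustrated with a minimal Python sketch. It assumes that the method recommended by Newcombe and Altman is the Wilson score interval without continuity correction; the note itself does not name the method, so this is an assumption. The function names `wald_ci` and `wilson_ci` and the example counts are hypothetical, chosen for illustration only, and are not taken from the studies in this thesis.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


def wald_ci(successes, n, z=Z_95):
    """Standard (non-corrected) interval: p +/- z * sqrt(p(1-p)/n)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def wilson_ci(successes, n, z=Z_95):
    """Wilson score interval (ASSUMED here to be the recommended method)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical counts: a small proportion and a mid-range proportion.
    for successes, n in [(1, 20), (10, 20)]:
        print(successes, n, wald_ci(successes, n), wilson_ci(successes, n))
```

With small counts the standard interval can extend below 0 (or above 1), and it collapses to a single point when the observed proportion is 0 or 1. The Wilson interval stays within [0, 1] in both situations. Near a proportion of 0.5 with moderate n the two intervals are similar.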

A Further Note: Although the style (referencing, headings, figures and tables) used in this thesis is heavily modelled on APA style, I have sometimes deviated from APA recommendations. I am aware of this, and remind the reader that this thesis is technically not a psychology thesis but rather a History and Philosophy of Science thesis. I have, of course, made every effort to at least be consistent in style.

A Final Note: There are several conventions for notation relating to p values. I have used `p value', but others use `p-value' or `P value'. In quotations throughout this thesis, the author's original notation is used.

ACKNOWLEDGEMENTS

During the course of my research I interviewed many extraordinary people. Each gave generously of their time and told me almost all of the interesting things I know. A list of interviewees follows the introduction to this thesis. There are also many others, across several departments and universities, who deserve thanks.

Department of History and Philosophy of Science (HPS), University of Melbourne: Neil Thomason was my primary thesis supervisor. I have never met anyone like Neil. During the years we have worked together he has been relentlessly curious, enthusiastic and extremely generous. He has also been kind and patient. Amongst so many other things, Neil taught me how to be a good teacher.

Keith Hutchison co-supervised this thesis (in an official capacity); he signed a lot of forms and changed the way I give conference papers--for the better! Thanks also to Rosemary Robins and Howard Sankey.

The HPS postgraduate association (circa 1999-2003): Marg Ayre, who taught me what `finishing' meant, years before I could understand; Tao Bak; Emmaline Bexley, who, amongst so many other things, helped me format this thing; Kristian Camillieri; David Evans; Matthew Klugman, who read several draft chapters of this thesis and taught me much about writing history--remaining flaws of technique are, of course, my own; Les Kneebone (whom I will thank again); Claire Leslie; Erik Nyberg and Peter Parbery.

Environmental Science Lab (and related parties), School of Botany, University of Melbourne: A large lab, packed with very kind and very smart people. Mark Burgman let me work there for over four years--probably much longer than he expected. Amongst others I worked with: Joe Banks; Sarah Bekessy (who helped make my survey questions to ecology students plausible); Jan Carey (thanks for debates about power and confidence intervals, and for chocolate); Yung En Chee (thanks for understanding and see below); Ryan Chisholm; Jane Elith (who has saved me from computer disasters many times and, in so many ways, takes care of us all); Michelle Ensbey; Frith Jarrad (who may have suffered the most, as a consequence of lab geography); Lauren Keim (who read a chapter and was nice); Prema Lucas (who didn't but is forgiven); Mick McCarthy (who convinced me that Bayes made sense); Kirsten Parris (for sharing stories about frogs and editors); Cassia Reid; Tracey Regan (who showed me what it was all about); Andrea White (whose lunch packs have kept me alive the last few months); Bonnie Wintle and Brendan Wintle (who very patiently explained AIC and other important things).

School of Psychological Science, La Trobe University: Where I spend a lot of my time now: Cathy Faulkner read chapter drafts, offered invaluable advice, was patient and listened a lot--the clinical psychologist every PhD candidate should know. I look forward to continued collaborations! Thanks also to Melissa Coulson; Sarah Belia; Jo Leeman and my third year research methods students.

A particular acknowledgement under this heading to Geoff Cumming--an ever watchful `step-supervisor' who for some reason, years ago, put his faith in a strange HPS student and continues to offer seemingly endless opportunities to extend projects. Totally unimaginable that this thesis would have come together without him! For one thing, I wouldn't have understood confidence intervals without his pictures; for another, I may have inappropriately used `/'.

Money: I am grateful for these scholarships and grants: Melbourne Research Scholarship (MRS); Melbourne Abroad Travelling Scholarship (MATS); Travel Research in Postgraduate Study (TRIPS). In addition, I have worked on projects funded by The Australian Research Council (ARC). Thank you to those who brought home the ARC grants: Mark Burgman, Geoff Cumming, Neil Thomason and Sue Finch.

Family and Friends: Thanks for putting up with me: Toni Fidler (my mum); Frank Fidler (my dad); Jo Roxburgh; Robert Roxburgh; Eugine and Kay Kneebone; Les Kneebone; Emmaline Bexley; Tracey Regan; Helen Regan (whose hospitality helped `fund' many an overseas jaunt); Marg Ayre; Yung En Chee; Andy White; Naomi Toottell; Lauren Keim;
