 Statistical Data Analysis

GLEN COWAN

University of Siegen

CLARENDON PRESS · OXFORD 1998

Oxford University Press, Great Clarendon Street, Oxford OX2 6DP

Oxford New York Athens Auckland Bangkok Bogota Bombay Buenos Aires Calcutta Cape Town Dar es Salaam Delhi Florence Hong Kong Istanbul Karachi Kuala Lumpur Madras Madrid Melbourne Mexico City Nairobi Paris Singapore Taipei Tokyo Toronto Warsaw

and associated companies in Berlin Ibadan

Oxford is a registered trade mark of Oxford University Press

Published in the United States by Oxford University Press Inc., New York

© Glen Cowan, 1998

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without the prior permission in writing of Oxford University Press. Within the UK, exceptions are allowed in respect of any fair dealing for the purpose of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act, 1988, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms and in other countries should be sent to the Rights Department, Oxford University Press, at the address above.

This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, re-sold, hired out, or otherwise circulated without the publisher's prior consent in any form of binding or cover other than that in which it is published and without a similar condition including this condition being imposed on the subsequent purchaser.

A catalogue record for this book is available from the British Library

Library of Congress Cataloging in Publication Data (Data available)

ISBN 0 19 850156 0 (Hbk) ISBN 0 19 850155 2 (Pbk)

Typeset by the author

Printed in Great Britain by Bookcraft (Bath) Ltd, Midsomer Norton, Avon

Preface

The following book is a guide to the practical application of statistics in data analysis as typically encountered in the physical sciences, and in particular in high energy particle physics. Students entering this field do not usually go through a formal course in probability and statistics, despite having been exposed to many other advanced mathematical techniques. Statistical methods are invariably needed, however, in order to extract meaningful information from experimental data.

The book originally developed out of work with graduate students at the European Organization for Nuclear Research (CERN). It is primarily aimed at graduate or advanced undergraduate students in the physical sciences, especially those engaged in research or laboratory courses which involve data analysis. A number of the methods are widely used but less widely understood, and it is therefore hoped that more advanced researchers will also be able to profit from the material. Although most of the examples come from high energy particle physics, an attempt has been made to present the material in a reasonably general way so that the book can be useful to people in most branches of physics and astronomy.

It is assumed that the reader has an understanding of linear algebra, multivariable calculus and some knowledge of complex analysis. No prior knowledge of probability and statistics, however, is assumed. Roughly speaking, the present book is somewhat less theoretically oriented than that of Eadie et al. [Ead71], and somewhat more so than those of Lyons [Lyo86] and Barlow [Bar89].

The first part of the book, Chapters 1 through 8, covers basic concepts of probability and random variables, Monte Carlo techniques, statistical tests, and methods of parameter estimation. The concept of probability plays, of course, a fundamental role. In addition to its interpretation as a relative frequency as used in classical statistics, the Bayesian approach using subjective probability is discussed as well. Although the frequency interpretation tends to dominate in most of the commonly applied methods, it was felt that certain applications can be better handled with Bayesian statistics, and that a brief discussion of this approach was therefore justified.

The last three chapters are somewhat more advanced than those preceding. Chapter 9 covers interval estimation, including the setting of limits on parameters. The characteristic function is introduced in Chapter 10 and used to derive a number of results which are stated without proof earlier in the book. Finally, Chapter 11 covers the problem of unfolding, i.e. the correcting of distributions for effects of measurement errors. This topic in particular is somewhat specialized, but since it is not treated in many other books it was felt that a discussion of the concepts would be found useful.

An attempt has been made to present the most important concepts and tools in a manageably short space. As a consequence, many results are given without proof and the reader is often referred to the literature for more detailed explanations. It is thus considerably more compact than several other works on similar topics, e.g. those by Brandt [Bra92] and Frodesen et al. [Fro79]. Most chapters employ concepts introduced in previous ones. Since the book is relatively short, however, it is hoped that readers will look at least briefly at the earlier chapters before skipping to the topic needed. A possible exception is Chapter 4 on statistical tests; this could be skipped without a serious loss of continuity by those mainly interested in parameter estimation.

The choice of and relative weights given to the various topics reflect the type of analysis usually encountered in particle physics. Here the data usually consist of a set of observed events, e.g. particle collisions or decays, as opposed to the data of a radio astronomer, who deals with a signal measured as a function of time. The topic of time series analysis is therefore omitted, as is analysis of variance. The important topic of numerical minimization is not treated, since computer routines that perform this task are widely available in program libraries.

At various points in the book, reference is made to the CERN program library (CERNLIB) [CER97], as this is the collection of computer software most accessible to particle physicists. The short tables of values included in the book have been computed using CERNLIB routines. Other useful sources of statistics software include the program libraries provided with the books by Press et al. [Pre92] and Brandt [Bra92].

Part of the material here was presented as a half-semester course at the University of Siegen in 1995. Given the topics added since then, most of the book could be covered in 30 one-hour lectures. Although no exercises are included, an evolving set of problems and additional related material can be found on the book's World Wide Web site. The link to this site can be located via the catalogue of the Oxford University Press home page at:



The reader interested in practicing the techniques of this book is encouraged to implement the examples on a computer. By modifying the various parameters and the input data, one can gain experience with the methods presented. This is particularly instructive in conjunction with the Monte Carlo method (Chapter 3), which allows one to generate simulated data sets with known properties. These can then be used as input to test the various statistical techniques.
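As one possible illustration of this suggestion (it is not taken from the book), the short Python sketch below generates simulated data sets with a known property and uses them to check an estimator. It assumes exponentially distributed values, such as decay times, with a chosen true mean, and takes the sample mean as the estimator. The function names, the seed and the parameter values are all hypothetical choices made for the example.

```python
import random
import statistics


def simulate_sample(true_mean, n_events, rng):
    """Generate n_events exponentially distributed values with the given mean."""
    # random.expovariate takes the rate parameter, i.e. 1 / mean.
    return [rng.expovariate(1.0 / true_mean) for _ in range(n_events)]


def estimate_mean(sample):
    """Estimate the mean of the distribution by the sample mean."""
    return statistics.fmean(sample)


def run_experiments(true_mean, n_events, n_experiments, seed=12345):
    """Repeat the simulated experiment and collect one estimate per experiment."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    return [
        estimate_mean(simulate_sample(true_mean, n_events, rng))
        for _ in range(n_experiments)
    ]


# Assumed illustrative settings: true mean 2.0, 100 events per experiment.
TRUE_MEAN = 2.0
N_EVENTS = 100
estimates = run_experiments(TRUE_MEAN, N_EVENTS, n_experiments=1000)

average_estimate = statistics.fmean(estimates)
spread_of_estimates = statistics.stdev(estimates)

print(f"true mean:            {TRUE_MEAN}")
print(f"average of estimates: {average_estimate:.3f}")
print(f"spread of estimates:  {spread_of_estimates:.3f}")
print(f"expected spread:      {TRUE_MEAN / N_EVENTS ** 0.5:.3f}")
```

Because the true mean is known, the average of the estimates can be compared with it directly, and their spread with the expected value of the true mean divided by the square root of the number of events. Changing `TRUE_MEAN`, `N_EVENTS` or the generating distribution is a simple way to gain the kind of experience described above.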

Thanks are due above all to Sönke Adlung of Oxford University Press for encouraging me to write this book as well as for his many suggestions on its content. In addition I am grateful to Professors Sigmund Brandt and Claus Grupen of the University of Siegen for their support of this project and their feedback on the text. Significant improvements were suggested by Robert Cousins, as
