STATISTICS 1

Keijo Ruohonen

(Translation by Jukka-Pekka Humaloja and Robert Piché)

2011

Table of Contents

I FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS

1.1 Random Sampling
1.2 Some Important Statistics
1.3 Data Displays and Graphical Methods
1.4 Sampling distributions
  1.4.1 Sampling distributions of means
  1.4.2 The sampling distribution of the sample variance
  1.4.3 t-Distribution
  1.4.4 F-distribution

II ONE- AND TWO-SAMPLE ESTIMATION

2.1 Point Estimation and Interval Estimation
2.2 Single Sample: Estimating the Mean
2.3 Prediction Intervals
2.4 Tolerance Limits
2.5 Two Samples: Estimating the Difference between Two Means
2.6 Paired observations
2.7 Estimating a Proportion
2.8 Single Sample: Estimating the Variance
2.9 Two Samples: Estimating the Ratio of Two Variances

III TESTS OF HYPOTHESES

3.1 Statistical Hypotheses
3.2 Hypothesis Testing
3.3 One- and Two-Tailed Tests
3.4 Test statistic
3.5 P-probabilities
3.6 Tests Concerning Expectations
3.7 Tests Concerning Variances
3.8 Graphical Methods for Comparing Means

IV χ²-TESTS

4.1 Goodness-of-Fit Test
4.2 Test for Independence. Contingency Tables
4.3 Test for Homogeneity

V MAXIMUM LIKELIHOOD ESTIMATION

5.1 Maximum Likelihood Estimation
5.2 Examples

VI MULTIPLE LINEAR REGRESSION

6.1 Regression Models
6.2 Estimating the Coefficients. Using Matrices
6.3 Properties of Parameter Estimators
6.4 Statistical Consideration of Regression
6.5 Choice of a Fitted Model Through Hypothesis Testing
6.6 Categorical Regressors
6.7 Study of Residuals
6.8 Logistical Regression

VII NONPARAMETRIC STATISTICS

7.1 Sign Test
7.2 Signed-Rank Test
7.3 Mann–Whitney test
7.4 Kruskal–Wallis test
7.5 Rank Correlation Coefficient

VIII STOCHASTIC SIMULATION

8.1 Generating Random Numbers
  8.1.1 Generating Uniform Distributions
  8.1.2 Generating Discrete Distributions
  8.1.3 Generating Continuous Distributions with the Inverse Transform Method
  8.1.4 Generating Continuous Distributions with the Accept–Reject Method
8.2 Resampling
8.3 Monte Carlo Integration

Appendix: TOLERANCE INTERVALS

Preface

This document contains the lecture notes for the course "MAT-33317 Statistics 1", and is a translation of the notes for the corresponding Finnish-language course. The laborious bulk translation was taken care of by Jukka-Pekka Humaloja and the material was then checked by Professor Robert Piché. I want to thank the translation team for their effort.

The lecture notes are based on chapters 8, 9, 10, 12 and 16 of the book WALPOLE, R.E. & MYERS, R.H. & MYERS, S.L. & YE, K.: Probability & Statistics for Engineers & Scientists, Pearson Prentice Hall (2007). The book (denoted WMMY in the following) is one of the most popular elementary statistics textbooks in the world. The corresponding sections in WMMY are indicated in the right margin. These notes are, however, much more compact than WMMY and should not be considered a substitute for the book, for example for self-study. There are many topics where the presentation is quite different from WMMY; in particular, formulas that are nowadays considered too inaccurate have been replaced with better ones. Additionally, a chapter on stochastic simulation, which is not covered in WMMY, is included in these notes.

The examples are mostly from the book WMMY. The numbers of these examples in WMMY are given in the right margin. The examples have all been recomputed using MATLAB, the statistical program JMP, or web-based calculators. The examples aren't discussed as thoroughly as in WMMY and in many cases the treatment is different.

An essential prerequisite for the course "MAT-33317 Statistics 1" is the course "MAT-20501 Probability Calculus" or a corresponding course that covers the material of chapters 1–8 of WMMY. MAT-33317 only covers the basics of statistics. The TUT mathematics department offers many advanced courses that go beyond the basics, including "MAT-34006 Statistics 2", which covers statistical quality control, design of experiments, and reliability theory, "MAT-51706 Bayesian methods", which introduces the Bayesian approach to solving statistical problems, "MAT-51801 Mathematical Statistics", which covers the theoretical foundations of statistics, and "MAT-41281 Multivariate Statistical Methods", which covers a wide range of methods including regression.

Keijo Ruohonen

Chapter 1

FUNDAMENTAL SAMPLING DISTRIBUTIONS AND DATA DESCRIPTIONS

This chapter is mostly a review of basic Probability Calculus. Additionally, some methods for visualisation of statistical data are presented.

1.1 Random Sampling

A population is a collection of all the values that may be included in a sample. A numerical value or a classification value may exist in the sample multiple times. A sample is a collection of certain values chosen from the population. The sample size, usually denoted by n, is the number of these values. If these values are chosen at random, the sample is called a random sample.

A sample can be considered a sequence of random variables: X1, X2, . . . , Xn ("the first sample variable", "the second sample variable", . . . ) that are independent and identically distributed. A concrete realized sample as a result of sampling is a sequence of values (numerical or classification values): x1, x2, . . . , xn. Note: random variables are denoted with upper case letters, realized values with lower case letters.

The sampling considered here is sampling with replacement. In other words, if the population is finite (or countably infinite), an element taken from the population is returned to it before the next element is drawn.
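As an illustration only, sampling with replacement from a finite population might be sketched in Python as follows. The function name `draw_sample`, the example population and the seed are hypothetical and do not come from the course material; the sketch relies on the standard-library method `random.Random.choices`, which draws with replacement.

```python
import random


def draw_sample(population, n, seed=None):
    """Draw a random sample of size n from population, with replacement.

    Hypothetical helper for illustration; not part of the lecture notes.
    """
    rng = random.Random(seed)
    # choices() draws each element from the full population, so the same
    # element may appear in the sample several times.
    return rng.choices(population, k=n)


# Hypothetical population of classification values.
population = ["red", "green", "blue"]

# Because elements are returned to the population after each draw,
# the sample size may exceed the population size.
sample = draw_sample(population, n=10, seed=1)
print(sample)
```

Each draw is made from the whole population, so the sample values behave like independent and identically distributed variables, as assumed above.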

1.2 Some Important Statistics

A statistic is some individual value calculated from a sample: f(X1, . . . , Xn) (random variables) or f(x1, . . . , xn) (realized values). A familiar statistic is the sample mean

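$$\overline{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad\text{or}\qquad \overline{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Margin note (Section 1.1, WMMY [8.1]): Sampling without replacement is not considered in this course.

Margin reference for Section 1.2: WMMY [8.2].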
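A minimal sketch of the sample mean computed from realized values is given below. The function name `sample_mean` and the data are hypothetical and serve only to illustrate a statistic as a function of the sample.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the values divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample must contain at least one value")
    return sum(values) / n


# Hypothetical realized sample x1, ..., xn with n = 4.
x = [2.0, 4.0, 4.0, 5.0]
print(sample_mean(x))  # 3.75
```

Applied to random variables the same function gives a random variable, while applied to realized values it gives a single number.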
