17 The Sampling Distribution



THE SAMPLING DISTRIBUTION

David W. Stockburger
Deputy Director of Academic Assessment, US Air Force Academy
Emeritus Professor of Psychology, Missouri State University, USA

What is it?

The sampling distribution is a distribution of a sample statistic. When using a procedure that repeatedly samples from a population and each time computes the same sample statistic, the resulting distribution of sample statistics is a sampling distribution of that statistic. To define the distribution more clearly, the name of the computed statistic is added to the title. For example, if the computed statistic was the sample mean, the sampling distribution would be titled "the sampling distribution of the sample mean."

For the sake of simplicity, let us consider a simple example in which we are dealing with a small discrete population consisting of the first ten integers {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Let us now repeatedly take random samples without replacement of size n=3 from this population. The random sampling might generate sets that look like {8, 3, 7}, {2, 1, 5}, {6, 3, 5}, {10, 7, 5}… If the mean (X̄) of each sample is found, the means of the above samples would be 6, 2.67, 4.67, 7.33… How many different samples can we take or, to put it differently, how many different sample means can we obtain? In our artificial example only 720 ordered samples (just 120 distinct sets, since order does not affect the mean), but in reality, when we analyze very large populations, the number of possible different samples of the same size can for all practical purposes be treated as countless. Once we have obtained sample means for all samples, we list all their different values and the number of their occurrences (frequencies). Finally, we divide each frequency by the total number of samples to obtain relative frequencies (empirical probabilities). In this way we arrive at a list of all possible sample means and their relative frequencies.
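The enumeration just described can be sketched in code. A minimal Python illustration (standard library only; the variable names are mine, not the author's):

```python
from itertools import combinations
from collections import Counter
from fractions import Fraction

population = range(1, 11)                      # the integers 1..10
samples = list(combinations(population, 3))    # all 120 distinct samples of size 3

# Mean of each sample, kept as exact fractions to avoid rounding issues.
means = [Fraction(sum(s), 3) for s in samples]

# Frequencies of each mean, then relative frequencies (empirical probabilities).
freq = Counter(means)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(freq.items())}

print(len(samples))                 # 120
print(min(means), max(means))       # 2 9  (means range from (1+2+3)/3 to (8+9+10)/3)
print(sum(sampling_dist.values()))  # 1  (the probabilities sum to one)
```

The dictionary `sampling_dist` is exactly the list of possible sample means and their relative frequencies described above.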
When the population is discrete, that list is called the sampling distribution of the statistic. Generally, the sampling distribution of a statistic is the probability distribution of that statistic derived from all possible samples of the same size drawn from the population. When we are dealing with a continuous population it is impossible to enumerate all possible outcomes, so we must rely on results from mathematical statistics (see the later section on mathematical construction for an example). Still, we can imagine a process similar to the one used for a discrete population: repeatedly take thousands of different samples of the same size and calculate the statistic for each. This yields a relative frequency distribution of that statistic. The more samples we take, the closer this relative frequency distribution comes to the sampling distribution; theoretically, as the number of samples approaches infinity, the frequency distribution approaches the sampling distribution.

The sampling distribution should not be confused with a sample distribution: the latter describes the distribution of values (elements) in a single sample.

Referring back to our example, the sampling distribution of the mean can be displayed graphically:

[Figure: sampling distribution of the mean for samples of size 3 from {1, …, 10}]

Every statistic has a sampling distribution. For example, suppose that instead of the mean, medians (the score value that cuts the distribution in half, such that half the scores fall above the median and half fall below it; a measure of central tendency) were computed for each sample. That is, within each sample the scores would be rank ordered and the middle score would be selected as the median.
Using the samples above, the medians would be 7, 2, 5, 7… The distribution of the medians calculated from all possible different samples of the same size is called the sampling distribution of the median:

[Figure: sampling distribution of the median for samples of size 3 from {1, …, 10}]

It is possible to make up a new statistic and construct a sampling distribution for that new statistic. For example, by rank ordering the three scores within each sample and finding the mean of the highest and the lowest scores, a new statistic can be created. Let this statistic be called the mid-mean and be symbolized by M. For the above samples the values of this statistic would be 5.5, 3, 4.5, 7.5…:

[Figure: sampling distribution of the mid-mean for samples of size 3 from {1, …, 10}]

Just as population distributions can be described with parameters (variables within the model that must be set before the model is completely specified; variables that change the shape of the probability model), so can the sampling distribution. The expected value and variance of any distribution can be represented by the symbols μ (mu) and σ² (sigma squared), respectively. In the case of the sampling distribution, the μ symbol is often written with a subscript to indicate which sampling distribution is being described. For example, the expected value of the sampling distribution of the mean is represented by the symbol μX̄, that of the median by μMd, and so on. The value of μX̄ can be thought of as the theoretical mean of the distribution of means. In a similar manner, the value of μMd is the theoretical mean of the distribution of medians.
The square root of the variance of a sampling distribution is given a special name, the standard error (the theoretical standard deviation of a sampling distribution). To distinguish different sampling distributions, each has a name tagged onto the end of "standard error" and a subscript on the symbol. The theoretical standard deviation of the sampling distribution of the mean is called the standard error of the mean and is symbolized by σX̄. Similarly, the theoretical standard deviation of the sampling distribution of the median is called the standard error of the median and is symbolized by σMd.

In each case the standard error of the sampling distribution of a statistic describes the degree to which the computed statistics may be expected to differ from one another when calculated from samples of similar size selected from similar population models. The larger the standard error of a given statistic, the greater the differences between the computed statistics for the different samples. From the example population, sampling method, and statistics described earlier, we would find μX̄ = μMd = μM = 5.5 and σX̄ = 1.46, σMd = 1.96, and σM = 1.39.

Why is the sampling distribution important? Properties of statistics

Statistics have different properties as estimators of population parameters. The sampling distribution of a statistic provides a window into some of the important properties.
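The parameter values quoted above (expected value 5.5 for all three statistics; standard errors 1.46, 1.96, and 1.39) can be reproduced by enumerating all 120 samples. A short Python sketch (the `mid_mean` helper is defined here purely for illustration):

```python
from itertools import combinations
from statistics import mean, median, pstdev

samples = list(combinations(range(1, 11), 3))  # all 120 samples of size 3

def mid_mean(s):
    """Mean of the highest and lowest scores in the sample."""
    return (min(s) + max(s)) / 2

# For each statistic: the mean of its sampling distribution (its expected
# value) and the standard deviation of that distribution (its standard error).
for stat in (mean, median, mid_mean):
    values = [stat(s) for s in samples]
    print(stat.__name__, round(mean(values), 2), round(pstdev(values), 2))
```

All three expected values come out 5.5, illustrating unbiasedness, and the standard errors come out 1.46, 1.96, and 1.39, matching the values given above.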
For example, if the expected value of a statistic is equal to the corresponding population parameter, the statistic is said to be unbiased. In the example above, all three statistics would be unbiased estimators of the population mean. Efficiency is another valuable property in the estimation of a population parameter: everything else being equal, the statistic with the smallest standard error is preferred as an estimator (a statistic used to estimate a model parameter) of the corresponding population parameter. Statisticians have proven that, for normally distributed populations, the standard error of the mean is smaller than the standard error of the median. Because of this property, the mean is generally preferred over the median as an estimator of the population mean.

Selection of distribution type to model scores

The sampling distribution provides the theoretical foundation for selecting a distribution to model many useful measures. For example, the central limit theorem describes why a measure, such as intelligence, that may be considered a summation of a number of independent quantities would necessarily be (approximately) distributed as a normal (Gaussian) curve.

Hypothesis testing

The sampling distribution is integral to the hypothesis testing procedure. It is used to create a model of what the world would look like if the null hypothesis were true and the statistic were collected an infinite number of times. A single sample is taken, the sample statistic is calculated, and it is then compared to the model created by the sampling distribution of that statistic under the null hypothesis. If the sample statistic is unlikely given the model, the model is rejected and a model with real effects becomes more likely.
In the example process described earlier, if the sample {3, 1, 4} were taken from the population described above, the sample mean (2.67), median (3), or mid-mean (2.5) could be found and compared to the corresponding sampling distribution of that statistic. The probability of finding a sample statistic of that size or smaller can be found for each: mean (p = .033), median (p = .183), and mid-mean (p = .025), and compared to the selected value of alpha (α). If alpha were set to .05, then the selected sample would be unlikely given the mean and the mid-mean, but not the median.

How can sampling distributions be constructed mathematically?

Using advanced mathematics, statisticians can prove that under given conditions a sampling distribution of some statistic must be a specific distribution. Let us illustrate this with the following theorem (for the proof see, for example, Hogg and Tanis, 1997, p. 256):

If X1, X2, …, Xn are observations of a random sample of size n from the normal distribution N(μ, σ²), and

    X̄ = (1/n) Σ Xᵢ   and   S² = (1/(n−1)) Σ (Xᵢ − X̄)²   (sums over i = 1, …, n),

then (n−1)S²/σ² is χ²(n−1).

The given conditions describe the assumptions that must be made in order for the stated sampling distribution to hold. For example, in the above theorem, assumptions about the sampling process (random sampling) and the distribution of X (a normal distribution) are necessary for the proof.

Of considerable importance to statistical thinking is the sampling distribution of the mean, a theoretical distribution of sample means. A mathematical theorem, called the Central Limit Theorem, describes the relationship of the parameters of the sampling distribution of the mean to the parameters of the probability model and the sample size.

Monte Carlo Simulations

It is not always easy or even possible to derive the exact nature of a given sampling distribution using mathematical derivations.
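Even when a derivation is available, a simulation can check it empirically. The sketch below (Python standard library only; the population parameters, sample size, and repetition count are arbitrary choices of mine) approximates the sampling distribution of (n−1)S²/σ² and compares its mean and variance with those of a χ²(n−1) distribution, which are n−1 and 2(n−1):

```python
import random
import statistics

random.seed(0)                        # reproducible run
mu, sigma, n, reps = 50.0, 10.0, 5, 20_000

chi2_stats = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(sample)            # unbiased S^2 (divides by n-1)
    chi2_stats.append((n - 1) * s2 / sigma**2)

# A chi-square distribution with n-1 = 4 degrees of freedom
# has mean 4 and variance 8; the simulated values come close.
print(round(statistics.mean(chi2_stats), 2))      # close to 4
print(round(statistics.variance(chi2_stats), 1))  # close to 8
```

With 20,000 replications the simulated mean and variance land near the theoretical 4 and 8; increasing `reps` tightens the approximation, as described next.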
In such cases it is often possible to use Monte Carlo simulations to generate a close approximation to the true sampling distribution of the statistic. For example, a non-random sampling method or a non-standard population distribution may be used, with the resulting distribution not converging to a known type of probability distribution. When much of the current formulation of statistics was developed, Monte Carlo techniques, while available, were very inconvenient to apply. With current computers and programming languages such as Wolfram Mathematica (Kinney, 2009), Monte Carlo simulations are likely to become much more popular in creating sampling distributions.

Summary

The sampling distribution, a theoretical distribution of a sample statistic, is a critical concept in statistical thinking. It allows the statistician to hypothesize about what the world would look like if a statistic were calculated an infinite number of times.

References

Hogg, R. V. and Tanis, E. A. (1997). Probability and Statistical Inference. Fifth edition. Upper Saddle River, NJ: Prentice Hall.

Kinney, J. J. (2009). A Probability and Statistics Companion. Hoboken, NJ: John Wiley & Sons, Inc.