The Basic Idea

Statistics: interpretation of data, especially the analysis of population characteristics by inference from sampling.

Inferential statistics: predict population characteristics (parameters) based on sample characteristics (statistics).

Sampling error: the error that occurs when a statistic based on a sample predicts the value of a population parameter.

Distributions

Sample: frequency distribution

Population: probability distribution

Relative frequency for an interval is the proportion of sample observations that fall in that interval.

Probability of an interval is the proportion of times that an observation would fall in that interval in a long run of repeated observations.
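As a small illustration of relative frequency, the sketch below counts the share of observations that fall in an interval. The sample values and the helper name `relative_frequency` are made up for this example; the interval is taken as closed on the left and open on the right, which is one common convention.

```python
# Hypothetical sample of 10 observations (illustrative values only).
sample = [2.1, 3.4, 3.9, 4.2, 4.8, 5.0, 5.5, 6.1, 6.3, 7.7]


def relative_frequency(data, low, high):
    """Proportion of observations x with low <= x < high."""
    count = sum(1 for x in data if low <= x < high)
    return count / len(data)


# Four of the ten observations (4.2, 4.8, 5.0, 5.5) lie in [4.0, 6.0).
rel = relative_frequency(sample, 4.0, 6.0)
print(rel)  # 0.4
```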

The probability distribution for a variable lists the possible values of the variable together with their probabilities. Probability distributions are characterized by parameters such as the mean and standard deviation. Letting $P(y_i)$ denote the probability of outcome $y_i$ for a variable, the mean of the distribution is $\mu = \sum_i y_i P(y_i)$.
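The mean formula can be checked on a small discrete distribution. The outcomes and probabilities below are hypothetical. The variance line uses the standard definition for a discrete distribution (probability-weighted squared deviation from the mean), which the notes mention only by name.

```python
# Hypothetical discrete distribution: outcomes y_i with probabilities P(y_i).
outcomes = [0, 1, 2, 3]
probabilities = [0.1, 0.3, 0.4, 0.2]

# A valid probability distribution must sum to 1.
assert abs(sum(probabilities) - 1.0) < 1e-12

# Mean: sum over i of y_i * P(y_i).
mean = sum(y * p for y, p in zip(outcomes, probabilities))

# Variance: probability-weighted squared deviation from the mean.
variance = sum((y - mean) ** 2 * p for y, p in zip(outcomes, probabilities))
std_dev = variance ** 0.5

print(round(mean, 4), round(std_dev, 4))
```

For these numbers the mean works out to 1.7 and the standard deviation to 0.9, up to floating-point rounding.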

The normal distribution: a symmetric bell-shaped curve completely specified by the parameters, mean and std. deviation.

Mean: the central tendency of a distribution, the average

Variance & standard deviation: measure the spread of a distribution about its central point, i.e., the deviation from the mean

The z-score of an observation measures the number of std. deviations that it falls from the mean of the distribution. The z-score is positive when the observation falls above the mean and negative when it falls below the mean.
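A minimal sketch of the z-score calculation follows. The mean of 100 and std. deviation of 15 are arbitrary illustrative values, and `z_score` is a helper name chosen for this example.

```python
def z_score(observation, mean, std_dev):
    """Number of standard deviations the observation lies from the mean."""
    return (observation - mean) / std_dev


# Hypothetical distribution with mean 100 and std. deviation 15.
z_above = z_score(130, 100, 15)  # observation above the mean
z_below = z_score(85, 100, 15)   # observation below the mean

print(z_above, z_below)  # 2.0 -1.0
```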

Sampling Distribution

Up to this point we have used a probability distribution to summarize the possible outcomes of a variable. In practice, the population distributions of most variables are unknown. Instead we use sample data to estimate characteristics of these distributions such as their parameters.

When a sample statistic estimates a population parameter, the accuracy of that estimate depends on sampling variation. A probability distribution that describes the variation that occurs from repeatedly selecting samples of a certain size to form a particular statistic is called a sampling distribution. In other words, a sampling distribution is a probability distribution of the possible values of a sample statistic.

The Central Limit Theorem states that for large random samples, the sampling distribution of the sample mean is approximately normal. This is true even if the population distribution is far from normal. The sampling distribution of the sample mean is centered at the population mean, and its spread (the standard error) equals the population std. deviation divided by the square root of the sample size. In practice the population std. deviation is unknown, so the standard error is estimated using the sample std. deviation.
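The Central Limit Theorem can be illustrated with a small simulation. This is only a sketch under chosen assumptions: the population is taken to be exponential with rate 1 (strongly skewed, with mean 1 and std. deviation 1), and the sample size, number of repetitions, and seed are arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1 (mean 1, std. deviation 1).
POP_MEAN = 1.0
POP_STD = 1.0

n = 50              # size of each sample
num_samples = 5000  # number of repeated samples

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

center = statistics.fmean(sample_means)  # should be close to POP_MEAN
spread = statistics.stdev(sample_means)  # should be close to the standard error
theoretical_se = POP_STD / n ** 0.5      # population std. deviation / sqrt(n)

print(round(center, 3), round(spread, 3), round(theoretical_se, 3))
```

Even though the exponential population is far from normal, the simulated sample means cluster around the population mean, with a spread near the theoretical standard error of about 0.141.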

Estimation and Confidence Intervals

A point estimate of a population parameter is the value of the sample statistic that predicts the value of that parameter. E.g. The sample mean is a point estimate of the population mean.

We will use point estimates that are the maximum likelihood estimates of the population parameters. A maximum likelihood estimate is the value of the parameter that is most consistent with the observed data in the sense that if the parameter equaled that number, the observed data would have the greatest chance of occurring.
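To make the idea of maximum likelihood concrete, the sketch below uses a hypothetical example that is not in the notes: 7 successes in 10 independent yes/no trials. It searches a grid of candidate values for the success probability and picks the one under which the observed data would be most likely. The grid search is an illustrative shortcut, not the usual analytic derivation.

```python
import math

# Hypothetical data: 7 successes observed in 10 independent trials.
successes, trials = 7, 10


def log_likelihood(p):
    """Log of the probability of the observed data if the success probability were p
    (the binomial coefficient is omitted because it does not depend on p)."""
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)


# Candidate parameter values strictly between 0 and 1.
grid = [i / 1000 for i in range(1, 1000)]

# The maximum likelihood estimate is the candidate with the largest likelihood.
p_hat = max(grid, key=log_likelihood)

print(p_hat)
```

The grid maximum coincides with the sample proportion 7/10, which is the familiar point estimate of a population proportion.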

A confidence interval describes how close the point estimate is likely to be to the population parameter. More precisely, the confidence interval for a parameter is a range of numbers within which the parameter is believed to fall. The probability that the method produces an interval containing the parameter is called the confidence coefficient.

Example: large-sample confidence interval for the mean

90% confidence interval:

All we need to know is how many standard deviations about the mean will include 90% of the sample means. The following picture of the standard normal curve shows the z-value we want so that a total area of 0.90 is included between z = -1.645 and z = 1.645:

[Figure: standard normal curve with a central area of 0.90 between z = -1.645 and z = 1.645]

We call this value of z "z.05" since the area of the tail to its right is .05:

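90% confidence interval = $[\bar{y} - 1.645\, s/\sqrt{n},\ \bar{y} + 1.645\, s/\sqrt{n}]$, where $\bar{y}$ is the sample mean, $s$ is the sample std. deviation, and $n$ is the sample size.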

99% confidence interval:

Similarly, for the 99% confidence interval, we can consult the following picture

[Figure: standard normal curve with a central area of 0.99 between z = -2.576 and z = 2.576]

and obtain:

99% confidence interval = $[\bar{y} - 2.576\, s/\sqrt{n},\ \bar{y} + 2.576\, s/\sqrt{n}]$
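The two intervals above can be computed with a short sketch like the following. The data are simulated purely for illustration (40 draws from a normal distribution with mean 50 and std. deviation 10), and `large_sample_ci` is a helper name invented here. The z-value is obtained from the standard normal distribution instead of being hard-coded.

```python
import random
from statistics import NormalDist, fmean, stdev


def large_sample_ci(data, confidence):
    """Large-sample confidence interval for the mean: y_bar +/- z * s / sqrt(n)."""
    n = len(data)
    y_bar = fmean(data)  # sample mean (point estimate)
    s = stdev(data)      # sample std. deviation
    tail = (1 - confidence) / 2           # area in each tail, e.g. .05 for 90%
    z = NormalDist().inv_cdf(1 - tail)    # about 1.645 for 90%, 2.576 for 99%
    margin = z * s / n ** 0.5
    return y_bar - margin, y_bar + margin


# Hypothetical sample, simulated for illustration only.
random.seed(1)
data = [random.gauss(50, 10) for _ in range(40)]

ci_90 = large_sample_ci(data, 0.90)
ci_99 = large_sample_ci(data, 0.99)
print(ci_90, ci_99)
```

The 99% interval is wider than the 90% interval for the same data: greater confidence comes at the cost of a less precise interval.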

Significance Tests

Five elements of a significance test:

1. Assumptions:

a. type of data, form of population, method of sampling, sample size.

2. Hypotheses:

a. Null hypothesis, H0 (the parameter value being tested, i.e., the “no effect” value)

b. Alternative hypothesis, Ha (alternative parameter values)

3. Test statistic

a. Compares the point estimate to the null hypothesis parameter value

4. P-value

a. Weight of evidence about H0; smaller P is more contrary to H0

5. Report P-value

The P-value is the probability, when H0 is true, of a test statistic value at least as contrary to H0 as the value actually observed. The smaller the P-value, the more strongly the data contradict H0.

Example: Significance test for a mean

1. Assumption: we have a random sample of size 30 or greater.

2. Hypotheses: H0: population mean equals y, where y is some number.

Ha: population mean does not equal y

3. Test statistic: the sample mean, $\bar{y}$, estimates the population mean. When the sample size is 30 or greater, the distribution of the sample mean is approx. normal about the population mean with standard error = sample std. deviation divided by the square root of the sample size.

The test statistic is the z-score: $z = (\bar{y} - y) / (s/\sqrt{n})$, where $s$ is the sample std. deviation and $n$ is the sample size.

4. P-value:

[Figure: sampling distribution of $z = (\bar{y} - y) / (s/\sqrt{n})$ when H0 is true (standard normal distribution)]

P = sum of the tail probabilities

5. Round the P-value to 1 to 3 significant digits before reporting.
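The steps above can be sketched in a few lines. The data set is made up for illustration (40 values, evenly spread over 48 to 52), the hypothesized mean of 49 is arbitrary, and `z_test_mean` is a helper name invented for this example.

```python
from statistics import NormalDist, fmean, stdev


def z_test_mean(data, null_mean):
    """Two-sided large-sample z test of H0: population mean equals null_mean."""
    n = len(data)
    y_bar = fmean(data)             # sample mean
    se = stdev(data) / n ** 0.5     # standard error: s / sqrt(n)
    z = (y_bar - null_mean) / se    # test statistic
    # P-value: sum of both tail probabilities beyond |z| under the standard normal.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample of size 40 (values 48..52, each appearing 8 times).
data = [48 + (i % 5) for i in range(40)]

z, p_value = z_test_mean(data, null_mean=49)

# Report the P-value rounded to 2 significant digits.
print(f"z = {z:.2f}, P = {p_value:.2g}")
```

Here the sample mean is 50, about 4.4 standard errors above the hypothesized value of 49, so the P-value is very small and the data strongly contradict H0. Testing H0: mean equals 50 on the same data gives z = 0 and P = 1.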

-----------------------

Population: the set of objects of interest in the study

Sample: a subset of the population

Statistic: summarizes the information in a collection of data, i.e., the sample

Parameter: a numerical summary of the population
