STATISTICS - SUMMARY




1. Functions of statistics

a. description: summarize a set of data

b. inference: make generalizations from sample to population. parameter estimates, hypothesis tests.

2. Types of statistics

i. descriptive statistics: describe a set of data

a. central tendency: mean, median (order statistics), mode.

b. dispersion: range, variance & standard deviation.

c. Others: shape - skewness, kurtosis.

d. EDA procedures (exploratory data analysis).

Stem & leaf display: ordered array, freq distrib. & histogram all in one.

Box and Whisker plot: Five number summary - minimum, Q1, median, Q3, and maximum.

Resistant statistics: trimmed and winsorized means, midhinge, interquartile deviation.

ii. inferential statistics: make inferences from samples to populations.

iii. Parametric vs non-parametric statistics

a. parametric : generally assume interval scale measurements and normally distributed variables.

b. nonparametric (distribution free statistics) : generally weaker assumptions: ordinal or nominal measurements, don't specify the exact form of distribution.
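As an illustration of the descriptive statistics listed under 2.i, the following Python sketch computes them with the standard library `statistics` module. The data values are hypothetical and chosen only for illustration. The quartiles use the module's "inclusive" method, which is one of several conventions and may differ slightly from the hinges in a given textbook.

```python
import statistics

# Hypothetical sample, for illustration only
data = [2, 4, 4, 5, 7, 9, 11, 12, 15]

# Central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Dispersion
data_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance (n - 1 denominator)
std_dev = statistics.stdev(data)      # sample standard deviation

# Five number summary, as drawn in a box and whisker plot
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, q2, q3, max(data))

# Resistant statistics
midhinge = (q1 + q3) / 2
ordered = sorted(data)
trimmed_mean = statistics.mean(ordered[1:-1])  # drop one value from each end

print("mean:", round(mean, 2), "median:", median, "mode:", mode)
print("range:", data_range, "variance:", round(variance, 2),
      "std dev:", round(std_dev, 2))
print("five number summary:", five_number)
print("midhinge:", midhinge, "trimmed mean:", round(trimmed_mean, 2))
```

The trimmed mean and midhinge change little if the largest value is replaced by an extreme outlier, which is what makes them "resistant".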

3. Steps in hypothesis testing.

1. Make assumptions & choose the appropriate statistic. Check measurement scale of variables.

2. State the null hypothesis and the alternative hypothesis.

3. Select a confidence level for the test. Determine the critical region - values of the statistic for which you will reject the null hypothesis.

4. Calculate the statistic.

5. Reject or fail to reject null hypothesis.

6. Interpret results.

Type I error: rejecting null hypothesis when it is true.

Probability of a Type I error (alpha) = 1 - confidence level.

Type II error: failing to reject null hypothesis when it is false.

Power of a test = 1-prob of a type II error.
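The relationship between confidence level, Type I error, Type II error, and power can be sketched numerically. The example below assumes a one-sided z-test with a known population standard deviation, which is a simplification not taken from these notes. The function name `z_test_power` and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, confidence=0.95):
    """Return (alpha, beta, power) for a one-sided z-test of H0: mu = mu0
    against the specific alternative mu = mu1 > mu0, with sigma assumed known."""
    alpha = 1 - confidence                    # probability of a Type I error
    std_error = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # boundary of the critical region
    shift = (mu1 - mu0) / std_error           # true effect in standard errors
    beta = NormalDist().cdf(z_crit - shift)   # probability of a Type II error
    return alpha, beta, 1 - beta


# Hypothetical values: H0 mean 100, true mean 105, sigma 15, sample of 36
alpha, beta, power = z_test_power(mu0=100, mu1=105, sigma=15, n=36)
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

Raising `n` or the confidence level in this sketch shows the usual trade-off: a larger sample increases power, while a higher confidence level (smaller alpha) lowers it for a fixed sample size.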

4. Null hypotheses for simple bivariate tests.

a. Pearson Correlation: rxy = 0

b. T-Test: mean of x = mean of y

c. One Way ANOVA: M1 = M2 = M3 = ... = Mn

d. Chi square : No relationship between x and y. Formally, this is captured by the "expected table", which assumes cells in the X-Y table can be generated completely from row and column totals.

5. EXAMPLES OF T-TEST AND CHI SQUARE

(1) T-TEST. Tests for differences in means (or percentages) across two subgroups. Null hypothesis is mean of Group 1 = mean of group 2. This test assumes interval scale measure of dependent variable (the one you compute means for) and that the distribution in the population is normal. The generalization to more than two groups is called a one way analysis of variance and the null hypothesis is that all the subgroup means are identical. These are parametric statistics since they assume interval scale and normality.
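A minimal sketch of the two-group t-test, assuming the pooled (equal variance) form of the statistic, which these notes do not specify. The two groups are hypothetical data, and the critical value 2.306 is the two-tailed .05 entry for 8 degrees of freedom from a standard t table.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group1, group2):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / std_error
    return t, df


# Hypothetical scores for two subgroups
group1 = [5, 7, 8, 6, 9]
group2 = [3, 4, 6, 5, 2]

t, df = pooled_t(group1, group2)
critical_t = 2.306  # two-tailed, alpha = .05, df = 8 (from a t table)
reject_null = abs(t) > critical_t
print(f"t = {t:.2f}, df = {df}, reject null: {reject_null}")
```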

(2) Chi square is a nonparametric statistic to test if there is a relationship in a contingency table, i.e. Is the row variable related to the column variable? Is there any discernible pattern in the table? Can we predict the column variable Y if we know the row variable X?

The Chi square statistic is calculated by comparing the observed table from the sample, with an "expected" table derived under the null hypothesis of no relationship. If Fo denotes a cell in the observed table and Fe a corresponding cell in expected table, then

$$\chi^2 = \sum_{\text{cells}} \frac{(F_o - F_e)^2}{F_e}$$

The cells in the expected table are computed from the row total ($n_r$) and column total ($n_c$) for each cell, and the sample size $n$, as follows:

$$F_e = \frac{n_r \, n_c}{n}$$

CHI SQUARE TEST EXAMPLE: Suppose a sample (n=100) from student population yields the following observed table of frequencies:

                GENDER

IM-USE      Male   Female   Total

Yes           20       40      60

No            30       10      40

Total         50       50     100

EXPECTED TABLE UNDER NULL HYPOTHESIS (NO RELATIONSHIP)

                GENDER

IM-USE      Male   Female   Total

Yes           30       30      60

No            20       20      40

Total         50       50     100

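$$\chi^2 = \frac{(20-30)^2}{30} + \frac{(40-30)^2}{30} + \frac{(30-20)^2}{20} + \frac{(10-20)^2}{20}$$

$$= \frac{100}{30} + \frac{100}{30} + \frac{100}{20} + \frac{100}{20} = 3.33 + 3.33 + 5 + 5 = 16.67$$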
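The calculation can be checked with a short Python sketch that builds the expected table from the row and column totals and then sums (Fo - Fe)^2 / Fe over the cells. The function names `expected_table` and `chi_square` are illustrative and not from any package. The observed counts are those of the GENDER by IM-USE table above.

```python
def expected_table(observed):
    """Expected counts under no relationship: Fe = (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (Fo - Fe)^2 / Fe over all cells of the table."""
    expected = expected_table(observed)
    return sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )


# Observed table: rows are IM-USE (Yes, No), columns are GENDER (Male, Female)
observed = [[20, 40],
            [30, 10]]

expected = expected_table(observed)
chi2 = chi_square(observed)
print("expected:", expected)
print("chi square:", round(chi2, 2))  # about 16.67
```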

Chi square tables report the probability of getting a Chi square value this high in a random sample, given that there is no relationship in the population. If doing the test by hand, you would look up the probability in a table. The Chi square distribution differs depending on the degrees of freedom, which are determined by the size of the table: (rows - 1) x (columns - 1). In this case it is (2-1) x (2-1) = 1. The probability of obtaining a Chi square of 16.67 given no relationship is less than .001. (The last entry in my table gives 10.83 as the Chi square value corresponding to a probability of .001, so 16.67 would have a smaller probability.)
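For a table with 1 degree of freedom, the table lookup can be reproduced without a statistics package, because a Chi square variable with df = 1 is the square of a standard normal variable. The sketch below relies on that identity and therefore applies only to df = 1. The function names are hypothetical.

```python
from math import erfc, sqrt


def degrees_of_freedom(rows, columns):
    """Degrees of freedom for a contingency table."""
    return (rows - 1) * (columns - 1)


def chi2_p_value_df1(chi2):
    """Probability of a Chi square value at least this large when df = 1.

    Uses P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return erfc(sqrt(chi2 / 2))


df = degrees_of_freedom(2, 2)

# Table entries quoted in the text: 3.84 at .05 and 10.83 at .001
p_at_3_84 = chi2_p_value_df1(3.84)
p_at_10_83 = chi2_p_value_df1(10.83)

# Statistic for the example table: the sum of the four cell contributions
example_chi2 = 100 / 30 + 100 / 30 + 100 / 20 + 100 / 20
p_example = chi2_p_value_df1(example_chi2)

print("df:", df)
print("p at 3.84:", round(p_at_3_84, 3), " p at 10.83:", round(p_at_10_83, 4))
print("p for example:", p_example, " reject at .05:", p_example < 0.05)
```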

If using a computer package, it will normally report both the Chi square and the probability or significance level corresponding to this value. In testing your null hypothesis, REJECT if the reported probability is less than .05 (or whatever alpha, i.e. 1 - confidence level, you have chosen). FAIL TO REJECT if the probability is greater than .05.

For the above example : REVIEW OF STEPS IN HYPOTHESIS TESTING:

(1) Nominal level variables, so we used Chi square.

(2) State null hypothesis. No relationship between gender and IM-USE

(3) Choose confidence level. 95%, so alpha = .05, critical region is $\chi^2$ > 3.84 (see App E -p. A-28).

(4) Draw sample and calculate the statistic; $\chi^2$ = 3.33 + 3.33 + 5 + 5 = 16.67

(5) 16.67 > 3.84, so inside critical region, REJECT null hypothesis. Alternatively, SIG= .001 on computer printout, .001 ...
