Www.cheat-sheets.org

STATISTICS FOR INTRODUCTORY COURSES

STATISTICS - A set of tools for collecting, organizing, presenting, and analyzing numerical facts or observations.

1. Descriptive Statistics - procedures used to organize and present data in a convenient, usable, and communicable form.

2. Inferential Statistics - procedures employed to arrive at broader generalizations or inferences from sample data to populations.

STATISTIC - A number describing a sample characteristic. Results from the manipulation of sample data according to certain specified procedures.

DATA - Characteristics or numbers that are collected by observation.

POPULATION - A complete set of actual or potential observations.

PARAMETER - A number describing a population characteristic; typically inferred from a sample statistic.

SAMPLE - A subset of the population selected according to some scheme.

RANDOM SAMPLE - A subset selected in such a way that each member of the population has an equal opportunity to be selected. Ex. lottery numbers in a fair lottery.

VARIABLE - A phenomenon that may take on different values.

FREQUENCY DISTRIBUTION - Shows the number of times each observation occurs when the values of a variable are arranged in order according to their magnitudes.

  x   f  |   x   f  |   x   f  |   x   f
100   1  |  91   1  |  82   1  |  73   3
 99   1  |  90   2  |  81   2  |  72   2
 98   0  |  89   3  |  80   1  |  71   0
 97   0  |  88   7  |  79   2  |  70   4
 96   2  |  87   1  |  78   1  |  69   3
 95   0  |  86   0  |  77   3  |  68   1
 94   0  |  85   1  |  76   2  |  67   2
 93   1  |  84   5  |  75   4  |  66   1
 92   0  |  83   2  |  74   3  |  65   0

GROUPED FREQUENCY DISTRIBUTION - A frequency distribution in which the values of the variable have been grouped into classes.

MEAN - The point in a distribution of measurements about which the summed deviations are equal to zero. Average value of a sample or population.

POPULATION MEAN

$\mu = \dfrac{1}{N}\sum_{i=1}^{N} X_i$

SAMPLE MEAN

$\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$

Note: The mean is very sensitive to extreme measurements that are not balanced on both sides.

WEIGHTED MEAN - Sum of a set of observations multiplied by their respective weights, divided by the sum of the weights:

$\bar{x}_w = \dfrac{\sum_{i=1}^{G} w_i x_i}{\sum_{i=1}^{G} w_i}$

where $w_i$ = weight, $x_i$ = observation, and G = number of observation groups. Calculated from a population, sample, or groupings in a frequency distribution.

Ex. In the frequency distribution above, the mean is 80.3, calculated by using frequencies for the $w_i$'s. When grouped, use class midpoints for the $x_i$'s.
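The weighted-mean formula can be sketched in a few lines of Python. The example below is illustrative only: it applies the formula to the grouped classes of this sheet, using class midpoints (66 for 65-67, 69 for 68-70, and so on) as the observations and the class frequencies as the weights. The function name `weighted_mean` is an assumption made for this sketch. Because midpoints stand in for the individual scores, the grouped result (about 80.13) differs slightly from the 80.3 obtained from the ungrouped scores.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Class midpoints and class frequencies from the grouped distribution
# (classes 65-67, 68-70, ..., 98-100).
midpoints = [66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99]
frequencies = [3, 8, 5, 9, 6, 4, 8, 8, 6, 1, 2, 2]

grouped_mean = weighted_mean(midpoints, frequencies)
print(round(grouped_mean, 2))
```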

MEDIAN - Observation or potential observation in a set that divides the set so that the same number of observations lie on each side of it. For an odd number of values, it is the middle value; for an even number it is the average of the middle two.

Ex. In the frequency distribution table above, the median is 79.5.

MODE - Observation that occurs with the greatest frequency. Ex. In the frequency distribution table above, the mode is 88.
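As a rough check of the three worked values (mean 80.3, median 79.5, mode 88), the sketch below expands the frequency table into individual scores and uses Python's standard `statistics` module. The score-to-frequency mapping is a reading of this sheet's tally chart and should be treated as an assumption; it does agree with the class totals of the grouped table (62 observations in all).

```python
import statistics

# Score -> frequency, as read from the sheet's tally chart (an assumption).
freq = {
    100: 1, 99: 1, 98: 0, 97: 0, 96: 2, 95: 0, 94: 0, 93: 1, 92: 0,
    91: 1, 90: 2, 89: 3, 88: 7, 87: 1, 86: 0, 85: 1, 84: 5, 83: 2,
    82: 1, 81: 2, 80: 1, 79: 2, 78: 1, 77: 3, 76: 2, 75: 4, 74: 3,
    73: 3, 72: 2, 71: 0, 70: 4, 69: 3, 68: 1, 67: 2, 66: 1, 65: 0,
}

# Expand the table into one entry per observation, in ascending order.
scores = [x for x, f in sorted(freq.items()) for _ in range(f)]

mean_score = statistics.mean(scores)
median_score = statistics.median(scores)  # average of the 31st and 32nd scores
mode_score = statistics.mode(scores)      # most frequent score

print(len(scores), round(mean_score, 1), median_score, mode_score)
```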

GROUPING OF DATA

CUMULATIVE FREQUENCY DISTRIBUTION - A distribution which shows the total frequency through the upper real limit of each class.

CUMULATIVE PERCENTAGE DISTRIBUTION - A distribution which shows the total percentage through the upper real limit of each class.

CLASS     f   Cum f    Cum %
65-67     3     3       4.84
68-70     8    11      17.74
71-73     5    16      25.81
74-76     9    25      40.32
77-79     6    31      50.00
80-82     4    35      56.45
83-85     8    43      69.35
86-88     8    51      82.26
89-91     6    57      91.94
92-94     1    58      93.55
95-97     2    60      96.77
98-100    2    62     100.00
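The cumulative columns follow mechanically from the class frequencies. A minimal sketch, assuming only the twelve class frequencies listed in the table, uses `itertools.accumulate` to build the running totals and then converts them to percentages of the total count.

```python
from itertools import accumulate

classes = ["65-67", "68-70", "71-73", "74-76", "77-79", "80-82",
           "83-85", "86-88", "89-91", "92-94", "95-97", "98-100"]
frequencies = [3, 8, 5, 9, 6, 4, 8, 8, 6, 1, 2, 2]

# Running total of frequencies through each class.
cum_f = list(accumulate(frequencies))
total = cum_f[-1]

# Cumulative percentage through each class, rounded to two decimals.
cum_pct = [round(100 * c / total, 2) for c in cum_f]

for name, f, cf, cp in zip(classes, frequencies, cum_f, cum_pct):
    print(f"{name:>6} {f:>3} {cf:>4} {cp:>7.2f}")
```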

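SUM OF SQUARES (SS) - Deviations from the mean, squared and summed:

Population SS: $\sum (X_i - \mu)^2$ or $\sum X_i^2 - \dfrac{(\sum X_i)^2}{N}$

Sample SS: $\sum (x_i - \bar{x})^2$ or $\sum x_i^2 - \dfrac{(\sum x_i)^2}{n}$

VARIANCE - The average of squared differences between observations and their mean.

POPULATION VARIANCE: $\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N} (X_i - \mu)^2$

SAMPLE VARIANCE: $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

VARIANCES FOR GROUPED DATA

POPULATION: $\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{G} f_i (m_i - \mu)^2$

SAMPLE: $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{G} f_i (m_i - \bar{x})^2$

where $f_i$ is the frequency and $m_i$ the midpoint of class i, and G is the number of classes.

STANDARD DEVIATION - Square root of the variance. Ex. Pop. S.D. $\sigma = \sqrt{\sigma^2}$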
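The two forms of the sum of squares, and the variance and standard deviation built on them, can be checked numerically. The sketch below is illustrative; it uses the four-value sample (43, 74, 42, 65) that this sheet uses in its degrees-of-freedom example, and the function names are assumptions made for the sketch.

```python
from math import sqrt


def sum_of_squares(xs):
    """Definitional form: deviations from the mean, squared and summed."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def sum_of_squares_computational(xs):
    """Computational form: sum of x^2 minus (sum of x)^2 / n."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


data = [43, 74, 42, 65]
ss = sum_of_squares(data)

pop_var = ss / len(data)         # divide by N
samp_var = ss / (len(data) - 1)  # divide by n - 1
pop_sd = sqrt(pop_var)
samp_sd = sqrt(samp_var)

print(ss, pop_var, round(samp_var, 2), round(pop_sd, 2), round(samp_sd, 2))
```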

BAR GRAPH - A form of graph that uses bars to indicate the frequency of occurrence of observations.
o Histogram - a form of bar graph used with interval or ratio-scaled variables.
  - Interval Scale - a quantitative scale that permits the use of arithmetic operations. The zero point in the scale is arbitrary.
  - Ratio Scale - same as interval scale except that there is a true zero point.

FREQUENCY CURVE - A form of graph representing a frequency distribution in the form of a continuous line that traces a histogram.
o Cumulative Frequency Curve - a continuous line that traces a histogram where bars in all the lower classes are stacked up in the adjacent higher class. It cannot have a negative slope.
o Normal curve - bell-shaped curve.
o Skewed curve - departs from symmetry and tails off at one end.

[Figure: NORMAL CURVE and SKEWED CURVE (LEFT)]

PROBABILITY - Probability of occurrence of event A:

$p(A) = \dfrac{\text{number of outcomes favoring event A}}{\text{total number of outcomes}}$

SAMPLE SPACE - All possible outcomes of an experiment.

TYPE OF EVENTS

o Exhaustive - two or more events are said to be exhaustive if all possible outcomes are considered. Symbolically, p(A or B or ...) = 1.

o Non-Exhaustive - two or more events are said to be non-exhaustive if they do not exhaust all possible outcomes.

o Mutually Exclusive - Events that cannot occur simultaneously: p(A and B) = 0; and p(A or B) = p(A) + p(B). Ex. males, females.

o Non-Mutually Exclusive - Events that can occur simultaneously: p(A or B) = p(A) + p(B) - p(A and B). Ex. males, brown eyes.

o Independent - Events whose probability is unaffected by occurrence or nonoccurrence of each other: p(A|B) = p(A); p(B|A) = p(B); and p(A and B) = p(A) p(B). Ex. gender and eye color.

o Dependent - Events whose probability changes depending upon the occurrence or non-occurrence of each other: p(A|B) differs from p(A); p(B|A) differs from p(B); and p(A and B) = p(A) p(B|A) = p(B) p(A|B). Ex. race and eye color.

JOINT PROBABILITIES - Probability that 2 or more events occur simultaneously.

MARGINAL PROBABILITIES or Unconditional Probabilities = summation of probabilities.

CONDITIONAL PROBABILITIES - Probability of A given the existence of B, written p(A|B).

EXAMPLE - Given the numbers 1 to 9 as observations in a sample space:
o Events mutually exclusive and exhaustive - Example: p(all odd numbers); p(all even numbers)
o Events mutually exclusive but not exhaustive - Example: p(an even number); p(the numbers 7 and 5)
o Events neither mutually exclusive nor exhaustive - Example: p(an even number or a 2)
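The three cases in the example above can be checked with plain Python sets. This is a sketch under the assumption that each of the numbers 1 to 9 is equally likely; the helper names `p`, `mutually_exclusive`, and `exhaustive` are introduced here for illustration only.

```python
from fractions import Fraction

space = set(range(1, 10))  # the numbers 1 to 9


def p(event):
    """Probability of an event, assuming equally likely outcomes."""
    return Fraction(len(event & space), len(space))


def mutually_exclusive(a, b):
    """True when the two events share no outcome."""
    return not (a & b)


def exhaustive(*events):
    """True when the events together cover the whole sample space."""
    return set().union(*events) == space


odd = {x for x in space if x % 2 == 1}
even = space - odd

print(mutually_exclusive(odd, even), exhaustive(odd, even))
print(mutually_exclusive(even, {5, 7}), exhaustive(even, {5, 7}))
print(mutually_exclusive(even, {2}), exhaustive(even, {2}))

# Addition rule for events that are not mutually exclusive.
a, b = even, {2}
print(p(a | b) == p(a) + p(b) - p(a & b))
```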

RANDOM VARIABLES

A mapping or function that assigns one and only one numerical value to each outcome in an experiment.

DISCRETE RANDOM VARIABLES - Involves rules or probability models for assigning or generating only distinct values (not fractional measurements).

BINOMIAL DISTRIBUTION - A model for the sum of a series of n independent trials where each trial results in a 0 (failure) or 1 (success). Ex. Coin toss.

$p(s) = \binom{n}{s}\,\pi^{s}(1 - \pi)^{n - s}$

where p(s) is the probability of s successes in n trials with a constant probability π per trial, and where $\binom{n}{s} = \dfrac{n!}{s!\,(n - s)!}$

Binomial mean: $\mu = n\pi$

Binomial variance: $\sigma^2 = n\pi(1 - \pi)$

As n increases, the Binomial approaches the Normal distribution.
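A minimal sketch of the binomial formula, using `math.comb` for the binomial coefficient. The parameter name `pi` mirrors the sheet's π (the per-trial success probability), and the fair-coin values are chosen only for illustration.

```python
from math import comb


def binomial_pmf(s, n, pi):
    """Probability of exactly s successes in n independent trials."""
    return comb(n, s) * pi ** s * (1 - pi) ** (n - s)


n, pi = 4, 0.5  # four tosses of a fair coin (illustrative values)
pmf = [binomial_pmf(s, n, pi) for s in range(n + 1)]

mean = sum(s * prob for s, prob in enumerate(pmf))
variance = sum((s - mean) ** 2 * prob for s, prob in enumerate(pmf))

print(pmf)             # probabilities for s = 0..4
print(mean, variance)  # compare with n*pi and n*pi*(1 - pi)
```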

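HYPERGEOMETRIC DISTRIBUTION - A model for the sum of a series of n trials where each trial results in a 0 or 1 and is drawn from a small population with N elements split between N1 successes and N2 failures. Then the probability of splitting the n trials between x1 successes and x2 failures is:

$p(x_1 \text{ and } x_2) = \dfrac{\dfrac{N_1!}{x_1!\,(N_1 - x_1)!}\cdot\dfrac{N_2!}{x_2!\,(N_2 - x_2)!}}{\dfrac{N!}{n!\,(N - n)!}}$

Hypergeometric mean: $\mu = E(x_1) = \dfrac{n N_1}{N}$

and variance: $\sigma^2 = \left[\dfrac{N - n}{N - 1}\right]\left[\dfrac{n N_1}{N}\right]\left[\dfrac{N_2}{N}\right]$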
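The hypergeometric probability is a ratio of counts, so it can be sketched directly with `math.comb`. The population sizes below (N1 = 4 successes, N2 = 6 failures, n = 3 draws) are made-up values for illustration, and the function name is an assumption.

```python
from math import comb


def hypergeom_pmf(x1, n, N1, N2):
    """Probability of x1 successes (and n - x1 failures) in n draws
    without replacement from N1 successes and N2 failures."""
    x2 = n - x1
    if x1 < 0 or x2 < 0 or x1 > N1 or x2 > N2:
        return 0.0
    return comb(N1, x1) * comb(N2, x2) / comb(N1 + N2, n)


N1, N2, n = 4, 6, 3  # illustrative values, not from the sheet
N = N1 + N2
pmf = [hypergeom_pmf(x1, n, N1, N2) for x1 in range(n + 1)]

mean = sum(x * prob for x, prob in enumerate(pmf))
variance = sum((x - mean) ** 2 * prob for x, prob in enumerate(pmf))

print(mean, n * N1 / N)
print(variance, (N - n) / (N - 1) * (n * N1 / N) * (N2 / N))
```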

POISSON DISTRIBUTION - A model for the number of occurrences of an event x = 0, 1, 2, ..., when the probability of occurrence is small, but the number of opportunities for the occurrence is large:

$p(x) = \dfrac{\lambda^{x} e^{-\lambda}}{x!}$

for x = 0, 1, 2, 3, ... and λ > 0; otherwise p(x) = 0.

Poisson mean and variance: λ.
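A short sketch of the Poisson formula follows. The rate λ = 2 is an arbitrary illustrative value, and the sums are truncated at x = 59, which is an approximation (the omitted tail is negligible for this λ).

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the mean count is lam."""
    if not isinstance(x, int) or x < 0 or lam <= 0:
        return 0.0
    return lam ** x * exp(-lam) / factorial(x)


lam = 2.0  # illustrative rate, not from the sheet
pmf = [poisson_pmf(x, lam) for x in range(60)]  # truncated support

mean = sum(x * prob for x, prob in enumerate(pmf))
variance = sum((x - mean) ** 2 * prob for x, prob in enumerate(pmf))

print(round(mean, 6), round(variance, 6))  # both should be close to lam
```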

LEVEL OF SIGNIFICANCE - A probability value considered rare in the sampling distribution specified under the null hypothesis, where one is willing to acknowledge the operation of chance. Common significance levels are 1%, 5%, and 10%. Alpha (α) level = the lowest level for which the null hypothesis can be rejected. The significance level determines the critical region.

NULL HYPOTHESIS (H0) - A statement that specifies hypothesized value(s) for one or more of the population parameters. [Ex. H0: a coin is unbiased. That is, p = 0.5.]

ALTERNATIVE HYPOTHESIS (H1) - A statement that specifies that the population parameter is some value other than the one specified under the null hypothesis. [Ex. H1: a coin is biased. That is, p ≠ 0.5.]

1. NONDIRECTIONAL HYPOTHESIS - an alternative hypothesis (H1) that states only that the population parameter is different from the one specified under H0. Ex. H1: μ ≠ μ0. A Two-Tailed Probability Value is employed when the alternative hypothesis is non-directional.

2. DIRECTIONAL HYPOTHESIS - an alternative hypothesis that states the direction in which the population parameter differs from the one specified under H0. Ex. H1: μ > μ0 or H1: μ < μ0. A One-Tailed Probability Value is employed when the alternative hypothesis is directional.

NOTION OF INDIRECT PROOF - Strict interpretation of hypothesis testing reveals that the null hypothesis can never be proved. [Ex. If we toss a coin 200 times and tails comes up 100 times, it is no guarantee that heads will come up exactly half the time in the long run; small discrepancies might exist. A bias can exist even at a small magnitude. We can make the assertion, however, that NO BASIS EXISTS FOR REJECTING THE HYPOTHESIS THAT THE COIN IS UNBIASED. (The null hypothesis is not rejected.) When employing the 0.05 level of significance, reject the null hypothesis when a given result occurs by chance 5% of the time or less.]

TWO TYPES OF ERRORS
- Type I Error (Type α Error) = the rejection of H0 when it is actually true. The probability of a type I error is given by α.
- Type II Error (Type β Error) = the acceptance of H0 when it is actually false. The probability of a type II error is given by β.

SAMPLING DISTRIBUTION - A theoretical probability distribution of a statistic that would result from drawing all possible samples of a given size from some population.

THE STANDARD ERROR OF THE MEAN

A theoretical standard deviation of sample means of a given sample size, drawn from some specified population.

o When based on a very large, known population, the standard error is:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$

o When estimated from a sample drawn from a very large population, the standard error is:

$s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$

o The dispersion of sample means decreases as sample size is increased.
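To illustrate how the standard error shrinks with sample size, the sketch below evaluates σ/√n for a few sample sizes. The value σ = 50 matches the worked z-test examples later in this sheet; the sample sizes are otherwise arbitrary.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of the mean: standard deviation over sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / sqrt(n)


sigma = 50
errors = {n: standard_error(sigma, n) for n in (25, 100, 400)}
print(errors)  # quadrupling n halves the standard error
```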

For continuous variables, frequencies are expressed in terms of areas under a curve.

CONTINUOUS RANDOM VARIABLES - Variable that may take on any value along an uninterrupted interval of a number line.

NORMAL DISTRIBUTION - bell curve; a distribution whose values cluster symmetrically around the mean (also median and mode).

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \mu)^2 / (2\sigma^2)}$

where f(x) = frequency at a given value
σ = standard deviation of the distribution
π = approximately 3.1416
e = approximately 2.7183
μ = the mean of the distribution
x = any score in the distribution

STANDARD NORMAL DISTRIBUTION - A normal random variable Z that has a mean of 0 and standard deviation of 1.

Z-VALUES - The number of standard deviations a specific observation lies from the mean:

$z = \dfrac{x - \mu}{\sigma}$

CENTRAL LIMIT THEOREM (for sample mean x̄)

If x1, x2, x3, ..., xn is a simple random sample of n elements from a large (infinite) population, with mean μ and standard deviation σ, then the distribution of x̄ takes on the bell-shaped distribution of a normal random variable as n increases, and the distribution of the ratio

$\dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

approaches the standard normal distribution as n goes to infinity. In practice, a normal approximation is acceptable for samples of 30 or larger.

Percentage Cumulative Distribution for selected Z values under a normal curve

Z-value             -3     -2     -1      0      +1     +2     +3
Percentile Score   0.13   2.28  15.87  50.00  84.13  97.72  99.87
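The percentile scores in the table can be reproduced from the standard normal cumulative distribution. The sketch below expresses that distribution through the error function `math.erf`, which is a standard identity; the function name `normal_cdf` is chosen for this illustration.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


# Percentile score (cumulative percentage) for z = -3, ..., +3.
percentiles = {z: round(100 * normal_cdf(z), 2) for z in range(-3, 4)}
print(percentiles)
```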

[Figure: Critical region for rejection of H0 when α = 0.01, two-tailed test]

UNBIASEDNESS - Property of a reliable estimator of the parameter being estimated.
o Unbiased Estimate of a Parameter - an estimate that equals on the average the value of the parameter. Ex. the sample mean is an unbiased estimator of the population mean.
o Biased Estimate of a Parameter - an estimate that does not equal on the average the value of the parameter. Ex. the sample variance calculated with n is a biased estimator of the population variance; however, when calculated with n-1 it is unbiased.

STANDARD ERROR - The standard deviation of the estimator is called the standard error. Ex. The standard error for x̄'s is $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. This has to be distinguished from the STANDARD DEVIATION OF THE SAMPLE. The standard error measures the variability in the x̄'s around their expected value E(x̄), while the standard deviation of the sample reflects the variability in the sample around the sample's mean (x̄).

USED WHEN THE STANDARD DEVIATION IS UNKNOWN - Use of Student's t. When σ is not known, its value is estimated from sample data.

o t-ratio - the ratio employed in the testing of hypotheses or determining the significance of a difference between means (two-sample case) involving a sample with a t-distribution. The formula is:

$t = \dfrac{\bar{x} - \mu}{s_{\bar{x}}}$

where μ = population mean under H0 and $s_{\bar{x}} = s/\sqrt{n}$.

o Distribution - symmetrical distribution with a mean of zero and a standard deviation that approaches one as degrees of freedom increase (i.e. approaches the Z distribution).

o Assumption and condition required in assuming the t-distribution: Samples are drawn from a normally distributed population and σ (population standard deviation) is unknown.

o Homogeneity of Variance - If 2 samples are being compared, the assumption in using the t-ratio is that the variances of the populations from which the samples are drawn are equal.

o Estimated $\sigma_{\bar{x}_1 - \bar{x}_2}$ (that is, $s_{\bar{x}_1 - \bar{x}_2}$) is based on the unbiased estimate of the population variance.

o Degrees of Freedom (df) - the number of values that are free to vary after placing certain restrictions on the data.

Example. The sample (43, 74, 42, 65) has n = 4. The sum is 224 and the mean = 56. Using these 4 numbers and determining deviations from the mean, we'll have 4 deviations, namely (-13, 18, -14, 9), which sum to zero. Taking deviations from the mean is one restriction we have imposed, and the natural consequence is that the sum of these deviations should equal zero. For this to happen, we can choose any numbers, but our freedom to choose is limited to only 3 numbers because one is restricted by the requirement that the sum of the deviations should equal zero. We use the equality:

$(x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + (x_4 - \bar{x}) = 0$

So given a mean of 56, if the first 3 observations are 43, 74, and 42, the last observation has to be 65. This single restriction in this case helps us determine df. The formula is n less the number of restrictions. In this case, it is n - 1 = 4 - 1 = 3 df.

o t-Ratio is a robust test - This means that statistical inferences are likely valid despite fairly large departures from normality in the population distribution. If normality of the population distribution is in doubt, it is wise to increase the sample size.
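The degrees-of-freedom example and the t-ratio can be put together in one small sketch. It uses the sample (43, 74, 42, 65) from the example above; the hypothesized mean `mu0 = 50` is a made-up value for illustration and does not come from the sheet.

```python
from math import sqrt
from statistics import mean, stdev

sample = [43, 74, 42, 65]
mu0 = 50  # hypothetical population mean under H0 (illustrative only)

n = len(sample)
x_bar = mean(sample)

# Deviations from the mean always sum to zero: the one restriction.
deviations = [x - x_bar for x in sample]

# With the mean fixed, the first three values determine the fourth.
last_value = n * x_bar - sum(sample[:3])

df = n - 1                 # n less the number of restrictions
s = stdev(sample)          # sample standard deviation (divides by n - 1)
s_x_bar = s / sqrt(n)      # estimated standard error of the mean
t = (x_bar - mu0) / s_x_bar

print(deviations, last_value, df, round(t, 3))
```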

USED WHEN THE STANDARD DEVIATION IS KNOWN: When σ is known it is possible to describe the form of the distribution of the sample mean as a Z statistic. The sample must be drawn from a normal distribution or have a sample size (n) of at least 30.

$z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$

where μ = population mean (either known or hypothesized under H0) and $\sigma_{\bar{x}} = \sigma/\sqrt{n}$.

o Critical Region - the portion of the area under the curve which includes those values of a statistic that lead to the rejection of the null hypothesis.

- The most often used significance levels are 0.01, 0.05, and 0.1. For a one-tailed test using the z-statistic, these correspond to z-values of 2.33, 1.65, and 1.28 respectively. For a two-tailed test, the critical region of 0.01 is split into two equal outer areas marked by z-values of ±2.58.

Example 1. Given a population with μ = 250 and σ = 50, what is the probability of drawing a sample of n = 100 values whose mean (x̄) is at least 255? In this case, z = 1.00. Looking at Table A, the given area for z = 1.00 is 0.3413. To its right is 0.1587 (= 0.5 - 0.3413) or 15.87%.

Conclusion: there are approximately 16 chances in 100 of obtaining a sample mean of at least 255 from this population when n = 100.

Example 2. Assume we do not know the population mean. However, we suspect that the sample may have been selected from a population with μ = 250 and σ = 50, but we are not sure. The hypothesis to be tested is whether the sample mean was selected from this population. Assume we obtained from a sample (n) of 100 a sample mean of 263. Is it reasonable to assume that this sample was drawn from the suspected population?

1. H0: μ = 250 (that the actual mean of the population from which the sample is drawn is equal to 250). H1: μ not equal to 250 (the alternative hypothesis is that it is greater than or less than 250, thus a two-tailed test).

2. The z-statistic will be used because the population σ is known.

3. Assume the significance level (α) to be 0.01. Looking at Table A, we find that the area beyond a z of 2.58 is approximately 0.005.

To reject H0 at the 0.01 level of significance, the absolute value of the obtained z must be equal to or greater than |z0.01| or 2.58. Here the value of z corresponding to sample mean = 263 is 2.60.

CONCLUSION - Since this obtained z falls within the critical region, we may reject H0 at the 0.01 level of significance.
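Both worked examples can be reproduced in a few lines. The sketch below computes the z-statistic for a sample mean and, in place of a printed Table A, uses the error-function form of the standard normal cumulative distribution; the helper names are assumptions made for this illustration.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def z_for_sample_mean(x_bar, mu, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / (sigma / sqrt(n))


# Example 1: probability that a sample mean is at least 255.
z1 = z_for_sample_mean(255, mu=250, sigma=50, n=100)
p_at_least_255 = 1 - normal_cdf(z1)

# Example 2: two-tailed test at the 0.01 level (critical |z| = 2.58).
z2 = z_for_sample_mean(263, mu=250, sigma=50, n=100)
reject_h0 = abs(z2) >= 2.58

print(z1, round(p_at_least_255, 4))
print(z2, reject_h0)
```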

CONFIDENCE INTERVAL - Interval within which we may consider a hypothesis tenable. Common confidence intervals are 90%, 95%, and 99%. Confidence Limits: limits defining the confidence interval.

(1 - α)100% confidence interval for μ:

ii, *F t l-il.l, ................
................

In order to avoid copyright disputes, this page is only a partial summary.
