Probability and Statistics in Microsoft Excel™

[Pages:11]aguideto

ProbabilityandStatisticsin MicrosoftExcelTM

Resourcestosupportthelearningofmathematics, statisticsandORinhighereducation.

mathstore.ac.uk TheStatisticalEducationthroughProblemSolving(STEPS) glossary stats.gla.ac/steps/glossary

ProbabilityandStatisticsinMicrosoftExcelTM

Excelprovidesmorethan100functionsrelatingtoprobabilityandstatistics.Italsohasafacilityfor constructingawiderangeofchartsandgraphsfordisplayingdata.Thisleafletprovidesaquickreference guidetoassistyouinharnessingExcel'sstatisticalcapability.Exceptwhereindicated,thefeaturesincluded hereareavailableinExcelVersions4.0andabove.Almostalltheinstructionsherealsoapplytothe spreadsheetfacilityinOpenOffice();anyslightvariationsincommands shouldbeobvioustotheuser. Excelisnotdesignedforstatisticalcomputing.Ifyourequirestatisticalanalysisbeyonddatavalidationand manipulation,tabulation,presentationandcalculationofsummarystatistics,youareadvisedtousea bespokestatisticalpackagesuchasMinitaborSPSS.

ExcelhasanAnalysisToolpakoptional"add-in"facilitythatincludesmacrosforcarryingoutmany elementarystatisticalanalyses.Theinstructionsforinstallationofthisadd-invarywiththeversionofExcel --usetheHelpfacilityinExcelforfurtherinformationonthis.Thisadd-infacilityisnotusedinthisleaflet. Therearetworeasonswhythisadd-inshouldbeusedwithcare: ? Unlikeotherspreadsheetfunctionality,whichensuresthatcalculationsautomaticallyupdateinthe

lightofchangeselsewhereintheworkbook,theoutputfromtheadd-inisnotdynamicallylinkedtothe sourcedata.Henceifanyofthedatachangetheadd-inmustberunagaintoobtainupdatedoutput. ? Outputfromtheadd-incanbemisleading(see). Thereareothercommerciallyavailableadd-insthatmakeuseofExcel'sfamiliaruserinterfacebut supplementitsstatisticalfunctionality.Examplesinclude:

Analyse-it?

R-Excel



Unistat



XLSTAT



StatTools



Usingthisleaflet

Supposeyouhaveasampleofthreedata,10.4,11.2and16.4,thatyouhaveenteredintocellsA2:A4ona

worksheet.InExcelafunction,e.g.SUM,canbeappliedtothesedatainoneoffourways:

=SUM(10.4,11.2,16.4)

=SUM(A2,A3,A4)

=SUM(A2:A4)

=SUM(x)

wherexisthenameattachedtorangeA2:A4.

Inthisleaflet,forsimplicity,wehavechosentorefertonamed ranges.Tonamearange,simplyhighlighttherangeofcells,clickin theNameBoxonthefarleftoftheFormulaBar,typeintherequired name,e.g.x,thenpressEnter.InExcel2007namescanbemanaged viaFormulas>NameManager.

Ifyouprefernottousenamestheninwhatfollowssimplyreplacethe

nameoftherange,e.g.x,bytherangeaddress,e.g.A2:A4.

DescriptiveStatistics

Assumingasampleofdatainrangex

Sampletotal,x

=SUM(x)

Samplesize,n

=COUNT(x)

Samplemean,x/n

=AVERAGE(x)

Samplevariance,s2

=VAR(x)

Samplestandarddeviation,s

=STDEV(x)

Meansquareddeviation

=VARP(x)

Rootmeansquareddeviation

=STDEVP(x)

Correctedsumofsquares,Sxx

Rawsumofsquares,x2

=DEVSQ(x) =SUMSQ(x)

Minimumvalue

=MIN(x)

Maximumvalue

=MAX(x)

Range

=MAX(x)-MIN(x)

LowerQuartile,Q1*

=QUARTILE(x,1)

Median,Q2

=MEDIAN(x)

UpperQuartile,Q3*

=QUARTILE(x,3)

Interquartilerange,IQR

=QUARTILE(x,3)-QUARTILE(x,1)

KthPercentile

=PERCENTILE(x,K%) whereKisanumberbetween0and100

Mode

=MODE(x)

*Note:Thereareseveraldifferentdefinitionsfortheupperandlowerquartiles,sothevaluescalculatedby

Excelmaynotagreewithyourtextbookorotherstatisticalcalculationtools.

Boxplot

See

GroupedFrequencyData

Assumingafrequencydistributionwithclassmidpointsstoredinrangexandfrequenciesinrangef:

Samplesize,n

=SUM(f)

Sampletotal,fx

=SUMPRODUCT(f,x)

Samplemean,fx/n

=SUMPRODUCT(f,x)/SUM(f)

Correctedsumofsquares,Sxx =SUMPRODUCT(f,x,x)-SUMPRODUCT(f,x)^2/SUM(f)

Samplevariance,s2

=(SUMPRODUCT(f,x,x)-SUMPRODUCT(f,x)^2/SUM(f))/(SUM(f)-1)

Samplestandarddeviation,s =SQRT(Samplevariance)

GraphicalRepresentations

Exceloffersawiderangeofcharttypesfordisplayingdata.Manyoftheseareover-elaborate.In particular,3-Deffectscanbemisleadingandshouldbeavoided.

InExcel2007toconstructachartforyourdata: 1. Selecttherangecontainingyourdata,includinganyroworcolumnlabels. 2. Onthemainribbon,clickontheInserttab. 3. UndertheChartsgroupoficons,selectthecharttyperequired,thenthepreferredchartsubtype. 4. UnderChartToolsonthemainribbon,usetheDesign,LayoutandFormattabstocustomisethechart.

InearlierversionsofExcel,selectthedatarangeandthenInsert>CharttoinvoketheChartWizard.

PermutationsandCombinations

Numberofdifferentcombinationsofmobjectsselectedfromnobjects

nCm

=COMBIN(n,m)

Numberofdifferentpermutationsofmobjectsselectedfromnobjects

nPm

=PERMUT(n,m)

StandardProbabilityDistributions

AssumingarandomvariableXandconstantsaandb

Binomial

Bin(n,p)

P(X=a)

=BINOMDIST(a,n,p,FALSE)

P(Xa)

=BINOMDIST(a,n,p,TRUE)

Geometric Geom(p)

P(X=a)

=BINOMDIST(1,a,p,FALSE)/a

P(Xa)

=1-BINOMDIST(0,a,p,FALSE)

Poisson

Po( )

P(X=a)

=POISSON(a,lambda,FALSE)

P(Xa)

=POISSON(a,lambda,TRUE)

Pascal

Pasc(n,p)

P(X=a) =NEGBINOMDIST(a-n,n,p)

P(Xa) =BETADIST(p,n,a-n+1)/BETADIST(1,n,a-

n+1)

Normal

N(? , 2)

f(a)

=NORMDIST(a,mu,sigma,FALSE)

P(Xa)

=NORMDIST(a,mu,sigma,TRUE)

P(aXb)

=NORMDIST(b,mu,sigma,TRUE)

-NORMDIST(a,mu,sigma,TRUE)

P(Xb)

=1-NORMDIST(b,mu,sigma,TRUE)

Exponential Expon()

f(a)

=EXPONDIST(a,theta,FALSE)

P(Xa)

=EXPONDIST(a,theta,TRUE)

P(aXb)

=EXP(-a*theta)-EXP(-b*theta)

P(Xb)

=EXP(-b*theta)

Gamma

Ga( , )

f(a)

=GAMMADIST(a,alpha,beta,

FALSE)

P(Xa)

=GAMMADIST(a,alpha,beta,TRUE)

P(aXb)

=GAMMADIST(b,alpha,beta,

TRUE)

-GAMMADIST(a,alpha,beta,

TRUE)

P(Xb)

=1-GAMMADIST(b,alpha,beta,

TRUE)

TestStatisticsforPopularSignificanceTests

Onesampletestofamean Assumingasampleofdatainrangex,drawnfromapopulationwithmean?andstandarddeviation:

H0:?=?0H1:??0

Teststatistic,z

=(AVERAGE(x)-mu0)/(sigma/SQRT(COUNT(x)))

assumingknown

Teststatistic,t

=(AVERAGE(x)-mu0)/(STDEV(x)/SQRT(COUNT(x)))

assumingunknown

Onesampletestofavariance

Assumingasampleofdatainrangex,drawnfromapopulationwithmean?andstandarddeviation:

H0:2=02H1:2> 02

Teststatistic,2

=DEVSQ(x)/sigma0^2

Twosampletestofdifferencebetweenmeans Assumingtwosamplesofdatainrangesxandy,drawnfrompopulationswithmeans?1and?2andequal

variances:

H0:?1-?2=cH1:?1-?2c Estimatetheunknowncommonstandarddeviationbythepooledestimate:

s

=SQRT((DEVSQ(x)+DEVSQ(y))/(COUNT(x)+COUNT(y)-2))

Teststatistic,t =(AVERAGE(x)-AVERAGE(y)-c)/(s*SQRT(1/COUNT(x)+1/COUNT(y)))

Twosampletestofratioofvariances Assumingtwosamplesofdatainrangesxandy,drawnfrompopulationswithvariances12and22: H0:12=22H1:12>22

Teststatistic,F =VAR(x)/VAR(y)

Chi-squaredtestofassociation

Assumingatwo-waycontingencytableofobservedfrequencies.

H0:rowfactorindependentofcolumnfactor H1:someassociationbetweenrowandcolumnfactors Thesuggestedlayoutbelowfora4x2tablecaneasilybemodifiedfortablesofothersizes.

A1: A3: C1: G3: C8: C9: C10:

=SUM(C3:D6)

=SUM(C3:D3)

=SUM(C3:C6)

=$A3*C$1/$A$1

=CHITEST(C3:D6,G3:H6)

=(COUNT(A3:A6)-1)*(COUNT(C1:D1)-1)

=CHIINV(C8,C9)

copydowntoA6 copyacrosstoD1 copyintoG3:H6

CriticalValuesandP-valuesforStatisticalTests

Therearetwoapproachestoconductingsignificancetests.Someanalystsliketocomparetheteststatistic

withthecriticalvalueforagivensignificancelevel;othersprefertocalculatetheP-valuecorrespondingto

theteststatistic.Excelcanbeusedforeithermethod.

Assumingsignificancelevel,(typically=5%or0.05):

Two-tailedz-test Uppertailcriticalvalue =NORMSINV(1-alpha/2) P-valueforgivenz =2*(1-NORMSDIST(ABS(z))) Two-tailedt-testwithvdegreesoffreedom Uppertailcriticalvalue =TINV(alpha,v) P-valueforgivent =TDIST(ABS(t),v,2)

One-tailed 2-testwithvdegreesoffreedom Uppertailcriticalvalue =CHIINV(alpha,v) P-valueforgivenchisquared=CHIDIST(chisquared,v) One-tailedF-testwithv1degreesoffreedomin thenumeratorandv2inthedenominator Uppertailcriticalvalue =FINV(alpha,v1,v2) P-valueforgivenF =FDIST(F,v1,v2)

ConfidenceLimits

Assumingdegreeofconfidence100(1-)% (e.g.for95%confidence=0.05):

One-samplestatistics,withdatainrangex

For ?(known)

Lowerlimit=AVERAGE(x)-NORMSINV(1-alpha/2)*sigma/SQRT(COUNT(x))

or =AVERAGE(x)-CONFIDENCE(alpha,sigma,COUNT(x))

For ?(unknown)

Upperlimit=AVERAGE(x)+NORMSINV(1-alpha/2)*sigma/SQRT(COUNT(x)) or =AVERAGE(x)+CONFIDENCE(alpha,sigma,COUNT(x))

Lowerlimit=AVERAGE(x)-TINV(alpha, COUNT(x)-1)*STDEV(x)/SQRT(COUNT(x))

Upperlimit=AVERAGE(x)+TINV(alpha, COUNT(x)-1)*STDEV(x)/SQRT(COUNT(x))

For 2

Lowerlimit=(DEVSQ(x)/CHIINV(alpha/2,COUNT(x))-1)

Upperlimit=(DEVSQ(x)/CHIINV(1-alpha/2,COUNT(x))-1)

Two-samplestatistics,withdataforthefirstsampleinrangex,andthesecondsampleinrangey

For ?x-?y (xknown,y known)

Lowerlimit =AVERAGE(x)-AVERAGE(y)-NORMSINV(1-alpha/2)*SQRT(sigmax^2/COUNT(x)+sigmay^2/COUNT(y))

Upperlimit =AVERAGE(x)-AVERAGE(y)+NORMSINV(1-alpha/2)*SQRT(sigmax^2/COUNT(x)+sigmay^2/COUNT(y))

For ?x-?y (xandy unknownbutassumedequal)

Estimatetheunknowncommonstandarddeviationbythepooledestimate:

s

=SQRT((DEVSQ(x)+DEVSQ(y))/(COUNT(x)+COUNT(y)-2))

Lowerlimit =AVERAGE(x)-AVERAGE(y)-TINV(alpha,COUNT(x)+COUNT(y)-2)*s*SQRT(1/COUNT(x)+1/COUNT(y))

Upperlimit =AVERAGE(x)-AVERAGE(y)+TINV(alpha,COUNT(x)+COUNT(y)-2)*s*SQRT(1/COUNT(x)+1/COUNT(y))

For x2/ y2

Lowerlimit=DEVSQ(x)/DEVSQ(y)/FINV(alpha/2,COUNT(x)-1,COUNT(y)-1)

Upperlimit(DEVSQ(x)/DEVSQ(y)/FINV(1-alpha/2,COUNT(x)-1,COUNT(y)-1)

SimpleLinearRegression

InExcelVersions5andabove,aregressionline(ortrendline)canbeaddedtoascatterplotbyright-clicking ononeoftheplottedpointsandselectingAddTrendlinefromtheshortcutmenu.Bothlinearandavariety ofnon-linearmodelsmaybefittedtothedata.Theequationofthefittedmodelmaybedisplayed, togetherwiththevalueofthecoefficientofdetermination,R2.Therearealsooptionstoextrapolatethe trendlineineitherdirection,ortoforcethetrendlinetohaveaspecificintercept.

Thetrendlineapproachispurelygraphical.Tocalculatepredictions,regressionfunctionsmustbeused.

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download