Supervised and Unsupervised Learning - Caltech Astronomy

Supervised and Unsupervised Learning

Ciro Donalek, Ay/Bi 199 - April 2011

Summary

• KDD and Data Mining Tasks
• Finding the optimal approach
• Supervised Models
  - Neural Networks
  - Multi-Layer Perceptron
  - Decision Trees
• Unsupervised Models
  - Different Types of Clustering
  - Distances and Normalization
  - K-means
  - Self-Organizing Maps
• Combining different models
  - Committee Machines
  - Introducing a Priori Knowledge
  - Sleeping Expert Framework

Knowledge Discovery in Databases

• KDD may be defined as: "The nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data".

• KDD is an interactive and iterative process involving several steps.

You got your data: what's next?

What kind of analysis do you need? Which model is more appropriate for it? ...

Clean your data!

• Data preprocessing transforms the raw data into a format that will be more easily and effectively processed for the purpose of the user.

• Some tasks
  - sampling: selects a representative subset from a large population of data;
  - noise treatment
  - strategies to handle missing data: sometimes your rows will be incomplete, not all parameters are measured for all samples.
  - normalization
  - feature extraction: pulls out specified data that is significant in some particular context.
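Two of the tasks above, sampling and normalization, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the slides do not say which sampling scheme or normalization to use, so uniform random sampling and min-max scaling are chosen here as common defaults, and the names `sample_rows`, `min_max_normalize`, and the `catalogue` values are hypothetical.

```python
import random


def sample_rows(rows, k, seed=0):
    """Draw k rows uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(rows, k)


def min_max_normalize(rows):
    """Rescale every column to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical catalogue: one row per object, (magnitude, colour index).
catalogue = [[14.2, 0.31], [17.8, 1.05], [15.5, 0.62], [19.1, 0.88]]

subset = sample_rows(catalogue, k=2)
scaled = min_max_normalize(catalogue)
```

Min-max scaling puts parameters with very different ranges on the same footing, which matters for any model that relies on distances between rows. Other schemes, such as standardizing to zero mean and unit variance, may suit a given data set better.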

Use standard formats!

Missing Data

• Missing data are a part of almost all research, and we all have to decide how to deal with it.

• Complete Case Analysis: use only rows with all the values
• Available Case Analysis
• Substitution
  - Mean Value: replace the missing value with the mean value for that particular attribute
  - Regression Substitution: we can replace the missing value with a historical value from similar cases
  - Matching Imputation: for each unit with a missing y, find a unit with similar values of x in the observed data and take its y value
• Maximum Likelihood, EM, etc.

• Some DM models can deal with missing data better than others.
• Which technique to adopt really depends on your data.
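Three of the strategies listed above (complete case analysis, mean value substitution, and matching imputation) can be sketched as follows. This is a minimal illustration, not the lecture's own code: it assumes missing values are stored as `None`, that "similar values of x" means the smallest absolute difference in a single column, and all function names and the `data` table are hypothetical.

```python
from statistics import mean

MISSING = None  # assumed marker for an unmeasured value


def complete_cases(rows):
    """Complete Case Analysis: keep only rows with no missing value."""
    return [row for row in rows if MISSING not in row]


def mean_substitution(rows):
    """Replace each missing value with the mean of the observed values
    in the same column (raises StatisticsError if a column is empty)."""
    columns = list(zip(*rows))
    col_means = [mean(v for v in col if v is not MISSING) for col in columns]
    return [
        [m if v is MISSING else v for v, m in zip(row, col_means)]
        for row in rows
    ]


def matching_imputation(rows, x_idx, y_idx):
    """For each row with a missing y, copy y from the observed row
    whose x value is closest (ties go to the earliest such row)."""
    donors = [r for r in rows
              if r[x_idx] is not MISSING and r[y_idx] is not MISSING]
    filled = []
    for row in rows:
        row = list(row)  # copy, so the input table is left untouched
        if row[y_idx] is MISSING and row[x_idx] is not MISSING:
            nearest = min(donors, key=lambda d: abs(d[x_idx] - row[x_idx]))
            row[y_idx] = nearest[y_idx]
        filled.append(row)
    return filled


# Hypothetical table of (x, y) pairs with one missing y.
data = [[1.0, 10.0], [2.6, None], [3.0, 30.0], [4.0, 44.0]]

kept = complete_cases(data)
mean_filled = mean_substitution(data)
matched = matching_imputation(data, x_idx=0, y_idx=1)
```

On this toy table, complete case analysis drops the incomplete row, mean substitution fills it with the column average, and matching imputation borrows the y value of the row with the nearest x. The three results differ, which illustrates why the choice of technique depends on the data.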
