Clustering Lecture 8: MapReduce


Jing Gao

SUNY Buffalo


Outline

• Basics
  - Motivation, definition, evaluation

• Methods
  - Partitional
  - Hierarchical
  - Density-based
  - Mixture model
  - Spectral methods

• Advanced topics
  - Clustering ensemble
  - Clustering in MapReduce
  - Semi-supervised clustering, subspace clustering, co-clustering, etc.


Big Data Everywhere

• Lots of data is being collected and warehoused
  - Web data, e-commerce
  - Purchases at department/grocery stores
  - Bank/credit card transactions
  - Social networks


Divide and Conquer

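[Figure: the "Work" is partitioned into pieces w1, w2, w3; each piece goes to a "worker" that produces a partial result r1, r2, r3; the partial results are combined into the "Result".]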
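The partition / worker / combine pattern in the figure can be sketched in a few lines of Python. This is a minimal single-machine sketch, assuming a thread pool stands in for separate workers and using a toy task (summing squares). The names `partition`, `worker`, `combine`, and `divide_and_conquer` are hypothetical labels matching the boxes in the figure, not part of any MapReduce API.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(work, n):
    """Split the work list into n contiguous, roughly equal pieces (w1..wn)."""
    k, m = divmod(len(work), n)
    return [work[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


def worker(piece):
    """Process one piece independently; the toy task here sums squares."""
    return sum(x * x for x in piece)


def combine(partials):
    """Merge the partial results r1..rn into the final result."""
    return sum(partials)


def divide_and_conquer(work, n_workers=3):
    pieces = partition(work, n_workers)
    # Hypothetical stand-in for a cluster: one thread per worker.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker, pieces))
    return combine(partials)


if __name__ == "__main__":
    data = list(range(10))
    print(divide_and_conquer(data))  # prints 285, same as the serial sum
```

Each piece is processed without looking at the others, so only the small partial results have to be brought together in the combine step.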


Distributed Grep

[Figure: very big data is cut into several splits; grep runs on each split and outputs its matches; cat concatenates the per-split matches into all matches.]
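A similar sketch of the distributed grep pipeline, again assuming threads on one machine in place of a cluster and a hypothetical in-memory list of lines as the "very big data". Here `split_data`, `grep`, `cat`, and `distributed_grep` are illustrative Python functions named after the boxes in the figure, not the Unix tools themselves.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def split_data(lines, n_splits):
    """Cut the input into at most n_splits contiguous splits."""
    size = max(1, -(-len(lines) // n_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def grep(split, pattern):
    """Return the lines of one split that match the regex pattern."""
    regex = re.compile(pattern)
    return [line for line in split if regex.search(line)]


def cat(match_lists):
    """Concatenate the per-split matches, in split order, into all matches."""
    return [line for matches in match_lists for line in matches]


def distributed_grep(lines, pattern, n_splits=4):
    splits = split_data(lines, n_splits)
    # Each split is searched independently; pool.map keeps split order.
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        match_lists = list(pool.map(lambda s: grep(s, pattern), splits))
    return cat(match_lists)


if __name__ == "__main__":
    log = ["error: disk full", "ok", "warning", "error: timeout", "ok", "error: oom"]
    print(distributed_grep(log, r"^error"))
    # prints ['error: disk full', 'error: timeout', 'error: oom']
```

Because `pool.map` returns results in input order, `cat` preserves the original line order. In MapReduce terms, the per-split grep plays the role of the map step (emit a line if it matches), and the reduce step can simply pass the matches through unchanged.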

