Clustering 3: Hierarchical clustering (continued ...
Clustering 3: Hierarchical clustering (continued); choosing the number of clusters
Ryan Tibshirani Data Mining: 36-462/36-662
January 31 2013
Optional reading: ISL 10.3, ESL 14.3
1
Even more linkages
Last time we learned about hierarchical agglomerative clustering, basic idea is to repeatedly merge two most similar groups, as measured by the linkage
Three linkages: single, complete, average linkage. Properties: Single and complete linkage can have problems with chaining and crowding, respectively, but average linkage doesn't Cutting an average linkage tree provides no interpretation, but there is a nice interpretation for single, complete linkage trees Average linkage is sensitive to a monotone transformation of the dissimilarities dij, but single and complete linkage are not All three linkages produce dendrograms with no inversions
Actually, there are many more linkages out there, each having different properties. Today: we'll look at two more
2
Reminder: linkages
Our setup: given X1, . . . Xn and pairwise dissimilarities dij. (E.g., think of Xi Rp and dij = Xi - Xj 2)
Single linkage: measures the closest pair of points
dsingle(G, H) = min dij
iG, jH
Complete linkage: measures the farthest pair of points dcomplete(G, H) = max dij
iG, jH
Average linkage: measures the average dissimilarity over all pairs
1
daverage(G, H)
=
nG
? nH
dij
iG, jH
3
Centroid linkage
Centroid linkage1 is commonly used. Assume that Xi Rp, and dij = Xi - Xj 2. Let X?G, X?H denote group averages for G, H.
Then:
dcentroid(G, H) = X?G - X?H 2
q
Example (dissimilarities dij are distances, groups are marked by colors): centroid linkage score dcentroid(G, H) is the distance between the group centroids (i.e., group averages)
-2
-1
0
1
2
q q
q qq qq
q q qq q
q q
q
q
q q
q q
q
q
q qqq
q qqqqqq
q
qq qq q q qq
qq
q
qq q qq
q
q
q
q
qq
q
q
qqq
q
q q
qq
q
q
q qqq
qqq
q
qqq q qq
q
q
q q
q qqq
q
q q
q qq
qq
q
q
q
-2
-1
0
1
2
1Eisen et al. (1998), "Cluster Analysis and Display of Genome-Wide Expression Patterns"
4
Centroid linkage is the standard in biology
Centroid linkage is simple: easy to understand, and easy to implement. Maybe for these reasons, it has become the standard for hierarchical clustering in biology
5
................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
- an introduction to game theory uh
- choosing the optimal number of factors in exploratory
- selecting research participants
- count the objects and circle the correct number
- clustering 3 hierarchical clustering continued
- principles of mathematics 12 explained 284
- permutations and combinations
- introduction to probability models
- probability fundamental counting principle permutations
- part 1 module 4 the fundamental counting principle example
Related searches
- another word for continued success
- nysed statement of continued eligibility
- hierarchical culture
- hierarchical culture examples
- egalitarian vs hierarchical cultures
- example of hierarchical organization
- hierarchical regression analysis
- how to report hierarchical regression
- hierarchical multiple regression write up
- hierarchical regression write up apa
- reporting hierarchical regression apa
- organizations with hierarchical cultures