Clustering 3: Hierarchical clustering (continued ...
Clustering 3: Hierarchical clustering (continued); choosing the number of clusters
Ryan Tibshirani Data Mining: 36-462/36-662
January 31 2013
Optional reading: ISL 10.3, ESL 14.3
1
Even more linkages
Last time we learned about hierarchical agglomerative clustering, basic idea is to repeatedly merge two most similar groups, as measured by the linkage
Three linkages: single, complete, average linkage. Properties: Single and complete linkage can have problems with chaining and crowding, respectively, but average linkage doesn't Cutting an average linkage tree provides no interpretation, but there is a nice interpretation for single, complete linkage trees Average linkage is sensitive to a monotone transformation of the dissimilarities dij, but single and complete linkage are not All three linkages produce dendrograms with no inversions
Actually, there are many more linkages out there, each having different properties. Today: we'll look at two more
2
Reminder: linkages
Our setup: given X1, . . . Xn and pairwise dissimilarities dij. (E.g., think of Xi Rp and dij = Xi - Xj 2)
Single linkage: measures the closest pair of points
dsingle(G, H) = min dij
iG, jH
Complete linkage: measures the farthest pair of points dcomplete(G, H) = max dij
iG, jH
Average linkage: measures the average dissimilarity over all pairs
1
daverage(G, H)
=
nG
? nH
dij
iG, jH
3
Centroid linkage
Centroid linkage1 is commonly used. Assume that Xi Rp, and dij = Xi - Xj 2. Let X?G, X?H denote group averages for G, H.
Then:
dcentroid(G, H) = X?G - X?H 2
q
Example (dissimilarities dij are distances, groups are marked by colors): centroid linkage score dcentroid(G, H) is the distance between the group centroids (i.e., group averages)
-2
-1
0
1
2
q q
q qq qq
q q qq q
q q
q
q
q q
q q
q
q
q qqq
q qqqqqq
q
qq qq q q qq
qq
q
qq q qq
q
q
q
q
qq
q
q
qqq
q
q q
qq
q
q
q qqq
qqq
q
qqq q qq
q
q
q q
q qqq
q
q q
q qq
qq
q
q
q
-2
-1
0
1
2
1Eisen et al. (1998), "Cluster Analysis and Display of Genome-Wide Expression Patterns"
4
Centroid linkage is the standard in biology
Centroid linkage is simple: easy to understand, and easy to implement. Maybe for these reasons, it has become the standard for hierarchical clustering in biology
5
................
................
In order to avoid copyright disputes, this page is only a partial summary.
To fulfill the demand for quickly locating and searching documents.
It is intelligent file search solution for home and business.
Related download
- solutions to problem set 1
- cat 5 grade 9practice test overview
- sections 2 1 2 2 multiple choice choose the one
- game theory lecture 1 eurecom
- 10 1 outcomes and events big ideas learning
- 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 48 49 50
- the limit of a sequence mit mathematics
- factor analysis example harvard university
- 35 permutations combinations and proba bility
- an introduction to game theory uh
Related searches
- another word for continued success
- nysed statement of continued eligibility
- hierarchical culture
- hierarchical culture examples
- egalitarian vs hierarchical cultures
- example of hierarchical organization
- hierarchical regression analysis
- how to report hierarchical regression
- hierarchical multiple regression write up
- hierarchical regression write up apa
- reporting hierarchical regression apa
- organizations with hierarchical cultures