Lecture 5




October 18, 2001

Lecturer: Larry Ruzzo

Notes: Tobias Mann


Clustering History



Clustering Applied to Microarray Data



Two Kinds of Clustering



Hierarchical Clustering Basic Algorithm



This algorithm raises an immediate issue: when should it terminate? Popular choices include stopping once all of the data has been incorporated into a single tree, and stopping when there is a qualitative change in the distance between candidate clusters (for instance, when the algorithm goes from merging clusters that are all close together to merging clusters that are comparatively distant).
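The two stopping rules can be illustrated with a small Python sketch. It assumes the basic algorithm is the usual bottom-up procedure (start with one cluster per point and repeatedly merge the two closest clusters). The function name agglomerate, the parameter max_merge_distance, the Euclidean point distance, and the choice of the smallest point-to-point distance as the cluster distance are all assumptions made for illustration, not part of the lecture.

```python
import math

def agglomerate(points, max_merge_distance=None):
    """Bottom-up clustering sketch with two stopping rules.

    Stops when one cluster remains (all data in one tree), or earlier if
    the closest pair of clusters is farther apart than max_merge_distance.
    Cluster distance here is the smallest point-to-point Euclidean
    distance, an arbitrary choice for illustration.
    """
    clusters = [[p] for p in points]   # start with one cluster per point
    merge_distances = []               # distance at which each merge happened
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = min(math.dist(a, b)
                           for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        # Threshold rule: stop before merging comparatively distant clusters.
        if max_merge_distance is not None and dist > max_merge_distance:
            break
        merge_distances.append(dist)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merge_distances

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
full_tree, distances = agglomerate(points)
two_groups, _ = agglomerate(points, max_merge_distance=5)
```

Inspecting the recorded merge distances is one way to spot the qualitative change: a sharp jump from small values to a large one suggests that the merges before the jump form the natural clusters.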


Cluster Distance Functions


Suppose you have two clusters, X (with points x_i, 1 <= i <= N) and Y (with points y_j, 1 <= j <= M). Let d(x_i, y_j) be a distance function between points in the data set. The following are ways to compute the distance between two clusters as a function of the distances between the points in the two clusters.
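Three common choices in the clustering literature are the minimum, the maximum, and the mean of the pairwise distances d(x_i, y_j), often called single, complete, and average linkage. The following Python sketch shows them under stated assumptions: Euclidean distance is used for d, and the function names are hypothetical.

```python
import math
from itertools import product

def euclidean(p, q):
    """Point-to-point distance d; Euclidean is an assumption here."""
    return math.dist(p, q)

def single_link(X, Y, d=euclidean):
    """Smallest distance between any x in X and any y in Y."""
    return min(d(x, y) for x, y in product(X, Y))

def complete_link(X, Y, d=euclidean):
    """Largest distance between any x in X and any y in Y."""
    return max(d(x, y) for x, y in product(X, Y))

def average_link(X, Y, d=euclidean):
    """Mean of all N*M pairwise distances between X and Y."""
    return sum(d(x, y) for x, y in product(X, Y)) / (len(X) * len(Y))

X = [(0, 0), (1, 0)]
Y = [(3, 0), (5, 0)]
closest = single_link(X, Y)
farthest = complete_link(X, Y)
mean = average_link(X, Y)
```

For these two clusters the pairwise distances are 3, 5, 2, and 4, so the three functions give 2, 5, and 3.5 respectively. Any other point distance can be passed as d without changing the cluster-level definitions.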



Notes on Hierarchical Clustering Algorithm Behavior



Hierarchical Algorithm Pros and Cons





