Lecture 5

 

Clustering

 

October 18, 2001

Lecturer: Larry Ruzzo

Notes: Tobias Mann

 

Clustering History

 

 

Clustering Applied to Microarray Data

 

 

Two Kinds of Clustering

 

 

Hierarchical Clustering Basic Algorithm

 

 

This algorithm raises an immediate question: when should it terminate? Popular choices are to stop once all of the data has been incorporated into a single tree, or to stop when there is a qualitative change in the distance between candidate clusters (for instance, when the algorithm goes from merging clusters that are all close together to merging clusters that are comparatively far apart).
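The two stopping rules can be sketched as follows. This is a minimal illustration, not the lecture's own pseudocode: it assumes a bottom-up (agglomerative) procedure that repeatedly merges the closest pair of clusters, uses the minimum pairwise distance as the cluster distance, and takes a hypothetical `stop_distance` parameter as a stand-in for "a qualitative change in distance". The names `agglomerate` and `closest_pair_distance` are made up for this example.

```python
from itertools import combinations


def closest_pair_distance(a, b, dist):
    """Cluster distance used in this sketch: smallest point-to-point distance."""
    return min(dist(p, q) for p in a for q in b)


def agglomerate(points, dist, stop_distance=None):
    """Merge the two closest clusters until a stopping rule fires.

    stop_distance=None: stop when everything is in one cluster (full tree).
    stop_distance=t:    stop as soon as the closest pair is farther apart than t.
    Returns the remaining clusters and the distance at which each merge happened.
    """
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merge_distances = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest distance between them.
        (i, j), d = min(
            (((i, j), closest_pair_distance(clusters[i], clusters[j], dist))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if stop_distance is not None and d > stop_distance:
            break  # the next merge would join comparatively distant clusters
        merge_distances.append(d)
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, merge_distances


def absolute_difference(a, b):
    return abs(a - b)


points = [0, 1, 2, 50, 51]
full_tree, full_merges = agglomerate(points, absolute_difference)
cut_tree, cut_merges = agglomerate(points, absolute_difference, stop_distance=10)
```

With these toy points, the recorded merge distances stay small until the last merge, which is much larger; a jump of that kind in `merge_distances` is one way to spot the qualitative change by eye instead of fixing a threshold in advance.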

 

Cluster Distance Functions

 

Suppose you have two clusters, X (with points x_i, 1 ≤ i ≤ N) and Y (with points y_j, 1 ≤ j ≤ M). Let d(x_i, y_j) be a distance function between points in the data set. The following are ways to compute the distance between two clusters as a function of the distances between the points in the two clusters.
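As a hedged illustration, three commonly used cluster distance functions are sketched below: the minimum, the maximum, and the average of d over all pairs (x_i, y_j). The choice of these three and the function names `single_link`, `complete_link`, and `average_link` are assumptions made for this example; they may not match the lecture's list exactly.

```python
def single_link(X, Y, d):
    """Distance between the two closest points, one from each cluster."""
    return min(d(x, y) for x in X for y in Y)


def complete_link(X, Y, d):
    """Distance between the two farthest points, one from each cluster."""
    return max(d(x, y) for x in X for y in Y)


def average_link(X, Y, d):
    """Mean of d(x, y) over all N * M cross-cluster pairs."""
    return sum(d(x, y) for x in X for y in Y) / (len(X) * len(Y))


def absolute_difference(a, b):
    return abs(a - b)


# Toy one-dimensional clusters; the pairwise distances are 4, 6, 3, and 5.
X = [0, 1]
Y = [4, 6]
nearest = single_link(X, Y, absolute_difference)
farthest = complete_link(X, Y, absolute_difference)
mean = average_link(X, Y, absolute_difference)
```

All three take the point distance d as an argument, so the same functions work with Euclidean distance, correlation-based distance, or any other choice of d.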

 

 

Notes on Hierarchical Clustering Algorithm Behavior

 

 

Hierarchical Algorithm Pros and Cons

 

Pros:

 

Cons: