TIME: 4:30-5:30 pm, Thursday, October 19, 2006

PLACE: CSE 403

TITLE: The effectiveness of Lloyd-type methods for the k-means problem

SPEAKER: Yuval Rabani
         Technion

ABSTRACT:

We investigate variants of Lloyd's heuristic for clustering
high-dimensional data, in an attempt to explain its popularity among
practitioners half a century after its introduction, and to suggest
improvements in its application. We propose and justify a
clusterability criterion for data sets. We present variants of Lloyd's
heuristic that quickly lead to provably near-optimal clustering
solutions when applied to well-clusterable instances. This is the
first performance guarantee for a variant of Lloyd's heuristic.
Guaranteeing output quality does not come at the expense of speed:
some of our algorithms are candidates for being
faster in practice than currently used variants of Lloyd's method. In
addition, our other algorithms are faster on well-clusterable
instances than recently proposed approximation algorithms, while
maintaining similar guarantees on clustering quality. Our main
algorithmic contribution is a novel probabilistic seeding process for
the starting configuration of a Lloyd-type iteration.
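The abstract does not spell out the seeding distribution, so the sketch below is illustrative only. It pairs plain Lloyd iterations (assign each point to its nearest center, then move each center to the mean of its points) with k-means++-style D^2 sampling, in which each new starting center is drawn with probability proportional to its squared distance from the nearest center already chosen. That D^2 rule is a swapped-in assumption, not necessarily the speakers' seeding process, and the names `d2_seed`, `lloyd` and `kmeans_cost` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def d2_seed(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Choose k starting centers by D^2 sampling (k-means++-style).

    ASSUMPTION: a stand-in for the seeding process mentioned in the talk,
    whose exact distribution is not given in the abstract.
    """
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point already coincides with a center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points: List[Point], centers: List[Point], max_iter: int = 100
          ) -> Tuple[List[Point], Optional[List[int]]]:
    """Run Lloyd iterations until the assignment stops changing."""
    centers = [tuple(c) for c in centers]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: send each point to its nearest current center.
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers, labels


if __name__ == "__main__":
    rng = random.Random(0)
    # Two well-separated Gaussian blobs: an informally "easy" instance
    # (the talk's actual clusterability criterion is not reproduced here).
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
    data += [(rng.gauss(20, 1), rng.gauss(20, 1)) for _ in range(50)]
    centers, labels = lloyd(data, d2_seed(data, 2, rng))
    print(sorted(centers), kmeans_cost(data, centers))
```

Keeping an empty cluster's previous center is one common convention; other Lloyd variants reseed such a center instead.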

This is joint work with Rafail Ostrovsky, Leonard Schulman, and
Chaitanya Swamy.