TIME: 2:30-3:20 pm,  December 5, 2007

PLACE: CSE 503  

SPEAKER: Yury Lifshits
         Caltech

TITLE: Combinatorial approaches to data mining


ABSTRACT:
In many algorithmic problems like nearest neighbor search,
clustering, near-duplicate detection or decentralized navigation
input dataset is described either by some explicit representation
(e.g. vectors) or by oracle access to pairwise distances.

In this talk we assume that our knowledge about dataset
is much more limited. All information provided to an algorithm
has the form "A is more similar to C than B is".
Thus, we have only "comparative" information. It turns out that
with some reasonable "consistancy assumption" all mentioned problems
can be solved efficiently without any access to actual distance values.
We also show a connection between combinatorial framework and
concepts of doubling dimension and Karger-Ruhl dimension.