Neighbourhood Component Analysis
Sam Roweis
University of Toronto
Say I give you a dataset of N points {x_n} in D dimensions. Can you find for me
(up to rotation and isotropic scaling) a projection matrix A (of size d by D)
such that when you apply nearest neighbour classification to the point set {y_n
= A x_n} you get the best possible performance? Nearest neighbour classification
is a very simple nonparametric method for supervised learning, and has several
appealing properties: the decision surfaces are nonlinear, the quality of the
predictions automatically improve as the amount of training data increases, and
there is only a single hyperparameter to be tuned. However, there are two
significant problems. First, we must define what we mean by "nearest", in other
words we must specify a metric on the input space. Second, the computational
load of the classifier is quite high at test time since we must store and search
through the entire training set to find the neighbours of a query point before
we can do classification. In this talk I will present an algorithm for linearly
reducing the dimensionality of the input data in a way that preserves the
performance of nearest neighbour classification as much as possible in the low
dimensional space. The algorithm simultaneouly reduces the computational load of
the classifier at test time and learns a (low-rank) metric on the input space.
Results on a variety of datasets show that the performance of nearest neighbour
after using our dimensionality reduction technique consistently exceeds the
performance after applying other linear dimensionality reduction methods such as
PCA or LDA.
Sam Roweis is an Assistant Professor in the Department of Computer Science at
the University of Toronto. His research interests are in machine learning, data
mining, and statistical signal processing. Roweis did his undergraduate degree
at the University of Toronto in the Engineering Science program and earned his
doctoral degree in 1999 from the California Institute of Technology working with
John Hopfield. He did a postdoc with Geoff Hinton and Zoubin Ghahramani at the
Gatsby Unit in London, and has also worked in several industrial research labs
including Bell Labs, and Whizbang! Labs. He is the holder of a Canada Research
Chair in Statistical Machine Learning and the winner of a Premier's Research
Excellence Award.