Improving the Scalability of Synonym Resolution in the KnowItAll Project's Search Engine

by
Bo Qin

Abstract:

Identifying and merging object and relation synonyms, known as Synonym Resolution (SR), is important for information extraction. RESOLVER, a previously implemented scalable system for unsupervised SR (requiring neither strong domain knowledge nor hand-tagged training examples), runs on a single machine. RESOLVER uses a probabilistic relation model to calculate object and relation similarities based on string similarity and the similarity of the assertions containing them. Since the quality of SR depends on the number of assertions available, a parallelized RESOLVER system, or P-RESOLVER, running on a cluster of machines would produce results with better precision and recall, because it can process more assertions and relaxes the memory constraint of the single-machine version of RESOLVER. The fully implemented P-RESOLVER uses Hadoop's MapReduce framework to process extracted assertions on multiple machines simultaneously.

Advised by Oren Etzioni and Michele Banko.

CSE 203
Wednesday
June 4, 2008
4:30 - 5:20 pm