From: Katarzyna Wilamowska (kasiaw@washington.edu)
Date: Wed Dec 08 2004 - 11:56:10 PST
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
Peter D. Turney
Summary
The paper discusses PMI-IR, a simple unsupervised learning algorithm for recognizing synonyms.
Important Ideas
The first important idea in this paper is that the internet is a large and valuable source of data that one can search through. This size makes PMI-IR much less vulnerable to the sparse-data problem.
I thought the point about PMI-IR being simpler than LSA was interesting. Since one can search such a large data source for an answer, one can have a simpler program and get the same or better results.
The different types of IR methods were cool. I didn't think of the antonym problem until I got to score3.
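To make the simplicity point concrete, here is a rough sketch of how I understand the most basic scoring variant: each answer choice is scored by the number of search hits for the problem word together with the choice, divided by the hits for the choice alone, and the highest-scoring choice wins. The function names and all hit counts below are my own placeholders, not the paper's code or data; a real run would get the counts from a search engine.

```python
def pmi_ir_score(hits_joint, hits_choice):
    """Ratio of co-occurrence hits to hits for the choice alone.

    Returns 0.0 when the choice has no hits, to avoid dividing by zero.
    """
    if hits_choice == 0:
        return 0.0
    return hits_joint / hits_choice


def pick_synonym(joint_hits, choice_hits):
    """Score every choice and return (best_choice, all_scores)."""
    scores = {
        choice: pmi_ir_score(joint_hits.get(choice, 0), total)
        for choice, total in choice_hits.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical hit counts, invented purely for illustration.
# joint_hits[c]  ~ hits for a query like "levied AND c"
# choice_hits[c] ~ hits for a query like "c"
joint_hits = {"imposed": 120, "believed": 40, "requested": 30, "correlated": 2}
choice_hits = {"imposed": 1000, "believed": 4000, "requested": 2000, "correlated": 500}

best, scores = pick_synonym(joint_hits, choice_hits)
print(best, scores)
```

The whole method is a handful of counts and one division per choice, which is why the size of the data source ends up doing most of the work.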
Flaws
Lack of experimentation. It would be nice to know whether chunk size really does matter.
In the introduction, the author hinted at "the expressive power of the search engine's query language" but didn't talk about it after that.
Questions
Experiment: LSA vs. PMI-IR with same chunk-size
Experiment: LSA vs. PMI-IR with limited document number
Experiment: LSA vs. PMI-IR with same chunk-size and limited document number.
Increasing the performance of PMI-IR with a different IR method.