From: Stef Sch... (stef@cs.washington.edu)
Date: Wed Dec 08 2004 - 10:48:25 PST
Article: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
Author: Peter D. Turney
This paper compares Pointwise Mutual Information combined with
Information Retrieval (PMI-IR) against Latent Semantic Analysis (LSA)
on answering synonym questions from the TOEFL test, and further
evaluates PMI-IR on an ESL synonym test.
Strengths:
This paper shows that the simple PMI-IR technique can outperform LSA
on synonym questions, and that PMI-IR can be a powerful technique when
used with a large corpus (such as that indexed by a modern search
engine). As with PROVERB, this paper shows that advances in computing
technology can be used to solve problems more accurately than previous
techniques allowed. Furthermore, it shows that a simple, unsupervised
learning algorithm over this huge corpus can retrieve some interesting
information. The paper makes good use of advanced search operators,
particularly NEAR, which exploits the idea that synonyms will appear
relatively close to each other in a text (as authors generally try to
mix things up, so as not to bore their readers). This alone boosted
the accuracy by 10-14%, and was responsible for the vast majority of
the improvement over LSA.
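The NEAR-based scoring idea can be sketched in a few lines. Here hits() and its counts are made-up stand-ins for a real search-engine query interface (the paper used AltaVista's advanced search); only the shape of the score matches the technique, and the toy numbers are purely illustrative:

```python
# Sketch of PMI-IR scoring with NEAR. The hit counts below are invented
# for demonstration; hits() stands in for a real search-engine API.

TOY_HITS = {
    "imposed": 40000,
    "believed": 200000,
    "levied NEAR imposed": 800,
    "levied NEAR believed": 100,
}

def hits(query):
    """Stand-in for a search engine's document count (toy numbers only)."""
    return TOY_HITS.get(query, 1)  # default 1 avoids division by zero

def pmi_ir_score(problem, choice):
    # score(choice) ~ p(problem NEAR choice) / p(choice); the p(problem)
    # factor is identical for every choice, so it cancels in the comparison.
    return hits(f"{problem} NEAR {choice}") / hits(choice)

def best_synonym(problem, choices):
    # The answer is the choice with the strongest co-occurrence score.
    return max(choices, key=lambda c: pmi_ir_score(problem, c))
```

With these toy counts, "imposed" scores far above "believed" as a synonym for "levied", which is the behavior the NEAR operator is meant to produce.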
Additionally, the method of adding context to a synonym for the ESL
questions is a nice contribution, since it provides a way to
automatically deal with polysemy, and still extract the proper synonym
(even if it isn't the most common form of the word).
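A minimal sketch of how a context word from the question might be folded into the query, in the spirit of the paper's context-augmented score. The hits function and all counts here are hypothetical, and the AND/NEAR syntax just mirrors AltaVista-style operators:

```python
# Context-augmented PMI-IR sketch: requiring a context word in both the
# numerator and denominator queries restricts the counts to documents
# about the intended sense of a polysemous problem word.

def score_with_context(problem, choice, context, hits):
    query_near = f"({problem} NEAR {choice}) AND {context}"
    query_base = f"{choice} AND {context}"
    return hits(query_near) / hits(query_base)

# Toy illustration (made-up counts): with context word "church", the
# church-service sense of "mass" should beat the physics sense.
TOY = {
    "(mass NEAR service) AND church": 50,
    "service AND church": 1000,
    "(mass NEAR weight) AND church": 2,
    "weight AND church": 1000,
}
toy_hits = lambda q: TOY.get(q, 1)
```

The same score function works for the plain TOEFL case by passing an empty context, which is one reason this style of query composition generalizes nicely.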
Weaknesses:
Is this really a fair comparison of LSA and PMI-IR? The PMI-IR method
has access to a much larger data source than the LSA method, which
effectively saw only 30,473 articles (as opposed to the
millions/billions of pages indexed by AltaVista). One would expect LSA
over a larger database to do better as well. The author recognizes
this discrepancy and gives some analysis suggesting that LSA should
scale up, but the question remains: is this comparison fair?
Another flaw is the small amount of test data. They used a test set of
80 TOEFL and 50 ESL questions. If there were one or two especially
difficult or especially easy questions in that set, the accuracy would
shift by about 3-4%. I'd be curious to see whether the performance
holds as the number of questions answered scales up.
Future work:
Clearly, one easy thing would be to use a bigger search engine (e.g.,
Google). Another avenue for future work would be to modify the LSA
method so that it a) uses a much larger data source, providing a
better comparison, and/or b) uses a smaller "chunk" size, which would
effectively simulate the NEAR queries. LSA treats the data as a bag of
words, and thus (by arguments made in the paper) should rate antonyms
about as highly as synonyms; this is one thing that PMI-IR was able to
get around (using the NOT NEAR queries). Would a similar type of
exclusion be possible (or necessary?) for LSA?
Another possibility for future research is applying these ideas, both
the PMI-IR model and the synonym detection, to other problems,
especially since the author has particular interest in automatically
extracting keywords from scientific articles.