From: Vaishnavi Sannidhanam (vaishu@cs.washington.edu)
Date: Wed Dec 08 2004 - 01:26:08 PST
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
By Peter D. Turney
--------------------------------------------------------
* One-line summary
------------------------
- This paper describes PMI-IR, an unsupervised synonym-recognition algorithm
that uses statistics gathered from web search queries, and compares it with
LSA on TOEFL synonym questions.
* The (two) most important ideas in the paper, and why
-----------------------------------------------------------
- One of the cool concepts that this paper introduces is exploiting web
searches, not aimlessly but methodically: it relates search results to how
often words appear together and to what kinds of words appear together.
- This paper also makes us wonder why we need our own databases when we can
query the web and get answers. Of course, someone has to maintain the
underlying index, but engines like this one do not have to.
- The four scoring heuristics the paper presents, based on where and how often
words occur together in documents, were very cool ideas.
- This is the second paper we have read so far that uses a divide-and-conquer
approach to tackle what could have been treated as a very complex problem,
and it proposes a very elegant solution.
- I also like the applications section of the paper. It gives the reader an
idea of why a synonym finder might be useful, and so it gives the paper a
sense of completeness.
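To make the idea of statistical analysis over web queries concrete, here is a
minimal sketch of the simplest kind of score this approach relies on, roughly
corresponding to the paper's first score. It ranks each answer choice by
hits(problem AND choice) / hits(choice). Because the problem word is fixed,
this orders the choices the same way pointwise mutual information would. The
hit counts are made-up toy numbers that stand in for search-engine results. A
real system would issue the queries to a search engine. The names `score1`,
`best_synonym` and `toy_hits` are hypothetical.

```python
# Minimal sketch of a PMI-IR-style co-occurrence score.
# Assumption: toy_hits holds made-up counts standing in for search-engine
# hit counts. These are not real data or results from the paper.

def score1(problem: str, choice: str, hits: dict) -> float:
    """Return hits(problem AND choice) / hits(choice).

    With the problem word fixed, ranking by this ratio matches ranking by
    pointwise mutual information, since p(problem) is a constant factor.
    """
    joint = hits.get((problem, choice), 0)  # documents containing both words
    alone = hits.get(choice, 0)             # documents containing the choice
    return joint / alone if alone else 0.0


def best_synonym(problem: str, choices: list, hits: dict) -> str:
    # Pick the choice that co-occurs with the problem word most, relative
    # to how common the choice word is on its own.
    return max(choices, key=lambda c: score1(problem, c, hits))


# Hypothetical hit counts for a toy TOEFL-style question.
toy_hits = {
    "imposed": 1000, "believed": 5000, "requested": 3000, "correlated": 800,
    ("levied", "imposed"): 200, ("levied", "believed"): 50,
    ("levied", "requested"): 60, ("levied", "correlated"): 4,
}

print(best_synonym("levied", ["imposed", "believed", "requested", "correlated"], toy_hits))
```

Dividing by hits(choice) keeps very common words such as "believed" from
winning just because they appear on many pages.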
* The one or two largest flaws in the paper
---------------------------------------------------
- I thought that the evaluation section, although it had a lot of data in it,
was not thorough enough. I took the TOEFL a long time ago, so I don't
remember exactly, but I thought it was pretty easy. So I am not sure that
questions from the TOEFL make a good metric. However, the author does claim
that the algorithm performed better than an average person.
- Of the four scores used to find synonyms, the paper says that score 3
reduces a risk present in scores 1 and 2: rating an antonym as highly as a
synonym. However, it neither explains clearly why this works nor presents a
good evaluation showing that it does.
* Identify two important, open research questions on the topic, and
why they matter
-----------------------------------------------------------------------
- Can this be extended to find similar sentences or groups of sentences, and
thus to understand the language itself? How hard or easy would that be?
- We could also test how this algorithm works on larger and more varied data
sets, and how well it performs compared with an average human.
This archive was generated by hypermail 2.1.6 : Wed Dec 08 2004 - 01:26:14 PST