Review 3

From: Vaishnavi Sannidhanam (vaishu@cs.washington.edu)
Date: Wed Dec 08 2004 - 01:26:08 PST


    Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
            By Peter D. Turney
    --------------------------------------------------------

    * One-line summary
    ------------------------
    - This paper describes the design of, and the techniques behind, an
    automated synonym solver that learns without supervision by applying
    statistical analysis to the results of web search queries.
      
    * The (two) most important ideas in the paper, and why
    -----------------------------------------------------------
    - One of the cool concepts that this paper introduces is the idea of
    exploiting web searches: not using them aimlessly, but methodically, by
    tying the search results to how often words appear together and which
    kinds of words appear together.

    - This paper also makes us ask why we need our own databases when we can
    query the web and get answers. Yes, of course someone has to maintain a
    database, but engines like this one don't have to.

    - The four heuristics presented by the paper, based on where and how often
    words occur in a document, were very cool ideas.

    - This is the second paper we have read so far that uses a
    divide-and-conquer approach to tackle what could have been pursued as a
    very complex problem, and it proposes a very elegant solution.

    - I also like the applications section in the paper. It gives the reader
    an idea of why a synonym finder might be useful, and hence gives the paper
    a sense of completeness.
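
    To make the co-occurrence idea above concrete, here is a minimal sketch of
    the simplest kind of score: how often a candidate appears on the same page
    as the problem word, divided by how often the candidate appears at all.
    Everything here is an assumption made for illustration. TOY_PAGES stands
    in for a real search engine, and the names hits, cooccurrence_score and
    best_choice are hypothetical, not taken from the paper.

```python
# Toy stand-in for a search engine: each string plays the role of one web page.
# The pages and the word list are made up for illustration.
TOY_PAGES = [
    "the levied tax was imposed by the council",
    "a tax was imposed and levied on imports",
    "the judge imposed a fine",
    "she believed the story",
    "he requested a refund of the tax",
    "they correlated the two data sets",
]


def hits(*words):
    """Count pages that contain every given word (a crude AND query)."""
    return sum(all(w in page.split() for w in words) for page in TOY_PAGES)


def cooccurrence_score(problem, choice):
    """hits(problem AND choice) / hits(choice); 0.0 if the choice never appears."""
    denom = hits(choice)
    return hits(problem, choice) / denom if denom else 0.0


def best_choice(problem, choices):
    """Pick the candidate with the highest co-occurrence score."""
    return max(choices, key=lambda c: cooccurrence_score(problem, c))


if __name__ == "__main__":
    print(best_choice("levied", ["imposed", "believed", "requested", "correlated"]))
```

    With a real search engine, hits would presumably be the number of results
    reported for the query, so no local corpus would be needed.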

      
    * The one or two largest flaws in the paper
    ---------------------------------------------------
    - I thought that the evaluation section, though it had lots of data in it,
    was not thorough enough. I took the TOEFL a long time ago, so I don't
    exactly remember, but I thought it was pretty easy. So I really don't know
    if questions picked from the TOEFL make a good benchmark. However, they do
    claim that the system performed better than an average person.

    - Among the four scores they use for finding synonyms, they say that score
    3 reduces the risk, present in scores 1 and 2, of rating an antonym as
    highly as a synonym. However, they neither explain this clearly nor
    present a good evaluation that shows it.
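
    One way to picture the antonym issue: antonyms often sit close together in
    text ("big, not small"), so a plain nearness score can rate them highly.
    The sketch below is my reading of the general idea, not the paper's exact
    formula: pages where the problem word or the candidate appears near "not"
    are dropped before counting. The window size, the toy pages and all
    function names are assumptions.

```python
WINDOW = 10  # assumed size of the NEAR window, in words

# Made-up pages standing in for search results.
TOY_PAGES = [
    "the room was big and large enough for everyone",
    "a big house is a large house",
    "the box was big not small",
    "it is small not big at all",
    "a small mouse ran past",
]


def near(words, a, b, window=WINDOW):
    """True if words a and b occur within `window` positions of each other."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


def plain_near_score(problem, choice, pages=TOY_PAGES):
    """Pages with problem NEAR choice, divided by pages containing choice."""
    num = den = 0
    for page in pages:
        words = page.split()
        den += choice in words
        num += near(words, problem, choice)
    return num / den if den else 0.0


def negation_filtered_score(problem, choice, pages=TOY_PAGES):
    """Same ratio, but pages with either word near "not" are left out."""
    num = den = 0
    for page in pages:
        words = page.split()
        choice_negated = near(words, choice, "not")
        problem_negated = near(words, problem, "not")
        if choice in words and not choice_negated:
            den += 1
        if near(words, problem, choice) and not (choice_negated or problem_negated):
            num += 1
    return num / den if den else 0.0


if __name__ == "__main__":
    for candidate in ("large", "small"):
        print(candidate,
              plain_near_score("big", candidate),
              negation_filtered_score("big", candidate))
```

    On these toy pages the antonym "small" gets a nonzero plain score but
    drops to zero once the negated pages are filtered out, while "large" is
    unaffected. An evaluation along these lines on real queries is the kind of
    evidence I was hoping to see.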

    * Identify two important, open research questions on the topic, and
    why they matter
    -----------------------------------------------------------------------
    - Can this be extended to work on finding similar sentences or groups of
    sentences and thus understand the language itself? How hard or easy will
    that be?

    - We can also try to see how this algorithm works on larger and more varied
    data sets, and how well it performs compared to an average human.


