paper review #3

From: Jiun-Hung Chen (jhchen@cs.washington.edu)
Date: Wed Dec 08 2004 - 01:04:57 PST


    Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
    Peter D. Turney

    Review by Jiun-Hung Chen

    1. Summary
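    This paper proposes PMI-IR, an unsupervised learning algorithm that combines Pointwise Mutual Information (PMI)
    with Information Retrieval (IR) to recognize synonyms. It evaluates the algorithm on TOEFL synonym questions and
    compares it with Latent Semantic Analysis (LSA).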

    2. Most important ideas
    The most important idea in this paper is to perform a task by issuing queries to a search engine and
    analyzing the replies. I think people use this idea all the time, and it works very well.
    For example, suppose you want to eat Japanese food but don't know where to go. You may send a query
    like "good Japanese restaurant in Seattle" to Google, analyze the replies, and then decide on a restaurant.
    The key points are that the WWW is a huge database and that search engines can return very useful and
    reliable replies to queries. Formulating synonym learning as an unsupervised learning problem and solving it
    by analyzing co-occurrence is both difficult and interesting. I believe the success can be ascribed to the key
    insight that a word is characterized by the company it keeps. In contrast, a supervised approach to learning
    synonyms seems intuitive and trivial.
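    The query-and-count idea above can be sketched in a few lines of Python. This is a minimal sketch under
    stated assumptions, not the author's implementation: a hypothetical six-document toy corpus stands in for
    the web, `hits` counts the documents that contain every query term (standing in for a search engine's
    AND query), and the score hits(problem AND choice) / hits(choice) is, as I understand it, the simplest
    of the scores the paper describes. The names `TOY_DOCUMENTS`, `hits`, `pmi_ir_score`, and `pick_synonym`
    are my own illustrations.

```python
# Hypothetical toy "web": each document is a set of lowercase words.
# A real PMI-IR system would send queries to a search engine instead.
TOY_DOCUMENTS = [
    {"big", "large", "house"},
    {"big", "large", "dog"},
    {"large", "truck"},
    {"big", "small", "toy"},
    {"small", "tiny", "mouse"},
    {"red", "apple"},
]


def hits(*terms):
    """Count documents containing every term (a stand-in for hits(a AND b))."""
    return sum(1 for doc in TOY_DOCUMENTS if all(t in doc for t in terms))


def pmi_ir_score(problem, choice):
    """Co-occurrence score: hits(problem AND choice) / hits(choice)."""
    n_choice = hits(choice)
    if n_choice == 0:
        return 0.0  # the choice never appears, so there is no evidence for it
    return hits(problem, choice) / n_choice


def pick_synonym(problem, choices):
    """Answer a TOEFL-style question by picking the highest-scoring choice."""
    return max(choices, key=lambda c: pmi_ir_score(problem, c))


print(pick_synonym("big", ["large", "small", "red"]))
```

    With these toy counts the call prints "large": its score is 2/3, versus 1/2 for "small" and 0 for "red".
    No training labels are used anywhere, which is what makes the approach unsupervised; all of the
    information comes from which words co-occur in the same documents.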
     
    3. Largest flaws
    The largest flaw is that the author overstates the comparisons in the abstract,
    although he does mention that the comparisons between PMI-IR and LSA are biased
    because the experiments are not done under the same conditions. In short,
    I think a fair comparison is missing. The other flaw is that the method assumes hit counts are
    good estimates of probabilities, but the author does not verify this assumption.
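    To make the second flaw concrete, here is a short derivation of where hit counts enter. This is my own
    reconstruction, not quoted from the paper. It assumes the search engine indexes N documents and that
    each probability is estimated by the fraction of documents matching the query:

```latex
\begin{aligned}
\mathrm{PMI}(w_1, w_2) &= \log_2 \frac{p(w_1 \wedge w_2)}{p(w_1)\,p(w_2)},
  \qquad p(x) \approx \frac{\mathrm{hits}(x)}{N} \\
&\approx \log_2 \frac{\mathrm{hits}(w_1 \text{ AND } w_2)/N}
  {\bigl(\mathrm{hits}(w_1)/N\bigr)\bigl(\mathrm{hits}(w_2)/N\bigr)}
 = \log_2 \frac{N \cdot \mathrm{hits}(w_1 \text{ AND } w_2)}{\mathrm{hits}(w_1)\,\mathrm{hits}(w_2)}
\end{aligned}
```

    For a fixed problem word, N and hits(problem) are the same for every choice, and log2 is monotonic, so
    ranking the choices by PMI reduces to ranking them by hits(problem AND choice) / hits(choice). The whole
    method therefore rests on the step p(x) ≈ hits(x)/N, which is exactly the assumption left unchecked.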

    4. Open research questions
    Extending this work to finding relations in sentences, paragraphs, or documents by mining the web could be
    very important and useful for natural language processing and understanding. Furthermore, mining the web for
    visual information such as images and movies is challenging, because no obvious structural information,
    such as a grammar, is available.


