Review

From: Adrienne Wang (axwang@cs.washington.edu)
Date: Wed Dec 08 2004 - 11:40:20 PST

    Title: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
    Author: Peter D. Turney

    Summary: PMI-IR is a simple unsupervised learning algorithm for
    recognizing synonyms. It uses Pointwise Mutual Information (PMI)
    and Information Retrieval (IR) to measure the similarity of pairs
    of words.

    Important ideas: 1. Pointwise Mutual Information gives a score
    that estimates the semantic similarity between two words. The
    scores account not only for synonyms but also for antonyms and
    context-dependent words. The LSA paper claimed that an MI
    analysis would give similar accuracy, but in this paper PMI turns
    out to do better. 2. Using the Web as the data source. The usual
    difficulty for semantic similarity measures is the sparseness of
    the data, but the Web provides a good and huge data source. In
    their project, they use the AltaVista search engine.
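
    To make idea 1 concrete, here is a minimal sketch of the scoring
    step. It assumes the simplest co-occurrence form of the score,
    hits(problem AND choice) / hits(choice), which ranks choices the
    same way as PMI because p(problem) is identical for every choice
    and the logarithm is monotonic. The HITS table and the helper
    names (hits, score, best_choice) are hypothetical, and the counts
    are invented for illustration; a real run would query a search
    engine instead.

```python
# Hypothetical hit counts standing in for search-engine queries.
# All numbers are invented for illustration only.
HITS = {
    ("big",): 1000,
    ("large",): 800,
    ("red",): 1200,
    ("big", "large"): 200,  # pages containing both words
    ("big", "red"): 60,
}


def hits(*words):
    """Return the illustrative hit count for a conjunctive (AND) query."""
    return HITS.get(tuple(sorted(words)), 0)


def score(problem, choice):
    """Co-occurrence ratio: hits(problem AND choice) / hits(choice)."""
    denom = hits(choice)
    return hits(problem, choice) / denom if denom else 0.0


def best_choice(problem, choices):
    """Pick the choice with the highest score for the problem word."""
    return max(choices, key=lambda c: score(problem, c))


if __name__ == "__main__":
    print(best_choice("big", ["red", "large"]))
```

    With these made-up counts, "large" scores 200/800 and "red"
    scores 60/1200, so "large" is picked as the synonym of "big".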

    Weak points: The two methods use different databases, so the
    results do not seem directly comparable. In particular, some
    researchers have pointed out that LSA would perform better than
    PMI if given the same database. So the good performance of PMI is
    probably due to the huge amount of data on the Web, and perhaps
    even to the search engine itself.

    Possible research directions: 1. Scale LSA up to a Web-sized
    database and then compare the two methods. 2. Run PMI-IR on an
    encyclopedia-sized text database and compare the performance of
    the two methods.

