paper review: Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL

From: Mathias Ganter (mganter@u.washington.edu)
Date: Wed Dec 08 2004 - 09:34:28 PST


    Authors and Title

    Turney, Peter (2001) Mining the Web for Synonyms: PMI-IR versus LSA on
    TOEFL. In De Raedt, Luc and Flach, Peter, Eds. Proceedings of the Twelfth
    European Conference on Machine Learning (ECML-2001), pp. 491-502,
    Freiburg, Germany.

     

    Remarks

    This paper by P. Turney presents PMI-IR, an unsupervised learning algorithm
    that recognizes synonyms from word co-occurrence, using statistics gathered
    from the responses to queries sent to an online search engine. The
    algorithm combines Pointwise Mutual Information (PMI) and Information
    Retrieval (IR) to measure how strongly a problem word and each candidate
    answer are associated. It is evaluated on synonym questions from the TOEFL
    and compared with both the performance of Latent Semantic Analysis (LSA)
    and that of non-native English speakers applying to US colleges. There are
    four scores of increasing sophistication, which yield an increasing
    percentage of correct answers.
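
    To make the scoring idea concrete, here is the standard definition of
    pointwise mutual information together with what I understand to be the
    simplest of the four scores. The symbols hits(.), problem and choice_i are
    my own shorthand; the more sophisticated scores are not reproduced here.

```latex
% Pointwise mutual information between two words
\[
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1 \,\&\, w_2)}{p(w_1)\, p(w_2)}
\]

% For a fixed problem word, p(problem) is the same for every choice and
% log_2 is monotonic, so ranking the choices only needs the ratio below,
% with probabilities estimated from search-engine hit counts.
\[
\mathrm{score}(\mathit{choice}_i) =
  \frac{\mathrm{hits}(\mathit{problem}\ \mathrm{AND}\ \mathit{choice}_i)}
       {\mathrm{hits}(\mathit{choice}_i)}
\]
```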

     

    The major concept of this paper is the implementation and use of an
    unsupervised learning algorithm that extracts information from the largest
    source of information one could wish for, i.e. the World Wide Web, in order
    to answer a specific question. It assigns a score to each candidate and
    selects the choice that maximizes the score. It is interesting to see how
    knowledge of semantics increases this score, because interpreting literary
    and spoken language is considered really difficult for computers (as a
    professor once told me). The author compares the algorithm's performance
    with that of non-machine-learning systems, which fail to perform well in
    areas of expertise. It is also mentioned that most of the hard work of
    finding synonyms is done by the search engine and not by the algorithm
    itself, which underlines the importance of these search engines.
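
    The "score each candidate, pick the maximum" step can be sketched in a few
    lines of Python. This is only my own illustration, not the author's code:
    the function names are hypothetical, the hit counts are made up, and a
    dictionary lookup stands in for the real search-engine queries. Only the
    simplest co-occurrence ratio is shown.

```python
from typing import Callable, Dict, List

# Made-up hit counts standing in for search-engine responses (NOT real data).
TOY_HITS: Dict[str, int] = {
    "imposed": 2000,
    "believed": 9000,
    "requested": 8000,
    "correlated": 1500,
    "levied AND imposed": 400,
    "levied AND believed": 90,
    "levied AND requested": 80,
    "levied AND correlated": 3,
}


def toy_hits(query: str) -> int:
    """Hypothetical stand-in for a search engine's hit count."""
    return TOY_HITS.get(query, 0)


def cooccurrence_score(problem: str, choice: str,
                       hits: Callable[[str], int]) -> float:
    """Ratio hits(problem AND choice) / hits(choice); 0.0 if choice is unseen."""
    denominator = hits(choice)
    if denominator == 0:
        return 0.0
    return hits(f"{problem} AND {choice}") / denominator


def pick_synonym(problem: str, choices: List[str],
                 hits: Callable[[str], int]) -> str:
    """Return the choice with the highest co-occurrence score."""
    return max(choices, key=lambda c: cooccurrence_score(problem, c, hits))


if __name__ == "__main__":
    options = ["imposed", "believed", "requested", "correlated"]
    print(pick_synonym("levied", options, toy_hits))
```

    Passing the hit-count function in as a parameter keeps the scoring logic
    separate from the search engine, which, as noted above, does most of the
    actual work.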

     

    The major flaws of this paper are the restricted set of queries, the
    comparison of PMI-IR and LSA, which cannot be made without some inaccuracy,
    and the outlining of various future applications of the PMI-IR algorithm
    without ever trying them out (I do not think it is a good idea to list
    future research interests that are not yet fully developed).

    In addition, I miss a precise explanation of the algorithm. Furthermore,
    the paper concentrates too much on applications and says too little about
    the machine learning itself.

    The open research questions are clear:

    - implementing and testing all the suggestions given in the paper

    - extending the query set to more difficult words

    - decreasing the running time by multi-threading and reducing network
    traffic

    - Why did the author choose the TOEFL and not the more difficult CSE as the
    query source?

    - Can this idea be applied to other semantic questions?
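
    As for the running-time point in the list above, one possible approach is
    to send the hit-count queries concurrently instead of one after another.
    The following is a minimal sketch under my own assumptions: fake_hits is a
    hypothetical placeholder that only simulates network latency, and the
    paper itself does not describe such an implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fake_hits(query: str) -> int:
    """Hypothetical stand-in for a slow search-engine request."""
    time.sleep(0.05)      # simulated network latency
    return len(query)     # placeholder value, not a real hit count


def fetch_hits_concurrently(queries: List[str],
                            hits: Callable[[str], int],
                            max_workers: int = 8) -> Dict[str, int]:
    """Issue all queries in parallel threads and collect the counts."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input queries.
        counts = list(pool.map(hits, queries))
    return dict(zip(queries, counts))


if __name__ == "__main__":
    words = ["imposed", "believed", "requested", "correlated"]
    queries = words + [f"levied AND {w}" for w in words]
    print(fetch_hits_concurrently(queries, fake_hits))
```

    Threads fit here because the work is dominated by waiting on the network
    rather than by computation.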

     



    This archive was generated by hypermail 2.1.6 : Wed Dec 08 2004 - 09:34:29 PST