Review #3

From: Martha Mercaldi (mercaldi@cs.washington.edu)
Date: Wed Dec 08 2004 - 11:52:56 PST


    Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL

    Peter D. Turney

     

    Summary:

    This paper presents PMI-IR, an algorithm for answering multiple-choice
    synonym questions such as those on the TOEFL.

     

    Primary ideas:

     

    The author states that the primary contribution from this work is the
    coupling of existing PMI techniques with existing IR
    techniques. Several potential PMI scoring functions are presented and
    their subtleties are discussed. One central observation is that
    relatively simple scoring functions can capture a surprisingly large
    amount of information about word meaning.
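
    To illustrate how simple such a function can be, here is a minimal
    sketch of my reading of the most basic one: rank each candidate by
    how often it co-occurs with the problem word, divided by how often
    the candidate occurs at all. The toy corpus and the hits(),
    cooccurrence_score() and best_choice() helpers are hypothetical
    stand-ins for search engine queries, not the author's code.

```python
# Toy stand-in for a search engine index: each "document" is a set of words.
# The corpus and all helper names are hypothetical, for illustration only.
DOCS = [
    {"big", "large", "house"},
    {"big", "large", "dog"},
    {"big", "small", "contrast"},
    {"small", "tiny", "mouse"},
    {"large", "crowd"},
    {"tiny", "print"},
]


def hits(*words):
    """Count documents that contain every given word (an AND query)."""
    return sum(1 for doc in DOCS if all(w in doc for w in words))


def cooccurrence_score(problem, choice):
    """hits(problem AND choice) / hits(choice), or 0.0 if choice never occurs."""
    denom = hits(choice)
    return hits(problem, choice) / denom if denom else 0.0


def best_choice(problem, choices):
    """Return the candidate with the highest co-occurrence score."""
    return max(choices, key=lambda c: cooccurrence_score(problem, c))


if __name__ == "__main__":
    print(best_choice("big", ["large", "small", "tiny"]))
```

    On a real search engine, hits() would be the page count returned for
    a query, which is where the IR half of the coupling comes in.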

     

    One or two largest flaws:

     

    I did not think that this paper explained the context of the work
    clearly enough. I’m not an expert in this area, and I found it
    difficult to discern which parts were new algorithms and which parts
    were new techniques applied to existing algorithms. I gather (perhaps
    incorrectly) that PMI was an existing technique, that the three
    scoring functions were newly developed (if so, what scoring function
    had been used with PMI in the past?), and that coupling PMI with IR
    was the primary contribution of this paper.

     

    My other complaint is a scientific one. Any search engine, AltaVista
    in this case, might use its own search and correlation algorithms
    under the covers. Perhaps this concern comes from my living in the
    age of Google, and AltaVista was in fact much more primitive. Still,
    it seems appropriate to at least mention how AltaVista defined
    something such as “nearness” if its algorithms are to be
    incorporated. Otherwise it is hard to tell whether the performance
    improvement when going from score1() to score2() is due to the
    scoring function or to some behavior internal to the search engine.
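
    To make this concern concrete, here is a small sketch in which the
    proximity window is an explicit parameter. It assumes that "near"
    means "within a fixed number of word positions", which is my guess
    at the semantics and not a documented description of AltaVista. The
    toy corpus and the hits(), near_hits() and near_score() helpers are
    hypothetical.

```python
# Toy corpus of tokenized documents; contents are made up for illustration.
DOCS = [
    "a big and large room".split(),
    "big trucks drive on roads that are rarely small".split(),
    "small print".split(),
    "large crowd".split(),
]


def hits(word):
    """Count documents containing the word."""
    return sum(1 for doc in DOCS if word in doc)


def near_hits(a, b, window):
    """Count documents where a and b occur within `window` positions.

    The window-based definition is an assumption; a real engine may
    define proximity differently.
    """
    count = 0
    for doc in DOCS:
        pos_a = [i for i, w in enumerate(doc) if w == a]
        pos_b = [i for i, w in enumerate(doc) if w == b]
        if any(abs(i - j) <= window for i in pos_a for j in pos_b):
            count += 1
    return count


def near_score(problem, choice, window):
    """near_hits(problem, choice) / hits(choice), or 0.0 if choice is absent."""
    denom = hits(choice)
    return near_hits(problem, choice, window) / denom if denom else 0.0


if __name__ == "__main__":
    for window in (2, 10):
        scores = {c: near_score("big", c, window) for c in ("large", "small")}
        print(window, scores)
```

    In this toy corpus the score for "small" changes with the window
    while the score for "large" does not, so the ranking produced by a
    proximity-based score depends on a choice made inside the engine.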

     

    Open research questions:

     

    One interesting question that I do not think was fully addressed in
    this paper was the synergy between the PMI and the IR algorithms
    used. With the great strides made in IR in the past 5 years, revisiting
    this work might reveal interesting improvements in performance. 

     

    The author cites automated extraction of keywords as his ultimate
    goal. Personally, I feel that in scientific literature the authors
    generally annotate their work with keywords already, and that the
    relatively small amount of literature does not provide much
    motivation for automation. However, the idea of a browser annotating
    webpages with keywords could be helpful for a user. The massive
    number of pages on the web surely motivates automation of the
    process.

     


