Review 3

From: Beltran Ibarra Davila-Armero (bida@cs.washington.edu)
Date: Wed Dec 08 2004 - 02:22:47 PST

    *Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL*

    By Peter D. Turney

    This article describes two techniques for finding synonyms, PMI-IR and
    LSA, and discusses their different results on the TOEFL and ESL tests.

    One of the major ideas is that, until now, programs that looked for
    synonyms often used only small databases, or at least nothing as big as
    the Web. The idea of PMI-IR is that by querying the Web one can find
    synonyms fairly accurately. Of course, the quality of the result depends
    on the quality of the query, but with good queries one can get better
    results than by querying a “small” database (up to 10% better).
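    To make the idea concrete, here is a minimal sketch of the simplest
    score as I understand it: the number of hits for the problem word
    together with a choice, divided by the number of hits for the choice
    alone. The function names, the toy_hits lookup and all the hit counts
    are invented for illustration; a real run would ask a search engine.

```python
from typing import Callable, Dict, List

# Invented hit counts standing in for a real search engine (assumption).
TOY_HITS: Dict[str, int] = {
    "large": 2000, "green": 5000, "slow": 3000, "tiny": 1000,
    "big AND large": 400, "big AND green": 250,
    "big AND slow": 90, "big AND tiny": 100,
}


def toy_hits(query: str) -> int:
    """Hypothetical hit counter: looks the query up in the toy table."""
    return TOY_HITS.get(query, 0)


def pmi_ir_score(problem: str, choice: str, hits: Callable[[str], int]) -> float:
    """Hits for 'problem AND choice' divided by hits for the choice alone."""
    choice_hits = hits(choice)
    if choice_hits == 0:
        return 0.0  # a choice that never occurs cannot be scored
    return hits(f"{problem} AND {choice}") / choice_hits


def best_choice(problem: str, choices: List[str], hits: Callable[[str], int]) -> str:
    """Pick the choice with the highest score."""
    return max(choices, key=lambda c: pmi_ir_score(problem, c, hits))


answer = best_choice("big", ["large", "green", "slow", "tiny"], toy_hits)
```

    Dividing by the hits for the choice alone keeps a very frequent word
    from winning just because it appears everywhere.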

    A cool idea developed in this article is the way it formulates
    different ways of querying the search engine in order to get better
    results. I like the evolution from using only the AND operator to
    combining AND, NEAR, NOT and the context. Although he admits that his
    program uses brute force, he did try to refine it a little.
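    The progression of queries could be sketched roughly as follows. This
    is my reading of the four variants, not the author's code: the function
    names are hypothetical and the exact operator syntax accepted by the
    search engine is an assumption. Only the numerator query of each score
    is built here.

```python
def query_and(problem: str, choice: str) -> str:
    """First variant: the two words appear in the same document."""
    return f"{problem} AND {choice}"


def query_near(problem: str, choice: str) -> str:
    """Second variant: the two words appear close to each other."""
    return f"{problem} NEAR {choice}"


def query_near_not(problem: str, choice: str) -> str:
    """Third variant: also exclude pages where either word is near "not"."""
    return (f'({problem} NEAR {choice}) '
            f'AND NOT (({problem} OR {choice}) NEAR "not")')


def query_near_not_context(problem: str, choice: str, context: str) -> str:
    """Fourth variant: additionally require a context word on the page."""
    return (f'({problem} NEAR {choice}) AND {context} '
            f'AND NOT (({problem} OR {choice}) NEAR "not")')


queries = [
    query_and("big", "large"),
    query_near("big", "large"),
    query_near_not("big", "large"),
    query_near_not_context("big", "large", "house"),
]
```

    Each step narrows the set of matching pages, which is the refinement
    over plain AND that I liked.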

    One of the flaws of this article is the poor experimentation. I have
    taken the TOEFL exam several times (the first time when I was thirteen
    years old) and I, as well as many other people, have always considered
    it an easy test, especially those questions where you have four
    choices (which narrows the search a lot!). I don’t think that testing
    on TOEFL was a good idea. Maybe he did it to test his program on the
    same basis as the LSA technique, but then he should have reduced the
    database to a size comparable to that of the database used by LSA.

    Another flaw is that he does not explain what led him to choose those
    particular scores or how each one improved the results. For example, he
    does not explain the improvement brought by score_3 in terms of
    avoiding antonyms. Some deeper experimentation on this would have been
    welcome.

    Also, the experimentation did not seem to be pushed very far. Eighty
    TOEFL questions do not really prove that one technique is superior to
    another. When one thinks of the number of TOEFL tests available (I
    think it is the most common English exam in the world, so I guess there
    are millions), 80 seems a very small number.
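    The point about sample size can be made concrete with a confidence
    interval. The sketch below computes a 95% Wilson score interval for an
    accuracy measured on 80 questions. The figure of 56 correct answers is
    a hypothetical example, not a result from the paper.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Hypothetical outcome: 56 of 80 questions answered correctly (70%).
low, high = wilson_interval(56, 80)
width = high - low
```

    With only 80 questions the interval comes out roughly twenty
    percentage points wide, so a gap of a few points between two
    techniques could easily be noise.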

    Finally, the end of the article lists a lot of future work in terms of
    improvements. For an AAAI paper, many of those improvements could
    perhaps have been made before publishing it.

    I guess that one of the most obvious open questions is what would happen
    if there were no choices at all. Would this kind of mining still be
    effective? That is a field to explore, since when we need synonyms we
    do not always have a list of candidates to choose from.

    And could this technique be used to find other kinds of related words,
    such as antonyms?

