review-2

From: Pravin Bhat (pravinb@u.washington.edu)
Date: Sun Dec 05 2004 - 23:36:50 PST

    Proverb: The Probabilistic Cruciverbalist
    Keim et al.
    ********************************************************************************
    The paper tackles the "AI-complete" problem of solving crossword puzzles by
    describing an architecture that performs better than the average human on a
    popular class of puzzles, namely New York Times (NYT) style crossword puzzles.

    Paper Strengths:
    The key contribution made by this paper is the learning algorithm used to generate
    candidate word lists for a given clue. This learning algorithm can be thought of
    as a variant of the "ensemble of recognizers" technique, grounded in probability theory.
    The authors manage to build a strong probabilistic recognizer that identifies
    candidate crossword targets for a given clue using an ensemble of weak recognizers.
    Individually, the strongest recognizer in the system, the Dijkstra3 module, recognizes
    the right target as its best candidate with an accuracy of 13.3%. But the merger
    manages to combine the results from all the recognizers to raise the accuracy
    to 40.9%, after which the grid-filling algorithm takes over.
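    As a concrete illustration of the merging idea, the sketch below combines
    per-module candidate lists into one ranked list by taking a weighted mixture of
    their probabilities. This is only a rough approximation in the spirit of the paper,
    not its exact merger; the module names, weights, and clue are invented.

        from collections import defaultdict

        def merge_candidates(module_outputs, module_weights):
            """Weighted mixture of per-module candidate distributions.
            Illustrative sketch only, not the paper's exact merger."""
            merged = defaultdict(float)
            for name, candidates in module_outputs.items():
                weight = module_weights.get(name, 1.0)
                for word, prob in candidates.items():
                    merged[word] += weight * prob
            total = sum(merged.values()) or 1.0   # normalize to a distribution
            return sorted(((w, p / total) for w, p in merged.items()),
                          key=lambda wp: wp[1], reverse=True)

        # Hypothetical outputs for the clue "Nixon, e.g." (all numbers invented).
        outputs = {
            "exact_match": {"PRESIDENT": 0.6, "TRICKYDICK": 0.4},
            "dijkstra3":   {"PRESIDENT": 0.3, "REPUBLICAN": 0.5, "CROOK": 0.2},
        }
        weights = {"exact_match": 0.7, "dijkstra3": 0.3}
        print(merge_candidates(outputs, weights)[:3])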

    The Dijkstra module described in the paper was particularly interesting. Essentially,
    this technique builds an n-gram model over a high-dimensional space (here the number
    of dimensions equals the number of words in the dictionary) using sparse training
    data and limited memory. Given a large enough corpus, the authors could have built
    an accurate n-gram model by simply counting the frequency of word pairs,
    word triplets, etc. Instead, the authors relax this calculation by computing n-grams
    as a summation of individual word-pair bigrams. The bigram probabilities are stored
    in a graph instead of a table to minimize memory consumption. The authors further
    strengthen their n-gram model using the assumption that most phrases in the English
    language continue to make sense when one of their words is replaced by an "equivalent"
    word. To account for such phrase transformations, the authors add synonyms to their
    word pairs during the bigram calculations.
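    A rough sketch of that idea appears below, assuming a sparse adjacency-list bigram
    graph and an externally supplied synonym table; the function and data names are
    hypothetical, and the add-one smoothing is my own simplification rather than the
    paper's formulation.

        import math
        from collections import defaultdict

        # Bigram counts kept as an adjacency list (sparse graph) rather than a
        # dense |V| x |V| table, so memory grows only with observed word pairs.
        bigram_counts = defaultdict(lambda: defaultdict(int))
        unigram_counts = defaultdict(int)

        def observe(sentence, synonyms):
            """Count adjacent word pairs in a training sentence, also counting
            pairs formed by swapping in synonyms (the phrase-transformation
            assumption); the synonym table is assumed to come from elsewhere."""
            words = sentence.lower().split()
            for a, b in zip(words, words[1:]):
                for a2 in {a, *synonyms.get(a, ())}:
                    for b2 in {b, *synonyms.get(b, ())}:
                        bigram_counts[a2][b2] += 1
                        unigram_counts[a2] += 1

        def phrase_score(phrase):
            """Score a candidate phrase as a sum of log bigram probabilities
            (add-one smoothing for unseen pairs), i.e. the 'summation of
            individual word-pair bigrams' described above."""
            words = phrase.lower().split()
            vocab = len(unigram_counts) + 1
            return sum(math.log((bigram_counts[a][b] + 1) /
                                (unigram_counts[a] + vocab))
                       for a, b in zip(words, words[1:]))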

    Paper weaknesses:
    In a sense the paper is disappointing once the authors reveal their techniques. One
    has to wonder whether their system really "solves" crossword puzzles. PROVERB is more
    like the non-Chinese-speaking human in the Chinese Room experiment. PROVERB has a set
    of rules, some of which are encoded using probabilities from prior examples, which are
    used to make a shallow analysis of a given crossword puzzle. The solver's performance
    could drop with the slightest change, say if the current editor of the NYT crossword
    puzzles retires, or if the NYT runs an Independence Day special where all the answers
    are names of ex-presidents.

    On a similar note, the authors mention that after testing their recognizer on an NYT
    dataset they further debugged and modified their expert modules before testing on
    another NYT dataset. While it is reasonable for a learning algorithm to require
    constants/parameters to be hand-tuned for a given data domain, it is unreasonable for
    the learning algorithm to require changes to the algorithm itself to account for
    changes in the domain. In the "modification" stage the authors could have strengthened
    their rule-based expert modules, such as the transformation module, specifically for
    NYT-style crossword puzzles, which would skew their performance results.

    Future work:
    The learning component of PROVERB needs to be further analyzed for its dependence on
    hand-written rules. It would be interesting to see how PROVERB performs on
    non-American crosswords after training on the right dataset, without any changes to
    the rule-based modules. Another simple test would be to see how well PROVERB performs
    on NYT crossword puzzles from several decades ago, or on crossword puzzles from
    a rival newspaper that uses a different style of clues.

    Another area of research would be the analysis of global clue patterns from partially
    generated results. For example, if PROVERB recognized that every candidate list generated
    so far contains the name of an ex-president, it could hypothesize that all targets for the
    current puzzle involve president names. This hypothesis could then be tested against results
    from other clues. Highly probable hypotheses would then be enforced in the grid-filling step.
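    A toy sketch of such a hypothesis test is given below, assuming a hypothetical theme
    predicate (here, membership in a small set of president surnames) and a simple
    frequency threshold; none of this comes from the paper itself.

        def supports_theme(candidate_lists, is_theme_word, threshold=0.9):
            """Return True if nearly every clue's candidate list so far contains
            at least one word satisfying the hypothesized theme predicate."""
            if not candidate_lists:
                return False
            hits = sum(any(is_theme_word(w) for w in candidates)
                       for candidates in candidate_lists)
            return hits / len(candidate_lists) >= threshold

        # Hypothetical usage: test an "all answers are ex-presidents" theme.
        PRESIDENTS = {"LINCOLN", "NIXON", "TRUMAN", "GRANT"}
        lists_so_far = [["LINCOLN", "LOGCABIN"], ["NIXON", "WATERGATE"],
                        ["GRANT", "TOMB"]]
        print(supports_theme(lists_so_far, lambda w: w in PRESIDENTS,
                             threshold=0.8))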

