PROVERB Review

From: Brian Ferris (bdferris@cs.washington.edu)
Date: Mon Dec 06 2004 - 11:44:54 PST

  • Next message: Katarzyna Wilamowska: "(no subject)"

    "PROVERB: The Probabilistic Cruciverbalist"
    Keim, G. A.; Shazeer, N.; Littman, M. L.; Agarwal, S.; Cheves, C. M.;
    Fitzgerald, J.; Grosland, J.; Jiang, F.; Pollard, S.; and Weinmeister,
    K.

    PROVERB is a probabilistic system for solving cross-word puzzles that
    aggregates the results of multiple solver models, each using a
    different approach, with implicit grid constraints to produce solutions
    that are, on average, better than a regular human solver but not
    competitive at the championship level.

    One of the first key ideas in the paper is the importance of previous
    training data learned from the existing crossword database (CWDB).
    When researchers removed all modules that relied on the CWDB in any
    way, the average percentage of words correct dropped from 94.8% to
    27.1%. A much smaller drop to 87.6% was observed when CWDB modules
    were used exclusively. Additionally, the PROVERB(I) system, which used
    implicit letter distributions learned from the CWDB training set,
    performed with higher accuracy than vanilla PROVERB on the tournament
    puzzle set, though a cost of increased runtime. These results are
    important in demonstrating that the key solving competence of the
    PROVERB system comes from data learned from existing puzzles and not
    other non-CWDB-based information retrieval and database modules.

    Another important idea from the paper was techniques for merging the
    variety of individual solver module solution candidates into a cohesive
    waited solution candidate set. As the author points out, such a task
    has relevance in meta-search engine result combination as well.
    Their technique successfully merges candidates from a given module m,
    by re-normalizing the probabilities and confidences using the scale(m),
    length-scale(m) and spread(m). These parameters give explicit control
    over how solver module data is incorporated into the final candidate
    list, with optimal settings being set using hill-climbing.

    One of the major flaws with the paper is the authors' failure to
    further justify the relevancy of the work. While there is a certain
    novelty to solving crossword puzzles with an AI system, such novelty is
    usually fortified with novel techniques that result from the
    exploration of the problem. The reliance of the system on the existing
    CWDB and the authors' acknowledgment that the results would have been
    possible without the recent increases in computing power suggests that
    the authors have constructed a large expert system that was previously
    not possible due to memory and processing constraints. As demonstrated
    by the tournament results, the solver is only as good as the puzzles it
    has been trained to solve. Specifically, most of the solver modules
    are hand-designed for solving specific clue types, making the system in
    general susceptible to new and novel clues. While the system is an
    impressive combination of existing techniques, its lack of new
    techniques and general robustness suggests that is not a system of
    lasting importance.

    Further research could potentially explore ways to increase the
    performance of non-CWDB solver modules in the system. As discussed
    previously, PROVERB is highly reliant on training data from existing
    puzzles, somewhat to its detriment. If the techniques from natural
    language processing and knowledge representation / inference could be
    pushed hard to apply to this realm, the solver would evolve from a
    large expert system to a solver that is actually reasoning and more
    robust as result. Additional research could also be done in
    quantifying crossword puzzle difficulty. Such research could be useful
    for both quantifying how well solvers and doing but also assist in the
    automated generation of new puzzles, potentially making them tricky for
    man and machine alike.


  • Next message: Katarzyna Wilamowska: "(no subject)"

    This archive was generated by hypermail 2.1.6 : Mon Dec 06 2004 - 11:43:34 PST