PROVERB review

From: Alan L. Liu (aliu_at_cs.washington.edu)
Date: Mon Dec 08 2003 - 00:02:44 PST

  • Next message: Russell Power: "Review: PROVERB - The Probablistic Cruciverbalist"

    Paper title: PROVERB: The Probabilistic Cruciverbalist
    Authors: Greg A. Keim, Noam M. Shazeer, Michael L. Littman et al.

    Summary: PROVERB combines a large crossword puzzle database with "expert
    modules" that generate probability distributions over word candidates.
    It fills out crossword puzzles by merging the distributions and
    selecting the most probable words.

    Most important ideas:
    There are many techniques for approaching the problem of crossword
    puzzle solving. Instead of picking one, build a module for each one
    and then learn how to weight their outputs, MetaCrawler fashion.
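
    To make the "weight their outputs" idea concrete, here is a minimal
    sketch of a weighted mixture over candidate words. It is not the
    paper's merger (which has its own scale, length-scale, and spread
    parameters); the function name, module names, weights, and
    probabilities below are all made up for illustration.

```python
from collections import defaultdict


def merge_candidates(module_outputs, weights):
    """Linearly mix per-module candidate distributions, then renormalize.

    module_outputs: {module_name: {word: probability}}  (hypothetical data)
    weights:        {module_name: weight}; assumed here to be given,
                    whereas a real system would learn them.
    """
    merged = defaultdict(float)
    for name, dist in module_outputs.items():
        w = weights.get(name, 0.0)  # modules without a weight are ignored
        for word, p in dist.items():
            merged[word] += w * p
    total = sum(merged.values())
    if total == 0:
        return {}
    return {word: score / total for word, score in merged.items()}


# Illustrative numbers only, for a single four-letter slot.
outputs = {
    "cwdb": {"OREO": 0.7, "OLEO": 0.3},
    "ir": {"OREO": 0.4, "OBOE": 0.6},
}
weights = {"cwdb": 0.75, "ir": 0.25}

merged = merge_candidates(outputs, weights)
best = max(merged, key=merged.get)
print(best, merged)
```

    A linear mixture is the simplest choice; it lets a confident module
    dominate a slot while the others still contribute candidates it
    never proposed.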

    The authors point out that solving crosswords involves using information
    that is readily available. PROVERB uses information retrieval techniques
    to wrangle a plethora of information found in encyclopedias,
    dictionaries, and thesauri to produce probability distributions over
    candidate words.

    Flaws:
    Where did the scale, length-scale, and spread parameters for merging
    expert modules come from, and what do they mean?

    Using just the crossword database (CWDB) modules, the authors were able
    to guess the correct word 87.6% of the time; adding the syntactic,
    database, and information retrieval modules raised that to 94.8%. In
    other words, those techniques solved only a little over half (7.2 of
    12.4 percentage points) of the words that the CWDB modules couldn't get
    by themselves. It seems clear that even something simple like the
    syntactic modules would boost the success rate, because CWDB can't
    solve clues it hasn't seen before, so how are the other modules
    actually helping? The graph they have doesn't isolate the effectiveness
    of each module; it only shows the cumulative effect of adding modules
    one group at a time. From that graph, one cannot tell whether the
    information retrieval modules alone could have accounted for the
    improvement, without any of the other non-CWDB modules. Those
    non-CWDB modules also get a lowly 27.1% word-solving rate without CWDB
    -- is that good, considering that not all problems come with a corpus
    of previously solved problems and solutions?
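
    For reference, the "half the problems" figure is just arithmetic on
    the two accuracies quoted above. A quick sketch (the function name
    is mine, not the paper's):

```python
def fraction_of_misses_recovered(baseline_pct, improved_pct):
    """Share of the baseline's missed words that the improved system gets."""
    missed_before = 100.0 - baseline_pct
    missed_after = 100.0 - improved_pct
    return (missed_before - missed_after) / missed_before


# 87.6% with the CWDB modules alone, 94.8% with all modules added.
share = fraction_of_misses_recovered(87.6, 94.8)
print(f"{share:.1%} of the CWDB-only misses were recovered")
```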

    Open research questions:
    How can solving crossword puzzles be extended to other, more useful
    domains? Using IR is fascinating -- (but) it reminds me of the Cyc
    philosophy of using available knowledge to do more informed search. The
    rules for guessing words from crossword clues are more structured than
    general queries, and the data that PROVERB uses is more structured too,
    so turning it loose on the biggest source of information -- the
    Internet -- would probably not work too well (or would it?).


