PROVERB Review

From: Martha Mercaldi (mercaldi@cs.washington.edu)
Date: Sat Dec 04 2004 - 22:56:08 PST


    PROVERB: The Probabilistic Cruciverbalist

    Greg A. Keim, Noam M. Shazeer, Michael L. Littman

     

    Summary:

    This paper presents the design, architecture, and evaluation of
    PROVERB, a crossword-puzzle-solving system.

    Two primary ideas:

    Most of the words in this paper were spent describing the architecture
    and various ways software might try to solve a crossword (approximating
    solver experience via a database, classifying clue types, identifying
    and describing several solution techniques). I presume this was the
    first paper on the project, and that is the reason for such detail.
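
    To make the idea of multiple solution techniques concrete, here is a
    toy sketch of my own (the module names, clue examples, and weights
    are all invented; these are not the paper's actual experts): each
    technique proposes weighted candidates for a clue, and downstream
    code only sees the pooled list.

    def database_expert(clue, length):
        """Pretend lookup of near-identical clues seen in past puzzles."""
        past = {("Nome is here", 6): [("ALASKA", 0.8)]}
        return past.get((clue, length), [])

    def synonym_expert(clue, length):
        """Pretend thesaurus-style guesser for single-word clues."""
        synonyms = {"cold": ["ICY", "GELID", "ALOOF"]}
        words = synonyms.get(clue.lower(), [])
        return [(w, 1.0 / len(words)) for w in words if len(w) == length]

    EXPERTS = [database_expert, synonym_expert]

    def candidates(clue, length):
        """Pool every technique's weighted suggestions for one clue."""
        pooled = {}
        for expert in EXPERTS:
            for word, weight in expert(clue, length):
                pooled[word] = pooled.get(word, 0.0) + weight
        return sorted(pooled.items(), key=lambda kv: -kv[1])

    print(candidates("Nome is here", 6))  # [('ALASKA', 0.8)]
    print(candidates("Cold", 5))          # [('GELID', 0.33...), ('ALOOF', 0.33...)]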

    The second half of the paper evaluated the solver’s performance on a
    set of “normal” puzzles as well as in tournament play. The authors
    identify some clear successes, in particular the use of probabilistic
    reasoning when merging proposed targets to fill the grid. I was
    disappointed that this analysis didn’t go further: in addition to
    discussing the clues the program solved correctly, it could also have
    investigated the clues that stumped it.
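
    To illustrate the flavor of that probabilistic merging (this is just
    my own minimal sketch on a made-up two-word grid, not the algorithm
    the paper actually uses): pick the joint fill that maximizes the
    product of candidate probabilities subject to the crossing-square
    constraints.

    from itertools import product

    # Invented two-slot grid: 1-Across and 1-Down share their first square,
    # so the crossing constraint is across[0] == down[0].
    candidates = {
        "1-Across": {"ALASKA": 0.6, "ATHENS": 0.4},
        "1-Down":   {"APPLE": 0.7, "OLIVE": 0.3},
    }

    def consistent(across, down):
        return across[0] == down[0]

    best_fill, best_score = None, 0.0
    for (a, pa), (d, pd) in product(candidates["1-Across"].items(),
                                    candidates["1-Down"].items()):
        if consistent(a, d):
            score = pa * pd            # treat the clue lists as independent
            if score > best_score:
                best_fill, best_score = (a, d), score

    print(best_fill, round(best_score, 2))   # ('ALASKA', 'APPLE') 0.42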

    One or two largest flaws:

    A friend of mine who contributes crosswords to the NYT and has read
    this paper explained that PROVERB’s poor performance at the tournament
    was due primarily to one puzzle on which it failed miserably. In that
    puzzle, all of the clues were “Spoonerized” (e.g. “Home is near” was
    given instead of “Nome is here” as a clue for “Alaska”). Humans
    (extremely clever ones, anyway) can detect this, but the software could
    not. I think this is very interesting both anecdotally and in that it
    reveals something about the flaws of the system. The common opinion is
    that if the software had competed in any other year it would have made
    a better showing. This leads to a criticism I had of the paper even
    before learning the full story. I had been looking forward to an
    investigation of what types of clues tripped up the program and why the
    authors felt it failed.   

    This is a smaller complaint, but I personally would have liked to see
    where in Table 2 (characterizing NYT puzzle difficulty by day) the
    tournament puzzles fit. The assumption is that they were even more
    difficult than the Saturday puzzles at the NYT, but it would have been
    nice to show this. 

    Open research questions:

    As evidenced by my earlier comments, I am very interested in the
    “failure analysis” for this program. I think studying the clues that
    stumped the existing system would, in the least interesting case,
    reveal a shortcoming of the program and, in the most interesting
    case, reveal something about how humans solve puzzles. This might
    produce a meaty challenge for the AI community to try to incorporate
    some other aspect of human thinking.

    This paper talks about “expert” modules, and it would be interesting to
    consult the real experts (the humans who beat it at the tournament) to
    see how they classify and solve clues in their minds. In the current
    architecture the clue-answering and merging steps are separated, but I
    know that when I solve these puzzles the steps are interleaved, with
    each half providing me some information to help with the other.
    Perhaps looping between the two steps would improve performance; a
    rough sketch of what I mean follows.
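
    To show the kind of loop I have in mind (again a toy sketch of my
    own, with an invented grid and a crude re-weighting rule, not
    anything from the paper): alternate between committing to the
    current best word per slot and re-weighting every candidate by how
    well it agrees with the letters implied by the other slots.

    cands = {
        "A": {"ALASKA": 0.45, "ITALIA": 0.55},   # clue module slightly prefers ITALIA
        "D": {"APPLE": 0.9, "OLIVE": 0.1},
    }
    crossings = [("A", 0, "D", 0)]               # square shared by A[0] and D[0]

    for _ in range(5):                            # a few rounds of interleaving
        # Step 1 ("merge"): commit to the current best word in each slot.
        best = {slot: max(c, key=c.get) for slot, c in cands.items()}
        # Step 2: feed the implied crossing letters back into the weights.
        for s1, i1, s2, i2 in crossings:
            for slot, idx, other, oidx in ((s1, i1, s2, i2), (s2, i2, s1, i1)):
                letter = best[other][oidx]
                for w in cands[slot]:
                    if w[idx] != letter:
                        cands[slot][w] *= 0.5     # penalize disagreement
                total = sum(cands[slot].values())
                for w in cands[slot]:
                    cands[slot][w] /= total       # renormalize

    print({slot: max(c, key=c.get) for slot, c in cands.items()})
    # APPLE's strong 'A' pulls ALASKA ahead of ITALIA:
    # {'A': 'ALASKA', 'D': 'APPLE'}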

     

     

