Review 2

From: Stef Sch... (stef@cs.washington.edu)
Date: Mon Dec 06 2004 - 12:17:47 PST


Note: I apologize for the lateness of this submission. I was out of
town all weekend, and this is the first access I've had in several
days.

Title: PROVERB: The Probabilistic Cruciverbalist
Main Authors: Greg A. Keim, Noam M. Shazeer, Michael L. Littman
Additional Authors: Sushant Agarwal, Catherine M. Cheves, Joseph
Fitzgerald, Jason Grosland, Fan Jiang, Shannon Pollard, Karl
Weinmeister

This paper poses crossword solving, from simple puzzles to the
difficult New York Times ones, as an AI problem. The authors use
several techniques to generate candidate answers for each clue, along
with a probability model to weigh how likely each answer is. They then
attempt to solve the puzzle by treating the crossword grid and the
candidate answers as a set of constraints.
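
To make that framing concrete, here is a minimal Python sketch of my
own (not the paper's actual implementation): it scores a complete fill
from per-slot candidate probabilities, assuming slot independence for
simplicity, and checks that crossing slots agree. The slots, answers,
and probabilities are all invented.

    import math

    # Hypothetical per-slot candidate lists: answer -> probability.
    candidates = {
        "1-Across": {"ERA": 0.6, "AGE": 0.3, "EON": 0.1},
        "1-Down":   {"EAR": 0.5, "EEL": 0.5},
    }

    def fill_score(assignment, candidates):
        """Score a complete fill (slot -> answer) as a sum of
        log-probabilities, assuming slots are independent."""
        return sum(math.log(candidates[slot][ans])
                   for slot, ans in assignment.items())

    def consistent(assignment, crossings):
        """Check that crossing slots agree on their shared letter.
        crossings: list of (slot_a, index_a, slot_b, index_b)."""
        return all(assignment[a][i] == assignment[b][j]
                   for a, i, b, j in crossings)

    # 1-Across and 1-Down share their first letter.
    crossings = [("1-Across", 0, "1-Down", 0)]
    fill = {"1-Across": "ERA", "1-Down": "EAR"}
    if consistent(fill, crossings):
        print(fill_score(fill, candidates))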

Strengths:
One of the most important contributions of this paper was their
analysis of the crossword domain, from clue types to analyzing what
makes a crossword difficult (such as the NY Times crosswords towards
the end of the week). They also observed that most clues fall within a
certain set of categories, and based on that, they can exploit previous
puzzles in order to find solutions to future ones.

They used a very large set of heuristics for generating candidate
answers, thus exploiting a wide range of general and domain-specific
knowledge. The "Dijkstra" modules seemed particularly useful for
generating previously unseen candidates, since they weight the
likelihood based on "document frequency", which should boost unknown
answers from the wordlist and wordlist-big generators as appropriate.
Furthermore, by using this variety of heuristics and doing a weighted
combination of them, the candidate answers from the domain-specific
modules (such as music or geography) would be boosted and more likely
to be considered. Additionally, the transformation heuristics were a
good trick that inherently expands the knowledge contained in the
CWDB.
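
As a rough sketch of how such a weighted merge could work, here is a
simple mixture in Python; the module names, weights, and confidences
are invented, and the paper's actual combination scheme is more
involved than this.

    from collections import defaultdict

    def merge_candidates(module_outputs, module_weights):
        """Combine per-module candidate lists into one distribution.
        module_outputs: {module: {answer: confidence}}. A weighted
        mixture of the module outputs, renormalized at the end."""
        combined = defaultdict(float)
        for module, answers in module_outputs.items():
            w = module_weights.get(module, 0.0)
            for answer, conf in answers.items():
                combined[answer] += w * conf
        total = sum(combined.values())
        return {a: s / total for a, s in combined.items()}

    # Hypothetical outputs for the clue "Composer Copland".
    outputs = {
        "music":    {"AARON": 0.9},
        "wordlist": {"AARON": 0.1, "ARLEN": 0.1},
    }
    weights = {"music": 0.7, "wordlist": 0.3}
    print(merge_candidates(outputs, weights))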

They also provided a good overall explanation of how their system
works, how each of the generators produces candidates, and how the
candidates from each module are combined into a unified candidate
list.

Weaknesses:
The authors don't provide much analysis of their system. They go to
great lengths to describe the setup, each of the different models, and
how the models are combined, but they only report how successful the
system was over the test set. With the wordlist and wordlist-big
generators, I would assume that enough of the correct targets can be
successfully generated that the one or two that are missed (on
average) could be compensated for. Namely, the other clues should
provide enough constraints that even an unknown multi-word target
should be inferable. It would be interesting to see how and why the
solver fails when it does.
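
To illustrate the point about constraints, here is a toy Python sketch
of my own: if every letter of an unknown slot is crossed by a solved
answer, the target is forced even though it never appeared in any
candidate list. The grid and words are invented.

    def letters_from_crossings(slot_len, crossings, solved):
        """Reconstruct an unknown slot purely from crossing words.
        crossings: {index_in_slot: (crossing_slot, index_there)}.
        Returns the implied string, '?' where no crossing exists."""
        letters = ["?"] * slot_len
        for i, (slot, j) in crossings.items():
            letters[i] = solved[slot][j]
        return "".join(letters)

    # Every letter of an unknown 5-letter across slot is crossed by
    # the first letter of a solved down answer.
    solved = {"1D": "IGLOO", "2D": "RADIO", "3D": "ATLAS",
              "4D": "TIGER", "5D": "EAGLE"}
    crossings = {0: ("1D", 0), 1: ("2D", 0), 2: ("3D", 0),
                 3: ("4D", 0), 4: ("5D", 0)}
    print(letters_from_crossings(5, crossings, solved))  # IRATE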

They also didn't explain why they chose a hill-climbing method to
decide the weights. They didn't mention whether they used randomized
restarts or how they searched the space to ensure that they weren't
stuck in a local maximum. They had sufficient data that they could
have used a variety of methods. I'd also be curious to see whether
performance improved using some form of stacking, bagging, boosting,
etc. (since they had a very large data set, this would be a viable
test).
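
For reference, here is a minimal sketch of random-restart hill
climbing over a weight vector; the objective score(), the step size,
and the restart counts are all assumptions of mine, not the authors'
actual procedure.

    import random

    def hill_climb(score, n_weights, restarts=10, steps=200, step=0.05):
        """Random-restart hill climbing over a weight vector in [0,1].
        score: callable mapping a weight list to a number to maximize
        (e.g., accuracy of the combined model on held-out clues)."""
        best, best_score = None, float("-inf")
        for _ in range(restarts):
            w = [random.random() for _ in range(n_weights)]
            current = score(w)
            for _ in range(steps):
                i = random.randrange(n_weights)
                cand = list(w)
                cand[i] = min(1.0, max(0.0,
                              cand[i] + random.uniform(-step, step)))
                s = score(cand)
                if s > current:  # accept only improvements
                    w, current = cand, s
            if current > best_score:
                best, best_score = w, current
        return best, best_score

    # Toy objective with a known optimum near w = [0.7, 0.3].
    toy = lambda w: -((w[0] - 0.7) ** 2 + (w[1] - 0.3) ** 2)
    print(hill_climb(toy, n_weights=2))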

Open research questions:
How would this do with larger databases, or even access to the web?
With search engines such as Google indexing over 8 billion pages, the
web should provide enough text to glean answers for many clues.

Additionally, how difficult would it be to modify the solver to more
closely reflect the ACPT puzzle format, and how would it compare to
expert solvers if it were so modified?


