From: Brian Ferris (bdferris@cs.washington.edu)
Date: Mon Dec 06 2004 - 11:44:54 PST
"PROVERB: The Probabilistic Cruciverbalist"
Keim, G. A.; Shazeer, N.; Littman, M. L.; Agarwal, S.; Cheves, C. M.;
Fitzgerald, J.; Grosland, J.; Jiang, F.; Pollard, S.; and Weinmeister,
K.
PROVERB is a probabilistic system for solving cross-word puzzles that
aggregates the results of multiple solver models, each using a
different approach, with implicit grid constraints to produce solutions
that are, on average, better than a regular human solver but not
competitive at the championship level.
One of the first key ideas in the paper is the importance of previous
training data learned from the existing crossword database (CWDB).
When researchers removed all modules that relied on the CWDB in any
way, the average percentage of words correct dropped from 94.8% to
27.1%. A much smaller drop to 87.6% was observed when CWDB modules
were used exclusively. Additionally, the PROVERB(I) system, which used
implicit letter distributions learned from the CWDB training set,
performed with higher accuracy than vanilla PROVERB on the tournament
puzzle set, though a cost of increased runtime. These results are
important in demonstrating that the key solving competence of the
PROVERB system comes from data learned from existing puzzles and not
other non-CWDB-based information retrieval and database modules.
Another important idea from the paper was techniques for merging the
variety of individual solver module solution candidates into a cohesive
waited solution candidate set. As the author points out, such a task
has relevance in meta-search engine result combination as well.
Their technique successfully merges candidates from a given module m,
by re-normalizing the probabilities and confidences using the scale(m),
length-scale(m) and spread(m). These parameters give explicit control
over how solver module data is incorporated into the final candidate
list, with optimal settings being set using hill-climbing.
One of the major flaws with the paper is the authors' failure to
further justify the relevancy of the work. While there is a certain
novelty to solving crossword puzzles with an AI system, such novelty is
usually fortified with novel techniques that result from the
exploration of the problem. The reliance of the system on the existing
CWDB and the authors' acknowledgment that the results would have been
possible without the recent increases in computing power suggests that
the authors have constructed a large expert system that was previously
not possible due to memory and processing constraints. As demonstrated
by the tournament results, the solver is only as good as the puzzles it
has been trained to solve. Specifically, most of the solver modules
are hand-designed for solving specific clue types, making the system in
general susceptible to new and novel clues. While the system is an
impressive combination of existing techniques, its lack of new
techniques and general robustness suggests that is not a system of
lasting importance.
Further research could potentially explore ways to increase the
performance of non-CWDB solver modules in the system. As discussed
previously, PROVERB is highly reliant on training data from existing
puzzles, somewhat to its detriment. If the techniques from natural
language processing and knowledge representation / inference could be
pushed hard to apply to this realm, the solver would evolve from a
large expert system to a solver that is actually reasoning and more
robust as result. Additional research could also be done in
quantifying crossword puzzle difficulty. Such research could be useful
for both quantifying how well solvers and doing but also assist in the
automated generation of new puzzles, potentially making them tricky for
man and machine alike.
This archive was generated by hypermail 2.1.6 : Mon Dec 06 2004 - 11:43:34 PST