From: Alan L. Liu (aliu_at_cs.washington.edu)
Date: Mon Dec 08 2003 - 00:02:44 PST
Paper title: PROVERB: The Probabilistic Cruciverbalist
Authors: Greg A. Keim, Noam M. Shazeer, Michael L. Littman et al.
Summary: PROVERB combines a large crossword puzzle database with "expert
modules" that generate probability distributions over word candidates.
It fills out crossword puzzles by merging the distributions and
selecting the most probable words.
Most important ideas:
There are a bunch of techniques to approach the problem of crossword
puzzle solving. Instead of picking one, make a module that does each one
and then learn how to weight their outputs, metacrawler fashion.
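As a rough illustration of the "weight their outputs" idea, here is a
minimal sketch that merges per-module candidate distributions with a
plain weighted mixture. This is not PROVERB's actual merger (which, as
noted under Flaws, has scale, length-scale, and spread parameters); the
function name, the module outputs, and the weights are all made up for
the example, and the step that learns the weights is not shown.

```python
from collections import defaultdict


def merge_distributions(module_outputs, weights):
    """Weighted linear mixture of per-module candidate distributions.

    module_outputs: list of dicts mapping candidate word -> probability.
    weights: one non-negative weight per module (hypothetical values here;
             a real system would learn them from solved puzzles).
    """
    if len(module_outputs) != len(weights):
        raise ValueError("need exactly one weight per module")

    merged = defaultdict(float)
    for dist, weight in zip(module_outputs, weights):
        for word, prob in dist.items():
            merged[word] += weight * prob

    total = sum(merged.values())
    if total == 0:
        return {}
    # Renormalize so the merged scores form a probability distribution.
    return {word: score / total for word, score in merged.items()}


# Hypothetical outputs of two modules for a single four-letter clue.
cwdb_module = {"ERIE": 0.7, "OHIO": 0.3}
ir_module = {"ERIE": 0.4, "UTAH": 0.6}
weights = [0.75, 0.25]  # assumed weights, not taken from the paper

merged = merge_distributions([cwdb_module, ir_module], weights)
best = max(merged, key=merged.get)
print(best, round(merged[best], 3))
```

A candidate that several modules agree on accumulates weight from each
of them, which is the point of combining modules rather than picking one.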
The authors point out that solving crosswords involves using information
that is readily available. PROVERB uses information retrieval techniques
to wrangle a plethora of information found in encyclopedias,
dictionaries, and thesauri to produce probability distributions over
candidate words.
Flaws:
Where did the scale, length-scale, and spread parameters for merging
expert modules come from, and what do they mean?
Just using the crossword database, the authors were able to guess
correct words 87.6% of the time; adding the syntactic, database, and
information retrieval modules increased that to 94.8%. That means
those techniques solved only a little more than half of the problems
that the CWDB modules couldn't solve by themselves. It seems clear that
even something simple like the syntactic modules would boost the success
rate, because CWDB can't solve problems it hasn't seen before, so how
are the other modules actually helping? The graph they have doesn't
isolate the effectiveness of each module; it only heaps them together to
show the effect of cumulatively adding modules. From that graph, one
cannot tell whether the information retrieval modules alone could have
accounted for the improvement, without any of the other non-CWDB
modules! Those non-CWDB modules also get a lowly 27.1% word-solving rate
without CWDB -- is that good, considering that not all problems come
with a corpus of problems and solutions?
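The claim that the extra modules recover roughly half of what CWDB
misses can be checked from the two quoted rates. A quick sketch,
assuming both percentages are measured over the same set of words (the
variable names are mine):

```python
cwdb_only = 87.6    # % of words correct using CWDB modules alone
all_modules = 94.8  # % correct after adding the other modules

missed_by_cwdb = 100.0 - cwdb_only    # words CWDB alone gets wrong
recovered = all_modules - cwdb_only   # of those, fixed by the additions
fraction_recovered = recovered / missed_by_cwdb

print(f"{fraction_recovered:.1%}")  # prints 58.1%
```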
Open research questions:
How can solving crossword puzzles be extended to other, more useful
domains? Using IR is fascinating -- (but) it reminds me of the Cyc
philosophy of using available knowledge to do more informed search. The
rules for guessing words from crossword clues are more structured than
general queries, and the data that PROVERB uses is more structured too,
so turning it onto the biggest source of information -- the
Internet -- would probably not work too well (or would it?).