Review of "PROVERB: The Probabilistic Cruciverbalist - G. A. Keim et. al."
Harsha V. Madhyastha

This paper presents a system called PROVERB which has been developed to solve
crossword puzzles. The basic idea behind the system is to adopt a two-tier
approach. First, the crossword puzzle is solved using a wide variety of expert 
modules which specialize in a range of domains such as information retrieval, 
probabilistic reasoning, etc. Each expert module associates a probability with 
every answer it proposes. These probabilistic list of answers for each clue 
from every expert module are then merged by another module to produce the
final solution. The objective employed in merging these lists is to maximum
the number of clues correctly solved - the reasoning behind this being that
this is the scoring criterion in human crossword tournaments.

As this problem had never been addressed before this work, the biggest
contribution of this paper is that it shows that a system can be built to
solve crossword. Moreover, in spite of this being the first attempt at building
a crossword solver, the system obtains extremely high accuracy. This paper
clearly provides motivation for other researchers to further explore this
problem. Given the design choice made for the system by the authors, the other 
major strength of the paper is the wide range of expert modules it explores.
30 different expert modules were built which not only used several data
sources but also employed varying techniques such as latent semantic indexing
and n-grams.

However, though the paper describes so many varied expert modules, it does
just that. It only briefly touches upon how each expert module works and there
is hardly any analysis comparing their relative performance. Finally, the
reader is left clueless which expert module works well for what types of clues
and why. An aspect of the system that I fear about is its over-dependence on
the crossword database. In fact the difference in accuracy shown by the
all-or-nothing analysis clearly highlights this over-dependence. Though the
authors defend this approach by the fact that most human crossword solvers
also become experts based on experience, this limits the system's
applicability across crosswords from different sources.

Hence, I believe one of the important questions to be addressed is whether a
system to solve crosswords can be built without being so overly dependent on
access to similar historical crosswords. Do there exist some rules that can be
learnt that are valid across different crossword sources? Another interesting 
research question to be addressed is whether a system can be built to solve 
British-style cryptic crosswords, instead of the highly simplistic style of 
crosswords addressed in this paper. The reason I find this interesting is that 
unlike in the easy crosswords accounted for by PROVERB, the clues in cryptic 
crosswords have more information as each clue usually has two different ways of 
getting at the answer. So, in fact, cryptic crosswords might be easier to tame 
using machine power!