Processing XML Streams with Deterministic
Automata
Review by Lucas Kreger-Stickles
In their paper the authors present a system for
evaluating XPath expressions through the use of DFAs. The authors
point out that while in the worst case DFAs for XPath expressions grow
exponentially, this worst case is not often encountered. They indicate
that much of this exponential growth can be atributed to * and //
operations in the XPath expressions. The authors then go on to
describe how the use of a DFA which is constructed 'lazily', improves
the practicality of such a technique.
The authors then describe the mechanisms of their system. First
a set of querries are given and turned into a querry tree (which
consisits of many XPath instructions), then this tree is converted into
an NFA which is finally converted into a DFA.
The authors indicate that processing XPath with DFAs is quite
efficent but that it is the size that can cause problems. It is in
this context that the authors compare and contracst eager and lazy
DFA construction. While an eager DFA is constructed at the begining of
compuation a lazy DFA is constructed on the fly as needed. The hope is
that the vast majority of states int eh DFA will never need to be
evaluated, thus keepign the size of the DFA in check.
Finaly, the authors conducted limited experiments to validate their
technology and which showed a 'warm-up' phase (durring which the lazy
DFA was adding states quite frequently) followed by an extended period
of high troughput.