From: Neva Cherniavsky (nchernia@cs.washington.edu)
Date: Wed May 05 2004 - 08:45:12 PDT
This paper describes a method for processing XML streams by converting all
XPath expressions to DFAs. Previously, it was known that the number of
states in the DFA is exponential. The authors show two types of DFA
evaluation, eager and lazy, and show that the former is exponential but
the latter is not.
The eager DFA evaluation is also shown to not always be exponential;
multiple "*"s cause it to be exponential, which occurs infrequently in
practice. But the "//" construct also causes it to be exponential, which
is a much bigger problem. Lazy evaluation means that the DFA states are
computed as needed, and the authors were able to theoretically bound the
size of the lazy DFA. It is polynomial in the number of states, though it
depends on the graph schema or DTD for this; a pathologically poor design
could cause the lazy DFA to have an exponential number of states.
The authors then validated their method by running it on XML data, both
real and synthetic. They did better on the synthetic data. More
discussion about why they ran out of memory on one of the synthetic data
items would have been useful.
Overall, I thought this was a great paper. The authors took two known
ideas, DFAs for XML and lazy DFA evaluation, and synthesized them in a
beautiful way. The theoretical results are good, and give future
designers guidance on how to built XML schema that can be processed
quickly in a stream setting.
This archive was generated by hypermail 2.1.6 : Wed May 05 2004 - 08:45:13 PDT