From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Wed May 05 2004 - 12:00:51 PDT
Paper: Processing XML Streams with Deterministic Automata
Summary: The paper proves the feasibility of using eager/lazy DFA
construction in practice
for XML stream processing. It bounds the number of states DFA could have
in the worst case in respect to the number of XPath expressions to be
evaluated
on the XML stream.
Main Ideas:
- For eager DFA the number of "*" is the only source of exponential blowup
in the case of a single, linear XPath expression. For multiple expressions
more important is the number of expressions containing "//" and it is
indeed the source of the exponential growth.
- Whereas, query trees converted to DFAs and used for XML stream filtering
do not support predicates directly, it could be worked around easily
by decomposing them into several XPath expressions.
- Eager DFA has no more than k(1+n) states in the case with no wild-cards
and there is an exponential upper bound in the case with when
there are wild-cards.
- The extra exponential blowup in the case with eager DFA and multiple
XPath expressions is exactly the number of expressions with "//".
- In the case with lazy DFA it is surprising that the number of states
does not depend on the number of expressions at all, but depends on their
depth.
- The number of states in lazy DFA is limited by 1+G where G is a data guide
of an XML stream.
Relevance:
I think it is an interesting and useful work. As Web-Services spread more
widely, therefore making XML streams essential for data exchange, fast XML
stream
processing will become one of the main issues.
This archive was generated by hypermail 2.1.6 : Wed May 05 2004 - 12:00:57 PDT