Processing XML Streams with Deterministic Automata

From: Neva Cherniavsky (nchernia@cs.washington.edu)
Date: Wed May 05 2004 - 08:45:12 PDT

  • Next message: Steven Balensiefer: "Lazy DFAs for XML processing"

    This paper describes a method for processing XML streams by converting all
    XPath expressions to DFAs. Previously, it was known that the number of
    states in the DFA is exponential. The authors show two types of DFA
    evaluation, eager and lazy, and show that the former is exponential but
    the latter is not.

    The eager DFA evaluation is also shown to not always be exponential;
    multiple "*"s cause it to be exponential, which occurs infrequently in
    practice. But the "//" construct also causes it to be exponential, which
    is a much bigger problem. Lazy evaluation means that the DFA states are
    computed as needed, and the authors were able to theoretically bound the
    size of the lazy DFA. It is polynomial in the number of states, though it
    depends on the graph schema or DTD for this; a pathologically poor design
    could cause the lazy DFA to have an exponential number of states.

    The authors then validated their method by running it on XML data, both
    real and synthetic. They did better on the synthetic data. More
    discussion about why they ran out of memory on one of the synthetic data
    items would have been useful.

    Overall, I thought this was a great paper. The authors took two known
    ideas, DFAs for XML and lazy DFA evaluation, and synthesized them in a
    beautiful way. The theoretical results are good, and give future
    designers guidance on how to built XML schema that can be processed
    quickly in a stream setting.


  • Next message: Steven Balensiefer: "Lazy DFAs for XML processing"

    This archive was generated by hypermail 2.1.6 : Wed May 05 2004 - 08:45:13 PDT