Paper 6 review

From: Bhushan Mandhani (bhushan@cs.washington.edu)
Date: Wed May 05 2004 - 11:22:33 PDT

  • Next message: Lucas Kreger-Stickles: "Review: Processing XML Streams with Deterministic Automata"

            Processing XML Streams with Deterministic Automata

    The paper proposes a solution to the problem of applying a large number of
    XPath expressions to a XML stream. A single DFA is constructed, and then
    applied to the XML stream. The main issue here is, this was not thought
    possible because of an exponential blowup in the number of DFA states with
    the number of expressions.

    The paper starts out by showing upper bounds for the number of states. For
    the eager DFA, the bound is exponential in the number of expressions
    containing the descendant axis, and thus eager DFA's are ruled out as
    unfeasible. For the lazy DFA's, bounds are separately given for the cases
    when the data is characterized by a simple graph schema or a data guide.
    These bounds convincingly establish the feasibility of using lazy DFA's.
    The best thing about these bounds is that they are independent of the
    number of XPath expressions, and instead depend on parameters which
    characterize the data itself.

    This is then backed up by solid experiments, which verify that the number
    of states generated in a lazy DFA is very manageable, for a variety of
    synthetic and real datasets. The number of states is particularly small
    for real datasets. Further, the data throughput is found to be independent
    of the number of XPath expressions, which is again remarkable. The bounds
    above hinted this might be the case, and the experiments verified this.
    Further, the throughput is orders of magnitude better than that of the
    other method, XFilter.

    Overall, the paper has interesting theoretical results which establish the
    feasibility of using lazy DFA's for efficiently processing XML streams,
    and this is convincingly established by the experiments.


  • Next message: Lucas Kreger-Stickles: "Review: Processing XML Streams with Deterministic Automata"

    This archive was generated by hypermail 2.1.6 : Wed May 05 2004 - 11:22:34 PDT