Review #6

From: Atri Rudra (atri@cs.washington.edu)
Date: Wed May 05 2004 - 11:47:25 PDT

  • Next message: Michael Gubanov: "DFA"

    Processing XML Streams with Deterministic Automata
    --------------------------------------------------

    Todd J. Green et al.

    The authors consider the problem of evaluating a large number of XPath
    expressions on an XML stream. The basic idea is to represent all the
    XPath expressions as a single deterministic finite automata (DFA).
    The basic procedure is to create an NFA from the queries and then
    to convert it to a DFA. Storing the whole DFA turns out to be infeasible
    because of the large number of states. The authors propose doing the NFA
    to DFA conversion on the fly: this has the advantage that only the states
    required to be expanded are created. The authors provided theoretical
    bounds on the number of states in the eager DFA
    (the bound is exponential and the authors also show a matching lower bound)
    and the lazy DFA (the bound depends on the maximum depth of any
    XPath query and not on the total number of such queries). The authors
    also provide experimental validation of their results.

    Here are my questions/comments:

    (*) The authors do not consider filters/predicates in the XPath queries
        and mention they would slow down the throughput: are there studies
        done to characterize this slowdown ?

    (*) The whole DFA model assumes that the XML documents are well formed
        (that is atleast the tags match). Are there ways to check this
        assumption in a streaming model ?

    (*) The authors also only consider query sets: can general queries be
        stated equivalently in terms of query sets. If not how would the
        results presented in this paper change ?

    (*) It seems to me that the whole eager vs lazy DFA things is like a
        space-time tradeoff (the eager DFA would take more space
        but should be faster as it has everything precomputed while
        the lazy DFA uses up lesser space but is slower as it has to
        do the computations on the fly). Is there any merit to some
        hybrid scheme: for example expand certain portions of the DFA
        and expand the rest of it on the fly: maybe use some sort of
        caching ?


  • Next message: Michael Gubanov: "DFA"

    This archive was generated by hypermail 2.1.6 : Wed May 05 2004 - 11:47:26 PDT