From: Ankur Jain (ankur@cs.washington.edu)
Date: Fri May 07 2004 - 09:43:19 PDT
Processing XML Streams with Deterministic Automata - Green et al.
The conventional way to evaluate a large number of XPath expressions on an
XML stream was to construct NFA's. But inspite of heavy optimizations, the
throughput achieved using these NFA's was orders of magnitude lower than
what the authors had in mind. Converting the NFA's to DFA's leads to an
exponential blowup of states and thus was a seemingly infeasible approach.
The paper's most significant contribution comes here where they show that
under some assumptions on the XML data, in lazily constructed DFA's, the
number of states is manageable - and hence can used to efficiently solve the
problem at hand. They follow up this insight with some solid analytical
arguments and experimental results to make their approach convincing.
If the intention is, as mentioned in the introduction, to make forwarding
decisions based on XPath expressions, then this approach is understandable,
otherwise if it is really just about collecting statistical information
about the data stream, Monte-Carlo techniques would perhaps have made more
sense. I was also wondering if such a large number of XPath queries on XML
streams is indeed a reality even in the former scenario (perhaps this doubt
is more to do with what exactly are XML routers, where are they deployed,
what is their functionality).
This archive was generated by hypermail 2.1.6 : Fri May 07 2004 - 09:43:21 PDT