From: Atri Rudra (atri@cs.washington.edu)
Date: Sun Apr 18 2004 - 19:28:24 PDT
Relational Databases for Querying XML Documents: Limitations and Opportunities
------------------------------------------------------------------------------
This paper presents techniques to convert XML documents to relational
tuples, translate semi-structured queries over XML documents to SQL
queries, and convert the results back to XML. As the authors argue, this
approach has the merit of reusing relational techniques developed over 20
years, vis-a-vis the relatively new techniques of semi-structured query
languages.
The approach is divided into three broad steps:
(i) Conversion of DTDs into relational schemas (and the corresponding
conversion of the XML documents into relational tuples). First, the regular
expressions in the DTD are simplified. This set of transformations can
result in the loss of information about relative order (which the
authors point out can be retained by adding extra information). However, I
am not sure how representing the "+" operator by the "*" operator does not
introduce extra information. The authors then consider converting the
simplified DTDs to relational schemas by inlining techniques: that is,
"pack" as many descendants of an element into a single tuple as possible.
This methodology, termed Basic, is presented with techniques to handle
set-valued attributes and recursion. This approach, however, results in
many tables being created. The idea in the Shared technique is to create a
single table for each element node that is shared, and to share that table.
The authors also consider the Hybrid technique, which is sort of a middle
ground between the Basic and Shared techniques. Evaluations of the latter
two techniques are also presented (the Basic technique ran out of memory on
many sample DTDs and is omitted from the results).
(ii) Converting semi-structured queries to SQL. The authors give conversion
techniques for simple path queries, simple recursive path queries and
arbitrary path expressions (the last kind is converted into a bunch of
simple recursive path queries).
(iii) Converting results of SQL queries to XML. The authors consider
some cases and give techniques for conversion. However, the techniques
presented for queries which return complex XML elements are not
very satisfactory.
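
To make the three steps concrete, here is a minimal sketch of the pipeline
in miniature. It is my own illustration, not the authors' algorithm: the
sample document, the table layout (book, author) and the function names are
all assumptions, and the schema is written by hand rather than derived from
a DTD. It only shows the flavour of inlining a single-valued child (title)
while giving a set-valued child (author) its own table.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; not taken from the paper.
DOC = """<book>
  <title>XML and Relations</title>
  <author><name>Ann</name></author>
  <author><name>Bob</name></author>
</book>"""


def shred(conn, xml_text):
    """Step (i): store the document as relational tuples.

    title occurs once per book, so it is inlined into the book table;
    author is set-valued, so it gets its own table with a parent key.
    """
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE author (id INTEGER PRIMARY KEY, "
        "parent_id INTEGER REFERENCES book(id), name TEXT)"
    )
    root = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO book (title) VALUES (?)", (root.findtext("title"),)
    )
    book_id = cur.lastrowid
    for author in root.findall("author"):
        conn.execute(
            "INSERT INTO author (parent_id, name) VALUES (?, ?)",
            (book_id, author.findtext("name")),
        )


def author_names(conn, title):
    """Step (ii): the path query book[title = ?]/author/name as a SQL join."""
    rows = conn.execute(
        "SELECT a.name FROM book b JOIN author a ON a.parent_id = b.id "
        "WHERE b.title = ? ORDER BY a.id",
        (title,),
    )
    return [name for (name,) in rows]


def to_xml(names):
    """Step (iii): wrap the flat SQL result back into XML elements."""
    result = ET.Element("result")
    for name in names:
        ET.SubElement(result, "name").text = name
    return ET.tostring(result, encoding="unicode")


conn = sqlite3.connect(":memory:")
shred(conn, DOC)
print(to_xml(author_names(conn, "XML and Relations")))
```

Even in this toy version, step (iii) is the awkward part: the SQL result is
a flat list of rows, so any nesting in the output has to be rebuilt by hand
outside the database.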
Overall, the main bottleneck seems to be in Step (iii). The authors
mention some techniques which, if incorporated into relational systems,
would make XML queries easier to handle. It would be interesting to
see how Chris addresses these issues in his conversion work.
The paper as a whole was not as satisfying a read as the Essence of
XML paper, which may in turn be due to the fact that this work is more of
a hack while the other paper was a crisp and sound theoretical result :-)