From: Steven Balensiefer (alaska@cs.washington.edu)
Date: Sun Apr 18 2004 - 21:34:34 PDT
This paper explores the possibilities and limitations of using relational
database systems to query XML data. The authors note that the semi-structured
format of XML seems to lend itself better to semi-structured queries, but that
discarding 20 years of work in relational databases should not be done offhand.
The authors rely on DTDs (Document Type Descriptors) to generate relational
schemas. To handle the tremendous complexity of DTDs, they proposed a number of
simplifying transformations that reduced the DTD to a form closer to a standard
relational schema.
To translate a DTD into a relational schema, several fundamental
difficulties were addressed, and three different algorithms were described and
evaluated. The biggest difficulties were caused by XML's support of set-valued
attributes and recursion. To store sets, a separate relation was created with
the values in the set. Recursion was handled by defining a DTD-graph, then
extracting a subgraph for the current element which was traversed to a fixed
depth. The algorithms differed in the amount of sharing they utilized, and the
criteria used to determine when an attribute should be inlined.
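
As a rough illustration of the set-valued case (my own sketch, not one of the
paper's three algorithms), the following assumes a hypothetical document where
each book has one title, which is inlined into the book relation, and any
number of authors, which go into a separate relation keyed back to the parent.
The table and column names are made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document: one title per book (inlined), many authors (set-valued).
DOC = """
<books>
  <book><title>A</title><author>Ann</author><author>Bob</author></book>
  <book><title>B</title><author>Cy</author></book>
</books>
"""

def shred(doc: str) -> sqlite3.Connection:
    """Load the document into two relations: book and author."""
    conn = sqlite3.connect(":memory:")
    # The single-valued child (title) is inlined as a column of book.
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
    # The set-valued child (author) gets its own relation with a parent key.
    conn.execute(
        "CREATE TABLE author ("
        "id INTEGER PRIMARY KEY, "
        "parent_id INTEGER REFERENCES book(id), "
        "name TEXT)"
    )
    for book in ET.fromstring(doc).iter("book"):
        cur = conn.execute(
            "INSERT INTO book (title) VALUES (?)", (book.findtext("title"),)
        )
        for author in book.findall("author"):
            conn.execute(
                "INSERT INTO author (parent_id, name) VALUES (?, ?)",
                (cur.lastrowid, author.text),
            )
    return conn

conn = shred(DOC)
book_rows = conn.execute("SELECT id, title FROM book ORDER BY id").fetchall()
author_rows = conn.execute(
    "SELECT parent_id, name FROM author ORDER BY id"
).fetchall()
```

Inlining title avoids a join for the common case, while the separate author
relation is what lets a book carry an arbitrary number of authors.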
In addition to the conversion from XML to a relational DB, the paper covered
translation of XML queries written in XML-QL and Lorel to SQL. In general this
required establishing the relations containing the elements on the path from
the root, and specifying the appropriate joins. Obviously there's more to it
than I've described, but the authors also omitted details for space reasons.
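
To make the "relations on the path plus joins" idea concrete, here is a
minimal sketch under my own assumptions: every child relation has a
hypothetical parent_id column pointing at the id of the relation before it on
the path. The helper path_to_sql and the sample tables are invented for
illustration and are not the paper's translation procedure.

```python
import sqlite3

def path_to_sql(path, leaf_column):
    """Build a SELECT that joins each relation on the path to its parent.

    Assumes (hypothetically) that each child relation has a parent_id column
    referencing the id of the previous relation on the path.
    """
    sql = f"SELECT {path[-1]}.{leaf_column} FROM {path[0]}"
    for parent, child in zip(path, path[1:]):
        sql += f" JOIN {child} ON {child}.parent_id = {parent}.id"
    return sql

# Small hypothetical database to run the generated query against.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE author (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
INSERT INTO book VALUES (1, 'A'), (2, 'B');
INSERT INTO author VALUES (1, 1, 'Ann'), (2, 1, 'Bob'), (3, 2, 'Cy');
""")

# The path book/author/name becomes one join per step below the root.
sql = path_to_sql(["book", "author"], "name")
names = [row[0] for row in conn.execute(sql + " ORDER BY author.id")]
```

Each extra step in the path adds one more join, which hints at why deep or
recursive paths get expensive in this setting.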
The final step of the query process was generating XML results from the SQL
query results. This illustrated the need for "tag variables" in the relations
to specify the types for the output. The relational model was not well suited
for providing answers that could be easily translated back to complex results.
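
A small sketch of the reconstruction problem, again under my own assumptions:
the rows below are a hypothetical flat query result in which one column names
the element type of each value, loosely in the spirit of a tag column. The
code groups the rows by parent and rebuilds nested elements; it is not the
paper's mechanism.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical flat result: (book id, tag name, value), sorted by book id.
rows = [
    (1, "title", "A"),
    (1, "author", "Ann"),
    (1, "author", "Bob"),
    (2, "title", "B"),
    (2, "author", "Cy"),
]

def rows_to_xml(rows):
    """Rebuild nested XML from flat rows that carry their own tag names."""
    root = ET.Element("books")
    # groupby only merges adjacent rows, so the input must be sorted by id.
    for _, group in groupby(rows, key=lambda r: r[0]):
        book = ET.SubElement(root, "book")
        for _, tag, value in group:
            ET.SubElement(book, tag).text = value
    return ET.tostring(root, encoding="unicode")

xml_text = rows_to_xml(rows)
```

Even in this tiny case the nesting has to be recovered by sorting and grouping
outside the query, which is the awkwardness the paragraph above points at.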
I thought this paper was an interesting approach to handling XML data. Sadly,
the results were sufficiently poor to suggest that semi-structured queries may
be a better solution, which has been borne out by the development of XQuery.
I felt that the authors did a good job of choosing important material while
excluding extraneous low-level details. It may be standard procedure in the DB
community, but I was surprised not to see any comparisons of the algorithms
with respect to actual running time for given queries. My only thought is that
the metrics they chose would be constant regardless of DBMS or system
configuration. Even so, I still think it would have been a useful piece of
information.
I consider this paper a "signpost" to the rest of the DB community that
further research in this area might not be wise. Fortunately, XQuery has matured
to the point of providing queries on XML documents.