From: Charles Giefer (cgiefer@cs.washington.edu)
Date: Mon Apr 19 2004 - 16:30:14 PDT
Relational Databases for Querying XML Documents: Limitations and
Opportunities
Shanmugasundaram et al.
This paper focuses on using standard relational database technology to
process XML documents and their related queries. There are four main
aspects of this process. The first is creating a schema for the relational
database from XML information (specifically DTDs). The second is to load
the XML document into the relational database. The third is to translate
queries over the XML document into queries over the relational database.
The fourth is to output the results in XML format.
There are several problems that are encountered in this process. One such
problem is that XML is not in first normal form. Therefore, it is difficult
to come up with a schema that describes the data completely, efficiently,
and compactly. The authors propose three methods for deriving a schema from DTD
information: Basic, Shared and Hybrid. Basic is the simplest translation
but is very inefficient because it creates a separate table for each element.
Shared creates tables only for the shared elements and root elements. Hybrid
is a minor modification of the Shared method and reduces the number of
tables in some cases (but increases the number of elements in the tables).
Hybrid and Shared were the better solutions, and their relative performance
was compared. I was slightly surprised that only qualitative results were
used for the comparison, with no additional quantitative evaluation of
execution time on a real database. Maybe this is an
insignificant point, but it would have been nice to see which schema allowed
the fastest queries (not simply the fewest joins; there must be
more factors).
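To make the difference between the three methods concrete, here is a minimal sketch of how each one might decide which elements get their own table. It is not the authors' algorithm: the toy DTD graph, the "repeated" flag and the three function names are my own illustrative assumptions, the rules are a simplified reading of the paper, and recursive DTDs are ignored entirely.

```python
from collections import Counter

# Hypothetical DTD graph: parent element -> list of (child element, repeated?)
# "repeated" marks a child that occurs under a "*" in the parent's content model.
DTD = {
    "book": [("title", False), ("author", True)],
    "article": [("title", False), ("author", True)],
    "author": [("name", False)],
    "title": [],
    "name": [],
}

def _indegree(dtd):
    # Number of parent edges pointing at each element.
    return Counter(child for kids in dtd.values() for child, _ in kids)

def _repeated(dtd):
    # Elements that appear as a set-valued ("*") child somewhere.
    return {child for kids in dtd.values() for child, rep in kids if rep}

def basic_tables(dtd):
    # Basic (simplified): every element gets its own table.
    return set(dtd)

def shared_tables(dtd):
    # Shared (simplified): roots, elements with several parents,
    # and set-valued children get tables; everything else is inlined.
    deg, rep = _indegree(dtd), _repeated(dtd)
    return {e for e in dtd if deg[e] == 0 or deg[e] > 1 or e in rep}

def hybrid_tables(dtd):
    # Hybrid (simplified): like Shared, but shared elements that are
    # not set-valued are inlined into each of their parents.
    deg, rep = _indegree(dtd), _repeated(dtd)
    return {e for e in dtd if deg[e] == 0 or e in rep}
```

With this toy DTD, Basic yields five tables, Shared four, and Hybrid three, because Hybrid inlines the shared title element into both book and article. That is the trade-off described above: fewer tables, but more columns per table.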
Simplifying the DTDs was one way they made this translation easier. They
took complicated DTD rules and reduced them to smaller and simpler rules.
These transformations lose typing information, and the authors' argument for
why this is acceptable seemed a little problematic. The simplified schema
clearly does not permit
roundtripping, but it may be "good enough" for this method. Also, the XML
document must already conform to the DTD, so we know we have valid data.
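As an illustration of what gets lost, the sketch below applies two simplifications of the kind the paper describes to an already flattened content model: "+" is weakened to "*", and an element that appears more than once is collapsed into a single starred entry. The list-of-pairs representation and the simplify function are assumptions made for this example, not the paper's notation, and the real rule set is larger.

```python
def simplify(content_model):
    """Simplify a flattened DTD content model.

    content_model is a list of (element, quantifier) pairs, where the
    quantifier is one of '', '?', '*' or '+'. This representation is a
    hypothetical one chosen for the example.
    """
    merged = {}  # dicts keep insertion order, so first occurrence wins the position
    for name, quantifier in content_model:
        if quantifier == "+":
            quantifier = "*"      # "one or more" is weakened to "zero or more"
        if name in merged:
            merged[name] = "*"    # repeated occurrences collapse into one starred entry
        else:
            merged[name] = quantifier
    return list(merged.items())
```

For example, the model (a, b?, a+) becomes a*, b?. The original ordering and the requirement of at least two a elements can no longer be recovered from the simplified form, which is why roundtripping fails.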
Converting the XML queries to SQL queries is a major issue that was not
fully addressed in this paper. The paper describes some methods, but the
treatment is by no
means comprehensive. There are no formal semantics for the translation. It
is not even clear whether SQL can express everything that can be expressed in an
XML query.
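For the easy case, the flavor of the translation can be sketched as follows: a simple child-axis path turns into a chain of joins, one per step. This is only an assumed toy scheme (one table per element, with hypothetical id and parent_id columns), run against an in-memory SQLite database; it is not the translation given in the paper and it says nothing about recursion or the more expressive query constructs, which is where the open question lies.

```python
import sqlite3

def path_to_sql(steps):
    """Turn a child-axis path like ['book', 'author'] into a chain of joins.

    Assumes (hypothetically) one table per path step, each with an `id`
    column and a `parent_id` column that points at its parent's `id`.
    """
    sql = f"SELECT {steps[-1]}.* FROM {steps[0]}"
    for parent, child in zip(steps, steps[1:]):
        sql += f" JOIN {child} ON {child}.parent_id = {parent}.id"
    return sql

def run_example():
    # Build a tiny in-memory database that follows the assumed layout.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE author (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
        INSERT INTO book VALUES (1, 'XML and SQL'), (2, 'Joins');
        INSERT INTO author VALUES (10, 1, 'Ann'), (11, 1, 'Bob'), (12, 2, 'Cy');
    """)
    # Authors of the book titled 'Joins', i.e. the path book/author with a filter.
    query = path_to_sql(["book", "author"]) + " WHERE book.title = 'Joins'"
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows
```

Every extra step in the path adds another join, which is why the number of joins depends so directly on the choice of schema.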
Finally, the translation back to XML was almost trivial. One exception that
was addressed was the case where related elements are nested using an
"order by" qualifier.
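The nesting case can be sketched like this: if the SQL result is sorted so that all rows for one parent are adjacent, a single pass can group them into nested elements. The row layout (book id, title, author name), the element names and the rows_to_xml function are assumptions for illustration only, not the paper's procedure.

```python
from itertools import groupby
import xml.etree.ElementTree as ET

def rows_to_xml(rows):
    """Nest flat rows into XML.

    rows are (book_id, title, author_name) tuples, assumed to be already
    sorted by book_id, as an ORDER BY on that column would deliver them.
    """
    root = ET.Element("results")
    for _, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        book = ET.SubElement(root, "book")
        # The title repeats on every row of the group, so take it once.
        ET.SubElement(book, "title").text = group[0][1]
        for _, _, author_name in group:
            ET.SubElement(book, "author").text = author_name
    return ET.tostring(root, encoding="unicode")
```

Without the sort, rows for the same book could arrive interleaved and groupby would emit the same book more than once, which is why the "order by" matters here.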
While it doesn't claim to be a comprehensive analysis of the topic, this
paper does a good job of introducing many key components of using a
relational database to query an XML document. The biggest contribution in
my opinion is the translation from XML to a relational database.
Automatically deriving the relational database schema is a significant
contribution. The remaining contributions seem a little weak in their
development, especially the translation between queries.