From: Alexander Moshchuk (anm@cs.washington.edu)
Date: Mon Apr 19 2004 - 01:20:24 PDT
Summary: This paper discusses the approach of using existing relational
database techniques to process XML documents.
The paper argues that we could apply some of the mature technology
behind existing relational database systems to query XML, as opposed
to constructing new systems based on semi-structured query
languages. The authors' basic approach consists of four steps:
- A relational schema is derived from the DTD of a given XML
document, and many DTD details have to be simplified along the way.
Three techniques (basic, shared, and a hybrid of the two) are given for
mapping DTD elements to relations, all of which first convert the DTD
into a graph. The techniques differ in which elements are inlined into
a parent's relation and when new relations are created.
- The XML document is parsed and its data is loaded into relational
tables of some DBMS.
- Queries over the XML document are translated into SQL queries over
the corresponding relational data. This involves identifying the
relation that corresponds to the root of the path and translating path
expressions into joins, which becomes considerably more complicated
with recursion. Since semi-structured query languages are more
powerful and flexible than SQL, the translation can be awkward and
incomplete for arbitrary path expressions.
- After the SQL queries are executed, the results are converted back
into XML. The authors identify this as the most difficult step, with
many complexity and efficiency issues. To deal with these issues (and
those arising in query translation), the authors propose a set of
modifications to existing relational systems.
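To make the schema-mapping and query-translation steps more concrete, here is a minimal Python sketch under several assumptions. It follows a shared-inlining-style rule as I understand it: an element gets its own relation if it has no parent in the DTD graph, more than one parent, or is reached through a `*`; everything else is inlined as a column of its parent's relation. The DTD graph, the relation and column names (`ID`, `parentID`, the `t0`/`t1` aliases) are all hypothetical, the graph is assumed to be acyclic (no recursion), and only simple child-step paths are translated.

```python
# Hypothetical DTD graph: each element maps to its children as
# (child_name, reached_through_star) pairs. Assumed acyclic.
DTD = {
    "book": [("booktitle", False), ("author", True)],
    "article": [("title", False), ("author", True)],
    "author": [("name", False)],
    "booktitle": [],
    "title": [],
    "name": [],
}


def relation_elements(dtd):
    """Elements that get their own relation under a shared-inlining-style rule."""
    indegree = {element: 0 for element in dtd}
    starred = set()
    for children in dtd.values():
        for child, star in children:
            indegree[child] += 1
            if star:
                starred.add(child)
    # In-degree 0 (roots), in-degree > 1 (shared), or set-valued: own relation.
    return {e for e in dtd if indegree[e] != 1 or e in starred}


def path_to_sql(path, rel_elems):
    """Translate a child-only path like 'book/author/name' into a join query.

    Assumes every relation has hypothetical ID and parentID columns, and that
    inlined elements become columns named by their path from the enclosing
    relation (joined with '_').
    """
    steps = path.split("/")
    if steps[0] not in rel_elems:
        raise ValueError("path must start at an element with its own relation")
    tables = [f"{steps[0]} t0"]
    conditions = []
    alias, prefix = "t0", []
    for step in steps[1:]:
        if step in rel_elems:
            # Crossing into a separate relation costs one join.
            new_alias = f"t{len(tables)}"
            tables.append(f"{step} {new_alias}")
            # A shared relation (e.g. author) would also need a parent-type
            # check; this sketch joins on parentID only.
            conditions.append(f"{new_alias}.parentID = {alias}.ID")
            alias, prefix = new_alias, []
        else:
            # Inlined element: just extend the column name.
            prefix.append(step)
    column = "_".join(prefix) if prefix else "ID"
    sql = f"SELECT {alias}.{column} FROM {', '.join(tables)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


rels = relation_elements(DTD)
print(sorted(rels))  # ['article', 'author', 'book']
print(path_to_sql("book/author/name", rels))
# SELECT t1.name FROM book t0, author t1 WHERE t1.parentID = t0.ID
```

Each step that crosses into a separate relation adds a join, which is what the join-count comparison of the mapping techniques measures. Recursive DTDs and `//` paths are deliberately left out, and those are exactly the cases where the translation gets harder.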
The paper presents this approach but doesn't give any useful
evaluation or comparison against running the original semi-structured
queries directly on XML, so it is unclear whether there will be
practical benefits. Also, they evaluate their schema conversion
techniques by the number of joins, which didn't seem to me like a good
metric; the actual cost of running the queries would probably be more
informative.
Finally, some of their DTD simplifications and generalizations seem
too aggressive and could make the converted data inefficient. For
example, an element x of the form (y,z,y) is simplified to (y*,z), so
instead of storing two y fields in x's relation, a separate relation
must be created mapping x's to y's.
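The grouping step in that example can be sketched in a few lines of Python. This is a simplified, hypothetical version covering only the repeated-element case mentioned above, not the paper's full set of simplification rules: a flattened content model is given as (name, quantifier) pairs, and any element that occurs more than once collapses into a single starred occurrence placed at its first position. The function names `group_children` and `render` are made up for illustration.

```python
def group_children(children):
    """Collapse repeated child elements in a flattened content model.

    children: list of (element_name, quantifier) pairs, where quantifier is
    one of "", "?", "*". Returns pairs in order of first occurrence; any
    element seen more than once becomes "*" (set-valued).
    """
    grouped = {}  # dicts keep insertion order (Python 3.7+)
    for name, quant in children:
        if name in grouped:
            grouped[name] = "*"  # repeated occurrence: treat as a set
        else:
            grouped[name] = quant
    return list(grouped.items())


def render(model):
    """Print a content model in DTD-like syntax, e.g. (y*,z)."""
    return "(" + ",".join(name + quant for name, quant in model) + ")"


print(render(group_children([("y", ""), ("z", ""), ("y", "")])))  # (y*,z)
```

Once y is starred it is set-valued, so it can no longer be stored as two columns of x's relation and needs a relation of its own, which is the inefficiency described above.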
Overall, the paper presents an interesting approach to querying XML
data, but after reading it I'm led to believe that it may be easier to
use RDBMS experience to add enough power to a semi-structured system
than to hack existing relational technology to remove the limitations
the authors identified, and then still go through the complex (and
slow?) translation process for every query.