From: Bhushan Mandhani (bhushan@cs.washington.edu)
Date: Mon Apr 19 2004 - 12:12:01 PDT
Relational Databases for Querying XML Documents: Limitations and Opportunities
- Shanmugasundaram et al.
This paper describes an approach for storing XML documents in relational
databases. The XML DTD is used to generate the relational schema. XML
documents are parsed, and stored as tuples in the database. XML queries
are converted into corresponding SQL queries, and the results converted
back to XML.
The method described for converting an XML DTD to a relational schema is
quite intuitive. The DTD is represented as a graph, and elements are
mapped to relations, with descendants of an element being inlined into the
same relation as much as possible. However, subelements which are sets or
which recurse back to the ancestor element require separate relations, and
thus, we can expect a lot of fragmentation. Further, the description of
the conversion from a DTD to a relational schema is somewhat hand-wavy at
times.
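To make the inlining idea concrete, here is a minimal sketch of my own. It
is a simplification, not the paper's exact algorithm: it only applies the
two rules mentioned above (set-valued and recursive subelements get their
own relation) and ignores the ID and parent-ID columns a real schema would
need. The DTD encoding, the function names, and the book/author example
are all assumptions made for illustration.

```python
def separate_relations(dtd, root):
    """Return the elements that cannot be inlined into their parent.

    dtd maps an element name to a list of (child, is_set) pairs, where
    is_set is True for children declared with * or + in the DTD.
    """
    separate = {root}

    def visit(elem, path):
        for child, is_set in dtd.get(elem, []):
            # Set-valued children and children that recurse back to an
            # ancestor (or to themselves) need their own relation.
            if is_set or child in path:
                separate.add(child)
            if child not in path:
                visit(child, path | {child})

    visit(root, {root})
    return separate


def build_schema(dtd, root):
    """Map each separate element to the columns inlined into its relation."""
    separate = separate_relations(dtd, root)
    schema = {}
    for rel in sorted(separate):
        columns = []

        def inline(elem, prefix):
            for child, _ in dtd.get(elem, []):
                if child in separate:
                    continue  # stored in its own relation instead
                name = f"{prefix}_{child}"
                columns.append(name)
                inline(child, name)

        inline(rel, rel)
        schema[rel] = columns
    return schema


# Hypothetical DTD: book -> (title, author*), author -> (name, address),
# address -> (city).
dtd = {
    "book": [("title", False), ("author", True)],
    "author": [("name", False), ("address", False)],
    "address": [("city", False)],
}
schema = build_schema(dtd, "book")
```

Here author is split off because it is a set, while address and city are
inlined into the author relation as prefixed columns.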
They evaluate their method by the average number of SQL joins needed to
process a path expression of a given length. This certainly seems to be a
reasonable metric to use. In the experimental results, however, there is no
clear winner between the Shared and Hybrid inlining techniques, and no
real effort has been made to characterize when one works better than the
other in terms of the properties of the DTD (which, I would guess, should be
possible). Further, in most cases the number of joins was almost equal
to the length of the path expression, so the cost of joins in query
processing will probably be too high, making this approach inefficient.
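The join metric can be illustrated with a rough sketch. It assumes, as a
simplification of my own, that each step of a path expression that lands
in a separately stored element costs exactly one join, and that inlined
steps cost nothing. The function name and the example element names are
hypothetical.

```python
def count_joins(path, separate):
    """Count the joins needed to evaluate a simple child-axis path.

    path is a list of element names from the root down; separate is the
    set of elements stored in their own relations. Assumption: one join
    per step that crosses into a separate relation.
    """
    return sum(1 for step in path[1:] if step in separate)


path = ["book", "author", "address", "city"]

# Mostly inlined schema: only author is split off from book.
joins_inlined = count_joins(path, {"book", "author"})

# Heavily fragmented schema: every element has its own relation.
joins_fragmented = count_joins(path, {"book", "author", "address", "city"})
```

With the fragmented schema the count is one less than the path length,
which is the situation the experiments seem to land in most of the time.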
It is clear that converting an arbitrary path expression into the
corresponding SQL queries is quite complex and, in some cases, not
possible at all. This is in addition to the high computational cost of
the SQL queries themselves. Finally, the conversion of relational query
results back into XML is also an issue. It can be quite difficult to
answer queries which return complex XML elements.
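For the easy case, the translation can be sketched as follows. This is my
own illustration, not the paper's translation algorithm: it handles only
simple child-axis paths with no recursion, wildcards, or predicates, and
it assumes hypothetical id and parent_id columns plus prefixed names for
inlined columns. Anything beyond this is where the complexity comes in.

```python
def path_to_sql(path, separate):
    """Translate a simple child-axis path into a single SQL query string.

    path is a list of element names from the root down; separate is the
    set of elements stored in their own relations. The id and parent_id
    column names are assumptions made for this sketch.
    """
    tables = [path[0]]
    column = None  # None means we are at a relation, not an inlined column
    joins = []
    for step in path[1:]:
        if step in separate:
            # Crossing into another relation costs one join on the parent key.
            joins.append(f"{step}.parent_id = {tables[-1]}.id")
            tables.append(step)
            column = None
        else:
            # Inlined element: extend the prefixed column name.
            column = f"{column or tables[-1]}_{step}"
    target = f"{tables[-1]}.{column}" if column else f"{tables[-1]}.*"
    sql = f"SELECT {target} FROM {', '.join(tables)}"
    if joins:
        sql += " WHERE " + " AND ".join(joins)
    return sql


separate = {"book", "author"}
query = path_to_sql(["book", "author", "address", "city"], separate)
```

A path that revisits the same element (a recursive DTD) would already
break this sketch, since the same table name would appear twice without
aliases.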
The paper concludes by mentioning possible extensions to existing
relational database technology that would make it more suitable for XML
storage. I doubt that this approach of storing XML in relational databases
is going to be more efficient than having a query engine specifically
designed for XML.