Review : Relational Databases for Querying XML documents ...

From: Joe Xavier (joexav@microsoft.com)
Date: Tue Apr 20 2004 - 01:00:25 PDT


Review : Relational Databases for Querying XML documents: Limitations and Opportunities
*********************************************************************************************************

Excellent paper on using existing relational technology to store and query XML (semi-structured) documents. The paper describes several variations of a technique for mapping DTDs to relational schemas. It then compares the variations in terms of query performance, since they produce different kinds of relational schema. It does a great job of analyzing how the structure of the XML and the query itself affect the performance obtained with each technique.

Drawbacks:
The biggest flaw in the paper is its failure to adequately address the issue of preserving order when mapping XML to a relational schema. The authors gloss over this completely, with only a brief mention that it is an issue to be considered. In my opinion, their experiments and quantitative results lose a lot of weight because of this. Preserving order (document order for the XML elements) would require storing more attributes in the relations. This can result in more joins and maybe more SQL queries (although I'm not sure of the latter).
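To make the cost concrete, here is a minimal sketch in Python (standard sqlite3 module) of one way document order could be recorded: an extra ordinal column on a child relation. The table and column names (book, author, pos) and the sample document are hypothetical and not taken from the paper.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document: a book whose author order matters.
DOC = (
    "<book><title>T</title>"
    "<author>Ann</author><author>Bob</author><author>Cy</author>"
    "</book>"
)

def load(conn, xml_text):
    """Shred one <book> into two relations, keeping author order in `pos`."""
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute("CREATE TABLE author (book_id INTEGER, pos INTEGER, name TEXT)")
    root = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO book (title) VALUES (?)", (root.findtext("title"),)
    )
    book_id = cur.lastrowid
    # `pos` is the extra attribute: without it the rows form an unordered set.
    for pos, author in enumerate(root.findall("author")):
        conn.execute(
            "INSERT INTO author VALUES (?, ?, ?)", (book_id, pos, author.text)
        )
    return book_id

def authors_in_order(conn, book_id):
    """Return author names in document order by sorting on the ordinal."""
    rows = conn.execute(
        "SELECT name FROM author WHERE book_id = ? ORDER BY pos", (book_id,)
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
book_id = load(conn, DOC)
print(authors_in_order(conn, book_id))  # ['Ann', 'Bob', 'Cy']
```

This only records order among siblings of one element type. Recovering the interleaving of different element types would need a document-wide ordinal, and every order-sensitive query then carries an extra ORDER BY or position predicate.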
Another flaw, which could have been addressed easily and would have made the paper more attractive from an implementation perspective, is that not every set-valued child needs to be flattened and inlined. They could have spent some time on the idea of annotating elements in the DTD (or XML Schema) that don't need to be flattened out. Such elements (everything under one node) could be stored as a flat string in a single column of a table. They can still be searched if there's a full-text index on the column. This would be useful for parts of the document that users never expect to query with a path query. From 9i onwards, Oracle allows creating specific path indexes on its XMLType columns. SQL Server allows creating path indexes on XML datatype columns, but this indexes every path.
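A rough sketch of the annotation idea, in Python with sqlite3: elements named in a hypothetical OPAQUE set are serialized whole into one text column instead of being shredded. The schema, the OPAQUE set, and the sample document are assumptions for illustration, and the LIKE predicate merely stands in for a real full-text index.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = (
    "<article><title>XML in RDBMS</title>"
    "<body><p>Inlining <b>works</b> well.</p><p>Order is hard.</p></body>"
    "</article>"
)

# Hypothetical annotation: elements whose subtrees are kept as one string.
OPAQUE = {"body"}

def shred(conn, xml_text):
    """Store simple children as columns and annotated subtrees as raw XML."""
    conn.execute(
        "CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT, body_xml TEXT)"
    )
    row = {}
    for child in ET.fromstring(xml_text):
        if child.tag in OPAQUE:
            # Serialize the whole subtree instead of flattening it.
            row[child.tag + "_xml"] = ET.tostring(child, encoding="unicode")
        else:
            row[child.tag] = child.text
    conn.execute(
        "INSERT INTO article (title, body_xml) VALUES (?, ?)",
        (row.get("title"), row.get("body_xml")),
    )

def search(conn, word):
    """Substring search over the stored subtree (stand-in for full-text search)."""
    rows = conn.execute(
        "SELECT title FROM article WHERE body_xml LIKE ?", (f"%{word}%",)
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
shred(conn, DOC)
print(search(conn, "hard"))  # ['XML in RDBMS']
```

The trade-off is that the stored subtree is opaque to SQL: a substring match also hits tag names, and any path query into it would need parsing at query time.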
Although they claim that this technique extends to XML Schema in the same way as to DTDs, I'm not sure I believe this. XML Schema allows a degree of complexity that DTDs don't come close to.
Another concept the paper fails to mention is the edge-table representation of XML, which can be an alternative to their approach.
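For readers unfamiliar with the idea, the following is a minimal sketch of what an edge table could look like, in Python with sqlite3. Every parent-child link becomes one row of a single relation, regardless of the DTD. The column names (source, ordinal, label, target, value) and the sample document are assumptions, not a layout taken from the paper.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

DOC = (
    "<book><title>T</title>"
    "<author><name>Ann</name></author>"
    "<author><name>Bob</name></author>"
    "</book>"
)

def load_edges(conn, xml_text):
    """Store every parent-child link as one row of a single `edge` relation."""
    conn.execute(
        "CREATE TABLE edge (source INTEGER, ordinal INTEGER, label TEXT,"
        " target INTEGER, value TEXT)"
    )
    ids = count(1)  # node ids; 0 is reserved for a virtual document root

    def visit(elem, parent_id, ordinal):
        node_id = next(ids)
        text = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
            (parent_id, ordinal, elem.tag, node_id, text),
        )
        for i, child in enumerate(elem):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), 0, 0)

def author_names(conn):
    """Evaluate the path book/author/name with one self-join per step."""
    rows = conn.execute(
        "SELECT n.value FROM edge b"
        " JOIN edge a ON a.source = b.target AND a.label = 'author'"
        " JOIN edge n ON n.source = a.target AND n.label = 'name'"
        " WHERE b.source = 0 AND b.label = 'book'"
        " ORDER BY a.ordinal"
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
load_edges(conn, DOC)
print(author_names(conn))  # ['Ann', 'Bob']
```

The schema is independent of any DTD and the ordinal column keeps sibling order, but each step of a path costs another self-join on the same table, which is the kind of trade-off a comparison with the paper's inlining techniques would have to weigh.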
Their treatment of XML construction was very simplistic. They glossed over all the hard things in construction and only dealt with the simple cases.

Kudos:
Overall the paper was a great read. It is easy to understand and covers all the really important issues (except as pointed out above). To be fair, they touch upon some of the issues in their conclusion.

