Relational Databases for Querying XML Documents

From: Stavan Parikh (stavan@cs.washington.edu)
Date: Mon Apr 19 2004 - 11:28:53 PDT

    In this paper Shanmugasundaram et al. investigate the use of relational databases to query XML documents. They argue that there are two approaches to querying semi-structured data such as XML: one is to develop a semi-structured query language (which ended up being the case with XQuery), and the other is to leverage the work already done on relational databases. In my opinion their basic premise was very sound. An efficient mechanism by which XML could be queried using SQL would have been a big win.

    To allow querying XML data using SQL, three steps are proposed. The first is to convert the XML data to a relational database. To do this they use techniques like flattening the data and inlining as many descendants as possible. For inlining they propose multiple schemes, which they term 'Basic', 'Shared' and 'Hybrid', the last being a combination of the first two. Basic suffers from various drawbacks because it creates a large number of relations. Shared tries to ameliorate this by ensuring that each element node is represented in exactly one relation. This leads to better efficiency but increases the number of joins needed for a query that starts at a particular node. The hybrid approach tries to take the best of both worlds.
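
    As a rough illustration of the inlining idea (a simplified sketch, not the paper's actual algorithm), the Python below shreds a small made-up document into SQLite. The document, the table names `book` and `author`, and the helper `shred` are all assumptions for the example. A child that occurs exactly once (`title`) is inlined as a column of its parent's relation, while a set-valued child (`author`) gets its own relation with a key pointing back to the parent.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document: each <book> has exactly one <title> and
# any number of <author> children.
XML = """
<books>
  <book><title>A</title><author>X</author><author>Y</author></book>
  <book><title>B</title><author>Z</author></book>
</books>
"""


def shred(xml_text):
    """Load the document into two relations and return the connection."""
    conn = sqlite3.connect(":memory:")
    # <title> occurs once per book, so it is inlined as a column of book.
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
    # <author> is set-valued, so it gets its own relation with a parent key.
    conn.execute(
        "CREATE TABLE author ("
        "id INTEGER PRIMARY KEY, "
        "parent_id INTEGER REFERENCES book(id), "
        "name TEXT)"
    )
    root = ET.fromstring(xml_text.strip())
    for book in root.findall("book"):
        cur = conn.execute(
            "INSERT INTO book (title) VALUES (?)", (book.findtext("title"),)
        )
        book_id = cur.lastrowid
        for author in book.findall("author"):
            conn.execute(
                "INSERT INTO author (parent_id, name) VALUES (?, ?)",
                (book_id, author.text),
            )
    return conn


conn = shred(XML)
book_rows = conn.execute("SELECT id, title FROM book ORDER BY id").fetchall()
author_rows = conn.execute(
    "SELECT parent_id, name FROM author ORDER BY id"
).fetchall()
```

    Reading a book's title needs no join here, but listing its authors costs one join, which is the kind of trade-off the inlining schemes are balancing.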

    The next step is to convert semi-structured queries to SQL. They propose schemes to handle simple path expressions, recursive path expressions and finally arbitrary path expressions, the last by converting them into queries of the first two types. The final challenge is converting the results back to XML. It seems that the authors glossed over some of the details here, such as handling nested queries. Overall, the scheme proposed here seemed the least elegant of the three parts, as they handle each possible case separately without any major generalizations.
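
    To make the query translation step concrete, here is a hedged sketch in Python with SQLite. The schema (`book`, `author`, `section`) and the data are invented for the example, and the SQL is hand-written rather than produced by the paper's translation algorithm. A simple path expression becomes a chain of joins, one per relation crossed, and a recursive path expression becomes a recursive query (SQLite's `WITH RECURSIVE` stands in for the SQL99 recursion the approach relies on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO book VALUES (1, 'A'), (2, 'B');
    INSERT INTO author VALUES (1, 1, 'X'), (2, 1, 'Y'), (3, 2, 'Z');

    -- Hypothetical self-nesting element: a section may contain sections.
    CREATE TABLE section (id INTEGER PRIMARY KEY, parent_id INTEGER, heading TEXT);
    INSERT INTO section VALUES
        (1, NULL, 'Intro'), (2, 1, 'Scope'), (3, 2, 'Terms'), (4, NULL, 'Other');
    """
)

# Simple path, roughly book[title='A']/author: one join per relation crossed.
simple = conn.execute(
    "SELECT a.name FROM book b JOIN author a ON a.parent_id = b.id "
    "WHERE b.title = ? ORDER BY a.id",
    ("A",),
).fetchall()

# Recursive path: every section nested at any depth below 'Intro'.
recursive = conn.execute(
    """
    WITH RECURSIVE descendant(id, heading) AS (
        SELECT id, heading FROM section
        WHERE parent_id = (SELECT id FROM section WHERE heading = 'Intro')
        UNION ALL
        SELECT s.id, s.heading
        FROM section s JOIN descendant d ON s.parent_id = d.id
    )
    SELECT heading FROM descendant ORDER BY id
    """
).fetchall()
```

    Even in this toy case the two kinds of path need differently shaped SQL, which hints at why the case-by-case translation feels ad hoc.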

    The evaluation of this work showed that while their approach worked for some data sets, it did not apply easily across the board. They attributed this to limitations of the SQL query scheme and proposed various extensions that would make it easier to query XML-type data. It was interesting to see that some of the things they propose as possible SQL extensions have found their way into the XQuery standard. Overall this paper showed that leveraging work in the SQL world to query XML is not easy, and the solution they propose is inelegant. The work pretty much shows that semi-structured query languages might be the better option (which is borne out today).
