Relational Databases for Querying XML Documents

From: Steven Balensiefer (alaska@cs.washington.edu)
Date: Sun Apr 18 2004 - 21:34:34 PDT


    This paper explores the possibilities and limitations of using relational
    database systems to query XML data. The authors note that the semi-structured
    format of XML seems better suited to semi-structured queries, but that 20
    years of work in relational databases should not be discarded lightly.

    The authors rely on DTDs (Document Type Descriptors) to generate relational
    schemas. To handle the tremendous complexity of DTDs, they proposed a number
    of simplifying transformations that reduce a DTD to a form closer to a
    standard relational schema.
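    To make the simplification step concrete, here is a minimal Python sketch of
    my own, not the authors' code. It assumes a content model is given as a small
    tuple tree (the node kinds "elem", "seq", "choice", "star", "plus" and "opt"
    are a hypothetical encoding I made up). It collapses the tree into a flat map
    from child name to one of the quantifiers "1", "?" or "*". The rules are
    illustrative. A choice makes its children optional, "+" is weakened to "*",
    and a child that appears more than once is grouped into a single "*" entry.

```python
from collections import Counter

def combine(outer, inner):
    """Merge two quantifiers; '*' dominates '?', which dominates '1'."""
    if "*" in (outer, inner):
        return "*"
    if "?" in (outer, inner):
        return "?"
    return "1"

def flatten(node, quant="1"):
    """Return (element name, quantifier) pairs for a content-model tree."""
    kind = node[0]
    if kind == "elem":
        return [(node[1], quant)]
    if kind in ("star", "plus"):          # e+ is weakened to e*
        return flatten(node[1], combine(quant, "*"))
    if kind == "opt":
        return flatten(node[1], combine(quant, "?"))
    if kind == "seq":
        return [p for child in node[1] for p in flatten(child, quant)]
    if kind == "choice":                  # (a | b) becomes a?, b?
        return [p for child in node[1] for p in flatten(child, combine(quant, "?"))]
    raise ValueError(f"unknown node kind: {kind}")

def simplify(node):
    """Flatten a content model, grouping repeated children into '*'."""
    pairs = flatten(node)
    counts = Counter(name for name, _ in pairs)
    return {name: ("*" if counts[name] > 1 else q) for name, q in pairs}
```

    For example, a book element with content model (booktitle, (author* | editor))
    simplifies to {"booktitle": "1", "author": "*", "editor": "?"}. That form maps
    onto one relation for book plus a separate relation for the set-valued author.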

    To translate a DTD into a relational schema, the authors addressed several
    fundamental difficulties and described and evaluated three different
    algorithms. The biggest difficulties were caused by XML's support for
    set-valued attributes and recursion. To store a set, a separate relation was
    created to hold its values. Recursion was handled by defining a DTD graph,
    then extracting a subgraph for the current element, which was traversed to a
    fixed depth. The algorithms (Basic, Shared, and Hybrid Inlining) differed in
    how much sharing they used and in the criteria used to determine when an
    element should be inlined as an attribute of its parent's relation.
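    The inlining idea can also be sketched in a few lines. This is my own
    simplification and not any one of the paper's three algorithms. It takes an
    already simplified DTD, in a hypothetical format: a dict from element name to
    {child: quantifier}. The root and every "*" child get their own relation with
    hypothetical id and parentID columns. Children marked "1" or "?" are inlined
    as columns of the parent's relation. Recursion is broken by sending any
    element that reappears on the current path to a separate relation.

```python
def build_schema(dtd, root):
    """Map each relation name to its list of column names."""
    relations = {}

    def new_relation(elem):
        if elem in relations:      # already created; this also stops recursion
            return
        # parentID is NULL for the root; otherwise it refers to the parent row's id
        relations[elem] = ["id", "parentID"]
        inline(elem, elem, "", {elem})

    def inline(rel, elem, prefix, path):
        for child, quant in dtd.get(elem, {}).items():
            if quant == "*" or child in path:
                # set-valued or recursive child: give it its own relation
                new_relation(child)
            else:
                # single-valued child: inline it as a column of the current relation
                column = prefix + child
                relations[rel].append(column)
                inline(rel, child, column + ".", path | {child})

    new_relation(root)
    return relations
```

    Consider a book DTD in which author is set-valued and name is a single-valued
    child of author. Here the sketch produces a book relation with an inlined
    booktitle column, plus a separate author relation whose name sub-elements
    become dotted columns such as name.lastname.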

    In addition to the conversion from XML to a relational database, the paper
    covered the translation of XML queries written in XML-QL and Lorel into SQL.
    In general, this required identifying the relations that contain the elements
    on the path from the root and specifying the appropriate joins between them.
    There is obviously more to it than I've described, but the authors also
    omitted details for space reasons.
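    As a rough illustration of the path-to-joins idea, here is a minimal sketch
    that handles only simple root-to-leaf paths with no predicates. It assumes a
    schema given as a dict from relation name to column list. Inlined
    sub-elements appear as dotted column names such as name.lastname, and every
    relation has hypothetical id and parentID columns. All table and column names
    are illustrative and not taken from the paper.

```python
def path_to_sql(relations, path):
    """Translate a path such as book/author/name/lastname into a SQL query."""
    current = path[0]                  # the path must start at a relation (the root)
    tables = [(current, "t0")]
    conditions = []
    prefix = ""
    for step in path[1:]:
        column = prefix + step
        if column in relations[current]:
            prefix = column + "."      # inlined element: stay in the same relation
        elif step in relations:
            # element stored in its own relation: join child.parentID = parent.id
            parent_alias = tables[-1][1]
            alias = f"t{len(tables)}"
            tables.append((step, alias))
            conditions.append(f"{alias}.parentID = {parent_alias}.id")
            current, prefix = step, ""
        else:
            raise ValueError(f"no column or relation for step {step!r}")
    alias = tables[-1][1]
    # quote dotted column names; if the path ends at a relation, return its ids
    target = f'{alias}."{prefix[:-1]}"' if prefix else f"{alias}.id"
    sql = f"SELECT {target} FROM " + ", ".join(f"{t} {a}" for t, a in tables)
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql
```

    Each step that crosses into a separate relation adds one join, which is why
    heavily fragmented schemas lead to many joins per query.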

    The final step of the query process was generating XML results from the SQL
    query results. This illustrated the need for "tag variables" in the relations
    to specify the types for the output. The relational model was not well suited
    to providing answers that could easily be translated back into complex XML
    results.
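    To see why a tag is needed, here is a minimal sketch of my own, not the
    paper's method. It assumes the SQL result arrives as flat rows of
    (tag, id, parentID, text), with parents listed before their children. The
    tag column plays the role of a tag variable: it is the only thing telling us
    whether a row should become, say, an author or an editor element. The row
    layout and the "result" wrapper element are hypothetical.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows, root_tag="result"):
    """Rebuild nested XML from flat (tag, id, parentID, text) rows."""
    root = ET.Element(root_tag)
    nodes = {}
    for tag, node_id, parent_id, text in rows:   # parents must precede children
        parent = nodes[parent_id] if parent_id is not None else root
        elem = ET.SubElement(parent, tag)        # element name comes from the tag column
        if text is not None:
            elem.text = text
        nodes[node_id] = elem
    return root
```

    Without the tag column, the rows for an author and an editor would be
    indistinguishable, and the nesting can only be recovered through the
    id/parentID pairs.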

    I thought this paper was an interesting approach to handling XML data. Sadly,
    the results were poor enough to suggest that semi-structured queries may be a
    better solution, a view borne out by the development of XQuery.

    I felt that the authors did a good job of choosing important material while
    excluding extraneous low-level details. It may be standard procedure in the DB
    community, but I was surprised not to see any comparisons of the algorithms
    with respect to actual running time for given queries. My only guess is that
    the metrics they chose would be constant regardless of the DBMS or system
    configuration. Even so, I still think running times would have been a useful
    piece of information.

    I consider this paper a "signpost" to the rest of the DB community that
    further research in this area might not be wise. Fortunately, XQuery has
    matured to the point of providing a practical query language for XML
    documents.

