Using the old equipment

From: Charles Giefer (cgiefer@cs.washington.edu)
Date: Mon Apr 19 2004 - 16:30:14 PDT

  • Next message: Joe Xavier: "Review : Relational Databases for Querying XML documents ..."

    Relational Databases for Querying XML Documents: Limitations and
    Opportunities
    Shanmugasundaram et al.

    This paper focuses on using standard relational database technology to
    process XML documents and their related queries. There are four main
    aspects of this process. The first is creating a schema for the relational
    database from XML information (specifically DTDs). The second is to load
    the XML document into the relational database. The third is to translate
    queries over the XML document into queries over the relational database.
    The fourth is to output the results in XML format.
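
    As a rough illustration of these four steps (not the paper's system), the
    sketch below uses Python's sqlite3 and xml.etree.ElementTree modules. The
    book table, the sample document, and the run_pipeline name are assumptions
    made up for the example, and the schema is written by hand rather than
    derived from a DTD.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
DOC = (
    "<books>"
    "<book><title>A</title><author>X</author></book>"
    "<book><title>B</title><author>Y</author></book>"
    "</books>"
)

def run_pipeline(xml_text, author):
    conn = sqlite3.connect(":memory:")
    # Step 1: create the relational schema (hand-written here; the paper
    # derives it from the DTD).
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
    )
    # Step 2: load the XML document into the table.
    for book in ET.fromstring(xml_text).findall("book"):
        conn.execute(
            "INSERT INTO book (title, author) VALUES (?, ?)",
            (book.findtext("title"), book.findtext("author")),
        )
    # Step 3: the XML query "titles of books by this author", written as SQL.
    rows = conn.execute(
        "SELECT title FROM book WHERE author = ? ORDER BY id", (author,)
    ).fetchall()
    conn.close()
    # Step 4: tag the result rows back into XML.
    result = ET.Element("result")
    for (title,) in rows:
        ET.SubElement(result, "title").text = title
    return ET.tostring(result, encoding="unicode")
```

    Calling run_pipeline(DOC, "X") returns a small XML fragment holding the
    matching titles.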

    There are several problems that are encountered in this process. One such
    problem is that XML is not in first normal form. Therefore, it is difficult
    to come up with a schema that describes the data completely, efficiently,
    and compactly. The authors propose three methods for deriving a schema from
    DTD information: Basic, Shared and Hybrid. Basic is the simplest translation
    but very inefficient, because it creates a separate table for each element.
    Shared creates tables only for the shared elements and root elements. Hybrid
    is a minor modification of the Shared method that reduces the number of
    tables in some cases (but increases the number of elements stored in each
    table). Hybrid and Shared were the better solutions, and their relative
    performance was compared. I was slightly surprised that the comparison was
    only qualitative, with no quantitative evaluation of performance (execution
    time) on a real database. Maybe this is an insignificant point, but it would
    have been nice to see which schema allowed the fastest queries (not simply
    the fewest joins; there must be more factors).
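
    To make the difference between Basic and Shared concrete, here is a
    simplified sketch based only on the description above. It treats the DTD as
    a graph (element name mapped to its child elements) and picks which
    elements get their own table. The graph, the function names, and the
    in-degree rule are my own simplification; it ignores repeated (set-valued)
    and recursive elements, which the paper has to handle separately.

```python
# Hypothetical DTD graph: element name -> list of child element names.
DTD_GRAPH = {
    "book": ["title", "author"],
    "article": ["title", "author"],
    "author": ["name"],
    "title": [],
    "name": [],
}

def tables_basic(dtd_graph):
    """Basic, as summarized above: one table per element."""
    return set(dtd_graph)

def tables_shared(dtd_graph):
    """Shared, simplified: tables only for root elements (no parent) and
    shared elements (more than one parent); everything else is inlined
    into its parent's table."""
    indegree = {element: 0 for element in dtd_graph}
    for children in dtd_graph.values():
        for child in set(children):  # count each parent once
            indegree[child] += 1
    return {e for e, degree in indegree.items() if degree == 0 or degree > 1}
```

    On this graph Basic yields five tables, while the simplified Shared rule
    inlines name into author and keeps four.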

    The authors make this translation easier by simplifying the DTDs:
    complicated DTD rules are reduced to smaller, simpler ones. These
    transformations lose typing information, and the justification for why this
    is acceptable seemed a little problematic. It clearly does not permit
    round-tripping, but it may be "good enough" for this method. Also, the XML
    document must already conform to the DTD, so we know we have valid data.
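
    A loose sketch of this kind of simplification, and of why it loses
    information, is below. It is not the paper's exact rule set: the
    (name, quantifier) representation and the simplify function are
    assumptions for illustration. It weakens "+" to "*" and collapses repeated
    mentions of an element into a single starred entry.

```python
def simplify(content_model):
    """content_model: list of (element_name, quantifier) pairs, where the
    quantifier is one of '', '?', '*', '+'. Returns a simplified list."""
    merged = {}
    for name, quantifier in content_model:
        if quantifier == "+":
            quantifier = "*"      # "one or more" is weakened to "zero or more"
        if name in merged:
            merged[name] = "*"    # a repeated element collapses to name*
        else:
            merged[name] = quantifier
    # dict preserves insertion order, so first-mention order is kept.
    return list(merged.items())
```

    For example, the model (author+, title, author?) becomes (author*, title).
    The original ordering and occurrence constraints cannot be recovered from
    the simplified form, which is the round-tripping problem noted above.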

    Converting the XML queries to SQL queries is a major issue that was not
    fully addressed in this paper. The paper discusses some methods, but the
    treatment is by no means comprehensive. There are no formal semantics for
    the translation. It is not even clear whether SQL can express everything
    that can be expressed in an XML query.
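
    For the simplest case, a child-only path expression, the translation can be
    sketched as a chain of joins. This is my own toy example, not the paper's
    algorithm: it assumes a Basic-like layout in which every element has its
    own table with hypothetical id, parent_id and value columns, and it does
    not sanitize the element names.

```python
import sqlite3

def path_to_sql(path):
    """Translate a path such as 'book/author' into SQL that returns the
    value column of the last step, using one join per path step."""
    steps = path.split("/")
    sql = f"SELECT t{len(steps) - 1}.value FROM {steps[0]} AS t0"
    for i in range(1, len(steps)):
        # Each child row points at its parent row through parent_id.
        sql += f" JOIN {steps[i]} AS t{i} ON t{i}.parent_id = t{i - 1}.id"
    return sql

def run_path(conn, path):
    """Run the translated query on an open sqlite3 connection."""
    return conn.execute(path_to_sql(path)).fetchall()
```

    Each additional path step adds a join, which is why the choice of schema
    (and how much gets inlined into one table) matters for query cost.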

    Finally, the translation back to XML was almost trivial. One exception that
    was addressed is the case where related elements are nested using an
    "order by" clause.
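
    The nesting case can be sketched as follows: if the SQL result is sorted so
    that rows belonging to the same parent are adjacent, a single pass can
    group them into nested elements. The row layout (book_id, title, author)
    and the nest_rows name are assumptions for this example.

```python
from itertools import groupby
import xml.etree.ElementTree as ET

def nest_rows(rows):
    """rows: (book_id, title, author) tuples, already sorted by book_id,
    e.g. by an ORDER BY clause in the SQL query."""
    root = ET.Element("books")
    # groupby only merges adjacent rows, hence the need for sorted input.
    for _, group in groupby(rows, key=lambda row: row[0]):
        group = list(group)
        book = ET.SubElement(root, "book")
        ET.SubElement(book, "title").text = group[0][1]
        for _, _, author in group:
            ET.SubElement(book, "author").text = author
    return ET.tostring(root, encoding="unicode")
```

    Without the sort, rows for the same book could be scattered through the
    result and would end up in separate book elements.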

    While the authors don't claim to offer a comprehensive analysis of the
    topic, this paper does a good job of introducing many key components of
    using a relational database to query an XML document. The biggest
    contribution in my opinion is the translation from XML to a relational
    database: automatically deriving the relational database schema is
    significant. The remaining contributions seem a little weak in their
    development, especially the translation between queries.



    This archive was generated by hypermail 2.1.6 : Mon Apr 19 2004 - 16:30:15 PDT