Paper 2 Review

From: Bhushan Mandhani (bhushan@cs.washington.edu)
Date: Mon Apr 19 2004 - 12:12:01 PDT

    Relational Databases for Querying XML Documents: Limitations and Opportunities
               - Shanmugasundaram et al.

    This paper describes an approach for storing XML documents in relational
    databases. The XML DTD is used to generate the relational schema. XML
    documents are parsed, and stored as tuples in the database. XML queries
    are converted into corresponding SQL queries, and the results converted
    back to XML.

    The method described for converting an XML DTD to a relational schema is
    quite intuitive. The DTD is represented as a graph, and elements are
    mapped to relations, with descendants of an element being inlined into the
    same relation as much as possible. However, subelements which are sets or
    which recurse back to the ancestor element require separate relations, and
    thus, we can expect a lot of fragmentation. Further, the description of
    the conversion from a DTD to a relational schema is somewhat hand-wavy at
    times.
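
    A minimal sketch of the inlining idea, not the paper's actual algorithm:
    it applies only the rule summarized above, where set-valued and recursive
    subelements get their own relation and everything else is inlined into
    the nearest ancestor relation. The toy DTD graph, the function names and
    the underscore-joined column names are all assumptions made for
    illustration.

```python
# Hypothetical toy DTD graph: element -> list of (child, is_set_valued).
DTD = {
    "book": [("title", False), ("author", True)],   # author* is set-valued
    "author": [("name", False), ("address", False)],
    "address": [("city", False)],
    "title": [], "name": [], "city": [],
}

def reachable(dtd, start):
    """Elements reachable from `start` by following one or more child edges."""
    seen = set()
    stack = [child for child, _ in dtd[start]]
    while stack:
        elem = stack.pop()
        if elem not in seen:
            seen.add(elem)
            stack.extend(child for child, _ in dtd[elem])
    return seen

def needs_relation(dtd, root):
    """Root, set-valued children and recursive elements get their own relation."""
    own = {root}
    for children in dtd.values():
        own.update(child for child, is_set in children if is_set)
    # An element that can reach itself lies on a cycle, i.e. it is recursive.
    own.update(elem for elem in dtd if elem in reachable(dtd, elem))
    return own

def build_schema(dtd, root):
    """Map each relation name to the leaf columns inlined into it."""
    own = needs_relation(dtd, root)
    schema = {}
    for rel in sorted(own):
        cols = []

        def inline(elem, path):
            for child, _ in dtd[elem]:
                if child in own:
                    continue  # stored separately; a real schema would add a parent key
                if dtd[child]:
                    inline(child, path + [child])
                else:
                    cols.append("_".join(path + [child]))

        inline(rel, [rel])
        schema[rel] = cols
    return schema

schema = build_schema(DTD, "book")
```

    Here `author` is split off because it is set-valued, while `address` and
    `city` are inlined into the `author` relation. Every additional set-valued
    or recursive element adds another relation, which is where the
    fragmentation comes from.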

    They evaluate their method by the average number of SQL joins needed to
    process a path expression of a given length. This certainly seems to be
    the right metric to use. In the experimental results, however, there is no
    clear winner between the Shared and Hybrid inlining techniques, and no
    real effort has been made to characterize when one works better than the
    other, in terms of the properties of the DTD (which, I guess, should be
    possible). Further, in most cases, the number of joins was almost equal
    to the length of the path expression, and thus, the cost of joins in query
    processing will probably be too high, making this approach inefficient.
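
    A rough way to see why the join count tracks the path length, under a
    simplifying assumption that is not taken from the paper: a simple
    child-step path needs one join each time it steps into an element that
    is stored in its own relation. The function name and the example element
    names below are hypothetical.

```python
def count_joins(path, own_relations):
    """Count steps after the first that land in a separately stored element."""
    return sum(1 for elem in path[1:] if elem in own_relations)

path = ["book", "author", "address", "city"]

# Mostly inlined schema: only `book` and `author` have their own relations.
inlined = count_joins(path, {"book", "author"})

# Fully fragmented schema: every element on the path has its own relation.
fragmented = count_joins(path, set(path))
```

    With the mostly inlined schema the path costs one join. With the fully
    fragmented schema it costs three, one less than the path length, which is
    the regime the experimental results seem to be in.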

    It is clear that the process of conversion of an arbitrary path expression
    to corresponding SQL queries is quite complex, and in some cases, not
    possible at all. This is in addition to the high computational cost of
    the SQL queries themselves. Finally, the conversion of relational query
    results back into XML is also an issue. It can be quite difficult to
    answer queries which return complex XML elements.
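
    To illustrate the last point, here is a small sketch of rebuilding nested
    XML from flat join results, using Python's standard sqlite3 and
    xml.etree.ElementTree modules. The two-table schema, the sample rows and
    the function name are invented for the example. The paper does not
    prescribe this procedure.

```python
import sqlite3
import xml.etree.ElementTree as ET

def rows_to_xml(rows):
    """Rebuild nested <book> elements from flat (book_id, title, author) rows."""
    root = ET.Element("books")
    books = {}  # book_id -> <book> element already created
    for book_id, title, author_name in rows:
        book = books.get(book_id)
        if book is None:
            book = ET.SubElement(root, "book")
            ET.SubElement(book, "title").text = title
            books[book_id] = book
        if author_name is not None:  # LEFT JOIN yields NULL for books without authors
            ET.SubElement(book, "author").text = author_name
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author (id INTEGER PRIMARY KEY, book_id INTEGER, name TEXT);
    INSERT INTO book VALUES (1, 'XML and SQL');
    INSERT INTO author VALUES (1, 1, 'A. Writer'), (2, 1, 'B. Writer');
""")
rows = conn.execute("""
    SELECT b.id, b.title, a.name
    FROM book b LEFT JOIN author a ON a.book_id = b.id
    ORDER BY b.id, a.id
""").fetchall()
xml_text = rows_to_xml(rows)
```

    Even with one level of nesting, the title is repeated in every joined row
    and has to be de-duplicated while grouping. Each further level of nesting
    adds another join and another grouping step.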

    The paper concludes by mentioning possible extensions to existing
    relational database technology that would make it more suitable for XML
    storage. I doubt that this approach of storing XML in relational databases
    is going to be more efficient than having a query engine specifically
    designed for XML.

