XML querying via RDBMS

From: Danny Wyatt (danny@cs.washington.edu)
Date: Mon Apr 19 2004 - 11:42:48 PDT

    The authors present, refine, and evaluate a method for automatically
    creating an RDBMS schema from an XML DTD. Since they are only
    interested in using the RDBMS to facilitate queries on XML documents,
    they relax a DTD's requirements so as to preserve only which children an
    XML entity can have and whether they are optional or required, and
    single- or set-valued. Thus any XML entity is treated as a bag of child
    entities. They ultimately settle on a transformation that inlines
    shared children unless those children are set-valued. This duplicates
    data, but no more so than the original document. More importantly, it
    reduces the number of joins needed to fulfil single-element queries.
    Their querying is mostly efficient, but returning the results is
    another matter. If an element has multiple set-valued children and a
    query wants the element in its entirety (not at all unusual), they have
    to initiate a large join-a-thon and even then cannot properly merge the
    results without methods external to the RDBMS.
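
    To make the inlining rule concrete, here is a minimal sketch of how I
    read it, not the authors' actual algorithm. The DTD, the quantifier
    notation ("1", "?", "*"), and every function name are my own
    assumptions, and the sketch assumes a non-recursive DTD. Single-valued
    children are folded into the parent's relation (even when shared, as
    with title below), while set-valued children get their own relation
    with a reference back to the parent.

```python
# Hypothetical simplified DTD: element -> list of (child, quantifier).
# "1" = required single, "?" = optional single, "*" = set-valued.
DTD = {
    "book": [("title", "1"), ("author", "*"), ("publisher", "?")],
    "article": [("title", "1"), ("author", "*")],
    "author": [("name", "1"), ("email", "?")],
    "title": [],
    "publisher": [],
    "name": [],
    "email": [],
}


def parents_of(element, dtd):
    """Return (parent, quantifier) pairs for every place `element` occurs."""
    return [(p, q) for p, kids in dtd.items() for c, q in kids if c == element]


def needs_own_table(element, dtd):
    """Roots and set-valued children get their own relation."""
    parents = parents_of(element, dtd)
    return not parents or any(q == "*" for _, q in parents)


def inline_columns(element, dtd, prefix="", optional=False):
    """Collect (column, nullable) pairs, inlining single-valued descendants."""
    cols = []
    for child, q in dtd[element]:
        if q == "*":
            continue  # set-valued: lives in the child's own relation
        nullable = optional or q == "?"
        name = prefix + child
        if dtd[child]:
            # Non-leaf child: flatten its columns under a prefixed name.
            cols.extend(inline_columns(child, dtd, name + "_", nullable))
        else:
            cols.append((name, nullable))
    return cols


def build_schema(dtd):
    """Map each relation name to its list of (column, nullable) pairs."""
    schema = {}
    for element in dtd:
        if not needs_own_table(element, dtd):
            continue
        cols = [("id", False)]
        if parents_of(element, dtd):
            # Set-valued child: point back at the owning parent tuple.
            cols += [("parent_id", False), ("parent_type", False)]
        if not dtd[element]:
            cols.append(("value", False))  # leaf stored in its own relation
        cols.extend(inline_columns(element, dtd))
        schema[element] = cols
    return schema


if __name__ == "__main__":
    for table, cols in build_schema(DTD).items():
        print(table, cols)
```

    In this toy DTD only book, article, and author become relations. The
    shared title is duplicated into both book and article, which is the
    duplication the review mentions, and reassembling a whole book still
    requires a join against author.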

    Another shortcoming is their treatment of elements that may contain any
    child element. These children are left unparsed and thus un-queryable.
    However, this also suggests a solution to returning set-valued elements
    without extending the RDBMS to break 1st normal form. Resting on the
    (unvalidated) assumption that queries for entire elements are common, it
    could be worthwhile to store the unparsed text of an element with its
    tuple. When there is a query for the entire element, this text could be
    returned without requiring the multiple joins needed to reassemble the
    element. The joins would then be used for more granular queries
    (queries with longer paths) while queries that "bottom out" at a given
    node can just use the stored version of the entire node.
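
    A rough sketch of this suggestion, using SQLite and a made-up two-table
    schema (book, author) with a hypothetical raw_xml column holding the
    unparsed element text. None of these names come from the paper; the
    point is only that a whole-element query touches one tuple, while a
    longer path still goes through the join.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample document for illustration only.
DOC = """<library>
  <book><title>A</title><author><name>Ann</name></author><author><name>Bob</name></author></book>
  <book><title>B</title><author><name>Cy</name></author></book>
</library>"""


def load(conn, xml_text):
    """Shred each book into tuples and keep its unparsed text alongside."""
    conn.execute(
        "CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, raw_xml TEXT)"
    )
    conn.execute(
        "CREATE TABLE author (id INTEGER PRIMARY KEY, "
        "book_id INTEGER REFERENCES book(id), name TEXT)"
    )
    for book in ET.fromstring(xml_text).iter("book"):
        # strip() drops the trailing whitespace that follows the element.
        raw = ET.tostring(book, encoding="unicode").strip()
        cur = conn.execute(
            "INSERT INTO book (title, raw_xml) VALUES (?, ?)",
            (book.findtext("title"), raw),
        )
        for author in book.findall("author"):
            conn.execute(
                "INSERT INTO author (book_id, name) VALUES (?, ?)",
                (cur.lastrowid, author.findtext("name")),
            )


def whole_book(conn, title):
    """Query that bottoms out at book: return the stored text, no joins."""
    row = conn.execute(
        "SELECT raw_xml FROM book WHERE title = ?", (title,)
    ).fetchone()
    return row[0] if row else None


def author_names(conn, title):
    """More granular query (book/author/name): uses the join as before."""
    rows = conn.execute(
        "SELECT a.name FROM book b JOIN author a ON a.book_id = b.id "
        "WHERE b.title = ? ORDER BY a.id",
        (title,),
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, DOC)
    print(whole_book(conn, "A"))
    print(author_names(conn, "A"))
```

    The cost is storing each element roughly twice, and the stored text
    would have to be kept in sync with the shredded tuples if updates were
    ever allowed.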

