From: Danny Wyatt (danny@cs.washington.edu)
Date: Mon Apr 19 2004 - 11:42:48 PDT
The authors present, refine, and evaluate a method for automatically
creating an RDBMS schema from an XML DTD. Since they are only
interested in using the RDBMS to facilitate queries on XML documents,
they relax a DTD's requirements so as to preserve only which children an
XML element can have and whether they are optional or required, and
single- or set-valued. Thus any XML element is treated as a bag of child
elements. They ultimately settle on a transformation that inlines
shared children unless those children are set-valued. This duplicates
data, but no more so than the original document. More importantly, it
reduces the number of joins needed to fulfil single-element queries.
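
As a rough illustration of that inlining rule, here is a minimal Python
sketch. It is not the authors' algorithm: the DTD contents, the
occurrence markers ("1", "?", "*"), the function name derive_schema, and
the column names (id, parent_id, parent_code, value) are all assumptions
made for this example, and recursive DTDs are not handled.

```python
# Hypothetical simplified DTD: element -> [(child, occurrence)], where
# "1" = required single, "?" = optional single, "*" = set-valued.
SIMPLIFIED_DTD = {
    "book": [("title", "1"), ("author", "*"), ("publisher", "?")],
    "article": [("title", "1"), ("author", "*")],
    "author": [("name", "1")],
    "title": [],
    "publisher": [],
    "name": [],
}


def derive_schema(dtd, roots):
    """Return {relation: [column, ...]} for a non-recursive simplified DTD.

    Single-valued children are inlined as columns of their parent, even
    when several parents share them. Set-valued children get their own
    relation that points back to the parent. Nullability of optional
    ("?") children is not modelled in this sketch.
    """
    schema = {}

    def inline(element, prefix):
        cols = []
        for child, occurs in dtd[element]:
            if occurs == "*":
                build(child, is_root=False)
            else:
                col = f"{prefix}_{child}"
                cols.append(col)
                cols.extend(inline(child, col))
        return cols

    def build(element, is_root):
        if element in schema:
            return
        head = ["id"] if is_root else ["id", "parent_id", "parent_code"]
        schema[element] = head  # reserve the name before descending
        body = inline(element, element) if dtd[element] else ["value"]
        schema[element] = head + body

    for root in roots:
        build(root, is_root=True)
    return schema


schema = derive_schema(SIMPLIFIED_DTD, roots=["book", "article"])
```

In this sketch the shared title element is duplicated as book_title and
article_title, while the set-valued author element gets a single
relation of its own, so a query on a book's title touches one table.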

Their querying is mostly efficient, but returning the results is
another matter. If an element has multiple set-valued children and a
query wants the element in its entirety---not at all unusual---they have
to initiate a large join-a-thon and even then cannot properly merge the
results without methods external to the RDBMS.
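
The blow-up can be seen in a small example. The following Python sketch
uses the standard sqlite3 module with a made-up schema (book, author,
keyword) and made-up data. None of it comes from the paper. It only
shows why joining two set-valued children yields a cross product that
has to be regrouped outside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book    (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE author  (parent_id INTEGER, name TEXT);
    CREATE TABLE keyword (parent_id INTEGER, word TEXT);
    INSERT INTO book    VALUES (1, 'XML and Relations');
    INSERT INTO author  VALUES (1, 'A'), (1, 'B'), (1, 'C');
    INSERT INTO keyword VALUES (1, 'xml'), (1, 'sql');
""")

# One join per set-valued child: 3 authors x 2 keywords = 6 rows for a
# single book, with every author repeated next to every keyword.
rows = conn.execute("""
    SELECT b.title, a.name, k.word
    FROM book b
    JOIN author  a ON a.parent_id = b.id
    JOIN keyword k ON k.parent_id = b.id
""").fetchall()


def regroup(rows):
    """Merge the flat join result back into one entry per book.

    This happens in client code, outside the RDBMS. Using sets removes
    the repetition introduced by the cross product, but it would also
    drop genuine duplicates and ordering, so it is only an approximation
    of the original element.
    """
    books = {}
    for title, name, word in rows:
        entry = books.setdefault(title, {"authors": set(), "keywords": set()})
        entry["authors"].add(name)
        entry["keywords"].add(word)
    return books


merged = regroup(rows)
```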

Another shortcoming is their treatment of elements that may contain any
child element. These children are left unparsed and thus un-queryable.
However, this also suggests a solution to returning set-valued elements
without extending the RDBMS to break 1st normal form. Resting on the
(unvalidated) assumption that queries for entire elements are common, it
could be worthwhile to store the unparsed text of an element with its
tuple. When there is a query for the entire element, this text could be
returned without requiring the multiple joins needed to reassemble the
element. The joins would then be used for more granular queries
(queries with longer paths) while queries that "bottom out" at a given
node can just use the stored version of the entire node.
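
A minimal sketch of this suggestion, again in Python with sqlite3 and
xml.etree.ElementTree from the standard library. The raw_xml column,
the table layout, and the helper names store_book, whole_book, and
author_names are hypothetical and chosen only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = "<book><title>T</title><author>A</author><author>B</author></book>"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book   (id INTEGER PRIMARY KEY, title TEXT, raw_xml TEXT);
    CREATE TABLE author (parent_id INTEGER, name TEXT);
""")


def store_book(conn, xml_text):
    """Shred the element into relations and keep its unparsed text too."""
    elem = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO book (title, raw_xml) VALUES (?, ?)",
        (elem.findtext("title"), xml_text),
    )
    book_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO author VALUES (?, ?)",
        [(book_id, a.text) for a in elem.findall("author")],
    )
    return book_id


def whole_book(conn, book_id):
    """Query that bottoms out at the book node: no joins, just the text."""
    row = conn.execute(
        "SELECT raw_xml FROM book WHERE id = ?", (book_id,)
    ).fetchone()
    return row[0] if row else None


def author_names(conn, book_id):
    """More granular query: still answered from the shredded relation."""
    result = conn.execute(
        "SELECT name FROM author WHERE parent_id = ? ORDER BY rowid",
        (book_id,),
    )
    return [r[0] for r in result]


book_id = store_book(conn, DOC)
```

The cost is that each element is stored twice, once shredded and once
as text, so any update would have to keep both copies in step.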