From: CR (chrisre@cs.washington.edu)
Date: Mon Apr 19 2004 - 07:35:37 PDT
This paper discusses translating XML documents into relational engines
so that they can be queried more efficiently. The authors’ main point is
that this translation lets us take advantage of the considerable
maturity of relational technology, especially with respect to performance.
Specifically, it shows how to map the DTD, the schema-like description
of an XML document, into a relational database. As we have discussed,
this is a challenge because XML documents can have essentially
set-valued attributes, which precludes them from being in first normal
form. The two main methods are Basic and Shared, though Basic seems to be
there only to show how badly a naïve scheme would perform. The authors do
not even run Basic in the results section.
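
To make the first-normal-form problem concrete, here is a minimal sketch
of the usual relational workaround: a child that can repeat is moved into
its own table keyed by its parent. This is not the paper's algorithm. The
document, the table names (book, author) and the helper functions are all
hypothetical and chosen only for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document: a book may have any number of author children,
# which is the set-valued case that breaks first normal form.
DOC = """
<library>
  <book><title>XML Basics</title><author>Ann</author><author>Bob</author></book>
  <book><title>SQL Basics</title><author>Cy</author></book>
</library>
"""


def shred(conn, xml_text):
    """Store each book as one row and each author as a separate child row."""
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE author (book_id INTEGER REFERENCES book(id), name TEXT)"
    )
    for book in ET.fromstring(xml_text.strip()).findall("book"):
        # title occurs once per book, so it fits in a column of the book row
        cur = conn.execute(
            "INSERT INTO book (title) VALUES (?)", (book.findtext("title"),)
        )
        # author repeats, so each occurrence becomes a row keyed by its parent
        for author in book.findall("author"):
            conn.execute(
                "INSERT INTO author VALUES (?, ?)", (cur.lastrowid, author.text)
            )


def authors_of(conn, title):
    """Answer a parent/child lookup with an ordinary SQL join."""
    rows = conn.execute(
        "SELECT a.name FROM book b JOIN author a ON a.book_id = b.id "
        "WHERE b.title = ? ORDER BY a.name",
        (title,),
    )
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
shred(conn, DOC)
print(authors_of(conn, "XML Basics"))
```

The price of this layout is that every step from a parent to a repeated
child becomes a join, which is presumably why the choice of mapping
matters so much for performance.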
One oddity is that they did not address how types are handled. From
their examples, it seems that almost everything is an uninterpreted
string. This is odd because the types of attributes can cause large
differences in runtime performance in a database, and they are
essentially turning their back on this. Also, if the DBA is supposed to
supply this information, why is the DBA precluded from changing the
schema? It seems clear to me that a person would still be required in
this setup, but that person's role with respect to the tool is unclear. I
suppose this is because the paper’s main motivation was to show that the
translation is possible at all.
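
As a small illustration of why the types matter, the sketch below stores
the same values once as text and once as integers in SQLite. The table
names and the values are hypothetical and are not taken from the paper.
It only shows that a range predicate means something different on
uninterpreted strings; it does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE as_text (year TEXT)")
conn.execute("CREATE TABLE as_int (year INTEGER)")

# The same three values, stored once as strings and once as integers.
for y in ("998", "1999", "2004"):
    conn.execute("INSERT INTO as_text VALUES (?)", (y,))
    conn.execute("INSERT INTO as_int VALUES (?)", (int(y),))

# On text the comparison is lexicographic, so '998' sorts after '1500'.
text_hits = [
    r[0]
    for r in conn.execute(
        "SELECT year FROM as_text WHERE year > '1500' ORDER BY year"
    )
]

# On integers the comparison is numeric, which is what the query intends.
int_hits = [
    r[0]
    for r in conn.execute(
        "SELECT year FROM as_int WHERE year > 1500 ORDER BY year"
    )
]

print(text_hits)  # ['1999', '2004', '998']
print(int_hits)   # [1999, 2004]
```

Getting the numeric answer from the text column requires a cast in every
query, and an engine typically cannot use a plain index on the column for
a predicate over the cast expression.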
The translation of the query language seemed weak, though I do not know
the climate at the time. Still, it seems odd to me to have paths of only
length 3. I do not know which ideas are original to this paper, but the
idea of treating tagged values as attributes is a good one. I also
suppose that the climate was such that people were considering
modifications to databases to support XML rather than thinking about
translation. For example, they propose set-valued attributes as an
extension to traditional engines.
As a general rule, I think people should be cautious about declaring
limitations. Ruling out the possibility of an efficient solution, while
common in computer science, must still be done judiciously.