Review 2

From: Alexander Moshchuk (anm@cs.washington.edu)
Date: Mon Apr 19 2004 - 01:20:24 PDT

  • Next message: Aaron Chang: "review 2"

    Summary: This paper discusses the approach of using existing relational
    database techniques to process XML documents.

    The paper argues that we could apply some of the mature technology
    behind existing relational database systems to query XML, as opposed
    to constructing new systems based on semi-structured query
    languages. The authors' basic approach consists of four steps:

    - A relational schema is determined using a DTD for a given XML
    document. During the process, many DTD details have to be simplified.
    Three techniques (basic, shared, hybrid of the two) are given to map
    DTD elements to relations, all relying on first converting a DTD to a
    graph representation. The approaches differ in which elements are
    inlined and when new relations are created.

    - The XML document is parsed and its data is loaded into relational
    tables of some DBMS.

    - The original queries for the XML document are converted into SQL
    queries to run on corresponding relational data. This involves
    identifying the relation corresponding to the root and translating
    path expressions to joins, but it gets more complicated with
    recursion. Since semi-structured queries are more powerful and
    flexible than SQL, this process can be awkward and incomplete for
    arbitrary path expressions.

    - After SQL queries are executed, the results can be converted back
    into XML form. The authors identify this as the most difficult part,
    with many complexity and efficiency issues. To help deal with these
    issues (and those from query translation), the authors propose a set
    of modifications to existing relational systems.
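
    To make the four steps concrete, here is a minimal sketch in Python
    using sqlite3. It is not the authors' implementation. The library/book
    document, the table and column names, and the helper functions are all
    hypothetical, and the schema is written by hand rather than derived
    from a DTD. The single-valued title is inlined into the book relation,
    the set-valued author gets its own relation, and one path expression is
    translated into a join.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document; assume a DTD that says: book -> (title, author*).
XML_DOC = """
<library>
  <book><title>XML Basics</title><author>Ann</author><author>Bob</author></book>
  <book><title>SQL Basics</title><author>Cy</author></book>
</library>
"""

def shred(conn, xml_text):
    """Steps 1 and 2: create the relations and load the parsed document.
    title is inlined into book; author is set-valued, so it gets its own
    relation with a foreign key back to its parent book."""
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE author (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)"
    )
    for book in ET.fromstring(xml_text).findall("book"):
        cur = conn.execute(
            "INSERT INTO book (title) VALUES (?)", (book.findtext("title"),)
        )
        for author in book.findall("author"):
            conn.execute(
                "INSERT INTO author (parent_id, name) VALUES (?, ?)",
                (cur.lastrowid, author.text),
            )

def titles_by_author(conn, name):
    """Step 3: the path expression book[author = name]/title becomes a join."""
    rows = conn.execute(
        "SELECT b.title FROM book b JOIN author a ON a.parent_id = b.id "
        "WHERE a.name = ? ORDER BY b.id",
        (name,),
    )
    return [row[0] for row in rows]

def to_xml(titles):
    """Step 4: wrap the flat SQL result back into XML."""
    result = ET.Element("result")
    for title in titles:
        ET.SubElement(result, "title").text = title
    return ET.tostring(result, encoding="unicode")

conn = sqlite3.connect(":memory:")
shred(conn, XML_DOC)
print(to_xml(titles_by_author(conn, "Bob")))
```

    Even in this toy case, each step down a set-valued path adds a join,
    and the result tagging is trivial only because the output is flat.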

    The paper presents this approach but doesn't give any useful
    evaluation or comparison to running original semi-structured queries
    on XML, and as a result, it is unclear whether there will be practical
    benefits. Also, they evaluate their schema conversion techniques
    using the number of joins, which to me didn't seem like a good metric;
    perhaps the actual cost of running the queries would be more helpful.
    Finally, it seems some of their DTD simplifications and
    generalizations were too strict and could lead to inefficiencies in
    converted data; e.g. an element x of form (y,z,y) is simplified into
    (y*,z), and instead of storing two y fields in a relation for x, a
    separate relation must be created mapping x's and y's.
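
    The (y,z,y) case can be sketched as follows. This is an illustration
    under assumed names, not the paper's generated schema: x, x_y and
    x_inlined are hypothetical tables, and x_inlined is the alternative I
    have in mind, with both y fields stored directly in the relation for x.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Schema after simplifying (y,z,y) to (y*,z): y moves to its own relation.
CREATE TABLE x (id INTEGER PRIMARY KEY, z TEXT);
CREATE TABLE x_y (parent_id INTEGER, y TEXT);

-- Hypothetical alternative: keep both y fields inside the relation for x.
CREATE TABLE x_inlined (id INTEGER PRIMARY KEY, y1 TEXT, z TEXT, y2 TEXT);

INSERT INTO x VALUES (1, 'z0');
INSERT INTO x_y VALUES (1, 'ya'), (1, 'yb');
INSERT INTO x_inlined VALUES (1, 'ya', 'z0', 'yb');
""")

# Simplified schema: fetching the y values of one x requires a join.
joined = conn.execute(
    "SELECT m.y FROM x JOIN x_y m ON m.parent_id = x.id "
    "WHERE x.id = 1 ORDER BY m.rowid"
).fetchall()

# Inlined schema: the same values come from a single-table lookup.
inlined = conn.execute(
    "SELECT y1, y2 FROM x_inlined WHERE id = 1"
).fetchone()

print(joined, inlined)
```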

    Overall, the paper presents an interesting approach to querying XML
    data, but after reading it I'm led to believe that perhaps it's easier
    to use RDBMS experience to add enough power to a semi-structured
    system, rather than to hack existing relational technology to remove
    the limitations that the authors identified and then still go through
    the complex (and slow?) translation process for every query.



    This archive was generated by hypermail 2.1.6 : Mon Apr 19 2004 - 01:20:28 PDT