Project Requirements

The course project is due Tuesday, March 6. The project will be completed in groups, and a written proposal will be due by each group on Thursday of the third week. Some possible project ideas are described below. Original ideas are welcome, but should be approved by the instructor before the proposal deadline.

  1. Indexing patterns

    Consider a large semistructured database assumed to be a single XML file, but much larger than main memory. For a limited class of queries (to be determined) design indexing techniques to minimize disk i/o. Analyze and compare the efficiency of each proposed index, taking into account the cost of maintainence over updates to the database.

    This problem remains unsolved today.  Readings on related topics are:

  2. Xpath containment

    Choose a manageable, well-defined class of Xpath expressions and investigate algorithms for deciding the containment of such expressions. The main technique to be used here is that of containment of conjunctive queries.

    Related reading:  Peter T. Wood,  in Dood 2000,On the Equivalence of XML Patterns

  3. Xpath query pruning

    Given an Xpath expression over XML data conforming to a given schema, investigate techniques for transforming the original expression into a simpler expression that returns the same results over all data conforming to the schema. Define a reasonable subset of Xpath and of XML-Schema for which such pruning works. There is some relevant literature on this topic.

    Some readings on query pruning:


  4. Storing XML as relational data

    Compare and analyze proposed methods of storing XML in a relational DBMS. There is a some literature on this topic. This project would require digesting existing literature, and either proposing original improvements, or providing a comparison of known techniques.

    There are three main publications on this topic: 

  5. Publishing relational data as XML

    Assess the current state of methods for publishing XML views of relational data, both research prototypes and database vendor capabilities. All three major database vendors offer XML extensions to their database systems: Oracle, IBM, Microsoft. Provide a comparison study of their approaches.

    There are two reading on this topic:
  6. Bulk processing of recursive transformations

    XML is processed recursively with XSLT, essentially a language for defining recursive functions. But recursive tree traversals are inefficient to implement and impossible to optimize. Bulk processing  of such transformations recognizes some restricted forms for recursive processing that can be implemented with standard relational operators. Define a fragment of XSLT that can be processed in bulk, and implement a translator from that fragment to SQL. There is some relevant literature that needs to be digested for this project.

    The only reading here is a bulky journal paper (  Come to my office, and I can explain you in a few minutes what is going on. It is not trivial: but it is exciting and highly relevant.

  7. Processing ordered collections in SQL

    SQL processes only unordered collections (bags). XML query languages on the other hand must consider ordered collections, since XML is an ordered data model. Can operations on ordered collections be implemented efficiently in SQL ? In this project you are required to design a suite of techniques for processing ordered collections in SQL, and implement a translator from a query language on ordered collections into SQL, implementing those techniques. As starting point, represent an ordered collection by adding an extra integer field that stores the "index" of each tuple in the collection.