Paper 5 review

From: Bhushan Mandhani (bhushan@cs.washington.edu)
Date: Mon May 03 2004 - 11:01:27 PDT

            Information Integration Using Logical Views
                        - Jeff Ullman

    The paper starts out by describing conjunctive queries (CQs) and tests for
    checking their containment, including the cases where they are extended
    with negated subgoals and comparison operators. It then describes query
    answering using views, where the views themselves are CQs. I was surprised
    that so much space was devoted to this review of known theory. The paper
    then turns to information integration, which it identifies as one of the
    important current problems, and goes on to describe two mediator-based
    data integration systems: Information Manifold (IM) and Tsimmis.

    IM has a global schema in terms of which queries are expressed. Each data
    source publishes one or more views, again defined over this global
    schema, and IM answers queries using these views. Although this query
    rewriting problem is NP-complete, queries are typically short enough that
    this is not a problem in practice. I thought IM had some appealing
    features. Having a global schema certainly makes it easy to pose queries
    to the system. Adding new data sources is convenient in IM: the newly
    added source just needs to export appropriate views. The idea of rewriting
    queries to answer them using views is elegant, and brings out the great
    practical utility of these theoretical ideas. I guess one downside is that
    the data sources are restricted, since they are forced to export views
    over this fixed schema.
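
    As a rough illustration of what answering a query using views involves,
    the sketch below expands a candidate rewriting (a query whose subgoals are
    view names) back into subgoals over the global schema. The expansion would
    then be checked for containment in the original query. Everything here is
    an assumption for illustration: the view v_grand, the relation parent, the
    function expand, and the simplification that view heads contain only
    distinct variables. It is not IM's actual algorithm.

```python
import itertools

def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def expand(rewriting_body, views):
    """Replace each view atom by the body of its definition.

    `views` maps a view name to (head_vars, body_atoms) over the global schema.
    Assumes each view head lists distinct variables only.
    """
    counter = itertools.count()
    expanded = []
    for name, args in rewriting_body:
        head_vars, body = views[name]
        # Bind the view's head variables to the terms used in the rewriting.
        subst = dict(zip(head_vars, args))
        n = next(counter)
        for pred, terms in body:
            # Variables not in the head are existential: give each use of the
            # view its own fresh copy (assumes no clash with existing names).
            new_terms = tuple(
                subst.get(t, f"{t}_{n}") if is_var(t) else t
                for t in terms
            )
            expanded.append((pred, new_terms))
    return expanded

# Hypothetical source view: v_grand(X,Z) :- parent(X,Y), parent(Y,Z)
views = {
    "v_grand": (("X", "Z"), [("parent", ("X", "Y")), ("parent", ("Y", "Z"))]),
}
# Candidate rewriting body: v_grand(A,C), v_grand(C,E)
rewriting = [("v_grand", ("A", "C")), ("v_grand", ("C", "E"))]
for atom in expand(rewriting, views):
    print(atom)
```

    With these inputs the expansion is a chain of four parent subgoals from A
    to E, which is what a great-great-grandparent query over the global schema
    would ask for.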

    Tsimmis has a completely different architecture, with a hierarchy of
    interacting mediators and wrappers. Data is exchanged in OEM, the Object
    Exchange Model, which I feel is a downside: data sources don't normally
    store data this way, and here we are forcing them to export data as OEM
    objects. Further, there is no global schema against which queries can be
    posed; instead, the objects exported by the mediator have to be queried,
    which again is probably not as natural as having a given schema. Tsimmis
    uses rule expansion to answer queries, and as an example in the paper
    shows, this can sometimes be awkward. To add new data sources, the
    mediators themselves have to be redefined. The motivation behind Tsimmis
    was that mediators would do complex processing to integrate data from
    different sources. I wonder whether any experimental or user studies of
    Tsimmis were done, and whether they found this to be the case. However,
    one appealing feature of Tsimmis is that it handles semistructured data
    well.

    Overall, the most interesting aspect of the paper was that it showed how
    ideas in database theory are used in practice, in addition to showing two
    interesting mediator systems.


