Information Integration using Logical Views

From: Steven Balensiefer (alaska@cs.washington.edu)
Date: Mon May 03 2004 - 00:18:57 PDT

     This paper discusses the theory behind answering queries by composing
    information from distinct sources presenting logical views. In addition to
    the theory there's also a discussion of two different systems that seek to
    provide this kind of integration.

     The first section was an overview of containment for conjunctive
    queries as well as the translation of conjunctive queries into datalog.
    I thought the examples were very helpful, and appreciated the treatment of
    the material, but it was all results from prior work, so I'm not going to
    spend time talking about it.
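
     For readers who have not seen the containment test, here is a minimal
    sketch of the standard containment-mapping check for conjunctive queries.
    The representation is an assumption made for illustration only (a query
    is a pair of head terms and body atoms, and variables are strings that
    start with an uppercase letter); none of the names come from the paper.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def extend(mapping, src, dst):
    """Map the term tuple src onto dst; return the extended mapping or None."""
    if len(src) != len(dst):
        return None
    new = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            # A variable must always map to the same target term.
            if new.setdefault(s, d) != d:
                return None
        elif s != d:
            # A constant can only map to itself.
            return None
    return new


def contained_in(q1, q2):
    """True if q1 is contained in q2, i.e. a containment mapping q2 -> q1 exists."""
    (head1, body1), (head2, body2) = q1, q2
    start = extend({}, head2, head1)
    if start is None:
        return False

    def search(i, mapping):
        # Try to map the i-th subgoal of q2 onto some subgoal of q1.
        if i == len(body2):
            return True
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 == pred:
                nxt = extend(mapping, args, args1)
                if nxt is not None and search(i + 1, nxt):
                    return True
        return False

    return search(0, start)


# Hypothetical example queries over a binary predicate arc:
#   q_loop(X) :- arc(X, X)      nodes with a self-loop
#   q_out(X)  :- arc(X, Y)      nodes with some outgoing arc
q_loop = (("X",), [("arc", ("X", "X"))])
q_out = (("X",), [("arc", ("X", "Y"))])

print(contained_in(q_loop, q_out))  # True: map X -> X, Y -> X
print(contained_in(q_out, q_loop))  # False: X cannot map to both X and Y
```

    The search is exponential in the worst case, which is expected since
    containment of conjunctive queries is NP-complete.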

     I thought the key idea of the paper was that the subgoals in any datalog
    query must be covered by logical views, and that only views needed to
    cover the subgoals should be used. I was disappointed that the theorems
    about minimal size excluded arithmetic comparisons and negations. From
    the material in the first part, I gathered that including those things
    greatly complicates the containment calculations. Even so, limiting the
    queries to only positive conjunctions seems to be a major constraint. I
    can't claim to know its impact on the descriptive capability of the
    language, but from a programmer's standpoint, restating everything would
    appear to either require additional global predicates, or a number of
    unintuitive rewrites.
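
     To make the idea of views covering subgoals concrete, here is a small
    sketch of expanding a candidate solution written over views back into the
    global predicates, so that its subgoals can be compared against the
    query's. The view name gp, the predicate parent, and the data layout are
    all hypothetical and chosen only for illustration; this is not the
    paper's algorithm, just the expansion step under those assumptions.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()


def expand(solution, views):
    """Replace each view atom by its definition over global predicates.

    solution: list of (view_name, args) atoms.
    views:    view_name -> (head_vars, body), body a list of (pred, terms).
    Variables local to a view body get a per-occurrence suffix so that two
    uses of the same view do not accidentally share them.
    """
    expanded = []
    for i, (name, args) in enumerate(solution):
        head_vars, body = views[name]
        sub = dict(zip(head_vars, args))
        for pred, terms in body:
            new_terms = tuple(
                sub.get(t, f"{t}_{i}") if is_var(t) else t for t in terms
            )
            expanded.append((pred, new_terms))
    return expanded


# Hypothetical view: gp(X, Y) :- parent(X, Z), parent(Z, Y)
views = {
    "gp": (("X", "Y"), [("parent", ("X", "Z")), ("parent", ("Z", "Y"))]),
}

# Candidate solution body: gp(A, B), gp(B, C)
solution = [("gp", ("A", "B")), ("gp", ("B", "C"))]

for atom in expand(solution, views):
    print(atom)
```

    A subgoal of the original query counts as covered when some atom of this
    expansion can play its role; a view whose expansion covers nothing is a
    candidate for removal from the solution.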

     When it came time to actually describe the process of integrating this
    information and querying the "mediators" that coalesce the various views,
    the earlier material on containment all made sense. The Information
    Manifold approach seems more direct, and Ullman says that it relies on
    the basic minimization technique presented in the minimal solution
    theorems.

     In contrast, the Tsimmis approach has the mediator export objects that
    provide access to data contained in the views from the data sources.
    Though Ullman showed an example where removing all access to the
    underlying views was a mistake, he was quick to note that it was a
    contrived example. The key to this approach, in my mind, was its ability to
    handle semi-structured data (XML anyone?) and the way it dealt with the
    presence or absence of subobjects.

     One of the drawbacks to the whole Tsimmis approach was the requirement
    that the mediator export the right data, something that could easily
    change based on the workload. Commercial database systems already have
    a vast array of tuning "knobs", and this would simply add to that
    number.

     I'd expect that both of these methods for dealing with this problem
    provided good ideas and even starting points for current research on the
    information integration process. I think it's fair to say that this is a
    hugely important area and that a major breakthrough in integrating a
    variety of different data sources will have major applications in all
    fields, from military operations to financial planning to internet rumor
    mills.
     

