Information Integration Using Logical Views

From: Charles Giefer (cgiefer@comcast.net)
Date: Mon May 03 2004 - 02:35:50 PDT

  • Next message: Aaron Chang: "review 5"

    Information Integration Using Logical Views
    Jeffrey D. Ullman

    This paper describes some theory behind conjunctive queries and containment.
    Then it uses this foundation to describe views, which are sets of
    conjunctive queries that describe a set of data. Finally, the paper
    explores two different implementations that use views to satisfy queries
    over several views, which can represent a diverse set of data sources.

    One of the major results in the theory section was that equivalence is
    easiest determined by showing that each query is contained in the other.
    Therefore, queries can be simplified and proven to be equivalent using this
    check.

    Another interesting fact that was reoccurring throughout the paper was that
    many of the algorithms required exponential time in the general case, but
    linear time in the common case. This result seems important to the
    effectiveness of these applications. It would be interesting to see what
    types of queries do not fall into the common case linear-time set. Similar
    tradeoffs exist in processor architecture where modifications are made to
    increase the average case performance, but may lead to a worse worst-case
    performance.

    While both implementations, Information Manifold and Tsimmis seemed like
    realistic wrappers/mediators for the integration of data among several
    sources, there is one design choice that clearly seemed better in Tsmmis
    rather than IM. Tsimmis was able to handle semi-structured data better. It
    would return results where not all fields were available. This would seem
    to make it more applicable to current web processing such as XML. Also,
    this seems like it would be necessary in a highly distributed system, where
    reliability varies for each data source.

    The idea to have recursive generation of an infinite number of views seemed
    a bit far-fetched although clearly desirable. Being limited to any fixed
    schema leads to performance tradeoffs when considering the number of joins
    over different queries. In this application, their "recursive programs to
    generate infinite families of views" seems like a long-winded way of saying
    "dynamic generation of views" instead of being attached to a static set of
    views. Again, this appears to be a case where the common case would benefit
    from recursive definitions.

    There are a few things that I found confusing in this paper. First, I
    didn't completely understand the proof of why the containment test works.
    The key seems to be that you need a canonical database to use to test each
    query. It appears that the canonical databases are built to define
    different cases in which the output may vary. I didn't quite understand how
    to build these canonical databases. Second, they referred to Pi^p_2, which
    I am not familiar with. It seems to be a complexity class, but its meaning
    is unclear to me.


  • Next message: Aaron Chang: "review 5"

    This archive was generated by hypermail 2.1.6 : Mon May 03 2004 - 02:36:05 PDT