views

From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Mon May 03 2004 - 11:17:10 PDT

  • Next message: Neva Cherniavsky: "Information Integration Using Logical Views"

    Paper: Information Integration Using Logical Views

    Summary: The paper first reviews the theoretical background on
    reformulating conjunctive queries (CQs) and Datalog programs using views,
    and then discusses how query reformulation differs in two research
    prototypes, "Information Manifold" and "Tsimmis".

    Main Ideas:
    - Review of CQ and Datalog definitions, and complexity analysis of the
    containment criteria for pure CQs, CQs with negation, CQs with arithmetic
    comparisons, and Datalog programs

    - Algorithms for constructing a query using views, and upper-bound
    theorems (Halevy, Rajaraman) on the number of subgoals in a minimal
    reformulated query. The bound gives a useful criterion for deciding
    whether the query under analysis is indeed minimal. It also serves as a
    signal to stop generating new reformulations and checking whether each
    one works and is minimal.

    - Information Manifold (IM) is the AT&T prototype. It uses a fragment of
    first-order logic to describe view rules and is covered by the
    "exponentially bounded query search" theorems. It produces the CQ (or
    union of CQs) closest to the original query; the result is contained in
    the original query but is not necessarily equivalent to it.

    - Tsimmis uses the Mediator Specification Language (MSL) to define the
    mediator schema, and rule expansion at the mediator to resolve queries
    against it. This can produce artifacts, such as empty answers when
    inconvenient objects are exported.
    However, this approach can perform complex data transformations within
    the mediator to produce export objects, which makes it more powerful
    than IM.
    In addition, MSL was designed to cope with the lack of a schema, which
    makes it suitable for semi-structured data.

    The disadvantage of the Tsimmis approach is its inflexibility to changes in
    the underlying schemas: the mediator administrator must first work out how
    best to integrate a change into the mediator and then recompile the MSL.
    In contrast, adding new sources to IM is seamless. No change to the query
    processing algorithm is needed; the newly defined views are used whenever
    they are appropriate.
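
    As an illustration of the containment criterion for pure CQs mentioned
    under Main Ideas, here is a minimal sketch of the standard containment
    mapping test: Q1 is contained in Q2 if the variables of Q2 can be mapped
    to terms of Q1 so that the heads match and every subgoal of Q2 lands on a
    subgoal of Q1. The data layout, the uppercase-variable convention, and
    the names is_var, unify and contained_in are my own assumptions, not
    taken from the paper. It does not cover negation, arithmetic comparisons,
    or Datalog recursion.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]   # (predicate, arguments)
Query = Tuple[Atom, List[Atom]]      # (head, body subgoals)


def is_var(term: str) -> bool:
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()


def unify(src: Atom, dst: Atom, h: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend mapping h so that h(src) == dst; return None if impossible."""
    if src[0] != dst[0] or len(src[1]) != len(dst[1]):
        return None
    h = dict(h)  # copy, so a failed attempt does not pollute the caller's map
    for s, d in zip(src[1], dst[1]):
        if is_var(s):
            if h.setdefault(s, d) != d:   # variable already mapped elsewhere
                return None
        elif s != d:                      # constants must map to themselves
            return None
    return h


def contained_in(q1: Query, q2: Query) -> bool:
    """True if q1 is contained in q2 (a containment mapping q2 -> q1 exists)."""
    start = unify(q2[0], q1[0], {})       # heads must match first
    if start is None:
        return False

    def search(i: int, h: Dict[str, str]) -> bool:
        if i == len(q2[1]):
            return True                   # every subgoal of q2 was mapped
        for target in q1[1]:              # try each subgoal of q1 as the image
            h2 = unify(q2[1][i], target, h)
            if h2 is not None and search(i + 1, h2):
                return True
        return False

    return search(0, start)


# q(X, Z) :- e(X, Y), e(Y, Z)    (paths of length two)
q_path2: Query = (("q", ("X", "Z")), [("e", ("X", "Y")), ("e", ("Y", "Z"))])
# q(X, X) :- e(X, X)             (self-loops)
q_loop: Query = (("q", ("X", "X")), [("e", ("X", "X"))])

print(contained_in(q_loop, q_path2))   # True: every self-loop is a 2-path
print(contained_in(q_path2, q_loop))   # False: not every 2-path is a loop
```

    The backtracking search is exponential in the worst case, which is
    consistent with containment of pure CQs being NP-complete.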

    Relevance:
    This is a very interesting paper, as it addresses the important problem of
    data integration, which is still unsolved today. It remains a subject of
    ongoing research and has spawned many start-ups (Nimble, Lixto, etc.).



    This archive was generated by hypermail 2.1.6 : Mon May 03 2004 - 11:17:16 PDT