From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Mon May 03 2004 - 11:17:10 PDT
Paper: Information Integration Using Logical Views
Summary: The paper first reviews the theoretical background on reformulating
conjunctive queries (CQs) and Datalog queries using views, and then
discusses how query reformulation differs in two research prototypes,
"Information Manifold" and "Tsimmis".
Main Ideas:
- Review of CQ and Datalog definitions, with a complexity analysis of the
containment criteria for pure CQs, CQs with negation, CQs with arithmetic
comparisons, and Datalog programs.
- Algorithms to construct a query using views, and upper-bound theorems
(Halevy, Rajaraman) on the number of subgoals in a minimal reformulated
query. The bound provides a useful criterion for deciding that the query
under analysis is indeed minimal, and it serves as a signal to stop
generating and checking further reformulations.
- Information Manifold (IM) is the AT&T prototype, which uses a fragment of
first-order logic to describe view rules and is covered by the
"exponentially bounded query search" theorems. It produces the CQ (or union
of CQs) closest to the original query; the result is contained in the
original query but may not be equivalent to it.
- Tsimmis uses the Mediator Specification Language (MSL) to define the
mediator schema, and rule expansion at the mediator to resolve queries
against it. This has some artifacts, such as empty answers if inconvenient
objects are exported.
However, this approach has the advantage of being able to perform
complex data transformations within the mediator to produce export objects,
which makes it more powerful than IM.
In addition, MSL was designed to cope with the lack of a schema,
which makes it suitable for use with semi-structured data.
The disadvantage of the Tsimmis approach is its inflexibility to changes in
the underlying schemas: the mediator administrator must first work out how
best to integrate the change into the mediator and then recompile the MSL.
In contrast, adding new sources to IM is seamless. No change to the query
processing algorithm is needed; the newly defined views are used whenever
they are appropriate.
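
To make the containment idea above concrete, here is a minimal sketch of
the containment test for pure CQs (no negation, no arithmetic comparisons):
Q1 is contained in Q2 exactly when there is a containment mapping from the
variables of Q2 to the terms of Q1 that maps head to head and every subgoal
of Q2 onto some subgoal of Q1. The names CQ, is_var, extend and
contained_in, the edge/v predicates, and the "uppercase means variable"
convention are assumptions made for this illustration and are not taken
from the paper. The search is brute force, so it is exponential in the
worst case.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (predicate name, argument terms)


class CQ(NamedTuple):
    """A pure conjunctive query: head terms and a list of body subgoals."""
    head: Tuple[str, ...]
    body: List[Atom]


def is_var(term: str) -> bool:
    # Illustrative convention: variables start with an uppercase letter,
    # everything else is treated as a constant.
    return term[:1].isupper()


def extend(mapping: Dict[str, str], src: Tuple[str, ...],
           dst: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Extend `mapping` so that src maps onto dst position by position."""
    if len(src) != len(dst):
        return None
    new = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            # A variable must always map to the same target term.
            if new.setdefault(s, d) != d:
                return None
        elif s != d:
            # A constant can only map to itself.
            return None
    return new


def contained_in(q1: CQ, q2: CQ) -> bool:
    """True if q1 is contained in q2 (a containment mapping q2 -> q1 exists)."""
    start = extend({}, q2.head, q1.head)
    if start is None:
        return False

    def search(i: int, mapping: Dict[str, str]) -> bool:
        if i == len(q2.body):
            return True  # every subgoal of q2 has been mapped into q1
        pred, args = q2.body[i]
        for pred1, args1 in q1.body:
            if pred1 == pred:
                nxt = extend(mapping, args, args1)
                if nxt is not None and search(i + 1, nxt):
                    return True
        return False

    return search(0, start)


# User query: pairs connected by a path of length two.
#   q(X, Z) :- edge(X, Y), edge(Y, Z)
query = CQ(("X", "Z"), [("edge", ("X", "Y")), ("edge", ("Y", "Z"))])

# Hypothetical source view exporting only self-loops: v(A) :- edge(A, A).
# The rewriting q'(X, X) :- v(X) expands, by hand, to:
#   q'(X, X) :- edge(X, X)
expansion = CQ(("X", "X"), [("edge", ("X", "X"))])

print(contained_in(expansion, query))  # True: the rewriting is sound
print(contained_in(query, expansion))  # False: it is not equivalent
```

The example mirrors the IM situation described above: the rewriting over
the view returns only correct answers to the original query (it is
contained in it), but the available source is too weak to return all of
them, so the two are not equivalent.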
Relevance:
This is a very interesting paper, as it addresses the important problem of
data integration, which is still unsolved today. It remains the subject of
ongoing research and has spawned many start-ups (Nimble, Lixto, etc.).