From: Atri Rudra (atri@cs.washington.edu)
Date: Mon May 03 2004 - 01:54:42 PDT
Information Integration Using Logical Views
-------------------------------------------
J.D.Ullman
The first part of the paper gives the theoretical results for the
containment of Conjunctive queries. A conjunctive query (CQ) is contained
in another if for every database instance the result of applying the
first query to the instance is contained within the result of applying the
second query. The author considers various CQs (the plain vanilla one, CQ
with negation, CQ with arithmetic comparisons and Datalog programs) and
gives various hardness results. The plain CQ the problem is
NP-complete in it's generality but is tractable for a common case: I am
curious to know if there are some analogous results for the CQ with
negation (where the general problem in \pi_2 P).
The author then gives an application of the theory viz given a query over
views of a database find an equivalent query (two CQs are equivalent if
they are contained in each other) over the database. This approach forms
the formal basis of the Information Manifold which is an Information
Integration System (where the problem is given data sources in different
formats give one "consistent" view to the user: the last place where I
heard about this problem was in the Computation Biology scenario where
there are many legacy data sources for the genome sequences and
information integration is an important problem there).
An alternate system considered by the author is Tsimmis where the model is
to consider the entries as objects and use wrappers around each data
source and then provide a view in a hierarchical manner (using mediators).
The various components talk to each other using a data model called OEM
and a query language called MSL. The scheme uses semi-structured data and
the whole comparison between IM and Tsimmis is very reminiscent of the
comparison between relational databases and XML (including the converting
of MSL to Datalog which is of the same flavor as the Shanmugasundaram et.
al. paper). One place where this analogy seems to differ is the fact that
including new data sources seems to require less changes in the existing
framework in IM than in Tsimmis while (atleast to the extent that I
understand it) changing the XML schema seems to entail lesser amount of
changes than if one changes a relational schema.
Interesting read though it would have been nice if the outline of the
paper was spelt out before the Introduction (though I guess Ullman being
Ullman can get away with that :-)]
This archive was generated by hypermail 2.1.6 : Mon May 03 2004 - 01:54:44 PDT