From: Bhushan Mandhani (bhushan@cs.washington.edu)
Date: Mon May 03 2004 - 11:01:27 PDT
Information Integration Using Logical Views
- Jeff Ullman
The paper starts out by describing conjunctive queries (CQs) and tests for
checking their containment, including the cases where they are extended
with negated subgoals and comparison operators. It then describes query
answering using views, where the views themselves are CQs. I was surprised
that so much space was devoted to this review of known theory. It then
turns to information integration, which is identified as one of the
important current problems, and goes on to describe two mediator-based
data integration systems: Information Manifold (IM) and Tsimmis.
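
To make the containment test concrete, here is a minimal sketch for plain
CQs only (no negated subgoals or comparisons). It searches for a containment
mapping from Q2 to Q1, which exists exactly when Q1 is contained in Q2. The
tuple representation, the uppercase-means-variable convention, and the names
is_var, extend and contained_in are my own assumptions, not from the paper.

```python
from typing import Dict, List, Optional, Tuple

Atom = Tuple[str, Tuple[str, ...]]           # (predicate, argument terms)
Query = Tuple[Tuple[str, ...], List[Atom]]   # (head terms, body subgoals)


def is_var(term: str) -> bool:
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()


def extend(mapping: Dict[str, str], src: Tuple[str, ...],
           dst: Tuple[str, ...]) -> Optional[Dict[str, str]]:
    """Extend mapping so that src maps onto dst term by term, or return None."""
    if len(src) != len(dst):
        return None
    new = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            if new.setdefault(s, d) != d:   # variable already bound elsewhere
                return None
        elif s != d:                        # constants must map to themselves
            return None
    return new


def contained_in(q1: Query, q2: Query) -> bool:
    """True if q1 is contained in q2 (a containment mapping q2 -> q1 exists)."""
    head1, body1 = q1
    head2, body2 = q2
    start = extend({}, head2, head1)        # heads must map onto each other
    if start is None:
        return False

    def search(i: int, mapping: Dict[str, str]) -> bool:
        # Backtracking search: map subgoal i of q2 onto some subgoal of q1.
        if i == len(body2):
            return True
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 == pred:
                nxt = extend(mapping, args, args1)
                if nxt is not None and search(i + 1, nxt):
                    return True
        return False

    return search(0, start)


# q1(X) :- e(X,Y), e(Y,X).    q2(X) :- e(X,Y).
q1 = (("X",), [("e", ("X", "Y")), ("e", ("Y", "X"))])
q2 = (("X",), [("e", ("X", "Y"))])
print(contained_in(q1, q2))
print(contained_in(q2, q1))
```

The search is exponential in the worst case, which is expected for an
NP-complete test, but the queries involved are usually small.
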
IM has a global schema in terms of which queries are expressed. Each data
source publishes one or more views, again defined using this global
schema. IM answers queries using these views. Although this query
rewriting problem is NP-complete, queries are typically short enough for
this not to be a problem. I thought IM had some appealing features. Having
a global schema certainly makes it easy to pose queries to the system.
Adding new data sources is convenient in IM: the newly added source just
needs to export appropriate views. The idea of rewriting queries to answer
them using views is elegant, and brings out the great practical utility of
these theoretical ideas. I guess one downside is that the data sources are
restricted, since they are forced to export views over this fixed schema.
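
As an illustration of what answering a query using views involves, the
sketch below expands a candidate rewriting (a body made of view atoms) into
subgoals over the global schema; the expansion can then be compared against
the original query with a containment test. This is not IM's actual
algorithm, only the expansion step. The parent relation, the view v1 and the
function expand are hypothetical, and I assume each view head lists distinct
variables and that generated names such as C_0 do not clash with variables
already used in the rewriting.

```python
from itertools import count
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[str, ...]]                # (predicate, argument terms)
View = Tuple[Tuple[str, ...], List[Atom]]         # (head variables, body)


def is_var(term: str) -> bool:
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()


def expand(rewriting_body: List[Atom], views: Dict[str, View]) -> List[Atom]:
    """Replace every view atom by the view's body over the global schema."""
    fresh = count()
    expanded: List[Atom] = []
    for name, args in rewriting_body:
        head_vars, body = views[name]
        n = next(fresh)
        sub = dict(zip(head_vars, args))          # head variable -> actual term

        def rename(t: str) -> str:
            if t in sub:
                return sub[t]
            # Existential variables get a fresh name per use of the view.
            return f"{t}_{n}" if is_var(t) else t

        for pred, terms in body:
            expanded.append((pred, tuple(rename(t) for t in terms)))
    return expanded


# Hypothetical source view over a global relation parent(X, Y):
#   v1(A, B) :- parent(A, C), parent(C, B).       (grandparent pairs)
views = {"v1": (("A", "B"), [("parent", ("A", "C")), ("parent", ("C", "B"))])}

# Candidate rewriting: q(X, Y) :- v1(X, M), v1(M, Y).
rewriting = [("v1", ("X", "M")), ("v1", ("M", "Y"))]
for atom in expand(rewriting, views):
    print(atom)
```

Here the expansion is a chain of four parent subgoals from X to Y, so the
rewriting answers a query asking for great-great-grandparent pairs.
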
Tsimmis has a completely different architecture, with a hierarchy of
interacting mediators and wrappers. Data is exchanged in OEM (the Object
Exchange Model), which I feel is a downside, since data sources don't
normally store data this way, and here we are forcing them to export data
as OEM objects. Further, there is no global schema against which queries
can be posed; instead, the objects exported by the mediator have to be
queried, which again is probably not as natural as having a given schema.
Tsimmis uses rule expansion to answer queries, and as an example in the
paper shows, this can sometimes be awkward. To add new data sources, the
mediators themselves have to be redefined. The motivation behind Tsimmis
was that mediators would do complex processing to integrate data from
different sources. I wonder whether any experimental or user studies of
Tsimmis were done, and if so, whether this was found to be the case.
However, one appealing feature of Tsimmis is that it handles
semistructured data well.
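
To show what exporting data as OEM objects means for a source, here is a
rough sketch of wrapping one relational row as a nested, self-describing
object. It assumes the usual description of OEM, in which an object has an
object id, a label, and a value that is either atomic or a set of
sub-objects. The class OEMObject, the helpers from_row and find, and the id
format are hypothetical and not taken from the paper or from Tsimmis.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    oid: str                                        # object identifier
    label: str                                      # self-describing label
    value: Union[str, int, float, List["OEMObject"]]


def from_row(oid: str, label: str, row: dict) -> OEMObject:
    """Wrap one relational row as a complex object with atomic children."""
    children = [OEMObject(f"{oid}.{i}", key, val)
                for i, (key, val) in enumerate(row.items())]
    return OEMObject(oid, label, children)


def find(obj: OEMObject, label: str) -> list:
    """Return the values of the direct sub-objects carrying the given label."""
    if not isinstance(obj.value, list):
        return []
    return [child.value for child in obj.value if child.label == label]


emp = from_row("&e1", "employee", {"name": "Joe", "dept": "toys"})
print(find(emp, "name"))
```

Since labels travel with the data, two employee objects need not have the
same sub-objects, which is what makes the model a fit for semistructured
data, at the cost of a translation step for relational sources.
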
Overall, the most interesting aspect of the paper was that it showed how
ideas in database theory are used in practice, in addition to showing two
interesting mediator systems.