From: Charles Giefer (cgiefer@comcast.net)
Date: Mon May 03 2004 - 02:35:50 PDT
Information Integration Using Logical Views
Jeffrey D. Ullman
This paper describes some theory behind conjunctive queries and containment.
Then it uses this foundation to describe views, which are sets of
conjunctive queries that describe a set of data. Finally, the paper
explores two different implementations that use views to satisfy queries
over several views, which can represent a diverse set of data sources.
One of the major results in the theory section was that equivalence is
easiest determined by showing that each query is contained in the other.
Therefore, queries can be simplified and proven to be equivalent using this
check.
Another interesting fact that was reoccurring throughout the paper was that
many of the algorithms required exponential time in the general case, but
linear time in the common case. This result seems important to the
effectiveness of these applications. It would be interesting to see what
types of queries do not fall into the common case linear-time set. Similar
tradeoffs exist in processor architecture where modifications are made to
increase the average case performance, but may lead to a worse worst-case
performance.
While both implementations, Information Manifold and Tsimmis seemed like
realistic wrappers/mediators for the integration of data among several
sources, there is one design choice that clearly seemed better in Tsmmis
rather than IM. Tsimmis was able to handle semi-structured data better. It
would return results where not all fields were available. This would seem
to make it more applicable to current web processing such as XML. Also,
this seems like it would be necessary in a highly distributed system, where
reliability varies for each data source.
The idea to have recursive generation of an infinite number of views seemed
a bit far-fetched although clearly desirable. Being limited to any fixed
schema leads to performance tradeoffs when considering the number of joins
over different queries. In this application, their "recursive programs to
generate infinite families of views" seems like a long-winded way of saying
"dynamic generation of views" instead of being attached to a static set of
views. Again, this appears to be a case where the common case would benefit
from recursive definitions.
There are a few things that I found confusing in this paper. First, I
didn't completely understand the proof of why the containment test works.
The key seems to be that you need a canonical database to use to test each
query. It appears that the canonical databases are built to define
different cases in which the output may vary. I didn't quite understand how
to build these canonical databases. Second, they referred to Pi^p_2, which
I am not familiar with. It seems to be a complexity class, but its meaning
is unclear to me.
This archive was generated by hypermail 2.1.6 : Mon May 03 2004 - 02:36:05 PDT