From: CR (chrisre@cs.washington.edu)
Date: Mon May 24 2004 - 09:56:05 PDT
I liked many of their overview points about the changes in database
systems, particularly the importance of rewrites and semi-join techniques
in modern DBs. The overview made it seem natural that cost-based rewrites
would provide significant advantages. I would have liked more on the
estimation of statistics.

I am not sure how I feel about the idea of user-defined functions and a
universal optimizer. It is not clear what information is needed to
optimize correctly in these extremely extensible settings. The idea of
semantic query optimization seems to require a lot more work. For
example, you can imagine query operators that interact with visualization
servers to provide only the requisite level of detail. Expressing these
types of selectivity or sampling criteria seems to be a broad challenge.

Perhaps I missed something, but there seems to be a typo in their query
rewrite. Also, the rewrite they give does not seem to be correct unless
Dept.name is a key. They say dept# is the key for Dept, but not name. If
name is not a key, then the rewritten query does not preserve output
duplicates. Also, the second HAVING is not legal, since
Dept.num-of-machines is neither in the GROUP BY nor implied by any key
that is. Also, if a count(*) of 0 did not satisfy the predicate, then the
left outer join (LOJ) would be unnecessary. They seem to cite Kim’s paper
with the bug (and the correction) but only make note of the LOJ fix. Very
minor issues, I guess, but more evidence that “It is especially tricky to
preserve duplicates and nulls”.
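
To make the duplicate and count concerns concrete, here is a minimal sketch
using Python's built-in sqlite3 module. The schema, the data, and the exact
query text are my own assumptions for illustration (dept_no standing in for
dept#, num_machines for num-of-machines, and a >= comparison); they are not
copied from the paper. The point is only to compare a nested count query
against three join-based rewrites.

```python
import sqlite3

# Hypothetical schema: dept_no is the key of dept, name is NOT a key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (dept_no INTEGER PRIMARY KEY, name TEXT, num_machines INTEGER);
    CREATE TABLE emp  (emp_no  INTEGER PRIMARY KEY, dept_name TEXT);
    INSERT INTO dept VALUES (1, 'R&D', 5), (2, 'R&D', 5), (3, 'Ops', 0);
    INSERT INTO emp  VALUES (10, 'R&D'), (11, 'R&D');
""")

def names(sql):
    """Run a query and return the selected names as a sorted list (a bag, not a set)."""
    return sorted(row[0] for row in conn.execute(sql))

# Reference semantics: one output row per qualifying dept row.
nested_rows = names("""
    SELECT d.name FROM dept d
    WHERE d.num_machines >= (SELECT COUNT(*) FROM emp e
                             WHERE e.dept_name = d.name)
""")

# Inner join, grouped on the key: departments with no employees vanish
# before the HAVING clause ever sees them (the count bug).
inner_rows = names("""
    SELECT d.name FROM dept d JOIN emp e ON e.dept_name = d.name
    GROUP BY d.dept_no, d.name, d.num_machines
    HAVING d.num_machines >= COUNT(*)
""")

# Left outer join, grouped on the key, counting a non-null emp column:
# empty departments survive with a count of 0.
loj_key_rows = names("""
    SELECT d.name FROM dept d LEFT OUTER JOIN emp e ON e.dept_name = d.name
    GROUP BY d.dept_no, d.name, d.num_machines
    HAVING d.num_machines >= COUNT(e.emp_no)
""")

# Left outer join, grouped on name only: the two 'R&D' rows collapse into
# one group. MIN() is used here only to keep the HAVING clause legal.
loj_name_rows = names("""
    SELECT d.name FROM dept d LEFT OUTER JOIN emp e ON e.dept_name = d.name
    GROUP BY d.name
    HAVING MIN(d.num_machines) >= COUNT(e.emp_no)
""")

print("nested:         ", nested_rows)    # ['Ops', 'R&D', 'R&D']
print("inner join:     ", inner_rows)     # ['R&D', 'R&D']
print("LOJ by key:     ", loj_key_rows)   # ['Ops', 'R&D', 'R&D']
print("LOJ by name only:", loj_name_rows) # ['Ops', 'R&D']
```

On this data only the outer join grouped on the key reproduces the nested
query. Grouping on name alone drops the duplicate 'R&D' row, and the inner
join drops 'Ops'. If the comparison were flipped so that a count of 0 could
never qualify (for example, num_machines < count with non-negative machine
counts), the inner join would presumably suffice, which is the case where the
LOJ looks unnecessary.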