An overview of query optimization in relational systems

From: CR (chrisre@cs.washington.edu)
Date: Mon May 24 2004 - 09:56:05 PDT

  • Next message: Ankur Jain: "Review 9"

    I liked many of their overview points about how database systems have
    changed, in particular the importance of rewrites and semi-join
    techniques in modern DBs. The overview made it seem natural that
    cost-based rewrites would provide significant advantages. I would have
    liked more on the estimation of statistics.

    I am not sure how I feel about the idea of user-defined functions and a
    universal optimizer. It is not clear what information is needed to
    optimize correctly in these extremely extensible settings. The idea of
    semantic query optimization seems to require a lot more work. For
    example, you can imagine query operators that interact with
    visualization servers to provide only the requisite level of detail.
    Expressing these types of selectivity or sampling criteria seems to be a
    broad challenge.

    Perhaps I missed something, but there seems to be a typo in their query
    rewrite. Also, the rewrite they give does not seem to be correct unless
    Dept.name is a key. They say dept# is the key for Dept, but not name. If
    name is not a key, then the rewritten query does not preserve output
    duplicates. Also, the second HAVING is not legal, since
    Dept.num-of-machines is neither grouped by nor functionally determined
    by a grouped key. Also, if 0 = count(*) did not satisfy the predicate,
    then the LOJ (left outer join) would be unnecessary. They seem to cite
    Kim's paper with the bug (and the correction) but only make note of the
    LOJ fix. Very minor issues, I guess: more evidence that “It is
    especially tricky to preserve duplicates and nulls”.
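
    A minimal sketch of the duplicate and count issues described above,
    using Python's built-in sqlite3 module. The schema, column names, data,
    and the exact query text are assumptions made for illustration and are
    not copied from the paper. In particular, MAX(d.num_machines) is used in
    the group-by-name variant only to make that query legal SQL.

```python
import sqlite3

def build_db():
    # Hypothetical schema: dept_no is the key of Dept, name is NOT unique.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Dept (dept_no INTEGER PRIMARY KEY, name TEXT,
                           num_machines INTEGER);
        CREATE TABLE Emp (emp_no INTEGER PRIMARY KEY, dept_name TEXT);
        INSERT INTO Dept VALUES (1, 'Sales', 5), (2, 'Sales', 3), (3, 'Ops', 0);
        INSERT INTO Emp VALUES (10, 'Sales'), (11, 'Sales');
    """)
    return conn

# Original nested form: one output row per qualifying Dept row.
NESTED = """
SELECT d.name FROM Dept d
WHERE d.num_machines >=
      (SELECT COUNT(*) FROM Emp e WHERE e.dept_name = d.name)
"""

# Unnested with a left outer join but grouped on the non-key name:
# the two 'Sales' rows collapse into one group (and the count is inflated).
GROUP_BY_NAME = """
SELECT d.name FROM Dept d LEFT OUTER JOIN Emp e ON e.dept_name = d.name
GROUP BY d.name
HAVING MAX(d.num_machines) >= COUNT(e.emp_no)
"""

# Unnested and grouped on the key; num_machines is grouped too,
# so the HAVING clause may reference it directly.
GROUP_BY_KEY = """
SELECT d.name FROM Dept d LEFT OUTER JOIN Emp e ON e.dept_name = d.name
GROUP BY d.dept_no, d.name, d.num_machines
HAVING d.num_machines >= COUNT(e.emp_no)
"""

# Inner join instead of the outer join: departments with zero employees
# vanish before grouping, which is the count bug.
INNER_JOIN = """
SELECT d.name FROM Dept d JOIN Emp e ON e.dept_name = d.name
GROUP BY d.dept_no, d.name, d.num_machines
HAVING d.num_machines >= COUNT(e.emp_no)
"""

def run(conn, sql):
    # Sort so the comparison is on the multiset of names, not row order.
    return sorted(row[0] for row in conn.execute(sql))

if __name__ == "__main__":
    conn = build_db()
    for label, sql in [("nested", NESTED), ("group by name", GROUP_BY_NAME),
                       ("group by key", GROUP_BY_KEY), ("inner join", INNER_JOIN)]:
        print(label, run(conn, sql))
```

    On this data only the variant grouped on the key returns the same
    multiset of names as the nested query. Grouping on name loses a
    duplicate 'Sales' row, and the inner join loses 'Ops', whose count is 0.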



    This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 09:56:12 PDT