From: Li Yan (lanti@u.washington.edu)
Date: Mon May 24 2004 - 02:54:37 PDT
This paper gives a recent review on query optimization of
SQL queries.
The first part gives an overview on System R approaches,
with highlight on dynamic programming and interesting
ordering. it also points a few limitations in System R's
approach, and extensions follows in subsequent sections.
The author defines query optimization as a search problem,
with three issues to be addressed, search space, cost
estimation and enumeration algorithm.
Algebraic transformation of original query parse tree were
discussed and warned that transformation does not
necessarily yield better plans. The two limitations of
System R, discarding bushy trees and delaying Cartesian
products were pointed that doing so may end up with
sub optimal plans. It is reasonable to expect bushy tree
perform well in some context, but the cost to keep bushy
tree in search space is too expensive. But I am not sure why
delaying Cartesian product is not a good idea in star join,
and in particular, why is the example in OLAP justifies
Cartesian product, since the whole point to get a good join
plan is to keep intermediate results small, or is there
something else?
It is interesting to see the technique using views to
optimize multi-block queries, which suggests another
possible plan by introducing views for intermediate results.
Statistics of database are central to cost estimation, but
the problem is to get it precise and
up-to-date. Unfortunately most of the time histogram is
probably the best you can expect.
In the context of distributed/parallel database, the cost of
communication should be taken into account in cost
estimation also, and parallelism presents another
interesting topic.
This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 02:54:43 PDT