From: Neva Cherniavsky (nchernia@cs.washington.edu)
Date: Mon May 24 2004 - 08:33:01 PDT
This paper describes query optimization in System R, an experimental
database management system built at IBM in the late '70s. The authors
discuss in detail the issues surrounding access path selection and
describe a new method for choosing optimal access paths that takes into
account the cost of CPU utilization as well as the cost of I/O operations.
Furthermore they describe how to do joins in an optimal manner, using
statistics from the tables and keeping things in "interesting order".
At first I was unsure why this paper didn't use dynamic programming for
the joins. From reading the second paper, it sounds like this paper did
use dynamic programming but didn't mention it by name. In fact, this
paper is rather familiar to us from class, because the concepts introduced
here are still very important to database optimization. The main points
that are still relevant today are:
1) Dynamic programming. Though not directly mentioned by name, the
authors go to some length discussing a "heuristic" that reduces the join
order permutations and constructs a tree of possible solutions. The
sub-solutions are optimal for the smaller joins, allowing some
sub-solutions to be discarded (and never considered again). Not all
sub-solutions are discarded though, because of:
2) Interesting orders. Sometimes a subquery may not be optimal for its
subproblem, but it returns a list in sorted order. In this case, we would
not want to discard this choice, because it may make the next join much
less expensive. This idea of saving possibilities that are not locally
optimal but may be globally optimal is used extensively in modern
optimizers, according to the second paper.
I'm not sure if the CPU utilization metric ended up being a significant
contribution. The authors thought it was important, but we don't study
the cost of the CPU operations today, mainly because processors have
gotten faster and faster while I/O operations remain a bottleneck.
This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 08:33:01 PDT