Access Path Selection in a Relational Database Management System

From: Neva Cherniavsky (nchernia@cs.washington.edu)
Date: Mon May 24 2004 - 08:33:01 PDT

  • Next message: Danny Wyatt: "System R Access Path Selection"

    This paper describes query optimization in System R, an experimental
    database management system built at IBM in the late '70s. The authors
    discuss in detail the issues surrounding access path selection and
    describe a new method for choosing optimal access paths that takes into
    account the cost of CPU utilization as well as the cost of I/O operations.
    Furthermore they describe how to do joins in an optimal manner, using
    statistics from the tables and keeping things in "interesting order".

    At first I was unsure why this paper didn't use dynamic programming for
    the joins. From reading the second paper, it sounds like this paper did
    use dynamic programming but didn't mention it by name. In fact, this
    paper is rather familiar to us from class, because the concepts introduced
    here are still very important to database optimization. The main points
    that are still relevant today are:

    1) Dynamic programming. Though not directly mentioned by name, the
    authors go to some length discussing a "heuristic" that reduces the join
    order permutations and constructs a tree of possible solutions. The
    sub-solutions are optimal for the smaller joins, allowing some
    sub-solutions to be discarded (and never considered again). Not all
    sub-solutions are discarded though, because of:

    2) Interesting orders. Sometimes a subquery may not be optimal for its
    subproblem, but it returns a list in sorted order. In this case, we would
    not want to discard this choice, because it may make the next join much
    less expensive. This idea of saving possibilities that are not locally
    optimal but may be globally optimal is used extensively in modern
    optimizers, according to the second paper.

    I'm not sure if the CPU utilization metric ended up being a significant
    contribution. The authors thought it was important, but we don't study
    the cost of the CPU operations today, mainly because processors have
    gotten faster and faster while I/O operations remain a bottleneck.


  • Next message: Danny Wyatt: "System R Access Path Selection"

    This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 08:33:01 PDT