Access Path Selection in a Relational Database Management System

From: Stavan Parikh (stavan@cs.washington.edu)
Date: Sun May 23 2004 - 18:29:54 PDT

  • Next message: Atri Rudra: "Review #8"

    Selinger et al. present how System R chooses access paths for both simple and complex queries.
     
    Here the authors focus on the optimization phase of query processing. They discuss both a nested-loop join and merge-join mechanisms and present cost metrics for choosing between plans. From the age of the paper it seems they were the first to propose some cool techniques of query processing
     
    First the proposed using pre-generated statistics of the data to estimate the cost of query processing. Specifically they present a wide variety of different cost metrics that take into account both I/O and cpu utilization. It was interesting to note that the stats are not maintained dynamically as the table is updated but a manual run is required to update them. While the overhead of dynamic maintenance for every update would be a high a periodic update might make sense.
     
    Second they present how one searches through a search space of different plans for the optimal one. They present an idea of following "interesting" plans, which are plans that tend to output the result in a useful order. This seems like a powerful idea. With the use of an example they show how once can rule of partial plans and end up choosing a good access plan.
     
    In conclusion the paper proposes some good ideas about choosing a query processing plan. However, in the conclusion they claim that even though the cost estimation is not accurate, the true optimal path is selected in majority of the cases. While this may be true, they present no evidence to back it up. Also I wonder how much cpu utilization really factors with today's processing power?
     
     


  • Next message: Atri Rudra: "Review #8"

    This archive was generated by hypermail 2.1.6 : Sun May 23 2004 - 18:29:54 PDT