System R review

From: Alexander Moshchuk (anm@cs.washington.edu)
Date: Mon May 24 2004 - 10:28:10 PDT

  • Next message: Steven Balensiefer: "System R"

    The paper describes how System R implements query optimization. I
    guess System R is to DBMS what Unix is to OS; the basic architecture
    and many ideas, especially concerning the optimizer described here,
    can still be found in today's systems.

    The paper focuses on choosing access paths (i.e. finding good plans)
    for efficient query evaluation. This optimization phase comes after
    parsing and is followed by the actual code generation and execution
    steps. The authors note that optimization doesn't introduce
    significant overhead compared to other stages. The paper presents
    several important optimization-related concepts, including the cost
    model, use of dynamic programming, and the concept of interesting orders.

    The cost model in System R relies on several components, namely on (1)
    maintaining statistics about relations and indexes such as number of
    pages in a relation, (2) using these statistics to estimate
    selectivity of predicates, and (3) formulas to estimate the actual
    execution cost in terms of both I/O cost (pages fetched) and CPU
    utilization. In the latter, there is a weighing factor W in the
    formula to supposedly adjust between CPU and I/O cost, though the
    authors don't specify how to pick W. It would have been nice to at
    least know what they have picked for their configuration. I'm
    guessing in general, the CPU utilization cost should not be as
    relevant in the cost formula today as it was back then, seeing how CPU
    speeds have been outpacing disk speeds in huge proportions.

    As we've seen and read in the other paper, System R introduced the
    dynamic programming approach for doing a sequence of joins, telling
    how to organize the search for the best plan in terms of finding the
    best join order for successively larger subsets of tables. A
    heuristic based on join predicates is used to reduce the permutations
    which are considered. Only two types of joins are considered - nested
    loops and merge-join. System R also considers "interesting orders",
    meaning that at each stage in the algorithm, in addition to the
    lowest-cost option, it also identifies and considers options that lead
    to tuple orderings which could reduce the cost of later stages in the
    execution plan.

    Overall, the optimization strategies were presented well, though I
    would have liked to see some kind of performance measurements to
    compare various optimization strategies (and unoptimized case). It's
    always nice to be able to quantitatively evaluate the benefits of
    anything that describes new optimization methods.


  • Next message: Steven Balensiefer: "System R"

    This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 10:28:15 PDT