SystemR

From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Mon May 24 2004 - 10:52:09 PDT

  • Next message: Lucas Kreger-Stickles: "Review: Acess Path Selection in a Relational DBMS"

    Paper: Access Path Selection in a Relational Database Management System,

    P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, T. Price

     

    Summary: The paper describes query optimization for single relation

    access as well as for join and nested queries evaluation used in System
    R.

     

    Main Ideas:

    - The idea to benefit from permuting join order and selecting different

    access paths for each table in SQL statement

     

    - RSS: using separate storage system for storing relations, indexes,

    maintaining locks and logs for recovery

     

    - Two ways of accessing tuples in a relation:

                - segment scan, which examines all the pages in the segment
    and returns

                only these which belong to the needed relation. Each page in
    the segment

                is touched regardless of has it needed tuples or not, but it
    is touched only once.

     

                - index scan examines only the pages which contain tuples
    from needed relation,

                but, in contrast to segment scan, pages might end up being
    touched more than once.

                If the index is clustered, than the pages are touched
    exactly once.

     

    - Detailed statistics on each table and index, stored for cost
    estimation

                - Cost estimation formula for single relation which takes
    into account

                time spent to fetch pages from disk and CPU time spent to
    return tuples.

     

                - Selectivity factors are calculated based on the statistics
    stored at the moment

                for each of the possible Boolean condition in WHERE clause

     

                - Taking into account in cost estimation tuples' order which
    is

                different for different access paths

     

    - Two access paths for joins

                - Nested loop

                - Merge join

                - N-way joins are represented as a sequence of two-way joins

     

    - The order in which relations are joined matters

                - To reduce the number of permutations to try, the
    heuristics is presented

                which is about evaluating predicates as early as possible.

                After applying the heuristics, the tree of of possible
    solutions is constructed

                to find the cheapest one easily.

     

    - Cost formulas for nested loop and merge joins are the same, but the
    difference

    is that merge join benefits if the inner relation is sorted. This
    significantly

    reduces the cost.

     

    - Evaluation of nested subquery only once if it is not a correlated
    subquery.

    Correlated subquery could be evaluated conditionally if the referenced
    relation

    is ordered on the referenced attribute. This allows the test of whether
    the

    current referenced value is the same as the one in the previous
    candidate tuple.

    This could reduce the total number of correlated query re-evaluation if
    the previous

    result can be reused.

     

    Relevance:

    This is very interesting and amazingly easy to read paper, despite the
    fact

    it covers many query optimization ideas which are widely used today in
    any of

    contemporary relational engines.


  • Next message: Lucas Kreger-Stickles: "Review: Acess Path Selection in a Relational DBMS"

    This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 10:52:10 PDT