Review : An overview of QO in Relational Systems

From: Joe Xavier (joexav@microsoft.com)
Date: Tue May 25 2004 - 16:17:59 PDT

  • Next message: Educational Sales: "Adobe @ up to 60% OFF for Students, Schools, Teachers, Staff 28198"

    Surajit Chaudhuri - An overview of QO in Relational Systems

    This is a survey paper covering a number of aspects of query
    optimization.

    The three main concepts identified to solve the problem of query
    optimization are:
    - a search space of possible plans
    - techniques to extimate the cost of various plans
    - an algorithm to search through the execution space

    The interesting ideas in the section on the System-R optimizer are:
    - the search space for select-projection-join queries corressponds to
    the linear sequence of join operators
    - the cost model relies on
      1. statistics on indexes and relations
      2. estimation of the selectivity of predicates
      3. estimation of CPU and I/O costs of query execution for each
    operator
    - the use of dynamic programming and interesting order
     dynamic programming - reduce the search space. select only left linear
    plans. reduces to O(2**n-1) from O(n!)
     interesting order - generalized to physical order - characteristic of a
    plan that is not shared by all plans for the same logical expression but
    can impace the cost of subsequent operations. Can handle violation of
    the principle of optimality in the cost model.

    Interesting ideas in the Search Space section:
     - query graph model (used in the Startburst system) - nodes are
    relations and edges are join predicates between relations
     - bushy joins require materialization of intermediate results
     - Join(R, S LOJ T) = Join(R,S) LOJ T => joins can be freely reordered
    among themselves since the sequence of joins and outerjoins do not
    freely commute
     - specific cases when the GROUP BY can be pushed down without arbitrary
    side-effects
     - correlated query blocks (covered in more detail in the Accesss Paths
    paper)

    Interesting ideas in the Statistics and Cost Estimation section:
     - the estimation framework used in System-R
     - histograms on base data
    - sampling for statistics

    Enumeration Architectures:
     - Starburst
       Query Graph Model representation of the SQL query
    - Volcano/Cascades

    It would have been nice to have seen some more detail in the section on
    Materialized Views

    Overall it's a great paper to touch upon the areas of interest in the
    topic but doesn't provide an in-depth treatment.


  • Next message: Educational Sales: "Adobe @ up to 60% OFF for Students, Schools, Teachers, Staff 28198"

    This archive was generated by hypermail 2.1.6 : Tue May 25 2004 - 16:18:02 PDT