Qpt

From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Mon May 24 2004 - 10:54:41 PDT

  • Next message: Steven Balensiefer: "Optimizations"

    Paper: An Overview of Query optimization in Relational Systems, S.
    Chaudhuri

     

    Summary: The paper presents a review of main query optimization
    techniques

    known at the time starting from these introduced by System R.

    In addition, some ideas employed by distributed databases

    and stored procedures optimizations are presented.

     

    Main Ideas:

    - A desirable optimizer is the one with low cost plans in search space,

    accurate costing technique and efficient enumeration algorithm.

    This definition is a cornerstone of the article as all described
    techniques

    are evaluated in respect to this definition.

     

    - An overview of main ideas in System R optimizer (see previous review)

     

    - The idea the optimizer might want to use several representations of

    a query during the lifecycle of optimizing a query

                - Query graph is an alternative representation, but it has
    its own

                drawbacks. It can not represent unions, outer joins and it
    does not

                capture the fact that SQL statements could be nested.

                In Starburst, multigraph queries are represented as a set of
    subgraphs

                with edges among subgraphs which represent predicates across
    query blocks.

                In contrast, Exodus uses query trees on all optimization
    stages.

     

    - Bushy trees are usually not considered ny contemporary optimizers as

    they expand the search space considerably and therefore its cost of
    enumerating.

    Deferring Cartesian products could results in poor performance in some
    cases (! OLAP).

    These features could be used adaptively on per-query basis in extensible
    optimizers.

     

    - LOJ and join do not commute freely, however the asymmetric
    associativity applies

    and can be used for some class of queries.

     

    - Pushing down group by clauses (and leave them on the top as well)

    can significantly reduce the bottom-top data flow and therefore improve

    overall performance.

     

    - Reducing multi-block queries into single-block by unrolling views,

    unnesting correlated subqueries and benefiting from materialized views

    (if there are any) are other possible optimizations which could be used

    for some classes of queries.

     

    - Histograms on a column as a way to provide statistics on the single
    column.

    It does not capture the correlations between columns, which is an
    important

    fact for latter optimization. It could have been used for estimating
    selectivity

    factor on other columns while already having it for a one.

    To address this problem many systems use the selectivity of the most
    selective

    predicate and try to identify potential correlation.

     

    - Short description and comparison of optimizations architecture

    in Starburst and Exodus's successors.

     

    - Some ideas for stored procedures, materialized views optimizations.

    Using semi-joins for distributed query optimization.

     

    Relevance:

    I think it is very interesting review of contemporary state of query
    optimization.

    But, still, the core ideas which have been proposed by System R
    optimizer

    are the most influential


  • Next message: Steven Balensiefer: "Optimizations"

    This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 10:54:40 PDT