An Overview of Query Optimization in Relational Systems

From: Neva Cherniavsky (nchernia@cs.washington.edu)
Date: Mon May 24 2004 - 09:16:33 PDT

  • Next message: Danny Wyatt: "Overview of Query Optimization"

    This paper is a survey of query optimization techniques. First, the
    contributions of System R from IBM is discussed (I looked at this in the
    last review). The author then describes the search space for
    optimization. He briefly discusses bushy trees and deferring Cartesian
    products, which he then says are computationally expensive and hard to
    quantify. I wondered if this is still true today; for instance, it seems
    like bushy trees still receive a lot of attention. He then discusses
    outerjoins and group-bys. There doesn't seem to be a lot of interesting
    material here.

    The next area of discussion is about "flattening" queries, which will
    obviously speed things up by not creating temporary tables that need to be
    written and read (in the usual case that they are larger than main
    memory). This seems to be an important area of research, but also a very
    complex one; there are lots of cases when it is simply too difficult to
    flatten the query. It seems like it would be hard to automate this
    process, since there are lots of special cases (DISTINCT keyword,
    correlated queries, aggregates). A solution is the use of partial views;
    it seems from the paper that this is a very recent research area, and
    sounds promising (though there is a tradeoff between the cost of computing
    the views and the use of the views to reduce computation).

    The author then describes different ways of determining the cost of
    operations. This is very important to database optimizers, because an
    estimation of the cost is what determines which actions take place and in
    what order. Different statistical methods are described and there is a
    brief description of the overall cost computation.

    Next, two specific enumeration architectures for extensible optimizers are
    discussed, Starburst and Volcano/Cascades. My favoroite was
    Volcano/Cascades, because it used algebraic expressions and the top-down
    dynamic programming technique sounded neat. However, I would have liked
    more detail about both of these systems (if they were to be discussed at
    all, more information would have been better).

    Lastly, the author discussed some interesting new directions of database
    optimizer research, including parallel databases, user-defined function,
    and materialized views. Parallelization seems especially useful for
    databases, and I'm curious if they've become widespread commercially. I'd
    also like to see a study of the quantifiable advantages of parallel vs.
    single processor databases - do the advantages of reduced response time
    override the disadvantages of processor communication?


  • Next message: Danny Wyatt: "Overview of Query Optimization"

    This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 09:16:33 PDT