An Overview of Query Optimization

From: Charles Giefer (cgiefer@comcast.net)
Date: Mon May 24 2004 - 02:26:20 PDT

  • Next message: Li Yan: "review 9"

    An Overview of Query Optimization in Relational Systems
    Surajit Chaudhuri

    This paper was a general overview of optimizing queries. It explained much
    of what System R did in contemporary language. It also showed the current
    direction of query optimization and the difficulties associated with good
    optimizers.

    Normal joins seem like the easiest class of physical operations to optimize.
    This is because they are associative and commutative. The problem gets more
    complicated when aggregates, user-defined functions, other types of joins,
    and nested queries and views are introduced.

    There are a few interesting details I discovered in this paper. First, I
    never thought about parallel execution in quite the same terms as defined in
    this paper. They describe that the goal of parallel execution is to reduce
    response. Sequential systems, on the other hand, focus on reducing total
    work. This is entirely true and in terms of the cost models addressed in
    this paper, a different cost analysis is needed.

    Another interesting topic brought up in the paper was unnesting the
    structure of an SQL query. I wonder if it is as simple as applying specific
    rules to the query to rewrite it as a single level, or if it requires a more
    intense and sometimes custom examination.

    Also, I don't understand completely why bushy paths are avoided. To me it
    seemed like the only reason was because it avoided temporary tables. Are
    there ever reasons a bushy table would be preferred and the cost of the
    temporary tables would be offset?

    The user-defined functions was the final area that peaked my interest. The
    example proposed (searching images stored in BLOBs) seemed like a very
    applicable usage. For instance, this would occur if I'm matching parts of a
    fingerprint in a database of fingerprint data and images. It is difficult
    to characterize the cost model of this user-defined function. But, since it
    is user-defined, shouldn't the user be able to give some sort of hint of how
    to estimate its cost? Also, it should probably state the associativity and
    commutativity of the function. This would be part of integrating the
    function into the database query execution.

    While it was a good paper, it seemed to reiterate much of what was said in
    the System R paper. The best part of this paper was its focus on today's
    research and relevant topics. It avoided a lot of boring details of query
    optimization but made up for this space by providing a fair amount of fluff.


  • Next message: Li Yan: "review 9"

    This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 02:26:34 PDT