From: Bhushan Mandhani (bhushan@cs.washington.edu)
Date: Tue May 25 2004 - 03:28:02 PDT
An Overview of Query Optimization
Query optimization is the problem of choosing an efficient execution plan
for a query. This paper gives a high-level but insightful picture of this
topic. It identifies the search space of plans, the method for plan cost
estimation, and the algorithm for enumeration of plans as the three key
components making up a query optimizer.
It starts out by describing the System R optimizer, and identifies the
inability to expand its search space as a weakness. It then demonstrates
the big size of the search space by showing a variety of transformations
that can be applied to optimize SQL queries. These include choice of join
order, pushing group-by's ahead of joins, flattening a nested SQL query
into a single block query, etc. Finally, it shows semijoin-like methods,
in which selection predicates are moved from one query block to another.
It briefly describes how cost estimation of plans is done, what statistics
are maintained, etc. It also defines the idea of an extensible optimizer,
and gives an overview of two different systems with an extensible
optimizer: Starburst and Volcano. It concludes with a mention of
specialized optimization settings such as distributed databases, queries
with user-defined functions as predicates, optimization in the presence of
materialized views, etc.
The paper is short, and probably does not even reveal the tip of the
iceberg. Moreover, it gives a very high-level picture, and doesn't make an
effort to give a formal framework for query optimization, which is
disappointing. However, it does give a flavor of the topic, and has
nice motivating examples.
This archive was generated by hypermail 2.1.6 : Tue May 25 2004 - 03:28:02 PDT