From: Lucas Kreger-Stickles (lucasks@cs.washington.edu)
Date: Mon May 24 2004 - 12:08:55 PDT
In this paper the authors intend to provide an overview of the
techniques employed in optimizing querries for relational databases.
They begin by indicating querries must be transformed into phsical
plans, but that many such plans exist with widely differeng costs.
Therefore, in order to constuct optimal plans one must be able to know
and enumerate the space of plans which properly execute the given querry
and have a way to estimate the cost of various concrete plans.
The author then explors the contributions of system R, a relational
database that was one of the first to employ many of the still prevelent
techniques used today in querry optimization. This system used
statistics regarding realations and indexes, formulas to estimate the
output of various operations, and formulas to estimate CPU costs
associated with every operator to estimate the relative costs of various
querries. One of the nice things about System R is its use of dynamic
programming techniques such that subplans can be used to create optimal
plans of greater size (and furthermore can be re-used). The author
indicates that despitre many of the groundbreaking features of System R,
its focus on join ordering limited its feasibility in other ares of
importance to querry optimization.
Next, the author discusses the search space of various querries. One of
the interesting comments in this section is that different plans for
querry exectuion are not necessariuly better, and thus search must be
constrained in such a way as to not reduce the benifit of optimization.
The author then goes on to discuss techniques for reducing multi-block
querreis to querries over a single block and some of the benefits/limits
of these approaches.
Next, the author discusses means by which statistics and models can be
used for cost estimation of plans. One observation here is that cost
data must be efficent since it is used quite frequently. Another
interesting point to note is that the statisical summarty of a plan is a
result of the tables and fields in that plan and is therefore a logical
property (being the same for all plans which execute that querry),
however, costs of a plan are physical operations which relate to the
actuall execution plan.
Overall, I found the paper to be quite informative, and it ddid a good
job of explaining and exposing aspects of System R which I did not fully
understand when I first read the paper. In addition, it brought up many
good points relating to the potential benifits that can be gained from
techniques as opposed to the cost of actually employing that technique.
-Lucas Kreger-Stickles
This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 12:09:00 PDT