From: Charles Giefer (cgiefer@comcast.net)
Date: Mon May 24 2004 - 02:26:20 PDT
An Overview of Query Optimization in Relational Systems
Surajit Chaudhuri
This paper was a general overview of optimizing queries. It explained much
of what System R did in contemporary language. It also showed the current
direction of query optimization and the difficulties associated with good
optimizers.
Normal joins seem like the easiest class of physical operations to optimize.
This is because they are associative and commutative. The problem gets more
complicated when aggregates, user-defined functions, other types of joins,
and nested queries and views are introduced.
There are a few interesting details I discovered in this paper. First, I
never thought about parallel execution in quite the same terms as defined in
this paper. They describe that the goal of parallel execution is to reduce
response. Sequential systems, on the other hand, focus on reducing total
work. This is entirely true and in terms of the cost models addressed in
this paper, a different cost analysis is needed.
Another interesting topic brought up in the paper was unnesting the
structure of an SQL query. I wonder if it is as simple as applying specific
rules to the query to rewrite it as a single level, or if it requires a more
intense and sometimes custom examination.
Also, I don't understand completely why bushy paths are avoided. To me it
seemed like the only reason was because it avoided temporary tables. Are
there ever reasons a bushy table would be preferred and the cost of the
temporary tables would be offset?
The user-defined functions was the final area that peaked my interest. The
example proposed (searching images stored in BLOBs) seemed like a very
applicable usage. For instance, this would occur if I'm matching parts of a
fingerprint in a database of fingerprint data and images. It is difficult
to characterize the cost model of this user-defined function. But, since it
is user-defined, shouldn't the user be able to give some sort of hint of how
to estimate its cost? Also, it should probably state the associativity and
commutativity of the function. This would be part of integrating the
function into the database query execution.
While it was a good paper, it seemed to reiterate much of what was said in
the System R paper. The best part of this paper was its focus on today's
research and relevant topics. It avoided a lot of boring details of query
optimization but made up for this space by providing a fair amount of fluff.
This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 02:26:34 PDT