From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Mon May 24 2004 - 10:54:41 PDT
Paper: An Overview of Query optimization in Relational Systems, S.
Chaudhuri
Summary: The paper presents a review of main query optimization
techniques
known at the time starting from these introduced by System R.
In addition, some ideas employed by distributed databases
and stored procedures optimizations are presented.
Main Ideas:
- A desirable optimizer is the one with low cost plans in search space,
accurate costing technique and efficient enumeration algorithm.
This definition is a cornerstone of the article as all described
techniques
are evaluated in respect to this definition.
- An overview of main ideas in System R optimizer (see previous review)
- The idea the optimizer might want to use several representations of
a query during the lifecycle of optimizing a query
- Query graph is an alternative representation, but it has
its own
drawbacks. It can not represent unions, outer joins and it
does not
capture the fact that SQL statements could be nested.
In Starburst, multigraph queries are represented as a set of
subgraphs
with edges among subgraphs which represent predicates across
query blocks.
In contrast, Exodus uses query trees on all optimization
stages.
- Bushy trees are usually not considered ny contemporary optimizers as
they expand the search space considerably and therefore its cost of
enumerating.
Deferring Cartesian products could results in poor performance in some
cases (! OLAP).
These features could be used adaptively on per-query basis in extensible
optimizers.
- LOJ and join do not commute freely, however the asymmetric
associativity applies
and can be used for some class of queries.
- Pushing down group by clauses (and leave them on the top as well)
can significantly reduce the bottom-top data flow and therefore improve
overall performance.
- Reducing multi-block queries into single-block by unrolling views,
unnesting correlated subqueries and benefiting from materialized views
(if there are any) are other possible optimizations which could be used
for some classes of queries.
- Histograms on a column as a way to provide statistics on the single
column.
It does not capture the correlations between columns, which is an
important
fact for latter optimization. It could have been used for estimating
selectivity
factor on other columns while already having it for a one.
To address this problem many systems use the selectivity of the most
selective
predicate and try to identify potential correlation.
- Short description and comparison of optimizations architecture
in Starburst and Exodus's successors.
- Some ideas for stored procedures, materialized views optimizations.
Using semi-joins for distributed query optimization.
Relevance:
I think it is very interesting review of contemporary state of query
optimization.
But, still, the core ideas which have been proposed by System R
optimizer
are the most influential
This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 10:54:40 PDT