Qpt

From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Mon May 24 2004 - 10:54:41 PDT

Next message: Steven Balensiefer: "Optimizations"

Previous message: Alexander Moshchuk: "Overview of query optimization"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Paper: An Overview of Query optimization in Relational Systems, S.
Chaudhuri

Summary: The paper presents a review of main query optimization
techniques

known at the time starting from these introduced by System R.

In addition, some ideas employed by distributed databases

and stored procedures optimizations are presented.

Main Ideas:

- A desirable optimizer is the one with low cost plans in search space,

accurate costing technique and efficient enumeration algorithm.

This definition is a cornerstone of the article as all described
techniques

are evaluated in respect to this definition.

- An overview of main ideas in System R optimizer (see previous review)

- The idea the optimizer might want to use several representations of

a query during the lifecycle of optimizing a query

- Query graph is an alternative representation, but it has
its own

drawbacks. It can not represent unions, outer joins and it
does not

capture the fact that SQL statements could be nested.

In Starburst, multigraph queries are represented as a set of
subgraphs

with edges among subgraphs which represent predicates across
query blocks.

In contrast, Exodus uses query trees on all optimization
stages.

- Bushy trees are usually not considered ny contemporary optimizers as

they expand the search space considerably and therefore its cost of
enumerating.

Deferring Cartesian products could results in poor performance in some
cases (! OLAP).

These features could be used adaptively on per-query basis in extensible
optimizers.

- LOJ and join do not commute freely, however the asymmetric
associativity applies

and can be used for some class of queries.

- Pushing down group by clauses (and leave them on the top as well)

can significantly reduce the bottom-top data flow and therefore improve

overall performance.

- Reducing multi-block queries into single-block by unrolling views,

unnesting correlated subqueries and benefiting from materialized views

(if there are any) are other possible optimizations which could be used

for some classes of queries.

- Histograms on a column as a way to provide statistics on the single
column.

It does not capture the correlations between columns, which is an
important

fact for latter optimization. It could have been used for estimating
selectivity

factor on other columns while already having it for a one.

To address this problem many systems use the selectivity of the most
selective

predicate and try to identify potential correlation.

- Short description and comparison of optimizations architecture

in Starburst and Exodus's successors.

- Some ideas for stored procedures, materialized views optimizations.

Using semi-joins for distributed query optimization.

Relevance:

I think it is very interesting review of contemporary state of query
optimization.

But, still, the core ideas which have been proposed by System R
optimizer

are the most influential

Next message: Steven Balensiefer: "Optimizations"

Previous message: Alexander Moshchuk: "Overview of query optimization"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 10:54:40 PDT