From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Mon May 24 2004 - 10:52:09 PDT
Paper: Access Path Selection in a Relational Database Management System,
P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, T. Price
Summary: The paper describes query optimization for single relation
access as well as for join and nested queries evaluation used in System
R.
Main Ideas:
- The idea to benefit from permuting join order and selecting different
access paths for each table in SQL statement
- RSS: using separate storage system for storing relations, indexes,
maintaining locks and logs for recovery
- Two ways of accessing tuples in a relation:
- segment scan, which examines all the pages in the segment
and returns
only these which belong to the needed relation. Each page in
the segment
is touched regardless of has it needed tuples or not, but it
is touched only once.
- index scan examines only the pages which contain tuples
from needed relation,
but, in contrast to segment scan, pages might end up being
touched more than once.
If the index is clustered, than the pages are touched
exactly once.
- Detailed statistics on each table and index, stored for cost
estimation
- Cost estimation formula for single relation which takes
into account
time spent to fetch pages from disk and CPU time spent to
return tuples.
- Selectivity factors are calculated based on the statistics
stored at the moment
for each of the possible Boolean condition in WHERE clause
- Taking into account in cost estimation tuples' order which
is
different for different access paths
- Two access paths for joins
- Nested loop
- Merge join
- N-way joins are represented as a sequence of two-way joins
- The order in which relations are joined matters
- To reduce the number of permutations to try, the
heuristics is presented
which is about evaluating predicates as early as possible.
After applying the heuristics, the tree of of possible
solutions is constructed
to find the cheapest one easily.
- Cost formulas for nested loop and merge joins are the same, but the
difference
is that merge join benefits if the inner relation is sorted. This
significantly
reduces the cost.
- Evaluation of nested subquery only once if it is not a correlated
subquery.
Correlated subquery could be evaluated conditionally if the referenced
relation
is ordered on the referenced attribute. This allows the test of whether
the
current referenced value is the same as the one in the previous
candidate tuple.
This could reduce the total number of correlated query re-evaluation if
the previous
result can be reused.
Relevance:
This is very interesting and amazingly easy to read paper, despite the
fact
it covers many query optimization ideas which are widely used today in
any of
contemporary relational engines.
This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 10:52:10 PDT