SystemR

From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Mon May 24 2004 - 10:52:09 PDT

Next message: Lucas Kreger-Stickles: "Review: Acess Path Selection in a Relational DBMS"

Previous message: Steven Balensiefer: "System R"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Paper: Access Path Selection in a Relational Database Management System,

P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, T. Price

Summary: The paper describes query optimization for single relation

access as well as for join and nested queries evaluation used in System
R.

Main Ideas:

- The idea to benefit from permuting join order and selecting different

access paths for each table in SQL statement

- RSS: using separate storage system for storing relations, indexes,

maintaining locks and logs for recovery

- Two ways of accessing tuples in a relation:

- segment scan, which examines all the pages in the segment
and returns

only these which belong to the needed relation. Each page in
the segment

is touched regardless of has it needed tuples or not, but it
is touched only once.

- index scan examines only the pages which contain tuples
from needed relation,

but, in contrast to segment scan, pages might end up being
touched more than once.

If the index is clustered, than the pages are touched
exactly once.

- Detailed statistics on each table and index, stored for cost
estimation

- Cost estimation formula for single relation which takes
into account

time spent to fetch pages from disk and CPU time spent to
return tuples.

- Selectivity factors are calculated based on the statistics
stored at the moment

for each of the possible Boolean condition in WHERE clause

- Taking into account in cost estimation tuples' order which
is

different for different access paths

- Two access paths for joins

- Nested loop

- Merge join

- N-way joins are represented as a sequence of two-way joins

- The order in which relations are joined matters

- To reduce the number of permutations to try, the
heuristics is presented

which is about evaluating predicates as early as possible.

After applying the heuristics, the tree of of possible
solutions is constructed

to find the cheapest one easily.

- Cost formulas for nested loop and merge joins are the same, but the
difference

is that merge join benefits if the inner relation is sorted. This
significantly

reduces the cost.

- Evaluation of nested subquery only once if it is not a correlated
subquery.

Correlated subquery could be evaluated conditionally if the referenced
relation

is ordered on the referenced attribute. This allows the test of whether
the

current referenced value is the same as the one in the previous
candidate tuple.

This could reduce the total number of correlated query re-evaluation if
the previous

result can be reused.

Relevance:

This is very interesting and amazingly easy to read paper, despite the
fact

it covers many query optimization ideas which are widely used today in
any of

contemporary relational engines.

Next message: Lucas Kreger-Stickles: "Review: Acess Path Selection in a Relational DBMS"

Previous message: Steven Balensiefer: "System R"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 10:52:10 PDT