From: Bhushan Mandhani (bhushan@cs.washington.edu)
Date: Mon May 24 2004 - 11:57:09 PDT
Access Path Selection in a RDBMS
This paper describes the query optimizer of System R. It introduces the
key idea of estimating costs of different plans using statistics
maintained in the system catalog, and then choosing the plan with least
cost. It is thus a pioneering paper in query optimization in relational
systems.
For accessing a relation, the paper defines a cost model that considers
both I/O and CPU cost. In moderns systems, we are not too concerned about
the CPU cost. It introduces the idea of maintaining statistics like no of
tuples and disk pages for a relation, no of values and disk pages for an
index, etc in the system catalog. It then describes how it uses these
stats in estimating the costs of access paths. It also introduces
"selectivity factors" for individual predicates occuring in the Where
clause of a query block. It describes how this is computed for different
kinds of predicates, and how they are used in cost estimation for an
access path.
The other key thing it does is give a method for plan selection for
multiway joins. The join order to choose is an important issue, and can
dramtically effect query execution time. It first gives very natural
heuristics which restrict the space of alternatives considered. It then
finds the actual join plan using a dynamic programming algorithm. It
starts out by finding the best way to access each individual relation. It
next uses these to find the best way to join each pair of relations. These
in turn are used to find the best 3-way joins, and so on. It does this
efficiently using tree search, and keeps track of intermediate plan costs
and tuple orderings.
It is clear the key impact of this paper, and the System R project, was to
establish the feasibility of having a relational database system, and
support for a non-procedural query language like SQL.
This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 11:57:09 PDT