From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Wed May 19 2004 - 11:49:40 PDT
Paper: Query Evaluation Techniques for Large Databases, G. Graefe
Summary: The paper presents the detailed survey of different algorithms
for query evaluation and optimization in relational databases
Main Ideas:
- Iterator model for implementation of physical plan
- Scheduling all the physical operators within one OS
process
and make them calling each other sequentially. One procedure
call
is much cheaper than RPC/LRPC or interprocess communication,
thus
by combining the operators in different ways it is possible
to
implement any complex physical query plan, at the same time
preserving simplicity and performance.
- Two main approaches to create initial runs for merge-sort
- Quicksort for in-memory processing
- Replacement sections, based on priority heap for runs
larger than
main memory
- Hashing algorithms
- Input file is first partitioned into F=M/C partition files
before
start in-memory hash table initialization
- Bucket tuning and dynamic destaging are
additional optimizations
- Hybrid hashing is able to use entire main memory, and at
the same time
create partitions on disk if overflow occurs
- It is important to have extents to support efficient read-ahead. IBM
DB/2
supports its own extents by leveraging UNIX raw block device drivers
- Clustered/Unclustered, Sparse/Dense indexes
- Using nested loops, hashing and sorting for aggregation and duplicates
removal.
All of the algorithms (without optimizations) are pretty
straightforward.
- When sorting make early aggregation, this will compact the
files' size
The latter two are usually used in commercial databases
- Nested loop join, block nested loop join as optimization, merge-join,
hash joins
- heap-filter merge join which first sorts the smaller
relation, stores
the intermediate result. creates sorted runs for the larger
relation
and merge-joins them immediately with stored intermediate
table
- hybrid merge-join uses merge to join the outer input with
the B-Tree
index for inner table
- Sort-based join algorithms require flattening of nested relations
first
- Semi-joins for distributed join [P. Bernstein, et al]
Relevance:
This is definitely an interesting overview of algorithms and data
structures used for query evaluation.
This archive was generated by hypermail 2.1.6 : Wed May 19 2004 - 11:49:38 PDT