From: Joe Xavier (joexav@microsoft.com)
Date: Wed May 19 2004 - 11:12:09 PDT
Review: Goetz Graefe - Query Evaluation Techniques for Large Databases
This paper reviews different operational aspects of a DBMS's query
execution engine. The most prominent aspects of the sections we reviewed
are the treatment of sorting and hashing algorithms.
Sorting - the algorithm described for sorting is essentially quicksort
of datasets that fit in memory and merge for larger datasets.
Sorting-based aggregation outperforms nested-loops and aggregating as
early as possible is useful. Merge-join, the sorting based join
algorithm is widely used since it's rough cost can be estimated and it
performs well enough. Can also be used to implement division between
tables.
Hashing - hybrid hashing writes only necessary data to disk saving I/O
cost. Aggregation based on hybrid hash performs better than
sorting-based algorithm with early aggregation. Although there is an
overhead in computing the hash value, the search space is made smaller
and so it outperforms merge-join. Combined with bit map, the hash method
is efficient for processing division operation.
The paper also briefly covers Indexing. The paper compares various index
structures according to their support for ordering and sorted scans and
their support for point data and range data.
It would've taken a few more days to do a more thorough review ...
This archive was generated by hypermail 2.1.6 : Wed May 19 2004 - 11:12:18 PDT