paper 7 review

From: Bhushan Mandhani (bhushan@cs.washington.edu)
Date: Wed May 19 2004 - 12:04:34 PDT

Next message: CR: "Query Evaluation Techniques for Large Databases"

Previous message: Lucas Kreger-Stickles: "Review: Query Evaluation Techniques for Large Databases."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Query Evaluation Techniques for Large Databases
- Goetz Graefe

This paper does a comprehensive survey of the algorithms used in
implementing the query execution engine of a database system. It starts
out by defining a physical algebra, and emphasizing the issue of data
transfer between operators. This issue is then resolved by implementing
operators as iterators. I found the iterator model very appealing since it
allows the operators to execute and schedule each other in a single OS
process, and allows the combining of operators into complex query plans.

It then discusses mergesort, and the issues involved in the choice of
either quicksort or replacement selection for creating the initial runs.
The latter alternates read and write operations to disk, and that gives me
the impression it should perform better when it is in the middle of a
complex query plan. However, memory management introduces an overhead in
this case.

It then discusses hashing, and does a good job of motivating and
describing hybrid hashing. The section on data access using indexes was
quite insightful. It describes how query evaluation might be done by doing
operations like joins, union and intersection on relations of RID's
obtained using indices, and then extracting the tuples only at the end
using the result RID's. Clearly, indices offer opportunities for smart
query optimization. Also, different index structures are compared along
various dimensions.

It then describes the join and its variants, and gives insight into the
ubiquity of this operation, and the importance of efficiently evaluating
it. It then discussed the nested loops, sorting and hash based algorithms
for joins (and also for aggregation and duplicate removal). It describes
the basic algorithms, methods to optimize them, and computes and compares
their disk I/O cost. A greater amount of analysis of the merge join and
hash join would have been nice, since it is clear they are the two most
important methods, and deserved greater depth of coverage.

Overall the paper was comprehensive, and gave great insight into the
things that go into the implementation of a database query engine, and was
thus, a great read.

Next message: CR: "Query Evaluation Techniques for Large Databases"

Previous message: Lucas Kreger-Stickles: "Review: Query Evaluation Techniques for Large Databases."
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

This archive was generated by hypermail 2.1.6 : Wed May 19 2004 - 12:04:35 PDT