paper 7 review

From: Bhushan Mandhani (bhushan@cs.washington.edu)
Date: Wed May 19 2004 - 12:04:34 PDT

  • Next message: CR: "Query Evaluation Techniques for Large Databases"

            Query Evaluation Techniques for Large Databases
                  - Goetz Graefe

    This paper does a comprehensive survey of the algorithms used in
    implementing the query execution engine of a database system. It starts
    out by defining a physical algebra, and emphasizing the issue of data
    transfer between operators. This issue is then resolved by implementing
    operators as iterators. I found the iterator model very appealing since it
    allows the operators to execute and schedule each other in a single OS
    process, and allows the combining of operators into complex query plans.

    It then discusses mergesort, and the issues involved in the choice of
    either quicksort or replacement selection for creating the initial runs.
    The latter alternates read and write operations to disk, and that gives me
    the impression it should perform better when it is in the middle of a
    complex query plan. However, memory management introduces an overhead in
    this case.

    It then discusses hashing, and does a good job of motivating and
    describing hybrid hashing. The section on data access using indexes was
    quite insightful. It describes how query evaluation might be done by doing
    operations like joins, union and intersection on relations of RID's
    obtained using indices, and then extracting the tuples only at the end
    using the result RID's. Clearly, indices offer opportunities for smart
    query optimization. Also, different index structures are compared along
    various dimensions.

    It then describes the join and its variants, and gives insight into the
    ubiquity of this operation, and the importance of efficiently evaluating
    it. It then discussed the nested loops, sorting and hash based algorithms
    for joins (and also for aggregation and duplicate removal). It describes
    the basic algorithms, methods to optimize them, and computes and compares
    their disk I/O cost. A greater amount of analysis of the merge join and
    hash join would have been nice, since it is clear they are the two most
    important methods, and deserved greater depth of coverage.

    Overall the paper was comprehensive, and gave great insight into the
    things that go into the implementation of a database query engine, and was
    thus, a great read.


  • Next message: CR: "Query Evaluation Techniques for Large Databases"

    This archive was generated by hypermail 2.1.6 : Wed May 19 2004 - 12:04:35 PDT