Review of Query Evaluation Techniques for Large Databases

From: Neva Cherniavsky (nchernia@cs.washington.edu)
Date: Tue May 18 2004 - 16:18:58 PDT

    This paper surveys query evaluation techniques for large databases,
    specifically exploring algorithms and architectures of database query
    execution engines. In general, the paper looks at complex queries (those
    that require a number of different algorithms to work together) on files
    too large to fit into main memory.

    The sections we cover are section 1, which explores the architecture of
    query execution engines; section 2, which discusses sorting and hashing;
    section 3, which quantifies the cost of disk accesses for large databases;
    section 4, which describes methods for removing duplicates and
    implementing aggregate functions; section 5, which discusses algorithms
    for the join operation; section 8, which explores execution of very
    complex plans with many operators; section 11.1, which is about nested
    relations; and section 12.4, which is about bit vector filters.

    Section 1 explores the physical algebra of a database query engine, as
    opposed to the logical algebra of the database system. The authors note
    that it is impossible to tell from reading a logical expression what the
    execution time would be, without mapping it to a physical system. They
    note that logical operations usually do not map directly to physical ones,
    making this process tricky. One solution is to separate each operator's
    work into prepare, produce, and cleanup functions. This leads to quite an
    elegant implementation of a join operator: the entire query plan can be
    executed within a single process, items never have to wait in a temporary
    buffer, and each operator is unaffected by the complexity of the whole
    plan.
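
    As a concrete illustration, here is a minimal sketch (my own, not the
    paper's code) of that prepare/produce/cleanup interface, with a
    nested-loops join pulling tuples from two scans one at a time; the class
    and method names are invented for the example.

        class Scan:
            """Leaf operator: yields tuples from an in-memory list."""
            def __init__(self, rows):
                self.rows = rows

            def prepare(self):
                self.pos = 0

            def produce(self):
                # Return the next tuple, or None when the input is exhausted.
                if self.pos >= len(self.rows):
                    return None
                row = self.rows[self.pos]
                self.pos += 1
                return row

            def cleanup(self):
                pass

        class NestedLoopsJoin:
            """Join built from the same three calls; it pulls from its
            inputs on demand, so no temporary buffer sits between
            operators."""
            def __init__(self, outer, inner, predicate):
                self.outer = outer
                self.inner = inner
                self.predicate = predicate

            def prepare(self):
                self.outer.prepare()
                self.inner.prepare()
                self.current = self.outer.produce()

            def produce(self):
                while self.current is not None:
                    inner_row = self.inner.produce()
                    if inner_row is None:
                        # Inner exhausted: advance outer and rescan inner.
                        self.current = self.outer.produce()
                        self.inner.cleanup()
                        self.inner.prepare()
                        continue
                    if self.predicate(self.current, inner_row):
                        return (self.current, inner_row)
                return None

            def cleanup(self):
                self.outer.cleanup()
                self.inner.cleanup()

        # The whole plan runs in one process, one tuple at a time.
        plan = NestedLoopsJoin(Scan([1, 2, 3]), Scan([2, 3, 4]),
                               lambda a, b: a == b)
        plan.prepare()
        row = plan.produce()
        while row is not None:
            print(row)          # prints (2, 2) and (3, 3)
            row = plan.produce()
        plan.cleanup()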

    Section 2 details sorting and hashing algorithms in database systems.
    Sorting is quite common, and external merge sort is the algorithm used.
    The authors note that sorting should be implemented the same way as
    above, i.e., with prepare, produce, and cleanup procedures. We went over
    much of what they discuss here in class. The authors describe a nice
    algorithm for producing runs for merge sort that uses a priority heap.
    Keys are read in and stored in the heap, then written out in sorted
    order; each key that is written is replaced by a newly read key, so
    memory stays fully utilized. When a newly read key is smaller than the
    key most recently written, it is marked for the next run. When all keys
    in memory have been marked for the next run, the current run ends and
    the next run starts. Though I found this algorithm quite appealing,
    apparently there is a trade-off between producing fewer runs and the
    different I/O pattern this causes, along with the resulting increase in
    complexity of memory management.
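
    Here is a rough Python sketch of that run-generation scheme (replacement
    selection) as I understand it; it is my own illustration, not code from
    the paper, and the in-memory lists stand in for run files on disk.

        import heapq

        def replacement_selection(keys, memory_size):
            """Produce sorted runs; a key smaller than the last key written
            is tagged for the next run and stays in the heap."""
            it = iter(keys)
            heap = [(0, k) for _, k in zip(range(memory_size), it)]
            heapq.heapify(heap)

            runs, current_run, current_no = [], [], 0
            while heap:
                run_no, key = heapq.heappop(heap)
                if run_no != current_no:
                    # Every key left in memory belongs to the next run.
                    runs.append(current_run)
                    current_run, current_no = [], run_no
                current_run.append(key)     # "write" key to the current run
                nxt = next(it, None)
                if nxt is not None:
                    tag = run_no if nxt >= key else run_no + 1
                    heapq.heappush(heap, (tag, nxt))
            runs.append(current_run)
            return runs

        print(replacement_selection([5, 1, 9, 2, 8, 3, 7, 4, 6], 3))
        # [[1, 2, 5, 8, 9], [3, 4, 6, 7]]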

    Hashing seems to me to be a very good way of accomplishing matching tasks
    in a database management system. The authors note that hashing is easy
    when the entire hash table fits into memory; otherwise, you must deal with
    hash table overflow. The two methods they describe for this are avoidance
    and resolution, though it seems that hybrid-hash algorithms work the best.
    These use all available memory for in-memory processing but are also able
    to process large input files. According to the authors, a major problem
    with hash-based algorithms is that you must have a good hash function;
    the benefits of hashing diminish considerably when the hash values are
    skewed.
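
    To make the hybrid-hash idea concrete, here is a small sketch of my own
    (not the survey's code) applied to a matching task such as duplicate
    removal: partition 0 is processed in memory right away, while the other
    partitions would be spilled to temporary files; plain lists stand in for
    those files here.

        def hybrid_hash_distinct(rows, num_partitions):
            in_memory = set()   # partition 0: handled with available memory
            spilled = [[] for _ in range(num_partitions - 1)]
            for row in rows:
                p = hash(row) % num_partitions
                if p == 0:
                    in_memory.add(row)
                else:
                    spilled[p - 1].append(row)  # really: write to a temp file

            result = list(in_memory)
            for part in spilled:   # each spilled partition now fits in memory
                result.extend(set(part))
            return result

        print(sorted(hybrid_hash_distinct([3, 1, 3, 2, 1, 4, 2], 3)))
        # [1, 2, 3, 4]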

    Section 3 discusses disk access in database systems and observes that
    read-ahead is a good way for a database to increase efficiency; this
    requires contiguous file allocation, which the UNIX file system does not
    support. I was interested to read that some database designers for UNIX
    have essentially implemented their own file system that reads and writes
    contiguous files, and I wondered if Postgres does this. The authors then
    describe data structures, such as the B-trees we've gone over in class.
    They also talk about index-only scans and note when they can be more
    efficient. I thought the newer systems that implement a dynamic index
    scan (i.e., deciding at run time which indices to scan) sounded neat
    (though they must be quite complex).
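
    As a toy illustration of read-ahead (my own sketch, with an invented
    window size, not anything from the paper or Postgres): a scan over a
    contiguously allocated file can fetch a whole cluster of adjacent pages
    in one large I/O and then serve individual page requests from memory.

        PAGE_SIZE = 8192
        CLUSTER_PAGES = 16     # hypothetical read-ahead window

        def scan_with_readahead(f):
            """Yield pages, issuing one large sequential read per cluster."""
            while True:
                cluster = f.read(PAGE_SIZE * CLUSTER_PAGES)
                if not cluster:
                    return
                for off in range(0, len(cluster), PAGE_SIZE):
                    yield cluster[off:off + PAGE_SIZE]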

    Section 4 describes aggregation and duplicate removal. These are closely
    related and are discussed interchangeably. According to the authors, in
    most commercial systems aggregation and duplicate removal algorithms are
    based on sorting, using temporary files for output. The authors note that
    with today's larger memories, these operations could instead be
    implemented as iterators that exploit the available memory. I wondered
    whether, since this survey, any commercial system has been changed to use
    iterators. It does seem rather inefficient to continue using temporary
    files for this task.

    The authors go on to describe nested loops, removing duplicates as early
    as possible, and hashing. I thought the second was a nice improvement,
    and easy to implement via sorting or hashing. From the graphs, it looks
    like hybrid hashing is the best way to do aggregation, though of course it
    is highly dependent on how good the hash function is.
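
    For reference, here is a small illustrative sketch (mine, not the
    paper's) of in-memory hash aggregation with early collapsing of
    duplicates: each group key maps to a running aggregate, so duplicates
    never pile up. Overflow handling (spilling partitions as in hybrid
    hashing) is omitted.

        def hash_aggregate(rows, group_key, value, combine):
            groups = {}
            for row in rows:
                k, v = group_key(row), value(row)
                groups[k] = combine(groups[k], v) if k in groups else v
            return groups

        # SUM(amount) GROUP BY dept over made-up (dept, amount) tuples.
        rows = [("toys", 3), ("books", 5), ("toys", 2)]
        print(hash_aggregate(rows, lambda r: r[0], lambda r: r[1],
                             lambda a, b: a + b))
        # {'toys': 5, 'books': 5}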

    Joins are discussed in the next section. There are many techniques,
    though the authors note that newer ones are sometimes unable to answer the
    question, "Does this new technique apply to joining three inputs without
    interrupting data flow between the join operations?" I would like to know
    more about this question; is it widely considered an important property of
    join techniques? What kinds of techniques fail this test that have been
    seriously examined by the database community?

    Nested loop join, merge-join, and hash-join algorithms are described; we
    talked about these in class. The heap-filter merge join sounds like a
    good way to save I/O operations compared to merge join while being more
    efficient than nested-loop join. Again, it was emphasized that skew in
    the hash function causes problems for hash join.
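
    Since hash join comes up repeatedly, here is a minimal in-memory
    build/probe sketch of my own (real systems partition both inputs to disk
    first when the build input does not fit in memory):

        def hash_join(build_rows, probe_rows, build_key, probe_key):
            table = {}
            for row in build_rows:                  # build phase
                table.setdefault(build_key(row), []).append(row)
            for row in probe_rows:                  # probe phase
                for match in table.get(probe_key(row), []):
                    yield (match, row)

        # Made-up (id, name) and (id, amount) tuples.
        emps = [(1, "ann"), (2, "bob")]
        sales = [(1, 30), (1, 10), (3, 99)]
        print(list(hash_join(emps, sales, lambda r: r[0], lambda r: r[0])))
        # [((1, 'ann'), (1, 30)), ((1, 'ann'), (1, 10))]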

    Section 8 discusses execution of complex query plans. Optimal scheduling
    of multiple operators and the allocation of resources are the main issues.
    One way to do this is through the use of "stop points", which allow the
    engine to switch between subplans. It is also important to let the engine
    decide if it must force a stop point due to overflow. Also, proportional
    memory division is suggested as the best solution for multiple operators
    being active concurrently. For recursion, it is suggested that the
    algorithms proceed in phases, with one phase completed before another
    begins. It is noted that disk bandwidth and contention for the disk arms
    during seeks are important issues that must be dealt with; I wondered if
    these issues
    had been discussed by the database community beyond this survey. The
    authors describe several optimizations that should be made by database
    designers, but note that the optimizations are complex and will require
    further research.
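
    As a toy example of the proportional memory division idea (my own
    numbers, purely illustrative): if several operators are active at once,
    each gets buffer pages in proportion to the size of its input rather
    than an equal share.

        def divide_memory(total_pages, input_sizes):
            total_input = sum(input_sizes)
            return [total_pages * size // total_input
                    for size in input_sizes]

        # Three concurrent operators with 100-, 300-, and 600-page inputs
        # sharing a 50-page buffer pool:
        print(divide_memory(50, [100, 300, 600]))   # [5, 15, 30]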

    Nested relations are discussed in section 11.1. I thought nested
    relations were quite a natural construct for a database system, though I
    can see how allowing this feature adds complexity. However, I also
    thought this was a big difference between XML and relational DBMSs; so if
    a DBMS can support nested relations, it would seem that the answer would
    be to use XML as a data transfer language but keep the data on disk in an
    RDBMS. Joins over nested relations seem to be the biggest complication,
    and one that is not really solved yet (though the authors do discuss a
    system that uses parallel partitioned nested-hashed loops that sounds
    promising).

    Finally, in section 12.4, bit vector filters are discussed. These filters
    are used to decide when a tuple is important to the operation and when it
    can be discarded. A hash-based filter determines whether a tuple will be
    examined; some tuples that don't actually need to be looked at still are
    (false positives), but no tuple that should be looked at is ever missed.
    This way, a large fraction
    of the probe tuples can be discarded before incurring network costs.
    Also, during recursive partitioning, filters can be used at increasingly
    small granularities to remove items from the probe input that do not
    contribute to the join result. Bit vectors seem like a very elegant
    solution to the low bandwidth problem (better than semi-joins) and I don't
    really see any disadvantages, just that they may not always help very
    much.
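
    A short sketch of my own (not the paper's) of such a filter, with a
    single hash function and an arbitrary bit-vector size:

        class BitVectorFilter:
            def __init__(self, num_bits):
                self.num_bits = num_bits
                self.bits = bytearray(num_bits)

            def add(self, key):
                # Set a bit for each join key in the build input.
                self.bits[hash(key) % self.num_bits] = 1

            def might_contain(self, key):
                # False positives possible, but no false negatives.
                return self.bits[hash(key) % self.num_bits] == 1

        # Discard probe tuples before shipping them across the network.
        bvf = BitVectorFilter(num_bits=64)
        for key in [10, 20, 30]:        # join keys from the build input
            bvf.add(key)
        probe = [10, 25, 30, 40]
        print([k for k in probe if bvf.might_contain(k)])
        # 10 and 30 survive; 25 and 40 are most likely dropped.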

    Long paper = long review...

