QEval

From: Michael Gubanov (mgubanov@cs.washington.edu)
Date: Wed May 19 2004 - 11:49:40 PDT

  • Next message: Lucas Kreger-Stickles: "Review: Query Evaluation Techniques for Large Databases."

    Paper: Query Evaluation Techniques for Large Databases, G. Graefe

     

    Summary: The paper presents the detailed survey of different algorithms

    for query evaluation and optimization in relational databases

     

    Main Ideas:

    - Iterator model for implementation of physical plan

                - Scheduling all the physical operators within one OS
    process

                and make them calling each other sequentially. One procedure
    call

                is much cheaper than RPC/LRPC or interprocess communication,
    thus

                by combining the operators in different ways it is possible
    to

                implement any complex physical query plan, at the same time

                preserving simplicity and performance.

     

    - Two main approaches to create initial runs for merge-sort

                - Quicksort for in-memory processing

                - Replacement sections, based on priority heap for runs
    larger than

                main memory

     

    - Hashing algorithms

                - Input file is first partitioned into F=M/C partition files
    before

                start in-memory hash table initialization

                            - Bucket tuning and dynamic destaging are
    additional optimizations

                - Hybrid hashing is able to use entire main memory, and at
    the same time

                create partitions on disk if overflow occurs

     

    - It is important to have extents to support efficient read-ahead. IBM
    DB/2

    supports its own extents by leveraging UNIX raw block device drivers

     

    - Clustered/Unclustered, Sparse/Dense indexes

     

    - Using nested loops, hashing and sorting for aggregation and duplicates
    removal.

    All of the algorithms (without optimizations) are pretty
    straightforward.

                - When sorting make early aggregation, this will compact the
    files' size

    The latter two are usually used in commercial databases

     

    - Nested loop join, block nested loop join as optimization, merge-join,
    hash joins

                - heap-filter merge join which first sorts the smaller
    relation, stores

                the intermediate result. creates sorted runs for the larger
    relation

                and merge-joins them immediately with stored intermediate
    table

     

                - hybrid merge-join uses merge to join the outer input with
    the B-Tree

                index for inner table

     

    - Sort-based join algorithms require flattening of nested relations
    first

     

    - Semi-joins for distributed join [P. Bernstein, et al]

     

    Relevance:

    This is definitely an interesting overview of algorithms and data

    structures used for query evaluation.


  • Next message: Lucas Kreger-Stickles: "Review: Query Evaluation Techniques for Large Databases."

    This archive was generated by hypermail 2.1.6 : Wed May 19 2004 - 11:49:38 PDT