The big paper

From: Steven Balensiefer (alaska@cs.washington.edu)
Date: Wed May 19 2004 - 11:34:06 PDT


    Query Evaluation Techniques for Large Databases

     This paper provides an extensive survey of issues related to query
    evaluation, from sorting to complex query plans and bit-vector filtering.
    The information provided is enough to get a grasp on the techniques, but
    not enough to actually implement them. As a survey paper, I think this
    level of depth is quite appropriate, especially considering the wealth of
    references to other papers that likely contain the pertinent details.

     I thought the discussion of the difference between physical and logical
    algebras was interesting. I'd naively assumed that most DB engines
    contained the same "toolbox" and that differences in performance came
    from how those tools were applied. It makes sense that they would rely
    on a smaller set of physical operators, each capable of implementing
    multiple logical operators.
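
     To check my own understanding, here is a toy sketch (mine, not the
    paper's; the names and data are invented) of how one physical hash-match
    operator can serve more than one logical operator, here a join and an
    intersection:

    from collections import defaultdict

    def hash_match(build, probe, build_key, probe_key, emit):
        # Generic physical operator: build a hash table on one input, probe
        # with the other, and let `emit` decide what the logical result is.
        table = defaultdict(list)
        for row in build:
            table[build_key(row)].append(row)
        for row in probe:
            for match in table.get(probe_key(row), []):
                yield emit(match, row)

    # Logical join of R and S on R.id = S.rid.
    R = [(1, "a"), (2, "b")]
    S = [(10, 1), (20, 2), (30, 2)]
    join = list(hash_match(R, S, lambda r: r[0], lambda s: s[1],
                           lambda r, s: r + s))

    # Logical intersection, using the same operator keyed on whole tuples.
    A = [(1,), (2,), (3,)]
    B = [(2,), (3,), (4,)]
    intersection = list(hash_match(A, B, lambda t: t, lambda t: t,
                                   lambda a, b: a))

    print(join)          # [(1, 'a', 10, 1), (2, 'b', 20, 2), (2, 'b', 30, 2)]
    print(intersection)  # [(2,), (3,)]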

     I felt the treatment of sorting was especially helpful because it not
    only presented the algorithms but motivated them as well. The observation
    that the CPU processing speed could be distilled down to its ability to
    assemble new pages was also an eye-opener for someone who studies
    architecture.
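
     To keep the mechanics straight in my head, a bare-bones external merge
    sort looks roughly like this (my own sketch with made-up page and run
    sizes, simulated entirely in memory; the paper's algorithms are far more
    refined):

    import heapq

    PAGE_SIZE = 4               # records per page (illustrative)
    RUN_LENGTH = 3 * PAGE_SIZE  # records per in-memory sorted run

    def make_runs(records):
        # Phase 1: produce sorted runs, each the size of the sort's memory.
        return [sorted(records[i:i + RUN_LENGTH])
                for i in range(0, len(records), RUN_LENGTH)]

    def merge_runs(runs):
        # Phase 2: merge all runs; the CPU work is mostly comparisons plus
        # copying each winner into a fresh output page.
        output_pages, page = [], []
        for rec in heapq.merge(*runs):
            page.append(rec)            # "assembling" the output page
            if len(page) == PAGE_SIZE:
                output_pages.append(page)
                page = []
        if page:
            output_pages.append(page)
        return output_pages

    data = [27, 3, 14, 9, 30, 1, 22, 8, 17, 5, 11, 25, 2]
    print(merge_runs(make_runs(data)))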

     My one complaint with section 2 was that I'm not totally clear on when
    to choose sorting over hashing or vice versa. Is it simply that each
    involves tradeoffs, so the choice comes down to a design decision? Even this
    complaint is relatively minor.

     The discussion of joins was particularly interesting, because we simply
    assume they can be computed quickly. I thought the idea of a zig-zag join
    was appealing, but had to agree that I couldn't think of any case where it
    would improve performance. On the other hand, I could instantly see the
    benefits of the hybrid merge-join. Sorting based on physical location to
    optimize disk access is intuitively appealing, and DB2 has certainly shown
    that it works in practice.
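
     Here is my rough sketch of the hybrid merge-join idea (not the paper's
    or DB2's code; the index and table structures are invented): match on the
    join keys via two sorted indexes first, then sort the matching pairs by
    record id so the full rows are fetched in physical order:

    def hybrid_merge_join(index_r, index_s, fetch_r, fetch_s):
        # index_r, index_s: lists of (join_key, rid) pairs sorted by join_key
        matches, i, j = [], 0, 0
        while i < len(index_r) and j < len(index_s):
            kr, ks = index_r[i][0], index_s[j][0]
            if kr < ks:
                i += 1
            elif kr > ks:
                j += 1
            else:
                # collect all entries with this key on both sides
                i2 = i
                while i2 < len(index_r) and index_r[i2][0] == kr:
                    i2 += 1
                j2 = j
                while j2 < len(index_s) and index_s[j2][0] == ks:
                    j2 += 1
                for a in range(i, i2):
                    for b in range(j, j2):
                        matches.append((index_r[a][1], index_s[b][1]))
                i, j = i2, j2
        # Sort on R's record id so the fetch pass sweeps R in storage order.
        matches.sort(key=lambda m: m[0])
        return [(fetch_r(r_rid), fetch_s(s_rid)) for r_rid, s_rid in matches]

    R_index = [(1, 40), (2, 5), (3, 12)]   # (join_key, rid), sorted by key
    S_index = [(2, 7), (3, 2)]
    R_table = {40: "r40", 5: "r5", 12: "r12"}
    S_table = {7: "s7", 2: "s2"}
    print(hybrid_merge_join(R_index, S_index, R_table.get, S_table.get))
    # fetches rid 5 before rid 12: [('r5', 's7'), ('r12', 's2')]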

     In the discussion of complex query plans, the claim was made that a
    left-deep tree required a sequential ordering, but that a right-deep tree
    plan could have all build phases of hash joins performed concurrently.
    While I have no cause to dispute this, I don't understand why the
    difference would occur. Apart from that, the concerns about complex
    query plans were all clear and believable.
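
     If I sketch the two shapes myself, the reason starts to look like it is
    about which join gets a base table as its build input, though I may be
    missing something. Here is a toy version (my own construction, with
    made-up tables), assuming each hash join builds on its left input:

    from collections import defaultdict

    def build(rows, key):
        table = defaultdict(list)
        for r in rows:
            table[key(r)].append(r)
        return table

    A = [(1, "a1"), (2, "a2")]
    B = [(1, "b1"), (2, "b2")]
    C = [(1, "c1"), (2, "c2")]
    key = lambda r: r[0]

    # Right-deep plan A join (B join C): both build inputs (A and B) are base
    # tables, so the two builds are independent and could run concurrently;
    # C then streams through both probes in one pipeline.
    ha, hb = build(A, key), build(B, key)
    right_deep = [a + b[1:] + c[1:]
                  for c in C
                  for b in hb.get(key(c), [])
                  for a in ha.get(key(c), [])]

    # Left-deep plan (A join B) join C: the top join's build input is A join B,
    # which does not exist until the lower join finishes, hence the
    # sequential ordering.
    ha2 = build(A, key)
    ab = [a + b[1:] for b in B for a in ha2.get(key(b), [])]
    hab = build(ab, key)
    left_deep = [t + c[1:] for c in C for t in hab.get(key(c), [])]

    print(sorted(right_deep) == sorted(left_deep))  # same result, different schedule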

     I was surprised to see the section on nested relations, because I was
    under the impression that they'd been discarded based on the principle of
    normalization. The discussion about flattening and unnesting only confused
    me further, because it suggested that the only benefit of nested relations
    is avoiding some redundancy, at the cost of slowing query processing down
    to perform these operations.
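
     For my own benefit, a toy example of what I take unnesting to mean
    (made-up data): the nested form stores each order with its set of items,
    and unnesting flattens it into 1NF tuples, repeating the order fields
    once per item:

    orders = [
        ("o1", "alice", [("widget", 2), ("gadget", 1)]),
        ("o2", "bob",   [("widget", 5)]),
    ]

    # Unnest: one flat tuple per (order, item) pair.
    flat = [(oid, cust, item, qty)
            for oid, cust, items in orders
            for item, qty in items]
    print(flat)
    # The nested form avoids repeating ("o1", "alice") for every item, which
    # is the redundancy argument; the flat form is what most operators expect.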

     Finally, the bit-vector filtering was definitely an idea that I liked.
    The elegance of representing an entire table as a bit vector during
    transmission is impressive. Even better was the discussion of bit vectors
    during recursive hashing to eliminate unnecessary elements at every step
    along the way. The best part of this approach is that the hash value has
    already been computed, so there's no need to apply a separate function for
    the bit vector.
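
     A minimal sketch of the filtering step as I read it (my own code, with
    an invented vector size): while building the hash table on R, set a bit
    for each join key's hash; when scanning S, any row whose bit is unset
    cannot possibly join and is dropped without probing, or without being
    transmitted at all in the distributed case:

    BITS = 64  # illustrative size; a real system would size this from estimates

    def build_with_filter(rows, key):
        table, bitvec = {}, 0
        for r in rows:
            h = hash(key(r))
            table.setdefault(key(r), []).append(r)
            bitvec |= 1 << (h % BITS)      # reuse the hash already computed
        return table, bitvec

    def probe_with_filter(rows, key, table, bitvec):
        for r in rows:
            h = hash(key(r))
            if not (bitvec >> (h % BITS)) & 1:
                continue                   # definitely no match: skip early
            for m in table.get(key(r), []):
                yield m + r

    R = [(1, "r1"), (4, "r4")]
    S = [(1, "s1"), (2, "s2"), (3, "s3"), (4, "s4")]
    table, bv = build_with_filter(R, key=lambda t: t[0])
    print(list(probe_with_filter(S, lambda t: t[0], table, bv)))
    # S rows with keys 2 and 3 fail the bit test here; a false positive would
    # only cost a wasted probe, never a wrong result.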

     Overall, I thought this was a great paper to read because it hit on so
    many alternatives, and gave enough information that we could understand
    the more detailed papers necessary to actually implement these techniques.


