From: Steven Balensiefer (alaska@cs.washington.edu)
Date: Wed May 19 2004 - 11:34:06 PDT
Query Evaluation Techniques for Large Databases
This paper provides an extensive survey of issues related to query
evaluation, from sorting to complex query plans and bit-vector filtering.
The information provided is enough to get a grasp on the techniques, but
not enough to actually implement them. As a survey paper, I think this
level of depth is quite appropriate, especially considering the wealth of
references to other papers that likely contain the pertinent details.
I thought the discussion of the difference between physical and logical
algebras was interesting. I'd naively assumed that most DB engines
contained the same "toolbox" and that differences in performance came from
how those tools were applied. It makes sense that they would instead rely
on a smaller set of physical operators that can implement multiple
logical operators.
I felt the treatment of sorting was especially helpful because it not
only presented the algorithms but motivated them as well. The observation
that CPU processing speed could be distilled down to the ability to
assemble new pages was an eye-opener for someone who studies
architecture.
My one complaint with section 2 was that I'm not totally clear on when
to choose sorting over hashing or vice versa. Is it simply that each
involves tradeoffs, making the choice a design decision? Even this
complaint is relatively minor.
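One way I found to see the tradeoff is duplicate elimination, which both
techniques support. This is just my own sketch in Python, not something
from the paper: sorting pays an O(n log n) comparison cost but produces
ordered output for free, while hashing does one pass but yields no order.

```python
# Duplicate elimination two ways, illustrating the sort-vs-hash choice.

def distinct_by_sort(rows):
    # Sort first; duplicates become adjacent and are skipped in one pass.
    out = []
    for row in sorted(rows):
        if not out or out[-1] != row:
            out.append(row)
    return out  # bonus: output is sorted, useful for a downstream merge-join

def distinct_by_hash(rows):
    # One pass with a hash table; output order is whatever the input gave us.
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

rows = [3, 1, 2, 3, 1]
print(distinct_by_sort(rows))  # [1, 2, 3]
print(distinct_by_hash(rows))  # [3, 1, 2]
```

If a later operator (say a merge-join) wants sorted input anyway, the
sort-based version amortizes its extra cost; otherwise hashing wins.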
The discussion of joins was particularly interesting, because we simply
assume they can be computed quickly. I thought the idea of a zig-zag join
was appealing, but had to agree that I couldn't think of any case where it
would improve performance. On the other hand, I could instantly see the
benefits of the hybrid merge-join. Sorting based on physical location to
optimize disk access is intuitively appealing, and DB2 has certainly shown
that it works in practice.
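My mental model of the physical-location trick, sketched below with
hypothetical names (this is not DB2's actual code): once index entries
have matched, sort the record identifiers by page number before fetching,
so pages are visited in order rather than by random seeks.

```python
# Sketch: after a merge-join on index entries produces (key, rid) matches,
# sort the rids by physical location before fetching so page accesses are
# sequential instead of seeking back and forth across the disk.

def fetch_in_physical_order(matches, read_page):
    # matches: list of (key, rid) where rid = (page_no, slot)
    results = []
    for key, (page_no, slot) in sorted(matches, key=lambda m: m[1]):
        page = read_page(page_no)  # pages now arrive in ascending order
        results.append((key, page[slot]))
    return results

# Toy "disk": page number -> list of records
pages = {0: ["a", "b"], 1: ["c", "d"]}
matches = [("k2", (1, 0)), ("k1", (0, 1)), ("k3", (1, 1))]
print(fetch_in_physical_order(matches, pages.__getitem__))
```

With a real buffer manager, consecutive hits on the same page would also
be served from one read instead of repeated fetches.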
In the discussion of complex query plans, the claim was made that a
left-deep tree required a sequential ordering, but that a right-deep tree
plan could have all build phases of hash joins performed concurrently.
While I have no cause to dispute this, I don't understand why the
difference would occur. Apart from that, the concerns about complex
query plans were all clear and believable.
I was surprised to see the section on nested relations, because I was
under the impression that they'd been discarded based on the principle of
normalization. The discussion about flattening and unnesting only did
more to confuse me, because it suggested that the only benefit of nested
relations is avoiding some redundancy, at the cost of slowing query
processing down to perform these operations.
Finally, the bit-vector filtering was definitely an idea that I liked.
The elegance of representing an entire table as a bit vector during
transmission is impressive. Even better was the discussion of bit vectors
during recursive hashing to eliminate unnecessary elements at every step
along the way. The best part of this approach is that the hash value has
already been computed, so there's no need to apply a separate bit-vector
function.
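The idea as I understand it can be sketched as a Bloom-filter-style bit
vector (my own hypothetical Python, not from the paper): the build side
sets one bit per join key, and the probe side discards rows whose bit is
unset before any join work happens.

```python
# Bit-vector filtering sketch: a zero bit proves the key is absent from
# the build side; a set bit may be a false positive, so surviving rows
# still go through the real join. The same hash used by the hash join
# can drive the bit position, so no extra hash computation is needed.

NBITS = 64

def make_filter(build_rows, key):
    bits = 0
    for row in build_rows:
        bits |= 1 << (hash(row[key]) % NBITS)
    return bits

def survives(bits, row, key):
    return bits & (1 << (hash(row[key]) % NBITS)) != 0

build = [{"k": 1}, {"k": 2}]
probe = [{"k": 1}, {"k": 3}, {"k": 2}]
bits = make_filter(build, "k")
survivors = [r for r in probe if survives(bits, r, "k")]
print(survivors)  # {"k": 3} is filtered out before the join runs
```

Shipping just these 64 bits instead of the build table is what makes the
technique so cheap during transmission between sites.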
Overall, I thought this was a great paper to read because it hit on so
many alternatives, and gave enough information that we could understand
the more detailed papers necessary to actually implement these techniques.