From: CR (chrisre@cs.washington.edu)
Date: Wed May 19 2004 - 13:27:42 PDT
This paper was basically a textbook on techniques to do Query
evaluation. Being a survey paper there is not a ton to comment about in
terms of strengths a weaknesses. One thing I would have liked more on
was the role of indices and their selection during execution. Also
support for things like R+, R* trees and cache conscious algorithms.
The last of these brings me to the freshest thoughts I had while reading
this paper. I read it over many days so the beginning despite my notes
is hazy. I would have liked more cache conscious algorithms. I think
these algorithms are increasingly important as main memory grows and the
cost to disk also increases. In the architecture course here, we were
told how the time it costs to access instructions is changing from tens
to hundreds to thousands of seconds in penalties. These orders of
magnitude are comparable to the cost to access disk as compared to main
memory. It seems that there should be similar strategies that map over
from one gap to another. For truly high performance query evaluation
with multiple users this seems to me to be one area that really needs work.
Also, computation is cheap in many respects, deriving some of the data
through compression and codes (as was mentioned) has many interesting
effects. I would bet that even naïve strategies, such as RID list
intersections for joins across unclustered indicies would be faster than
sophisticated strategies provided they were cache aware. Just as the old
cost model of Random and sequential I/Os is used to estimate query time,
it seems that cache should be added – hopefully these two models can be
merged into one for the challenges coming after this (such as when it
takes 100 cycles to cross a chip let alone memory).
I would have also liked to see more about the control flow operators
that are actually used. I was told that parallel databases were not that
highly scalable and that true algorithmic changes were needed before we
could parallelize queries in practice. We learned that some DBs scale up
to 64 processors but what about scaling queries to internet size over
federated sources. This seems to be the scale we should be thinking
about – some DBs currently handle such large numbers of independent
transactions that 64 processors is essentially a slop factor for them.
This archive was generated by hypermail 2.1.6 : Wed May 19 2004 - 13:27:49 PDT