Query Evaluation Techniques for Large Databases

From: CR (chrisre@cs.washington.edu)
Date: Wed May 19 2004 - 13:27:42 PDT

  • Next message: Ankur Jain: "Review"

      This paper was basically a textbook on techniques to do Query
    evaluation. Being a survey paper there is not a ton to comment about in
    terms of strengths a weaknesses. One thing I would have liked more on
    was the role of indices and their selection during execution. Also
    support for things like R+, R* trees and cache conscious algorithms.

    The last of these brings me to the freshest thoughts I had while reading
    this paper. I read it over many days so the beginning despite my notes
    is hazy. I would have liked more cache conscious algorithms. I think
    these algorithms are increasingly important as main memory grows and the
    cost to disk also increases. In the architecture course here, we were
    told how the time it costs to access instructions is changing from tens
    to hundreds to thousands of seconds in penalties. These orders of
    magnitude are comparable to the cost to access disk as compared to main
    memory. It seems that there should be similar strategies that map over
    from one gap to another. For truly high performance query evaluation
    with multiple users this seems to me to be one area that really needs work.

    Also, computation is cheap in many respects, deriving some of the data
    through compression and codes (as was mentioned) has many interesting
    effects. I would bet that even naïve strategies, such as RID list
    intersections for joins across unclustered indicies would be faster than
    sophisticated strategies provided they were cache aware. Just as the old
    cost model of Random and sequential I/Os is used to estimate query time,
    it seems that cache should be added – hopefully these two models can be
    merged into one for the challenges coming after this (such as when it
    takes 100 cycles to cross a chip let alone memory).

    I would have also liked to see more about the control flow operators
    that are actually used. I was told that parallel databases were not that
    highly scalable and that true algorithmic changes were needed before we
    could parallelize queries in practice. We learned that some DBs scale up
    to 64 processors but what about scaling queries to internet size over
    federated sources. This seems to be the scale we should be thinking
    about – some DBs currently handle such large numbers of independent
    transactions that 64 processors is essentially a slop factor for them.


  • Next message: Ankur Jain: "Review"

    This archive was generated by hypermail 2.1.6 : Wed May 19 2004 - 13:27:49 PDT