From: Atri Rudra (atri@cs.washington.edu)
Date: Wed May 19 2004 - 02:25:46 PDT
Query Evaluation Techniques for Large Databases
------------------------------------------------------------
G. Graefe
This (humongous) survey looks at various techniques for query execution
for large databases with emphasis on complex queries. We looked at the
architecture of query execution engines; Sorting and Hashing; Disk access;
Aggregation algorithms based on nested loops, sorting and hashing; Binary
Matching algorithms based on nested loops, sorting and hashing; Scheduling
and resource allocation in complex queries; Nested Relations and Bit
Vector Filtering.
Here are my questions/comments:
(i) The one things that stood out for me in this paper is the omnipresence
of nested loops, sorting and hashing. Since this survey was written has
any new way of doing the algorithms evolved ? If not (which is my guess),
any particular reason for the non-existence of an alternate paradigm?
(ii) The interplay between logical and physical algebra is basically the
association of data structures to algorithms: which begs a question which
is perhaps related to the last one-- have any other "sophisticated" data
structures made their way into query execution ?
(iii) I found the references to iterators in the Sorting section
interesting as I have
used iterators in my project (though iterators were used there to go
through all possible database instances).
(iv) The part about using statistical information to create better hash
buckets (specially in the recursive partitioning case) seemed to be "ripe"
ground for using online learning techniques-- has there been any work in
this direction ?
(v) The author mentions aggregations where inputs items are not divided
into equivalence classes: do people use such queries in real life and/or
is there a motivating example for such unusual aggregates ?
(vi) The author says things like what would
happen if object oriented databases completely take over relational
databases which is interesting in the view of the Decade of turmoil paper
we read earlier in the quarter.
(vii) The author mentions some work using classical economic concepts in
Sec 8. Two questions: (i) was the work done in the online or offline case
and (ii) [This is way off on the tangent] Has there been work using
game theory to solve some database problem ?
(viii) The concept of having stop points in some query processing
algorithms seems similar to the OS concepts of timesharing and context
switching.
(ix) Nested Relations seem similar to semi structured data like XML: is
this connection valid ? Also in queries involving NF^2, can the select
have conditions which involve sets ?
(x) Overall a pretty comprehensive survey though I am glad we did not have
to go through all of it :-)
This archive was generated by hypermail 2.1.6 : Wed May 19 2004 - 02:25:47 PDT