Query Evaluation Techniques for Large Databases

From: Stavan Parikh (stavan@cs.washington.edu)
Date: Tue May 18 2004 - 22:18:52 PDT

Next message: Academic Sales: "Student, Teacher, School - Academic Software Discounts 6825"

Previous message: Ankur Jain: "(no subject)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Graefe presents a survey of the techniques needed for query execution focusing on complex queries over large databases.This paper covers a variety of different topics - including basic techniques like sorting and hashing that are commonly used in query evaluation, disk access affects, and the major algorithms for query execution.

The author first defines the difference between logical and physical operators. The logical operators are programming concepts available in the query language, which can be implemented by some set of physical operators at the implementation level. This survey is about understanding the physical operators available in the face of memory constraints and large data sets. All physical operators can be classified as a set of open-next-close operators which simplifies implementation.

Almost all query evaluation techniques use sorting or hashing algorithms for processing. As discussed in class the sort algorithms are merge-sort based for disk-access. For hashing there are overflow avoidance or resolution techniques that can be used. For disk access they look at associative access using indices that can allow blocks of data to be tested quickly.

Next we looked at algorithms for aggregration and duplicate removal and at join techniques.

The author points out a list of concerns in generating complex query plans. These concerns are well thought out and logical. While reading I felt a discussion on how real databases dealt with these concerns would have been useful.

Next we looked at nested relations and the author realizes that operations that first convert the nested table into a flat table before query execution is very inefficient. The need for better algorithms to handle nested tables is needed. However, it seems from what we have done in class that nested tables are not that well handled even today (???).

Finally, we looked at how one can use bloom filters to help distribute data between processors to enable parallelism with low overhead.

Overall this paper, was a very thorough survey of the physical operators needed for query evaluation. It provided a good overview for a novice on query execution plan.

Then the author discusses aggregation and duplicate removal techniques that use sorting, hashing, or the basic nested loop methods. They provide a comparison between the three and show that they hybrid hashing scheme is most useful.

Next message: Academic Sales: "Student, Teacher, School - Academic Software Discounts 6825"

Previous message: Ankur Jain: "(no subject)"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

This archive was generated by hypermail 2.1.6 : Tue May 18 2004 - 22:18:53 PDT