Access Path Selection in a Relational Database Management System

From: Joe Xavier (joexav@microsoft.com)
Date: Mon May 24 2004 - 19:44:13 PDT

  • Next message: Education Sales: "Academic Discounts on Adobe, Microsoft, & Symantec 101250"

    P. G. Selinger et al - Access Path Selection in a Relational Database Management System

    The paper describes System R's query optimizer and in general covers the four stages in processing a SQL query - parsing (briefly), optimization, (coming up with plans), code generation (machine code) and execution.
    Section 3 talks about the Research Storage System which is the storage subsystem. Access is through a tuple-oriented interface - RSI. The section talks about how data is laid out on disk in the case of tuples and in the case when the data is indexed. It describes the I/O required for each. The two kinds of scans for SQL statements are the segment sacn (all tuples for a relation) and the second is an index scan. In teh case of a segment scan, all non-empty pages of a segmant are touched once. In the case of a index scan on an non-clustered, every page in the index is touched once but data pages may be touched more than once. This is avoided in a non-clustered index.
    Section 4 talks about the costs for a single relation access path.
    The estimated cost of a single relation access plan has two parts - I/O and CPU. The aspects which affect teh cost include whether an index can be used, if the index is clustered, and page numbers of the index. The assumption that the distribution of data is uniform is not always close to being true so the rough statistics that are used can lead to inaccurate estimates. The section contains some formulae for cost estimation for different kinds of predicates.
    Section 5 talks about the costs for access path selection for joins.
    They consider two techniques for joins on tables - nested loops method and merging scans. They then extend this discussion to N-way joins since they can be expressed as a sequence of 2-way joins. The section also deals with computation of the costs for the joins. An intereting point is that the formulas for coputing the cost for nested loops joins and merging scans joins are essentially the same.
    The next section deals with nested queries. It introduces the concept of a co-relation subquery which is a subquery that references tuples from a higher level query implying that it needs to be reevaluated for each tuple in the referenced query block.

     


  • Next message: Education Sales: "Academic Discounts on Adobe, Microsoft, & Symantec 101250"

    This archive was generated by hypermail 2.1.6 : Mon May 24 2004 - 19:44:13 PDT