Notes
Slide Show
Outline
1


DBMS Internals
Execution and Optimization
  • May 10th, 2004
2
Agenda
  • Questions on phase 2 of the project
  • Today: DBMS internals part 2 --
    • Query execution
    • Query optimization
  • Next week:
    • Thursday, not Monday.
    • Mostly Phil Bernstein on meta-data management.
3
                   Query Execution
4
Query Execution Plans
5
The Leaves of the Plan: Scans
  • Table scan: iterate through the records of the relation.
  • Index scan: go to the index, from there get the records in the file (when would this be better?)
  • Sorted scan: produce the relation in order. Implementation depends on relation size.
6
How do we combine Operations?
  • The iterator model. Each operation is implemented by 3 functions:
    • Open: sets up the data structures and performs initializations
    • GetNext: returns the the next tuple of the result.
    • Close: ends the operations. Cleans up the data structures.
  • Enables pipelining!
  • Contrast with data-driven materialize model.
  • Sometimes it’s the same (e.g., sorted scan).
7
Implementing Relational Operations
  • We will consider how to implement:
    • Selection  (     )    Selects a subset of rows from relation.
    • Projection  (     )   Deletes unwanted columns from relation.
    • Join  (        )  Allows us to combine two relations.
    • Set-difference  Tuples in reln. 1, but not in reln. 2.
    • Union  Tuples in reln. 1 and in reln. 2.
    • Aggregation  (SUM, MIN, etc.) and GROUP BY
8
Schema for Examples

  • Purchase:
    • Each tuple is 40 bytes long,  100 tuples per page, 1000 pages (i.e., 100,000 tuples, 4MB for the entire relation).
  • Person:
    • Each tuple is 50 bytes long,  80 tuples per page, 500 pages (i.e, 40,000 tuples, 2MB for the entire relation).
9
Simple Selections
  • Of the form


  • With no index, unsorted:  Must essentially scan the whole relation; cost is M (#pages in R).
  • With an index on selection attribute:  Use index to find qualifying data entries, then retrieve corresponding data records.  (Hash index useful only for equality selections.)
  • Result size estimation:
  •               (Size of R)  * reduction factor.
  •      More on this later.
10
Using an Index for Selections
  • Cost depends on #qualifying tuples, and clustering.
    • Cost of finding qualifying data entries (typically small) plus cost of retrieving records.
    • In example, assuming uniform distribution of phones, about 54% of tuples qualify (500 pages, 50000 tuples).  With a clustered index, cost is little more than 500 I/Os; if unclustered, up to 50000 I/Os!
  • Important refinement for unclustered indexes:
    • 1. Find sort the rid’s of the qualifying data entries.
    • 2. Fetch rids in order.  This ensures that each data page is looked at just once (though # of such pages likely to be higher than with clustering).
11
Two Approaches to General Selections
  • First approach: Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don’t match the index:
    • Most selective access path: An index or file scan that we estimate will require the fewest page I/Os.
    • Consider city=“seattle AND phone<“543%” :
      •  A hash index on  city can be used; then, phone<“543%” must be checked for each retrieved tuple.
      •  Similarly, a b-tree index on phone could be used; city=“seattle” must then be checked.
12
Intersection of Rids
  • Second approach
    • Get sets of rids of data records using each matching index.
    • Then intersect these sets of rids.
    • Retrieve the records and apply any remaining terms.
13
Implementing Projection
  • Two parts:
  •     (1) remove unwanted attributes,
  •     (2) remove duplicates from the result.
  • Refinements to duplicate removal:
    • If an index on a relation contains all wanted attributes, then we can do an index-only scan.
    • If the index contains a subset of the wanted attributes, you can remove duplicates locally.
14
Equality Joins With One Join Column
  • R       S is a common operation. The cross product is too large. Hence, performing  R    S and then a selection is too inefficient.
  • Assume: M pages in R, pR tuples per page, N pages in S, pS tuples per page.
    • In our examples, R is Person and S is Purchase.
  • Cost metric:  # of I/Os.  We will ignore output costs.
15
Discussion
  • How would you implement join?
16
Simple Nested Loops Join
  • For each tuple in the outer relation R, we scan the entire inner relation S.
    • Cost:  M +  (pR * M) * N  =  1000 + 100*1000*500  I/Os:  140 hours!
  • Page-oriented Nested Loops join:  For each page of R, get each page of S, and write out matching pairs of tuples <r, s>, where r is in R-page and S is in S-page.
    • Cost:  M + M*N = 1000 + 1000*500 (1.4 hours)
17
Index Nested Loops Join
  • If there is an index on the join column of one relation (say S), can make it the inner.
    • Cost:  M + ( (M*pR) * cost of finding matching S tuples)
  • For each R tuple, cost of probing S index is about 1.2 for hash index, 2-4 for B+ tree.  Cost of then finding S tuples depends on clustering.
    • Clustered index:  1 I/O (typical), unclustered: up to 1 I/O per matching S tuple.
18
Examples of Index Nested Loops
  • Hash-index on name of Person (as inner):
    • Scan Purchase:  1000 page I/Os, 100*1000 tuples.
    • For each Person tuple:  1.2 I/Os to get data entry in index, plus 1 I/O to get (the exactly one) matching Person tuple.  Total:  220,000 I/Os. (36 minutes)
  • Hash-index on buyer of  Purchase (as inner):
    • Scan Person:  500 page I/Os, 80*500 tuples.
    • For each Person tuple:  1.2 I/Os to find index page with data entries, plus cost of retrieving matching Purchase tuples.  Assuming uniform distribution, 2.5 purchases per buyer (100,000 / 40,000).  Cost of retrieving them  is 1 or 2.5 I/Os depending on clustering.
19
Block Nested Loops Join
  • Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use all remaining pages to hold ``block’’ of outer R.
    • For each matching tuple r in R-block, s in S-page, add      <r, s> to result.  Then read next R-block, scan S, etc.
20
Sort-Merge Join  (R     S)
  • Sort R and S on the join column, then scan them to do a ``merge’’  on the join column.
    • Advance scan of R until current R-tuple >= current S tuple, then advance scan of S until current S-tuple >= current R tuple; do this until current R tuple = current S tuple.
    • At this point, all R tuples with same value and all S tuples with same value match;  output <r, s> for all pairs of such tuples.
    • Then resume scanning R and S.
21
Cost of Sort-Merge Join
  • R is scanned once; each S group is scanned once per matching R tuple.
  • Cost:  M log M + N log N + (M+N)
  • But really, we can do it in 3(M+N) with some trickery.


    • The cost of scanning, M+N, could be M*N (unlikely!)
  • With 35, 100 or 300 buffer pages, both Person and Purchase can be sorted in 2 passes; total: 7500. (75 seconds).



22
Hash-Join
  • Partition both relations using hash fn h:  R tuples in partition i will only match S tuples in partition i.
23
Cost of Hash-Join
  • In partitioning phase, read+write both relations; 2(M+N). In matching phase, read both relations; M+N I/Os.
  • In our running example, this is a total of 4500 I/Os. (45 seconds!)
  • Sort-Merge Join vs. Hash Join:
    • Given a minimum amount of memory both have a cost of 3(M+N) I/Os.  Hash Join superior on this count if relation sizes differ greatly.  Also, Hash Join shown to be highly parallelizable.
    • Sort-Merge less sensitive to data skew; result is sorted.
24
Double Pipelined Join (Tukwila)
  • Hash Join
    • Partially pipelined: no output until inner read
    • Asymmetric (inner vs. outer) — optimization requires source behavior knowledge
25
Query Optimization
26
Discussion
  • How would you build a query optimizer?
27
Query Optimization Process
(simplified a bit)
  • Parse the SQL query into a logical tree:
    • identify distinct blocks (corresponding to nested sub-queries or views).
  • Query rewrite phase:
    • apply algebraic transformations to yield a cheaper plan.
    • Merge blocks and move predicates between blocks.
  • Optimize each block: join ordering.
  • Complete the optimization: select scheduling (pipelining strategy).
28
Building Blocks
  • Algebraic transformations (many and wacky).
  • Statistical model: estimating costs and sizes.
  • Finding the best join trees:
    • Bottom-up (dynamic programming): System-R
  • Newer architectures:
    • Starburst: rewrite and then tree find
    • Volcano: all at once, top-down.
29
Key Lessons in Optimization
  • There are many approaches and many details to consider in query optimization
    • Classic search/optimization problem!
    • Not completely solved yet!
  • Main points to take away are:
    • Algebraic rules and their use in transformations of queries.
    • Deciding on join ordering: System-R style (Selinger style) optimization.
    • Estimating cost of plans and sizes of intermediate results.
30
Operations (revisited)
  • Scan ([index], table, predicate):
    • Either index scan or table scan.
    • Try to push down sargable predicates.
  • Selection (filter)
  • Projection (always need to go to the data?)
  • Joins: nested loop (indexed), sort-merge, hash, outer join.
  • Grouping and aggregation (usually the last).
31
Algebraic Laws
  • Commutative and Associative Laws
    • R U S = S U R,  R U (S U T) = (R U S) U T
    • R ∩ S = S ∩ R,  R ∩ (S ∩ T) = (R ∩ S) ∩ T
    • R     S = S     R,  R     (S      T) = (R     S)     T
  • Distributive Laws
    • R     (S U T)  =  (R    S)  U  (R     T)
32
Algebraic Laws
  • Laws involving selection:
    •  s C AND C’(R) = s C(s C’(R)) = s C(R) ∩ s C’(R)
    •  s C OR C’(R) = s C(R) U s C’(R)
    •  s C (R      S) = s C (R)      S
      • When C involves only attributes of R
    •  s C (R – S) = s C (R) – S
    •  s C (R U S) = s C (R) U s C (S)
    •  s C (R ∩ S)  = s C (R) ∩ S
33
Algebraic Laws
  • Example:  R(A, B, C, D), S(E, F, G)
    •  s F=3 (R      S) =                                     ?
    •  s A=5 AND G=9 (R      S) =                         ?
34
Algebraic Laws
  • Laws involving projections
    •  PM(R     S) = PN(PP(R)      PQ(S))
      • Where N, P, Q are appropriate subsets of attributes of M
    •  PM(PN(R)) = PM,N(R)
  • Example R(A,B,C,D), S(E, F, G)
    •  PA,B,G(R      S) = P ? (P?(R)       P?(S))
35
Query Rewrites: Sub-queries
  • SELECT Emp.Name
  • FROM Emp
  • WHERE Emp.Age < 30
  •                AND   Emp.Dept# IN
  •                   (SELECT Dept.Dept#
  •                     FROM Dept
  •                     WHERE  Dept.Loc = “Seattle”
  •                      AND       Emp.Emp#=Dept.Mgr)
36
The Un-Nested Query
  • SELECT Emp.Name
  • FROM Emp, Dept
  • WHERE      Emp.Age < 30
  •          AND   Emp.Dept#=Dept.Dept#
  •          AND   Dept.Loc = “Seattle”
  •          AND    Emp.Emp#=Dept.Mgr
37
Converting Nested Queries
  • Select distinct x.name, x.maker
  • From product x
  • Where x.color= “blue”
  •   AND x.price >= ALL (Select y.price
  •                                         From  product y
  •                                         Where x.maker = y.maker
  •                                            AND y.color=“blue”)
38
Converting Nested Queries
  • Select distinct x.name, x.maker
  • From product x
  • Where x.color= “blue”
  •   AND x.price < SOME (Select y.price
  •                                         From  product y
  •                                         Where x.maker = y.maker
  •                                            AND y.color=“blue”)
39
Converting Nested Queries
  • Select distinct x.name, x.maker
  • From product x, product y
  • Where x.color= “blue” AND x.maker = y.maker
  •   AND y.color=“blue”  AND x.price < y.price
40
Converting Nested Queries
  • (Select x.name, x.maker
  •  From product x
  •  Where x.color = “blue”)


  • EXCEPT


  • (Select x.name, x.maker
  •  From product x, product y
  •  Where x.color= “blue” AND x.maker = y.maker
  •    AND y.color=“blue”  AND x.price < y.price)
41
Semi-Joins, Magic Sets
  • You can’t always un-nest sub-queries (it’s tricky).
  • But you can often use a semi-join to reduce the computation cost of the inner query.
  • A magic set is a superset of the possible bindings in the result of the sub-query.
  • Also called “sideways information passing”.
  • Great idea; reinvented every few years on a regular basis.
42
Rewrites: Magic Sets
  • Create View DepAvgSal AS
  •    (Select  E.did, Avg(E.sal) as avgsal
  •      From   Emp E
  •      Group By E.did)


  • Select E.eid, E.sal
  • From Emp E, Dept D, DepAvgSal V
  • Where E.did=D.did AND D.did=V.did
  •    And  E.age < 30 and D.budget > 100k
  •    And  E.sal > V.avgsal
43
Rewrites: SIPs
  • Select E.eid, E.sal
  • From Emp E, Dept D, DepAvgSal V
  • Where E.did=D.did AND D.did=V.did
  •    And  E.age < 30 and D.budget > 100k
  •    And  E.sal > V.avgsal
  • DepAvgsal needs to be evaluated only for departments where V.did IN
  •    Select E.did
  •    From   Emp E, Dept D
  •    Where  E.did=D.did
  •       And   E.age < 30  and D.budget > 100K
44
Supporting Views
  • 1.  Create View PartialResult as
  •      (Select E.eid, E.sal, E.did
  •       From   Emp E, Dept D
  •       Where  E.did=D.did
  •          And   E.age < 30  and D.budget > 100K)
  • Create View Filter AS
  •         Select DISTINCT P.did FROM PartialResult P.
  • Create View LimitedAvgSal as
  •         (Select F.did Avg(E.Sal) as avgSal
  •         From Emp E, Filter F
  •         Where E.did=F.did
  •         Group By F.did)
45
And Finally…
  • Transformed query:


  •   Select  P.eid, P.sal
  •   From  PartialResult P,  LimitedAvgSal V
  •   Where P.did=V.did
  •       And  P.sal > V.avgsal
46
Rewrites: Group By and Join
  • Schema:
    • Product (pid, unitprice,…)
    • Sales(tid, date, store, pid, units)
  • Trees:
47
Rewrites:Operation Introduction
  • Schema: (pid determines cid)
    • Category (pid, cid, details)
    • Sales(tid, date, store, pid, amount)
  • Trees:
48
Schema for Some Examples
  • Reserves:
    • Each tuple is 40 bytes long,  100 tuples per page, 1000 pages (4000 tuples)
  • Sailors:
    • Each tuple is 50 bytes long,  80 tuples per page, 500 pages (4000 tuples).
49
Query Rewriting: Predicate Pushdown
50
Query Rewrites: Predicate Pushdown (through grouping)
51
Query Rewrite:
Predicate Movearound
52
Query Rewrite: Predicate Movearound
53
Query Rewrite: Predicate Movearound
54
Query Rewrite Summary
  • The optimizer can use any semantically correct rule to transform one query to another.
  • Rules try to:
    • move constraints between blocks (because each will be optimized separately)
    • Unnest blocks
  • Especially important in decision support applications where queries are very complex.
  • In a few minutes of thought, you’ll come up with your own rewrite. Some query, somewhere, will benefit from it.
  • Theorems?


55
Cost Estimation
  • For each plan considered, must estimate cost:
    • Must estimate cost of each operation in plan tree.
      • Depends on input cardinalities.
    • Must estimate size of result for each operation in tree!
      • Use information about the input relations.
      • For selections and joins, assume independence of predicates.
  • We’ll discuss the System R cost estimation approach.
    • Very inexact, but works ok in practice.
    • More sophisticated techniques known now.
56
Statistics and Catalogs
  • Need information about the relations and indexes involved.  Catalogs typically contain at least:
    • # tuples (NTuples) and # pages (NPages) for each relation.
    • # distinct key values (NKeys) and NPages for each index.
    • Index height, low/high key values (Low/High) for each tree index.
  • Catalogs updated periodically.
    • Updating whenever data changes is too expensive; lots of approximation anyway, so slight inconsistency ok.
  • More detailed information (e.g., histograms of the values in some field) are sometimes stored.
57
Size Estimation and Reduction Factors
  • Consider a query block:
  • Maximum # tuples in result is the product of the cardinalities of relations in the FROM clause.
  • Reduction factor (RF) associated with each term reflects the impact of the term in reducing result size.  Result cardinality = Max # tuples  *  product of all RF’s.
    • Implicit assumption that terms are independent!
    • Term col=value has RF 1/NKeys(I), given index I on col
    • Term col1=col2 has RF 1/MAX(NKeys(I1), NKeys(I2))
    • Term col>value has RF (High(I)-value)/(High(I)-Low(I))
58
Histograms
  • Key to obtaining good cost and size estimates.
  • Come in several flavors:
    • Equi-depth
    • Equi-width
  • Which is better?
  • Compressed histograms: special treatment of frequent values.
59
Histograms
  • Statistics on data maintained by the RDBMS
  • Makes size estimation much more accurate (hence, cost estimations are more accurate)


60
Histograms
  • Employee(ssn, name, salary, phone)
  • Maintain a histogram on salary:






  • T(Employee) = 25000, but now we know the distribution


61
Histograms
  • Ranks(rankName, salary)
  • Estimate the size of Employee          Ranks






62
Histograms
  • Assume:
    • V(Employee, Salary) = 200
    • V(Ranks, Salary) = 250
  • Then T(Employee            Ranks) =
    = Si=1,6 Ti Ti’ / 250
    = (200x8 + 800x20 + 5000x40 +
         12000x80 + 6500x100 + 500x2)/250
    = ….


63
Plans for Single-Relation Queries
(Prep for Join ordering)
  • Task: create a query execution plan for a single Select-project-group-by block.
  • Key idea: consider each possible access path to the relevant tuples of the relation. Choose the cheapest one.
  • The different operations are essentially carried out together (e.g., if an index is used for a selection, projection is done for each retrieved tuple, and the resulting tuples are pipelined into the aggregate computation).
64
Example
  • If we have an Index on rating:
    • (1/NKeys(I)) * NTuples(R) = (1/10) * 40000 tuples retrieved.
    • Clustered index: (1/NKeys(I)) * (NPages(I)+NPages(R)) = (1/10) * (50+500) pages are retrieved (= 55).
    • Unclustered index: (1/NKeys(I)) * (NPages(I)+NTuples(R)) = (1/10) * (50+40000) pages are retrieved.
  • If we have an index on sid:
    • Would have to retrieve all tuples/pages.  With a clustered index, the cost is 50+500.
  • Doing a file scan: we retrieve all file pages (500).
65
Determining Join Ordering
  • R1       R2        ….       Rn
  • Join tree:





  • A join tree represents a plan. An optimizer needs to inspect many (all ?) join trees
66
Types of Join Trees
  • Left deep:
67
Types of Join Trees
  • Bushy:


68
Types of Join Trees
  • Right deep:


69
Problem
  • Given: a query  R1     R2      …      Rn
  • Assume we have a function cost() that gives us the cost of every join tree
  • Find the best join tree for the query
70
Dynamic Programming
  • Idea: for each subset of {R1, …, Rn}, compute the best plan for that subset
  • In increasing order of set cardinality:
    • Step 1: for {R1}, {R2}, …, {Rn}
    • Step 2: for {R1,R2}, {R1,R3}, …, {Rn-1, Rn}
    • …
    • Step n: for {R1, …, Rn}
  • A subset of {R1, …, Rn} is also called a subquery
71
Dynamic Programming
  • For each subquery Q ⊆ {R1, …, Rn} compute the following:
    • Size(Q)
    • A best plan for Q: Plan(Q)
    • The cost of that plan: Cost(Q)
72
Dynamic Programming
  • Step 1: For each {Ri} do:
    • Size({Ri}) = B(Ri)
    • Plan({Ri}) = Ri
    • Cost({Ri}) = (cost of scanning Ri)
73
Dynamic Programming
  • Step i: For each Q ⊆ {R1, …, Rn} of cardinality i do:
    • Compute Size(Q)    (later…)
    • For every pair of subqueries Q’, Q’’
      s.t. Q = Q’ U Q’’
      compute cost(Plan(Q’)       Plan(Q’’))
    • Cost(Q) = the smallest such cost
    • Plan(Q) = the corresponding plan
74
Dynamic Programming
  • Return Plan({R1, …, Rn})
75
Dynamic Programming
  • Summary: computes optimal plans for subqueries:
    • Step 1: {R1},  {R2}, …, {Rn}
    • Step 2:  {R1, R2}, {R1, R3}, …, {Rn-1, Rn}
    • …
    • Step n: {R1, …, Rn}
  • We used naïve size/cost estimations
  • In practice:
    • more realistic size/cost estimations (next)
    • heuristics for Reducing the Search Space
      • Restrict to left linear trees
      • Restrict to trees “without cartesian product”
    • need more than just one plan for each subquery:
      • “interesting orders”