Solution to Homework 4

Michael Gubanov, mgubanov@cs

1.     Query Execution

a.

i.      Query 1

select region, count(d.did) as DCount

from depts as d

group by region

Estimated Running Time:    1.77 seconds

Actual Running Time:              less than a second[1]

ii.      Query 2

select region, count(d.did) as DCount

from depts as d

join depts_prods as dp on d.did = dp.did

group by region

Estimated Running Time:    16.7 seconds

Actual Running Time:              4 seconds

Join Method:               Hash Join

b.

i.      Query 3

select d.region, avg(p.price) as AvgPrice

`                         from prods as p `
`                                  join depts_prods as dp on dp.pid =  p.pid`
`                                  join depts as d on d.did = dp.did`

group by d.region

Output:

region                          AvgPrice

------------------------------ --------------------------

Central                        599.92060525118245

Mountain                       498.32875574601167

South                          400.95740872043115

NorthWest                      200.12154503004854

East                           700.28072857480288

SouthWest                      298.5248630158319

(7 row(s) affected)

` `
`              Estimated Running Time:       41.4 seconds`
`     Actual Running Time: 7 seconds`
`              Join Method:         Both Hash Join`

c.

The right aggregate operator is redundant.  Query optimizer introduces it to significantly reduce the number of tuples to join with dept table. Should that optimization be omitted the output wont have changed. However, performance would have decreased dramatically.

gA, agg(D)(R(A,B) B=C S(C,D)) =

gA, agg(D)(R(A,B) B=C (gA, agg(D)S(C,D)))

` `

d.

i.      Query 1

`                          select d.region, dp.did`
`                                   from depts as d`
`                                   join depts_prods as dp on d.did =  dp.did`

Estimated Running Time:    21.7 seconds

Actual Running Time:              20 seconds

Estimated Tuples:          2,643,293

Actual Tuples:                    3,050,663

` `

ii.      Query 2

`                select count(*)`
`                                   from depts as d`
`                                   join depts_prods as dp on d.did =  dp.did`
` `

Estimated Running Time:    16.1 seconds

Actual Running Time:              2 seconds

The second query performs better because of the aggregation on depts_prods being used by optimizer before join. This aggregation is useful to reduce the number of tuples for a join which significantly affects performance. A temporary table contains the number of tuples in depts_prods for each unique did which is later used to join with depts and to perform a final count.

This optimization could not be applied to the first query since it outputs d.region, dp.did for each tuple and therefore needs each tuple of both tables to be joined first before the output could be computed. That results in a large join since both tables are large.

However, should the first query contain distinct or any other operation which would allow optimizer to reduce a number of rows for a join it would result in a similar optimization technique to be applied.

e.

i.      Query 1

select d.region, avg(p.price) as AvgPrice

`                         from prods as p `
`                                  join depts_prods as dp on dp.pid = p.pid`
`                                  join depts as d on d.did = dp.did`

group by d.region

Output:

region                          AvgPrice

------------------------------ --------------------------

Central                        599.92060525118245

Mountain                       498.32875574601167

South                          400.95740872043115

NorthWest                      200.12154503004854

East                           700.28072857480288

SouthWest                      298.5248630158319

(7 row(s) affected)

` `
`              Estimated Running Time:       40.9 seconds`
`     Actual Running Time: 7-30 seconds`
`              Join Method:         Both Hash Join`
` `

ii.      Query 2

`                          select d.region, dp.did`
`                                   from depts as d`
`                                   join depts_prods as dp on d.did =  dp.did`

Estimated Running Time:    19.7 seconds

Actual Running Time:              17 seconds

Estimated Tuples:          2,567,569

Actual Tuples:                    3,050,663

` `

iii.      Query 3

`                select count(*)`
`                                   from depts as d`
`                                  join depts_prods as dp on d.did =  dp.did`

Estimated Running Time:    8.36 seconds

Actual Running Time:              less than a second

Estimated cost for Query 3 is almost a half less as well as actual execution time.

2.     Query Optimization

a.     Block Nested Loop join

i.      V(R,A) = 10000

Cost = B(R)+(B(s(R))*B(S)) = 10,000 + 10,000 = 20,000

ii.      V(R,A) = 1

a.     M = 5002

Cost = B(R)+(B(s(R))*B(S)/(M-2)) =

= 10,000+((10,000*10,000)/(5002–2)) = 30,000

b.     M = 101

Cost = B(R)+(B(s(R))*B(S)/(M-2)) =

= 10,000+((10,000*10,000)/(101–2))= 1,020,101

b.     Partitioned Hash join

i.      V(R,A) = 1

Cost = B(R)+(2*B(s(R))+3*B(S)) = 10,000+(2*10,000+3*10,000)=60,000

ii.      V(R,A) = 10,000

Cost = B(R)+(2*B(s(R))+3*B(S)) = 10,000+(2*1+3*10,000)=40,002

c.     Index join

i.      V(R,A) = 1

Cost = B(R)+4*T(s(R))*B(S)/V(S,C)=10,000+(10,000*4*10,000/10,000)=50,000

ii.      V(R,A) = 10,000

Cost = B(R)+4*T(s(R))*B(S)/V(S,C)=10,000+(1*4*10,000/10,000)=10,004

d.     Recommendations

 V(R,A) = 1 V(R,A) = 10,000 M=101 Index join Index join M=5002 Block Nested Loop join Index join

3.     Recovery

a.

i.      <START CKPT T1>

ii.      <START CKPT (T2, T3)>

iii.      <START CKPT (T4, T5)>

b.     X4 = 6, X5 = 7

c.     Recovery Engine needs to read up to <START CKPT (T2, T3)>

[1] Estimated and actual values may differ depending on SQL Server version, machine being used,  SQL Server current load and statistics