Solution to Homework 4
Michael Gubanov, mgubanov@cs
1. Query Execution
a.
i. Query 1
select region, count(d.did) as DCount
from depts as d
group by region
Estimated Running Time: 1.77 seconds
Actual Running Time: less than a second[1]
ii. Query 2
select region, count(d.did) as DCount
from depts as d
join depts_prods as dp on d.did = dp.did
group by region
Estimated Running Time: 16.7 seconds
Actual Running Time: 4 seconds
Join Method: Hash Join
b.
i. Query 3
select d.region, avg(p.price) as AvgPrice
from prods as p
join depts_prods as dp on dp.pid = p.pid
join depts as d on d.did = dp.did
group by d.region
Output:
region AvgPrice
------------------------------ --------------------------
Central 599.92060525118245
Mountain 498.32875574601167
South 400.95740872043115
NorthWest 200.12154503004854
East 700.28072857480288
Canada 799.53329222268451
SouthWest 298.5248630158319
(7 row(s) affected)
Estimated Running Time: 41.4 seconds
Actual Running Time: 7 seconds
Join Method: Both Hash Join
c.
The right aggregate operator is redundant. Query optimizer introduces it to significantly reduce the number of tuples to join with dept table. Should that optimization be omitted the output wont have changed. However, performance would have decreased dramatically.
gA, agg(D)(R(A,B) ⋈B=C S(C,D)) =
gA, agg(D)(R(A,B) ⋈B=C (gA, agg(D)S(C,D)))
d.
i. Query 1
select d.region, dp.did
from depts as d
join depts_prods as dp on d.did = dp.did
Estimated Running Time: 21.7 seconds
Actual Running Time: 20 seconds
Estimated Tuples: 2,643,293
Actual Tuples: 3,050,663
ii. Query 2
select count(*)
from depts as d
join depts_prods as dp on d.did = dp.did
Estimated Running Time: 16.1 seconds
Actual Running Time: 2 seconds
The second query performs better because of the aggregation on depts_prods being used by optimizer before join. This aggregation is useful to reduce the number of tuples for a join which significantly affects performance. A temporary table contains the number of tuples in depts_prods for each unique did which is later used to join with depts and to perform a final count.
This optimization could not be applied to the first query since it outputs d.region, dp.did for each tuple and therefore needs each tuple of both tables to be joined first before the output could be computed. That results in a large join since both tables are large.
However, should the first query contain distinct or any other operation which would allow optimizer to reduce a number of rows for a join it would result in a similar optimization technique to be applied.
e.
i. Query 1
select d.region, avg(p.price) as AvgPrice
from prods as p
join depts_prods as dp on dp.pid = p.pid
join depts as d on d.did = dp.did
group by d.region
Output:
region AvgPrice
------------------------------ --------------------------
Central 599.92060525118245
Mountain 498.32875574601167
South 400.95740872043115
NorthWest 200.12154503004854
East 700.28072857480288
Canada 799.53329222268451
SouthWest 298.5248630158319
(7 row(s) affected)
Estimated Running Time: 40.9 seconds
Actual Running Time: 7-30 seconds
Join Method: Both Hash Join
ii. Query 2
select d.region, dp.did
from depts as d
join depts_prods as dp on d.did = dp.did
Estimated Running Time: 19.7 seconds
Actual Running Time: 17 seconds
Estimated Tuples: 2,567,569
Actual Tuples: 3,050,663
iii. Query 3
select count(*)
from depts as d
join depts_prods as dp on d.did = dp.did
Estimated Running Time: 8.36 seconds
Actual Running Time: less than a second
Estimated cost for Query 3 is almost a half less as well as actual execution time.
2. Query Optimization
a. Block Nested Loop join
i. V(R,A) = 10000
Cost = B(R)+(B(s(R))*B(S)) = 10,000 + 10,000 = 20,000
ii. V(R,A) = 1
a. M = 5002
Cost = B(R)+(B(s(R))*B(S)/(M-2)) =
= 10,000+((10,000*10,000)/(5002–2)) = 30,000
b. M = 101
Cost = B(R)+(B(s(R))*B(S)/(M-2)) =
= 10,000+((10,000*10,000)/(101–2))= 1,020,101
b. Partitioned Hash join
i. V(R,A) = 1
Cost = B(R)+(2*B(s(R))+3*B(S)) = 10,000+(2*10,000+3*10,000)=60,000
ii. V(R,A) = 10,000
Cost = B(R)+(2*B(s(R))+3*B(S)) = 10,000+(2*1+3*10,000)=40,002
c. Index join
i. V(R,A) = 1
Cost = B(R)+4*T(s(R))*B(S)/V(S,C)=10,000+(10,000*4*10,000/10,000)=50,000
ii. V(R,A) = 10,000
Cost = B(R)+4*T(s(R))*B(S)/V(S,C)=10,000+(1*4*10,000/10,000)=10,004
d. Recommendations
|
V(R,A) = 1
|
V(R,A) = 10,000 |
M=101 |
Index join |
Index join |
M=5002 |
Block Nested Loop join |
Index join |
3. Recovery
a.
i. <START CKPT T1>
ii. <START CKPT (T2, T3)>
iii. <START CKPT (T4, T5)>
b. X4 = 6, X5 = 7
c. Recovery Engine needs to read up to <START CKPT (T2, T3)>
[1] Estimated and actual values may differ depending on SQL Server version, machine being used, SQL Server current load and statistics