CSE 544 Homework 4
Due date: Wednesday, May 24, 2006, in class.
1. [20 points] Consider a concurrency
control manager by timestamps. Below are several sequences of events,
including start events, where sti means
that transaction Ti starts and coi means Ti
commits. These sequences represent real time, and the timestamp-based
scheduler will allocate timestamps to transactions
in the order of their starts. In each case below tell what happens with
the last write request.
You have to choose between one of the following four possible answers:
(1) the request is accepted
(2) the request is ignored
(3) the transaction is delayed
(4) the transaction is rolled back
a) st1; st2; st3; r1(A); r2(B);r2(C); r3(B); co3; w2(B);w1(C)
b) st1; st2; r1(A), r2(B);w2(A); co2; w1(B)
c) st1; st3; st2; r1(A); r2(B); r3(B);w3(A);w2(B); co3; w1(A)
d) st1; r1(A); st2;w2(B); r2(A);w1(B)
2. [16 points] Consider three tables R(a,b,f), S(b,c,h), T(c,d,g).
a) Consider the following SQL query:
Select R.a, sum(T.d)
From R, S, T
Where R.b = S.b And S.c = T.c And R.f = 5 And T.g = 9
Group By R.a
Having sum(S.h) > 10
Draw a logical plan for the query. You may choose any plan as long as it is correct (i.e. no need to worry about efficiency).
b)
Write
a SQL query that is equivalent to the logical plan below (some non-IE
browsers may not display the plan correctly. You might need to either
use IE or open the homework Word file).
T(c,d,g) S(b,c,h) ∏cg ∏bc sa>5 R(a,b,f) d ∏fg S.c=T.c R.b=S.b ∏bf sg<9
3. [16 points] Consider the drinkers, beers, bars schema shown below:
Likes(drinker, beer); Frequents(drinker, bar); Serves(bar, beer)
Write logical plans that compute the following:
(a) All drinkers that frequent only bars that serve only beer they like. (Optimists)
(b) All drinkers that frequent only bars that serve none of the beers they like. (Flagellators)
4. [16 points] (Selectivity Estimation) Assume a table R(A,B,C,D) and consider the problem of estimating the size of the following query:
Select *
From R
Where (R.A= �a� and R.B = �b�) and R.C = �c�
You are lucky and have lots of statistics on R: you know its cardinality and have five histograms: three one dimensional histograms on R.A, R.B, R.C and two 2-dimensional histograms on (R.A, R.B) and on (R.B, R.C)
|R| = n (say, 1,000,000)
# of tuples that have R.A = �a� is n1
# of tuples that have R.B = �b� is n2
# of tuples that have R.C = �c� is n3
# of tuples that have R.A = �a� and R.B=�b� is n12
# of tuples that have R.B = �b� and R.C=�c� is n23
Estimate the size of the query Q in terms of the six
numbers
above. You might find it convenient to use the notations p1 =
n1/n, p2 = n2/n,
etc, which represent probabilities rather than tuples, and express the
final
answer as p*n, where p is the expression you need to find. Attributes A
and C are known to be conditionally independent given B. Now,
suppose we modify query Q by replacing the second "and" in the where
clause (next to R.C) with an "or" instead. Repeat the problem for
this modified query.
5. [32 points] We have n+1 relations, with the following schemas:
S(C1,C2,...,
Cn), R1(A1,B1),..., Rn(An,Bn)
Consider two join expressions:
E = R1 JoinB1=A2 R2 JoinB2=A3 . . . Rn-1 JoinBn-1=An Rn
F = S JoinC1=A1 R1 JoinC2=A2 R2 . . . Rn-1 JoinCn=An Rn
E is a chain join, and F is a star join. Assume that we restrict the joins plans only to join trees without cartesian products. We distinguish between a plan A Join B and a plan B Join A. Compute how many plans exist for E and for F, and how many subplans will be inspected by the dynamic programming algorithm. Assume only left-linear join trees are allowed. Your answer should consist of four functions in n.