CSE 544 Homework 4

Due date: Wednesday, May 24, 2006, in class.

1. [20 points]
Consider a concurrency control manager by timestamps. Below are several sequences of events, including start events, where sti     means that transaction Ti starts and coi means Ti commits. These sequences represent real time, and the timestamp-based scheduler will allocate     timestamps to transactions in the order of their starts. In each case below tell what happens with the last write request.

You have to choose between one of the following four possible answers:

(1)  the request is accepted

(2)  the request is ignored

(3)  the transaction is delayed

(4)  the transaction is rolled back

a)    st1; st2; st3; r1(A); r2(B);r2(C); r3(B); co3; w2(B);w1(C)

b)   st1; st2; r1(A), r2(B);w2(A); co2; w1(B)

c)    st1; st3; st2; r1(A); r2(B); r3(B);w3(A);w2(B); co3; w1(A)

d)   st1; r1(A); st2;w2(B); r2(A);w1(B)

2. [16 points]
Consider three tables R(a,b,f), S(b,c,h), T(c,d,g).

a)    Consider the following SQL query:

Select R.a, sum(T.d)

From R, S, T

Where R.b = S.b And S.c = T.c And R.f = 5 And T.g = 9

Group By R.a

Having sum(S.h) > 10

Draw a logical plan for the query. You may choose any plan as long as it is correct (i.e. no need to worry about efficiency).

b)   Write a SQL query that is equivalent to the logical plan below (some non-IE browsers may not display the plan correctly. You might need to either use IE or open the homework Word file).

 T(c,d,g)

 S(b,c,h)

 ∏cg

 ∏bc

 sa>5

 R(a,b,f)

 d

 ∏fg

 S.c=T.c

 R.b=S.b

 ∏bf

 sg<9

3. [16 points] Consider the drinkers, beers, bars schema shown below:

Likes(drinker, beer); Frequents(drinker, bar); Serves(bar, beer)

Write logical plans that compute the following:

(a) All drinkers that frequent only bars that serve only beer they like. (Optimists)

(b) All drinkers that frequent only bars that serve none of the beers they like. (Flagellators)

4. [16 points] (Selectivity Estimation) Assume a table R(A,B,C,D) and consider the problem of estimating the size of the following query:

Select *

From R

Where (R.A= ‘a’ and R.B = ‘b’) and R.C = ‘c’

You are lucky and have lots of statistics on R: you know its cardinality and have five histograms: three one dimensional histograms on R.A, R.B, R.C and two 2-dimensional histograms on (R.A, R.B) and on (R.B, R.C)

|R| = n (say, 1,000,000)

# of tuples that have R.A = ‘a’ is n1

# of tuples that have R.B = ‘b’ is n2

# of tuples that have R.C = ‘c’ is n3

# of tuples that have R.A = ‘a’ and R.B=’b’ is n12

# of tuples that have R.B = ‘b’ and R.C=’c’ is n23

Estimate the size of the query Q in terms of the six numbers above.  You might find it convenient to use the notations p1 = n1/n, p2 = n2/n, etc, which represent probabilities rather than tuples, and express the final answer as p*n, where p is the expression you need to find. Attributes A and C are known to be conditionally independent given B.  Now, suppose we modify query Q by replacing the second "and" in the where clause  (next to R.C) with an "or" instead. Repeat the problem for this modified query.

5. [32 points]  We have n+1 relations, with the following schemas:

S(C1,C2,..., Cn), R1(A1,B1),...,  Rn(An,Bn)
Consider two join expressions:

E = R1  JoinB1=A2  R2 JoinB2=A3  . . . Rn-1 JoinBn-1=An   Rn

F = S  JoinC1=A1  R1 JoinC2=A2  R2 . . .  Rn-1 JoinCn=An  Rn

E is a chain join, and F is a star join. Assume that we restrict the joins plans only to join trees without cartesian products. We distinguish between a plan A Join B   and a plan B Join A.  Compute how many plans exist for E and for F, and how many subplans will be inspected by the dynamic programming algorithm. Assume only left-linear join trees are allowed. Your answer should consist of four functions in n.