**CSE 544 Homework 4**

Due date: Wednesday, May 24, 2006, in class.

**
1. [20 points]** Consider a concurrency
control manager by timestamps. Below are several sequences of events,
including start events, where st

You have to choose between one of the following four possible answers:

(1)
the request is **accepted**

(2)
the request is **ignored**

(3)
the transaction is **delayed**

(4)
the transaction is **rolled
back**

** **

a) st1; st2; st3; r1(A); r2(B);r2(C); r3(B); co3; w2(B);w1(C)

b) st1; st2; r1(A), r2(B);w2(A); co2; w1(B)

c) st1; st3; st2; r1(A); r2(B); r3(B);w3(A);w2(B); co3; w1(A)

d) st1; r1(A); st2;w2(B); r2(A);w1(B)

**
2. [16 points]** Consider three tables R(a,b,f), S(b,c,h), T(c,d,g).

a) Consider the following SQL query:

Select R.a, sum(T.d)

From R, S, T

Where R.b = S.b And S.c = T.c And R.f = 5 And T.g = 9

Group
By** **R.a

Having sum(S.h) > 10

Draw a logical plan for the query. You may choose any plan as long as it is correct (i.e. no need to worry about efficiency).

b)
Write
a SQL query that is equivalent to the logical plan below (some non-IE
browsers may not display the plan correctly. You might need to either
use IE or open the homework Word file).

T(c,d,g) S(b,c,h) ∏ ∏ s R(a,b,f) d ∏ S.c=T.c R.b=S.b ∏ s

_{cg}

_{bc}

_{a>5}

_{fg}

_{bf}

_{g<9}

** **

** **

** 3.
[16
points]** Consider the drinkers, beers, bars schema shown below:

Likes(drinker, beer); Frequents(drinker, bar); Serves(bar, beer)

Write logical plans that compute the following:

(a) All drinkers that frequent only bars that serve only beer they like. (Optimists)

(b) All drinkers that frequent only bars that serve none of the beers they like. (Flagellators)

** 4. [16
points] **(Selectivity Estimation) Assume a table R(A,B,C,D) and
consider the
problem of estimating the size of the following query:

Select *

From R

Where (R.A= ‘a’ and R.B = ‘b’) and R.C = ‘c’

You are lucky and have lots of statistics on R: you know its cardinality and have five histograms: three one dimensional histograms on R.A, R.B, R.C and two 2-dimensional histograms on (R.A, R.B) and on (R.B, R.C)

|R| = n (say, 1,000,000)

# of tuples that have R.A = ‘a’ is n1

# of tuples that have R.B = ‘b’ is n2

# of tuples that have R.C = ‘c’ is n3

# of tuples that have R.A = ‘a’ and R.B=’b’ is n12

# of tuples that have R.B = ‘b’ and R.C=’c’ is n23

Estimate the size of the query Q in terms of the six
numbers
above. You might find it convenient to use the notations p1 =
n1/n, p2 = n2/n,
etc, which represent probabilities rather than tuples, and express the
final
answer as p*n, where p is the expression you need to find. Attributes A
and C are known to be conditionally independent given B. Now,
suppose we modify query Q by replacing the second "and" in the where
clause (next to R.C) with an "or" instead. Repeat the problem for
this modified query.

** 5. [32
points] **We have n+1 relations, with the following schemas:

S(C_{1},C_{2},...,
C_{n}), R_{1}(A_{1},B_{1}),..., R_{n}(A_{n},B_{n})

Consider two join expressions:

E = R_{1}
Join_{B1=A2} R_{2}
Join_{B2=A3} . . . R_{n-1} Join_{Bn-1=An}
R_{n}

F = S Join_{C1}_{=A1}
R_{1}
Join_{C2=A2} R_{2} . . . R_{n-1}
Join_{Cn=An}
R_{n}

E is a chain join, and F is a star join. Assume that we restrict the joins plans only to join trees without cartesian products. We distinguish between a plan A Join B and a plan B Join A. Compute how many plans exist for E and for F, and how many subplans will be inspected by the dynamic programming algorithm. Assume only left-linear join trees are allowed. Your answer should consist of four functions in n.