CSE 444 Final Solutions
Autumn 2002


Total number of points: 100

1.Problem 1

 

a.

 

select x.url, count(*)

from webpage x

group by x.url

 

 

b.

 

select x.url, y.url

from webpage x, webpage y

where x.word = y.word and x.url != y.url

group by x.url, y.url

having count(*) > 1000

 

c.

 

select distinct x.url

from webpage x

where x.url not in (

                select y.url

                from webpage y

                where y.word not in (select z.word

                                                 from dictionary z

                                                 where z.language = "French"))

 

 

2.

 

q1:  partitioned hash-join.  Cost: 3B(R) + 3B(S) = 60000

 

q2: 
t1 = groupby(S,B,sum);
  Use main memory hash; cost = B(S)

    
t2 = R join_B t1;
  Use main memory hash; cost = B(R)

    
t3 =project_A,sum t2;
     pipleline from previous operation;

    need to materialize t3; cost = B(t3) = B(R)

answer = groupby(t3, A, sum); main memory hash; cost = B(t3) = B(R)

 

total cost: B(S) + 3B(R) = 40000

 

Notice that B is a key in

t1(B,sum), hence t2 = R join_B t1 has as many tuples as R, and so has

t3. Since t3's schema is essentially the same as that of R, we have

B(R) = B(t2)

 

 

3. a

 

i yes

 

ii yes

 

iii no:

 

R     A | B | C

     -----------

      1 | 2 | 3

 

S     A | B | C

     -----------

      1 | 2 | 3

      1 | 4 | 5

 

 

iv yes

 

v no: take R as above and:

 

T     C | D

     ------

      3 | 4

      3 | 5

 

 

4.

 

Subquery          Size                  Cost                 Optimal plan

================================================================

RS                   1k                    0

----------------------------------------------------------------

RT                   3k                    0

----------------------------------------------------------------

RU                   2k                    0

----------------------------------------------------------------

ST                    1.2k                 0

----------------------------------------------------------------

SU                   0.8k                 0

----------------------------------------------------------------

TU                   2.4k                 0

----------------------------------------------------------------

RST                 6k                    1k                    (RS)T

----------------------------------------------------------------

RSU                 4k                    0.8k                 (SU)R

----------------------------------------------------------------

RTU                 12k                  2k                    (RU)T

----------------------------------------------------------------

STU                 4.8k                 0.8k                 (SU)T

----------------------------------------------------------------

RSTU              24k                  3.2k                 (RU)(ST)

----------------------------------------------------------------

 

 

5.

 

select distinct x.e, x.v

from   Log x, Log y

where  x.t = y.t and x.e > 0

   and y.e = -1    // y is <START T>

   and not exists (select *

                   from Log z

                   where x.t = z.t

                     and z.e < -1)  // z is <COMMIT T> or <ABORT T>

   and not exists (select *

                   from Log u

                   where u.e > 0

                     and u.n < x.n)