1.Problem 1
a.
select x.url, count(*)
from webpage x
group by x.url
b.
select x.url, y.url
from webpage x, webpage y
where x.word = y.word and x.url != y.url
group by x.url, y.url
having count(*) > 1000
c.
select distinct x.url
from webpage x
where x.url not in (
select y.url
from webpage y
where y.word not in (select z.word
from dictionary z
where z.language = "French"))
2.
q1: partitioned hash-join. Cost: 3B(R) + 3B(S) = 60000
q2:
t1 = groupby(S,B,sum);
Use main memory hash; cost = B(S)
t2 = R join_B
t1;
Use main memory hash; cost = B(R)
t3 =project_A,sum t2;
pipleline from previous operation;
answer = groupby(t3, A, sum); main memory hash; cost = B(t3) = B(R)
total cost: B(S) + 3B(R) = 40000
Notice that B is a key in
t1(B,sum), hence t2 = R join_B t1 has as many tuples as R, and so has
t3. Since t3's schema is essentially the same as that of R, we have
B(R) = B(t2)
3. a
i yes
ii yes
iii no:
R A | B | C
-----------
1 | 2 | 3
S A | B | C
-----------
1 | 2 | 3
1 | 4 | 5
iv yes
v no: take R as above and:
T C | D
------
3 | 4
3 | 5
4.
Subquery Size Cost Optimal plan
================================================================
RS 1k 0
----------------------------------------------------------------
RT 3k 0
----------------------------------------------------------------
RU 2k 0
----------------------------------------------------------------
ST 1.2k 0
----------------------------------------------------------------
SU 0.8k 0
----------------------------------------------------------------
TU 2.4k 0
----------------------------------------------------------------
RST 6k 1k (RS)T
----------------------------------------------------------------
RSU 4k 0.8k (SU)R
----------------------------------------------------------------
RTU 12k 2k (RU)T
----------------------------------------------------------------
STU 4.8k 0.8k (SU)T
----------------------------------------------------------------
RSTU 24k 3.2k (RU)(ST)
----------------------------------------------------------------
5.
select distinct x.e, x.v
from Log x, Log y
where x.t = y.t and x.e > 0
and y.e = -1 // y is <START T>
and not exists (select *
from Log z
where x.t = z.t
and z.e < -1) // z is <COMMIT T> or <ABORT T>
and not exists (select *
from Log u
where u.e > 0
and u.n < x.n)