1. Which of the following
identities are correct? In each case, either answer YES, or answer NO and give
a counterexample:
a. δ(δ(R)) = δ(R)
b. ΠL(ΠL(R))= ΠL(R),
where L = a set of attributes
c. ΠK(ΠL(R))=
ΠK,L(R), where K, L =
sets of attributes
d. σC1(σC2(R)) =
σC1 and C2(R), where C1,
C2 are conditions
e. ΥL, agg(A)(ΥL, agg(A)(R))
= ΥL, agg(A)(R), where
L = a set of group-by attributes, agg is an aggregate operator, and A is an attribute
2. Consider the following relational
schema, representing a database of documents:
Doc(did, docTitle)
Occurs(did, wid)
Word(wid, word)
For the following query, show an equivalent relational algebra plan:
SELECT
z.word, count(*)
FROM
Doc x, Occurs y, Word z
WHERE
x.did = y.did and y.wid = z.wid
GROUP
BY z.wid, z.word
3. Suppose B(R) = 10,000 and
T(R) = 500,000. The table is clustered.
Let there be an index on R.a, and let V(R,a) = k, for some number k.
Give the cost of σa=3(R) as a function of k, under the following
circumstances. You may neglect the disk I/O’s needed to access the index
itself.
a. The index is clustered
b. The index is not clustered
c. No index is used.
4. Consider the two queries
below:
-- q1
select c.mid, COUNT(*)
from Movie m, Casts c
where m.id = c.mid
group by c.mid
-- q2
select m.name, COUNT(*)
from Movie m, Casts c
where m.id = c.mid
group by m.name
a. Run these two queries on the
IMDB database on SQL Server: one is about twice faster than the other. Examine the query plans for these two
queries, and explain the difference in running time.
b. The Casts_FK table is
identical to Casts but has foreign keys declared to the Movie and Actor tables.
Examine the plans for the two queries before and after replacing Casts with
Casts_FK. Which one changed and why.