# Homework  3

## What you will learn: understand the role of the relational algebra; learn to explain why queries are slow or fast.

1. Which of the following identities are correct? In each case, either answer YES, or answer NO and give a counterexample:

a.  δ(δ(R)) = δ(R)

b.   ΠLL(R))= ΠL(R),   where L = a set of attributes

c.  ΠKL(R))= ΠK,L(R),   where K, L = sets of attributes

d.   σCC(R)) = σC(R),   where C = a condition

e.   ΥL, agg(A)L, agg(A)(R)) = ΥL, agg(A)(R),   where L = a set of group-by attributes, agg is an aggregate operator, and A is an attribute

2. Consider the following relational schema, representing a database of documents and a set of keywords:

Doc(did, docTitle)
Occurs(did, wid)
Word(wid, word)
Keywords(word)

For each of the queries below, show an equivalent relational algebra plan:

a.       SELECT z.word, count(*)
FROM Doc x, Occurs y, Word z
WHERE x.did = y.did and y.wid = z.wid
GROUP BY z.wid, z.word

b.       SELECT x.did, x.docTitle
FROM Doc x
WHERE not exists (     SELECT *
FROM Keywords u
WHERE not exists (     SELECT *
FROM Occurs y, Word z
WHERE x.did=y.did and y.wid=z.wid
and z.word = u.word))

3. (based on Exercise 15.6.2, pp. 745 in the textbook) Suppose B(R) = 10,000 and T(R) = 500,000. The table is clustered.  Let there be an index on R.a, and let V(R,a) = k, for some number k. Give the cost of σa=0(R) as a function of k, under the following circumstances. You may neglect the disk I/OÕs needed to access the index itself.

a.  The index is clustered (the book says the index is clustering; it is the same thing)

b.  The index is not clustered

c.  No index is used.

4. (Inspired by a posting by one of the students in class) Consider the two queries below:
-- q1
select c.mid, COUNT(*)
from Movie m, Casts c
where m.id = c.mid
group by c.mid

-- q2
select m.name, COUNT(*)
from Movie m, Casts c
where m.id = c.mid
group by m.name

a. Run these two queries on the IMDB database on SQL Server: one is about twice faster than the other.  Examine the query plans for these two queries, and explain the difference in running time.

b. The IMDB_FOREIGNKEYS database is identical to IMDB but has foreign keys declared in Casts. Examine the plans for the two queries on the IMDB_FOREIGNKEYS database, and explain which one changed from the IMDB database and why.