1. Consider the following B+ Tree:
Show the state of the B+ tree after inserting elements with key values: 51, 52, and 16.
Starting from the original tree above, show the state of the B+ tree after removing 65 and then 60. If a node has too few keys, you can only move keys between siblings (another node with the same parent). If no such node has enough keys, you need to merge nodes.
2.
Which of the following identities
are correct? In each case, either answer YES, or answer NO and give a
counterexample:
a. δ(δ(R)) = δ(R)
b. ΠL(ΠL(R))=
ΠL(R), where L = a set of attributes
c. ΠK(ΠL(R))= ΠK,L(R), where K, L = sets of attributes
d. σC1(σC2(R))
= σC1 and C2 (R), where C1 = a condition and C2 = a condition
e. ΥL, agg(A)(ΥL,
agg(A)(R)) = ΥL, agg(A)(R), where L = a set of group-by attributes, agg is an
aggregate operator, and A is an attribute
3.
Consider the following
relational schema, representing a database of documents:
Doc(did, docTitle)
Occurs(did, wid)
Word(wid, word)
For the following query, show an equivalent relational algebra plan:
FROM
Doc x, Occurs y, Word z
WHERE
x.did = y.did and y.wid = z.wid
AND wid > 100
GROUP
BY z.wid, z.word
4. (based on Exercise 15.6.2,
pp. 745 in the textbook) Suppose B(R) = 10,000 and T(R) = 500,000. The table is
clustered. Let there be an index
on R.a, and let V(R,a) = k, for some number k. Give the cost of σa=3(R) as a function of k,
under the following circumstances. You may neglect the disk I/OÕs needed to
access the index itself.
a. The index is clustered (the book says the index is clustering; it is the same thing)
b. The index is not clustered
c. No index is used.
5. Consider the two queries below:
-- q1
select c.mid, COUNT(*)
from Movie m, Casts c
where m.id = c.mid
group by c.mid
-- q2
select m.name, COUNT(*)
from Movie m, Casts c
where m.id = c.mid
group by m.name
Run these two queries on the
IMDB database on SQL Server: one is about twice faster than the other. Examine the query plans for these two
queries, and explain the difference in running time.