CSE 594, Spring 2002

Assignment #4: due Thursday May 30^th

1. [20 points] Consider the B+ tree of order d=2 (i.e. each internal node has between 2 and 4 keys) represented here or below. Show the tree after each of the following operations:

a. Insert the key 70.

b. Insert the key 155.

c. Insert the key 165.

d. Delete the key 10.

e. Delete the key 8.

Your answer should show five B+ trees. Notice: each operation is applied to the tree resulting from the previous operation. That is, your answer in part e. should represent the tree with 70, 155, and 165 inserted and 10 and 8 deleted.

2. [30 points] Imagine a database application that allows sailors to reserve boats from a yacht club. The relational schema used includes two relations: Sailors (sid, rating, age) and Reserves (sid, bid), where sid is sailor ID and bid is boat ID. Assume that the higher the boat ID, the “harder” the boat is to sail. The following query asks for the minimal age of sailors in every rating class, but only if someone (not necessarily the youngest sailor) reserved a boat of ID greater than 100.

SELECT S.rating, Min(age)

FROM Reserves R, Sailors S

WHERE R.sid=S.sid

GROUP BY S.rating

HAVING Min(age) < 20 AND Max(bid) > 100

a. Can one or both of the predicates in the HAVING clause be pushed through the grouping into the WHERE clause? Why or why not.

b. Same question, but assuming that the select clause also includes Max(age) (i.e., we’re asking for the youngest and oldest sailors of every rating group.

3. [30 points] Consider the following schema:

Subscriber (ssn, name, city-name)

City (city-name, route-code, region-id)

Region (region-id, name, delivery-code, report)

Assume that each Subscriber record is 30 bytes long, each City record is 50 bytes long, and each Region record is 1000 bytes long on average. There are 15,000 tuples in Subscriber, 5000 tuples in City, and 1500 tuples in Region. Page size is 2000 bytes, and there are 20 pages available in the buffer pool. You can assume uniform distribution of values whenever necessary.

Consider the query:

SELECT *

FROM Subscriber S, City C, Region R

WHERE S.city-name = C.city-name AND C.region-id = R.region-id

a. Suggest a query execution plan for this query. Specify which join algorithms you’re using at every point. Estimate the cost of the plan using the formulas given in class.

b. The original database does not contain indexes. As a database administrator, choose one index that you think will have most impact on performance, and show a query plan that will utilize this index. Estimate the cost of your proposed plan.

Note that city-name is NOT a key of City. Remember to make explicit any assumptions you are making.

4. [20 points] Consider the following schema. Note that in this schema, every purchase involves a single book.

Books (ISBN, Title, AuthorID, PublisherID, SubjectArea, Price)

Authors (AuthorID, AuthorName, Nationality, LitGenre)

Bookstores (StoreID, StoreName, Address)

Publishers (PublisherID, PublisherName, Address)

Customers (SSN, CustomerName, PurchaseID)

Purchases (PurchaseID, ISBN, StoreID, Month)

Availability (ISBN, StoreID, Quantity)

Let Q be the query that lists the customer names and the total amount of money spent on books (at any store) by each customer who bought more than two books at Barnes & Noble this month.

Consider the following views:

CREATE VIEW V1 AS

SELECT SSN, StoreName, Sum(Price) as SumPurchases, Count(*) as NumPurchases

FROM Customers C, Purchases P, Books B, Bookstores S

WHERE C.PurchaseID=P.PurchaseID AND

P.ISBN = B.ISBN AND

P.StoreID = S.StoreID

GROUP BY SSN, StoreName

CREATE VIEW V2 AS

SELECT SSN, StoreName, Sum(Price) as SumPurchases, Count(*) as NumPurchases

FROM Customers C, Purchases P, Books B, Publishers Pb, Bookstores S

WHERE C.PurchaseID=P.PurchaseID AND

P.ISBN = B.ISBN AND

P.StoreID = S.StoreID AND

Pb.PublisherID=B.PublisherID AND

PublisherName=”Addison Wesley”

GROUP BY SSN, StoreName

a. Can the query Q be written using the view V 1? If so, show how. Otherwise, argue why not.

b. Same for V2.