CSE594

CSE594

Database Management Systems

Fall 1999
Learning Experience #33 (AKA Final Exam)

"Our life is frittered away by detail ... Simplify, simplify."

- Henry David Thoreau, on physical data independence

As stated earlier, you may use one sheet of paper (8.5" by 11") with anything written on it. If you have a question or need more paper, please raise your hand. You may use the backs of these sheets if necessary, but please clearly indicate where you answers reside. Read each question carefully and state any assumptions you make. Your answers, as always, should be clear, concise, and thorough. Good luck!

You may unstaple these sheets if you like.

Logical Data Models (25 points)

(10 points) Consider the following complete definition of an O2 database:

class Clothing inherit object public type

tuple (color: string,

size: real)

end;

class Pants inherit Clothing public type

tuple (style: string,

goes_with: unique set (Shirt))

end;

class Shirt inherit Clothing public type

tuple (num_buttons: integer)

end;

/* Assume ‘init’ methods for Pants, Shirt that maintain the following extents */

name PantsExtent: unique set (Pants);

name ShirtExtent: unique set (Shirt);

Translate this O2 database into a relational database. You may wish to add identifier fields to serve as primary keys. Indicate all key constraints, foreign and primary.

(5 points) Briefly explain how the concept of equality in the relation model differs from the concept of identity in the object-oriented model.

(10 points) Consider the relation R (A, B, C, D, E) with functional dependencies AB® C, B® D, and C® E. Identify all candidate keys for R. What is R’s highest normal form? Decompose R into two relations each of which is in at least the next-highest normal form; show these relations and their primary and foreign keys. One of them should still have significant redundancy, so decompose it into two relations and show these relations and their primary and foreign keys.

Logical Query Languages (25 points)

Consider a relational database with information about students, classes, and computers. Each student is enrolled in 0 or more classes, each machine is assigned to 0 or more classes, and each student has an account for each class in which he or she is enrolled. Here is the schema:

Class (cno, title, mno)

Machine (mno, mname, manager)

Student (sno, name, year, phone)

HasAcct (sno, mno, login)

Enroll (cno, sno)

"Mno" always represents a machine number, "cno" a class number, etc. The "manager" field is the name of the manager. You should assume that student names are unique and that all managers are also students.

Relational algebra (10 points)

(7 points) Write a relational algebra query to list the titles of all classes using a machine managed by someone enrolled in course "CSE594". If you need to rename attributes, you can either use the renaming operation or simply use the dot notation to disambiguate them (e.g. "h.sno" would refer to the "sno" field of HasAcct).

(3 points) Write the same query in a different way and explain why one version is likely to be faster than the other (identify the faster one, please).

SQL (10 points)

(6 points) Write the query of A(i) in SQL.

(4 points) Write an SQL query to retrieve, for each machine, the machine number and the largest "year" value of all students with accounts on that machine.

OQL (5 points)

In English, what does the following query do, when run on the database from Question I?

select S.size

from S in ShirtExtent

where not (S in (flatten (select P.goes_with

from P in PantsExtent)))

Physical Aspects of Data and Queries (20 points)

Recall that most relational optimizers only examine left-deep join execution plans and not bushy join execution plans ("bushy" here means anything that’s not either left-deep or right-deep). Give an example of a natural join of 4 relations A(a1, a2, b1), B(b1, b2, c1), C(c1, c2, d1), and D(d1, d2, d3) in which a bushy join execution plan would be better than any of the available linear (left-deep or right-deep) plans. Relevant information to include in your answer includes relation sizes (in pages), reduction factors, etc. Assume that no indexes are available, pipelining is available, and only the page-oriented nested-loops join method is being used. Assume that for this particular natural join, duplicate columns are not removed. There are no operations in this query other than the joins. Explain your answer and be specific.

Concurrency Control and Recovery (20 points)

(10 points) Consider the following two transactions:

T0	T1
READ (A)	READ (B)
READ (B)	READ (A)
IF A = 0 THEN B := B + 1	IF B = 0 THEN A := A + 1
WRITE (B)	WRITE (A)

The initial values are A = B = 0 and the consistency requirement in the database is

((A = 0) OR (B = 0)).

Do both serial schedules preserve the consistency of the database?

Write an interleaved (non-serial) schedule for T0 and T1 that will violate the consistency of the database. Do not use locks.

Draw the serialization (precedence) graph for your schedule.

(6 points) Recall that ARIES uses fuzzy checkpointing. Checkpointing can also be done as follows: Quiesce the system so that only checkpointing activity is in progress, force-write copies of all dirty pages, and include the dirty page table and transaction table in the end checkpoint record. What are the pros and cons of this approach versus the ARIES approach?

(4 points) Explain how recLSN fields in the dirty page table are modified and used during the Analysis and Redo phases of ARIES.

Minirel (10 points)

You wrote code at the File Manager and Relational Operator layers of the Minibase system.

(2 points) What layer (you need only list one) of Minibase code calls the heapfile code that you wrote? Give an example of something this layer might ask of your heapfile code and the context in which that request might occur.

(6 points) Assume we’re using Ripple Join to join relations R and S. Each relation has 1 tuple per page and each relation has 3 pages. Both the betas for Ripple Join are 1. Assuming our buffer pool has 3 frames (one for R, one for S, and one for output tuples), and it is initially empty, how many page reads are performed when evaluating the join? You need not show your work, but if your final answer is wrong, you must show work in order to get partial credit.

(2 points) What does your answer to (B) tell you about the suitability of Ripple Join as a general-purpose join algorithm (as opposed to its obvious utility for OLAP aggregate joins)?

Trivia (2 pts. Extra credit)

What does ARIES (as in the recovery algorithm) stand for?

"It is also hoped that this paper can contribute to greater precision in work on formatted data systems."

- E.F. Codd, 1970