CSE 444 Final

December 13, 2002

8:30-10:20 a.m.

Name___________________________________

Total number of points: 100

Total time: 1h 50’

 

 

1.[30 points] A Web search engine maintains a database with the following tables:

           Webpage(url, word)
           Dictionary(word, language)

A webpage is uniquely identified by its URL. The relation Webpage represents a many-to-many relationship from the set of urls to the set of words: notice that one url may contain multiple words, and conversely, one word may occur in multiple urls.  The table Dictionary consists of a many-to-many relationship between words and languages.  Here too, one word may belong to multiple languages. We say that a webpage is written in language L if all its words are in the language L (a page may happen to be in multiple languages). Write SQL queries to answer the questions below:


 

a. [10 points] Compute for each url the total number of words in that webpage.


 

b. [10 points] Retrieve all pairs of distinct urls which have more than 1000 words in common.


 

c. [10 points] Retrieve the urls of all pages written in “French”.


 

2. [20 points] Consider two tables R(A,B) and S(B,C), where all attributes are integers. Let Q1, Q2 be the following SQL queries:

Q1: select R.A, S.C
       from   R, S
       where R.B=S.B

Q2: select R.A, sum(S.C)
       from   R, S
       where R.B = S.B
       group by R.A

Assume M = 102,  B(R) = B(S) = 10000, V(R,B) = V(S, B) = 100. Give two optimal physical query execution plans for Q1 and Q2, and indicate their cost. Your plans may contain one or more instances of the physical operators from the list below. You do not need to consider any other physical operators.

Nested loop join or block nested loop join

Main memory hash join

Partitioned hash join

Main memory hash group-by

Partitioned hash group-by




3.[10 points] Consider three tables R(A,B,C), S(A,B,C), and T(D, E). A is a key in R. Answer the following questions.  If you answer “yes” you don’t have to justify your answer; if you answer “no” you need to give an example that justifies your answer. All relational algebra operators below are on sets, i.e. they eliminate duplicates.

Is A a key in PAB(R) ?




Is A a key in sC=55(R)  ?




Is A a key in R  È  S  ?




Is A a key in R – S ?




Is A a key in R B=D T


 

4. [20 points] Compute the optimal plan for R    S    T    U using the dynamic programming algorithm, assuming the following:
B(R) = 500,  B(S) = 200,  B(T) = 600,  B(U) = 400
The size of a join is estimated as:  B(A    B) = 0.01 * B(A) * B(B)
The cost of a join is estimated to be the cost of the subplans plus the size of the intermediate result(s).



 

5. [20 points] The SuperSQL database system performs undo recovery without checkpointing. It stores its undo log file in a table, with the following schema:

          Log(N, T, E, V)

where N is the entry number (0, 1, 2, 3, …), T is the transaction id, E is the element id, and V is the old value.  A log entry of the form <T, E, V> is simply represented by the tuple (N, T, E, V), where N is the entry number; here E > 0. The log entries <START T>, <COMMIT T>, and <ABORT T> are represented by a tuple (N, T, E, null), where E=-1 for START, E=-2 for COMMIT, and E=-3 for ABORT. For example, the log:

<START T1>

<T1 X1 55>

<START T2>

<T2 X2 99>

<COMMIT T1>

. . . .

Is represented by the table:

N

T

E

V

0

T1

-1

 

1

T1

X1

55

2

T2

-1

 

3

T2

X2

99

4

T1

-2

 

. . .

 

 

 

Recall that each transaction starts and ends at most once, i.e. a sequence <START T> … <COMMIT T> … <START T> … will not occur in the log. Moreover, any update by the transaction T will occur between its <START T> and <COMMIT T>, or between <START T> and <ABORT T> respectively.

Write a SQL query that can be run during database recovery, after a system crash. The answer to your query should return a table with two attributes, E, and V, indicating which elements have to be updated with what values. You should include each element E at most once in your answer: otherwise it would be ambiguous how to update it.