Final Exam Solutions

Winter 2001

1.a

select distinct Doc.url

from Doc, Occurs, Word

where Doc.url = Occurs.url and Occurs.word = Word.word and

Word.language = "Spanish"

common mistakes: forget "distinct"

notice: the join with Doc is neccessary, since we haven't said that

url in Occurs is a foreign key in Doc. Still, I haven't taken points

off if you dropped the join with Doc

1.b

select distinct Doc.url

from Doc

where Doc.url not in

(select Occurs.url

from Occurs, Word

where Occurs.word=Word.word
and Word.language != "Spanish")

common mistakes:

i. require documents to be nonempty (an
empty document contains

only Spanish words, hence should be
included)

ii. have an extra join in the inner select
(no need to join

with Doc again: A - A*B = A - B)

note: "distinct" is technically needed, since we didn't say that url

is a key in Doc. No points were taken off
if you dropped

"distinct".

note: other correct solutions are select... EXCEPT select... or

select...where Doc.url NOT IN (select...)

1.c

select distinct Doc.url

from Doc

where url not in (select occurs.url

from occurs, word

where occurs.word = word.word and

word.language = "Spanish" and

occurs.bytepos <= 300)

common mistakes: extra join with Doc in the subquery

note: other correct solutions are select... EXCEPT select... or

select...where Doc.url NOT IN (select...)

1.d

select d1.url, d2.url

from Doc d1, Doc d2, Occurs o1, Occurs o2

where d1.url = o1.url and o1.word = o2.word and o2.url = d2.url

groupby d1.url, d2.url

having count(distinct o1.word) >= 200

common mistakes:

i. extra join with Word

ii. count(*) >= 200 (this includes
pairs of documents that have

few words in common, byt they occur in
many positions)

2.a

/db//b

<b> 1 </b>

<b> 3 </b>

<b> 4 </b>

<b> 5 </b>

/db//c/b

<b> 4 </b>

<b> 5 </b>

/db//c[b]

<c> <b> 4 </b> </c>

<c> <b> 5 </b> </c>

2.b

/db/a/c <= 55

/db/a/[//b] >=1 and <=10

/db/a//*/*/b >=20 and <= 100

(the intent was to ask the question /db/a//*/*/*/b, but one
star

got lost during typing; the answer here is more interesting:
=80)

3.a: left tree: root and the three Red leaves;

right tree: the two Red leaves (not the root !)

3.b: yes, yes, yes, no

4

Partial Datalog(neg):

Win(x) :- Red(x), Leaf(x)

Lose(x) :- Blue(x), Leaf(x)

Win(x) :- Blue(x), R(x,y), Win(y)

Lose(x) :- Red(x), R(x,y), Lose(y)

Win(x) :- Red(x), Not(Lose(x))

Lose(x) :- Blue(x), Not(Win(x))

Inflationary datalog(neg), assuming perfectly balanced tree:

Win(x) :- Red(x), Leaf(x)

Lose(x) :- Blue(x), Leaf(x)

Win(x) :- Blue(x), R(x,y), Win(y)

Lose(x) :- Red(x), R(x,y), Lose(y)

Known(x):- Leaf(x)

Known(x):- R(x,y), Known(y)

Win(x) :- Red(x), Not(Lose(x)), Known(x)

Lose(x) :- Blue(x), Not(Win(x)), Known(x)

5.a:

------------

| a | c |

------------

| e | d |

------------

5.b

i. We have to count the largest number of nodes that are used by n

keys. Notation: ceil(x) means "round x up to the nearest
integer".

- there are ceil(n/2) leaves

- there are x internal nodes; each has 3 outgoing edges except

the root has only two
outgoing edges:

= 3x-1 outgoing edges

the edges' destinations are x-1 internal nodes and ceil(n/2)

leaves:

= x-1 + ceil(n/2) incoming edges

hence:

x =
ceil(ceil(n/2)/2)

- hence total number of splits:

= ceil(n/2) - 1 + ceil(ceil(n/2)/2) - 1

= (approx) n/2 + n/4 - 2 = 3n/4 - 2

ii. We have to count the smallest number of nodes used by n keys. As

above: ceil(n/4) leaves, ceil((ceil(n/4)-1)/4) internal nodes. Hence,

total number of splits is:

= ceil(n/4)-1 + ceil((ceil(n/4)-1)/4) - 1

= (approx) n/4 + n/16 - 1/4 - 2

= 5n/16 - 9/2