Final Exam Solutions

CSE 544

Winter 2001

1.a

select distinct Doc.url
from Doc, Occurs, Word
where Doc.url = Occurs.url and Occurs.word = Word.word and
 Word.language = "Spanish"

common mistakes: forget "distinct"
notice: the join with Doc is neccessary, since we haven't said that
url in Occurs is a foreign key in Doc. Still, I haven't taken points
off if you dropped the join with Doc

1.b

select distinct Doc.url
from Doc
where Doc.url not in
 (select Occurs.url
 from Occurs, Word
 where Occurs.word=Word.word and Word.language != "Spanish")

common mistakes:
 i. require documents to be nonempty (an empty document contains
 only Spanish words, hence should be included)
 ii. have an extra join in the inner select (no need to join
 with Doc again: A - A*B = A - B)

note: "distinct" is technically needed, since we didn't say that url
 is a key in Doc. No points were taken off if you dropped
 "distinct".

note: other correct solutions are select... EXCEPT select... or
select...where Doc.url NOT IN (select...)

1.c

select distinct Doc.url
from Doc
where url not in (select occurs.url
 from occurs, word
 where occurs.word = word.word and
 word.language = "Spanish" and
 occurs.bytepos <= 300)

common mistakes: extra join with Doc in the subquery

note: other correct solutions are select... EXCEPT select... or
select...where Doc.url NOT IN (select...)

1.d

select d1.url, d2.url
from Doc d1, Doc d2, Occurs o1, Occurs o2
where d1.url = o1.url and o1.word = o2.word and o2.url = d2.url
groupby d1.url, d2.url
having count(distinct o1.word) >= 200

common mistakes:
 i. extra join with Word
 ii. count(*) >= 200 (this includes pairs of documents that have
 few words in common, byt they occur in many positions)

2.a

/db//b
 1 
 3 
 4 
 5 

/db//c/b
 4 
 5 

/db//c[b]
 <c> 4 </c>
 <c> 5 </c>

2.b

/db/a/c <= 55
/db/a/[//b] >=1 and <=10
/db/a//*/*/b >=20 and <= 100
 (the intent was to ask the question /db/a//*/*/*/b, but one star
 got lost during typing; the answer here is more interesting: =80)

3.a: left tree: root and the three Red leaves;
 right tree: the two Red leaves (not the root !)

3.b: yes, yes, yes, no

4

Partial Datalog(neg):

Win(x) :- Red(x), Leaf(x)
Lose(x) :- Blue(x), Leaf(x)
Win(x) :- Blue(x), R(x,y), Win(y)
Lose(x) :- Red(x), R(x,y), Lose(y)
Win(x) :- Red(x), Not(Lose(x))
Lose(x) :- Blue(x), Not(Win(x))

Inflationary datalog(neg), assuming perfectly balanced tree:

Win(x) :- Red(x), Leaf(x)
Lose(x) :- Blue(x), Leaf(x)
Win(x) :- Blue(x), R(x,y), Win(y)
Lose(x) :- Red(x), R(x,y), Lose(y)
Known(x):- Leaf(x)
Known(x):- R(x,y), Known(y)
Win(x) :- Red(x), Not(Lose(x)), Known(x)
Lose(x) :- Blue(x), Not(Win(x)), Known(x)

5.a:

 ------------
 | a | c |
 ------------
 | e | d |
 ------------
5.b

i. We have to count the largest number of nodes that are used by n
keys. Notation: ceil(x) means "round x up to the nearest integer".

- there are ceil(n/2) leaves

- there are x internal nodes; each has 3 outgoing edges except
 the root has only two outgoing edges:
 = 3x-1 outgoing edges
the edges' destinations are x-1 internal nodes and ceil(n/2)
leaves:
 = x-1 + ceil(n/2) incoming edges
 hence:
 x = ceil(ceil(n/2)/2)

- hence total number of splits:
 = ceil(n/2) - 1 + ceil(ceil(n/2)/2) - 1
 = (approx) n/2 + n/4 - 2 = 3n/4 - 2

ii. We have to count the smallest number of nodes used by n keys. As
above: ceil(n/4) leaves, ceil((ceil(n/4)-1)/4) internal nodes. Hence,
total number of splits is:

 = ceil(n/4)-1 + ceil((ceil(n/4)-1)/4) - 1
 = (approx) n/4 + n/16 - 1/4 - 2
 = 5n/16 - 9/2