Name:__________________

 

 

CSE 544 --- Final Exam

June 10, 2002

2:30pm – 4:20pm

 

Time: 1h50’

Points: 100

This is an open book exam (solution)



1. [25points] A document d is a set of words, d = {w1, ..., wn}.  We ignore the

   word order in the document.  Consider a database of French

   documents, English documents, and a French/English dictionary.  The

   dictionary associates a confidence factor between 0 and 1 to a

   pair consisting of a French and an English word.  The schema is:

 

   FrenchDocs(fID, fWord)

   EnglishDocs(eID, eWord)

   Dictionary(fWord, eWord, confidence)

 

   Each document has a unique identifier (either an eID or fID).

   Thus, given a document identifier, the corresponding document is

   the collection of words associated with that identifier.

 

   Write SQL queries to answer each of the following:

 

   a. [5points] For each French document, count how many words it contains.


   b. [5points] Find all other English documents that have at least 100

    words in common with the English document eID = "e424".

















 

   c. [5points] Given two words, fWord, eWord, define their correlation,
   c(fWord,  eWord) to be the unqiue number c s.t. (fWord, eWord, c) is in

   Dictionary, or 0 if they don't occur in the dictionary.  Given two

   documents d1 = {w1, ..., wm} and d2 = {v1, ..., vn}, define their

   correlation to be: Si,j(c(wi,vj))/Ö(m*n).  We say that fID is a

   translation of eID if their correlation is >= 0.9.  Write a SQL

   query that retrieves all French documents that are translations of

   "e424".  Assume that the DBMS supports sqrt for computing the square root.

 


   d. [5points] Given a French word w, define its translation, w' = t(w) to be

   the English word s.t. (w, w', 1.0) is in Dictionary: we assume that

   for each w there exists at most one such w'.  Given a document D =

   {w1, ..., wn}, define its translation to be t(D) = {t(w1), ...,

   t(wn)} (there may be fewer than n words, if some of the t(wi)

   values are undefined).  Write a SQL query that inserts an English

   translation of the French document "f822" into the database.  Call

   the new English document "e916".

 

 

 

 

 

 

 

 

 

 

 

 

   e. [5points] The following conjecture has been suggested by an expert

   familiar with the application domain: "all English documents that

   contain the words 'vacation' and 'summer' also contain the words

   'hiking' and 'biking'.  Write a SQL query to prove or disprove this

   conjecture.  Explain how the result of the SQL query can be used to

   prove or disprove the conjecture.


2. [15points] We have the following DTD:

 

        <!ELEMENT vacation (region*)>

        <!ELEMENT region (name, package*)>

        <!ELEMENT package (hotel, airline, month, price)>

 

   (all other elements are assumed to be #PCDATA)

 

   a. [5points] write XPath expressions to return the following:

 

      - return all package elements that fly "Northwest Airlines" and

        are under $2000.  Assume that prices are expressed in

        integers, i.e. a test "<= 2000" should suffice.

 

 

 

 

 

 

 

 

 

 

 

 

      - return all hotel elements in the "North West" region that are

        available in "December"

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

      It is OK to include duplicates in the answers (XPath does not

      eliminate duplicates).

 

 


 b. [10points] write an XQuery expression that translates the document into the

   following structure:

 

<!ELEMENT vacation (hotel*)>

<!ELEMENT hotel (name, package*)>

<!ELEMENT package (region, airline, month, price)>

 

   Your result should have a single entry for each hotel, even if that

   hotel occurs under several packages, or under several regions in the input.


3. [15points] Consider a graph with colored edges, G(x,c,y), where x,y are nodes

   and c is a color.

 

   a. [5points] We say that x is a "red" node if all its outgoing edges are

   "red".  Write a FO query that returns all red nodes.

 

 

 

 

 

 

 

 

 

 

   b. [5points] If (x,c,y) is in G, then we say that y is a c-neighbor of x.

   Write an FO query that returns all nodes where all "blue"-neighbors

   have at least one "green"-neighbor that is a "red" node.

 

 

 

 

 

 

 

 

 

 

 

 

   c. [5points] Write a datalog program to compute the pairs of nodes (x,y)

   connected by a path in which all edges have the same color. Your program should

   compute a binary predicate Answer(x,y) consisting of all pairs (x,y) in the answer.

 


4. [25points]
a. [10points] Assume a B+ tree with d=1, i.e. each node has between 1 and 2

      keys. Starting from an empty tree, insert the keys 1, 2, 3, ..., 16, in this order.
      Show at least the following four trees:
              after inserting 5, after inserting 7, after inserting 14, after inserting 16.


   b. [5points] We sort a file using multi-way merge-join.  The main memory can

   store M = 101 pages.  A record is as large as one block (=page).

   There are 10000 records, with the following keys (in this order):

 

   1 2 ... 99 100000 101 102 ...199 200000 201 202 ...299 300000 ... 9901 9902 ... 9999 10000000

 

   These are all keys 1, 2, ..., 10000 in sequence, with every 100th

   key multiplied by 1000.  The sorted file will be:

 

   1 2 ... 99 101 102 ...199 201 202 ...299... 9901 9902 ... 9999 100000 200000 300000 … 10000000

 

 

   How many times do we need to read and

   write each record from/to disk if

(i)                  we use quicksort during run formation:

 

 

 

 

(ii)                we use replacement selection during run formation:




 

 


 c. [10points] Consider two tables, R(A,B) and S(C,D), and the selection-join operation

   sA=v(R) ´B=C S.  Assume that a record is a large as a block and that
   T(R)=B(R)=B(S)=10000.  There is a

   clustered index on S.C, and we need four disk accesses to lookup a

   value in that index (including the disk access to get the record in

   S).  We consider the following methods for the join:

 

            - block nested loop join

            - partitioned hash-join

            - index join

 

   In each of the four cases below, indicate the best access method its cost.







 

 

V(R,A) = 1

V(R,A) = 10000

M = 101

 

 

M = 5002

 

 

 

 

 

 


5. [20points] Consider the following relational schema:

 

       Product(name, manufacturer, city)

 

The following XQuery specification defines an XML view over the

relational schema.  Here $db denotes a canonical relational view of

the relational schema:

 

<sales-by-city>

    let $c-all := distinct-values($db/Product/tuple/city/text())

    for $c in $c-all

    return

      <city>

         <name> { $c } </name>

        { let $m-all := distinct-values($db/Product/tuple[city/text()=$c]/manufacturer/text())

           for $m in $m-all

           return

             <manufacturer>

                <name> $m </name>

                { for $p in $db/Product/tuple[city/text()=$c][manufacturer/text()=$m]

                   return <product> $p/name/text()  </product>

                }

             </manufacturer>

         }

      </city>

</sales-by-city>

 

   a. [5points] write the DTD of the XML view.


   b. [15points] translate the following XPath queries into SQL.    
     Your SQL queries have to return the same number of occurrences,

      i.e. if "Gizmo Plus" occurs three times in the result of the

      second XPath expression, then it must also occur three times in

      the SQL's answer.  The order in your answer is irrelevant.



        (i)  [5points]  /sales-by-city/city 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

        (ii) [5points]  /sales-by-city/city[name/text()="Seattle"]//product/text()
          (iii)  [5points]  /sales-by-city/city/manufacturer/name/text()