Assignment 3: XML and Database Theory
Due Date: February 15
Total number of points: 100

 

  1. [20 points] Consider the following relational data:

    Products:
    pid  Name  Price  Description
    323  gizmo  22.99 great
    233 gizmo plus 99.99 more features
    312 gadget 59.99 good value


    Stores:

    sid Name Phone
    s282 Wiz 555-1234
    s521 Econo-Wiz 555-6543



    Sells:

    pid Markup sid
    323 10% s521
    233 25% s282
    233 15% s521


    a. [2 points] We want to export this data into an XML file. Write a DTD describing the following structure for the XML file:
        - there is one root element called products
        - the products element contains a sequence of product  sub elements, one for each product in the database
        - each product element contains one name, one price, and one description subelement, and a sequence of store subelements, one for each store that sells that product:
        - each store element contains one name, one phone, and one markup .
    You have to turn in a DTD.

    b. [2 points] Write the XML document obtained by exporting the database above; you have to turn in an XML document.

    c. [5 points] Assuming that you have XML documents with the structure given in a., write an XML-QL query that returns the names and prices of all products that are sold at least at one store with a markup of 25%.   You have to turn in an XML-QL  query.

    d. [3 points] Write the same query in SQL over the original relational database schema.  You have to turn in a SQL query.

    e. [8 points] Assume the same database is represented in an XML document whose structure follows the relational tables:

       <db> <products> <row> <pid>323 </pid> <name> gizmo </name> <price> 22.99 </price> <description> great </description> <row>

                                   <row> ... <row>

                                   <row> ... <row>

                <products>

                <stores> <row> ... <row> ... </stores>

                <sells>   <row> ... </row> ... </sells>

       </db>

    Write an XML-QL query that, when given an input with this structure, constructs an XML document with the structure described in a.  You have to turn in an XML-QL query.

  2. [25 points] Consider the following DTD fragment:

    <!ELEMENT wnyc (piece*)>
    <!ELEMENT piece (time, title, composer, conductor?, orchestra?, soloist*, publisher>

    (elements time, title, composer, conductor, orchestra, soloist, publisher are all PCDATA and are omitted from this DTD; also, each element has an attribute id of type ID, which is omitted too). Consider the XML document below, which is valid w.r.t that DTD:

    <wnyc id="0">
       <piece id="1"> <time id="10"> 12:26 PM </time>
                    <title id="11"> Mad Rush </title>
                    <composer id="12"> Philip Glass </composer>
                    <soloist id="13"> Aleck Karis, piano </soloist>
                    <publisher id="14"> Romeo 7204 </publisher>
       </piece>
       <piece id="2"> <time> id="21" 12:49 PM </time>
                    <title id="22"> Concerto No. 12 in E, Op. 3, RV 265 </title>
                    <composer id="23"> Antonio Vivaldi </composer>
                    <conductor id="24"> Fabio Biondi </conductor>
                    <orchestra id="25"> Europa Galante </orchestra>
                    <publisher id="26"> Virgin Classics 45315 </publisher>
        </piece> 
        <piece id="3"> <time id="31"> 1:01 PM </time>
                      <title id="32"> Symphony No. 1 in e, Op. 39 </title>
                      <composer id="33"> Jean Sibelius </composer>
                      <conductor id="34"> Leonard Bernstein </conductor>
                      <orchestra id="35"> Vienna Philharmonic Orchestra </orchestra>
                      <publisher id="36"> Deutsche Grammophon 435351 </publisher>
        </piece>
        <piece id="4"> <time id="41"> 1:47 PM </time>
                      <title id="42"> Andante for Piano ("Andante favori") </title>
                      <composer id="43"> Ludwig van Beethoven </composer>
                      <soloist id="44"> Alfred Brendel, piano </soloist>
                      <publisher id="45"> Philips 438472 </publisher>
         </piece>
         <piece id="5"> <time id="51"> 1:57 PM </time>
                      <title id="52"> Upon Enchanted Ground </title>
                       <composer id="53"> Alan Hovhaness </composer>
                      <soloist id="55"> Yolanda Kondonassis, harp </soloist>
                      <soloist id="56"> Frank Hendrickx, alto flute </soloist>
                      <soloist id="57"> Herwig Coryn, cello </soloist>
                      <soloist id="58"> Patrick De Smet, tam-tam </soloist>
                      <publisher id="59"> Telarc 80530 </publisher>
          </piece>
    </wnyc>

    This problem is concerned with storing the XML data into relations and translating XML-QL queries into SQL queries. The XML-QL query is:

                        where <wnyc> <piece> <title> $t </> <soloist> Hendrickx </> <publisher> $p 
                                                </piece>
                                  </wnyc> in "file.xml"
                        construct <result> <title> $t </> <publisher> $p </> </>

    a. [9 points] Assume we don't know the DTD and don't know the XML data. We design the following relational schema:
                Ref(src, lable, dst)
                Val(oid, value) 
        (see Data on the Web, page 173).
            - when we store the XML instance above in this database, how many tuples will Ref contain, and how many tuples will Val contain ?
            - translate the XML-QL query above into a SQL query over this schema.

    b. [8 points] Assume we know the DTD above.  We design the following relational schema: 
                Piece(oid, time, title, composer, conductor, orchestra, publisher)
                Soloist(soloist, pieceOid)
        (Notice: fields are allowed be null.)
            - populate the database schema with the XML instance above; how many tuples are there in the two tables ?
            - translate the XML-QL query above into SQL.

    c. [8 points] Assume we have a particular XML instance, and we can analyze its structure before deciding on a relational schema.  E.g. if we have the instance above, then we infer that there are two kinds of pieces: those having a conductor and an orchestra, and those having soloists, and there are at most four soloists.  Then we design the following relational schema:
                Piece1(oid, time, title, composer, conductor, orchestra, publisher)
                Piece2(oid, time, title, composer, soloist1, soloist2, soloist3, soloist4, publisher)
            - populate the database with the XML instance above; how many tuples are there in the two tables ?
            - translate the XML-QL query into SQL

  3. [15 points] Consider the following XPath expressions:
                - q1= /bib/book/title
                - q2= /bib/paper[author[firstname="John"][lastname="Smith"][year="1995"] ]
                - q3= /bib/paper[author[firstname="John"]][author[lastname="Smith"]]
    Assuming we store XML data in the relations Ref and Val in problem 2, Translate these expressions into conjunctive queries over Ref and Val.  Prove formally that expression q2 is contained in q3.
  4. [20 points] Consider a database schema with three relations: R(x,y), S(x), U(x,y) (i.e. R is binary, S is unary, and U is binary).
       a. Write a query qa(x) that computes the active domain of a database with that schema.
    For each of the FO queries below indicate whether they are safe or not. When the query is not safe, give another FO query that, under the active domain semantics, is equivalent to the given query. For example, if they query were {(x,y) | not(R(x,y))}, then you would answer "no" and give the equivalent query  {(x,y) | qa(x) and qa(y) and not(R(x,y))}, where qa(x) is the query computing the active domain.
        b. [4 points] { x |  S(x) and (forall y. (R(x,y) ==> exists z.(S(z) or U(y,z))))  }
        c. [4 points] { x |  exists y.(S(y) ==> forall z. (R(x,y) and U(y,z)))  }
        d. [4 points] { x | S(x) and forall y.(S(y) and R(x,y))  }
        e  [4 points] { x | S(x) and forall y.(U(x,y) or forall z. (not R(y,z)))  }
        f.  [4 points] { (x,y) | exists z. (R(x,z)  or U(z,y))  }
  5. [20 points] Consider the database schema Flight(a,x,y), meaning that airline a offers a direct flight from city x to city y. Write datalog queries to compute the following:
       a. [5 points] The pairs of cities (x,y) s.t. there exists a connection from city x to city y, using any number of plane changes and any combination of airlines.
       b. [5 points] The pairs of cities (x,y) s.t. there exists an airline offering a connection from city x to city y, using any number of plane changes.
       c. [10 points] The pairs of cities (x,y) s.t. there exists two connections from x to y, one using only "United", the other using only "Delta", and the two itineraries have the same number of plane changes.