CSE 544

University of Washington

Spring 2000

Homework 3

Due June 2 (Last day of class)

For this homework please choose 4 of the 5 lettered (A - E) problems. If you do more than 4, you may receive extra credit.

Optimization

A. Query Rewriting [25 pts]
The aim of this question is to explore the possibilities available via algebraic rewritting.

Transform the following nested queries into semantically equivalent queries without subqueries, or argue that there is no such transformation.

  1. Assume that S.d is a key for S. (5 pts)
      SELECT R.c
      FROM R
      WHERE R.c = ( SELECT S.c
                    FROM S
                    WHERE S.d = 42 )
    
  2. (5 pts)
      SELECT ssn
      FROM   employee
      WHERE  ssn = ( SELECT MAX(essn)
                     FROM works_on )
    
  3. (6 pts)
      SELECT fname
      FROM   employee
      WHERE  ssn = ( SELECT MAX(essn)
                     FROM works_on )
    
  4. (9 pts) The query Q1 over the view V1 can be rewritten as Q2 if certain functional dependencies hold and/or certain integrity constraints are satisfied. Describe the conditions which allow this rewriting.

    Q1 is a query returning the name, max selling price, and max per day sales for all products sold in the northwest.

      SELECT   NorthWestProductSales.productName, NorthWestProductSales.price, 
                MAX(Inventory.perDaySales)
      FROM     NorthWestProductSales, Inventory
      WHERE    NorthWestProductSales.productName = Inventory.productName and
               NorthWestProductSales.price = Inventory.price
      GROUP BY NorthWestProductSales.productName, NorthWestProductSales.price
    

    V1 is a view of all products sold in the northwest, with their highest selling price.

      CREATE VIEW NorthWestProductSales (productName, price) AS
        SELECT   ProductSales.name, MAX(ProductSales.price)
        FROM     ProductSales, NorthWesternStates
        WHERE    ProductSales.zipcode = NorthWesternStates.zipcode
        GROUP BY ProductSales.name
    

    Q2 is another query returning the name, max selling price, and max per day sales for all products sold in the northwest.

      SELECT   ProductSales.name, MAX(ProductSales.price),
                MAX(Inventory.perDaySales)
      FROM     ProductSales, Inventory
      WHERE    ProductSales.price = Inventory.price and
               ProductSales.name = Inventory.productName
      GROUP BY ProductSales.name
    

B. Plan generation [25 pts]

This question will familiarize you with the workings of a commericial database optimizer. You should be familiar enough with the operation of a typical query optimizer to be able to pose a query which forces the optimizer to choose a particular plan. You will need to use SQL Server. You can view the query plan in SQL Server by typing CTRL-L in the Query Analyzer.

Use the same database and relation instances you used for HW 1. Follow this link if you need to re-create this database. Do not add any rows or indices to this database -- create it precisely as we have given it to you. Answer the question below for each of the following join algorithms:

  1. Index Nested Loops Join
  2. Sort Merge Join

Write an SQL query whose execution plan contains the algorithm. The query must consist of exactly one query block ("select...from...where... group by...order by...", where the "where", "group by", and "order by" clauses are optional). The query need not return any rows.

Theory

C. All of the following three theory related questions. [25 pts]
Query equivalence (6 pts)
Give two queries which are equivalent under set semantics, but not bag semantics. You can use any schema you desire, including this one.
Query Containment (7 pts)
Prove that if a conjunctive query is contained in the union of conjunctive queries, then it is contained in one of the union members.
Datalog (12 pts)
A datalog query is said to be satisfiable if there is some database D, such that the result of the query over D is not empty. Show the following:
  1. It is decidable (show how) that a datalog query with no arithmetic comparison operators or predicates is satisfiable. (6 pts)
  2. Show that determining whether a datalog query with stratified negation is satisfiable is undecidable. Recall that equivalence and containment of datalog programs is undecidable. (6 pts)

Data Integration

D. The following question on data integration. [25 pts]

YourBookstore.com has data from several sources. They have a relational source with basic book information. They have an XML source which has some additional information they buy from Amazon.com. And finally they have a source of reviews.

The relational source is a typical relational database, with primary keys and foreign keys as specified. On the other hand, Amazon.com is picky about how they will give out data. They do not want YourBookstore.com to be able to recreate their database completely, so in order to get any additional information the ISBN number for the particular book must be specified. The source of reviews allows fulltext keyword searches, with the ability to specify a search only on the book title, author and/or ratings fields.

  1. Design a mediated schema including relevant information from each of the three sources. This schema should allow users to find all the information about books, including reviews. (3 pts)
  2. Give source descriptions for each data source. (4 pts)
  3. Explain how a system of your design would answer the following queries.
    1. I really like pictures of animals. I want to find a book with pictures of animals in it. But not just any pictures. I want pictures that are good enough that a reviewer would mention them in their review. Find all 'picture' books whose description includes 'animals' and whose review includes the phrase 'excellent pictures'. (6 pts)
    2. Those were such great books! I think I would like to raise animals. I am looking for a book about animal husbandry which has reasonably good reviews and a lot of people have bought. Find all books whose description or title includes 'animal husbandry', whose reviews are better than 3, and which have a marketshare of more than 20%. (6 pts)
    3. Now that I know how to care for animals, I want to know how to butcher them (I didn't know they would reproduce so fast!). I have heard from friends that there is a particularly good book on butchering written in 1923 or 1924 by an englishman. It is really thick, probably over 700 pages and covers all the important butchering techniques as well as some advanced topics in the slaughter of such popular animals as cows and goats. I am not sure it is still in print however; the demand for such seminal texts on butchering has really gone down in recent years. Find a book that is in stock, over 700 pages long, and in whose description 'butchering techniques' and 'cows' are mentioned.(6 pts)

Transaction Processing

E. The following question on transaction processing. [25 pts]

The question of updating a record R in a recoverable manner was addressed in class. The following order of operations was declared to be the only one which works.

  1. FETCH page P
  2. PIN page P
  3. WRITE LOCK record R (*)
  4. LATCH page P
  5. UPDATE page P (*)
  6. LOG update of page P (*)
  7. UNLATCH page P (*)
  8. UNPIN page P

For each operation marked with (*) in the list above, argue whether it must come after the operation which preceeds it in order to achieve the goal of a recoverable update to record R. Be sure to state your assumptions clearly. (6 pts each + 1 pt for good measure)