CSE 444 Homework 6

Objectives:: To be able to understand constraints, triggers, indexes, query optimization and execution
Reading Assignments:: Book 1: 6.1-6.4. Book 2: Chapter 4, 6.1-6.10 (but not 6.5.3-4, 6.6.4)
Number of points:: 100
Due date:: December 3

Remember: this assignment can only help your grade. We will be dropping the lowest homework grade (other than hw5). This assignment is for you so that you can know what we think is important for the final.

[20 points] Constraints:
Consider the relation:
```
Flight(source, destination, dep-time, arrival-time)
```
1. Write an SQL constraint that guarantees that for every tuple in Flight, the departure time is always less than the arrival time (you can assume that times can be compared by the < predicate, and that you can ignore international timelines and date changes).
2. Suppose we also have the relation
```
  FlightTime(source, destination, time)
```
  that for every pair of cities stores the net flight time between the two cities (the relation will have a tuple for every ordered pair of cities).
  Write an SQL constraint that guarantees that the Flight table will obey the times in the FlightTime table: i.e, the arrival time should be at least the departure time plus the time it takes to fly between the cities.
3. Can you write part B with a single tuple constraint? Why or why not?
[16 points] Triggers:
Consider the relation R(A,B). Write a set of SQL triggers that enforce the functional dependency A->B. Note that you need to consider inserations and updates to the relation R.
[10 points] Extra Credit!!! More triggers
Consider the following trigger on the relation Purchase. Assume that the attribute purchaseID is a key of the relation (i.e., uniquely determines all the other attributes in the tuple).
```
CREATE TRIGGER homeworkTrigger
BEFORE UPDATE OF Price on Purchase
REFERENCING OLD as OldTuple
            NEW as NewTuple

WHEN  (OldTuple.price > NewTuple.price AND NewTuple.price > 1)

      UPDATE Purchase
      SET price = NewTuple.price/2
      WHERE purchaseID = NewTuple.purchaseID

FOR EACH ROW.
```
1. Suppose we issue an update to the Purchase table that changes the tuple (111, 4) to (111,3) (where the first column is the purchaseID and the second is the price).
  Describe what will happen. What sequence of trigger firings will occur on the database and what will be the end result?
2. Repeat part A, except where BEFORE is relaced by AFTER.
3. Repeat part A, except where BEFORE is relaced by INSTEADOF.
[20 points] Indexing:
Consider a typical table for storing web-log data:
```
Logs(date, time, source-url, destination-url, ip-address)
```
Suppose you are implementing an application as you did in Homework 5. What organization would you choose for the file storing the Logs relation, and what additional indexes would you maintain on the relation.
Note: there are very few wrong answers to this question. The key is to write down the assumptions you're making about the application and other characteristics of the data that justify your choice.
[20 points] Execution and Optimization
Consider the following schema:
```
Subscriber (ssn, name, city-name)
City (city-name, route-code, region-id)
Region (region-id, name, delivery-code, report)
```
Assume that each Subscriber record is 30 bytes long, each City record is 50 bytes long, and each Region record is 1000 bytes long on average. There are 15,000 tuples in Subscriber, 5000 tuples in City, and 1500 tuples in Region. Page size is 2000 bytes, and there are 20 pages available in the buffer pool. You can assume uniform distribution of values whenever necessary.
Consider the query:
```
SELECT *
FROM Subscriber S, City C, Region R
WHERE S.city-name = C.city-name AND C.region-id = R.region-id
```
Suppose that there is a non-clustered b-tree index on city-name of Subscriber. Also, note that city-name is NOT a key of City. List all the plans that system-R would consider and identify the plan with the lowest cost.
Note:
- You need to consider the following join algorithms: block nested-loop join, hash join, and sort-merge join.
- The cost metric of plans is the number of I/O operations. Ignore the cost of writing out the final result.
- Remember to make explicit any other assumptions you are making.
[20 points] Optimization
In SQL Server, when you write a query you can also view the execution plan rather than executing the query. In this question you will create queries in such a way that you'll force SQL Server into using a particular join algorithm.
To view the execution plan rather than executing the query, hit "Control + L", or press the button to the immediate right of the "stop" button for executing queries. If you squint just right, it looks like a tree, and if you have the mouse hover over it, the caption "Display Estimated Execution Plan (Control+L)" will come up.
After you see the plan, you can hover over any of the operators and find out what the estimated I/O costs are, what the estimated CPU costs are, etc.
To figure out the total estimated I/O cost, etc, add together the costs for all of the operators. Don't forget to include the Table Scans, etc. If there's none given for a select, don't worry about it.
Using this fact, and any database (with data in it) that you like (the ones you made for either homework 3 or homework 5 should be fine, and yes, you may use other people's tables for this), answer the following questions:
1. Find a query that the optimizer will pick a hash join for.
  1. What was the query?
  2. What was the total estimated I/O cost?
  3. What was the total estimated CPU cost?
  4. By how much do they differ, and what do you think would happen if you used bigger or smaller tables? (comparing numbers with someone else is fine, just write down who) Why?
2. Find a query that the optimizer will pick a nested-loop join for.
  1. What was the query?
  2. What was the total estimated I/O cost?
  3. What was the total estimated CPU cost?
  4. By how much do they differ, and what do you think would happen if you used bigger or smaller tables? (comparing numbers with someone else is fine, just write down who) Why?
3. Why did the optimizer pick a hash join for A and a nested loop join for B?
[4 points]
1. How long did it take you to complete this assignment?
2. What did you like the best about this assignment?
3. What did you like the least about this assignment?
4. What helped you learn the best in this assignment?