Objectives:

  • To understand how SQL queries are translated into the relational algebra.
  • To master writing relational queries in a logic formalism, datalog.

Assignment tools: None

Due date: Tuesday, July 25 by 11:00pm.
Turn in your solution using the assignment drop box linked from the main course web page.

What to turn in:
One file, hw4-answers.{txt|rtf|pdf|doc|docx} (any one of those formats)
with each question answered in (a) RC, (b) Datalog, and (c) RA

Setup

In this homework, we will work with the following relational schema:

  • Employee (eid, name, office) — each row gives the unique ID of the employee (eid) along with their name and address.
  • Manager (eid, mid) — each row says that the employee with ID mid is a manager of the employee with ID eid.

Both the eid and mid values in Manager should hold the IDs of employees that exist. That is, both columns eid and mid in Manager are foreign key references to Employee(eid).

Also note that each employee can have multiple managers, and each manager can manage multiple employees. This is why Manager has no primary key.

Problems

For each query described below, show how to find the answer using relational calculus, and datalog. In other words, you should return three answers for each problem: (a) a relational calculus expression; (b) a query in datalog+negation (c) a relational algebra plan.

Assume that your datalog programs are using set semantics. However, for relational algebra, we will use bag semantics.

For (a) and (b), turn in your solutions by writing out the RC expression and datalog query, respectively. For (c), you can either draw out the RA plan as a tree as shown in lecture, or write them out as an expression (e.g., σSalary > 40000(Employee)). Recall that RA expressions are evaluated from left to right, and innermost expression first. Do not assume any precedence rules; write out all parentheses to denote precedence). In all cases, please make sure your logic and RA symbols are legible if you decide to draw them by hand and scan them in.

  1. [3 points] Retrieve all employees that have two or more managers. Output their IDs and names.
  2. [3 points] Retrieve all employees that have no managers. (For example, the CEO of the company has no manager.) Output their IDs and names.
  3. [3 points] Retrieve the offices of all managers of all employees named 'Alice'. (Note that there could be multiple employees named 'Alice', and each such employee could have multiple managers.)
  4. [3 points] Find all the managers with the property that every employee they manage is located in same office. Output the manager's ID and name and the office where all their employees are located.
  5. [3 points] A manager is an employee who manages at least one other employee. A second-level manager is a manager who manages only managers. Write a query to return all second-level managers; your query should return their eids and their names.

Optional: If you want to double check your answers, you can try executing them in DLV. (You will need to make up some facts about Employees and Managers.)

Extra Credit:

  • [1 point] Write a datalog query that finds all employees that report directly or indirectly to 'Alice'. That is, your query should return the employees managed by Alice, the employees managed by someone managed by Alice, and so on.