Feb 5, 2001
3:35pm – 4:50 pm
Name____________________________________________________
Total number of points:
75
Total time: 1hr 15 min
Reminders
a)
Do
not forget to write your name
b)
Closed
book, Closed notes
c)
Write
legibly. If we cannot read your answer, you get no credit
d)
Do
not get stuck at any one question
e)
Length
of an answer is determined by the question. Do not feel compelled to fill all
the available space J
f)
Correctness
in schema design and SQL is of primary importance, but simplicity and good
taste matter too.
Score
Q# Max. Point Your Point
1. 10
2. 10
3. 20
4. 15
5.
20
TOTAL 75
1. [10pt ] Multiple Choice [1pt
each] Circle the correct answer
(Yes/No) against each statement
a. Using Call Level Interface (such as ODBC) requires a DBMS vendor-specific precompiler. NO
b. ODBC Drivers are DBMS vendor-specific. YES
c. Stored Procedures are executed in the client NO
d. All views are physically materialized (stored) as tables NO
e. Strict Relational Algebra can not express all SQL queries YES
f. Referential integrity is a constraint that holds between columns of the same table NO
g. Size of a left outerjoin between R and S is always at least as large as that of (R Join S), assuming the join is on the same set of columns YES
h. Size of a left outerjoin between R and S is always at least as large as that of R (as well as S). : This was a confusing question. Left outerjoin between R and S is at least as large as that of R, but no conclusions may be drawn on whether it is larger or smaller than S. You would have been awarded a point whether you answered Yes or No.
i. Size of a join is at least as large as its inputs, i.e., size-of(S Join T) is greater of equal to Max(size-of(S), size-of(T)) NO
j. Result of (select * from T) cannot include any duplicates NO
2. [10pt] Short Answer Question. Answer with justification
(a) [5pt] Consider the following two queries. Q1: pa (S) Ç pa (T) and Q2: pa (S ÇT). Are they equivalent? If not, is the answer to one of them always larger in number of tuples compared to the other one? Justify.
They are not equivalent. Answer to Q1 is at least as large as that of Q2. This is because S intersect T and then project is potentially smaller as S and T need to agree on all attributes. If the projection happens first, they only need to agree on the projected attribute.
(b) [5pt] Consider the following two queries:
Q1: Select X From R R1 Where
NOT EXISTS (Select * from R where X > R1.X)
Q2: Select Max(X) from R
Are they equivalent? If not, is the answer to one of them always larger in number of tuples compared to the other one? Justify.
These two queries are not equivalent. Answer to Q1 is at least as large as that of Q2. This is because Q1 returns all tuples from R whose X attribute value is maximal (in case that there is more than one). Q2 always returns only a single tuple.
3. [20pts] Entity Relationship Model and Decomposition
Every course is held one or more times a week at different start and end times. However, the class is held in the same room every week and is taught by the same teacher for every lecture. Every teacher has attributes facultyid (unique), name and rank. Each student has attributes studentid (unique), name, and department. Every course has attributes courseid (unique), and title. Furthermore, a subset of the courses is designated as required courses. Every student may be enrolled in at most 3 courses but must enroll in at least one course. Thus, every student must enroll in at least one required course but optionally in two other courses. Note that a student can enroll in more than one required courses (up to the limit of 3).
(a) [12pt] Draw an E-R Diagram that captures the above constraints. Justify why your E-R diagram is accurate. Identify constraints that the E-R diagram is unable to capture.
We are unable to express that each student has to have at least one of the required courses, as well as the fact that the student may take no more than three classes total.
(b) [8pt] Map your E-R Diagram into a relational schema with brief justification. Make sure to identify keys.
Course(courseid,
title,room)
Student(studentid,
name,department)
Teacher(facultyid,name,
rank)
TaughtBy(facultyid,courseid)
Meets(courseid,day,
starttime,endtime)
Has(studentid
,courseid)
ReqHas(studentid,courseid)
Day(day)
4. [15 points] Functional Dependency and Decomposition
Consider the relation R(A,B,C,D,E) with the set of functional dependency as {A->B, B->C, C->A, D->E, E->D}
a. [5pt] What are all the keys for R? Justify briefly.
A+
->{A,B,C}
D+->{D,E}
Keys:
{A,E}, {A,D}, {B,D}, {B,E}, {C,D}, {C,E}
These
determine all other attributes of the relation
b. [5pt] How many superkeys are present? Justify briefly.
{A,D},
{A,E}, {B,D}, {B,E}, {C,D}, {C,E}
{A,
B,D}, {A,C,D},{A,B,E}, {A,C,E},{A,D,E}
{B,C,D},
{B,C,E}, {B,D,E}, {C,D,E}
{A,B,D,E},
{A,B,C,D}, {A,B,C,E}, {A,C,D,E},{B,C,D,E}
{A,B,C,D,E}
The
faster way to answer this question is:
Each
superkey must include one of {A,B,C} and one of {D,E}.
(23-1)*(22-1)=21
c. [5pt] Do a decomposition in BCNF with brief justification.
A violation is A->B,C so we split as ABC and ADE. D->E is a violation in ADE , so decompose to AD and DE.
Final decomposition: {A,B,C}, {A,D}, {A,E}
5. [20pt] SQL and Relational Algebra
Consider the following schema that depicts calls made to customers by agents to survey the degree of satisfaction in using products. Every agent calls a customer only once to enquire about a specific product. But, the agent may not enquire about all the products with every customer and all customers may not be surveyed.
Agent(Agentid, AgentName)
Product(Productid, ProductName)
Customer(Customerid, Customername)
Log(Agentid, Customerid, Productid, CallDate, CustomerSatisfaction)
/*CustomerSatisfaction is a number between 1 and 10 if the customer has bought the product. Otherwise it is NULL*/.
Write the following SQL queries over this schema:
(a) [3pt] Write CREATE TABLE statement for Log that identifies the appropriate keys and foreign keys. Assume that CustomerID, Productid are of characters of length 10 and CallDate is of type Date. CustomerSatisfaction has been explained above.
CREATE
table Log(Agentid CHAR(10) NOT NULL,
Customerid
CHAR(10) NOT NULL,
Productid CHAR (10) NOT NULL,
CallDate
DATE,
CustomerSatisfaction INTEGER DEFAULT
NULL check
(CustomerSatisfaction>0
and CustomerSatisfaction<11)
Primarykey(Agentid,Customerid,Productid),
Foreign key
(Agentid) references Agent,
Foreign key
(Productid) references Product,
Foreign key (Customerid)
references Customer)
(b) [3pt] Write in Relational Algebra (you can use intermediate relation variables, if needed): List of customers agents spoken to by agents on Jan 20 and who did not buy any products brought up on during their call with the agents (as of Jan 20)
pCustomerName(
(sCallDate=’01.20.01’(Log) -sCallDate=’01.20.01’^CutomerSatisfaction>=1(Log))
JOIN
Customerid Customer)
(c) [3pt] Write in SQL: The ProductName which the largest number of surveyed customers had bought
Create
View ProductCount AS
select ProductName AS Pname, Count(*) AS PCount
from
Product P, Log L
where
P.Productid=L.Productid and CustomerSatisfaction>=1
group
by P.Productid
Select
Pname
From
ProductCount PC1
Where
PC1.Pcount = (Select Max(PC2.Pcount)
From ProductCount PC2)
(d) [3pt] Write in SQL: The productname with the largest average satisfaction level among surveyed customers (who had bought that product).
Create
View ProductSatisf AS
select ProductName AS Pname,
Avg(CustomerSatisfaction) AS PSatisf
from
Product P, Log L
where
P.Productid=L.Productid and CustomerSatisfaction>=1
group
by P.Productid
Select
Pname
From
ProductSatisf PS1
Where
PS1.PSatisf = (Select Max(PS2.PSatisf)
From ProductSatisf PS2)
(e) [3pt] Write in SQL: The agent who, between Jan 1, 2001 and Feb 1, 2001, surveyed more customers than the average number of customers surveyed by all agents during the same period. Do not use nested subqueries for answering, but you are allowed use of views.
Create
view AverageNumCust as
select count(Customerid)/count(distinct Agentid) as
avNumCustomers
from Agent A, Log L
where
A.Agentid=L.Agentid and L.CallDate BETWEEN cast (’02.01.2001’ as Date)
and cast (’01.01.2001’ as Date)
Select A.AgentName
From Agents A, Log L, AverageNumCust
AV
Where L.Agentid=A.Agentid and
L.CallDate BETWEEN
cast (’02.01.2001’ as Date) and cast
(’01.01.2001’ as Date)
Group by L.Agentid
Having count
(*)>AV.avNumCustomers
(f) [5pt] Write in SQL: Products that have been bought by at least 10% of all customers (who were asked about that product) and for which average level of satisfaction by a customer is at least 5.
Create
view ProductStats as
Select Productid,count(Customerid)
as numCust
From Log
Where CustomerSatisfaction>=1
Group by Productid
Having AVG(CustomerSatisfaction)>
4
Select
distinct ProductName
From
Product P, ProductStats S
Where
P.Productid=S.Productid and S.numCust>=
0.1*(select
count(distinct Customerid)
From Log L
Where L.Productid=S.Productid)