CSE 444 Mid Term

Feb 5, 2001

3:35pm 4:50 pm

 

Name____________________________________________________

 

Total number of points: 75

Total time: 1hr 15 min

Reminders

a)    Do not forget to write your name

b)    Closed book, Closed notes

c)    Write legibly. If we cannot read your answer, you get no credit

d)    Do not get stuck at any one question

e)    Length of an answer is determined by the question. Do not feel compelled to fill all the available space J

f)      Correctness in schema design and SQL is of primary importance, but simplicity and good taste matter too.

Score

Q# Max. Point Your Point

1. 10

2. 10

3. 20

4. 15

5.                                          20

TOTAL 75

1. [10pt ] Multiple Choice [1pt each] Circle the correct answer (Yes/No) against each statement

a.      Using Call Level Interface (such as ODBC) requires a DBMS vendor-specific precompiler. NO

b.      ODBC Drivers are DBMS vendor-specific. YES

c.      Stored Procedures are executed in the client NO

d.      All views are physically materialized (stored) as tables NO

e.      Strict Relational Algebra can not express all SQL queries YES

f.        Referential integrity is a constraint that holds between columns of the same table NO

g.      Size of a left outerjoin between R and S is always at least as large as that of (R Join S), assuming the join is on the same set of columns YES

h.      Size of a left outerjoin between R and S is always at least as large as that of R (as well as S). : This was a confusing question. Left outerjoin between R and S is at least as large as that of R, but no conclusions may be drawn on whether it is larger or smaller than S. You would have been awarded a point whether you answered Yes or No.

i.         Size of a join is at least as large as its inputs, i.e., size-of(S Join T) is greater of equal to Max(size-of(S), size-of(T)) NO

j.         Result of (select * from T) cannot include any duplicates NO

2. [10pt] Short Answer Question. Answer with justification

(a)   [5pt] Consider the following two queries. Q1: pa (S) pa (T) and Q2: pa (S T). Are they equivalent? If not, is the answer to one of them always larger in number of tuples compared to the other one? Justify.

They are not equivalent. Answer to Q1 is at least as large as that of Q2. This is because S intersect T and then project is potentially smaller as S and T need to agree on all attributes. If the projection happens first, they only need to agree on the projected attribute.

(b)   [5pt] Consider the following two queries:

Q1: Select X From R R1 Where

NOT EXISTS (Select * from R where X > R1.X)

Q2: Select Max(X) from R

Are they equivalent? If not, is the answer to one of them always larger in number of tuples compared to the other one? Justify.

These two queries are not equivalent. Answer to Q1 is at least as large as that of Q2. This is because Q1 returns all tuples from R whose X attribute value is maximal (in case that there is more than one). Q2 always returns only a single tuple.

3. [20pts] Entity Relationship Model and Decomposition

Every course is held one or more times a week at different start and end times. However, the class is held in the same room every week and is taught by the same teacher for every lecture. Every teacher has attributes facultyid (unique), name and rank. Each student has attributes studentid (unique), name, and department. Every course has attributes courseid (unique), and title. Furthermore, a subset of the courses is designated as required courses. Every student may be enrolled in at most 3 courses but must enroll in at least one course. Thus, every student must enroll in at least one required course but optionally in two other courses. Note that a student can enroll in more than one required courses (up to the limit of 3).

(a)   [12pt] Draw an E-R Diagram that captures the above constraints. Justify why your E-R diagram is accurate. Identify constraints that the E-R diagram is unable to capture.

 

We are unable to express that each student has to have at least one of the required courses, as well as the fact that the student may take no more than three classes total.

(b)   [8pt] Map your E-R Diagram into a relational schema with brief justification. Make sure to identify keys.

Course(courseid, title,room)

Student(studentid, name,department)

Teacher(facultyid,name, rank)

TaughtBy(facultyid,courseid)

Meets(courseid,day, starttime,endtime)

Has(studentid ,courseid)

ReqHas(studentid,courseid)

Day(day)

4. [15 points] Functional Dependency and Decomposition

Consider the relation R(A,B,C,D,E) with the set of functional dependency as {A->B, B->C, C->A, D->E, E->D}

a.       [5pt] What are all the keys for R? Justify briefly.

A+ ->{A,B,C}

D+->{D,E}

Keys: {A,E}, {A,D}, {B,D}, {B,E}, {C,D}, {C,E}

These determine all other attributes of the relation

b.       [5pt] How many superkeys are present? Justify briefly.

{A,D}, {A,E}, {B,D}, {B,E}, {C,D}, {C,E}

{A, B,D}, {A,C,D},{A,B,E}, {A,C,E},{A,D,E}

{B,C,D}, {B,C,E}, {B,D,E}, {C,D,E}

{A,B,D,E}, {A,B,C,D}, {A,B,C,E}, {A,C,D,E},{B,C,D,E}

{A,B,C,D,E}

The faster way to answer this question is:

Each superkey must include one of {A,B,C} and one of {D,E}.

(23-1)*(22-1)=21

c.       [5pt] Do a decomposition in BCNF with brief justification.

A violation is A->B,C so we split as ABC and ADE. D->E is a violation in ADE , so decompose to AD and DE.

Final decomposition: {A,B,C}, {A,D}, {A,E}

5. [20pt] SQL and Relational Algebra

Consider the following schema that depicts calls made to customers by agents to survey the degree of satisfaction in using products. Every agent calls a customer only once to enquire about a specific product. But, the agent may not enquire about all the products with every customer and all customers may not be surveyed.

Agent(Agentid, AgentName)

Product(Productid, ProductName)

Customer(Customerid, Customername)

Log(Agentid, Customerid, Productid, CallDate, CustomerSatisfaction)

/*CustomerSatisfaction is a number between 1 and 10 if the customer has bought the product. Otherwise it is NULL*/.

Write the following SQL queries over this schema:

(a)   [3pt] Write CREATE TABLE statement for Log that identifies the appropriate keys and foreign keys. Assume that CustomerID, Productid are of characters of length 10 and CallDate is of type Date. CustomerSatisfaction has been explained above.

CREATE table Log(Agentid CHAR(10) NOT NULL,

Customerid CHAR(10) NOT NULL,

Productid CHAR (10) NOT NULL,

CallDate DATE,

CustomerSatisfaction INTEGER DEFAULT NULL check

(CustomerSatisfaction>0 and CustomerSatisfaction<11)

Primarykey(Agentid,Customerid,Productid),

Foreign key (Agentid) references Agent,

Foreign key (Productid) references Product,

Foreign key (Customerid) references Customer)

(b)   [3pt] Write in Relational Algebra (you can use intermediate relation variables, if needed): List of customers agents spoken to by agents on Jan 20 and who did not buy any products brought up on during their call with the agents (as of Jan 20)

pCustomerName(

(sCallDate=01.20.01(Log) -sCallDate=01.20.01^CutomerSatisfaction>=1(Log))

JOIN Customerid Customer)

(c)   [3pt] Write in SQL: The ProductName which the largest number of surveyed customers had bought

Create View ProductCount AS

select ProductName AS Pname, Count(*) AS PCount

from Product P, Log L

where P.Productid=L.Productid and CustomerSatisfaction>=1

group by P.Productid

 

Select Pname

From ProductCount PC1

Where PC1.Pcount = (Select Max(PC2.Pcount) From ProductCount PC2)

(d)   [3pt] Write in SQL: The productname with the largest average satisfaction level among surveyed customers (who had bought that product).

Create View ProductSatisf AS

select ProductName AS Pname, Avg(CustomerSatisfaction) AS PSatisf

from Product P, Log L

where P.Productid=L.Productid and CustomerSatisfaction>=1

group by P.Productid

 

Select Pname

From ProductSatisf PS1

Where PS1.PSatisf = (Select Max(PS2.PSatisf) From ProductSatisf PS2)

 

(e)   [3pt] Write in SQL: The agent who, between Jan 1, 2001 and Feb 1, 2001, surveyed more customers than the average number of customers surveyed by all agents during the same period. Do not use nested subqueries for answering, but you are allowed use of views.

Create view AverageNumCust as

select count(Customerid)/count(distinct Agentid) as avNumCustomers

from Agent A, Log L

where A.Agentid=L.Agentid and L.CallDate BETWEEN cast (02.01.2001 as Date)

and cast (01.01.2001 as Date)

 

Select A.AgentName

From Agents A, Log L, AverageNumCust AV

Where L.Agentid=A.Agentid and L.CallDate BETWEEN

cast (02.01.2001 as Date) and cast (01.01.2001 as Date)

Group by L.Agentid

Having count (*)>AV.avNumCustomers

 

(f)     [5pt] Write in SQL: Products that have been bought by at least 10% of all customers (who were asked about that product) and for which average level of satisfaction by a customer is at least 5.

Create view ProductStats as

Select Productid,count(Customerid) as numCust

From Log

Where CustomerSatisfaction>=1

Group by Productid

Having AVG(CustomerSatisfaction)> 4

 

Select distinct ProductName

From Product P, ProductStats S

Where P.Productid=S.Productid and S.numCust>=

0.1*(select count(distinct Customerid)

From Log L

Where L.Productid=S.Productid)