================================================================ CSE 344 -- Spring 2011 Lecture 10: Finishing: Relational Calculus. XML ================================================================ Reading assignment: 5.3, 5.4 Schema: /***************************/ /* Likes(drinker, beer) */ /* Frequents(drinker, bar) */ /* Serves(bar, beer) */ /***************************/ Recall datalog rules: "Find all bars that serve some beer that Fred likes" A(x) :- Likes('Fred',y), Serves(x,y) Recap: why do we use datalog ? -- simple, logic language, based on rules -- no quantifiers, no "or" -- can be extended to recursion BUT we don't discuss this in class -- can be translated easily to SQL (*** practice this at home !) -- today: it is used as intellectual foundation for large scale data analytics. What is datalog missing ? -- Universal quantifiers ! ================================================================ Relational Calculus = Predicate Calculus AND, OR, NOT Forall, Exists --------- "Find all bars that serve some beer that Fred likes" A(x) = Exists y. Likes("Fred", y) AND Serves(x,y) --------- "Find all bars that serve all beers that Fred likes" A(x) = Forall y. Likes("Fred", y) => Serves(x,y) Note: P ==> Q is the same as (not P) OR Q A(x) = Forall y. not(Likes("Fred", y)) OR Serves(x,y) --------- *** In class: Average: "Find drinkers that frequent some bar that serves some beer they like" Prudents: "Find drinkers that frequent only bars that serve some beer they like" Paranoic: "Find drinkers that frequent only bars that server only beer theylike" *** At home: Sad: "Find drinkers that frequent some bar that serve only beer they don't like" Pessimists: "Find drinkers that frequent only bars that serve some beer they dont like" Depressed: "Find drinkers that frequent only bars that serve only beer they don't like" ================================================================ What's wrong ? A(x) = not Likes("Fred", x) A(x) = forall y. Serves(x,y) A(x,y) = Likes("Fred", x) OR Serves("Bar", y) ---------------- These are "domain dependent" queries. They depend on the domain, and we don't accept such queries. ================================================================ TRASLATIONS. THIS IS IMPORTANT ! 1. From Relational Calculus to non-recursive datalog with negation: Guideline 1: If the query is A = Q1 or Q2 then: A(...) :- translate_Q1 A(...) :- translate_Q2 Guideline 2: If the query is A = Q1 and Q2 then: A(...) :- translate_Q1, translate_Q2 Guideline 3: If the query is A = exists x.Q then: B(x, ...) :- translate_Q A(...) :- B(x, ...) (just remove x from the head vars) Guideline 4: If the query is A = forall x.(Q1 ==> Q2) then B(...) :- translate_Q1, not (translate_Q2) A(...) :- translate_Q1, not B(...) Note: these are *guidelines* only ! Use judgment ------------- 2. From Non-recursive datalog to SQL Single guideline: every "not" becomes a subquery. -------------- Example: "Find all bars that serve all beers that Fred likes" Relational calculus: A(x) = Forall y. Likes("Fred", y) => Serves(x,y) Note: this query is actually domain dependent ! If Fred likes nothing, then EVERY x in the universe is a good answer. Fix it to: A(x) = Serves(x,-) and [Forall y. Likes("Fred", y) => Serves(x,y)] Datalog notation: B(x) = Serves(x, -) and Likes("Fred", y) and not Serves(x,y) A(x) = Serves(x, -) and not B(x) SQL: select s1.bar from Serves s1 where s1.bar not in (select s2.bar from Serves s2, Likes l where l.drinker = 'Fred' and not exists (select * from Serves s3 where s2.bar = s3.bar and l.beer = s3.beer) ------------- Question: can you write the query above WITHOUT subqueries Answer: discuss "monontone queries" in class and their implication to query design