Student(, FName, LName, Class) Professor(, FName, LName, Office, Salary) Section(, Year, Prof_SSN) Transcript(, Grade) 1. Write XQuery for i. For each instructor, list all the courses he/she teaches in the format of [subject] [course number]: [title]. (e.g. ANTH 211 Introduction to Anthropology ) ANSWER: { for $i in distinct-values(document("reed.xml")/root/course/instructor) return { $i, for $c in distinct-values(document("reed.xml")/root/course[instructor=$i]) return {$c/subj/text(), " ", $c/crse/text(), " ", $c/title/text()} } } ii. Find all the "over crowded provinces" in China, where "over crowded provinces" are defined to be the ones whose population density is greater than that of China itself. ANSWER: { let $china_pop := (document("mondial-3.0.xml")/mondial/country[@name="China"]/@population) let $china_area := (document("mondial-3.0.xml")/mondial/country[@name="China"]/@total_area) for $p in document("mondial-3.0.xml")//country[@name = 'China']/province where ($p/@population div $p/@area) > ($china_pop div $china_area) return {fn:string($p/@name)} } 2. Consider the following relational schema. An employee can work in more than one department; the pct time field of the Works relation shows the percentage of time that a given employee works in a given department. Emp(eid: integer, ename: string, age: integer, salary: real) Works(eid: integer, did: integer, pct time: integer) Dept(did: integer, budget: real, managerid: integer) Write integrity constraints (domain, key, foreign key, or CHECK constraints; or assertions) or triggers to ensure each of the following requirements, considered independently. a. Employees must make a minimum salary of $1000. -- One possible solution: -- ALTER TABLE Emp ADD CONSTRAINT MinSal CHECK (salary >= 1000) b. Every manager must be also be an employee. -- One possible solution: -- ALTER TABLE Dept ADD CONSTRAINT ManIsEmp FOREIGN KEY (managerid) REFERENCES Emp (eid) c. A manager must always have a higher salary than any employee that he or she manages. -- Notes about the problem: -- - Because this involves more than one table, you have the hint that an assertion is needed. -- One possible solution: -- CREATE ASSERTION MangerMakesMore CHECK (NOT EXISTS (SELECT * FROM Emp AS e, Works AS w, Dept AS d WHERE e.eid = w.eid AND w.did = d.did AND e.salary > (SELECT salary FROM Emp AS manager WHERE d.managerid = manager.eid))) d. Whenever an employee is given a raise, the manager¡s salary must be increased to be at least as much. -- Notes about the problem: -- - Here we know we want a trigger because upon a certain event occurring, we want to check a condition and then if the condition is true we want to make a change to the database. - Also I got questions from students about when and when not to use the "FOR EACH ROW CLAUSE". The following modified explanation is based on that given in the Ramakrishnan database textbook (I think it is more clear than the explanation given in our text): A user must specify whether a trigger is to be executed once per modirifed record or once per activating statement. If the action depends on the individual changed records, for example, we have to examine the salary field of the inserted employees to decide whether to increment the count, the triggering event should be defined to occur for each modified record; the FOR EACH clause is used to do this. Such a trigger is called a row-level trigger. On the other hand, there are situations where we would want the trigger to be executed once per INSERT statement, regardless of the number of records inserted. In this case, you omit the FOR EACH clause and such a trigger is called a statement-level trigger. -- One possible solution: -- CREATE TRIGGER ManSalEqEmp AFTER UPDATE OF salary ON Emp REFERENCING OLD ROW AS OldTuple, NEW ROW AS NewTuple FOR EACH ROW WHEN (NewTuple.salary > OldTuple.salary) UPDATE Emp SET salary = NewTuple.salary WHERE eid IN (SELECT man.eid FROM Emp AS man, Works AS w, Dept AS d WHERE NewTuple.eid = w.eid AND w.did = d.did AND d.managerid = man.eid AND man.salary < NewTuple.salary) e. Whenever an employee is given a raise, the manager¡s salary must be increased to be at least as much. Further, whenever an employee is given a raise, the department¡s budget must be increased to be greater than the sum of salaries of all employees in the department. -- One possible solution: -- CREATE TRIGGER ManSalEqEmpAndRaiseBudget AFTER UPDATE OF salary ON Emp REFERENCING OLD ROW AS OldTuple, NEW ROW AS NewTuple FOR EACH ROW WHEN (NewTuple.salary > OldTuple.salary) BEGIN UPDATE Emp SET salary = NewTuple.salary WHERE eid IN (SELECT man.eid FROM Emp AS man, Works AS w, Dept AS d WHERE NewTuple.eid = w.eid AND w.did = d.did AND d.managerid = man.eid AND man.salary < NewTuple.salary); UPDATE Dept SET budget = 1 + (SELECT Sum(salary) FROM (SELECT DISTINT e.eid, salary FROM Emp AS e, Works AS w WHERE NewTuple.eid = w.eid AND w.eid = e.eid)); END 3. Consider the following scenario: Dept(did: integer, projid: integer, budget: real, status: char(10)) Proj(projid: integer, code: integer, report: varchar) Assume that each Dept record is 40 bytes long and each Proj record is 2000 bytes long on average. There are 5000 tuples in Dept (note that did is not a key by itself) and 1000 tuples in Proj. Each department, identified by did, has 10 projects on average. The file system supports 4000 byte pages, and 12 buffer pages are available. All following questions are based on this information. You can assume uniform distribution of values. State any additional assumptions. The cost metric to use is the number of page I/Os. Consider the following query: SELECT D.did, COUNT(*) FROM Dept D, Proj P WHERE D.projid=P.projid AND D.budget>99000 GROUP BY D.did Assume that department budgets are uniformly distributed in the range 0 to 100,000. a. Draw the most naive query execution tree. -- Notes about the problem: -- - I am basing my b34utiful ascii art on the drawing on pg. 807 of your book. Thanks to Dylan Carney for giving me the idea to do wonderful ascii art!! -- One possible solution: -- GROUP BY(D.did), COUNT(*) | | SELECT budget>99000 | | NATURAL JOIN | | | | Proj Dept b. Assuming no prior indices, what is the best join algorithm to use? Be sure to show that there is enough buffer space to run the algorithm you chose. -- Solution: -- Hash join would is the best algorithm to use. The general rule of thumb is that the amount of buffer space needed for hash join is sqrt(number of pages need for smaller table). Let M be the number of pages in Dept: M = (5000 * 40) / 4000 = 50 Let N bet the number pages in Proj: N = (1000 * 2000) / 4000 = 500 Therefore, sqrt(50) ~= 7 which is less than the number of buffer pages available (12) so we can use hash join. c. Using the algorithm you chose above, compute the cost of executing this j join. -- Solution: -- The formula for the cost of hash join is 3(M + N) so, cost = 3(50 + 500) = 1650 I/Os. d. Now draw the best query execution plan you can think of for this query. -- Solution: -- GROUP BY(D.did), COUNT(*) | | PROJECT D.did | | NATURAL JOIN / \ | | | | SELECT | D.budget > 99000 | | | | | Dept Proj e. How does this change the cost of the join algorithm you chose above? -- Solution: -- With the selection placed before the join, there will only be 1% as many tuples coming from Dept in the join (the 1% comes from the fact that it is given that the budgets are uniformly distributed from 0 - 100000). Now, M = (50 * 40) / 4000 = .5 (this rounds to 1 buffer page) = 1 Still, N = 500 Therefore, the cost of the hash join is now, cost = 3(1 + 500) = 1503 page I/Os f. Now suppose that there is a clustered B+ tree index on {D.did, D.budget} and a hash index on {P.projid}. Describe the plan with the lowest estimated cost. -- Solution: -- Retrieve Dept tuples that satisfy the condition on budget in did order by using the clustered B+ tree index while probing Proj and counting tuples in each did group on-the-fly.