For the first two questions, consider the following schema:

Jedi-Teams (master, apprentice)

Jedi(name, side, home-planet)

Government(leader planet, postition)

Inhabitants(specie, planet)

  1. Given a query to find all planetary leaders who are apprentices and use the dark side of the force:

select leader

from Jedi-Teams, Jedi, Government

where apprentice = name and

name = leader and

side = 'dark'

  1. Express this query in terms of relational algebra
  2. Answer:

  3. Write your expression as the corresponding logical query plan
  4. Answer:

  5. Now, according to System-R style optimization, write the best and worst logical query plan (involving only the relations given, wise guys) possible.

Answer: Best: Worst:

    1. Under what circumstances would you expect to see the biggest difference?
    2. Answer: I’d expect to see the biggest difference when there are a small proportion of Jedi who use the dark side

    3. Under what circumstances would you expect to see the smallest difference?

Answer: I’d expect to see the smallest difference when most Jedi use the dark side

 

2. Please look at the following query:

select count(*), home-planet

from Jedi, Inhabitants

where specie = 'wookies' and

planet = home-planet and

side = 'light'

group by home-planet

    Answer:

  1. Express this query in terms of relational algebra
  2. Write your expression as the corresponding logical query plan
  3. Answer:

     

  4. Now, according to System-R style optimization, write the best and worst logical query plan possible.

Answer:

    1. Under what circumstances would you expect to see the biggest difference?
    2. Answer: I'd expect to see the biggest difference when there was a wide variation on planets, home-planets, and there were a lot of species other than wookies, and wookies lived on a large number of planets.

    3. Under what circumstances would you expect to see the smallest difference?

Answer: I'd expect to see the smallest difference (or even have the best plan run slower) if wookies lived on only one planet, and there were a very small number of planets and home planets.

  1. What kind of optimizations are you able to make using the group by? If none, why can't you improve your plan with it?

Answer: I assumed that the number of home planets and planets total would be significant; thus we can optimize by pushing down the group by. If there were few, then the savings would not be as great, and this might be a bad place to do this.

3. Here are two plans for the same query:

  1. With no knowledge about the data sources, which plan would you prefer?
  2. Answer: Plan A

  3. Is there a reason that you might prefer the other plan?

Answer: If there were going to be no tuples out of the join of A and B, and all of them had A.A equal to 3, I would prefer the second plan. It doesn’t save that much work (since selects are cheap), but it does save some.

  1. In the following example, push the selection beneath the union:

 

 

5. Can you un-nest the following query?

Select A.A

From A

Where A.B = 42

and A.C in (

Select B.A

From B

Where B.D = 'Darth' and

A.E = B.B

)

Answer:

Yes (well, I can):

Select A.A

From A,B

Where A.B = 42 and

B.A = A.C and

B.D = ‘Darth’ and

A.E = B.B

6. Consider the conjunctive queries Q1 and Q2.

Q1: p(U,Z) :- q(U,V) & q(X,Y) & r(Y,Z) & r(V,X)

Q2: p(U,V) :- q(Y,U) & q(U,X) & r(U,V) & r(X,Y)

Is Q1 contained in Q2? Is Q2 contained in Q1? Justify your answers.

Answer:

Q2 is contained in Q1. Containment mapping from Q1 to Q2: U->U, Z->V, V->X, X->Y, Y->U.

7. Consider the following query and views:

Q(x) :- e1(x), e2(x,y), e3(y,z), y > 25

V1(A):- e1(A)

V2(B):- e2(B,C), C > 25

V3(E):- e2(E,D), e3(D,F), D > 24

V4(G):- e2(G,H), e3(H,I), H > 26

V5(K):- e1(J), e2(J,K), e3(K,L), K > 25

V6(M,N):- e2(M,N)

V7(P):- e1(O),e2(O,P)

V8(R):- e3(R,R)

    1. In a query optimization context,
    1. For each of V1-V8, is it able be used in rewriting Q? If not, why?
    2. Answer:

      V1: Yes.

      V2: No, we need to use y in e3 in addition to e2, and it is not mapped by V2, and C is existential in V2

      V3: No, y is mapped to D, and D needs to be greater than 25, and in V3 it’s only > 24 and D is existential

      V4: No. We are looking for an equivalent rewriting, and this will only return a subset of the correct answers

      V5: No. We need x to be distinguished since it is returned in the head. Since x is mapped to J, and J is existential, it cannot be used.

      V6: Yes.

      V7: No. we need to map x to o, and since x is distinguished in the query, we can't use it.

      V8: No. We are looking for an equivalent rewriting, and thus we cannot accept that y and z are equated

       

       

    3. Is there an equivalent rewriting? If so, what is it?

Answer:

No.

    1. In a data integration context,
    1. Does your answer for a.i change for any of the views? If so, which ones, and why?
    2. Answer:

      Yes. V4 and V8 can now both be used because we are looking for maximally contained rewritings rather than equivalent rewritings.

    3. Assuming that each of the sources are non-overlapping, if there is a maximally contained rewriting, what is it?

Answer:

(Q’1(x):-V1(x), V4(x)) È (Q’2(x):-V1(x), V6(X,Y), V8(Y), Y > 25)