Sample problems and solutions

For the first two questions, consider the following schema:

Jedi-Teams(master, apprentice)

Jedi(name, side, home-planet)

Government(leader, planet, position)

Inhabitants(specie, planet)

1. Consider the following query, which finds all planetary leaders who are apprentices and use the dark side of the Force:

select leader
from Jedi-Teams, Jedi, Government
where apprentice = name and
name = leader and
side = 'dark'

Express this query in terms of relational algebra.

Answer:

Write your expression as the corresponding logical query plan.

Answer:
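You can sanity-check a relational algebra answer by evaluating it on a tiny instance and comparing the result with the SQL. The sketch below assumes the straightforward translation π_leader(σ_apprentice=name ∧ name=leader ∧ side='dark'(Jedi-Teams × Jedi × Government)). The data and function names are hypothetical, and SQLite stands in for "some SQL engine".

```python
import sqlite3
from itertools import product

# Hypothetical toy instance; only the schema comes from the problem.
JEDI_TEAMS = [("Palpatine", "Vader"), ("Yoda", "Luke"), ("Vader", "Ren")]      # (master, apprentice)
JEDI = [("Vader", "dark", "Tatooine"), ("Luke", "light", "Tatooine"),
        ("Ren", "dark", "Chandrila")]                                           # (name, side, home-planet)
GOVERNMENT = [("Vader", "Mustafar", "governor"), ("Luke", "Yavin", "general")]  # (leader, planet, position)


def ra_query():
    """pi_leader(sigma_{apprentice=name AND name=leader AND side='dark'}(JT x J x G))."""
    return {
        g[0]                                                    # projection onto leader (set semantics)
        for jt, j, g in product(JEDI_TEAMS, JEDI, GOVERNMENT)   # Cartesian product
        if jt[1] == j[0] and j[0] == g[0] and j[1] == "dark"    # selection
    }


def sql_query():
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE "Jedi-Teams" (master TEXT, apprentice TEXT)')
    con.execute('CREATE TABLE Jedi (name TEXT, side TEXT, "home-planet" TEXT)')
    con.execute("CREATE TABLE Government (leader TEXT, planet TEXT, position TEXT)")
    con.executemany('INSERT INTO "Jedi-Teams" VALUES (?, ?)', JEDI_TEAMS)
    con.executemany("INSERT INTO Jedi VALUES (?, ?, ?)", JEDI)
    con.executemany("INSERT INTO Government VALUES (?, ?, ?)", GOVERNMENT)
    rows = con.execute(
        'SELECT leader FROM "Jedi-Teams", Jedi, Government '
        "WHERE apprentice = name AND name = leader AND side = 'dark'"
    ).fetchall()
    con.close()
    return {leader for (leader,) in rows}   # compare as sets; SQL itself returns a bag


print(ra_query(), sql_query())  # {'Vader'} {'Vader'}
```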

Now, according to System-R style optimization, write the best and worst logical query plans possible (involving only the relations given, wise guys).

Answer:

Best:

Worst:

Under what circumstances would you expect to see the biggest difference?

Answer: I’d expect to see the biggest difference when only a small proportion of Jedi use the dark side, since the selection side = 'dark' is then highly selective.

Under what circumstances would you expect to see the smallest difference?

Answer: I’d expect to see the smallest difference when most Jedi use the dark side, since the selection side = 'dark' then removes few tuples.
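The following sketch makes the answers above concrete. It counts intermediate join results for two join orders on a synthetic instance. Both hypothetical plans apply σ_side='dark' to Jedi before joining it, as System-R pushes selections down. They differ only in whether Jedi is joined first or last. The instance generator, the plan names, and the cost measure (sum of intermediate join sizes) are all assumptions for illustration. They are not necessarily the best and worst plans the question intends.

```python
def make_instance(n, n_dark):
    """Hypothetical synthetic instance: every Jedi is exactly one apprentice and one leader."""
    names = [f"j{i}" for i in range(n)]
    jedi = [(nm, "dark" if i < n_dark else "light", "p0") for i, nm in enumerate(names)]
    jedi_teams = [(names[(i + 1) % n], nm) for i, nm in enumerate(names)]       # (master, apprentice)
    government = [(nm, f"planet{i}", "leader") for i, nm in enumerate(names)]  # (leader, planet, position)
    return jedi_teams, jedi, government


def join(left, right, lcol, rcol):
    """Hash equi-join of flat tuples on left[lcol] == right[rcol]; output rows are left + right."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[rcol], []).append(right_row)
    return [left_row + right_row
            for left_row in left
            for right_row in index.get(left_row[lcol], [])]


def plan_dark_first(jedi_teams, jedi, government):
    dark = [j for j in jedi if j[1] == "dark"]   # sigma_{side='dark'}(Jedi)
    t1 = join(dark, jedi_teams, 0, 1)            # name = apprentice
    t2 = join(t1, government, 0, 0)              # name = leader
    return {row[5] for row in t2}, len(t1) + len(t2)


def plan_jedi_last(jedi_teams, jedi, government):
    dark = [j for j in jedi if j[1] == "dark"]   # same pushed-down selection
    t1 = join(jedi_teams, government, 1, 0)      # apprentice = leader, nothing filtered yet
    t2 = join(t1, dark, 1, 0)                    # apprentice = name
    return {row[2] for row in t2}, len(t1) + len(t2)


for n_dark in (10, 500, 1000):
    inst = make_instance(1000, n_dark)
    leaders_a, cost_a = plan_dark_first(*inst)
    leaders_b, cost_b = plan_jedi_last(*inst)
    assert leaders_a == leaders_b                # same answer, different amount of work
    print(n_dark, cost_a, cost_b)
```

With 1,000 Jedi this prints `10 20 1010`, `500 1000 1500` and `1000 2000 2000`. The gap closes as the fraction of dark-side Jedi approaches one.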

2. Please look at the following query:

select count(*), home-planet
from Jedi, Inhabitants
where specie = 'wookies' and
planet = home-planet and
side = 'light'
group by home-planet

Express this query in terms of relational algebra.

Answer:

Write your expression as the corresponding logical query plan.

Answer:
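The grouping in this query corresponds to the relational algebra operator γ. This sketch assumes the straightforward translation γ_home-planet; count(*)(σ_specie='wookies' ∧ planet=home-planet ∧ side='light'(Jedi × Inhabitants)) and evaluates it on hypothetical toy data. A `Counter` stands in for the grouping operator.

```python
from collections import Counter
from itertools import product

# Hypothetical toy instance.
JEDI = [("Luke", "light", "Tatooine"), ("Obi-Wan", "light", "Tatooine"),
        ("Vader", "dark", "Tatooine"), ("Yoda", "light", "Dagobah"),
        ("Chewbacca", "light", "Kashyyyk")]                   # (name, side, home-planet)
INHABITANTS = [("wookies", "Kashyyyk"), ("wookies", "Tatooine"),
               ("jawas", "Tatooine"), ("frogs", "Dagobah")]   # (specie, planet)


def q2():
    # sigma_{specie='wookies' AND planet=home-planet AND side='light'}(Jedi x Inhabitants)
    selected = [
        (j, i) for j, i in product(JEDI, INHABITANTS)
        if i[0] == "wookies" and i[1] == j[2] and j[1] == "light"
    ]
    # gamma_{home-planet; count(*)}: one output row per home-planet with its tuple count
    return Counter(j[2] for j, _ in selected)


print(q2())  # Counter({'Tatooine': 2, 'Kashyyyk': 1})
```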

Now, according to System-R style optimization, write the best and worst logical query plans possible.

Answer:

Under what circumstances would you expect to see the biggest difference?

Answer: I'd expect to see the biggest difference when there is wide variation in planets and home planets, when there are many species other than wookies, and when wookies live on a large number of planets.

Under what circumstances would you expect to see the smallest difference?

Answer: I'd expect to see the smallest difference (or even see the best plan run slower) if wookies lived on only one planet and there were very few planets and home planets.

What optimizations can you make using the group by? If none, why can't you use it to improve your plan?

Answer: I assumed that the total number of planets and home planets would be significant, so we can optimize by pushing the group by down below the join. If there were only a few, the savings would not be as great, and this might be a bad place to apply it.
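Pushing a group by below a join is often called eager aggregation. Here is a minimal sketch of one correct form of it, checked against the original query in SQLite on hypothetical toy data. In this version each join input is aggregated per planet, and the two counts are multiplied. The multiplication keeps the result right under SQL's bag semantics even if Inhabitants held duplicate rows. How much work this saves depends on how many Jedi collapse into each home-planet group.

```python
import sqlite3

# Hypothetical toy instance (same shape as the schema in the problem).
JEDI = [("Luke", "light", "Tatooine"), ("Obi-Wan", "light", "Tatooine"),
        ("Vader", "dark", "Tatooine"), ("Yoda", "light", "Dagobah"),
        ("Chewbacca", "light", "Kashyyyk")]
INHABITANTS = [("wookies", "Kashyyyk"), ("wookies", "Tatooine"),
               ("jawas", "Tatooine"), ("frogs", "Dagobah")]

ORIGINAL = """
SELECT COUNT(*), "home-planet" FROM Jedi, Inhabitants
WHERE specie = 'wookies' AND planet = "home-planet" AND side = 'light'
GROUP BY "home-planet"
"""

# Group by pushed below the join: aggregate each input first, then multiply the
# per-planet counts, since in the original join each light-side Jedi from a planet
# pairs with each wookie row for that planet.
PUSHED = """
SELECT j.n * w.n, j.hp
FROM (SELECT "home-planet" AS hp, COUNT(*) AS n FROM Jedi
      WHERE side = 'light' GROUP BY "home-planet") AS j
JOIN (SELECT planet, COUNT(*) AS n FROM Inhabitants
      WHERE specie = 'wookies' GROUP BY planet) AS w
  ON w.planet = j.hp
"""


def compare():
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE Jedi (name TEXT, side TEXT, "home-planet" TEXT)')
    con.execute("CREATE TABLE Inhabitants (specie TEXT, planet TEXT)")
    con.executemany("INSERT INTO Jedi VALUES (?, ?, ?)", JEDI)
    con.executemany("INSERT INTO Inhabitants VALUES (?, ?)", INHABITANTS)
    original = sorted(con.execute(ORIGINAL).fetchall())
    pushed = sorted(con.execute(PUSHED).fetchall())
    con.close()
    return original, pushed


print(compare())  # ([(1, 'Kashyyyk'), (2, 'Tatooine')], [(1, 'Kashyyyk'), (2, 'Tatooine')])
```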

3. Here are two plans for the same query:

With no knowledge about the data sources, which plan would you prefer?

Answer: Plan A

Is there a reason that you might prefer the other plan?

Answer: If the join of A and B were going to produce no tuples, and every tuple of A had A.A equal to 3, I would prefer the second plan. It doesn’t save much work (since selections are cheap), but it does save some.

4. In the following example, push the selection beneath the union:
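The example itself is not reproduced here. The rule being applied is σ_p(R ∪ S) = σ_p(R) ∪ σ_p(S). This sketch checks the rule on two hypothetical union-compatible relations stored as sets of tuples. The predicate `p` is also made up for illustration.

```python
def select(pred, relation):
    """sigma_pred(relation), with a relation represented as a set of tuples."""
    return {t for t in relation if pred(t)}


def p(t):
    # Hypothetical selection predicate on the first attribute.
    return t[0] >= 3


# Hypothetical union-compatible relations R(a, b) and S(a, b).
R = {(1, "x"), (2, "y"), (3, "z")}
S = {(3, "z"), (4, "w")}

before = select(p, R | S)            # sigma_p(R ∪ S): union first, then filter
after = select(p, R) | select(p, S)  # sigma_p(R) ∪ sigma_p(S): filter each input first
print(before == after)  # True
```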

5. Can you un-nest the following query?

Select A.A
From A
Where A.B = 42
and A.C in (
    Select B.A
    From B
    Where B.D = 'Darth' and
    A.E = B.B
)

Answer:

Yes (well, I can):

Select A.A
From A, B
Where A.B = 42 and
B.A = A.C and
B.D = 'Darth' and
A.E = B.B
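This sketch compares the nested and unnested forms in SQLite on hypothetical data. The two queries return the same set of answers. Under SQL's default bag semantics, though, the join form repeats an A.A value whenever several B rows match the same A row. The data below is chosen to show that difference.

```python
import sqlite3

NESTED = """
SELECT A.A FROM A
WHERE A.B = 42 AND A.C IN (
    SELECT B.A FROM B WHERE B.D = 'Darth' AND A.E = B.B)
"""

UNNESTED = """
SELECT A.A FROM A, B
WHERE A.B = 42 AND B.A = A.C AND B.D = 'Darth' AND A.E = B.B
"""


def run(a_rows, b_rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE A (A, B, C, E)")
    con.execute("CREATE TABLE B (A, B, D)")
    con.executemany("INSERT INTO A VALUES (?, ?, ?, ?)", a_rows)
    con.executemany("INSERT INTO B VALUES (?, ?, ?)", b_rows)
    nested = sorted(con.execute(NESTED).fetchall())
    unnested = sorted(con.execute(UNNESTED).fetchall())
    con.close()
    return nested, unnested


# Hypothetical data: two identical B rows match the A row 'x'.
a_rows = [("x", 42, 1, 7), ("y", 42, 2, 8), ("z", 0, 1, 7)]
b_rows = [(1, 7, "Darth"), (1, 7, "Darth"), (2, 8, "Yoda")]
nested, unnested = run(a_rows, b_rows)
print(nested)    # [('x',)]
print(unnested)  # [('x',), ('x',)]
```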

6. Consider the conjunctive queries Q₁ and Q₂.

Q₁: p(U,Z) :- q(U,V) & q(X,Y) & r(Y,Z) & r(V,X)

Q₂: p(U,V) :- q(Y,U) & q(U,X) & r(U,V) & r(X,Y)

Is Q₁ contained in Q₂? Is Q₂ contained in Q₁? Justify your answers.

Answer:

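Q₂ is contained in Q₁, because there is a containment mapping from Q₁ to Q₂: U->U, Z->V, V->X, X->Y, Y->U. It sends the head p(U,Z) to p(U,V), and it sends each subgoal of Q₁ to a subgoal of Q₂: q(U,V)->q(U,X), q(X,Y)->q(Y,U), r(Y,Z)->r(U,V), r(V,X)->r(X,Y).

Q₁ is not contained in Q₂. Any containment mapping from Q₂ to Q₁ must send the head p(U,V) to p(U,Z), so U->U and V->Z. Then the subgoal r(U,V) would have to map to r(U,Z), which is not a subgoal of Q₁.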
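A containment mapping can be found mechanically by backtracking. The search maps the head first, then tries each subgoal of the source query against each same-predicate subgoal of the target. Here is a minimal sketch, assuming queries with only variables: no constants and no arithmetic comparisons. The tuple encoding of a query is an illustrative choice.

```python
# A conjunctive query is (head_vars, [(predicate, args), ...]); variables only.
Q1 = (("U", "Z"), [("q", ("U", "V")), ("q", ("X", "Y")), ("r", ("Y", "Z")), ("r", ("V", "X"))])
Q2 = (("U", "V"), [("q", ("Y", "U")), ("q", ("U", "X")), ("r", ("U", "V")), ("r", ("X", "Y"))])


def extend(mapping, src_args, dst_args):
    """Extend a variable mapping so src_args map onto dst_args; None on a conflict."""
    if len(src_args) != len(dst_args):
        return None
    m = dict(mapping)
    for s, d in zip(src_args, dst_args):
        if m.setdefault(s, d) != d:
            return None
    return m


def containment_mapping(src, dst):
    """Backtracking search for a containment mapping from src to dst.

    If one exists, dst is contained in src."""
    src_head, src_body = src
    dst_head, dst_body = dst

    def search(i, m):
        if m is None:
            return None
        if i == len(src_body):
            return m
        pred, args = src_body[i]
        for dst_pred, dst_args in dst_body:
            if dst_pred == pred:
                found = search(i + 1, extend(m, args, dst_args))
                if found is not None:
                    return found
        return None

    return search(0, extend({}, src_head, dst_head))


print(containment_mapping(Q1, Q2))  # {'U': 'U', 'Z': 'V', 'V': 'X', 'X': 'Y', 'Y': 'U'}
print(containment_mapping(Q2, Q1))  # None
```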

7. Consider the following query and views:

Q(x) :- e₁(x), e₂(x,y), e₃(y,z), y > 25

V₁(A):- e₁(A)

V₂(B):- e₂(B,C), C > 25

V₃(E):- e₂(E,D), e₃(D,F), D > 24

V₄(G):- e₂(G,H), e₃(H,I), H > 26

V₅(K):- e₁(J), e₂(J,K), e₃(K,L), K > 25

V₆(M,N):- e₂(M,N)

V₇(P):- e₁(O),e₂(O,P)

V₈(R):- e₃(R,R)

In a query optimization context,

For each of V₁ through V₈, can it be used in rewriting Q? If not, why not?

Answer:

V₁: Yes.

V₂: No. We need y in e₃ as well as in e₂, but y maps to C, which is existential in V₂, so the rewriting cannot join V₂ with e₃ on y.

V₃: No. y maps to D, and D needs to be greater than 25, but V₃ only guarantees D > 24, and D is existential.

V₄: No. We are looking for an equivalent rewriting, and V₄ (which requires H > 26) returns only a subset of the correct answers.

V₅: No. x must be distinguished because it is returned in the head, but x maps to J, which is existential in V₅, so V₅ cannot be used.

V₆: Yes.

V₇: No. We need to map x to O, and since x is distinguished in the query but O is existential in V₇, we can't use it.

V₈: No. We are looking for an equivalent rewriting, so we cannot accept that y and z are equated.

Is there an equivalent rewriting? If so, what is it?

Answer:

No. The subgoal e₃(y,z) can only be covered by V₃, V₄, V₅, or V₈, and none of these can be used.

In a data integration context,

Does your answer to the first part (whether each view can be used) change for any of the views? If so, which ones, and why?

Answer:

Yes. V₄ and V₈ can now both be used, because we are looking for maximally contained rewritings rather than equivalent rewritings.

Assuming that the sources do not overlap, if there is a maximally contained rewriting, what is it?

Answer:

(Q′₁(x) :- V₁(x), V₄(x)) ∪ (Q′₂(x) :- V₁(x), V₆(x,y), V₈(y), y > 25)
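This sketch illustrates why the union is contained in Q but not equivalent to it. It evaluates Q and the two rewriting rules on a hypothetical base instance. Each view is materialized directly from its definition, which is an assumption made for illustration. In a real integration setting the sources may hold only part of each view. The answer b is lost because its y value is 26, which fails V₄'s H > 26, and e₃(26,1) is not of the form e₃(y,y) that V₈ requires.

```python
from itertools import product

# Hypothetical base data for e1(x), e2(x, y), e3(y, z).
e1 = {"a", "b", "c", "d"}
e2 = {("a", 30), ("b", 26), ("c", 27), ("d", 20)}
e3 = {(30, 5), (26, 1), (27, 27), (20, 20)}


def q():
    # Q(x) :- e1(x), e2(x,y), e3(y,z), y > 25
    return {x for (x, y), (y2, _z) in product(e2, e3) if x in e1 and y == y2 and y > 25}


# Views materialized from their definitions (assumption: complete sources).
V1 = set(e1)                                                                     # V1(A) :- e1(A)
V4 = {g for (g, h), (h2, _i) in product(e2, e3) if h == h2 and h > 26}           # V4(G)
V6 = set(e2)                                                                     # V6(M,N) :- e2(M,N)
V8 = {r for (r, r2) in e3 if r == r2}                                            # V8(R) :- e3(R,R)


def rewriting():
    q1 = {x for x in V1 if x in V4}                                # Q'1(x) :- V1(x), V4(x)
    q2 = {x for (x, y) in V6 if x in V1 and y in V8 and y > 25}    # Q'2(x) :- V1(x), V6(x,y), V8(y), y > 25
    return q1 | q2


print(sorted(q()), sorted(rewriting()))  # ['a', 'b', 'c'] ['a', 'c']
```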