-- Section 6 : Functional Dependencies, Normal Forms and Decompositions
-----------------------------------------------------------------------


Important notions to remember:

- Data normalization used to resolve data redundancy
- What is a functional dependency? 
- Armstrong rules: Reflexivity, Augmentation, Transitivity 
- Closure of a set of attributes
- Keys and superkeys 


The BCNF (Boyce-Codd Normal Form)
-> Def. A relation R is in BCNF if every set of attributes is either a superkey or its closure is the same set.

The BCNF Decomposition algorithm.

------------- Example 1 ---------------------

The relation is R (A, B, C, D, E) and the fd's are: D -> B and CE ->  A

Notice that the closure {D}+ = {D,B}, which is not {D} nor {A,B,C,D,E}. So we apply the first step and divide R into R_1(D,B) and R_2(D,A,C,E). R_1 is now fine and D is the key. What about R_2 ? 

Again we run into trouble when we compute the closure {C,E}+ = {C,E,A}, which is not {C,E} neither {D,A,C,E}. So, we break R_2 into: R_2A(C,E,A) and R_2B(C,D,E). The final decomposition into BCNF is:

- R_1(D,B)
- R_2A (C,E,A)
- R_2B (C,D,E)


------------- Example 2 ---------------------

The relation is R (A, B, C, D, E) and the fd's : A -> E, BC -> A and DE -> B

Notice that {A}+ = {A,E}, violating the BCNF condition. We split R to R_1(A,E) and R_2(A,B,C,D).

R_1 satisfies BCNF now, but R_2 not because of: {B,C}+ = {B,C,A}. Notice that the fd D E -> B has now disappeared and we don't need to consider it! Split R_2 to: R_2A(B,C,A) and R_2B(B,C,D). 


Can we split differently? Let's try with the violation {B,C}+ = {B,C,A,E}. We initially split to R_1(B,C,A,E) and R_2(B,C,D). Now we need to resolve for R_1 the violation {A}+ = {A,E}. So we split again R_1 to R_1A(A,E) and R_1B(A,B,C). The same!

We can also start splitting by considering the BCNF violation {D,E}+ = {D,E,B}. Which is the resulting BCNF decomposition in this case? (it will be a different one)


------------- Example 3 ---------------------


Reminder: A set of attributes X is closed under functional dependencies if X+ = X. Consider now a relation R(A,B,C,D) where the
set of functional dependencies is unknown to us. 

1. What are the fd's if we know that every attribute set is closed? 

   This means we have only trivial functional dependencies! The empty set {} is thus a consistent set of fd's. 


2. What if we know that the only closed sets are {} and {A,B,C,D} ? 

   One solution could be: {A -> B, B -> C, C -> D, D -> A}. What are other possible consistent sets of fd's ? 
   Can you think of something different than permutations of the above solution?


3. What if the closed sets are {}, {A,B} and {A,B,C,D} ?

   First, notice that {A}+ is not {A}, and since {A,B}+ = {A,B}, it must be that A -> B is an fd. With a similar argument, we have
   that B -> A is another fd we must include.  

   A possible solution is as follows: {C -> ABD, D -> ABC, A -> B, B -> A}. Is there any other solution ? 
 

------------ Lossless-join decomposition ----------------


-> Def. Decomposing R into X and Y is lossless with respect to a set of fd's F if and only if the closure of the F contains either X^Y -> X or X^Y -> Y, i.e. the common attributes form a candidate key in at least one relation.

Why is this desirable? If not lossless, joining them would result in information loss.

For example, consider R(A,B,C) without keys and no fd's. Suppose we partition R into R_1(A,B) and R_2(B,C). Can we combine them after to retrieve R? No, for example let R = {(2,1,3), (4,1,5)}. Then, if we split the relation, we have R_1 = {(2,1),(4,1)} and R_2 = {(1,3),(1,5)}. Now if we join R_1 with R_2 we get R' = {(2,1,3), (2,1,5), (4,1,3), (4,1,5)}, which is not the initial table. There is no way to recover R from R_1 or R_2. 

 But if we had the fd B -> C, then the above instance is not valid (why?) and it is easy to see that the natural join results in getting back the original relation.


-------------------- Example 4 -------------------------------

Consider the relation R(A,B,C,D,E) with fd's: {AB -> C, BC -> D, AD -> E}. We want to check whether the decomposition {ABC, BCD, ADE} is a lossless-join decomposition. 

Start by constructing a tableau as follows: 

  A  |  B  |  C  |  D  |  E
------------------------------
  a  |  b  |  c  |  d1 |  e1
  a1 |  b  |  c  |  d  |  e2
  a  |  b1 |  c1 |  d  |  e


Notice that we use a common distinguished variable (a,b,c,...) if the variable is a key, otherwise we use a non-distinguished symbol (e1, e2, b1,...)

We next start applying the fd's! Notice that the 1st and 2nd row have the same distinguished B and C attributes. Hence, D must be the same by the fd BC -> D. This results in unifying d1 = d. Now the table becomes: 


  A  |  B  |  C  |  D  |  E
------------------------------
  a  |  b  |  c  |  d  |  e1
  a1 |  b  |  c  |  d  |  e2
  a  |  b1 |  c1 |  d  |  e


But now rows 1 and 3 agree on A and D. Because AD -> E, we unify e1 = e. Now, we have:


  A  |  B  |  C  |  D  |  E
------------------------------
  a  |  b  |  c  |  d  |  e
  a1 |  b  |  c  |  d  |  e2
  a  |  b1 |  c1 |  d  |  e

Row 1 contains only distinguished symbols, hence the algorithm terminates and the answer is YES, the decomposition is lossless. If we could not apply any fd and no row had only distinguished symbols, we would terminate with NO. This method is called the "chase".