---------  Section 10: Map-Reduce Algorithms 

--  From chapter http://i.stanford.edu/~ullman/mmds/ch2.pdf 
--  sections 2.3.3. -- 2.3. 9.

We study basic algorithms for the Map-Reduce framework. 

The basic operations:

Map    : read the input, and produce key-value pairs: (key,value)

Reduce : take a key-list pair: (key,[t1, ..., tk]) and produces another key-list pair: (key,[t1', ..., tl']) 

-------------------------------------------------
1. Matrix-Vector Multiplication
-------------------------------------------------

How do we multiply a (n x n) matrix M with a vector v ?

Map    : each map task takes entire vector v (why is this possible?)
         and a chunk of the matrix M. Then, it produces the key-value
         pair (i, m_ij x v_j ) for each entry m_ij

Reduce : every reduce task gets an entire row (i-th row). It then sums
         all the elements to get the value of the result vector at
         coordinate i. Result (i, x_i) 

Q1: does it matter how the matrix is initially divided between the servers?

Q2: what happens if the vector v is too big to fit into the memory?

Q3: how is the computation possible if n = 10^10 (that is the size of the WWW)
 
-------------------------------------------------
2. Relational Algebra Operators
-------------------------------------------------
 
--> Selections: (easy, try to think how it is done. Do we even need the reduce step?)

--> Projections:

Map    : for each tuple t, project to get t' and output: (t',t')     

Reduce : keep just one from each list (t', [t',..., t'])

Q4: what is a combiner? How can we use it in this case? 

--> Union / Intersection

Map    : for any tuple t, output the key-value pair: (t,t)

Reduce : [UNION] how many values will each key have? What do we need to do?
         [INTERSECTION] again, how many values? 


--> Difference (R - S)

A tuple is in (R - S) if it is in R and not in S. Hence, we must "label" eaach tuple in the mapper 

Map    : If t belongs in R (t,R) 
         else in S (t,S)

Reduce : output a tuple t only if the list is [R]. What are the other possible outcomes for the list?


--> Join: R(A,B) with S(B,C)

Map    : for tuple R(a,b) produce: (b,(R,a)) and for S(b,c) produce: (b,(S,c))

Reduce : the list will have R and S-tuples. Compute the two sublists of the initial list and then create
         all the possible pairs.


Q5: what can we do if we want to save some space on the join result?

Q6: what happens when there is only one joining value? 

Q7: how would we compute the join R(A,B) S(A,C) T(A,D) ?

Q8: can we do something more intersting if one R >> S ? 

 
--> Grouping and Aggregation 

Map    : use as key the attributes that are used for the group by

Reduce : apply the aggregation operator to the list

Q9: what happens when we need to count everything, e.g. the size of the relation or sum a column ?