Latency Hiding in the Programming Model
•Bulk-synchronous parallel (BSP) programming is a solution proposed by Valiant
•Computation executes in supersteps:
–Assume threads execute 3-address code: a := b op c
•[Load]  Fetch operands from memory for many threads
•[Compute]  For all threads whose operands are available, compute a := b op c
•[Store] Return the result to memory
•With many threads, there is enough compute work to hide the data transmission time
•An individual thread need not execute on every cycle
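
The load/compute/store superstep above can be sketched as a small sequential simulation. This is only an illustration under stated assumptions: the function name superstep, the dictionary standing in for shared memory, and the tuple encoding (dest, src_b, op, src_c) of a := b op c are all hypothetical, and the code does not model real memory latency or timing.

```python
import operator


def superstep(memory, instructions):
    """Run one load/compute/store superstep for all threads.

    Hypothetical encoding: each thread holds one instruction
    (dest, src_b, op, src_c), meaning dest := src_b op src_c.
    """
    # [Load] Fetch the operands of every thread before any thread computes.
    operands = [(memory[b], memory[c]) for (_, b, _, c) in instructions]

    # [Compute] Every thread applies its operator to its fetched operands.
    results = [
        op(x, y)
        for (_, _, op, _), (x, y) in zip(instructions, operands)
    ]

    # [Store] Return all results to memory; the superstep ends here.
    for (dest, _, _, _), value in zip(instructions, results):
        memory[dest] = value
    return memory


# Two threads, each executing one 3-address instruction.
memory = {"x": 2, "y": 3, "z": 0, "w": 0}
program = [
    ("z", "x", operator.add, "y"),  # thread 0: z := x + y
    ("w", "x", operator.mul, "y"),  # thread 1: w := x * y
]
superstep(memory, program)
```

Because all loads complete before any store, every thread in a superstep reads the values left by the previous superstep, regardless of the order in which threads are processed.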