Barriers
All processes have to wait at a synchronization point
Processes do not progress until all of them have reached the barrier
Simple but low-performance implementation: use a shared counter initialized to the number of processes
- When a process reaches the barrier, it atomically decrements the counter, then busy-waits while the counter is non-zero
- When the counter reaches zero, all waiting processes are allowed to progress (broadcast)
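
The counter scheme above can be sketched as follows. This is a minimal, single-use illustration rather than a definitive implementation. It makes several assumptions: Python threads stand in for processes, a lock stands in for the atomic decrement (Python has no atomic integer type), and the names `CounterBarrier` and `demo` are hypothetical. The "broadcast" is implicit: every spinning thread eventually observes the counter at zero.

```python
import threading
import time


class CounterBarrier:
    """Single-use counter barrier (illustrative sketch, not reusable)."""

    def __init__(self, n):
        self.count = n                 # participants that have not arrived yet
        self._lock = threading.Lock()  # assumption: stands in for an atomic decrement

    def wait(self):
        with self._lock:               # "atomic" decrement on arrival
            self.count -= 1
        while self.count > 0:          # busy wait until everyone has arrived
            time.sleep(0)              # yield so the other threads can run


def demo(n=4):
    barrier = CounterBarrier(n)
    arrived = []  # filled in before the barrier
    seen = []     # number of arrivals each thread observes after the barrier

    def worker(i):
        time.sleep(0.01 * i)           # stagger the arrivals
        arrived.append(i)
        barrier.wait()
        seen.append(len(arrived))      # always n: nobody passes early

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen
```

Calling `demo(4)` returns `[4, 4, 4, 4]`: every thread sees all four arrivals once it leaves the barrier. All waiters spin on the same shared counter, which is why this approach scales poorly as the number of participants grows.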
Many optimizations are possible (tree barriers, butterfly barriers, hardware support, etc.)