Too Few Threads = Waiting
• When too few threads are enabled to cover the latency, processors finish computing before the next data arrives and sit idle
• Not enough parallelism
• Communication subsystem may be less efficient
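
The waiting effect can be illustrated with a simple back-of-the-envelope model. This is a sketch under stated assumptions, not something taken from the slide: each thread alternates a fixed compute time with a fixed data latency, the processor switches between threads at no cost, and the names `utilization` and `threads_to_cover` are hypothetical.

```python
import math


def utilization(threads: int, compute: float, latency: float) -> float:
    """Fraction of time one processor is busy in the assumed model.

    Each thread computes for `compute` time units, then waits `latency`
    units for its next data. With zero-cost switching, `threads` threads
    supply threads * compute units of work per (compute + latency) cycle.
    """
    if threads < 1 or compute <= 0 or latency < 0:
        raise ValueError("need threads >= 1, compute > 0, latency >= 0")
    return min(1.0, threads * compute / (compute + latency))


def threads_to_cover(compute: float, latency: float) -> int:
    """Smallest thread count that keeps the processor fully busy
    in the same assumed model."""
    if compute <= 0 or latency < 0:
        raise ValueError("need compute > 0, latency >= 0")
    return math.ceil((compute + latency) / compute)


if __name__ == "__main__":
    # Illustrative numbers only: 10 units of compute, 90 units of latency.
    for t in (1, 2, 5, 10, 20):
        print(t, utilization(t, compute=10, latency=90))
    print("threads needed:", threads_to_cover(compute=10, latency=90))
```

In this model, anything below `threads_to_cover` leaves the processor idle for part of every cycle, which is the "finish computing before the next data arrives" case.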
[Figure: timeline diagram with Load stages (n, n+1, n+2), Store stages (n-2, n-1, n, n+1), and blocks labeled n-1, n, n+1, n+2]
Theoretically, a minimum of P log P threads is needed
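
To get a feel for how the P log P bound grows, the following sketch tabulates it. The slide does not define P or the base of the logarithm; this example assumes P is the number of processors and uses base 2, ignoring constant factors. The function name `min_threads` is hypothetical.

```python
import math


def min_threads(p: int) -> int:
    """Return ceil(P * log2(P)).

    Assumptions (not stated in the source): P is the processor count
    and the logarithm is base 2; constant factors are ignored.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    return math.ceil(p * math.log2(p))


if __name__ == "__main__":
    for p in (2, 8, 64, 1024):
        print(f"P = {p:5d}  ->  at least {min_threads(p)} threads")
```

Under these assumptions, 8 processors call for at least 24 threads and 1024 processors for at least 10240, so the required thread count grows faster than the processor count.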