Too Few Threads = Waiting
• When too few threads are enabled to cover the latency, processors finish computing before the next data arrives
• Not enough parallelism
• The communication subsystem may be less efficient
[Figure: timeline of loads (n, n+1, n+2), processing stages labeled n-1, n, n+1, n+2, and stores (n-2, n-1, n, n+1)]
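The waiting effect can be made concrete with a deliberately simple model. This is a sketch under assumptions that are not from the slide: each thread computes for a fixed number of cycles, then waits a fixed latency for its next data, and the processor switches between ready threads at no cost. The function names and the cycle counts are hypothetical.

```python
import math


def utilization(threads: int, compute_cycles: int, latency_cycles: int) -> float:
    """Fraction of time the processor is busy in the simplified model.

    Assumption: every thread alternates compute_cycles of work with
    latency_cycles of waiting, and thread switching is free.
    """
    if threads < 1 or compute_cycles <= 0 or latency_cycles < 0:
        raise ValueError("need threads >= 1, compute_cycles > 0, latency_cycles >= 0")
    # In one period (compute + wait) each thread contributes compute_cycles of work.
    period = compute_cycles + latency_cycles
    return min(1.0, threads * compute_cycles / period)


def threads_to_cover(compute_cycles: int, latency_cycles: int) -> int:
    """Smallest thread count that keeps the processor fully busy in this model."""
    if compute_cycles <= 0 or latency_cycles < 0:
        raise ValueError("need compute_cycles > 0, latency_cycles >= 0")
    return math.ceil((compute_cycles + latency_cycles) / compute_cycles)


if __name__ == "__main__":
    # Hypothetical numbers: 10 cycles of work per 70 cycles of latency.
    for t in (1, 2, 4, 8, 16):
        print(f"{t:2d} threads -> utilization {utilization(t, 10, 70):.3f}")
    print("threads needed to cover latency:", threads_to_cover(10, 70))
```

With fewer threads than `threads_to_cover` returns, the model leaves the processor idle for part of every period, which is the waiting the slide describes.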
Theoretically, at least P log P threads are needed
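For a sense of scale, the bound can be evaluated for a few machine sizes. The sketch below assumes that P is the number of processors and that the logarithm is base 2; the slide states neither, and the function name is hypothetical.

```python
import math


def min_threads(processors: int) -> int:
    """Evaluate P * log2(P), rounded up to a whole thread count.

    Assumptions (not stated in the source): P counts processors and the
    logarithm is base 2.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return math.ceil(processors * math.log2(processors))


if __name__ == "__main__":
    for p in (2, 16, 256, 1024):
        print(f"P = {p:4d} -> at least {min_threads(p)} threads")
```

Under these assumptions the required thread count grows faster than the processor count, so larger machines need proportionally more parallelism per processor.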