•Hiding memory reference latency (λ of the CTA model) requires many more threads (work) than there are processors
•A thread is a sequence of instructions operating on a small quantity of data -- for example, a loop iteration
•The idea is that a processor with many threads to execute can switch to another thread when the current one stalls waiting for a memory reference, getting productive work done during the wait time
•The idea can be used in either a programming model or a hardware implementation
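
The effect described in the bullets above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not part of the CTA model itself: the function name `utilization`, its parameters, and the cycle counts are hypothetical, every thread is assumed to compute for a fixed number of cycles between memory references, and switching threads is assumed to cost nothing.

```python
def utilization(num_threads, compute_cycles, latency, total_cycles=10_000):
    """Fraction of cycles one processor spends computing.

    Assumed toy model: each thread computes for `compute_cycles`,
    then issues a memory reference that takes `latency` cycles.
    Switching to another thread is assumed to be free.
    """
    ready_at = [0] * num_threads  # cycle at which each thread can run again
    busy = 0
    clock = 0
    while clock < total_cycles:
        # Choose the thread that has been ready the longest.
        t = min(range(num_threads), key=lambda i: ready_at[i])
        if ready_at[t] > clock:
            # Every thread is stalled on memory: the processor idles.
            clock = ready_at[t]
            continue
        # Run thread t until its next memory reference, then it stalls.
        clock += compute_cycles
        busy += compute_cycles
        ready_at[t] = clock + latency
    return busy / clock


if __name__ == "__main__":
    for n in (1, 5, 10):
        print(n, "threads:", utilization(n, compute_cycles=10, latency=90))
```

With 10 cycles of work per 90-cycle memory reference, one thread keeps the processor busy 10% of the time, five threads 50%, and ten threads 100%. In this toy model the latency is fully hidden once there are about (compute_cycles + latency) / compute_cycles threads per processor.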