Review: Thread Management Alternatives for Shared-Memory Multiprocessors

From: Raz Mathias (razvanma_at_exchange.microsoft.com)
Date: Mon Jan 26 2004 - 16:44:32 PST

  • Next message: Praveen Rao: "Review of 'The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors' paper"

    This paper deals with the efficient implementation of threads in
    multiprocessor systems. It begins by stating the fact that the purpose
    of threads is to separate the execution context from process accounting
    (including paging, handles to resources, virtual memory, etc.). On
    uniprocessor systems threads allow for efficient overlapping of
    processing with I/O; on multiprocessor systems, we overlap the
    processing itself.

    The paper mentioned some general ways of improving throughput which, I
    think, are worth keeping in mind when doing any kind of parallel
    programming. The first way is to check if a resource is available
    before trying to obtain a lock for it. This way, if the resource is not
    available, the thread can continue doing other useful work and can try
    again later. Another way is to divide the work of a single coarse lock
    amongst many smaller ones; this has the side effect of increasing
    latency as well as throughput.

    The paper's first major contribution was the description of various
    thread management strategies and their performance implications. The
    basic implementations were the following:

    1. Using a single, global lock - obviously yielding poor latency
    and throughput,
    2. Using multiple fine-grained locks - increasing throughput, but
    latency as well
    3. Using per-processor free lists - shifting the throughput
    bottleneck onto the ready queue. To make this implementation work well,
    we need to load-balance the free lists amongst the processors. With
    this approach, we run into the problem of having to pre-allocate thread
    stacks for every element in the free list, which can get expensive.
    4. Keeping a list of idle processors for starting threads -
    lowering latency when there are idle processors available and increasing
    latency by a large amount when there are none
    5. Keeping per-processor ready queues - performance can be poor if
    we have a processor with no threads on its free lists. One solution
    presented was to have idle processors search the ready queues of all
    other processors for work to do; the impact is that time spent searching
    could have been time spent processing.

    Performance-wise, it appears that the local-ready-queues solution scaled
    the best (throughput-wise) out of the alternatives. It also appears
    that this solution had the largest amount of latency. The
    idle-processor solution had the best latency characteristics (up to the
    point where there are no idle processors); this advantage in latency was
    offset by its modest latency performance.

    The paper then stated that blocking cannot be utilized for
    multiprocessor synchronization because of their poor performance
    characteristics. Three spin-lock alternatives were then presented that
    depended only on a test-and-set() operation. The first alternative was
    to block on test-and-set(). The second was to only block on
    test-and-set() after we've read the value of the lock and found that it
    wasn't being used. The third was to use an exponential back-off when
    trying to obtain a lock. I found the discussion interesting about how
    running the test-and-set() operation invalidates CPU caches causing
    other running threads to be impacted negatively performance-wise.

    I though that this paper presented a good overview of the multiprocessor
    thread scheduling alternatives and presented salient statistics on the
    performance of each alternative. I would really like to know whether
    Linux or Windows use the exponential back-off algorithm in their
    multiprocessor thread-scheduling implementations.

     


  • Next message: Praveen Rao: "Review of 'The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors' paper"

    This archive was generated by hypermail 2.1.6 : Mon Jan 26 2004 - 16:44:31 PST