Review of "The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors"

From: Song Xue (speedsx_at_hotmail.com)
Date: Sun Jan 25 2004 - 23:32:06 PST


    The paper "The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors" studies the performance implications of thread management alternatives for shared-memory multiprocessors.
    In a parallel environment, access to shared data structures must be serialized to ensure consistency and correctness. Locking raises the dual concerns of latency (the cost of a thread operation when there is no contention) and throughput (the rate at which thread operations can be performed when there is contention), and the choice of strategy often involves a tradeoff between the two.
    Five alternative thread management strategies are implemented, and their performance is measured and analyzed (a sketch of one of them follows the list):
    A. Single Lock: Central Data Structures Protected by a Single Lock
    B. Multiple Locks: Central Data Structures Each Protected by a Separate Lock
    C. Local Free List: Per-Processor Free Lists without Locks; A Central Locked Ready Queue
    D. Idle Queue: A Central Queue of Idle Processors; Per-Processor Free Lists
    E. Local Ready Queue: Per-Processor Ready Queues; Per-Processor Free Lists
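    A minimal sketch of alternative C (per-processor free lists without locks, plus a central ready queue protected by a single lock) is given below, using C11 atomics. The tcb_t type, the NPROC constant, and the list linkage are illustrative assumptions, not the paper's actual data structures.

        /* Sketch of "Local Free List": free thread control blocks live on
         * unlocked per-processor lists; only the shared ready queue is locked. */
        #include <stdatomic.h>
        #include <stddef.h>

        #define NPROC 16                     /* assumed processor count */

        typedef struct tcb {                 /* hypothetical thread control block */
            struct tcb *next;
            /* register save area, stack pointer, ... */
        } tcb_t;

        /* Per-processor free lists: touched only by their owners, so no locks. */
        static tcb_t *free_list[NPROC];

        /* Central ready queue, protected by one spin lock. */
        static tcb_t *ready_head, *ready_tail;
        static atomic_flag ready_lock = ATOMIC_FLAG_INIT;

        tcb_t *alloc_tcb(int cpu) {
            tcb_t *t = free_list[cpu];       /* no locking needed: local data */
            if (t != NULL)
                free_list[cpu] = t->next;
            return t;                        /* NULL => refill from a global pool */
        }

        void enqueue_ready(tcb_t *t) {
            t->next = NULL;
            while (atomic_flag_test_and_set_explicit(&ready_lock,
                                                     memory_order_acquire))
                ;                            /* spin: the queue is shared */
            if (ready_tail != NULL)
                ready_tail->next = t;
            else
                ready_head = t;
            ready_tail = t;
            atomic_flag_clear_explicit(&ready_lock, memory_order_release);
        }

    Because a free thread control block is not a scarce resource, keeping it on a local list removes a lock acquisition from the thread creation path, which is where the latency improvement comes from.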
    The authors find that simplicity is the key to the implementation of a fast thread package. For fine-grained parallelism, small changes in data structures and locking have a significant effect on both latency and throughput. Per-processor data structures can be used to improve throughput; if a resource is not scarce, localizing data can avoid locking, improving latency as well.
    The second half of the paper deals with spin-lock management alternatives. If a processor finds a thread management lock busy, it spin-waits for the lock to be released. Spin locks save the cost of a context switch, but spin-waiting has a hidden cost: processors doing useful work may be slowed by the bus contention generated by processors that are merely waiting for a lock. The authors evaluate three approaches to spin-waiting, measuring and analyzing their performance (the first two are contrasted in the sketch after the list):
    A. Spin on Xchgb
    B. Spin on Read
    C. Ethernet-Style Backoff
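    The difference between the first two alternatives is easiest to see in code. Below is a minimal sketch using C11 atomics in place of the xchgb instruction used in the paper; the spinlock_t type and function names are illustrative.

        #include <stdatomic.h>

        typedef atomic_int spinlock_t;       /* 0 = free, 1 = held */

        /* A. Spin on xchgb: every iteration performs an atomic exchange,
         *    generating bus traffic even while the lock is held. */
        void lock_spin_on_xchgb(spinlock_t *l) {
            while (atomic_exchange_explicit(l, 1, memory_order_acquire) != 0)
                ;
        }

        /* B. Spin on read: wait on a cached copy of the lock and attempt
         *    the atomic exchange only when the lock is observed to be free. */
        void lock_spin_on_read(spinlock_t *l) {
            for (;;) {
                while (atomic_load_explicit(l, memory_order_relaxed) != 0)
                    ;                        /* read hits in the local cache */
                if (atomic_exchange_explicit(l, 1, memory_order_acquire) == 0)
                    return;                  /* won the race for the lock */
            }
        }

        void spin_unlock(spinlock_t *l) {
            atomic_store_explicit(l, 0, memory_order_release);
        }

    Spinning on an ordinary read lets each waiter hit in its own cache, so bus traffic is generated mainly when the lock is released rather than on every iteration.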
    The authors find that spin-waiting can delay not only the processor waiting for a lock but also other processors doing useful work. The cost of spin-waiting can be reduced by using an Ethernet-style backoff or a queue-based algorithm.
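    A minimal sketch of the Ethernet-style backoff idea follows; the initial window, the cap, and the use of rand() are illustrative assumptions rather than the paper's parameters.

        #include <stdatomic.h>
        #include <stdlib.h>

        typedef atomic_int spinlock_t;       /* 0 = free, 1 = held */

        void lock_with_backoff(spinlock_t *l) {
            unsigned window = 16;            /* initial backoff window */
            while (atomic_exchange_explicit(l, 1, memory_order_acquire) != 0) {
                unsigned delay = (unsigned)rand() % window;
                for (volatile unsigned i = 0; i < delay; i++)
                    ;                        /* back off for a random interval */
                if (window < (1u << 16))
                    window *= 2;             /* widen the window after each failure */
            }
        }

        void unlock_with_backoff(spinlock_t *l) {
            atomic_store_explicit(l, 0, memory_order_release);
        }

    As with Ethernet, randomizing the delay and widening the window after each failed attempt keeps waiters from all retrying on the bus at the same instant the lock is released.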


