Review of "The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors"

From: Jeff Duzak (jduzak_at_exchange.microsoft.com)
Date: Sat Jan 24 2004 - 13:51:18 PST

  • Next message: David Coleman: "Anderson et al. Thread paper review"

    This paper describes five possible implementations of a thread
    management package on multiprocessor machines. The paper is focused on
    how lock contention affects the throughput and latency of thread
    creation, startup, and finish. Empirical performance results for the
    five alternatives is provided. Further, models for two of the
    alternatives are created in order to further explore the effect on
    performance of certain variables.
     
    The five alternatives start from the most simple and centralized (ie,
    having a single central lock for all threading activities) and progress
    toward more distributed schemes (ie, having one queue per processor of
    threads ready to execute). Intuitive arguments describe the potential
    benefits and pitfalls of each approach. The performance of each
    alternative is then measured on a test application designed to use very
    fine-grained threading (that is, spawning and destroying threads very
    frequently, with each thread doing a very small amount of work). The
    result was that the local ready-queue approach gives the best
    performance for such an application.
     
    Next, a discussion is given regarding three alternatives for multiple
    processors spin-waiting on a common lock. The choice of a method to
    spin-wait on a lock has a great affect on performance primarily because
    of bus contention caused by querying and setting the state of the lock.
    Further, it affects how often processor memory caches are invalidated.
    This discussion is especially pertinent to thread management routines,
    as they inevitably have to use common locks, but the discussion also
    applies to any situation involving common locks used by multiple
    processors, especially when they are used with high frequency and high
    possibility of contention. The discussion concludes that an
    ethernet-style backoff routine provides the best performance, although
    it has the drawback of not providing fair access to a lock to all
    processors.
     
    Finally, models are created to simulate the affect of certain variables,
    including execution time of a thread and bus load of a thread, on the
    performance of two thread management routines. The models are first
    verified against empirical results. Then, the models are used to
    predict the behavior of the alternatives in different situations. The
    general results are that the local ready queue method provides better
    performance than a central ready queue, especially as the number of
    processors increases.
     
    This paper is interesting as an exploration of the very fundamentals of
    thread-management. The situation of interest (that is, fine-grain
    threading) is somewhat foreign to me, as I usually think of threads as
    long-lived (as for example, in an application with one UI thread and one
    background processing thread). It was also interesting to see how
    accurate the formal analysis ended up being, given how complicated of
    the actual situation. Last, this paper made me curious to think about
    how hardware support could be given to facilitate thread management.


  • Next message: David Coleman: "Anderson et al. Thread paper review"

    This archive was generated by hypermail 2.1.6 : Sat Jan 24 2004 - 13:51:24 PST