Review for "The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors"

From: Justin Voskuhl (justinv_at_microsoft.com)
Date: Mon Jan 26 2004 - 13:39:25 PST

  • Next message: Brian Milnes: "Fast Threads Review"

    This paper discusses alternatives for organizing thread management data
    structures in the kernel in a shared memory multiprocessor. There are a
    number of design alternatives that are compared, then two key
    performance metrics are analyzed for the various designs. The first
    performance metric is latency (how long on average we have to wait for
    service) and throughput (the number of thread operations that can
    complete in a given time-period.) Various models are proposed to
    explain the performance dynamics that are observed.

     

    There were five major design alternatives given for the threading
    architecture inside the kernel. The first simple design is that of a
    global lock for all thread-related data structures in the kernel. Then
    second design is to let each major data structure have its own lock.
    The third design is to let each processor have its own Free list and
    have a centrally locked Ready queue. The next design was more
    interesting, which was to maintain queues of idle processors, and on
    each processor maintain a Free list. The last design was to move most
    of the thread-related data structures into per-processor data
    structures.

     

    Various performance metrics are computed for each of the major designs.
    A program is written which really exercises the thread primitives by
    creating, destroying and synchronizing many threads (a million across
    all processors.) We see that for most performance metrics the local
    ready queue design tends to be the most performant. The observation is
    made that simplicity tends to create a performant thread package -
    optimizing special cases tends to not pay off because the code path is
    run so frequently that you don't make it back with the savings from the
    special case.

     

    Because of the architecture of the machine the tests were conducted on,
    the performance of spin-locks were problematic. As implemented they
    caused a lot of bus traffic back and forth, and this bus traffic
    actually interfered with the thread that was making progress, slowing
    the whole system. Some alternatives to spin-locks are presented,
    including a back-off algorithm inspired by the design of Ethernet. It's
    noted however that the problem with this is that it's unfair - threads
    that have waited longer are now less likely to acquire the lock.

     

    A formal model of bus contention is presented. The model is used to
    predict the behavior of various locking schemes.


  • Next message: Brian Milnes: "Fast Threads Review"

    This archive was generated by hypermail 2.1.6 : Mon Jan 26 2004 - 13:39:33 PST