Review of Anderson, Lazowska and Levy

From: Gail Rahn (gail_at_screaminggeek.com)
Date: Mon Jan 26 2004 - 11:27:13 PST

  • Next message: Manish Mittal: "Review:The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors."

    Review of "Performance Implications of Thread Management Alternatives..." by
    Anderson, Lazowska and Levy

    This paper discusses alternatives for algorithms and data structures used in
    thread management in multiprocessor systems, and the performance tradeoffs
    associated with changes. The researchers improved the thread-management
    system in a 20-processor Intel 386 computer. Data structures previously kept
    in global memory were moved to local lists at each processor. Alternative
    algorithms to spin-waiting ("busy-wait") were evaluated to reduce processor
    degredation.

    An initial optimization made to thread-management was for an idle processor
    to set-up the next running thread by pre-allocating its control block. At
    thread-creation, the control block already exists, so only the stack is left
    for allocation. (Stacks cannot be preallocated because of the memory
    expense.)

    The next optimization targeted lock-handling in the thread-management
    subsystem. Should thread-realted data structures be protected by one
    centralized lock, one-lock-per-data-structure or some other method? The
    researchers found that a single lock limits throughput. With multiple locks
    for centralized data structures, gains in throughput occur, but bring with
    it matching gains in latency as the processors compete for many locks. The
    optimal solution, according to the authors, was to use one global lock for
    the central data structures and ALSO unlocked per-processor lists of data.
    When processors went looking for data in the structures, it searched its
    local list first and then, if necessary, accessed the global structure. This
    algorithm improved throughput without increasing latency.

    An important note is that especially for thread-management software, which
    is called so frequently by the operating system, any increases in
    complexity also increase latency. Simplicity and code efficiency are crucial
    principles here.

    Spin-waiting as a part of thread-management code was also evaluated in the
    paper because local queuing is a major thread-management bottleneck. The
    improvement made by the researchers modified the algorithm from its basic
    form (spin the processor while waiting for the lock - causes bus
    contention) into an Ethernet-style backoff algorithm (wait for a variable
    period of time based on past experience and try again). Switching to a
    blocking-wait was rejected because it's inefficient for thread-management
    purposes. A blocking-wait implies an expensive context switch for the
    processor and locks are held for only a short time. Each wait would entail a
    context-switch and that is too great of a cost.

    This paper was clearly written and conveyed. It is interesting to read
    because the paper addresses a crucial theme in computer architecture,
    managing lightweight processes, displays bottlenecks in the existing
    architecture. I also appreciated that both theory and implementation were
    addressed: the proposed solutions were implemented, tested, analyzed and
    proven to be more efficient in PRACTICE as well as in theory, even
    considering simulated user tasks. I "believe" this paper more readily than
    I would put faith into an OS paper that described only theory and used
    logic to support assertions, instead of an actual implemented, functioning
    system.

    -------------
    Gail Rahn
    grahn_at_cs.washington.edu


  • Next message: Manish Mittal: "Review:The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors."

    This archive was generated by hypermail 2.1.6 : Mon Jan 26 2004 - 11:27:25 PST