FW: Cliff Schmidt's review of Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors

From: Clifford B Schmidt (cliffsch_at_u.washington.edu)
Date: Mon Jan 26 2004 - 17:15:35 PST

  • Next message: Ian King: "Review: Anderson, Lazowska, Levy, The Performance Implications of Thread Management Alternatives"

    Forwarding from different account since original message hasn't appeared on list.

    -----Original Message-----
    From: Cliff Schmidt
    Sent: Monday, January 26, 2004 3:13 PM
    To: hype-csep551-11_at_www.cs.washington.edu
    Subject: Cliff Schmidt's review of Performance Implications of Thread
    Management Alternatives for Shared-Memory Multiprocessors

    This was (or at least felt like) the longest paper I've read so far. In
    contrast to the "Scheduler Activations" paper, written by the same authors,
    this one seemed a little too detailed for this course. For example, the
    discussion of processor caching and how it affects bus contention
    seemed more appropriate for a hardware-focused computer
    architecture course. To be honest, I learned a few things about how much
    of an effect hardware design can have on threading models that the
    developer sees, but it was a very slow read for me. Maybe this would
    have been an easier read had I already been more familiar with the issues
    around write-through caches, cache invalidation, and the impact of cache
    misses. Part of the reason this paper took me so long to read was that
    I was doing a lot of background reading as some of these hardware issues
    were discussed.

    The key concepts mentioned over and over in the paper are latency and
    throughput. Unlike the "Scheduler Activations" paper, which shows how
    a better design can achieve better flexibility and better performance
    at the same time, this paper seemed to make a point of saying that just
    when you think you've solved a perf problem by reducing latency, you
    have probably done so at the cost of throughput, and vice versa. That is
    not to say there weren't designs that worked best in particular scenarios
    (such as local ready queues and the Ethernet-style backoff algorithm).
    The paper also
    makes the point that simplicity is key. For instance, I found it
    interesting that there are on the order of 100 instructions in the
    thread management path, and that adding a few extra instructions can
    have a measurable impact on performance.
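
    To make the local ready queue idea concrete, here is a minimal sketch,
    assuming one queue and one lock per processor and a simple scan of the
    other queues when the local one is empty. The class and method names,
    the FIFO ordering, and the scan order are illustrative assumptions,
    not details taken from the paper.

```python
import threading
from collections import deque


class LocalReadyQueues:
    """Hypothetical sketch: one ready queue (and one lock) per processor."""

    def __init__(self, num_processors):
        self.queues = [deque() for _ in range(num_processors)]
        # A lock per queue, so processors rarely contend on the same lock.
        self.locks = [threading.Lock() for _ in range(num_processors)]

    def enqueue(self, processor, thread):
        # A processor adds newly runnable threads to its own queue only.
        with self.locks[processor]:
            self.queues[processor].append(thread)

    def dequeue(self, processor):
        # Check the local queue first, then scan the others in order
        # (assumed policy), returning None if every queue is empty.
        n = len(self.queues)
        for offset in range(n):
            victim = (processor + offset) % n
            with self.locks[victim]:
                if self.queues[victim]:
                    return self.queues[victim].popleft()
        return None


ready = LocalReadyQueues(num_processors=2)
ready.enqueue(0, "thread-a")
ready.enqueue(0, "thread-b")
print(ready.dequeue(0))  # taken from processor 0's own queue
print(ready.dequeue(1))  # processor 1's queue is empty, so it takes from 0
```

    In the common case each processor touches only its own lock, which is
    where the reduction in contention would come from compared with a
    single shared queue.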

    Some of the other points I thought were interesting in the paper were:

    - getting the data structure right can have a big impact on perf
    - local copies can be used to prevent contention, and basically trade
    space for time. Depending on the size of the object (especially
    the difference between copying control blocks and copying stacks), this
    may or may not be the right thing to do.
    - as the number of threads gets above or below the number of processors,
    different strategies need to be used. Part of the significance found
    in charting this is due to the change in the definition of latency,
    but aside from that, it appeared to me that one of the lessons was that
    doing extra work ahead of time to improve perf later will not pay off
    if that prep work keeps other work from getting done in the
    meantime. In other words, having more processors than threads allows for
    tactics that are no longer beneficial when it is the other way around.
    - spin-waiting can delay not only the spinning processor but other
    processors as well. It was also interesting that, depending on the
    caching mechanisms, spin-locks can cause bus contention if the waiting
    processors have to keep checking whether the lock is free.
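
    As a rough illustration of spin-waiting with Ethernet-style backoff,
    here is a minimal sketch in which a waiter that fails to get the lock
    pauses for a random interval whose upper bound doubles after each
    failed attempt. The class name, the delay constants, and the use of a
    non-blocking threading.Lock as a stand-in for a hardware test-and-set
    instruction are all assumptions made for illustration. Python threads
    will not reproduce real bus traffic, so this only shows the control
    flow.

```python
import random
import threading
import time


class BackoffSpinLock:
    """Hypothetical spin lock that backs off between failed attempts."""

    def __init__(self, base_delay=1e-6, max_delay=1e-3):
        # Stand-in for an atomic test-and-set word (assumption).
        self._flag = threading.Lock()
        self.base_delay = base_delay
        self.max_delay = max_delay

    def acquire(self):
        """Spin until the lock is obtained; return the number of attempts."""
        delay = self.base_delay
        attempts = 1
        # acquire(blocking=False) returns at once: True if we got the lock.
        while not self._flag.acquire(blocking=False):
            # Wait a random time, then double the bound (capped), so
            # repeated failures make this waiter poll less often.
            time.sleep(random.uniform(0, delay))
            delay = min(delay * 2, self.max_delay)
            attempts += 1
        return attempts

    def release(self):
        self._flag.release()


lock = BackoffSpinLock()
counter = [0]


def worker():
    for _ in range(200):
        lock.acquire()
        counter[0] += 1  # critical section
        lock.release()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter[0])
```

    The trade-off is the one noted earlier: backing off keeps waiting
    processors from hammering the lock, at the cost of some added latency
    when the lock becomes free while a waiter is still pausing.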



    This archive was generated by hypermail 2.1.6 : Mon Jan 26 2004 - 17:15:36 PST