Review: Porcupine paper

From: Raz Mathias (razvanma_at_exchange.microsoft.com)
Date: Wed Feb 25 2004 - 17:39:15 PST


    This paper presents the design goals and algorithms employed in
    implementing an Internet-scale system for "lukewarm," volatile data
    (e.g. mail systems, in contrast with stock-trading systems, which
    have hot, constantly changing data, and with proxies or search
    engines, which serve read-only data). Its most valuable contribution
    is that it almost completely decouples system management from storage
    (separating system management from availability/performance) at the
    expense of somewhat weakened consistency in the presence of failures.
    This decoupling allows the system to maintain performance under
    increased load and to "heal" itself in response to hardware failures.


    The paper interestingly begins by defining scalability as a function
    of management (!) as well as availability and performance.
    Scalability is often defined simply as the function mapping load to
    latency. I think the inclusion of management in the definition is
    critical to the system's ability to scale; performance follows
    naturally from real-time self-reconfiguration. As we saw in the
    "Cluster-Based Scalable Network Services" paper, centralized
    management creates unnecessary sharing in the system, which limits
    scalability. The obvious problem is how to distribute management
    without compromising performance. The solution presented in the paper
    is to define the shared management state as "soft," recomputable data
    that is maintained by every node in the cluster (soft state was
    introduced in the last paper we read). Any node brought into the
    system can easily recalculate the full management state. The only
    hard state in the system is the set of mail messages and the user
    profile database (user names, passwords, etc.), which are replicated
    and fragmented across the system to provide graceful degradation in
    the presence of failures.
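
    To make the soft/hard split concrete, here is a minimal sketch (my
    own illustration, not Porcupine's code; the directory layout and
    names are assumptions) of a node recomputing its soft per-user
    bookkeeping from the hard state that survives a reboot:

        import os
        from collections import defaultdict

        # Hard state: mail fragments on disk (the path is a made-up example).
        FRAGMENT_ROOT = "/var/spool/porcupine/fragments"

        def rebuild_soft_state():
            """Recompute which users this node holds fragments for, purely
            from hard state; this soft state is safe to lose and rebuild."""
            fragments_per_user = defaultdict(int)
            for entry in os.listdir(FRAGMENT_ROOT):   # e.g. "alice.frag-03"
                user, _, _ = entry.partition(".")
                fragments_per_user[user] += 1
            return fragments_per_user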


    Self-management allows the system to reconfigure itself in response
    to node failures, the detection of additional nodes, and poor load
    balancing. The central piece of management data in the system is the
    "user map", a hash mapping from user name to the machine responsible
    for managing that user's mail store fragments. When a new node is
    introduced, the system can evenly redistribute the management load by
    negotiating and converging on a new user map. When a node detects the
    presence of new machines, it becomes the coordinator in the Three
    Round Membership Protocol, which is used to obtain the complete set
    of nodes in the cluster. A new user map is then computed on every
    node, and the deltas between the old and new maps are computed. These
    deltas are used by the nodes to direct the transfer of message store
    fragments around the system. To be effective, the new map needs to
    differ enough from the old one to distribute load evenly, yet remain
    similar enough not to overburden the underlying network (i.e. some
    amount of node affinity must be maintained for scalability). This
    property of the hashing function (more a policy issue than a
    mechanism) is one that, I believe, needs to be studied further and in
    greater detail. It would be nice to come up with a hash function that
    takes into account factors such as network bandwidth, latency,
    machine availability, and disk storage before and after the new node
    is added.
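
    As a rough illustration of the affinity property being asked for
    here, the sketch below builds a bucketed user map with rendezvous
    hashing (a stand-in of my own choosing, not necessarily the paper's
    assignment; the bucket count and node names are arbitrary) and
    computes the delta when a fourth node joins:

        import hashlib

        BUCKETS = 256

        def score(bucket, node):
            # Deterministic per-(bucket, node) score used to pick an owner.
            return int(hashlib.sha1(f"{bucket}:{node}".encode()).hexdigest(), 16)

        def make_user_map(nodes):
            """Give each bucket to the live node with the highest score
            (rendezvous hashing), so membership changes move few buckets."""
            return [max(nodes, key=lambda n: score(b, n)) for b in range(BUCKETS)]

        def user_map_delta(old, new):
            """Buckets whose owner changed; only these drive fragment transfers."""
            return [b for b in range(BUCKETS) if old[b] != new[b]]

        def bucket_of(user):
            # A user's mail is managed by whichever node owns this bucket.
            return int(hashlib.sha1(user.encode()).hexdigest(), 16) % BUCKETS

        old_map = make_user_map(["node0", "node1", "node2"])
        new_map = make_user_map(["node0", "node1", "node2", "node3"])
        moved = user_map_delta(old_map, new_map)
        print(f"{len(moved)} of {BUCKETS} buckets reassigned")

    With an assignment like this, roughly a quarter of the buckets move
    when the fourth node joins, which is the kind of node affinity the
    paragraph above argues the hash function must preserve.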


    In my opinion, the randomized method of fragmentation employed in
    this system is a great idea. Instead of requiring that "some of the
    users get none of their data some of the time," we have "more of the
    users don't get all of their data some of the time." I believe this
    is exactly the right kind of thinking. This critical tradeoff allows
    the system to degrade gracefully without leaving any particular user
    entirely without service.
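
    A tiny sketch of that tradeoff (node count, mailbox size, and the
    failed node are all invented for illustration): with fragments
    placed on randomly chosen nodes, a single failure hides a few
    messages from many users rather than every message from a few.

        import random

        NODES = [f"node{i}" for i in range(8)]
        rng = random.Random(42)

        # Each fragment of a user's mailbox lands on a randomly chosen node.
        mailbox = {f"msg{i}": rng.choice(NODES) for i in range(20)}

        # If node3 fails, only the fragments stored there become unreadable;
        # the rest of this user's mail (and everyone else's) stays available.
        lost = [msg for msg, node in mailbox.items() if node == "node3"]
        print(f"{len(lost)} of {len(mailbox)} messages temporarily unavailable")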


    I personally really enjoyed reading this paper, as it presents
    convincing scenarios and tradeoffs (in addition to its core
    algorithms) that would be valuable in today's mail systems. The
    innovative use of soft state to reconstruct management information
    makes the system very appealing.



