Manageability, availability and performance in Porcupine

From: Greg Green (ggreen_at_cs.washington.edu)
Date: Tue Feb 24 2004 - 22:19:25 PST

  • Next message: Cliff Schmidt: "Saito et al. "Manageability, Availability and Performance in Porcupine""

    This paper describes the implementation of a highly available and
    scalable Internet mail service on a commodity PC cluster. The design
    goals were: easy management through self-configuration and
    self-healing; linear scaling, so that nodes or disks can be added to
    improve throughput; high availability, so that all users receive good
    service even if part of the cluster is down; and performance
    comparable to existing systems while scaling to hundreds of machines.

    The system consists of several services that run on each machine in
    the cluster: a membership manager, a user profile manager, a delivery
    proxy, a retrieval proxy, and a replication manager. The mail for
    each user is stored in mailbox fragments distributed across the
    cluster. Each node holds a user map that assigns responsibility for
    user profiles and mailbox fragments to nodes. A user profile database
    keeps per-user client information and is relatively static. Each node
    also holds a cluster membership list.
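
    To make the user map concrete, here is a minimal Python sketch of
    how I picture it. The bucket count, hash function, and node names
    are my own assumptions for illustration, not details from the paper:

        import hashlib

        USER_MAP_SIZE = 256  # assumed size; the map is small and is
                             # replicated on every node

        def bucket(user):
            # Hash the user name into a fixed number of buckets.
            digest = hashlib.sha1(user.encode()).digest()
            return int.from_bytes(digest[:4], "big") % USER_MAP_SIZE

        # user_map[i] names the node responsible for the profile and
        # mailbox fragment list of every user hashing to bucket i.
        user_map = {i: "node%d" % (i % 4) for i in range(USER_MAP_SIZE)}

        def manager_node(user):
            return user_map[bucket(user)]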

    Mail coming into the cluster is handled by a delivery proxy. The
    proxy retrieves the mailbox fragment list from the user's management
    node and then chooses a node to deliver the mail to. Mail retrieval
    works in a similar way.
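
    As I read it, the delivery path could be sketched roughly as below.
    This is self-contained toy code; the load metric and the preference
    for nodes already holding a fragment are my guesses at the spirit of
    the policy, not its actual details:

        # Toy cluster state: per-node load and, per user, the nodes
        # already holding that user's mailbox fragments.
        load = {"node0": 3, "node1": 1, "node2": 7}
        fragments = {"alice": ["node2"]}
        mailboxes = {}  # (user, node) -> list of messages

        def deliver(user, message):
            # Prefer nodes that already hold a fragment for this user;
            # otherwise any node in the cluster is eligible.
            candidates = fragments.get(user) or list(load)
            # Choose the least loaded candidate and append the message
            # to the user's mailbox fragment on that node.
            target = min(candidates, key=load.__getitem__)
            mailboxes.setdefault((user, target), []).append(message)
            if target not in fragments.setdefault(user, []):
                fragments[user].append(target)
            load[target] += 1

        deliver("alice", "hello")   # goes to node2 (existing fragment)
        deliver("bob", "hi there")  # goes to node1 (least loaded)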

    Management is done by a membership protocol. When a node discovers
    that another node is down, or that a new node has joined, it
    initiates a protocol to which all current cluster members reply. The
    initiating node then broadcasts the new membership and an epoch id to
    all nodes. In this step a new hash mapping user management to nodes
    is also created. Each node then examines its state for changes and
    sends those changes to the proper nodes.
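
    A toy simulation of one membership round, as I understand it (the
    probing, retransmission, and soft-state transfer details are
    simplified away, and all names here are invented):

        import itertools

        epochs = itertools.count(1)

        def run_membership(nodes):
            # nodes maps a node name to True/False for "currently up".
            # Round 1: the initiating node probes; live nodes reply.
            alive = sorted(n for n, up in nodes.items() if up)
            # Round 2: broadcast the agreed membership together with a
            # fresh epoch id so messages from stale epochs can be
            # rejected.
            membership = {"epoch": next(epochs), "members": alive}
            # The user map is recomputed over the survivors; each node
            # would then push the soft state (profile and fragment
            # pointers) for reassigned buckets to the new managers.
            user_map = {b: alive[b % len(alive)] for b in range(256)}
            return membership, user_map

        m, _ = run_membership({"node0": True, "node1": False,
                               "node2": True})
        print(m)  # {'epoch': 1, 'members': ['node0', 'node2']}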

    Mail messages can be replicated by the replication manager. A message
    can be updated at any replica, so no primary mailbox manager is
    needed. The algorithm uses BASE rather than ACID semantics, so a user
    could see inconsistent state among the different nodes until
    stabilization is achieved.
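
    The update-anywhere idea can be illustrated with a last-writer-wins
    toy. This is my simplification, not the paper's actual replication
    protocol; the point is only the eventual convergence once all
    updates have propagated:

        import time

        class Replica:
            def __init__(self):
                self.state = {}   # key -> (timestamp, value)
                self.peers = []

            def update(self, key, value):
                # Any replica accepts a write and pushes it to peers.
                # In the real system propagation is asynchronous, so
                # readers may briefly see different states.
                stamp = time.time()
                for replica in [self] + self.peers:
                    replica.apply(key, stamp, value)

            def apply(self, key, stamp, value):
                # Keep only the newest write; all replicas agree once
                # every update has reached every copy.
                if key not in self.state or stamp > self.state[key][0]:
                    self.state[key] = (stamp, value)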

    The paper has a large section on measured scaling and performance.
    Performance was comparable to that of a similar cluster running
    existing mail services with static assignment of users to nodes;
    with replication the performance was not quite as good. The system
    was shown to be highly scalable, with throughput rising linearly
    with the number of nodes, and improving further when just some of
    the nodes were upgraded with faster disks.

    I really liked this paper. It was described in sufficient detail that
    I had a good idea of how the system worked. Automatic management and
    load balancing are obviously highly desirable traits, and this system
    shows a way to deliver them, at least for Internet mail. Since I
    suffer from numerous Exchange mail server outages and slowdowns, I
    wish Microsoft would adopt something like this. It was interesting
    that even upgrading disks on individual machines improved overall
    performance; that seems to me a difficult thing to accomplish. I kept
    looking for the downside, as papers tend to show their systems in the
    best light possible and skim over the difficulties, but I was unable
    to spot the drawbacks here. I would like to know what they are, as
    there is bound to be something.

    -- 
    Greg Green
    
