Review: Manageability, Availability and Performance in Porcupine: A Highly Scalable Internet Mail Service.

From: Sellakumaran Kanagarathnam (sellak_at_windows.microsoft.com)
Date: Wed Feb 25 2004 - 17:19:28 PST


    This paper describes the Porcupine mail system. It is a cluster-based
    system, and the key aspects of Porcupine are its functional homogeneity
    and scalability. The paper describes the system architecture, addition
    of nodes, failure detection and self-management, replication and
    availability, and dynamic load balancing. Scalability is achieved by
    clustering many small machines. There are three aspects of scalability,
    and the authors list the requirements for each of them:

    a) Manageability - the system must self-configure with respect to load
    and data distribution and self-heal with respect to failures and
    recovery

    b) Availability - avoid failure modes in which a group of users loses
    service (even for a short period of time)

    c) Performance - must be competitive with a single-node system and
    must scale linearly with the number of nodes in the cluster

    Functional homogeneity means that any node can execute any part, or
    all, of any transaction. In contrast, in the SNS cluster discussed in
    the other paper, each node is assigned a specific component of the
    work. The main reason for this approach in Porcupine is to simplify
    system configuration. Three key techniques are used to meet the
    scalability goals: dynamic scheduling, automatic reconfiguration and
    automatic replication.

    The paper classifies the replicated state as hard and soft. Basically,
    hard state is information that cannot be lost, and soft state is
    information that can be reconstructed if it is lost. Hard state is
    replicated on multiple nodes to ensure high availability, whereas soft
    state is (for the most part) maintained on only one node. The key data
    structures in Porcupine are the mailbox fragment, the mail map, the
    user profile database, the user profile soft state, the user map and
    the cluster membership list, and a set of managers distributes and
    manages these structures.
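
    To make the hard/soft split concrete, here is a minimal sketch (in
    Python) of how the per-node state described above might be grouped;
    the field names are mine, not the paper's:

        from dataclasses import dataclass, field

        @dataclass
        class NodeState:
            # Hard state: must never be lost, so it lives on disk and is
            # replicated on a few other nodes.
            mailbox_fragments: dict = field(default_factory=dict)  # user -> messages stored on this node
            user_profile_db: dict = field(default_factory=dict)    # user -> password, preferences, ...

            # Soft state: kept on a single node and rebuilt from hard
            # state after a crash or a membership change.
            mail_map: dict = field(default_factory=dict)           # user -> nodes holding that user's fragments
            user_profile_soft: dict = field(default_factory=dict)  # cached profiles for users managed here
            user_map: list = field(default_factory=list)           # hash bucket -> node managing those users
            membership: set = field(default_factory=set)           # current cluster membership list
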
    A mail delivery involves a user lookup, load balancing to choose a
    node, and the message store itself.
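
    A rough sketch of that delivery path follows, with illustrative names
    of my own (the manager/target objects and their methods are
    assumptions, not the paper's API):

        import hashlib

        def deliver(message, recipient, user_map, spread=2):
            # 1. User lookup: hash the recipient to find the node that
            #    manages its soft state, then ask it for candidate nodes.
            bucket = int(hashlib.md5(recipient.encode()).hexdigest(), 16) % len(user_map)
            manager = user_map[bucket]
            candidates = manager.lookup(recipient)   # nodes holding fragments, plus alternatives

            # 2. Load balancing: consider only a small "spread" of
            #    candidates and pick the least loaded one.
            target = min(candidates[:spread], key=lambda n: n.load())

            # 3. Message store: append to (or create) a mailbox fragment
            #    on the chosen node, which updates the recipient's mail map.
            target.store(recipient, message)
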
    The cluster membership service uses a variant of the Three Round
    Membership Protocol. If the node to which a user has established a POP
    or IMAP session fails, then the user has to reconnect. In all other
    cases of node failure, users may see momentary failures, but their
    requests succeed on the next attempt.
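
    As a much-simplified sketch of the three-round pattern (ignoring
    competing coordinators, timeouts and the epoch-id details, and with
    invented method names):

        def three_round_membership(coordinator, suspected_alive, new_epoch):
            # Round 1: the node that noticed the change acts as coordinator
            # and broadcasts a proposed new epoch to the nodes it thinks are up.
            # Round 2: the nodes that are alive acknowledge the proposal.
            members = {coordinator}
            for n in suspected_alive:
                if n.ack_epoch(new_epoch):      # no reply in time -> not a member
                    members.add(n)

            # Round 3: the coordinator broadcasts the final membership list;
            # each node then rebuilds its user map and other soft state.
            for n in members:
                n.install_membership(new_epoch, members)
            return members
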
    The general unit of replication in Porcupine is a single message or a
    single user profile, and the replication mechanism has the following
    five properties: update anywhere, eventual consistency, total update,
    lock-free operation, and ordering by loosely synchronized clocks.
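
    Taken together, these properties amount to a last-writer-wins scheme:
    any replica accepts a write, the whole new value is pushed to its
    peers, and the newest timestamp wins. A sketch of that idea (not the
    paper's actual code; the peer interface is assumed):

        import time

        class ReplicatedObject:
            def __init__(self):
                self.value = None
                self.stamp = (0.0, "")            # (clock, node id) breaks ties deterministically

            def local_update(self, node_id, value, peers):
                # Update anywhere: whichever replica receives the write
                # applies it locally and pushes the whole value (total update).
                stamp = (time.time(), node_id)    # loosely synchronized clocks are good enough
                self.apply(value, stamp)
                for p in peers:                   # retried in the background until acknowledged
                    p.apply(value, stamp)

            def apply(self, value, stamp):
                # Eventual consistency: older updates are simply discarded,
                # so all replicas converge on the newest-stamped value.
                if stamp > self.stamp:
                    self.value, self.stamp = value, stamp
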
    The system has dynamic load balancing, but there is no centralized
    node for this purpose; each node keeps track of the load on the other
    nodes. It is good to note that pending requests and available hard
    disk space make up the load. But if, for some reason, a node sits near
    100% CPU for a while, I am not sure how this is handled. It was also
    not clear whether a single view of the whole system is possible in
    Porcupine. As the cluster size increases, determining a new membership
    could take longer.
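
    A sketch of how that decentralized selection might look (the load
    table, its contents and the threshold are my own illustration, not
    numbers from the paper):

        def pick_target(candidates, load_table, spread=2):
            # load_table maps node -> (pending_requests, free_disk_bytes);
            # it is kept up to date by piggybacking load information on
            # ordinary messages rather than by a central load server.
            MIN_FREE = 500 * 1024 * 1024   # illustrative disk threshold
            viable = [n for n in candidates if load_table[n][1] > MIN_FREE]
            if not viable:                 # everything is nearly full: fall back to all candidates
                viable = candidates
            # Consider only a small "spread" of candidates and take the one
            # with the fewest pending requests.
            return min(viable[:spread], key=lambda n: load_table[n][0])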

    Porcupine is a very promising mail system that handles many of the top
    issues such as availability, self-management (in the face of
    failures), and dynamic load balancing. The paper is very detailed and
    neatly organized.

