Review: Saito et al., Manageability, Availability and Performance in Porcupine

From: Ian King (iking_at_killthewabbit.org)
Date: Wed Feb 25 2004 - 17:14:39 PST

  • Next message: Raz Mathias: "Review: Porcupine paper"

    The authors describe a clustered system designed to provide Internet mail
    service to a large user population. Primary goals were scalability, fault
    tolerance, and cost effectiveness; certain other goals, while not ignored, were
    subordinate, such as consistency and comprehensive availability. The authors
    point out that the system described here can be utilized for other types of
    Internet services such as netnews, but distinguish their implementation from
    web-server clusters by the pattern of data usage: web service implies primarily
    read access, while email systems are write-intensive.

    A distinguishing factor of Porcupine is that any machine (with minimal
    restrictions) can serve any function in the overall cluster at any given time.
    (The chief limitation is that a node receiving SMTP traffic must be identified
    to Internet clients.) Load balancing is fully distributed; a module that has
    work for another type of module has a periodically updated (but possibly
    outdated) table of all nodes' current loading, and can make a choice of
    recipient based thereon. A clever feature is that which restricts completely
    unrestrained distribution in the case of "mailbox fragments"; spreading
    fragments too widely would degrade the system by requiring too much network
    traffic in assembling a user's mail from all the fragments. Instead, a "loose"
    upper limit is placed on the number of fragments; that limit can be violated in
    extreme cases, such as full disks or maxed CPU utilization on a target system.

    Many large mail systems make use of static partitioning of the user population
    to distribute the load. Two issues with this are that use is not consistent
    across a user population (they'd hate to maintain my inbox), and scaling the
    system involves subdividing a partition, i.e. migrating some portion of users'
    data between machines. The dynamic balancing mechanisms of Porcupine obviate
    this by distributing in relatively small, nearly arbitrary partitions across the
    system nodes.

    Porcupine makes use of both soft state and relaxed consistency requirements to
    simplify structure and speed common behaviors. It further allows for certain
    benign "errors", such as duplicate messages or temporary unavailability of some
    subset of a user's messages. Replication is used both to reasonably ensure that
    no message is truly lost until deleted by the user, and to allow for load
    balancing on mailbox reading. As in other systems, loss of soft state is
    corrected either by replication from another node or examination of relevant
    "hard state". One particularly important feature is that any replica can serve
    as a replication source; generational issues are resolved through use of
    timestamps. This does not place the onus of providing replication on one node,
    nor run the risk of that one node being the faulting one and thereby incapable
    of recovering itself.

    The paper goes into considerable detail regarding the consistency protocols for
    messages and system state. Messages and user information are viewed as "hard
    state", which cannot be lost, while most other state is soft. Certain
    structures are replicated for fault tolerance, others for performance (such as
    user maps).

    In simulated-load studies, the researchers demonstrated that Porcupine performs
    at least as well as a statically-partitioned system, and with certain
    assumptions outperformed it. It also scales linearly with additional nodes, up
    to the limitations of local network bandwidth. Nodes become disk-bound before
    becoming CPU-bound; the theoretical maximum performance was demonstrated by
    assuming "infinitely fast" disks, which was simulated by discarding bodies and
    retaining message digests in memory.

    Adding machines for either recovery or scaling are both handled by the same
    mechanism of repopulating the soft state of a new machine that is running
    Porcupine software; no administrator interaction is required. It was noteworthy
    that the latency of these events was in the tens of seconds.


  • Next message: Raz Mathias: "Review: Porcupine paper"

    This archive was generated by hypermail 2.1.6 : Wed Feb 25 2004 - 17:26:57 PST