Porcupine review

From: Muench, Joanna (jmuench_at_fhcrc.org)
Date: Wed Feb 25 2004 - 16:57:13 PST

  • Next message: Sellakumaran Kanagarathnam: "Review: Manageability, Availability and Performance in Porcupine: A Highly Scalable Internet Mail Service."

    The paper on the Porcupine system by Saito, Bershad and Levy (1999)
    describes the design and implementation of a cluster-based mail service. As a
    paper it provides a nice counter-point to the TranSend/HotBot paper; that
    system was optimized for reading operations, while the Porcupine system is
    optimized for writes. The over-arching requirements are the same for both
    systems: manageability, availability and good performance.

    One of the important concepts discussed in the paper is hard state vs. soft
    state information. This is similar to the ACID and BASE semantics of Fox et
    al. (1997), although with a different spin. Hard state appears to follow
    semantics very close to ACID (it is maintained in stable storage), but,
    unlike BASE data, soft state can actually be reconstructed from hard
    state. Allowing data to exist in soft state
    increases performance by decreasing updates, message traffic and other
    overhead associated with maintaining consistency.
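
    As a rough illustration of rebuilding soft state from hard state, the
    sketch below recomputes a per-user index of which nodes hold mail by
    scanning what each node has stored. The function name, the node names and
    the dictionary layout are assumptions made for this example, not
    Porcupine's actual data structures.

```python
from collections import defaultdict


def rebuild_mail_map(fragments_by_node):
    """Recompute soft state (user -> nodes holding mail) from hard state.

    fragments_by_node is a hypothetical stand-in for what each node keeps
    on disk: {node: {user: [messages]}}.
    """
    mail_map = defaultdict(set)
    for node, fragments in fragments_by_node.items():
        for user, messages in fragments.items():
            if messages:  # only index nodes that actually hold mail
                mail_map[user].add(node)
    return dict(mail_map)


# Hypothetical hard state on two nodes.
hard_state = {
    "node_a": {"alice": ["msg1"], "bob": ["msg2"]},
    "node_b": {"alice": ["msg3"], "carol": []},
}
mail_map = rebuild_mail_map(hard_state)
```

    Because the index can always be recomputed this way, it does not need to
    be written to stable storage or kept strictly consistent on every update.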

    To offer reliability, all mail is replicated on at least one other server.
    As the evaluation points out, performance-wise this is where the
    system gets expensive. In addition, to improve load balancing, mail for an
    individual will be spread out over multiple servers. There is a trade-off
    between the load balancing gains of spreading the data and the performance
    hit of having to collect information from large numbers of nodes. One of the
    important tuning factors discussed in the paper is how to strike this balance.
    The evaluation suggests that a dynamic spreading policy performs best.
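
    The trade-off can be sketched as a small placement function: a spread
    limit caps how many nodes may hold one user's mail, and within that cap
    the least-loaded candidate is chosen. This is only a sketch under assumed
    names (pick_node, spread, load), not the policy as implemented in the
    paper.

```python
def pick_node(current_nodes, load, spread):
    """Choose a node for a new message.

    current_nodes: nodes already holding this user's mail (hypothetical input)
    load: {node: pending work}, an assumed load metric
    spread: maximum number of nodes one user's mail may occupy
    """
    candidates = set(current_nodes)
    if len(candidates) < spread:
        # Room to widen: add the least-loaded other nodes up to the limit.
        others = sorted(
            (n for n in load if n not in candidates),
            key=lambda n: (load[n], n),
        )
        candidates.update(others[: spread - len(candidates)])
    # Among the allowed nodes, prefer the least loaded (ties broken by name).
    return min(candidates, key=lambda n: (load[n], n))


load = {"a": 5, "b": 1, "c": 3}
narrow = pick_node({"a"}, load, spread=1)  # must stay on the existing node
wide = pick_node({"a"}, load, spread=2)    # may move to a less loaded node
```

    A small spread keeps retrieval cheap because fewer nodes must be
    contacted; a larger spread gives the balancer more freedom.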

    Porcupine contains many interesting techniques to deal with failures. The
    goal is to manage change automatically, and judging from the evaluation, the
    system largely succeeds. Most interesting is how management responsibility
    is distributed. Changes in cluster membership are dealt with by whatever
    node first notices a new node is available or a node isn't responding. The
    user map is also reconfigured in the event of a membership change. This
    broad distribution should make systems administration easy: if a disk needs
    replacing, that node can be powered down without adversely affecting the
    entire system.
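
    To make the user map reconfiguration concrete, here is a minimal sketch
    that treats the map as a list of buckets, each owned by a node, and hands
    the buckets of departed nodes to the least-loaded survivors. The bucket
    representation and the function name are assumptions for illustration,
    and the sketch only covers node departure, not rebalancing onto newly
    added nodes.

```python
def reassign_buckets(user_map, live_nodes):
    """Return a new user map with dead nodes' buckets given to live nodes.

    user_map: list where index = hash bucket, value = owning node (assumed layout)
    live_nodes: nodes in the current membership
    """
    live = sorted(live_nodes)
    if not live:
        raise ValueError("no live nodes to take over buckets")

    # Count buckets already owned by each surviving node.
    load = {n: 0 for n in live}
    for owner in user_map:
        if owner in load:
            load[owner] += 1

    new_map = list(user_map)
    for i, owner in enumerate(user_map):
        if owner not in load:
            # Orphaned bucket: give it to the survivor with the fewest buckets.
            target = min(live, key=lambda n: (load[n], n))
            new_map[i] = target
            load[target] += 1
    return new_map


old_map = ["a", "b", "c", "a", "b", "c"]
new_map = reassign_buckets(old_map, {"a", "b"})  # node "c" has left
```

    Buckets owned by surviving nodes are left where they are, so a
    membership change only moves the responsibility that was actually lost.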

    It would be interesting to look at the Porcupine system in today's world of
    huge email loads. It should be relatively easy to add spam-filtering and
    virus-detection tools to the system.


