From: Muench, Joanna (jmuench_at_fhcrc.org)
Date: Wed Feb 25 2004 - 16:57:13 PST
The paper on the Porcupine system by Saito, Bershad and Levy (1999)
describes the design and implementation of a cluster-based mail service. As a
paper it provides a nice counter-point to the TranSend/HotBot paper; that
system was optimized for reading operations, while the Porcupine system is
optimized for writes. The overarching requirements are the same for both
systems: manageability, availability and good performance.
One of the important concepts discussed in the paper is hard state vs. soft
state information. This is similar to the ACID and BASE semantics of Fox et
al. (1997), although with a different spin. Hard state information seems to
follow semantics very close to ACID (it is maintained in stable storage),
but unlike BASE data, soft state information can actually be reconstructed
from hard state. Allowing data to exist in soft state
increases performance by decreasing updates, message traffic and other
overhead associated with maintaining consistency.
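To make the hard/soft distinction concrete, here is a minimal sketch of rebuilding soft state by scanning hard state. It assumes the hard state is the set of per-user mailbox fragments each node keeps on disk and the soft state is a map from user to the nodes holding that user's mail. The names rebuild_mail_map, fragments_by_node and the node labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def rebuild_mail_map(fragments_by_node):
    """Derive soft state (user -> nodes holding that user's mail)
    by scanning hard state (the fragments each node has on disk)."""
    mail_map = defaultdict(set)
    for node, fragments in fragments_by_node.items():
        for user, messages in fragments.items():
            if messages:  # an empty fragment holds no mail to point at
                mail_map[user].add(node)
    return dict(mail_map)

# Hypothetical hard state: node -> {user -> stored messages}
hard_state = {
    "node-a": {"alice": ["msg1"], "bob": ["msg2", "msg3"]},
    "node-b": {"alice": ["msg4"]},
    "node-c": {"carol": []},
}
soft_state = rebuild_mail_map(hard_state)
```

Because the map can always be recomputed this way, losing it costs a rescan rather than any data, which is why it does not need to be kept consistent on every update.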
To offer reliability, all mail is replicated on at least one other server.
As the evaluation points out, performance-wise this is where the
system gets expensive. In addition, to improve load balancing, mail for an
individual will be spread out over multiple servers. There is a trade-off
between the load balancing gains of spreading the data and the performance
hit of having to collect information from large numbers of nodes. One of the
important tuning factors discussed in the paper is how to strike this balance.
Evaluation of the system reveals that a dynamic spreading policy is optimal.
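The trade-off can be illustrated with a small sketch of a load-aware placement rule. This is only an illustration under stated assumptions, not the policy as implemented in Porcupine: pick_node, spread_limit, and the load numbers are all hypothetical. Once a user's mail already sits on spread_limit nodes, new mail is kept on those nodes; otherwise any node may be chosen, and in both cases the least loaded candidate wins.

```python
def pick_node(user, mail_map, load, spread_limit):
    """Choose a node to store a new message for `user`.

    mail_map: user -> set of nodes already holding that user's mail
    load: node -> current load (lower is better)
    spread_limit: maximum number of nodes one user's mail may span
    """
    current = mail_map.get(user, set())
    if len(current) >= spread_limit:
        candidates = current    # stop spreading, reuse existing nodes
    else:
        candidates = set(load)  # free to pick any node in the cluster
    # Sorting first makes ties resolve deterministically by node name.
    return min(sorted(candidates), key=lambda n: load[n])

load = {"node-a": 5, "node-b": 1, "node-c": 3}
mail_map = {"alice": {"node-a", "node-c"}}

tight = pick_node("alice", mail_map, load, spread_limit=2)
loose = pick_node("alice", mail_map, load, spread_limit=3)
```

With the tighter limit the message stays on a node that already holds alice's mail, so reads touch fewer nodes; with the looser limit it goes to the least loaded node in the cluster, which balances load better at the cost of a wider read.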
Porcupine contains many interesting techniques to deal with failures. The
goal is to manage change automatically, and judging from the evaluation, the
system largely succeeds. Most interesting is how management responsibility
is distributed. Changes in cluster membership are dealt with by whatever
node first notices a new node is available or a node isn't responding. The
user map is also reconfigured in the event of a membership change. This
broad distribution should make systems administration easy; if a disk needs
replacing, that node can be powered down without adversely affecting the
entire system.
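As a rough sketch of what reconfiguring the user map might involve, assume the map is a fixed-size list of buckets, each owned by one node, and that a user is assigned to a bucket by hashing the user name. The functions bucket_of and reconfigure, the use of CRC32, and the rule of handing orphaned buckets to the live node with the fewest buckets are all assumptions made for illustration. The sketch only handles departures; a newly joined node would receive no buckets here.

```python
import zlib

def bucket_of(user, num_buckets):
    """Map a user name to a bucket index with a stable hash."""
    return zlib.crc32(user.encode("utf-8")) % num_buckets

def reconfigure(user_map, live_nodes):
    """Reassign buckets owned by departed nodes, leaving the rest alone."""
    live = sorted(live_nodes)
    if not live:
        raise ValueError("no live nodes to take over buckets")
    # Count how many buckets each surviving node already owns.
    counts = {node: 0 for node in live}
    for owner in user_map:
        if owner in counts:
            counts[owner] += 1
    new_map = []
    for owner in user_map:
        if owner in counts:
            new_map.append(owner)  # owner is still alive: unchanged
        else:
            # Give the orphaned bucket to the least burdened live node.
            target = min(live, key=lambda n: counts[n])
            counts[target] += 1
            new_map.append(target)
    return new_map

old_map = ["a", "b", "c", "a", "b", "c"]
new_map = reconfigure(old_map, {"a", "b"})  # node "c" stopped responding
owner = new_map[bucket_of("alice", len(new_map))]
```

Only the buckets that belonged to the failed node move, so most users keep the same managing node across a membership change.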
It would be interesting to look at the Porcupine system in today's world of
huge email loads. It should be relatively easy to add spam-filtering and
virus-detection tools to the system.