From: Brian Milnes (brianmilnes_at_qwest.net)
Date: Wed Feb 25 2004 - 16:50:17 PST
Manageability, Availability and Performance in Porcupine: A Highly
Scalable, Cluster Based Mail Service - Saito, Bershad and Levy
The authors describe the motivations, design and experience with
a cluster-based mail service. The system is designed to be self
configuring, self healing and highly available, to not sacrifice single node
throughput, and to scale linearly with the number of nodes. They chose email
because its write intensity breaks TACC and other models.
The architecture requires all machines to perform all services, and it
automatically reconfigures and replicates data. State is handled as either
hard, i.e. it may not be lost, or soft, which may be reconstructed. Each node
runs a set of replicated data managers and the mail interface. The transaction
path is controlled by a map of where user data is and by which node has the
least current load.
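As a sanity check on my reading, here is a tiny Python sketch of that
transaction path. The hashing scheme, names and selection policy are my own
guesses, not the paper's exact mechanism:

import hashlib

def user_map_lookup(user, user_map, nodes):
    # The user map is a hash table mapping a user to the node that tracks
    # that user's soft state (profile, locations of mail fragments).
    bucket = int(hashlib.md5(user.encode()).hexdigest(), 16) % len(user_map)
    return nodes[user_map[bucket]]

def choose_storage_node(candidates, load):
    # Among the candidate nodes for storing the new message, pick the one
    # that currently reports the least load.
    return min(candidates, key=lambda n: load[n])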
If you don't use load balancers, what do you do when a server IP goes down?
Will the mail clients retry on a new IP, or do you appear down for a while?
The flow of control in Porcupine is very elegant. Users' data are
automatically replicated, and each new machine gets a copy of some slice of
the data. A user map is reconstructed on membership changes, which are
controlled by a membership protocol.
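Here is how I imagine the user map being rebuilt when membership changes; the
bucket reassignment policy below is an assumption of mine, purely to
illustrate the idea:

def rebuild_user_map(old_map, live_nodes):
    # Buckets whose owner has left the membership are handed to surviving
    # nodes; the new owners then reconstruct the soft state for those
    # buckets from the hard state on disk.
    new_map = list(old_map)
    for bucket, owner in enumerate(old_map):
        if owner not in live_nodes:
            new_map[bucket] = live_nodes[bucket % len(live_nodes)]
    return new_map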
The user data is replicated by a replication manager. The data may be updated
anywhere; updates are eventually consistent, total overwrites, lock free and
ordered by loosely synchronized clocks. An application process decides to
replicate storage, picks a replication manager by load, and sends out the
data with a list of nodes to receive it. This is very simple and very
cool. Why not choose a node to receive the data only if its load is below a
threshold? This is greatly simplified by the small size of email messages.
How do you handle re-replicating data on machine loss?
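A sketch of the update rule as I understand it: each update is a whole-object
overwrite stamped with the sender's loosely synchronized clock, and a replica
keeps whichever write carries the newest timestamp, so no locks are needed.
The class and function names here are mine, not the paper's:

import time

class Replica:
    def __init__(self):
        self.store = {}   # object id -> (timestamp, value)

    def apply_update(self, obj_id, timestamp, value):
        # Lock free and idempotent: keep the complete overwrite that
        # carries the newest timestamp.
        current = self.store.get(obj_id)
        if current is None or timestamp > current[0]:
            self.store[obj_id] = (timestamp, value)

def push_update(obj_id, value, replicas):
    # The initiating node stamps the update with its (loosely synchronized)
    # clock and sends the whole object to every node on the replica list.
    ts = time.time()
    for r in replicas:
        r.apply_update(obj_id, ts, value)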
The combination of using a ring and piggybacking load information on all RPCs
is very elegant. In this case, unlike in the TACC architecture, your tasks
are driven by where the data is, and so a more complicated load balancer is
warranted. Generalizing techniques like this to larger blocks of data and to
read and write load would be very useful for a larger set of applications.
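The piggybacking amounts to something like the following cache, where the
load figures arrive for free on ordinary RPC replies (the field and method
names are assumptions of mine):

class LoadTable:
    # Each node caches the most recent load figure seen from each peer;
    # observe() would be called whenever an RPC reply carries a load field.
    def __init__(self):
        self.loads = {}

    def observe(self, peer, load):
        self.loads[peer] = load

    def least_loaded(self, candidates):
        # Peers we have no data for yet are assumed idle so they still
        # receive some work and start reporting load.
        return min(candidates, key=lambda p: self.loads.get(p, 0))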
The benchmarks show very nice linear scalability, with the network being the
bottleneck because each message moves through the network four times. The
load balancing algorithm nicely handles changes to nodes that increase their
disk throughput. The benchmarks show quite a bit of variability in
messages-per-second throughput. Why? And why is the latency not shown
compared to a single POP service?
This is a much more interesting paper than TACC, as the management is
simplified and the problem of writing is handled. In the TACC system data
has to be delivered to make the system work, and that problem has been
ignored. The specialization to such a lightweight service is a very
interesting set of tradeoffs. What happens if you want to increase the
average message size by a factor of 2, 16, 32 or 64?