Porcupine Review

From: Brian Milnes (brianmilnes_at_qwest.net)
Date: Wed Feb 25 2004 - 16:50:17 PST


     Manageability, Availability and Performance in Porcupine: A Highly
    Scalable, Cluster-based Mail Service - Saito, Bershad and Levy

                The authors describe the motivation, design and experience of
    building a cluster-based mail service. The system is designed to be
    self-configuring, self-healing and highly available, to not sacrifice single
    node throughput, and to scale linearly with the number of nodes. They chose
    email because its write intensity breaks TACC and other read-mostly service
    models.

                The architecture requires all machines to be able to perform all
    services, and it automatically reconfigures and replicates data. State is
    classified as either hard, i.e. it may not be lost, or soft, which can be
    reconstructed. Each node runs a set of replicated data managers plus the mail
    interfaces. The delivery path is driven by a map of where each user's data
    lives and by which candidate node currently has the least load.
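
    As a concrete picture of that delivery path, here is how I imagine it in
    rough Python pseudocode; the bucket count, node names, load table and
    fragment list are all invented for illustration, not taken from the paper.

        # Hypothetical sketch of Porcupine-style delivery routing: hash the user
        # into the user map, then pick the least-loaded candidate node.
        import hashlib

        NODE_COUNT = 8                                               # assumed cluster size
        user_map = {b: f"node{b % NODE_COUNT}" for b in range(256)}  # bucket -> managing node
        load_table = {f"node{i}": 0 for i in range(NODE_COUNT)}      # soft state, refreshed by RPCs
        fragments: dict[str, list[str]] = {}                         # user -> nodes holding mail (soft)

        def bucket(user: str) -> int:
            """Hash a user name into one of the 256 user-map buckets."""
            return hashlib.md5(user.encode()).digest()[0]

        def deliver(user: str, message: bytes) -> str:
            """Find the user's manager node, then route the message to the
            least-loaded candidate among the nodes that may store it."""
            manager = user_map[bucket(user)]                 # tracks the user's profile/fragments
            candidates = fragments.get(user) or [manager]    # prefer nodes already holding mail
            target = min(candidates, key=lambda n: load_table.get(n, 0))
            if target not in fragments.setdefault(user, []):
                fragments[user].append(target)
            # ... forward `message` to `target` over an RPC here ...
            return target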

     If you don't use load balancers, what do you do when a server's IP goes
    down? Will the mail clients retry on a new IP, or do you appear down for a
    while? The flow of control in Porcupine is very elegant. User data is
    automatically replicated, and each new machine picks up a copy of some slice
    of the data. The user map is reconstructed on membership changes, which are
    detected and agreed upon by a membership protocol.
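
    My mental model of that rebuild is roughly the sketch below. This is my own
    Python with a bucket count I picked; the paper's actual reconfiguration
    protocol is more involved than this.

        # Hypothetical sketch: after the membership protocol agrees on a new view,
        # every node deterministically recomputes the same user map.
        def rebuild_user_map(live_nodes: list[str], buckets: int = 256) -> dict[int, str]:
            """Reassign each user-map bucket to one of the currently live nodes."""
            live = sorted(live_nodes)             # identical view -> identical map everywhere
            return {b: live[b % len(live)] for b in range(buckets)}

        # Example: node2 fails; the survivors all compute the same new map and then
        # reload soft profile/fragment state for the buckets they just took over.
        new_map = rebuild_user_map(["node0", "node1", "node3"])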

    User data is replicated by a replication manager. Updates may be applied at
    any replica; they are eventually consistent, expressed as total overwrites,
    lock free, and ordered by loosely synchronized clocks. An application process
    decides to replicate an object, picks a replication manager by load, and
    sends out the data along with the list of nodes that should receive it. This
    is very simple and very cool. Why not simply choose a node that is slated to
    receive the data whenever its load is below a threshold? All of this is
    greatly simplified by the small size of email messages. How is data
    re-replicated when a machine is lost?
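
    To make the update rule concrete, here is a minimal last-writer-wins sketch
    under my own assumptions; using wall-clock time plus a node id as the
    tiebreaker is my choice, not a detail taken from the paper.

        # Hypothetical sketch of total-overwrite, lock-free replication ordered
        # by loosely synchronized clocks.
        import time
        from dataclasses import dataclass, field

        @dataclass
        class Replica:
            node_id: str
            store: dict = field(default_factory=dict)   # key -> (timestamp, value)

            def apply(self, key: str, value: bytes, stamp: tuple) -> None:
                """Total overwrite: keep whichever version carries the larger stamp."""
                current = self.store.get(key)
                if current is None or stamp > current[0]:
                    self.store[key] = (stamp, value)

        def replicate(key: str, value: bytes, coordinator: str, targets: list[Replica]) -> None:
            """Any node may coordinate: it stamps the update once and pushes it to
            every node in the chosen target set; no replica waits on another."""
            stamp = (time.time(), coordinator)   # loosely synchronized clock + id tiebreak
            for r in targets:
                r.apply(key, value, stamp)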

    The combination of using a ring and piggybacking load information on all RPCs
    is very elegant. In this case, unlike in the TACC architecture, tasks are
    driven by where the data lives, so a more sophisticated load balancer is
    warranted. Generalizing techniques like this to larger blocks of data and to
    read and write load would be very useful for a larger set of applications.
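
    The piggybacking itself needs almost no machinery; something like the sketch
    below is how I picture it, with all the field names being my own invention.

        # Hypothetical sketch: every RPC reply carries the sender's current load,
        # and each node keeps a soft, possibly stale view of everyone else's load.
        from dataclasses import dataclass, field

        @dataclass
        class RpcReply:
            payload: bytes
            sender: str
            pending_ops: int                     # sender's load, attached for free

        @dataclass
        class LoadView:
            table: dict = field(default_factory=dict)

            def observe(self, reply: RpcReply) -> None:
                """Update the soft load table from a reply we received anyway."""
                self.table[reply.sender] = reply.pending_ops

            def least_loaded(self, candidates: list[str]) -> str:
                """Pick a target among candidates using the (possibly stale) view."""
                return min(candidates, key=lambda n: self.table.get(n, 0))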

    The benchmarks show very nice linear scalability, with the network becoming
    the bottleneck because each message crosses the network four times. The load
    balancing algorithm handles nodes that increase their disk throughput nicely.
    The benchmarks also show quite a bit of variability in messages-per-second
    throughput. Why? And why is latency not shown in comparison with a single
    node POP service?

     This is a much more interesting paper than TACC, as management is
    simplified and the problem of writes is actually handled. In the TACC system
    data still has to be delivered to make the system work, and that problem was
    ignored. The specialization to such a lightweight service is a very
    interesting set of tradeoffs. What happens if you want to increase the
    average message size by a factor of 2, 16, 32 or 64?

