Review of "Porcupine"

From: Honghai Liu (liu789_at_hotmail.com)
Date: Wed Feb 25 2004 - 16:09:18 PST

    Reviewer: Honghai Liu

    The paper presents Porcupine, an e-mail service built on a cluster of commodity PCs that achieves
    self-manageability, high availability and scalable performance.

    The principal goal of Porcupine is functional homogeneity: any node can perform any part of the
    system's work. As a result, failures and load imbalances in any part of the system can be made
    transparent to users. More importantly, the system can scale with demand simply by adding nodes.

    The contrast between Porcupine and the BASE-based system by Fox et al. is intriguing. First, Fox's
    system is read-intensive, so caching plays a significant role in achieving high performance. E-mail,
    the application Porcupine targets, is a write-intensive workload where caching helps much less.
    Second, Fox's system contains different types of components, and load balancing and failure
    detection can only be achieved among components of the same type, so the system is layered and the
    responsibilities of each role must be predefined. In contrast, each node in Porcupine is a complete
    element and is therefore interchangeable with any other. As a result, both the management and the
    system architecture are flat, which is more attractive because administering, maintaining and
    growing the system become easier.

    Data structures are divided into two groups: hard state and soft state. Hard state is information
    that cannot be lost and must be kept in stable storage; soft state can be lost because it can easily
    be recomputed from existing hard state. As with BASE semantics, soft state is used heavily because
    of its performance advantage and the relaxed consistency requirements of e-mail.
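    To make the hard/soft distinction concrete, here is a minimal Python sketch under a simplified,
    assumed model: each node's disk (hard state) records which users have mailbox fragments stored
    there, and a map from user to nodes (soft state) is recomputed by scanning that hard state. The
    names hard_state and rebuild_mail_map are hypothetical and only loosely modeled on the paper.

```python
from collections import defaultdict

# Hard state (hypothetical layout): for each node, the users whose mailbox
# fragments live on that node's disk. This must survive crashes.
hard_state = {
    "node-a": {"alice", "bob"},
    "node-b": {"alice", "carol"},
    "node-c": {"bob"},
}


def rebuild_mail_map(fragments_by_node):
    """Recompute soft state: user -> set of nodes holding that user's mail.

    Nothing returned here needs to be persisted, because it can always be
    regenerated by scanning the hard state again after a crash.
    """
    mail_map = defaultdict(set)
    for node, users in fragments_by_node.items():
        for user in users:
            mail_map[user].add(node)
    return dict(mail_map)


mail_map = rebuild_mail_map(hard_state)
print(sorted(mail_map["alice"]))  # ['node-a', 'node-b']
```

    Losing mail_map costs only the time of one scan; losing hard_state would lose mail, which is why
    only the latter has to live in stable storage.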

    Self-management (dynamic reconfiguration) is provided by a membership service, which uses the Three
    Round Membership (TRM) protocol to handle nodes leaving or joining, together with soft state
    reconstruction. Porcupine's replication scheme provides high availability by using consistency
    semantics weaker than strict single-copy consistency.
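    One common way to obtain consistency weaker than single-copy, and the assumption behind this
    Python sketch, is last-writer-wins: every update carries a stamp, and each replica keeps only the
    newest one, so replicas that receive the same updates in any order converge. The Replica class and
    the (timestamp, node_id) stamp format are hypothetical; this is an illustration of the idea, not
    Porcupine's exact protocol.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Replica:
    """One copy of a replicated object (hypothetical model).

    Every write carries a (timestamp, node_id) stamp. A replica keeps only
    the write with the largest stamp, so ties are broken deterministically.
    """
    value: Optional[str] = None
    stamp: Tuple[int, str] = (0, "")

    def apply(self, value: str, stamp: Tuple[int, str]) -> None:
        # Older or duplicate updates are ignored, which makes apply()
        # idempotent and independent of delivery order.
        if stamp > self.stamp:
            self.value, self.stamp = value, stamp


updates = [("v1", (1, "node-a")), ("v2", (2, "node-b")), ("v3", (2, "node-c"))]

r1, r2 = Replica(), Replica()
for value, stamp in updates:            # r1 receives updates in order
    r1.apply(value, stamp)
for value, stamp in reversed(updates):  # r2 receives them in reverse order
    r2.apply(value, stamp)

print(r1.value, r2.value)  # v3 v3
```

    The price is that a replica which has not yet received the newest update may briefly serve stale
    data, a trade-off the relaxed semantics of e-mail can tolerate.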

    Load management is fine-grained and dynamic, and load-balancing decisions are not centralized
    (because every node plays exactly the same role). Load information is gathered as a side effect of
    RPC traffic and through a virtual ring, and is used to send work to the least-loaded node. Spread is
    a soft upper bound on the number of distinct nodes on which a user's mail can be stored. In most
    cases, setting spread to 2 gives the best balance between performance and availability.
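    A minimal Python sketch of the delivery decision, under stated assumptions: load is modeled as a
    count of pending operations learned from RPC replies (the virtual ring is not modeled), the
    candidate set is the nodes already holding the user's mail topped up with randomly chosen nodes
    until it reaches spread, and the least-loaded candidate wins. LoadTracker and pick_delivery_node are
    hypothetical names, and this illustrates the idea rather than reproducing Porcupine's algorithm.

```python
import random


class LoadTracker:
    """Cached view of other nodes' load, as seen by one node (hypothetical).

    Load figures are refreshed whenever an RPC reply arrives, modeling
    load information gathered as a side effect of ordinary RPC traffic.
    """

    def __init__(self):
        self.load = {}  # node name -> pending operations last reported

    def on_rpc_reply(self, node, pending_ops):
        self.load[node] = pending_ops


def pick_delivery_node(user_nodes, all_nodes, tracker, spread, rng=random):
    """Choose the node that should store a new message for one user.

    user_nodes: nodes that already hold some of this user's mail.
    spread: limit on how many distinct nodes should hold it.
    """
    candidates = sorted(user_nodes)
    if len(candidates) < spread:
        # Below the spread limit: top up with randomly chosen other nodes.
        others = [n for n in all_nodes if n not in user_nodes]
        k = min(spread - len(candidates), len(others))
        candidates.extend(rng.sample(others, k))
    # Nodes with no load report yet are assumed idle (load 0).
    return min(candidates, key=lambda n: tracker.load.get(n, 0))


tracker = LoadTracker()
for node, ops in [("n1", 5), ("n2", 1), ("n3", 9), ("n4", 3)]:
    tracker.on_rpc_reply(node, ops)

# The user's mail is already on n1 and n3; spread=2 adds no new node.
print(pick_delivery_node({"n1", "n3"}, ["n1", "n2", "n3", "n4"], tracker, spread=2))  # n1
```

    A larger spread gives the balancer more choices but scatters a user's mail over more nodes; this
    sketch never exceeds spread, so it does not capture what makes the bound "soft".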

    As for weaknesses, Porcupine's architecture is flat, so every node may talk to every other node.
    The network will therefore certainly become a bottleneck when the number of nodes grows into the
    hundreds or thousands. Managing such a large number of nodes without any strict layering may also
    be a legitimate concern for network administrators: it is hard to identify and isolate a problem
    when every node is the same.
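    A back-of-the-envelope count shows why the flat design raises this concern (assuming, as a worst
    case, that any node may need to communicate with any other): the number of distinct node pairs
    P(N) grows quadratically with the number of nodes N.

```latex
P(N) = \binom{N}{2} = \frac{N(N-1)}{2}, \qquad P(100) = 4\,950, \qquad P(1000) = 499\,500
```

    This counts possible communication paths, not measured traffic, so it bounds the problem rather
    than proving that the network saturates.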

    In terms of relevance to today's technology, Porcupine shows an interesting way to take advantage
    of increasingly powerful commodity PCs to build a scalable, self-managing system that can perform as
    well as, or even better than, a centralized commercial system.
