Fox et al. "Cluster-based Scalable Network Services"

From: Cliff Schmidt (cliff_at_bea.com)
Date: Wed Feb 25 2004 - 02:56:15 PST

  • Next message: Gail Rahn: "Review of "Cluster-Based Scalable Network Services" by Fox et al"

    This paper describes an architecture for scalable network services, with
    an emphasis on scalability, availability, and cost effectiveness. All
    of these factors are achieved through the use of clusters of commodity
    workstations. The paper begins with a note about Multics and later
    mentions Grapevine, but I also couldn't help seeing similarities to
    ISIS -- both share the goal of creating a framework on which other
    applications can run without having to worry about certain key
    factors (for ISIS those factors had to do with message delivery
    order, which is not closely related to this paper).

    The main thing about using clusters instead of one large SMP is that
    you get lots of inherent redundancy, but you have to use lots of smart
    software to make it all work together as one unit. One of the main
    points of this paper, however, is that there are guarantees that can
    be relaxed -- primarily around consistency and durability. In fact,
    the authors use the term "BASE: basically available, soft state,
    eventual consistency" to contrast with the standard ACID term (atomic,
    consistent, isolated, durable). The point here is that most network
    services can get away without having ACID-like guarantees; some data
    can be stale or even temporarily out of synch with other data. By
    relaxing these guarantees, this architecture is able to optimize its
    goals of scalability, availability, and cost effectiveness. The
    reference to "soft state" means that much of the state on various
    nodes never needs to be written to disk, provided there is some way
    to regenerate it after a failure, even if doing so requires
    relatively expensive I/O at that time. It's also worth noting that the paper
    isn't discouraging the use of ACID transactions in all cases; in
    fact, the system used ACID in some cases, but there are many places
    in any system where BASE properties are more appropriate and provide
    substantial side benefits.
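
The soft-state idea above can be made concrete with a small sketch. This is my own illustration, not code from the paper: a cache whose contents live only in memory, so a crash costs nothing but a (relatively expensive) recomputation. The class and function names are hypothetical.

```python
# Hypothetical illustration of BASE-style soft state (not from the paper):
# state is never written to disk; losing it only costs a regeneration.
class SoftStateCache:
    def __init__(self, regenerate):
        self._regenerate = regenerate  # expensive recovery path, e.g. re-fetch
        self._data = {}

    def get(self, key):
        if key not in self._data:      # miss, or first access after a crash
            self._data[key] = self._regenerate(key)
        return self._data[key]

    def crash(self):
        self._data.clear()             # simulate node failure: state is gone

calls = []
cache = SoftStateCache(lambda k: (calls.append(k), k.upper())[1])
assert cache.get("a") == "A" and cache.get("a") == "A"  # second hit is free
cache.crash()
assert cache.get("a") == "A"   # regenerated after failure, still correct
assert calls == ["a", "a"]     # cost of the crash: one extra rebuild
```

The point is that correctness survives the crash; only performance pays, which is exactly the trade against ACID durability.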

    In a couple of places the authors refer to a programming model for
    service authoring, but I didn't feel it was adequately explained;
    I would have liked to learn more about that model.

    The system was tested with TranSend, a Web caching and data
    transformation service for UC Berkeley. There is also a discussion
    of Inktomi's HotBot search engine, but that is done mainly for
    comparison since the search engine has never actually been deployed
    on the system described by the authors (although it is an interesting
    heavily used production example with similar characteristics).

    The architecture of the system is divided into three layers, the
    service layer (think applications), the TACC layer (transformation,
    aggregation, caching, and customization), and the SNS (Scalable
    Network Service support). This functionally static partitioning
    makes it possible to maintain a pool of simple, stateless worker
    nodes that can be swapped in and out quite easily.

    The one component that is not distributed is the manager. Used for
    load balancing, the centralized manager transmits hints to the front
    ends to help them make dynamic scheduling decisions. This decision to
    centralize the manager was intentional for ease of implementation
    and policy enforcement; the authors also noted that it was never
    a bottleneck and that fault tolerance was addressed in other ways.
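
The hint mechanism might look something like the following sketch. The names (Manager, FrontEnd) and the least-loaded ordering policy are my own assumptions for illustration, not necessarily the paper's exact scheme; the key shape is that the manager aggregates load reports, and front ends make their scheduling decisions locally from cached hints.

```python
# Hypothetical sketch of hint-based load balancing (names and the
# least-loaded policy are illustrative, not taken from the paper).
class Manager:
    def __init__(self):
        self.queue_lengths = {}          # worker -> last reported queue length

    def report(self, worker, qlen):      # workers report load periodically
        self.queue_lengths[worker] = qlen

    def hint(self):                      # ordering broadcast to front ends
        return sorted(self.queue_lengths, key=self.queue_lengths.get)

class FrontEnd:
    def __init__(self):
        self.preferred = []

    def receive_hint(self, ordering):    # cache the hint; no per-request
        self.preferred = ordering        # round trip to the manager

    def dispatch(self):
        return self.preferred[0] if self.preferred else None

m = Manager()
m.report("w1", 5); m.report("w2", 1); m.report("w3", 3)
fe = FrontEnd()
fe.receive_hint(m.hint())
assert fe.dispatch() == "w2"             # least-loaded worker chosen
```

Because front ends act on cached hints, the manager stays off the request path, which is consistent with the authors' observation that it was never a bottleneck.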

    Here are a few other quick notes:

    - burstiness was discussed a few times -- an overflow pool exists
    for the sole purpose of handling infrequent but critical bursts
    of load, as well as longer-lived sustained loads.
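
A minimal sketch of that overflow idea, under my own assumptions (fixed per-worker capacity, a dedicated pool preferred over a reserve pool); the paper's actual sizing policy may differ:

```python
# Hypothetical overflow-pool sketch: bursts beyond the dedicated workers'
# capacity spill onto reserve machines. Names and numbers are illustrative.
class WorkerPool:
    def __init__(self, dedicated, overflow, per_worker_capacity):
        self.dedicated = dedicated
        self.overflow = overflow
        self.cap = per_worker_capacity

    def workers_for(self, offered_load):
        needed = -(-offered_load // self.cap)     # ceiling division
        if needed <= len(self.dedicated):
            return self.dedicated[:needed]        # normal operation
        extra = needed - len(self.dedicated)
        return self.dedicated + self.overflow[:extra]  # burst: borrow

pool = WorkerPool(["d1", "d2"], ["o1", "o2"], per_worker_capacity=10)
assert pool.workers_for(15) == ["d1", "d2"]        # within dedicated capacity
assert pool.workers_for(25) == ["d1", "d2", "o1"]  # burst borrows one reserve
```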

    - I liked the way that the manager periodically "beacons its
    existence on an IP multicast group to which other components
    subscribe." This allows a new node to simply listen on a hardwired
    channel at startup, find the manager, and register itself. Timeouts
    are used as failure indications. Soft state means
    mirroring isn't necessary since the state can be regenerated.
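
The timeout-as-failure-indication logic can be sketched as follows. This is my own illustration: the actual IP multicast transport is elided, and beacon timestamps are passed in explicitly so the logic is visible; the Registry name and the 3-second timeout are assumptions.

```python
# Hypothetical sketch of beacon registration with timeout-based failure
# detection (multicast transport elided; names and timeout are illustrative).
class Registry:
    TIMEOUT = 3.0                      # seconds of silence => presumed dead

    def __init__(self):
        self.last_seen = {}

    def beacon(self, node, now):       # node found the manager; register
        self.last_seen[node] = now     # or refresh its liveness timestamp

    def live_nodes(self, now):
        return {n for n, t in self.last_seen.items() if now - t < self.TIMEOUT}

r = Registry()
r.beacon("n1", now=0.0)
r.beacon("n2", now=0.0)
r.beacon("n1", now=2.0)                 # n1 keeps beaconing; n2 goes silent
assert r.live_nodes(now=4.0) == {"n1"}  # n2 timed out; since its state is
                                        # soft, it is simply regenerated when
                                        # the node returns -- no mirroring
```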

    - Also liked that peers monitor other peers. If a peer notices
    that a node isn't responding, it can notify the manager and even
    restart the node itself.
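
The peer-monitoring behavior reduces to a simple supervision loop; a minimal sketch, with all function names hypothetical:

```python
# Hypothetical sketch of peer supervision: on silence, notify the manager
# and attempt a local restart. All names here are illustrative.
def supervise(peer_alive, notify_manager, restart_peer):
    if not peer_alive():
        notify_manager("peer down")    # tell the centralized manager
        restart_peer()                 # peers may restart the node directly
        return "restarted"
    return "ok"

events = []
assert supervise(lambda: True, events.append,
                 lambda: events.append("restart")) == "ok"
assert supervise(lambda: False, events.append,
                 lambda: events.append("restart")) == "restarted"
assert events == ["peer down", "restart"]   # only the dead peer triggers action
```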

    - There was an emphasis on how quickly most of the pieces of the
    TranSend software were written. The point being that the TACC and
    SNS framework allowed the developer to focus on the content-
    specific tasks. This is the type of observation that reminded me of
    the ISIS paper.



    This archive was generated by hypermail 2.1.6 : Wed Feb 25 2004 - 02:56:16 PST