From: Cliff Schmidt (cliff_at_bea.com)
Date: Wed Feb 25 2004 - 02:56:15 PST
This paper describes an architecture for network services built on
clusters of commodity workstations, with an emphasis on scalability,
availability, and cost effectiveness. The paper begins with a quote
about Multics and later
mentions Grapevine, but I also couldn't help seeing similarities to
ISIS -- from the point of view of having a goal to create a framework
on which other applications could run and not have to worry about key
factors (for ISIS these factors had to do with message delivery order,
which is not very related to this paper).
The main thing about using clusters instead of one large SMP is that
you get lots of inherent redundancy, but you have to use lots of smart
software to make it all work together as one unit. One of the main
points of this paper, however, is that there are guarantees that can
be relaxed -- primarily around consistency and durability. In fact,
the authors use the term "BASE: basically available, soft state,
eventual consistency" to contrast with the standard ACID term (atomic,
consistent, isolated, durable). The point here is that most network
services can get away without having ACID-like guarantees; some data
can be stale or even temporarily out of synch with other data. By
relaxing these guarantees, this architecture is able to optimize its
goals of scalability, availability, and cost effectiveness. The
reference to "soft state" means that lots of state on various nodes
can avoid being written to disk, if on failure there is a way for
it to somehow recover the state, even if it means relatively
expensive I/Os at that time. It's also worth noting that the paper
isn't discouraging the use of ACID transactions in all cases; in
fact, the system used ACID in some cases, but there are many places
in any system where BASE properties are more appropriate and provide
substantial side benefits.
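To make the soft-state idea concrete, here is a rough sketch of how I
picture it (the class and names below are my own, not anything from
the paper): a per-node cache in Python whose entries are never
persisted and are simply regenerated, at some I/O cost, after expiry
or a crash.

    import time

    class SoftStateCache:
        # Per-node soft state: entries are never written to disk.
        # They carry a timestamp, expire after a TTL, and vanish on a
        # restart; the regenerate() path rebuilds them from the
        # authoritative source at the cost of a more expensive fetch.

        def __init__(self, regenerate, ttl_seconds=300):
            self._regenerate = regenerate   # expensive recovery function
            self._ttl = ttl_seconds
            self._entries = {}              # key -> (value, timestamp)

        def get(self, key):
            entry = self._entries.get(key)
            if entry is not None:
                value, stamp = entry
                if time.time() - stamp < self._ttl:
                    return value            # possibly stale, and that's OK (BASE)
            # Miss, expiry, or post-crash empty table: pay the I/O now.
            value = self._regenerate(key)
            self._entries[key] = (value, time.time())
            return value

After a crash the dictionary is simply empty and the next request
repopulates it; nothing durable is ever required of the node.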
In a couple of places the authors refer to a programming model for
service authoring, but I didn't feel it was adequately explained; I
would have liked to learn more about it.
The system was tested with TranSend, a Web caching and data
transformation service for UC Berkeley. There is also a discussion
of Inktomi's HotBot search engine, but mainly for comparison, since
the search engine has never actually been deployed on the framework
described by the authors (although it is an interesting example of
heavy production use with similar characteristics).
The architecture of the system is divided into three layers: the
service layer (think applications), the TACC layer (transformation,
aggregation, caching, and customization), and the SNS layer (Scalable
Network Service support). This functionally static partitioning
allows for a pool of simple, stateless worker nodes that can be
swapped in and out quite easily.
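To illustrate what such a stateless worker might look like, here is a
toy Python sketch (my own naming and logic, not the paper's actual
API) of a TACC-style transformation worker that distills HTML for a
low-bandwidth client. Because it keeps no state between calls, the
SNS layer can run as many copies as the load demands and restart or
discard any of them at will.

    import re

    def distill_html(data: bytes, max_chars: int = 2000) -> bytes:
        # Hypothetical TACC-style transformation worker: crude HTML
        # distillation. No state is kept between calls, so any
        # instance can handle any request.
        text = re.sub(rb"<[^>]+>", b" ", data)      # drop markup
        text = re.sub(rb"\s+", b" ", text).strip()  # collapse whitespace
        return text[:max_chars]                     # fit the client's budget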
The one thing that is not distributed is the Manager. Used for load
balancing, the centralized manager transmits hints to the front ends
to help them make dynamic scheduling decisions. The manager was
centralized intentionally, for ease of implementation and policy
enforcement; the authors also noted that it was never a bottleneck
and that its fault tolerance was addressed in other ways.
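My mental model of the hint mechanism is something like the toy
Python below (the class and method names are invented for
illustration): workers report queue lengths to the manager, the
manager publishes a snapshot of that table as a hint, and each front
end makes its own scheduling decision from the most recent, possibly
slightly stale, hint it has seen.

    import random

    class Manager:
        # Centralized manager: collects load reports, publishes hints.
        def __init__(self):
            self._load = {}                 # worker address -> queue length

        def report_load(self, worker, queue_length):
            self._load[worker] = queue_length

        def current_hint(self):
            # A hint is just a copy of the load table; front ends cache it.
            return dict(self._load)

    class FrontEnd:
        # Front end: makes local scheduling decisions from the latest hint.
        def __init__(self):
            self._hint = {}

        def receive_hint(self, hint):
            self._hint = hint

        def pick_worker(self):
            if not self._hint:
                return None
            # Least-loaded worker according to the (possibly stale) hint.
            least = min(self._hint.values())
            candidates = [w for w, q in self._hint.items() if q == least]
            return random.choice(candidates)

Because the hints are advisory and a little stale, a briefly
unavailable manager only degrades the quality of load balancing; the
front ends keep serving requests with the last hint they received.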
Here are a few other quick notes:
- burstiness was discussed a few times -- an overflow pool exists
for the sole purpose of handling the infrequent but critical bursts
of load, as well as longer-lived sustained loads.
- I liked the way that the manager periodically "beacons its
existence on an IP multicast group to which other components
subscribe." This allows new nodes to simply listen on a hardwired
channel at startup, find the manager, and register themselves (see
the sketch after these notes). Timeouts are used as failure
indications. Soft state means mirroring isn't necessary since the
state can be regenerated.
- Also liked that peers monitor other peers. If a peer notices
that a node isn't responding, it can notify the manager and even
restart the node itself.
- There was an emphasis on how quickly most of the pieces of the
TranSend software were written, the point being that the TACC and
SNS framework let the developers focus on the content-specific
tasks. This is the type of observation that reminded me of
the ISIS paper.
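Here is a rough Python sketch of the beaconing/registration idea
mentioned above (the multicast group, port, and message format are
made up for the example): the manager periodically announces its
address on a well-known multicast group, and a new node just joins
that group at startup, waits for a beacon, and then registers itself
with the manager; a timeout while waiting doubles as a failure
indication.

    import json
    import socket
    import struct
    import time

    # Hardwired multicast channel every node knows at startup
    # (group/port values invented for this example).
    BEACON_GROUP, BEACON_PORT = "239.1.2.3", 5007

    def beacon_loop(manager_host, manager_port, interval=1.0):
        # Manager side: periodically announce where registrations go.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                             socket.IPPROTO_UDP)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        msg = json.dumps({"host": manager_host,
                          "port": manager_port}).encode()
        while True:
            sock.sendto(msg, (BEACON_GROUP, BEACON_PORT))
            time.sleep(interval)

    def wait_for_manager(timeout=10.0):
        # Worker side: join the hardwired group and wait for a beacon.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM,
                             socket.IPPROTO_UDP)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", BEACON_PORT))
        membership = struct.pack("4sl",
                                 socket.inet_aton(BEACON_GROUP),
                                 socket.INADDR_ANY)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        membership)
        sock.settimeout(timeout)
        # A socket.timeout here is treated as a failure indication.
        data, _ = sock.recvfrom(1024)
        info = json.loads(data)
        return info["host"], info["port"]   # now register with the manager

Since everything learned this way is soft state, nothing needs to be
mirrored; a node that misses a beacon simply rebuilds its view from
the next one.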