From: Raz Mathias (razvanma_at_exchange.microsoft.com)
Date: Wed Feb 25 2004 - 17:38:33 PST
Today's paper introduced us to the concept of clustering. Unlike
transaction-processing systems, the cluster-based services described here
do not guarantee the ACID properties; some of those strong guarantees are
given up in exchange for greater scalability, higher uptime, and lower cost.
The paper takes a layering approach to developing cluster architectures.
The lowest layer is called the Scalable Network Service (SNS) layer.
This layer supports load balancing, overflow management, front-end
availability, fault tolerance, and monitoring and logging. The middle
layer implements Transformation, Aggregation, Caching, and Customization
(TACC). Finally, at the highest layer we have the applications
themselves. When implementing at this layer we do not have to think about
how scalability and fault tolerance are handled, since they are abstracted
away by the layers below.
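
To make the division of labor concrete, here is a minimal sketch (in
Python, with hypothetical names, not the paper's actual API) of what an
application author would write at the TACC layer: small transformation and
aggregation functions, with dispatch, load balancing, and restarts left to
the SNS layer below.

from typing import Callable, List

# A transformation operates on a single data item, e.g. distilling an
# image or stripping an HTML page down for a slow link.
Transformation = Callable[[bytes], bytes]

# An aggregation combines several partial results, e.g. merging hits
# from multiple search partitions into one response.
Aggregation = Callable[[List[bytes]], bytes]

def strip_images(page: bytes) -> bytes:
    # Toy transformation: pretend to strip heavyweight content.
    return page.replace(b"<img", b"<!-- img")

def concatenate(parts: List[bytes]) -> bytes:
    # Toy aggregation: merge partial results into one response.
    return b"\n".join(parts)

def handle_request(pages: List[bytes],
                   transform: Transformation,
                   aggregate: Aggregation) -> bytes:
    # The application author writes only transform/aggregate; the SNS
    # layer (not shown) decides which worker runs each call and retries
    # when a worker dies.
    return aggregate([transform(p) for p in pages])
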
The paper defines a cute-sounding alternative to the ACID properties
dubbed BASE, which stands for Basically Available, Soft State,
Eventual Consistency. The lack of strong guarantees in this system makes it
viable under specific circumstances. These include times when the data
is read-only, and no session information (i.e. state) needs to be
maintained across calls to the worker processes. Search engines and
proxy servers are ideally suited to this type of model. Introducing
state to this system would be detrimental, as nodes would have prolonged
dependencies on each other and the state could not easily be transformed
into "soft state" (e.g. the state needs to be stored somewhere). Soft
state is a new concept introduced by this paper; it can be considered to
mean calculated state, or state that need not be stored. The fact that
storage is not necessary for this state allows us to avoid the costs
involved with replicating it and backing up data. Soft state trades off
some latency in the case of failure for simplified management and better
scalability (because there is no shared resource such as a database that
stores state upon which processes idly block).
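
A small sketch of what I understand by soft state (my own illustration,
not code from the paper): anything cached can be rebuilt from the request
itself, so losing it in a crash costs only latency, never correctness.

import time

class SoftStateCache:
    # Soft state: a cache whose contents can always be recomputed, so it
    # is never replicated or backed up. Losing it just means recomputing.
    def __init__(self, recompute, ttl_seconds=30.0):
        self._recompute = recompute      # pure function: key -> value
        self._ttl = ttl_seconds
        self._entries = {}               # key -> (value, expiry time)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]              # fast path: answer from soft state
        # Miss, expired, or the node restarted and lost the dict entirely:
        # recompute and move on. Nothing unrecoverable was lost.
        value = self._recompute(key)
        self._entries[key] = (value, time.time() + self._ttl)
        return value

cache = SoftStateCache(recompute=lambda q: "results for " + q)
print(cache.get("ocean"))   # computed
print(cache.get("ocean"))   # served from soft state
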
The bottom layer provides the fabric for the system's scalability. It
replicates components both for fault tolerance and for scalability,
balances load across machines, and assigns tasks to an overflow pool of
workers during bursts of high demand. An interesting facet of this layer
is that it relies on peer fault tolerance: when a node fails, the other
nodes in the system are responsible for detecting the failure (using a
timeout or ping mechanism) and restarting the faulting process. Worker
stubs (on the worker nodes) and manager stubs (on the front ends) let the
application programmer essentially ignore the requirements of high
scalability; they need only worry about their own single-instance
performance (in terms of latency). This
abstraction allows us to build Internet-scalable applications from
off-the-shelf components.
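
The peer fault-detection loop might look roughly like the following
(again a hypothetical sketch; ping and restart_worker stand in for
whatever probe and watchdog mechanism a real deployment would use):

import time

HEARTBEAT_INTERVAL = 1.0     # seconds between probes
MISSES_BEFORE_RESTART = 3    # tolerate transient slowness before acting

def ping(node):
    # Stand-in for a real liveness probe (an RPC or UDP packet); here we
    # simply pretend every node answers.
    return True

def restart_worker(node):
    # Stand-in for asking the node's local watchdog to respawn the process.
    print("restarting worker on", node)

def monitor(peers, rounds):
    missed = {p: 0 for p in peers}
    for _ in range(rounds):
        for p in peers:
            if ping(p):
                missed[p] = 0
            else:
                missed[p] += 1
                if missed[p] >= MISSES_BEFORE_RESTART:
                    restart_worker(p)    # peer declared dead; restart it
                    missed[p] = 0
        time.sleep(HEARTBEAT_INTERVAL)

monitor(["node1", "node2"], rounds=3)
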
It is interesting that this paper defines a scalable network service as
requiring incremental scalability, overflow growth, and 24x7
availability. I prefer the definition of a scalable system presented in
the Porcupine paper, since it also includes the critical property of
manageability. The trend toward less state (and soft state) in each
component is also interesting (and new?). It limits the kinds of
applications we can build, and we may give up a bit of performance in the
case of failures (relative to smaller systems where there is no contention
for shared, hard state), but the approach does seem to scale better, which
is the goal when supporting applications that serve Internet-sized
populations of clients.