From: Jeff Duzak (jduzak_at_exchange.microsoft.com)
Date: Wed Feb 25 2004 - 11:10:25 PST
This paper describes a reusable architecture for cluster-based services.
In particular, the architecture is useful for services which are
extremely parallel and do not require strict coherence. The
architecture trades strict semantics for availability, performance, and
scalability. The primary example that is given, TranSend, is a web
proxy which reduces the size of returned web pages primarily by reducing
picture quality.
Clusters in general have a number of useful characteristics. First, a
cluster is much cheaper than a single machine that could be capable of
doing the same work. A cluster can be composed of commodity machines
purchased from any vendor. Second, a cluster naturally lends itself to
reliability. If a single machine in the cluster goes down, there is no
reason why the entire cluster must go down. Third, a cluster lends
itself to easy upgrade. When demand on the cluster grows, new machines
can be added to the cluster.
By abandoning strict ACID semantics, the authors increase the
parallelism of the system. For instance, load balancing decisions are
made based on information that is not always up to date. Load balancing
information is gathered periodically by a manager component from all the
worker components, and then broadcast to the manager stubs on the front
end components. Each manager stub then uses this information to decide
where to forward a request. Hence, the load balancing information is
not coherent. Load may change, and load balancing decisions may be made
with old data. However, the information is good enough to achieve the
desired load balancing. Further, the system allows manager stubs to
work in parallel, not having to contact a central manager for each
request. However, ACID semantics are still maintained for parts of the
system that need it, such as user profiles.
The authors then describe the TranSend system which was built using this
architecture. The TranSend system demonstrates that the architecture
does achieve the goals of scalability and availability, and also shows
that the architecture is easily reusable. Beyond the implementation of
the base SNS (scalable network service) components, the work to build
the TranSend system included building the worker nodes and creating the
front end. The worker nodes were picture and HTML distillers built
mostly from existing code. The front end consisted of the logic for
passing data to workers and returning it to the client. The TranSend
system was then tested, and found to scale linearly all the way up to
the maximum load supported by the equipment available to the authors.
A second system, HotBot, was described. The discussion of this system
seemed a bit out of place, as this system did not use the architecture
described by the paper. Further, the HotBot system did not use dynamic
load balancing, a key feature of the SNS architecture. It seems that
the HotBot system was described primarily to further the argument that
BASE semantics, and clustering in general, allow for good scalability
and availability.
The main idea of the paper, that is, trading strict semantics for better
scalability and availability, seems a very good idea for some systems.
Another interesting theme of the paper was that approximate answers are
sometimes good enough, and the related idea that a system need not try
to be correct in all cases. An example of that idea in this paper was
the Bay Area Culture page, in which incorrect data may be reported, but
it is assumed that the user can just ignore that data.
This archive was generated by hypermail 2.1.6 : Wed Feb 25 2004 - 11:10:31 PST