From: Cem Paya (cemp_at_microsoft.com)
Date: Wed Feb 25 2004 - 15:56:11 PST
Review: Cluster based scalable network services
Cem Paya, CSE551P
In this paper the authors describe a general-purpose architecture for
building network services using a cluster of off-the-shelf hardware.
They set out three requirements at the outset: scalability, availability
and cost-effectiveness. The first two are relatively easy to measure,
although the view expressed here is somewhat simplistic. 24x7
availability is rarely achieved; in practice availability is expressed
in "nines", e.g. four nines corresponds to 99.99% uptime. Similarly,
scaling in the real world is rarely linear in the number of identical
components added. Cost-effectiveness isn't as clear-cut, and their use
of Sun SPARC workstations as opposed to commodity x86 PCs is
questionable in that regard.
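As a quick aside, the "nines" shorthand translates directly into a
downtime budget. This is just standard arithmetic, not something from
the paper; a small Python sketch:

    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

    # Downtime budget implied by N nines of availability.
    for nines in range(2, 6):
        availability = 1 - 10 ** -nines           # e.g. 4 nines -> 0.9999
        downtime = MINUTES_PER_YEAR * (1 - availability)
        print(f"{nines} nines ({availability:.5f}): "
              f"about {downtime:.0f} minutes of downtime per year")

Four nines works out to roughly 53 minutes of downtime per year, which
gives a sense of how demanding even the weaker-than-24x7 targets are.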
Comparing the pros and cons of using one extremely powerful machine
(possibly an SMP box) vs. a cluster of cheap PCs, they raise three good
points. Scaling a large machine is very difficult compared to adding a
new node: compare installing a new processor or another SIMM with
plugging a new machine into the network and hitting the power button.
Clusters also have redundancy built in naturally because everything is
mirrored, and failures are independent: the failure of one disk or CPU
has no effect on the probability of failure elsewhere in the cluster.
The authors also focus on some interesting economic/logistical
problems: a mainframe or supercomputer requires what they term a
"forklift" upgrade, which throws out the entire investment and replaces
it all at once, while a cluster can be evolved gradually, e.g. by
upgrading one machine at a time to faster hardware. Capacity planning
is also less of an issue with clusters; if one underestimates, ordering
another machine can be done very quickly (48 hours in the paper, much
faster these days with online custom ordering). The extremely
competitive nature of the PC hardware business has only amplified these
efficiencies. Some disadvantages of using clusters are pointed out as
well: administration overhead, the need to break the service down into
components, and shared state. Only the latter is an inherent problem,
but it plagues protocol design: if load balancing is not sticky (i.e. a
user may be routed to different servers over time), it is very
difficult to adapt protocols so that per-user state can roam with the
request.
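To make the stickiness point concrete, here is a minimal sketch of what
sticky routing means; the worker names and the hashing scheme are my
own illustration, not anything from the paper:

    import hashlib

    WORKERS = ["worker-1", "worker-2", "worker-3"]

    def sticky_worker(user_id):
        # Hash the user id so the same user always lands on the same
        # worker; per-user state then never has to roam between nodes.
        digest = hashlib.sha256(user_id.encode()).hexdigest()
        return WORKERS[int(digest, 16) % len(WORKERS)]

    print(sticky_worker("alice"))  # deterministic: always the same worker

Of course this simple modulo scheme reshuffles users whenever the
worker set changes, which is exactly when roaming state becomes a
problem again.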
The overall architecture does not provide strong ACID guarantees but a
weaker one identified by the acronym BASE, which stands for basically
available, soft state, eventual consistency. It consists of a front end
that is the outside face of the service (usually Internet-facing) and
uses a manager stub to route requests over a system-area network (SAN)
to workers sitting behind worker stubs. There is also built-in support
for load balancing and for customized content via a user profile
database. Load balancing is dynamic and takes into account the
individual load reported by each node in the cluster (there is
extensive multicast traffic over the SAN carrying these load reports).
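A rough sketch of the kind of load-aware dispatch the manager stub
performs might look like the following. The function names and the
in-memory table are my own illustration; in the real system the load
reports arrive as multicast announcements over the SAN rather than
local function calls:

    import random

    worker_loads = {}  # worker id -> most recently reported queue length

    def report_load(worker_id, queue_length):
        # Record a load announcement from a worker.
        worker_loads[worker_id] = queue_length

    def pick_worker():
        # Route the request to the least-loaded worker, breaking ties
        # randomly so traffic spreads evenly.
        lowest = min(worker_loads.values())
        candidates = [w for w, load in worker_loads.items() if load == lowest]
        return random.choice(candidates)

    report_load("worker-1", 3)
    report_load("worker-2", 1)
    report_load("worker-3", 1)
    print(pick_worker())  # worker-2 or worker-3

The interesting design choice is that the balancing decision is driven
by load the workers themselves report rather than by a static
assignment, which is what makes the dispatch dynamic.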
The last few sections of the paper discuss two systems built on top of
this architecture. These systems are actively deployed: in one case
serving 5000 users on dial-up lines at UC Berkeley, in the other
forming the underpinnings of a search engine, so they are far from
being "prototypes" or "proofs of concept." In particular, TranSend is a
web cache that actively transforms data, changing the image
representation for faster load times. Note that the GIF-to-JPEG
compression and similar transformations could be done by clients too,
but the real bottleneck here was the size of the images crossing the
slow dial-up link, so the reduction has to happen before the data
reaches the client. (Although the progressive GIF standard, where a
low-resolution version of the image appears first, followed by
successive refinements, seems to have obviated the need for this type
of transformation.)
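The flavor of transformation TranSend performs on images can be
approximated in a few lines. This is only a sketch using the Pillow
library, with made-up file names and an arbitrary quality setting; the
paper's actual transformation workers are custom code, not this:

    from PIL import Image

    def distill(gif_path, jpeg_path, quality=40):
        # Recompress a GIF as a lower-quality JPEG to cut the number of
        # bytes sent over a slow dial-up link.
        img = Image.open(gif_path).convert("RGB")  # JPEG has no palette
        img.save(jpeg_path, "JPEG", quality=quality, optimize=True)

    distill("photo.gif", "photo.jpg")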