From: Richard Jackson (richja_at_expedia.com)
Date: Wed Mar 10 2004 - 14:12:51 PST
This 1997 SOSP paper by Fox et al. describes a layered, generic model
for building scalable network services (SNS).
The main sections of this paper are: 1) overview of SNS, 2) TranSend
design and implementation, 3) Hotbot overview, 4) measurements of
TranSend.
SNS is a framework that allows services to be built around clusters of
commodity computers. The main idea is that new services can be easily
integrated, and you'll get all the benefits of SNS "for free". The
overall design is like many other high-level frameworks in that it hides
the low-level details that are not relevant for creating a new
high-level service. In a sense, SNS acts like an operating system and
the new services act as users of this system. The high-level design of
SNS includes various components. Each component is independent, but
could possibly be colocated on a single node (in practice this should be
rare). The types of nodes include: 1) front-ends, 2) manager stubs, 3)
SAN, 4) worker stubs, 5) workers, 6) monitor, 7) manager, 8) profile DB.
There is also a cache that can be used by various components. The
design of this system seems to indicate that a new service can be added
by doing the following: 1) creating a new custom front-end and integrating
it into the manager stub, 2) creating new custom workers and integrating
them into the worker stub. The other components of the system serve to
balance load and provide high availability by reacting to failures.
Because a developer does not need to be too concerned with the central
components of the model, it is easy to see how development efficiency
can be gained via SNS. Another concept introduced is TACC, which is a
programming model to be used with SNS. TACC stands for Transformation,
Aggregation, Customization and Caching. Essentially, the TACC model
supports the framework described above. Finally, BASE semantics are
introduced, which means Basically Available, Soft state, and Eventual
Consistency.
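The TACC worker model described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the paper's actual API: the registry stands in for the worker stub, and `dispatch` stands in for the manager-stub routing; all names are my own.

```python
# Hypothetical sketch of the TACC worker model: a worker implements one
# transformation step, and the SNS machinery (manager, stubs, monitor --
# not shown) handles dispatch, load balancing, and restarts.
from typing import Callable, Dict

# Registry standing in for the worker stub's registration mechanism.
WORKERS: Dict[str, Callable[[bytes, dict], bytes]] = {}

def register_worker(name: str):
    """Decorator: plug a new worker into the (hypothetical) worker stub."""
    def wrap(fn: Callable[[bytes, dict], bytes]):
        WORKERS[name] = fn
        return fn
    return wrap

@register_worker("lowercase")
def lowercase_transform(data: bytes, prefs: dict) -> bytes:
    # Transformation (the "T" in TACC), customized per user via prefs
    # (the "C"): here, an illustrative text transformation.
    out = data.lower()
    if prefs.get("strip_whitespace"):
        out = out.strip()
    return out

def dispatch(name: str, data: bytes, prefs: dict) -> bytes:
    """Stand-in for the manager stub routing a request to a worker."""
    return WORKERS[name](data, prefs)
```

Under this sketch, adding a new service really is just registering new workers; everything else comes "for free" from the surrounding framework.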
TranSend is a service that runs within the general framework of SNS.
TranSend is a proxy-like system that runs between the Berkeley dial-in
modem pool and the Internet. The main purpose of TranSend is to
pre-process images before they are sent to a client. This preprocessing
scales the image down or reduces its resolution. The work is done in a
distiller, which is really an SNS worker. The front end is an HTTP
interface, essentially a web server. Another class of workers, called
cache nodes, is used to store transformed data. TranSend supports
BASE semantics by using stale load-balancing data and giving
approximate answers.
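The stale load-balancing idea can be sketched as below. This is an illustration of the BASE approach rather than TranSend's actual code: the manager keeps soft state (periodically reported, possibly stale queue lengths) and accepts a slightly suboptimal choice instead of paying for strongly consistent load information. The class and method names are my own.

```python
import random

class ManagerView:
    """Hypothetical manager-side view of distiller load (soft state)."""

    def __init__(self):
        # distiller id -> last *reported* queue length; this may lag the
        # true value between reports, which BASE semantics tolerates.
        self.reported_load = {}

    def report(self, distiller_id, queue_len):
        """Called periodically by each distiller (heartbeat/beacon)."""
        self.reported_load[distiller_id] = queue_len

    def pick_distiller(self):
        """Choose the least-loaded distiller per the stale view,
        breaking ties randomly."""
        if not self.reported_load:
            raise RuntimeError("no distillers registered")
        low = min(self.reported_load.values())
        candidates = [d for d, q in self.reported_load.items() if q == low]
        return random.choice(candidates)
```

The payoff is that no request ever blocks on gathering fresh global state; the cost is an occasional request sent to a busier-than-expected distiller.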
Hotbot is briefly described. This is another system that uses an
SNS-like architecture to achieve cluster-based scaling. Most aspects
are not as well-refined as TranSend, so Hotbot doesn't achieve nearly
the same level of SNS-like behavior. It should be noted that neither
system seems to completely follow SNS. It seems that SNS was invented
afterward as a general way to describe existing systems.
Finally, performance measurements of TranSend are discussed. It is
shown that the system can handle burstiness well, and that the distiller
allocation scheme works correctly. This shows that the manager can
correctly handle new load. Scalability is also shown, as each component
can be individually replicated to remove bottlenecks within SNS. This
way, the system can be grown in arbitrary directions, as needed.
Ironically, the final conclusion of this section is that TranSend could
run on a single machine, which defeats the whole purpose of this type of
system. In other words, instead of building an elegant software-based
system, we can just throw a bunch of hardware at the problem and get the
same result.
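The demand-driven distiller allocation that the measurements exercise can be sketched as a simple threshold policy: when average reported queue length crosses a high-water mark, the manager spawns another distiller; when it falls below a low-water mark, one is retired. The thresholds and function names here are assumptions of mine, not values from the paper.

```python
def rebalance(queue_lens, spawn, retire, high=5.0, low=1.0, min_workers=1):
    """One (hypothetical) manager decision step over reported distiller
    queue lengths. `spawn` and `retire` are callbacks into the pool."""
    avg = sum(queue_lens) / len(queue_lens)
    if avg > high:
        spawn()          # absorb the burst with a new distiller
    elif avg < low and len(queue_lens) > min_workers:
        retire()         # shrink the pool when load subsides
```

Replicating other components (front-ends, cache nodes) to remove their bottlenecks follows the same pattern, which is what lets the system grow in arbitrary directions.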
Overall, the design of SNS seems to formalize typical designs that are
likely used in most clustered applications. The main strengths of the
system were: 1) load balancing, 2) automatic failure recovery, 3) use of
an overflow pool to absorb excessive load, 4) optimistic use of stale
soft state for load balancing, 5) tolerance of worker bugs, because
the manager will automatically restart failed workers. This last point is
notable, as we've seen this idea before (i.e., Nooks). Basically, the idea is
to build a reliable central system, such that any errant add-on will not
cause systemic failures. By making the system restart the add-on as
needed, you've built a very reliable overall system. The fundamental
idea seems great, because often application development work is focused
on finding and fixing crashing bugs. If instead developers could
focus only on building the key functionality, ignoring non-critical
crashing bugs, significant development cost savings could be achieved. I am
not necessarily advocating this practice, however.
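The restart-on-failure idea can be reduced to a small supervisor loop: worker crashes are treated as ordinary events, and the supervisor simply restarts the worker with fresh state, bounding the damage an errant add-on can cause. This is a minimal sketch of the pattern, with names and structure of my own invention.

```python
def supervise(make_worker, task, max_restarts=3):
    """Run task through a worker; on any crash, restart with fresh
    state and retry (hypothetical supervisor sketch)."""
    restarts = 0
    while True:
        worker = make_worker()  # fresh worker state after each crash
        try:
            return worker(task)
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up: the fault is persistent, not transient
```

A cap on restarts matters in practice: without it, a deterministic bug would make the supervisor loop forever instead of surfacing the failure.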