From: Reid Wilkes (reidwilkes_at_hotmail.com)
Date: Mon Feb 23 2004 - 23:02:13 PST
The basic idea of this paper is to present an infrastructure the authors devised for building a certain type of Internet application on a cluster of commodity hardware. There are a number of advantages to building such a system from commodity machines rather than more heavy-duty (and thus more expensive) ones. Less expensive hardware is an obvious cost benefit, and with a cluster it is in theory easy to increase the service level of the application by simply adding more machines. Clusters also have the potential to provide a much higher level of fault tolerance than a single monolithic system. The key to making the clustering idea work is scalability: will adding machines to the cluster increase the service level of the application linearly? The results of the authors' experiments with the TranSend application built on their infrastructure seem to indicate that linear scaling is possible.
The infrastructure itself is composed of a few centralized resources, such as a manager and a database for user profiles, plus a set of manager stubs and worker stubs, one of which runs on each node of the cluster. The manager stubs run on the nodes that provide the "Front End" of the system (presumably just the HTTP web server), and the worker stubs run on the nodes running the back-end processing code, which the authors refer to as "TACC" (Transformation, Aggregation, Customization, and Caching). It is supposedly quite easy to write new applications for this infrastructure: the manager stubs, worker stubs, and manager code all work together to distribute and balance the load across the nodes in the cluster, so the programmer only has to worry about the actual task or service being provided and gets the scalability and reliability of the cluster essentially for free.
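To make the load-balancing idea concrete, here is a minimal sketch of a manager handing requests to workers. It is not the authors' implementation: the Manager and Worker classes, the queue_length field, and the "shortest queue wins" policy are all assumptions chosen for illustration, and the review above does not say which policy the real system uses.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical stand-in for a back-end node running a worker stub."""
    name: str
    queue_length: int = 0  # outstanding requests, used here as the load metric


@dataclass
class Manager:
    """Hypothetical central manager that tracks workers and their load."""
    workers: list = field(default_factory=list)

    def add_worker(self, name):
        # Growing the cluster is just registering one more worker.
        self.workers.append(Worker(name))

    def dispatch(self):
        # Assumed policy: send the request to the worker with the shortest
        # queue. min() returns the first minimum, so ties go to the worker
        # that was registered earliest.
        target = min(self.workers, key=lambda w: w.queue_length)
        target.queue_length += 1
        return target.name


manager = Manager()
for name in ("w1", "w2", "w3"):
    manager.add_worker(name)

# Six requests spread evenly over three idle workers.
assignments = [manager.dispatch() for _ in range(6)]
print(assignments)
```

The point of the sketch is only that the application code never appears in it: whatever the worker actually computes, the dispatch logic stays the same, which is the sense in which the programmer gets the balancing "for free".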
One downside to this system is that it seems geared toward only a very specific type of application; it is not well suited as a general-purpose application platform. For web applications that do a lot of server-side processing for each request it seems like a win, but once you need to share much state among individual user requests, or use a shared database as in e-commerce applications, you will still run into a bottleneck that severely limits the scalability of the clustering (parallel processing) approach.
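A rough way to see the size of that bottleneck is an Amdahl's-law style estimate. This is my own back-of-the-envelope model, not something from the paper: it assumes a fixed fraction of each request's work (shared_fraction, a hypothetical parameter) is serialized on a shared resource such as a database, and that the rest parallelizes perfectly across nodes.

```python
def cluster_speedup(nodes, shared_fraction):
    """Estimated throughput gain over one node under Amdahl's law.

    nodes: number of machines in the cluster (>= 1).
    shared_fraction: assumed share of per-request work that is serialized
        on a shared resource, between 0.0 and 1.0.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must be between 0 and 1")
    # The serialized part does not shrink; only the rest is divided by nodes.
    return 1.0 / (shared_fraction + (1.0 - shared_fraction) / nodes)


# Compare a fully independent workload with one that spends 10% of each
# request in a shared database.
for n in (1, 10, 100, 1000):
    print(n, round(cluster_speedup(n, 0.0), 2), round(cluster_speedup(n, 0.1), 2))
```

Under these assumptions a workload with no shared state scales linearly, while one that spends 10% of each request in the shared database can never do better than 10x no matter how many nodes are added.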