From: Cliff Schmidt (cliff_at_bea.com)
Date: Wed Feb 25 2004 - 02:55:46 PST
It was very interesting to read the Porcupine paper immediately after
reading the Fox et al. Cluster-Based Scalable Network Services paper.
Most of the things I liked about that paper (which appears to have
preceded this one by two years and was shepherded by one of this
paper's authors, Hank) show up here as well, but the key differences
in Porcupine were that every single node really was interchangeable
(functionally homogeneous), self-configuring, and self-healing, and
that the system had to handle frequent writes.
This system is very clear about its focus on one thing: scalability.
However, it considers scalability in terms of manageability,
availability, and performance. The authors consider how other
clustering alternatives might work, but point out that common choices
such as statically partitioned clusters typically work only with
a substantial overcommitment of capacity (since underutilized
partitions are incapable of relieving load from overburdened
partitions) and with a large administration cost to reconfigure the
system as necessary over its lifetime.
As with the Fox et al. paper, this paper takes advantage of the soft
state concept: state that a node needs in order to function can be
kept only in memory and rederived from durable state elsewhere after
a failure. The reconstruction of lost soft state was described as
"completely distributed, but unsynchronized". After a membership
change, each node inspects the new user map, notices which user
buckets have been reassigned, scans its own durable state for data
belonging to users in those buckets, and sends that information to
each bucket's current manager.
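To make that reconstruction step concrete, here is a minimal sketch
of how such a distributed, unsynchronized rebuild might look. It is
my own illustration, not the paper's code: the user map is modeled as
a list of node names indexed by hash bucket, and the bucket_of hash,
the rpc stub, and the fragment ids are all assumed names.

    import hashlib

    def bucket_of(user, num_buckets):
        # Hash a user name into a user-map bucket (illustrative hash choice).
        digest = hashlib.sha1(user.encode()).digest()
        return int.from_bytes(digest[:4], "big") % num_buckets

    def rpc(node, method, **args):
        # Stand-in for a remote procedure call to another Porcupine node.
        print(f"-> {node}: {method}({args})")

    def rebuild_soft_state(local_node, old_map, new_map, local_fragments):
        # Runs independently on every node after the membership protocol
        # publishes a new user map; no cross-node synchronization.
        for bucket, new_mgr in enumerate(new_map):
            if new_mgr == old_map[bucket]:
                continue  # this bucket's manager is unchanged
            # The bucket moved, so its new manager starts with empty soft
            # state; push it a pointer for every affected user whose mail
            # is stored on this node's disk.
            for user, frag_id in local_fragments.items():
                if bucket_of(user, len(new_map)) == bucket:
                    rpc(new_mgr, "add_fragment_pointer",
                        user=user, node=local_node, fragment=frag_id)

    # Example: after node "C" joins, bucket 1 moves from "B" to "C"; any
    # locally stored mail for users hashing into bucket 1 is reported to "C".
    old_map = ["A", "B", "A", "B"]
    new_map = ["A", "C", "A", "B"]
    rebuild_soft_state("A", old_map, new_map, {"alice": 17, "bob": 42})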
Here are a few other thoughts about this paper:
- node addition simply requires a system administrator to install
the Porcupine software on the new machine. Once it boots, it gets
noticed by the membership protocol and added to the cluster. This
is an idea that is really catching on throughout the industry as
grid computing. In fact, much of this paper and the Fox et al.
paper are examples of what has been talked about in the industry a
lot lately: grid computing and Service Oriented Architecture. I
actually think a lot of industry people who use those buzzwords
could learn something from this paper.
- I liked the consideration of affinity together with load
balancing. Mailbox fragments for a single user can be distributed
across any number of nodes (actually there is a soft limit on the
spread that should not be exceeded unless no other node is
available); one of the many benefits of this approach is that
incoming messages are never blocked. However, it takes time to
gather all those mailbox fragments (or rather, to contact the least
loaded replica of each fragment) when a user wants to browse through
their email, so node affinity biases delivery toward nodes that
already hold fragments for the user, because that is less expensive
(see the sketch after this list).
- I didn't really have a good understanding of the ramifications
of eventual consistency until the paragraph in the "Replication
properties" section, which gave examples of deleted messages
briefly reappearing within the first few seconds, and users
receiving the same messages more than once. This really helped.
- load balancing decisions are based on whether the disk is full
and an integer representing the number of pending remote procedure
calls, similar to the way the Fox et al. system makes load balancing
decisions based on queue length (the sketch after this list
illustrates this metric together with the affinity preference above).
- I was a little confused by the "Platform and workload" section's
references to front-end nodes and back-end nodes. Were these
interchangeable?
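Putting the affinity and load-metric points together, here is a rough
sketch of how a delivery agent might pick a node to store an incoming
message. The data structures, the SPREAD_SOFT_LIMIT value, and the
tie-breaking rule are my assumptions; the paper itself only says that
nodes already holding a fragment for the user are preferred, that the
spread is kept under a soft limit unless no other node is available,
and that load is judged by a disk-full flag plus the number of
pending remote procedure calls.

    SPREAD_SOFT_LIMIT = 4  # illustrative value, not the paper's setting

    def node_load(node):
        # Fewer pending RPCs means less loaded; a full disk is handled
        # separately by filtering the node out of the candidate set.
        return node["pending_rpcs"]

    def pick_storage_node(nodes, fragment_nodes):
        # nodes: all live nodes, each {"name", "disk_full", "pending_rpcs"}
        # fragment_nodes: names of nodes already holding a fragment for
        # this user.
        usable = [n for n in nodes if not n["disk_full"]]
        if not usable:
            raise RuntimeError("no node can accept the message")
        affinity = [n for n in usable if n["name"] in fragment_nodes]
        if len(fragment_nodes) >= SPREAD_SOFT_LIMIT and affinity:
            # The user's mail already spans the soft limit, so stay on a
            # node that holds one of the existing fragments if any can
            # take the message.
            candidates = affinity
        else:
            # Under the limit (or when no affinity node is usable), any
            # usable node is a candidate, so the message is never blocked,
            # but affinity nodes win ties because appending to an existing
            # fragment is cheaper.
            candidates = usable
        return min(candidates,
                   key=lambda n: (node_load(n), n["name"] not in fragment_nodes))

    # Example use: A and B are equally loaded, C's disk is full, and B
    # already holds a fragment for this user.
    cluster = [
        {"name": "A", "disk_full": False, "pending_rpcs": 3},
        {"name": "B", "disk_full": False, "pending_rpcs": 3},
        {"name": "C", "disk_full": True,  "pending_rpcs": 1},
    ]
    print(pick_storage_node(cluster, fragment_nodes={"B"})["name"])  # -> "B"

The two-level key (pending RPCs first, then affinity) is just one way
to encode "prefer cheaper nodes, but never block delivery"; the real
system's exact weighting is not something I can confirm from the text.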