From: Gail Rahn (gail_at_screaminggeek.com)
Date: Mon Feb 02 2004 - 18:06:03 PST
Review of "Experience with Grapevine" by Schroeder, Birrell and Needham
This paper describes the operation and evolution of Grapevine, a distributed
system at Xerox PARC that had been in active use for a few years at the
time of publication. The paper analyzes the actual operation and performance
of Grapevine and compares them with the design goals of the system.
As a distributed and replicated system, Grapevine handles message delivery
(between humans, distribution lists and servers), resource-naming,
authentication, resource location and access control. Grapevine consists of
a variable number of servers, each with its own storage. Grapevine is
designed to scale by adding servers of fixed power (rather than by
adding more powerful servers, or by increasing the power of existing
servers). Additionally, Grapevine is a replicated system. Each Grapevine
node is authoritative for at least one "registry" of information (a registry
is analogous to a set of users and servers, and their messages, ACLs, etc).
The failure of one node in the Grapevine network will not cause information
loss in the system. And although there are an arbitrary number of Grapevine
servers on a network, the multiple servers appear to a client as a single
service.
Most of the paper is devoted to discussing unexpected aspects of the system.
There were several parts of the system that didn't scale or function as
anticipated. Some of these failures were problems in data design, such as
the distribution list scaling problem. When a distribution list (d-list)
contained many members, distributing one message to the d-list audience
could take upwards of 10 minutes because the message-delivery computation
was isolated at one Grapevine server. Distributing the message-delivery
computation across several servers could alleviate this problem.
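To illustrate what spreading the delivery work could look like, here is a minimal sketch. It assumes recipients carry two-part names of the form individual.registry and simply groups a d-list by registry so that each group could be handed to a different server. The function name split_by_registry and the sample names are hypothetical, and this is not necessarily the scheme the Grapevine authors adopted.

```python
from collections import defaultdict

def split_by_registry(members):
    """Group two-part names such as 'alice.pa' by their registry suffix."""
    groups = defaultdict(list)
    for name in members:
        # rpartition splits on the last dot: individual, '.', registry
        _individual, _dot, registry = name.rpartition(".")
        groups[registry].append(name)
    return dict(groups)

# Hypothetical d-list; each batch could be expanded by a separate server.
members = ["alice.pa", "bob.pa", "carol.wa", "dave.pa"]
batches = split_by_registry(members)
```

Each batch is independent of the others, so no single server has to expand the whole list.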
Another scaling issue concerned communication between Grapevine servers
across disparate networks. Grapevine servers must intercommunicate to
ensure consistency in replicated data. This communication is implemented as
direct connections between Grapevine servers. In segmented networks, it
might not be possible for a Grapevine server to directly connect to every
other server. However, every server may still be reachable through
intermediate servers. Relaying through intermediates wasn't implemented in
Grapevine but was seen as a future modification to allow the system to
operate in less-connected environments.
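The relaying idea can be sketched as a shortest-path search over the graph of direct server-to-server links. This is only an illustration under my own assumptions: the links dictionary, the server names and the relay_path function are hypothetical, and the paper does not specify how such forwarding would be done.

```python
from collections import deque

def relay_path(links, source, target):
    """Return a shortest chain of servers from source to target, or None."""
    previous = {source: None}   # also serves as the visited set
    queue = deque([source])
    while queue:
        server = queue.popleft()
        if server == target:
            # Walk the predecessor links back to the source.
            path = []
            while server is not None:
                path.append(server)
                server = previous[server]
            return path[::-1]
        for neighbor in links.get(server, ()):
            if neighbor not in previous:
                previous[neighbor] = server
                queue.append(neighbor)
    return None

# Hypothetical topology: A and C have no direct link, but both reach B.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
path = relay_path(links, "A", "C")
```

Here A cannot connect to C directly, so an update from A would be forwarded through B.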
Delays were encountered in updating administrative information across
servers. For several minutes after an administrative update, the servers
remained unsynchronized. So, changing a registration database and
immediately performing an action that depends on the change will probably
be a slow operation. Also, because the distributed system appears as one
server to client programs, sometimes a user expects that s/he is connected
to a local Grapevine server when actually s/he is connected to a more
remote server. In these cases, sending messages could take significantly
longer than anticipated.
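The update delay described at the start of the paragraph above can be shown with a toy model of two replicas. It assumes updates are applied locally first and pushed to peers later. The Replica class and its methods are hypothetical and far simpler than Grapevine's actual propagation mechanism.

```python
class Replica:
    """A toy registration database replica with deferred propagation."""

    def __init__(self):
        self.data = {}
        self.outbox = []  # updates applied locally but not yet sent to peers

    def update(self, key, value):
        # Apply the change locally and queue it for later propagation.
        self.data[key] = value
        self.outbox.append((key, value))

    def propagate(self, peers):
        # Push every queued update to each peer, then clear the queue.
        for key, value in self.outbox:
            for peer in peers:
                peer.data[key] = value
        self.outbox.clear()

a, b = Replica(), Replica()
a.update("printer-acl", "group-x")
stale = b.data.get("printer-acl")   # read at b before propagation
a.propagate([b])
fresh = b.data.get("printer-acl")   # read at b after propagation
```

A client that reads from b between the update and the propagation sees the old state, which is the window the paper reports as lasting several minutes.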
This is the second time I have read the Grapevine paper (I read it in an
earlier Distributed Systems class) and I really enjoyed this work. The
system as a whole is well designed and implemented to be fault-tolerant.
A drawback might be that a node is an authoritative source for a registry
database. When a single registry is significantly larger than the others,
isn't that node burdened with a larger amount of work?
-- Gail.
-------------
Gail Rahn
grahn_at_cs.washington.edu