From: Richard Jackson (richja_at_expedia.com)
Date: Sun Feb 01 2004 - 13:13:02 PST
This 1984 paper by Schroeder, Birrell and Needham describes Grapevine, a
distributed system that is primarily used for email services, but also
includes functionality for a general naming service and access control.
This paper is written from a fairly high-level operational perspective,
giving some analysis and general guidance for running a Grapevine
system. In that respect, it reads almost like a system administrator's
guidebook. However, the paper also mentions many design details and
enhancements that could be used to build other systems based
on the general Grapevine design.
Note that Grapevine was mentioned in the preceding paper[1], as a
distributed means for storing RPC binding information.
The paper is divided into two main sections: 1) an overview of the system,
and 2) operational issues. The second part makes up most of the paper.
In the first section, Grapevine is introduced. The basic components of
Grapevine are a messaging service and a registration service, each of
which runs on every Grapevine server. The former is used to deliver
arbitrary messages (generally email) and the latter is used to provide a
generic, hierarchical naming service (with distributed registrars, like
DNS). At the time of writing, there were 17 servers deployed within
the Xerox internet. The paper discusses a good example of the naming
service: the common idea of email users and distribution lists (groups
of users and/or other distribution lists). Overall, Grapevine is
described as a "replicated, distributed system that provides message
delivery, naming, authentication, resource location, and access control
services to an internet." To me, this description is overwhelming. It
seems that the system is trying to serve too many functions.
Thankfully, the paper acknowledges that Grapevine is mainly focused on
mailing services. While these other services are somewhat related (e.g.,
naming is required for email delivery), I think that the others are
beyond the scope of this system. I think that the designers of this
system were trying to build a generic system that could handle any type
of data. While this is an admirable goal, I think they would be better
served by focusing on a specific design domain. The paper hints at
this in section 9, which describes an IC manufacturing operation
that uses Grapevine. There, the email load on Grapevine sometimes
overwhelms the servers, preventing the specialized needs from being met.
Grapevine could perhaps still be used in this case, but I think the
specialized system should be isolated from other, unrelated Grapevine
systems.
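To make the distribution-list idea concrete, here is a minimal sketch in
Python of how nested lists might be expanded into individual recipients.
It is not taken from the paper: the names, the dictionary layout, and the
expand function are all assumptions made for illustration. The deliberate
cycle between the two lists shows why an expansion has to remember which
names it has already visited.

```python
from typing import Dict, List, Set

# Hypothetical registry contents (invented for illustration): a group maps
# to its members, which may be individuals or other groups.
GROUPS: Dict[str, List[str]] = {
    "staff.example": ["alice.example", "bob.example", "interns.example"],
    "interns.example": ["carol.example", "staff.example"],  # cycle on purpose
}


def expand(name: str, groups: Dict[str, List[str]]) -> Set[str]:
    """Return every individual recipient reachable from `name`."""
    recipients: Set[str] = set()
    seen: Set[str] = set()  # names already visited, so cycles terminate
    stack = [name]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in groups:
            # A group: queue its members for expansion.
            stack.extend(groups[current])
        else:
            # Not a known group: treat it as an individual mailbox.
            recipients.add(current)
    return recipients
```

Calling expand("staff.example", GROUPS) visits each list once and collects
only the individual names, no matter how the lists refer to each other.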
In the next section, many operational issues were discussed. This
section was extremely thorough and covered all the critical topics, such
as 1) scaling, 2) configuration of the system, 3) transparency of the
distributed system (the user sees a single logical service), 4) modifying
the design to accommodate unexpected load, 5) remote access and operation
of a distributed system, and 6) reliability issues. One key issue that was
not resolved is that each server node had limited disk space, which
careless or unscrupulous users could easily abuse (e.g., by never deleting
old messages). To me, it seems that a simple pessimistic quota system could
have prevented this problem and, with it, many of the issues raised
in this paper. I also did not like their idea of constantly
re-analyzing the system to find ways to configure it for better
performance. I think that a better design would eliminate the need for
this ongoing management and tuning.
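A pessimistic quota of the kind suggested above could look roughly like
the following Python sketch. The Mailbox class, its method names, and the
byte-based limit are hypothetical and not from the paper. The point is
only that the check happens before a message is stored, so a mailbox can
never grow past its limit.

```python
from typing import Dict


class QuotaExceeded(Exception):
    """Raised when a delivery would push a mailbox past its quota."""


class Mailbox:
    """Hypothetical per-user mailbox with a hard size limit."""

    def __init__(self, quota_bytes: int) -> None:
        self.quota_bytes = quota_bytes
        self.used_bytes = 0
        self.messages: Dict[int, bytes] = {}
        self._next_id = 0

    def deliver(self, body: bytes) -> int:
        # Pessimistic check: refuse the message before storing anything.
        if self.used_bytes + len(body) > self.quota_bytes:
            raise QuotaExceeded(
                f"need {len(body)} bytes, "
                f"{self.quota_bytes - self.used_bytes} available"
            )
        msg_id = self._next_id
        self._next_id += 1
        self.messages[msg_id] = body
        self.used_bytes += len(body)
        return msg_id

    def delete(self, msg_id: int) -> None:
        # Deleting a message returns its space to the quota.
        body = self.messages.pop(msg_id)
        self.used_bytes -= len(body)
```

With a scheme like this, a full mailbox affects only its owner, since
further deliveries are rejected until old messages are deleted.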
One weakness of this design is that the authors did not plan to scale
beyond approximately 10,000 users. This seems to be an arbitrarily low
number, and simple design changes would have allowed them to scale well
beyond this. In the conclusion, the authors say that a commercial
version of this system at Xerox will include the necessary
changes (moving the naming hierarchy from two levels to three) to allow a
larger user base.
The main strength of this paper was its incredible thoroughness. These
people considered so many aspects of the system that it's hard to
believe that the paper is 20 years old. Many of these problems still
plague systems of today, while the authors of this paper seem to have
developed reasonable solutions in 1984. For example, their discussion
of remote access, debugging, and the repair of disk corruption via a
terminal interface is great. How many modern systems allow this level
of remote administration in cases of partial failure? Also, their
general remote-monitoring interface seems like a useful addition to
modern systems, where such monitoring is usually added only as an afterthought.
This paper also pointed out two key things that we should keep in mind:
1) it is hard to make major changes to a system after it has been widely
adopted, so the initial design must be very good; 2) even the
experts (designers/developers) of a system slowly lose familiarity with
the system over time, further hindering ongoing analysis and redesign.
This underscores the importance of building it right the first time;
often there is not a good opportunity to go back and rebuild.
Overall, this system seemed to be a mixture of modern internet email
systems and the DNS naming facility [2]. This pioneering work at Xerox
likely had a large influence on later systems such as the SMTP and DNS
standards.
[1] Andrew D. Birrell and Bruce Jay Nelson. Implementing Remote
Procedure Calls.
[2] P. V. Mockapetris and K. J. Dunlap. Development of the Domain Name
System.