From: Manish Mittal (manishm_at_microsoft.com)
Date: Mon Feb 02 2004 - 14:15:31 PST
This paper discusses the performance, scalability and reliability
issues of the Grapevine system. Grapevine is a distributed, replicated
system that provides message delivery, naming, authentication, resource
location and access control services in an internet of computers. The
system was designed and implemented years before this paper, which
reports the operational experience with the system under substantial
load. The discussion covers each of these services as well as what the
authors have learned from using the system. The paper is important
because its discussion of Grapevine reveals many issues that apply to
the design of any distributed system.
The Grapevine system provides two primary services: a) the messaging
system, which accepts messages and buffers them in inboxes, and b) the
registration service, which provides naming, authentication, access
control, resource location, and distribution list functions.
Inboxes and registry information are distributed throughout the network,
with no single server holding all the information and each piece of data
replicated several times. Upon a change to the registry, messages
are sent to the necessary servers so that they can update their registry
entries. Grapevine uses internet protocols to communicate between
servers that are distributed across a network. Any server that contains
a replica of a registry can accept a change to that registry from a
client. That server takes the responsibility for propagating the change
to other relevant servers.
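
The update path described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions, not
Grapevine's actual code: the class RegistryReplica, its method names,
the in-memory outbox, and the rule that the update with the larger
timestamp wins are all hypothetical choices made for the example.

```python
class RegistryReplica:
    """One server's copy of a registry (hypothetical model, not Grapevine code)."""

    def __init__(self, name):
        self.name = name
        self.entries = {}   # key -> (timestamp, value)
        self.peers = []     # the other replicas of the same registry
        self.outbox = []    # queued update messages: (peer, key, timestamp, value)

    def apply(self, key, timestamp, value):
        """Install an update unless this replica already holds a newer one."""
        current = self.entries.get(key)
        if current is None or timestamp > current[0]:
            self.entries[key] = (timestamp, value)

    def accept_change(self, key, timestamp, value):
        """Client-facing entry point: update locally, then queue propagation."""
        self.apply(key, timestamp, value)
        for peer in self.peers:
            self.outbox.append((peer, key, timestamp, value))

    def flush(self):
        """Deliver the queued update messages to the other replicas."""
        while self.outbox:
            peer, key, timestamp, value = self.outbox.pop(0)
            peer.apply(key, timestamp, value)


def make_replicas(names):
    """Create replicas of one registry and make each aware of the others."""
    replicas = [RegistryReplica(n) for n in names]
    for replica in replicas:
        replica.peers = [p for p in replicas if p is not replica]
    return replicas
```

Between accept_change() and flush() the other replicas still hold the
old entry, which is the kind of temporary inconsistency that delayed
propagation produces.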
One of the most significant features of Grapevine is scalability,
which it achieves through partitioning: users are divided among
registries, and the registries are independent of each other. The
system can therefore be scaled by adding more servers rather than by
increasing the power of the existing servers. The two major problems
affecting scalability are the handling of distribution lists and the
size of the underlying internet. A hierarchical organization of lists
is proposed as a solution to the distribution list problem.
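
A small Python sketch may make the partitioning and the hierarchical
lists more concrete. It is an illustration under assumptions rather
than the paper's implementation: it assumes two-part names of the form
individual.registry, and the Directory class, the function names, and
the sample names used with them are hypothetical.

```python
def registry_of(name):
    """Return the registry part of a two-part name such as 'alice.pa'."""
    individual, sep, registry = name.rpartition(".")
    if not sep or not individual or not registry:
        raise ValueError(f"expected 'individual.registry', got {name!r}")
    return registry


class Directory:
    """Maps each registry to the servers that hold a replica of it."""

    def __init__(self):
        self.servers_by_registry = {}

    def add_registry(self, registry, servers):
        # Growth is handled by adding registries, not by enlarging old ones.
        self.servers_by_registry[registry] = list(servers)

    def servers_for(self, name):
        """Only the servers of the name's own registry need to be consulted."""
        return self.servers_by_registry[registry_of(name)]


def expand(list_name, lists, seen=None):
    """Flatten a distribution list whose members may themselves be lists."""
    seen = set() if seen is None else seen
    if list_name in seen:          # guard against lists that contain each other
        return set()
    seen.add(list_name)
    members = set()
    for member in lists[list_name]:
        if member in lists:        # a sublist: expand it recursively
            members |= expand(member, lists, seen)
        else:                      # an individual
            members.add(member)
    return members
```

With sublists, a change to one group touches only that sublist's entry
instead of one very large flat list.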
Another aspect of the Grapevine system is its replication and
distribution policies. Each user has inboxes on two separate servers,
which keeps message delivery reliable when one of the servers
malfunctions. Registries are also replicated on multiple registration
servers. Replicas are placed on both ends of unreliable links so that
the registry stays available to sites on either side of the link, and
they prevent a disk failure from causing a complete loss of the
registry information. Replication also serves locality: a function can
be replicated so that each site is served by a server close by. The
authors approach capacity planning by estimating the load on the system
and how much load each server can handle; this guideline gives an idea
of how many servers are needed. The resource location algorithm picks
the nearest server among the eligible ones. The registration database
is divided into registries to avoid scaling problems: as the user
community grows, more registries are allocated instead of making the
existing ones larger.
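
The capacity guideline and the nearest-server rule can be written down
as two short Python functions. This is a sketch under assumptions, not
the authors' algorithm: the function names, the use of a hop count as
the distance measure, and the eligibility predicate are hypothetical,
and the paper's actual load figures are not reproduced here.

```python
import math


def servers_needed(expected_load, capacity_per_server):
    """Round up the ratio of total load to what one server can handle."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    return math.ceil(expected_load / capacity_per_server)


def nearest_eligible(servers, hops, is_eligible):
    """Pick the closest server among those that are eligible.

    servers     -- candidate server names
    hops        -- dict mapping server name to distance from the client
    is_eligible -- predicate, e.g. 'is up and holds the needed registry'
    Returns None when no candidate is eligible.
    """
    eligible = [s for s in servers if is_eligible(s)]
    if not eligible:
        return None
    return min(eligible, key=lambda s: hops[s])
```

If the closest server is down or does not hold the data, the predicate
excludes it and the next closest eligible server is chosen instead.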
Overall, this paper gives good insight into the workings of a
distributed mail system. The authors describe the working of the
system very well, and they discuss problems and suggested solutions in
great detail. Some of the problems the authors list are: delays in
propagating registration database changes, which cause long-lasting
inconsistencies; the high load caused by deleting a name from a
registry, since the name has to be removed from every group it belongs
to; the high load caused by changing the inbox site list for a user;
the system's inability to deal with duplicate messages; long delivery
times when a message server has to expand a large distribution list in
a registry whose nearest replica is far away; inaccessible inboxes when
the file servers on which mail is automatically archived are
unavailable; and the high load that authentication and access control
checks place on the Grapevine servers.
I particularly liked the section describing the operation of the
system. In this section, the authors point out the importance of remote
monitoring and of logging results. Some of the techniques used to solve
operating problems, such as the viticulturist's entrance facility, the
dead letter facility and logging, are noteworthy.