Experience with Grapevine: The Growth of a Distributed System.

From: Manish Mittal (manishm_at_microsoft.com)
Date: Mon Feb 02 2004 - 14:15:31 PST


    This paper discusses the performance, scalability and reliability of
    the Grapevine system. Grapevine is a distributed, replicated system
    that provides message delivery, naming, authentication, resource
    location and access control services in an internet of computers. The
    system was designed and implemented years before this paper was
    written, so the paper mainly reports operational experience with the
    system under substantial load. The discussion covers each of these
    services as well as what the authors learned from running them. The
    paper is important because its discussion of Grapevine reveals many
    issues that apply to the design of any distributed system.

     

    The Grapevine system provides two primary services: a) the messaging
    service, which accepts messages and buffers them in inboxes, and b) the
    registration service, which provides naming, authentication, access
    control, resource location, and the implementation of distribution
    lists. Inboxes and registry information are distributed throughout the
    network: no single server holds all the information, and each piece of
    data is replicated several times. When the registry changes, messages
    are sent to the relevant servers so that they can update their registry
    entries. Grapevine uses internet protocols for communication between
    servers, which are distributed across a network. Any server that holds
    a replica of a registry can accept a change to that registry from a
    client. That server then takes responsibility for propagating the
    change to the other relevant servers.
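
    The update path described above can be sketched as follows. This is
    only a minimal illustration, not Grapevine's code: the class and method
    names are hypothetical, the example names such as "alice.pa" are made
    up, and the use of a timestamp to decide which update wins is an
    assumption of the sketch that the summary above does not spell out.

```python
class RegistryReplica:
    """One server's copy of a registry (hypothetical model)."""

    def __init__(self, server):
        self.server = server
        self.entries = {}   # name -> (timestamp, value)
        self.peers = []     # other replicas of the same registry

    def apply(self, name, timestamp, value):
        """Install an update unless a newer one is already stored.

        Assumption of this sketch: the newest timestamp wins.
        """
        current = self.entries.get(name)
        if current is None or timestamp > current[0]:
            self.entries[name] = (timestamp, value)
            return True
        return False

    def accept_change(self, name, timestamp, value):
        """Accept a client's change locally, then forward it to all peers."""
        if self.apply(name, timestamp, value):
            for peer in self.peers:
                peer.apply(name, timestamp, value)


def make_replicas(servers):
    """Create one replica per server and connect each to all the others."""
    replicas = [RegistryReplica(s) for s in servers]
    for replica in replicas:
        replica.peers = [p for p in replicas if p is not replica]
    return replicas


# Any replica may accept the change; it ends up on all of them.
a, b, c = make_replicas(["server-a", "server-b", "server-c"])
b.accept_change("alice.pa", 1, "inbox sites: server-a, server-c")
```

    In the real system the forwarding is done with messages and can be
    delayed, which is the source of the temporary inconsistencies the
    paper reports. The sketch forwards synchronously only to stay short.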

     

    One of the most significant features of Grapevine is scalability.
    Scalability is provided by partitioning: users are divided among
    different registries, and these groups are independent of each other.
    The system can therefore be scaled by adding more servers rather than
    by increasing the power of the existing servers. The two major problems
    affecting the scalability of the system are the handling of
    distribution lists and the size of the underlying internet. A
    hierarchical organization of lists is proposed as a solution to the
    distribution list problem.
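
    As a rough sketch of the partitioning idea, the following models a
    directory in which each registry has its own set of servers, so that
    growth is absorbed by adding a registry instead of enlarging an
    existing one. The Directory class, the server names, and the example
    user names are all hypothetical. The sketch assumes names of the form
    "user.registry".

```python
class Directory:
    """Maps each registry to the servers that hold it (hypothetical model)."""

    def __init__(self):
        # registry name -> {"servers": [...], "users": set()}
        self.registries = {}

    def add_registry(self, registry, servers):
        """Add a new, independent partition with its own servers."""
        self.registries[registry] = {"servers": list(servers), "users": set()}

    def register(self, full_name):
        """Record a user in the registry named by the suffix of full_name."""
        user, registry = full_name.rsplit(".", 1)
        self.registries[registry]["users"].add(user)

    def servers_for(self, full_name):
        """Only the servers of the user's own registry need to be consulted."""
        _, registry = full_name.rsplit(".", 1)
        return self.registries[registry]["servers"]


directory = Directory()
directory.add_registry("pa", ["server-a", "server-b"])
directory.register("alice.pa")

# Growth: a new community gets a new registry on new servers.
directory.add_registry("wbst", ["server-c", "server-d"])
directory.register("bob.wbst")
```

    Adding the second registry leaves the first one untouched, which is
    the sense in which the groups are independent.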

    Another aspect of the Grapevine system is its replication and
    distribution policies. Each user has inboxes on at least two separate
    servers, so that mail can still be delivered if one of the servers
    malfunctions. Registries are also replicated on multiple registration
    servers. Replicas are placed at both ends of unreliable links so that
    the registry remains available to sites on either side of the link,
    and they prevent a disk failure from causing a complete loss of the
    registry information. Replication is also used for functions: servers
    are replicated so that sites can obtain a service from a server close
    by. Grapevine addresses scalability problems by estimating the load on
    the system and how much load each server can handle. This guideline
    gives an idea of how many servers are needed. The resource location
    algorithm picks the nearest server among the eligible servers. The
    registration database is divided into registries to prevent scaling
    problems: instead of growing existing registries as the user community
    grows, the administrators allocate more registries.
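
    Two of the ideas above lend themselves to a small sketch: sizing the
    system from a load estimate, and picking the nearest eligible server.
    The function names, the unit of load, the hop counts, and the up/down
    table are all hypothetical. The paper's own figures are not reproduced
    here.

```python
import math


def servers_needed(expected_load, capacity_per_server):
    """Smallest number of servers whose combined capacity covers the load."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    return math.ceil(expected_load / capacity_per_server)


def nearest_eligible(candidates, distance, is_up):
    """Return the closest candidate that is currently reachable, or None.

    candidates: servers able to provide the service (the eligible set)
    distance:   function giving a cost for a server, e.g. a hop count
    is_up:      function telling whether a server currently responds
    """
    reachable = [s for s in candidates if is_up(s)]
    if not reachable:
        return None
    return min(reachable, key=distance)


# Hypothetical example: hop counts from one client to three replicas.
hops = {"server-a": 3, "server-b": 1, "server-c": 2}
up = {"server-a": True, "server-b": False, "server-c": True}

count = servers_needed(expected_load=2500, capacity_per_server=1000)
choice = nearest_eligible(hops.keys(), hops.get, up.get)
```

    Here the closest replica is down, so the client falls back to the
    next nearest one. This is also why replicas on both sides of an
    unreliable link keep the service available.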

     

    Overall, this paper gives good insight into the workings of a
    distributed mail system. The authors describe the operation of the
    system very well, and they discuss problems and suggested solutions in
    great detail. Some of the problems the authors list are as follows:
    delays in propagating registration database changes, which cause
    long-lasting inconsistencies; the high load caused by deleting names
    from the registries, since each name has to be removed from all the
    groups it belongs to; the similarly high load caused by changing the
    inbox site list of a user; the system's inability to deal with
    duplicate messages; long delivery times when a message server has to
    expand a large distribution list in a registry whose nearest replica
    is far away; inaccessibility of inboxes when the file servers on which
    mail is automatically archived are unavailable; and the high load that
    authentication and access control checks place on the Grapevine
    servers.
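
    To illustrate why deleting a name is expensive, here is a minimal
    sketch that assumes there is no index from a name to the groups that
    contain it, so every group has to be examined. That assumption, the
    function name, and the example groups are all hypothetical.

```python
def delete_name(groups, name):
    """Remove name from every group that contains it.

    groups: dict mapping a group name to a set of member names.
    Returns (groups_scanned, groups_changed). Every group is scanned,
    even though only a few of them are likely to change.
    """
    scanned = 0
    changed = 0
    for members in groups.values():
        scanned += 1
        if name in members:
            members.discard(name)
            changed += 1
    return scanned, changed


groups = {
    "csl.pa": {"alice.pa", "bob.pa"},
    "mail-users.pa": {"alice.pa", "carol.pa"},
    "printers.pa": {"dave.pa"},
}
result = delete_name(groups, "alice.pa")
```

    The work grows with the total number of groups rather than with the
    number of groups the deleted name actually belonged to.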

    I particularly liked the section describing the operation of the
    system. In this section, the authors point out the importance of
    remote monitoring and of logging results. Some of the techniques used
    to solve operating problems, such as the viticulturist's entrance
    facility, the dead letter facility and logging, are noteworthy.

     

     


