Review of "Experience with Grapevine" Paper

From: Gail Rahn (gail_at_screaminggeek.com)
Date: Mon Feb 02 2004 - 18:06:03 PST

  • Next message: Reid Wilkes: "Grapevine Review"

    Review of "Experience with Grapevine" by Schroeder, Birrell and Needham

    This paper describes the operation and evolution of Grapevine, a distributed
    system at Xerox PARC that had been in active use for a few years at the
    time of publication. The paper analyzes the actual operation and performance
    of Grapevine and compares them with the design goals of the
    system.

    As a distributed and replicated system, Grapevine handles message delivery
    (between humans, distribution lists and servers), resource-naming,
    authentication, resource location and access control. Grapevine consists of
    a variable number of servers, each with its own storage. Grapevine is
    designed to scale by adding more servers of fixed power (rather than
    by adding more powerful servers or by increasing the power of existing
    servers). Grapevine is also a replicated system. Each Grapevine
    node is authoritative for at least one "registry" of information (a registry
    is analogous to a set of users and servers, and their messages, ACLs, etc).
    The failure of one node in the Grapevine network will not cause information
    loss in the system. And although there are an arbitrary number of Grapevine
    servers on a network, the multiple servers appear to a client as a single
    service.

    Most of the paper is devoted to discussing unexpected aspects of the system.
    There were several parts of the system that didn't scale or function as
    anticipated. Some of these failures were problems in data design, such as
    the distribution list scaling problem. When a d-list
    contained many members, distributing one message to the d-list audience
    could take upwards of 10 minutes because the message-delivery computation
    was isolated at one Grapevine server. Distributing the message-delivery
    computation across several servers could alleviate this problem.
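
    As a rough illustration of the idea of spreading d-list delivery work,
    here is a minimal sketch. It is not Grapevine's actual mechanism; the
    function name, the round-robin policy and the server names are all
    assumptions made for the example.

```python
from collections import defaultdict


def partition_recipients(recipients, servers):
    """Split a d-list's recipients into one batch per server (round-robin).

    Hypothetical helper: each server would then deliver only its own batch,
    so no single server performs the whole expansion and delivery.
    """
    if not servers:
        raise ValueError("need at least one server")
    batches = defaultdict(list)
    for i, name in enumerate(recipients):
        # Recipient i goes to server i mod (number of servers).
        batches[servers[i % len(servers)]].append(name)
    return dict(batches)


batches = partition_recipients(["a", "b", "c", "d", "e"], ["s1", "s2"])
```

    With two servers, each handles roughly half of the recipients, so the
    work per server shrinks as servers are added.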

    Another scaling issue concerned communication between Grapevine servers
    across disparate networks. Grapevine servers must intercommunicate to
    ensure consistency in replicated data. This communication is implemented as
    direct connections between Grapevine servers. In segmented networks, it
    might not be possible for a Grapevine server to directly connect to every
    other server. However, every server may still be reachable indirectly,
    with other servers acting as intermediaries. Such relaying wasn't
    implemented in Grapevine but was seen as a future modification to allow
    the system to exist in less-connected environments.
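
    To make the indirect-connection idea concrete, here is a small sketch
    that finds a chain of intermediate servers using breadth-first search.
    The paper does not specify an algorithm for this; the function name, the
    link table and the server names below are assumptions for illustration.

```python
from collections import deque


def relay_path(links, source, target):
    """Return the shortest chain of servers from source to target, or None.

    `links` maps each server to the servers it can connect to directly.
    """
    previous = {source: None}  # server -> server we reached it from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the back-pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # target is unreachable even through intermediaries


# A cannot reach C directly, but B can relay.
links = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
path = relay_path(links, "A", "C")
```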

    Delays were encountered in updating administrative information across
    servers. For several minutes after an administrative update, the servers
    remained unsynchronized. So, changing a registration database and
    immediately performing an action that depends on the change will probably
    be a slow operation. Also, because the distributed system appears as one
    server to client programs, sometimes a user expects that s/he is connected
    to a local Grapevine server when actually s/he is connected to a more
    remote server. In these cases, sending messages could take significantly
    longer than anticipated.
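
    The window during which servers disagree can be pictured with a toy
    model. This is only a sketch under simplifying assumptions: the class
    and method names are invented, and propagation is modeled as an explicit
    step rather than as the message-based mechanism Grapevine uses.

```python
class Registry:
    """Toy replicated registry: updates reach other replicas only later."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica index, key, value) not yet delivered

    def update(self, at, key, value):
        # The change takes effect immediately on the replica that received it
        # and is queued for every other replica.
        self.replicas[at][key] = value
        for i in range(len(self.replicas)):
            if i != at:
                self.pending.append((i, key, value))

    def propagate(self):
        # Deliver all queued changes; until this runs, replicas disagree.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()

    def read(self, at, key):
        return self.replicas[at].get(key)


registry = Registry(2)
registry.update(0, "members", ["gail"])
stale = registry.read(1, "members")   # replica 1 has not seen the update yet
registry.propagate()
fresh = registry.read(1, "members")   # now both replicas agree
```

    An action that depends on the update and happens to be served by the
    second replica before propagation sees the old state.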

    This is the second time I have read the Grapevine paper (I first read it
    in an earlier Distributed Systems class) and I really enjoyed this work. The
    system is, as a whole, strongly designed and implemented to be
    fault-tolerant. A drawback might be that a node is an authoritative source
    for a registry database. When a single registry is significantly larger
    than others, then isn't that node burdened with a larger amount of work?

    -- Gail.

    -------------
    Gail Rahn
    grahn_at_cs.washington.edu


