Review: Experience with Grapevine: The Growth of a Distributed System

From: Raz Mathias (razvanma_at_exchange.microsoft.com)
Date: Mon Feb 02 2004 - 16:33:19 PST

    This paper gives us an account of experiences with the Grapevine system.
    In short, Grapevine can be described as a distributed registry service
    and message queuing management system (with authentication and access
    control thrown in for good measure). The paper retrospectively analyzes
    deployment problems, the strategies used to overcome them, and the
    lessons drawn from several years of operating the system.

    The mechanics of the basic Grapevine system are relatively
    straightforward. The system is divided into a message system and a
    registry system. The registry holds two kinds of entries: individual
    entries (representing users and servers) and group entries (each
    consisting of a list of individual entries). Each entry's name is
    divided into a two-level hierarchy: a <registry name, name local to the
    registry> pair. Registries are distributed atomically among a number of
    servers (i.e., a server contains either all the names in a registry or
    none). In my opinion, restricting oneself to two levels of naming is
    arbitrary and bound to lead to problems, as indeed it did for the
    authors of this paper. Their purpose in using only two levels was to
    impose a structure on the topology of the network itself, forcing each
    geographic node to host a whole number of registries. Combined with the
    demonstrated demand for partitioning names along organizational lines,
    this becomes problematic: once an organization grows too large for a
    single machine, the administrator is forced to create an artificial
    division in the naming system, which burdens management of the
    organization. A similar problem (not directly related to the mechanics
    of the naming system, but conceptually related to the scalability of
    hierarchies) manifested itself in the management of large distribution
    lists, where multiple levels of hierarchy would have scaled much better.
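
    To make the naming mechanics concrete, here is a minimal Python sketch
    of two-level names and atomic registry placement. It is not Grapevine's
    implementation: the RegistrationServer class, servers_for and lookup
    helpers, and the "pa" registry contents are all hypothetical, and names
    are modeled as (registry, local name) tuples for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class RegistrationServer:
    """Hypothetical registration server that holds whole registries."""
    name: str
    # registry name -> {local name -> entry}
    registries: dict = field(default_factory=dict)

    def host_registry(self, registry, entries):
        # A registry is copied in full: a server holds all of its names or none.
        self.registries[registry] = dict(entries)


def servers_for(registry, servers):
    """Names of every server holding a replica of the given registry."""
    return [s.name for s in servers if registry in s.registries]


def lookup(rname, servers):
    """Resolve a (registry, local name) pair on any server hosting the registry."""
    registry, local = rname
    for s in servers:
        if registry in s.registries:
            return s.registries[registry].get(local)
    raise KeyError(f"no server hosts registry {registry!r}")


# Illustrative data: an individual entry and a group entry listing members.
pa = {
    "alice": {"kind": "individual", "inbox_sites": ["server-a"]},
    "staff": {"kind": "group", "members": [("pa", "alice")]},
}
a = RegistrationServer("server-a")
b = RegistrationServer("server-b")
c = RegistrationServer("server-c")
a.host_registry("pa", pa)
b.host_registry("pa", pa)
servers = [a, b, c]
```

    Because placement is all-or-nothing per registry, the only way to
    spread one organization's names across more machines in this model is
    to split it into several registries, which is the artificial division
    criticized above.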

    I found the discussion of mailbox distribution particularly relevant
    even today. The basic idea was to optimize the primary inbox for
    latency, the secondary inbox for availability, and the tertiary inbox
    for unforeseen failures (put it on the "other side of the [local]
    internet"). In addition to the concerns expressed, I believe the paper
    should have placed more emphasis on scalability and less on pure
    latency. I would argue that in a distributed messaging system the goal
    should be high throughput, not necessarily low latency; a corporation
    would much rather accept slower message delivery (say, four minutes
    instead of two in the worst case) than sacrifice scalability, the lack
    of which shifts the load onto the administrator. The concerns raised
    about registry replicas are equally relevant today: reliability,
    availability, latency, and protection from catastrophic failure all
    remain fundamentally important.
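
    The primary/secondary/tertiary arrangement amounts to an ordered
    fallback list. The following is a minimal sketch, assuming each user's
    mailbox is described by an ordered list of inbox sites and that delivery
    moves down the list when a server is unreachable; the function name,
    server names, and reachability check are hypothetical.

```python
def choose_inbox_site(inbox_sites, is_reachable):
    """Return the first reachable inbox site, trying them in preference order.

    inbox_sites is assumed to be ordered [primary, secondary, tertiary]:
    nearby for latency, a second site for availability, and a distant one
    for unforeseen failures.
    """
    for site in inbox_sites:
        if is_reachable(site):
            return site
    return None  # every site is down; the caller must queue and retry


sites = ["local-server", "same-building-server", "remote-server"]
down = {"local-server"}
chosen = choose_inbox_site(sites, lambda s: s not in down)
```

    With the primary down, delivery lands on the secondary site. The order
    of the list is where the latency-versus-availability tradeoff lives.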

    The introduction of the concept of thin-client access was particularly
    interesting to me. Having a rich client can greatly simplify the
    guarantees the server must provide (e.g., the server can force
    sequential access to messages), whereas a thin client can force the
    server to support more complex core guarantees (e.g., random access to
    messages). I had never looked at the thick/thin client problem from
    this angle before.

    The paper also goes into the administrative surprises of propagation
    delays (all of which still surprise administrators twenty years later).
    Despite its numerous strengths (too many to address individually in
    this short summary), I believe the paper missed a few issues. First,
    although the registry system is distributed, it has become the central
    repository of naming, location, authentication, and authorization. As
    such it becomes the lifeblood of a corporation (anything short of 100%
    uptime would be phenomenally costly). I therefore believe the registry
    should be aware of, and able to control, the underlying transport
    mechanisms: it should be able to take priority in switching and routing
    and to guarantee a quality of service regardless of whatever else is
    going on in the network (the system was explicitly constructed to avoid
    dealing with the underlying transport). Next, there is the issue of
    duplicate message detection. I've personally run into this specific
    issue when trying to display a merged view of multiple mailboxes on a
    client device. The ideal solution, had it been built into the system,
    would have been a globally unique identifier for each message. Without
    one, we would have to fall back on property-by-property comparisons, a
    dangerous game to get into when routing servers can potentially add,
    modify, or delete properties.
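
    A minimal sketch of the suggested fix, assuming each message receives a
    random UUID once at submission and that routing servers never rewrite
    it. The Message class, its fields, and merged_view are hypothetical and
    not part of Grapevine; uuid.uuid4 and dataclasses.replace are from the
    Python standard library.

```python
import dataclasses
import uuid
from dataclasses import dataclass, field


@dataclass
class Message:
    subject: str
    body: str
    # Hypothetical: a globally unique ID assigned once, at submission time,
    # which routing servers never rewrite.
    message_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Properties that routing servers may add, modify, or delete.
    properties: dict = field(default_factory=dict)


def merged_view(*mailboxes):
    """Merge several mailboxes, keeping one copy of each message by ID."""
    seen = set()
    merged = []
    for box in mailboxes:
        for msg in box:
            if msg.message_id not in seen:
                seen.add(msg.message_id)
                merged.append(msg)
    return merged


original = Message("Status", "Build is green")
# The same message as it arrived in a second mailbox, after a router added
# a property; dataclasses.replace copies message_id unchanged.
rerouted = dataclasses.replace(original, properties={"routed-via": "server-b"})
# A distinct message that happens to have identical content.
lookalike = Message("Status", "Build is green")
view = merged_view([original], [rerouted, lookalike])
```

    The merged view keeps the original and the lookalike but drops the
    rerouted copy. A comparison on properties and content would risk both
    errors: treating the rerouted copy as new and the lookalike as a
    duplicate.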

    This paper reads like a Microsoft Active Directory/Exchange deployment
    guide, except that it was written twenty years ago. The ideas on
    scalability, reliability, availability, and all the other "-abilities"
    introduced in this paper make it a must-read for any server application
    developer. I personally enjoyed reading the concerns raised and the
    authors' inclination to enumerate the tradeoffs of various solutions
    rather than attempting to paint a pure black-and-white picture of
    distributed systems design.
