Review of "Scale and Performance in a Distributed File System"

From: Jeff Duzak (jduzak_at_exchange.microsoft.com)
Date: Sun Feb 22 2004 - 21:24:43 PST

  • Next message: Praveen Rao: "Review of Distributed File Systems paper"

    This paper is very much reminiscent of the Grapevine paper, in that it
    discusses a number of issues arising from the growth of a distributed
    system. This paper describes Andrew, a distributed file system
    developed with scalability in mind. A prototype of the system was
    implemented and tested first, and the lessons learned from the prototype
    were applied to a more complete version. The final system was compared
    to Sun's NFS system, and was shown to scale considerably better than
    NFS.
     
    A design choice that is stressed in the paper is the decision to cache
    shared files on the client when that file is open. Because of this
    caching, the server needs to be contacted only when a file is opened or
    closed, not for each read and write. Testing showed that this decision
    certainly improves performance in benchmarked scenarios. However, it
    also causes greater up-front latency compared with a system that
    contacts the server for each read or write. Further, no guarantees are
    made regarding simultaneous writes to a file. Presumably, if multiple
    workstations modify a file simultaneously, whichever workstation closes
    the file last will blast over the other workstations' changes.
     
    A number of issues were highlighted by thorough testing of the
    prototype, and certain changes were made to improve performance in the
    production system. For example, in the prototype, a new process was
    spun off for each client to serve that client's requests. This caused a
    lot of CPU time to be wasted in context switches, and also prevented
    sharing of data between request processors. In the production system,
    lightweight threads were used instead of processes, resulting in much
    less overhead.
     
    The goal of the system was to be able to service 50 users with each
    server, and apparently this goal was attained, with the assumption that
    the benchmark used for testing was equivalent to typical use by 5 users.
    Compared with the prototype, the production system showed much more
    uniform performance with increasing load, validating the design changes
    made for the production system.
     
    Andrew shows some of the same limitations as Grapevine. For instance, a
    database of volume names and locations must be maintained on each server
    in Andrew, just as such a database of registries had to be maintained on
    each server in Grapevine. Load balancing was done manually through
    distribution of volumes among servers. Grapevine appears to have had
    better replication. Each registry in Grapevine was replicated on one or
    two other servers. In Andrew, only some system volumes are replicated
    in a read-only manner. Most volumes are not replicated.
     
    The authors acknowledge that the system likely will only scale well up
    to 500 to 700 workstations. More recent (research) systems, such as
    CFS, are designed to scale indefinitely.


  • Next message: Praveen Rao: "Review of Distributed File Systems paper"

    This archive was generated by hypermail 2.1.6 : Sun Feb 22 2004 - 21:25:05 PST