Review: Howard, et al. Scale and Performance in a Distributed File System.

From: Richard Jackson (richja_at_expedia.com)
Date: Mon Feb 23 2004 - 13:07:02 PST

  • Next message: Brian Milnes: "Howard et al Review"

    This 1988 paper by Howard et al. describes the Andrew File System, a
    distributed file system implemented and deployed at Carnegie Mellon
    University in the 1980s. The paper discusses the evolution of
    the system over two generations - an initial prototype and the Andrew
    File System itself.
     
    The paper has the following main sections: 1) general architecture, 2)
    prototype implementation and performance, 3) necessary changes for
    better performance, 4) effect of those changes, 5) comparison to
    remote-open and NFS, 6) changes for operability.
     
    First, the paper describes the Andrew File System, or AFS. This is a
    client-server system for storing files. The main components are Venus,
    which is the client component that is AFS-aware, and Vice, which is the
    server component. Vice is actually a collection of servers that
    together comprise the file system, but to the client they appear to be a
    single service. The global file system is distributed among many Vice
    servers based on expected usage patterns, growth rates, etc. The design
    is focused on scale, and Venus does as much work as possible to prevent
    overloading of the servers.
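
    To picture the single-namespace illusion, here is a tiny Python sketch.
    The subtree-to-server map and host names are my own invention - real
    AFS resolves file locations through Vice itself rather than a static
    client-side table - but it shows how many servers can be presented to
    the client as one file system.

        # Simplified sketch (my own construction, not the paper's design):
        # route each path to the Vice server responsible for its subtree.
        LOCATION = {                        # subtree -> responsible server
            "/afs/cs/users":    "vice1.example.edu",
            "/afs/cs/projects": "vice2.example.edu",
        }

        def locate(path):
            """Pick the server whose subtree is the longest prefix of path."""
            matches = [p for p in LOCATION if path.startswith(p)]
            if not matches:
                raise FileNotFoundError(path)
            return LOCATION[max(matches, key=len)]

        print(locate("/afs/cs/users/rich/thesis.txt"))   # vice1.example.edu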
     
    Prior to building AFS, the designers built a prototype to test their
    basic design and identify any problems. Based on the information from
    the prototype, the designers were able to build a much better version of
    the first AFS. This is Brooks' concept of "Build one to throw away."
    Several aspects of the prototype were clearly lacking, even to someone
    unfamiliar with the system. First, the server created a dedicated
    process per client connection. Next, the server processes used a
    strange file-based synchronization mechanism. Other problems included a
    low limit on the number of users it could support (~20) and the
    inability to relocate directories to different servers. On the positive
    side, a key design constraint is something we've seen before -
    compatibility: they wanted to preserve BSD 4.2 file system semantics,
    and in this they were generally successful.
    Finally, a benchmark of the prototype measured its performance. In
    general, it performed well, but the measurements pointed to some
    obvious opportunities for improvement: reducing the number of
    client->server cache-validation calls and moving the server away from
    process-per-connection. Extensive analysis and tables were presented,
    which seemed convincing. However, they treated every call as costing
    the same amount and used only call counts for comparison. Surely all
    calls are not equally expensive, so count alone is not enough; they
    should have weighted each call type by an approximate per-call cost.
    Therefore, I question much of their analysis here.
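
    To make the objection concrete, here is a toy Python example. The call
    names and numbers are invented, not the paper's measurements; the point
    is only that a ranking by raw call counts and a ranking by
    cost-weighted counts can disagree.

        # Toy illustration (invented data): raw count vs. cost-weighted count.
        calls = {                          # call -> (count, approx. ms each)
            "ValidateCache": (60000, 1),   # cheap, but issued constantly
            "FetchFile":     (5000, 40),   # rare, but moves a whole file
        }

        by_count = {name: n for name, (n, _) in calls.items()}
        by_cost  = {name: n * ms for name, (n, ms) in calls.items()}

        print(max(by_count, key=by_count.get))   # ValidateCache looks worst
        print(max(by_cost,  key=by_cost.get))    # FetchFile actually is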
     
    The next section discusses in depth four changes for performance: 1)
    cache management, 2) name resolution, 3) communication and server
    process structure, 4) low-level storage representation. #1 was good, as
    it added server->client callbacks to avoid the constant
    client-validation requests. This adds complexity to the server but is a
    worthwhile tradeoff for performance. #2 moved some of the name
    resolution computation from Vice to Venus. This change did not seem as
    critical as the others, but was useful nonetheless. #3 changed from
    using process-per-connection to lightweight-threads. They chose 5
    threads, but I don't know why. I also don't know the RPC behavior if
    all threads are busy. Does it block, or does it refuse requests? #4
    seemed similar to #2, but it also added some improved caching. I think
    the key changes here were #1 and #3. In this section, they also
    discussed their consistency semantics, which are well-defined and
    behave as one would expect. This is later contrasted with NFS, whose
    consistency semantics are at best implicit and very weak.
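
    A rough sketch of how I read the callback idea in change #1, in Python;
    the class and method names are mine, not the paper's. The server hands
    out a callback promise with each file it ships and breaks the promise
    when the file changes, so the client can trust its cache without
    validating on every open.

        # Rough sketch (my own construction) of callback-based caching.
        class ViceServer:
            def __init__(self):
                self.files = {}          # path -> contents
                self.callbacks = {}      # path -> clients holding promises

            def fetch(self, path, client):
                self.callbacks.setdefault(path, set()).add(client)
                return self.files[path]

            def store(self, path, data):
                self.files[path] = data
                for client in self.callbacks.pop(path, set()):
                    client.break_callback(path)   # invalidate cached copies

        class VenusClient:
            def __init__(self, server):
                self.server = server
                self.cache = {}          # path -> contents
                self.valid = set()       # paths with an unbroken callback

            def open(self, path):
                if path in self.valid:   # no validation traffic at all
                    return self.cache[path]
                self.cache[path] = self.server.fetch(path, self)
                self.valid.add(path)
                return self.cache[path]

            def break_callback(self, path):   # invoked by the server
                self.valid.discard(path)

    In this toy version a store simply invalidates every cached copy, and
    the affected clients re-fetch on their next open; tracking all this
    callback state is the added server complexity mentioned above.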
     
    The effects of these changes are discussed, and generally they resulted
    in great improvements. The scaling ability was much improved over the
    prototype, and again, many tables of data are presented as proof. Even
    after these changes, they were still CPU-bound, and I could not
    understand why. It seems that eventually you should become disk or
    network bound in a system like this. I suspect that a design flaw was
    overlooked or the hardware was oddly configured/matched.
     
    A comparison was made to NFS, which they called a remote-open file
    system. This method differs from AFS in that AFS copies the whole file
    on open, while NFS transfers only the blocks that are actually needed.
    Still, AFS performed much better than NFS and scaled far better. After
    reading this section, I was convinced that NFS is really broken. A case
    where AFS is weak is when opening a remote file for the first time -
    this process can be very slow for large files.
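
    A back-of-envelope Python sketch of the traffic argument, with my own
    numbers and ignoring whatever block caching an NFS client does: for a
    file that is read repeatedly, whole-file caching touches the server
    once, while block-at-a-time access touches it on every read.

        # Invented numbers; compares server requests for repeated reads.
        file_kb  = 1024      # a 1 MB file
        block_kb = 8         # per-request transfer size, block-style access
        reads    = 10        # times the whole file is read

        whole_file_requests = 1                        # fetched once, then
                                                       # served from cache
        per_block_requests  = (file_kb // block_kb) * reads

        print(whole_file_requests, per_block_requests) # 1 vs. 1280

    The flip side is visible in the same numbers: that single fetch has to
    move the entire megabyte before the first byte is usable, which is
    exactly the slow first open noted above.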
     
    Finally, some changes for operability were discussed. These included:
    1) creation of logical volumes to help with relocation of files, 2)
    quotas, 3) replication, and 4) backup based on copy-on-write (this
    included a very interesting idea: putting yesterday's backup into each
    user's home directory). This section gave some good ideas about how to
    improve an
    already-working system, and these ideas could be applied in many
    projects.
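
    The copy-on-write backup is the idea I found most interesting, so here
    is a minimal Python sketch of it. The Volume class and paths are mine,
    not the paper's, and a real implementation would share data at a lower
    level than whole files.

        # Toy sketch (my own construction): a snapshot copies only the table
        # of names; file contents are shared until the live volume changes.
        class Volume:
            def __init__(self, files=None):
                self.files = dict(files or {})   # path -> contents

            def snapshot(self):
                return Volume(self.files)        # copy the table, not the data

            def write(self, path, data):
                self.files[path] = data          # rebind; snapshots keep the old

        home = Volume({"/u/rich/notes.txt": "draft v1"})
        yesterday = home.snapshot()              # the nightly backup clone
        home.write("/u/rich/notes.txt", "draft v2")

        print(home.files["/u/rich/notes.txt"])        # draft v2
        print(yesterday.files["/u/rich/notes.txt"])   # still draft v1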
     
    Overall, AFS seems to be a well-designed system that evolved naturally
    over the course of a few years. This paper is an excellent example of
    how to design a system based on the prototype/analysis/redesign concept.

     

