Howard et al. Review

From: Brian Milnes (brianmilnes_at_qwest.net)
Date: Mon Feb 23 2004 - 15:16:46 PST

    Scale and Performance in a Distributed File System - Howard et al

                The authors study the Andrew File System's performance, first
    in its prototype and then in its first large deployment at CMU. They
    started with measurements of the prototype serving about 100 users on six
    servers, which showed high and uneven server load, with authorization and
    file-stat calls consuming most of the server time. The prototype supported
    only about 20 users per server, due in part to RPC costs, a dedicated
    server process per client, and the use of files for communication between
    those processes, since 4.2 BSD gave them no way to share memory.
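
                To make the cost of that design concrete, here is a minimal
    sketch, in C with invented names (not the actual Venus/Vice code), of the
    prototype's open path: even when the file is already cached, every open
    costs at least one round trip to check that the copy is still fresh.

    #include <stdbool.h>
    #include <time.h>

    /* Illustrative cache entry; the prototype's real structures differed. */
    struct cache_entry {
        char    path[1024];
        time_t  fetched_mtime;  /* modification time seen when the file was fetched */
        bool    have_data;      /* whole file is present in the local cache */
    };

    /* Stubs standing in for the stat- and fetch-style RPCs the prototype made. */
    static time_t rpc_remote_mtime(const char *path)          { (void)path; return 0; }
    static void   rpc_fetch_whole_file(struct cache_entry *e) { e->have_data = true; }

    /* Prototype behaviour: validate the cached copy on every single open. */
    void prototype_open(struct cache_entry *e)
    {
        time_t remote = rpc_remote_mtime(e->path);   /* one RPC per open, hit or miss */
        if (!e->have_data || remote != e->fetched_mtime) {
            rpc_fetch_whole_file(e);                 /* whole-file fetch on a miss */
            e->fetched_mtime = remote;
        }
        /* reads and writes then go to the local copy until close */
    }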

                For the revised system, the authors changed the caching, name
    resolution, server structure, and low-level storage representation. AFS
    still cached whole files locally and wrote them back to the server when
    they were closed. They added caching of file status information and of
    directories, with directory modifications written through to the server
    as they were made. They also had the server promise to notify the client
    (a "callback") whenever a cached file changed, so the client no longer
    had to check with the server on every open.
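
                A minimal sketch of the callback idea, again with invented
    names rather than the real Venus/Vice interface: the client trusts its
    cached copy until the server breaks the callback, so a warm open needs no
    network traffic at all.

    #include <stdbool.h>

    struct cached_file {
        bool have_data;       /* whole file present in the local disk cache */
        bool have_callback;   /* server has promised to tell us if it changes */
        bool dirty;           /* locally modified since the last store */
    };

    /* Stubs for the fetch and store RPCs this sketch assumes. */
    static void rpc_fetch_with_callback(struct cached_file *f)
    {
        f->have_data = true;
        f->have_callback = true;
    }
    static void rpc_store_whole_file(struct cached_file *f) { f->dirty = false; }

    /* Invoked on a message from the server when another client updates the file. */
    void callback_broken(struct cached_file *f) { f->have_callback = false; }

    void afs_open(struct cached_file *f)
    {
        if (f->have_data && f->have_callback)
            return;                      /* cache hit: no RPC needed */
        rpc_fetch_with_callback(f);      /* whole-file fetch plus a new callback */
    }

    void afs_close(struct cached_file *f)
    {
        if (f->dirty)
            rpc_store_whole_file(f);     /* changes written back only at close */
    }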

                Name resolution was changed so that files are named by unique,
    location-independent file identifiers (fids), letting the underlying data
    move between servers. A replicated volume location database was added so
    that a client can find the server holding a file from its fid. The
    process-per-client server structure was replaced with a single process
    running lightweight processes (essentially coroutines), which allowed
    shared data structures to live in memory rather than being passed through
    files on disk. Finally, new kernel calls let the client and server access
    files by an inode-like identifier, avoiding expensive pathname walks on
    the server.
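
                A rough sketch of the naming scheme, assuming a fid of
    (volume, vnode, uniquifier) as in the paper; the lookup table and server
    names below are invented purely for illustration.

    #include <stdint.h>
    #include <stddef.h>

    /* A fid names a file by volume and vnode, not by server or pathname. */
    struct fid {
        uint32_t volume;       /* which volume holds the file */
        uint32_t vnode;        /* index of the file within that volume */
        uint32_t uniquifier;   /* distinguishes reuses of the same vnode slot */
    };

    struct vldb_entry {
        uint32_t    volume;
        const char *server;    /* server currently storing the volume */
    };

    /* Toy stand-in for the replicated volume location database. */
    static const struct vldb_entry vldb[] = {
        { 1, "server-a" },     /* invented entries */
        { 2, "server-b" },
    };

    /* Finding a file never depends on knowing where it used to live. */
    const char *locate_server(struct fid f)
    {
        for (size_t i = 0; i < sizeof vldb / sizeof vldb[0]; i++)
            if (vldb[i].volume == f.volume)
                return vldb[i].server;
        return NULL;           /* unknown volume: refresh the cached database copy */
    }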

                The reimplementation reduced the workstation's remote file
    access penalty from roughly 70% slower than local storage to roughly 20%.
    A server now handled 20 units of their synthetic benchmark load, or about
    100 users, at around 70% CPU utilization. The authors then compared this
    against NFS, a remote-open style file system. NFS broke down at high
    loads, with operations failing outright when retransmitted UDP packets
    were lost. At a load of 18 the NFS server was CPU saturated and nearly
    disk saturated, while the Andrew server was using less than half its CPU,
    a small fraction of its disk, and about a third as many network packets.
    NFS's only advantage was latency on partial reads of large files.

                The authors also added volumes, which group sub-trees of the
    file system, can be moved between servers, and carry per-volume quotas
    and read-only replication.
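
                As a rough illustration of what a volume carries (the field
    names here are my own, not the paper's):

    #include <stdint.h>
    #include <stdbool.h>

    /* A volume: an administratively movable sub-tree of the file system. */
    struct volume {
        uint32_t    id;              /* volume number, as used in fids */
        const char *mount_point;     /* where the sub-tree appears in the namespace */
        const char *server;          /* changes when the volume is moved */
        uint64_t    quota_kbytes;    /* per-volume disk quota */
        bool        read_only;       /* true for a replicated read-only clone */
    };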

                Having spent years as a client of AFS, I have to say that it
    was a wonderful step forward. Its performance was great for all of my
    applications, but it was too difficult to administer: backups were
    unreliable, volume moves were so painful that they were almost never
    done, and its expense has kept it from being widely used in industry. We
    are all still using the antique and slow NFS, which is quite a
    disappointment.

