Howard et al. Review

From: Brian Milnes (brianmilnes_at_qwest.net)
Date: Mon Feb 23 2004 - 15:16:46 PST

    Scale and Performance in a Distributed File System - Howard et al

                The authors study the Andrew File System's performance, first
    in its prototype and then in its first large deployment at CMU. They
    started with measurements of the prototype serving about 100 users on six
    servers, which showed high and uneven server load, with authorization and
    file-stat calls consuming most of the server time. The prototype supported
    only about 20 users per server, due in part to RPC costs, a dedicated
    server process per client, and the use of files for communication between
    those processes, since 4.2 BSD gave them no way to share memory.
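
                To make the cost of that design concrete, here is a minimal
    sketch, in C with invented names (not the actual Venus/Vice code), of the
    prototype's open path: even when the file is already cached, every open
    costs at least one round trip to check that the copy is still fresh.

    #include <stdbool.h>
    #include <time.h>

    /* Illustrative cache entry; the prototype's real structures differed. */
    struct cache_entry {
        char    path[1024];
        time_t  fetched_mtime;  /* modification time seen when the file was fetched */
        bool    have_data;      /* whole file is present in the local cache */
    };

    /* Stubs standing in for the stat- and fetch-style RPCs the prototype made. */
    static time_t rpc_remote_mtime(const char *path)          { (void)path; return 0; }
    static void   rpc_fetch_whole_file(struct cache_entry *e) { e->have_data = true; }

    /* Prototype behaviour: validate the cached copy on every single open. */
    void prototype_open(struct cache_entry *e)
    {
        time_t remote = rpc_remote_mtime(e->path);   /* one RPC per open, hit or miss */
        if (!e->have_data || remote != e->fetched_mtime) {
            rpc_fetch_whole_file(e);                 /* whole-file fetch on a miss */
            e->fetched_mtime = remote;
        }
        /* reads and writes then go to the local copy until close */
    }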

                For the revised system, the authors changed the caching, name
    resolution, server structure, and low-level storage representation. AFS
    still cached whole files locally and wrote them back to the server when
    they were closed. They added caching of file status information and of
    directories, with directory modifications written through to the server
    as they were made. They also had the server promise to notify the client
    (a "callback") whenever a cached file changed, so the client no longer
    had to check with the server on every open.
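
                A minimal sketch of the callback idea, again with invented
    names rather than the real Venus/Vice interface: the client trusts its
    cached copy until the server breaks the callback, so a warm open needs no
    network traffic at all.

    #include <stdbool.h>

    struct cached_file {
        bool have_data;       /* whole file present in the local disk cache */
        bool have_callback;   /* server has promised to tell us if it changes */
        bool dirty;           /* locally modified since the last store */
    };

    /* Stubs for the fetch and store RPCs this sketch assumes. */
    static void rpc_fetch_with_callback(struct cached_file *f)
    {
        f->have_data = true;
        f->have_callback = true;
    }
    static void rpc_store_whole_file(struct cached_file *f) { f->dirty = false; }

    /* Invoked on a message from the server when another client updates the file. */
    void callback_broken(struct cached_file *f) { f->have_callback = false; }

    void afs_open(struct cached_file *f)
    {
        if (f->have_data && f->have_callback)
            return;                      /* cache hit: no RPC needed */
        rpc_fetch_with_callback(f);      /* whole-file fetch plus a new callback */
    }

    void afs_close(struct cached_file *f)
    {
        if (f->dirty)
            rpc_store_whole_file(f);     /* changes written back only at close */
    }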

                Name resolution was changed so that files are named by unique,
    location-independent file identifiers (fids), letting the underlying data
    move between servers. A replicated volume location database was added so
    that a client can find the server holding a file from its fid. The
    process-per-client server structure was replaced with a single process
    running lightweight processes (essentially coroutines), which allowed
    shared data structures to live in memory rather than being passed through
    files on disk. Finally, new kernel calls let the client and server access
    files by an inode-like identifier, avoiding expensive pathname walks on
    the server.
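
                A rough sketch of the naming scheme, assuming a fid of
    (volume, vnode, uniquifier) as in the paper; the lookup table and server
    names below are invented purely for illustration.

    #include <stdint.h>
    #include <stddef.h>

    /* A fid names a file by volume and vnode, not by server or pathname. */
    struct fid {
        uint32_t volume;       /* which volume holds the file */
        uint32_t vnode;        /* index of the file within that volume */
        uint32_t uniquifier;   /* distinguishes reuses of the same vnode slot */
    };

    struct vldb_entry {
        uint32_t    volume;
        const char *server;    /* server currently storing the volume */
    };

    /* Toy stand-in for the replicated volume location database. */
    static const struct vldb_entry vldb[] = {
        { 1, "server-a" },     /* invented entries */
        { 2, "server-b" },
    };

    /* Finding a file never depends on knowing where it used to live. */
    const char *locate_server(struct fid f)
    {
        for (size_t i = 0; i < sizeof vldb / sizeof vldb[0]; i++)
            if (vldb[i].volume == f.volume)
                return vldb[i].server;
        return NULL;           /* unknown volume: refresh the cached database copy */
    }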

                The reimplementation reduced the workstation's remote file
    access penalty from roughly 70% slower than local storage to roughly 20%.
    A server now handled 20 units of their synthetic benchmark load, or about
    100 users, at around 70% CPU utilization. The authors then compared this
    against NFS, a remote-open style file system. NFS broke down at high
    loads, with operations failing outright when retransmitted UDP packets
    were lost. At a load of 18 the NFS server was CPU saturated and nearly
    disk saturated, while the Andrew server was using less than half its CPU,
    a small fraction of its disk, and about a third as many network packets.
    NFS's only advantage was latency on partial reads of large files.

                The authors also added volumes, which group sub-trees of the
    file system, can be moved between servers, and carry per-volume quotas
    and read-only replication.
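
                As a rough illustration of what a volume carries (the field
    names here are my own, not the paper's):

    #include <stdint.h>
    #include <stdbool.h>

    /* A volume: an administratively movable sub-tree of the file system. */
    struct volume {
        uint32_t    id;              /* volume number, as used in fids */
        const char *mount_point;     /* where the sub-tree appears in the namespace */
        const char *server;          /* changes when the volume is moved */
        uint64_t    quota_kbytes;    /* per-volume disk quota */
        bool        read_only;       /* true for a replicated read-only clone */
    };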

                Having spent years as a client of AFS, I have to say that it
    was a wonderful step forward. Its performance was great for all of my
    applications, but it was too difficult to administer: backups were
    unreliable, volume moves were so painful that they were almost never
    done, and its expense has kept it from being widely used in industry. We
    are all still using the antique and slow NFS, which is quite a
    disappointment.

