From: Richard Jackson (richja_at_expedia.com)
Date: Mon Feb 23 2004 - 13:07:02 PST
This 1988 paper by Howard et al. describes the Andrew File System, which
was a distributed file system implemented and deployed at Carnegie
Mellon University in the 1980s. The paper discusses the evolution of
the system over two generations - an initial prototype and the Andrew
File System itself.
The paper has the following main sections: 1) general architecture, 2)
prototype implementation and performance, 3) necessary changes for
better performance, 4) effect of those changes, 5) comparison to
remote-open file systems such as NFS, 6) changes for operability.
First, the paper describes the Andrew File System, or AFS. This is a
client-server system for storing files. The main components are Venus,
which is the client component that is AFS-aware, and Vice, which is the
server component. Vice is actually a collection of servers that
together comprise the file system, but to the client they appear to be a
single service. The global file system is distributed among many Vice
servers based on expected usage patterns, growth rates, etc. The design
is focused on scale, and Venus does as much work as possible to prevent
overloading of the servers.
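To make the division of labor concrete, here is a minimal sketch of the
idea in Python. ViceServer, Venus, CACHE_DIR and everything else in it
are names I invented, not the paper's code; the point is only that the
client satisfies open() from a local whole-file cache and contacts a
server just on a miss:

    import os

    CACHE_DIR = "/tmp/venus-cache"   # invented location for the local cache

    class ViceServer:
        """Stand-in for a Vice file server; fetch() ships the whole file."""
        def __init__(self, files):
            self.files = files       # path -> file contents (bytes)

        def fetch(self, path):
            return self.files[path]  # one call returns the entire file

    class Venus:
        """Client-side piece: caches files so servers stay lightly loaded."""
        def __init__(self, server):
            self.server = server
            os.makedirs(CACHE_DIR, exist_ok=True)

        def _cache_path(self, path):
            return os.path.join(CACHE_DIR, path.replace("/", "_"))

        def open(self, path):
            local = self._cache_path(path)
            if not os.path.exists(local):   # miss: fetch the whole file once
                with open(local, "wb") as f:
                    f.write(self.server.fetch(path))
            return open(local, "rb")        # later reads are purely local

    server = ViceServer({"/afs/cmu/notes.txt": b"hello from vice\n"})
    venus = Venus(server)
    print(venus.open("/afs/cmu/notes.txt").read())  # a second open stays local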
Prior to building AFS, the designers built a prototype to test their
basic design and identify any problems. Based on the information from
the prototype, the designers were able to build a much better version of
the first AFS. This is Brooks' concept of "Build one to throw away."
There were many things about the prototype that were lacking, even to
someone unfamiliar with the system. First, the server created a process
per connection. Next, the server processes used a strange file-based
synchronization mechanism. Other problems included a low limit on the
number of users a server could support (roughly 20) and the inability
to relocate directories to different servers. On the good side, a key
design constraint was one we've seen before - compatibility: they
wanted to stay compatible with 4.2BSD file system semantics, and they
were generally successful.
Finally, a benchmark of the prototype measured its performance. In
general, it performed well, but the measurements pointed to some
obvious performance enhancements. These included reducing the number
of client->server cache validations and changing the server to not
use a process per connection. Extensive analysis and tables were
presented, which seemed convincing. However, they treated all calls
as costing the same amount and compared them by call counts alone.
Surely all calls are not equally expensive, so count alone is not
enough; they should have weighted the call counts by some approximate
per-call cost. Therefore, I question much of their analysis here.
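To illustrate, even a back-of-the-envelope weighting can reorder which
calls look most expensive. In the sketch below the call names are only
examples, and every count and cost is a number I made up:

    # Toy illustration: ranking calls by raw count vs. by count times an
    # approximate unit cost. All names and numbers are invented.
    calls = {
        "ValidateCache": (100000, 1.0),    # very frequent, but cheap
        "FetchFile":     (2000, 150.0),    # rare, but ships a whole file
        "GetStatus":     (30000, 2.0),
    }
    by_count = sorted(calls, key=lambda c: calls[c][0], reverse=True)
    by_cost = sorted(calls, key=lambda c: calls[c][0] * calls[c][1],
                     reverse=True)
    print("ranked by count:        ", by_count)   # ValidateCache first
    print("ranked by weighted cost:", by_cost)    # FetchFile jumps to the top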
The next section discusses in depth 4 changes for performance: 1)
cache management, 2) name resolution, 3) communication and server
process structure, 4) low-level storage representation. #1 was good, as
it added server->client callbacks to avoid the constant
client-validation requests. This adds complexity to the server but is a
worthwhile tradeoff for performance. #2 moved some of the name
resolution computation from Vice to Venus. This change did not seem as
critical as the others, but was useful nonetheless. #3 changed the
server from a process per connection to a small pool of lightweight
threads. They chose 5 threads, but I don't know why. I also don't know
the RPC behavior if all threads are busy - does it block or refuse
requests? #4 seemed similar to #2 but added some improved caching. I
think the key
changes here were #1 and #3. In this section, they also discussed their
consistency semantics, which are well-defined and as one would expect.
This is later contrasted with NFS, whose consistency semantics are at
best weak and never explicitly defined.
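As a rough illustration of change #1 and the consistency guarantee
behind it, here is a sketch in Python. The classes and method names are
invented (this is not the actual Vice/Venus code); the point is that
the server remembers who holds a cached copy and breaks their callbacks
on an update, so clients never have to ask whether their copy is still
valid:

    # Hypothetical sketch of callback-based cache validation (change #1).
    class CallbackServer:
        def __init__(self, files):
            self.files = files            # path -> file contents
            self.callbacks = {}           # path -> clients promised a callback

        def fetch(self, path, client):
            self.callbacks.setdefault(path, set()).add(client)
            return self.files[path]

        def store(self, path, data):
            self.files[path] = data
            for c in self.callbacks.pop(path, set()):
                c.break_callback(path)    # server -> client notification

    class CallbackClient:
        def __init__(self, server):
            self.server, self.cache = server, {}

        def read(self, path):
            # No validation round trip: the cached copy is trusted for as
            # long as the callback remains unbroken.
            if path not in self.cache:
                self.cache[path] = self.server.fetch(path, self)
            return self.cache[path]

        def break_callback(self, path):
            self.cache.pop(path, None)    # next read will refetch

    server = CallbackServer({"/afs/a": b"v1"})
    client = CallbackClient(server)
    print(client.read("/afs/a"))          # fetch; callback established
    server.store("/afs/a", b"v2")         # update breaks the callback
    print(client.read("/afs/a"))          # refetches and sees v2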
The effects of these changes are discussed, and generally they resulted
in great improvements. The scaling ability was much improved over the
prototype, and again, many tables of data are presented as proof. Even
after these changes, they were still CPU-bound, and I could not
understand why. It seems that eventually you should become disk or
network bound in a system like this. I suspect that a design flaw was
overlooked or the hardware was oddly configured/matched.
A comparison was made to NFS, which they called a remote-open file
system. This method differs from AFS in that AFS copies the whole file
on open, while NFS copies only the required blocks. Still, AFS
performed much better than NFS and scaled far better as the load
increased. After reading this section, I was convinced that NFS is
really broken. One case where AFS is weak is the first open of a large
remote file - the whole file must be fetched before the open returns,
which can be very slow.
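To see why the per-block style hits the servers so much harder, here is
a toy count of server interactions for repeated sequential reads of one
file. The file and block sizes are arbitrary, and the model deliberately
ignores the in-memory block caching a real NFS client would do:

    # Toy comparison: whole-file transfer vs. remote-open (block-at-a-time).
    FILE_SIZE = 256 * 1024            # 256 KB file (arbitrary)
    BLOCK_SIZE = 8 * 1024             # 8 KB transfer unit (arbitrary)

    def whole_file_calls(reads_of_same_file):
        # One fetch on the first open; later reads hit the local disk cache.
        return 1

    def remote_open_calls(reads_of_same_file):
        # Every block of every read goes back to the server in this model.
        blocks = -(-FILE_SIZE // BLOCK_SIZE)    # ceiling division
        return blocks * reads_of_same_file

    for n in (1, 5, 20):
        print(f"{n:2d} read(s): whole-file={whole_file_calls(n)} "
              f"remote-open={remote_open_calls(n)} server calls")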
Finally, some changes for operability were discussed. These included:
1) creation of logical volumes to help with relocation of files, 2)
quotas, 3) replication, 4) backup based on copy-on-write (which
included a very interesting idea, sketched below, of putting a
read-only copy of yesterday's files into each user's home
directory). This section gave some good ideas about how to improve an
already-working system, and these ideas could be applied in many
projects.
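Here is a rough sketch of the copy-on-write backup idea in #4. It is an
in-memory toy, not the paper's volume machinery, and Volume, Snapshot,
and the paths are all invented; the point is that a clone shares file
data with the live volume, so keeping yesterday's state in every home
directory stays cheap:

    # Hypothetical copy-on-write style clone used as a backup.
    class Snapshot:
        """Read-only view of the volume at clone time."""
        def __init__(self, files):
            self._files = files

        def read(self, path):
            return self._files[path]

    class Volume:
        def __init__(self, files):
            self._files = dict(files)     # path -> file contents

        def clone(self):
            # Only the name -> data map is duplicated; the file contents
            # themselves are shared until the live volume overwrites them.
            return Snapshot(dict(self._files))

        def write(self, path, data):
            self._files[path] = data      # the snapshot keeps the old contents

        def read(self, path):
            return self._files[path]

    live = Volume({"/u/rich/notes": b"draft 1"})
    backup = live.clone()                 # the copy dropped into the home dir
    live.write("/u/rich/notes", b"draft 2")
    print(live.read("/u/rich/notes"))     # b'draft 2'
    print(backup.read("/u/rich/notes"))   # b'draft 1' - yesterday's state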
Overall, AFS seems to be a well-designed system that evolved naturally
over the course of a few years. This paper is an excellent example of
how to design a system based on the prototype/analysis/redesign concept.