From: Ian King (iking_at_killthewabbit.org)
Date: Mon Feb 23 2004 - 17:47:23 PST
The authors describe the fundamental design of the Andrew file system in
order to discuss the changes made after the original prototype to address
scaling and performance challenges. Both the prototype and its successor were
deployed in a real user environment, and metrics are offered both on a
synthetic benchmark and on observed usage.
Andrew was a significant departure from many other distributed file systems
in its caching of entire files on client systems; while this led to certain
limitations cited by the authors, it was by and large effective for the
system's observed usage. In the prototype, performance was very sensitive to
usage patterns, which suggested that scaling would be poor; I was not
surprised when the authors acknowledged that validating every cached file
with the server on each access was the primary culprit in their performance
problems.
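As a rough illustration (hypothetical names and structure, not the authors'
actual code), the prototype's model amounts to one server round trip per
open, even when the cached copy is still good:

    class Server:
        def __init__(self):
            self.files = {}      # path -> (version, contents)

        def get_version(self, path):
            return self.files[path][0]

        def fetch(self, path):
            return self.files[path]

    class PrototypeClient:
        def __init__(self, server):
            self.server = server
            self.cache = {}      # path -> (version, whole-file contents)

        def open(self, path):
            cached = self.cache.get(path)
            # One RPC on *every* open, whether or not the file changed;
            # server load therefore grows with opens, not with writes.
            current = self.server.get_version(path)
            if cached is None or cached[0] != current:
                self.cache[path] = self.server.fetch(path)  # whole file
            return self.cache[path][1]

Under this scheme the cache saves data transfers but not server
interactions, which is consistent with their finding that freshness checks
dominated server load.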
This paper was interesting because the authors made significant
implementation changes within the original design, addressing the
performance issues and, by extension, the scalability problems arising from
them. As in other papers on filesystem design, the metadata around the
filesystem (in the case of Andrew, its "cache freshness" assurance) was the
limiting factor, not the act of transferring the actual data. The final
callback-based algorithm is not unlike certain NUMA invalidation protocols,
except for the assurance in Andrew that someone, somewhere has a persistent
copy. The volume concept allowed for considerable optimization, and the
mechanisms built on it for "cloning" and migration greatly eased the
administration of the overall system. Adding the idea of read-only volumes
optimized this even further, as OS images and the like are rarely written,
and hopefully not by end users (!).
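A minimal sketch of that invalidation idea (again hypothetical, with the RPC
and replication machinery elided): the server hands out a callback promise
with each fetch, a warm open costs no network traffic at all, and a store
breaks the outstanding callbacks while the server retains the persistent
copy:

    class Server:
        def __init__(self):
            self.files = {}      # path -> contents (the persistent copy)
            self.callbacks = {}  # path -> set of clients holding a callback

        def fetch(self, path, client):
            # Register a callback promise along with the whole-file transfer.
            self.callbacks.setdefault(path, set()).add(client)
            return self.files[path]

        def store(self, path, data):
            self.files[path] = data
            # Break callbacks: invalidate every cached copy, NUMA-style,
            # but the server always keeps the persistent copy.
            for client in self.callbacks.pop(path, set()):
                client.invalidate(path)

    class Client:
        def __init__(self, server):
            self.server = server
            self.cache = {}      # path -> whole-file contents

        def open(self, path):
            if path not in self.cache:       # warm hit: no server traffic
                self.cache[path] = self.server.fetch(path, self)
            return self.cache[path]

        def invalidate(self, path):          # callback broken by server
            self.cache.pop(path, None)       # refetch on next open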
Another interesting aspect was the relaxed consistency the authors deemed
acceptable: it was possible that even the persistent copy might be out of
date, since updates reach the server only when a writer closes the file. At
the base of this was the decision that such windows were tolerable, and that
strict synchronization was the duty of the application layer, not the OS
layer.
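To make that concrete, here is a hypothetical illustration of store-on-close
semantics: concurrent writers resolve as "last close wins," and anything
stricter (locking, merging) is left to the applications themselves:

    class Server:
        def __init__(self):
            self.files = {}

        def store(self, path, data):
            self.files[path] = data   # whole file replaced on close

    server = Server()

    # Two clients edit cached copies of the same file concurrently.
    copy_a = "edits from client A"
    copy_b = "edits from client B"

    server.store("/afs/doc.txt", copy_a)   # client A closes first
    server.store("/afs/doc.txt", copy_b)   # client B closes last...

    # ...and A's update is silently lost; the persistent copy reflects
    # only the last writer. The OS layer makes no stronger promise.
    assert server.files["/afs/doc.txt"] == copy_b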
The authors compared the performance of Andrew with Sun's NFS. This was a
reasonable comparison, but interesting in that I believe the two file
systems had different goals (the authors touched on this tangentially): NFS
was intended primarily to create a one-to-one mapping from a client to a
server's file system, while Andrew was intended to transparently expose a
single distributed file system resident across many "servers" (actually,
local stores). Interestingly, while Andrew's consistency story was more
deterministic than that of NFS, that feature did not seem to be a
performance liability.
Approximately ten years after this paper was written, I became acquainted with
some researchers from CMU who were just then promoting a replacement for Andrew.
It is encouraging that this experimental system was in regular use by a
large community over that length of time.