Jim Shearer's review of Scale and Performance in a Distributed File System

From: shearerje_at_comcast.net
Date: Sun Feb 22 2004 - 17:28:02 PST

    This paper discusses “Andrew”, a distributed computing environment in which a “Venus” client process talks to a “Vice” server process to provide a distributed file system to a large number of workstations (400 active at the time of writing, with a goal of 5000).

    Andrew accesses and locally caches data at the whole-file level (open and close contact the server; read and write are local). The paper discusses several aspects of file representation and management that affect performance and operability (system management). (1) Having the server notify caching clients when its copy of a file changes (a callback), rather than having each client go back to the server to validate the file on every open, exploits the observation that typically only one user is changing a file at any given time. (2) Grouping files into “volumes” and using the permanent file id (fid) to find files avoids several problems associated with repeated pathname resolution. Furthermore, if volumes map to users, they give system administrators a useful tool for balancing load across servers and budgeting user accounts. (3) Running a single server or client process on each machine, within which a pool of lightweight threads (LWT) handles individual requests, proved to be much more efficient than starting a separate long-lived process for each client-server binding. (4) Accessing files on the client by inode rather than by pathname provided benefits similar to the fid approach on the server.
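
    To make point (1) concrete, here is a minimal sketch of callback-based caching as I understand it from the paper. The class and method names (Server, Client, break_callback) and the string fid are my own illustrative choices, not the actual Vice/Venus interfaces, and the sketch ignores failures, volumes and access control.

```python
class Server:
    """Toy stand-in for a Vice server: stores whole files and tracks callbacks."""

    def __init__(self):
        self.files = {}      # fid -> file contents
        self.callbacks = {}  # fid -> set of clients holding a cached copy
        self.fetches = 0     # number of whole-file fetches served

    def fetch(self, client, fid):
        # Hand out the whole file and promise to notify this client on change.
        self.fetches += 1
        self.callbacks.setdefault(fid, set()).add(client)
        return self.files[fid]

    def store(self, writer, fid, data):
        # Accept a whole file on close and break callbacks held by other clients.
        self.files[fid] = data
        holders = self.callbacks.get(fid, set())
        for client in holders - {writer}:
            client.break_callback(fid)
        self.callbacks[fid] = {writer}


class Client:
    """Toy stand-in for a Venus cache manager."""

    def __init__(self, server):
        self.server = server
        self.cache = {}  # fid -> cached contents that still have a valid callback

    def open(self, fid):
        # Contact the server only when no valid cached copy exists.
        if fid not in self.cache:
            self.cache[fid] = self.server.fetch(self, fid)
        return self.cache[fid]

    def close(self, fid, data):
        # Writes stay local until close, then the whole file is shipped back.
        self.cache[fid] = data
        self.server.store(self, fid, data)

    def break_callback(self, fid):
        # The server says our copy is stale; drop it so the next open refetches.
        self.cache.pop(fid, None)


server = Server()
server.files["fid-1"] = "v1"
a, b = Client(server), Client(server)
a.open("fid-1")
a.open("fid-1")            # second open is served from a's cache
b.open("fid-1")
b.close("fid-1", "v2")     # breaks a's callback
result = a.open("fid-1")   # a refetches and sees the new contents
```

    In this toy run the server is contacted for a fetch only when a client has no valid copy, which is the traffic reduction that callbacks are meant to buy over validating on every open.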

    I was impressed by the effort to measure actual usage of different file access features in a real-life environment, to analyze discrepancies in the data, and to apply the lessons learned to improve the system. The experiments show that the real driving issues are not always what we expect them to be. Maybe they shouldn’t put all of the bulletin boards on one server.

    I was, however, concerned by one of the consistency semantics: “Multiple workstations can perform the same operation on a file concurrently ... no implicit locking is performed ...”. What happens when these files are closed? Does the last one closed win?
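
    The scenario I have in mind can be sketched as follows. This assumes that each close ships the whole cached file back to the server and simply replaces the server copy, with no locking or merging; that is my reading of the store-on-close design, not something the quoted passage states outright. WholeFileServer is a hypothetical name used only for this illustration.

```python
class WholeFileServer:
    """Toy server that replaces its copy of a file on every store."""

    def __init__(self, contents):
        self.contents = contents

    def fetch(self):
        # Return the current whole-file contents.
        return self.contents

    def store(self, data):
        # Whole-file replace: no merge and no lock (assumption of this sketch).
        self.contents = data


server = WholeFileServer("original")

# Two workstations open the same file and edit their private cached copies.
copy_a = server.fetch() + " +edit-from-A"
copy_b = server.fetch() + " +edit-from-B"

server.store(copy_a)  # workstation A closes first
server.store(copy_b)  # workstation B closes last

final = server.contents
```

    Under these assumptions the edit from workstation A is silently lost, so applications that care would have to do their own synchronization.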

