Scale and Performance in a Distributed File System.

From: Manish Mittal (manishm_at_microsoft.com)
Date: Mon Feb 23 2004 - 17:23:25 PST

    In this paper, the authors discuss the design and implementation issues
    behind scale and performance in the Andrew File System (AFS), a
    distributed file system developed at CMU and designed to scale to 5,000
    to 10,000 nodes.

    There are two parts to Andrew: Vice and Venus. Vice is a set of servers
    that presents a homogeneous, location-transparent file name space to all
    client workstations. Venus is a user-level process on each workstation
    that caches files retrieved from Vice on the local disk and stores
    modified copies back on the servers. Venus contacts Vice only when files
    are opened and closed. The key idea in Andrew is that reads and writes
    are done to a local copy of the file in the cache. In the prototype, a
    cache entry is validated on every open by asking the server for the
    file's timestamp and comparing it with the timestamp of the cached copy.
    Each server's directory hierarchy contains stub directories for the
    parts of the name space stored on other servers, so a lookup that does
    not find a file on a particular server ends at a stub entry pointing to
    the server that actually holds it. As the authors point out, embedding
    location information in the file tree this way made it difficult to
    move files to a different server.

    The authors describe a prototype implementation of AFS and its
    shortcomings, and then, based on a performance study of the prototype,
    propose several changes to improve performance:

    1) Cache the contents of directories and symbolic links in addition to
    file data. The client-side cache allows significant gains in performance
    over remote access.

    2) Reduce cache validity checks. A callback mechanism keeps the server
    and the cached copies consistent: the server promises to notify a client
    before its cached copy becomes stale, so the client no longer checks the
    file timestamp on every open. The cost is that the server must track
    callback state for every client caching a file, which raises scalability
    concerns when many clients register callbacks (see the sketch after this
    list).

    3) Move name resolution from the server to the client. The client maps a
    pathname to a globally unique Fid, and the server uses the Fid to locate
    the data. For this, a Fid-to-location mapping database has to be
    replicated on each server.

    4) Reduce the number of server processes. The server now handles clients
    with lightweight processes (LWPs, essentially threads) instead of a
    dedicated process per client, which reduces context-switch cost.

    5) Introduce the volume data structure to improve operability, for
    example by making it easier to move and administer groups of files.
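    As a companion to the sketch above, here is one way the callback
    bookkeeping of point 2 could look. Again the classes and methods are
    hypothetical stand-ins that only illustrate the idea, not the actual
    Vice/Venus protocol:

        class CallbackServer:
            """Server-side callback state: fid -> set of clients holding a
            cached copy that the server has promised to notify."""
            def __init__(self):
                self.callbacks = {}

            def fetch(self, fid, client):
                # Handing out a copy establishes a callback for this client.
                self.callbacks.setdefault(fid, set()).add(client)

            def store(self, fid, writer):
                # Break callbacks held by every other client so they stop
                # trusting their cached copies; only the writer stays valid.
                for client in self.callbacks.get(fid, set()) - {writer}:
                    client.callback_broken(fid)
                self.callbacks[fid] = {writer}


        class Client:
            def __init__(self, name):
                self.name = name
                self.valid = {}          # fid -> cached copy still usable?

            def callback_broken(self, fid):
                # The next open of this fid must go back to the server
                # instead of being satisfied from the local cache.
                self.valid[fid] = False


        server = CallbackServer()
        a, b = Client("a"), Client("b")
        server.fetch("fid-1", a)
        server.fetch("fid-1", b)
        server.store("fid-1", a)         # b's callback is broken
        print(b.valid)                   # {'fid-1': False}

    The per-file, per-client sets make the scalability concern concrete:
    the server must hold callback state for every client caching a file and
    re-establish or discard it after a crash.
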

    Overall, the idea of building a distributed file system out of
    user-level processes layered on top of the standard local file system
    was very enticing.