"Scale and Performance in a Distributed File System" Review

From: Tarik Nesh-Nash (tarikn_at_microsoft.com)
Date: Mon Feb 23 2004 - 11:54:39 PST

  • Next message: Richard Jackson: "Review: Howard, et al. Scale and Performance in a Distributed File System."

    This paper presents Andrew, CMU's distributed file system, and
    discusses its scalability and performance. As an overview, each
    workstation runs a user-level process, Venus, that communicates with
    the servers. Vice, a set of trusted servers with minimal communication
    among themselves, holds the shared file data. Venus locates a file in
    Vice, initiates a dialog, caches the whole file locally, and copies it
    back to the server on close. To improve performance, read/write
    operations are done on the cached copy of the file instead of the
    server's; Venus contacts the server only on open/close operations.
    This method improves scalability by maximizing the work done on the
    clients instead of the servers. At the server level (Vice), there was
    a process dedicated to every Venus client, and sharing was controlled
    using a user-level lock server. The emulation of 4.2BSD was successful
    without any recompilation or relinking. However, this first
    implementation showed unexpectedly slow performance due to extra calls
    to the server. Other problems were related to context-switching
    overhead, paging demands, and the cost of moving files between
    servers. Still, the architecture seemed to have great potential, so a
    new implementation was developed that kept the same architecture and
    focused on improving performance. The first improvement was to cache
    management. Previously, Venus checked with the server on every open
    whether its cached copy was still valid, which created a bottleneck at
    the server; now Venus assumes the cache is valid unless notified by
    the server (a callback), at the price of server-side state and
    potential inconsistency if a callback is lost. This is related to the
    discussion we had in class about "sequential consistency". A mechanism
    is also defined to uniquely identify files (Fids) and improve
    directory lookup. At the communication level, instead of a dedicated
    server process for every client, there is one process per server that
    multiplexes clients over lightweight processes (LWPs). This is very
    similar to the thread model.
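    The whole-file caching and callback scheme described above can be
    sketched roughly as follows. This is a minimal illustration of the
    idea, not the actual AFS code; the class and method names are my own
    invention, and the real system deals with persistence, concurrency,
    and failure, which are omitted here.

```python
class Server:
    """Stands in for a Vice server: holds whole files and remembers
    which clients have cached copies (callback promises)."""
    def __init__(self):
        self.files = {}        # path -> bytes
        self.callbacks = {}    # path -> set of clients holding a promise

    def fetch(self, client, path):
        # Ship the whole file to the client and record a callback promise.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        self.files[path] = data
        # Break the callback promise of every *other* client: their
        # cached copies are now stale.
        for client in self.callbacks.get(path, set()) - {writer}:
            client.invalidate(path)
        self.callbacks[path] = {writer}


class Client:
    """Stands in for Venus: serves opens from the local cache and only
    talks to the server on open (cache miss) and close (write-back)."""
    def __init__(self, server):
        self.server = server
        self.cache = {}        # path -> bytes, assumed valid unless invalidated

    def invalidate(self, path):
        # Callback from the server: discard the stale copy.
        self.cache.pop(path, None)

    def open(self, path):
        if path not in self.cache:   # only a miss touches the server
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path]

    def close(self, path, data):     # write-back on close
        self.cache[path] = data
        self.server.store(self, path, data)
```

    The key point the review makes is visible here: repeated opens of a
    cached file never contact the server, and the server carries the
    burden of remembering who to notify when a file changes.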

    These changes yielded important scalability and performance
    improvements, as good as or better than Sun's NFS. Progress can still
    be made (e.g., moving the logic into kernel code) while other
    commercial products are already fine-tuned.
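    The Fid mechanism mentioned earlier can also be illustrated with a
    small sketch. The three Fid fields (volume, vnode, uniquifier) follow
    the paper; the surrounding classes and names are hypothetical, just to
    show why a fixed-size identifier lets servers avoid walking pathnames
    and why the uniquifier matters when vnode slots are reused.

```python
from collections import namedtuple

# A file is named by a fixed-size identifier rather than a pathname,
# so the server never performs pathname resolution itself.
Fid = namedtuple("Fid", ["volume", "vnode", "uniquifier"])


class Volume:
    """Toy server-side volume: maps vnode numbers to file data."""
    def __init__(self, volume_id):
        self.volume_id = volume_id
        self.vnodes = {}      # vnode number -> (uniquifier, data)
        self.next_vnode = 1

    def create(self, data):
        fid = Fid(self.volume_id, self.next_vnode, 1)
        self.vnodes[fid.vnode] = (fid.uniquifier, data)
        self.next_vnode += 1
        return fid

    def lookup(self, fid):
        # The uniquifier guards against a vnode slot having been reused:
        # an old Fid must not silently name a newer file.
        uniquifier, data = self.vnodes[fid.vnode]
        if uniquifier != fid.uniquifier:
            raise FileNotFoundError("stale Fid: vnode was reused")
        return data
```

    Directories then become ordinary mappings from a name component to a
    Fid, so the client can resolve a pathname locally, one component at a
    time, presenting only Fids to the server.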

    My project, Distributed Build, is closely related to this paper. When
    I investigated the distributed file system option, I got very bad
    performance because the file system did not trust the content of the
    cache and fetched the file from the server every time it was read. I
    could not find a way to control this on the NT platform.



    This archive was generated by hypermail 2.1.6 : Mon Feb 23 2004 - 11:54:48 PST