Wide Area Cooperative Storage with CFS

From: Manish Mittal (manishm_at_microsoft.com)
Date: Wed Mar 03 2004 - 16:20:42 PST

  • Next message: Cem Paya: "CFS review"

    This paper describes CFS, a distributed, cooperative, read-only file
    storage system built on blocks. CFS is designed to be scalable,
    efficient, highly available, decentralized and fault tolerant. It
    achieves load-balanced storage by breaking each file into many blocks
    and distributing those blocks over the servers. It balances load
    further by running several virtual servers on each physical server.
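
    To make the block distribution concrete, here is a minimal sketch of
    placing blocks on a hash ring with virtual servers. It is not the CFS
    implementation: the names Ring, ring_id, split_blocks and placement,
    the server labels, the 8192-byte block size and the use of SHA-1 for
    ring positions are all assumptions made for illustration.

```python
import bisect
import hashlib
from collections import Counter

BLOCK_SIZE = 8192  # illustrative block size, an assumption of this sketch


def ring_id(key: bytes) -> int:
    """Map a key to a point on the identifier ring using SHA-1."""
    return int.from_bytes(hashlib.sha1(key).digest(), "big")


class Ring:
    """Hypothetical hash ring; each physical server owns several points."""

    def __init__(self, servers, vnodes_per_server):
        # Each physical server appears on the ring once per virtual server.
        points = sorted(
            (ring_id(f"{server}#{i}".encode()), server)
            for server in servers
            for i in range(vnodes_per_server)
        )
        self.positions = [pos for pos, _ in points]
        self.owners = [owner for _, owner in points]

    def server_for(self, key: bytes) -> str:
        """Return the physical server behind the first virtual server
        at or after the key's ring position, wrapping around the ring."""
        idx = bisect.bisect_left(self.positions, ring_id(key))
        return self.owners[idx % len(self.owners)]


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def placement(data: bytes, ring: Ring) -> Counter:
    """Count how many blocks of data land on each physical server."""
    return Counter(ring.server_for(block) for block in split_blocks(data))


servers = ["s1", "s2", "s3", "s4"]
data = b"".join(i.to_bytes(4, "big") for i in range(200_000))
for vnodes in (1, 16):
    print(vnodes, dict(placement(data, Ring(servers, vnodes))))
```

    Printing the per-server counts for different numbers of virtual
    servers gives a rough feel for how they change the spread of blocks.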

    CFS consists of three layers. 1) FS: uses the DHash layer to retrieve
    blocks and interprets those blocks as files. 2) DHash: uses the Chord
    layer to locate the CFS server holding a desired block; it also stores,
    caches and replicates blocks. 3) Chord: maintains the routing tables
    used to locate blocks.

    A publisher inserts file blocks into CFS using the hash of each
    block's contents as its identifier. The root block is instead signed
    with the publisher's private key, and the corresponding public key
    serves as its identifier. With this approach any change to a block is
    easily detected; however, I am not sure what happens if a complete
    block is replaced. To eliminate this issue, a hash of the complete file
    could be used as the root block identifier. The authors also suggest
    that a file be updated by changing its root block. This seems the most
    efficient way to update a file, but great care needs to be taken to
    remove orphaned blocks.
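
    The content-hash identifiers described above can be sketched as
    follows. This is an illustration under stated assumptions, not the
    CFS code: publish, fetch and block_id are hypothetical names, a plain
    dict stands in for the DHash layer, and the block size is arbitrary.
    The signed root block is left out because it needs a public-key
    library. In this sketch a block that is replaced wholesale fails the
    same check as one that is modified in part, since its contents no
    longer hash to the identifier used to request it.

```python
import hashlib

BLOCK_SIZE = 8192  # illustrative block size, an assumption of this sketch


def block_id(block: bytes) -> str:
    """Identifier of a block: the SHA-1 hash of its contents."""
    return hashlib.sha1(block).hexdigest()


def publish(data: bytes, store: dict) -> list:
    """Split data into blocks, store each under its content hash,
    and return the identifiers in file order."""
    ids = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        bid = block_id(block)
        store[bid] = block
        ids.append(bid)
    return ids


def fetch(bid: str, store: dict) -> bytes:
    """Fetch a block and verify that its contents match its identifier."""
    block = store[bid]
    if block_id(block) != bid:
        raise ValueError(f"block {bid} failed its integrity check")
    return block


store = {}  # stand-in for the block store, keyed by identifier
data = b"".join(i.to_bytes(4, "big") for i in range(5000))
ids = publish(data, store)

# Reassembling the verified blocks reproduces the original file.
restored = b"".join(fetch(bid, store) for bid in ids)

# Simulate a server that swaps in a completely different block.
store[ids[0]] = b"replaced block"
try:
    fetch(ids[0], store)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```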

    The authors ran several sets of experiments to demonstrate the
    feasibility of this file system. The experiments show that large
    amounts of prefetching are not productive because they congest the
    network, and that download speed increases substantially with server
    selection. They also show that file lookups do not fail even when 30%
    of the servers are down. Running virtual servers on top of each
    physical server makes it possible to control the amount of data that
    server must serve. This is very effective, and running many such
    virtual servers adds very little memory overhead. Caching blocks along
    the lookup path also improves the efficiency and performance of the
    system.
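
    Caching along the lookup path can be illustrated with a small
    simulation. This is a hedged sketch, not the DHash implementation:
    the Server class, the lookup function, the fixed three-server path,
    the cache size and the least-recently-used eviction are assumptions
    chosen for the example.

```python
from collections import OrderedDict


class Server:
    """Hypothetical server with permanent storage and a small LRU cache."""

    def __init__(self, name, cache_size=2):
        self.name = name
        self.stored = {}            # blocks this server is responsible for
        self.cache = OrderedDict()  # blocks seen while forwarding lookups
        self.cache_size = cache_size

    def get_local(self, bid):
        """Return the block if stored or cached here, else None."""
        if bid in self.stored:
            return self.stored[bid]
        if bid in self.cache:
            self.cache.move_to_end(bid)  # mark as most recently used
            return self.cache[bid]
        return None

    def cache_block(self, bid, block):
        """Cache a block, evicting the least recently used if full."""
        self.cache[bid] = block
        self.cache.move_to_end(bid)
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)


def lookup(path, bid):
    """Walk the path in order and return (block, hops). Servers visited
    before the hit receive a cached copy of the block."""
    for hops, server in enumerate(path, start=1):
        block = server.get_local(bid)
        if block is not None:
            for earlier in path[:hops - 1]:
                earlier.cache_block(bid, block)
            return block, hops
    raise KeyError(bid)


a, b, c = Server("a"), Server("b"), Server("c")
c.stored["b1"] = b"block one"

first = lookup([a, b, c], "b1")   # travels the whole path to c
second = lookup([a, b, c], "b1")  # answered from a's cache
```

    In this toy run the first lookup takes three hops and the second is
    answered at the first hop, which is the effect the paper relies on
    for popular blocks.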

    Overall, this is a very interesting approach to designing a file
    system. The paper explains the design of CFS thoroughly and gives
    reasons for its design choices. The experimental results are presented
    with detailed explanations. Since files are stored as blocks, it would
    have been nice to see a performance comparison between storing blocks
    and storing complete files; sometimes a mix of the two approaches
    performs better. Features lacking in this file system include
    inserts, updates and directory listing. I understand that implementing
    them may create a performance bottleneck, but they are core features
    for any file system.

     



    This archive was generated by hypermail 2.1.6 : Wed Mar 03 2004 - 16:20:31 PST