From: Manish Mittal (manishm_at_microsoft.com)
Date: Wed Mar 03 2004 - 16:20:42 PST
This paper presents CFS, a distributed, cooperative, read-only file
storage system based on blocks. CFS is designed to be scalable,
efficient, highly available, decentralized and fault tolerant. CFS
achieves load-balanced storage by breaking each file into many blocks
and distributing the blocks over the servers. It further balances
storage by running multiple virtual servers per physical server.
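As a rough illustration of the load-balancing idea above, here is a
minimal sketch of splitting a file into blocks and mapping each block to
a server on a hash ring. The block size, the server names, the
"name#index" scheme for virtual server IDs and all function names are
assumptions made for this example, not details taken from the paper.

```python
import hashlib
from bisect import bisect_left

BLOCK_SIZE = 8192  # assumed block size in bytes, chosen for illustration


def ident(data: bytes) -> int:
    """Map bytes to a point on the identifier ring (SHA-1 as an integer)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Break a file into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def build_ring(virtual_per_server: dict[str, int]) -> list[tuple[int, str]]:
    """Place each physical server on the ring once per virtual server."""
    ring = []
    for server, count in virtual_per_server.items():
        for v in range(count):
            # Hypothetical ID scheme: hash of "<server name>#<index>".
            ring.append((ident(f"{server}#{v}".encode()), server))
    return sorted(ring)


def successor(ring: list[tuple[int, str]], key: int) -> str:
    """Return the physical server owning the first virtual ID >= key."""
    ids = [vid for vid, _ in ring]
    idx = bisect_left(ids, key) % len(ring)  # wrap around past the last ID
    return ring[idx][1]


def place_file(data: bytes, ring: list[tuple[int, str]]) -> dict[str, int]:
    """Count how many blocks of one file land on each physical server."""
    counts: dict[str, int] = {}
    for block in split_blocks(data):
        server = successor(ring, ident(block))
        counts[server] = counts.get(server, 0) + 1
    return counts
```

In this sketch a server given more virtual servers, for example
build_ring({"big": 8, "small": 2}), owns more points on the ring and so
would tend to receive a larger share of the blocks.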
CFS consists of three layers:
1) FS: Uses the DHash layer to retrieve blocks and interprets the
   blocks as files.
2) DHash: Uses the Chord layer to locate the CFS server holding the
   desired blocks. It also stores, caches and replicates blocks.
3) Chord: Maintains the routing tables used for locating blocks.
A publisher inserts file blocks into the CFS system using the hash of
each block's contents as its identifier. The root block is signed with
the publisher's private key, and the corresponding public key is used
as its identifier. With this approach, any change to a block can be
easily detected; however, I am not sure what happens if a complete
block is replaced. To eliminate this issue, a hash of the complete file
could be used as the root block identifier. Also, the authors suggest
that the root block be changed to perform an update. This seems to be
the most efficient way to update a file, but great care needs to be
taken to remove orphaned blocks.
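To make the identifier scheme concrete, here is a minimal sketch of
content-hash block IDs, assuming SHA-1 as the hash and assuming the
client recomputes the hash of whatever a server returns. BlockStore,
publish and get_verified are hypothetical names for this example, and
the signature on the root block is not modeled.

```python
import hashlib


def block_id(data: bytes) -> str:
    """Identifier of a block: the SHA-1 hash of its contents."""
    return hashlib.sha1(data).hexdigest()


class BlockStore:
    """Hypothetical in-memory stand-in for a set of untrusted servers."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store a block under its content hash and return that ID."""
        key = block_id(data)
        self._blocks[key] = data
        return key

    def get_verified(self, key: str) -> bytes:
        """Fetch a block and reject it if its hash does not match the ID."""
        data = self._blocks[key]
        if block_id(data) != key:
            raise ValueError(f"block {key} failed its content-hash check")
        return data


def publish(store: BlockStore, blocks: list[bytes]) -> list[str]:
    """Insert blocks and return the ordered IDs a parent block would list."""
    return [store.put(b) for b in blocks]
```

Under these assumptions, a server that swaps in a different block under
an existing ID is caught the same way as one that alters a few bytes:
get_verified raises ValueError because the recomputed hash no longer
matches the ID recorded by the parent block.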
The authors ran several sets of experiments to demonstrate the
feasibility of this file system. The experiments show that large
amounts of prefetching are not productive, since they congest the
network. Download speed increases substantially with server selection.
The experiments also show that lookups for a file do not fail even when
30% of the servers are down. By running virtual servers on top of each
physical server, the amount of data that the server must serve can be
controlled. This is very effective, and there is very little memory
overhead in running many such virtual servers. Caching blocks along the
lookup path also increases the efficiency and performance of the
system.
Overall, this is a very interesting approach to designing a file
system. The paper explains the design of CFS thoroughly and gives
reasons for the various design choices. The experimental results are
presented with detailed explanations. Since files are stored as blocks,
it would have been nice to see a performance comparison between storing
blocks and storing complete files. Sometimes better performance can be
achieved by a mix of the two approaches. Some of the features lacking
in this file system are inserts, updates and directory listing. I
understand that implementing these features may create a performance
bottleneck, but they are core features for any file system to have.