CFS review

Cem Paya
Wed Mar 03 2004

    Review: CFS

    CSE 551P, Cem Paya


    This paper describes the Cooperative File System (CFS), a peer-to-peer
    distributed file system designed for Internet-scale applications. CFS is
    structured as three layers: at the core is the Chord distributed hash
    table; layered on top of that is DHash, which manages storage and
    retrieval of individual blocks, including support for caching and
    replication. At the top level sits the interface, which presents the
    abstraction of an ordinary file system, except that the semantics are
    somewhat unusual. The main distinction is that it is a read-only system
    for everyone but the publisher: each file has an owner, and only that
    owner can update it. That protection is very strong and is provided by
    cryptographic means: the root block is signed with a private key, and
    any update must be signed with the same key.
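
    The block layer described above can be illustrated with a small
    in-memory sketch. It assumes that a block's key is the SHA-1 hash of
    its contents, so a reader can check any block without trusting the node
    that served it. The names ToyBlockStore, store_file and read_file, and
    the tiny block size, are hypothetical and chosen only for illustration.
    The signature check on the root block is left out because it needs a
    public-key library.

```python
import hashlib

BLOCK_SIZE = 8  # unrealistically small, for illustration only


class ToyBlockStore:
    """Hypothetical stand-in for the block layer: SHA-1(content) -> content."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        # The key is derived from the content, never chosen by the caller.
        key = hashlib.sha1(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # Recompute the hash so a corrupted or forged block is rejected.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError("block does not match its key")
        return data


def store_file(store: ToyBlockStore, content: bytes) -> list:
    """Split content into fixed-size blocks and return their keys in order."""
    return [
        store.put(content[i:i + BLOCK_SIZE])
        for i in range(0, len(content), BLOCK_SIZE)
    ]


def read_file(store: ToyBlockStore, keys: list) -> bytes:
    """Fetch and verify each block, then reassemble the file."""
    return b"".join(store.get(k) for k in keys)


content = b"hello cooperative file system"
store = ToyBlockStore()
keys = store_file(store, content)
```

    Here the list of keys plays the role of the file's metadata: whoever
    holds it can fetch and verify every block, while changing the file
    means publishing new blocks and a new list.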


    CFS envisions a peer-to-peer system with identical, unmanaged nodes.
    There is no server vs. client distinction, and it's expected that there
    is no administration overhead beyond the one time installation of CFS
    software, a realistic assumption if non-computer-savvy users are to take
    part in the Chord network. Each physical server runs any number of
    virtual servers depending on available space, and each one dedicates
    some space to maintaining content for other nodes. Chord manages the
    mapping of content to nodes. It is the key algorithmic innovation,
    remarkable for overcoming a huge challenge: maintaining up-to-date data
    about the whereabouts of files even as individual nodes appear and
    disappear without notice, reflecting the connectivity state of users on
    the Internet. Chord provides other guarantees such as even load
    balancing and quick lookups, where the number of nodes queried scales
    as the logarithm of the number of nodes in the network. One downside is
    that there is no distributed search, which is what put P2P on the map.
    Existing
    systems such as Gnutella or the FastTrack system enable querying the
    network, but their protocols are extremely primitive compared to Chord.
    Both use inefficient broadcast that floods the network with queries
    because there is no a priori connection between content (e.g. keys in
    the language of CFS) and the nodes hosting it.
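
    To make the contrast with flooding concrete, here is a minimal sketch
    of Chord-style routing, assuming a static ring with complete finger
    tables and no joins or failures. Identifiers are truncated to 32 bits
    to keep the toy small; the names ToyRing, chord_id and lookup are
    hypothetical. A key is owned by the first node at or after it on the
    ring, and each node forwards a query to the finger that most closely
    precedes the key.

```python
import hashlib
from bisect import bisect_left

M = 32            # identifier bits in this toy (an assumption, kept small)
RING = 1 << M


def chord_id(name: str) -> int:
    """Hash a name onto the ring by truncating SHA-1 to M bits."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b], allowing wrap-around."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b), allowing wrap-around."""
    if a < b:
        return a < x < b
    return x > a or x < b


class ToyRing:
    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))
        # fingers[n][i] is the first node at or after n + 2^i on the ring.
        self.fingers = {
            n: [self.successor((n + (1 << i)) % RING) for i in range(M)]
            for n in self.nodes
        }

    def successor(self, ident: int) -> int:
        """Global view of ownership, used to build tables and check results."""
        i = bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def lookup(self, start: int, key: int):
        """Route from `start` using only finger tables; return (owner, hops)."""
        n, hops = start, 0
        while True:
            succ = self.fingers[n][0]
            if in_half_open(key, n, succ):
                return succ, hops
            # Forward to the closest finger that still precedes the key.
            nxt = next(
                (f for f in reversed(self.fingers[n]) if in_open(f, n, key)),
                succ,
            )
            n, hops = nxt, hops + 1


ring = ToyRing(chord_id(f"node-{i}") for i in range(64))
owner, hops = ring.lookup(ring.nodes[0], chord_id("some-file"))
```

    Each forwarding step covers at least half of the remaining distance to
    the key's predecessor, which is why the hop count stays small relative
    to the number of nodes. A flooding protocol, by contrast, has no such
    structure to exploit.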


    One vague aspect of CFS is the risk model. The "C" stands for
    cooperative, but there are some measures in place to defend against
    malicious nodes fabricating content, flooding other nodes with
    replicated data, etc. For example, each key is hashed using a
    cryptographically secure one-way hash algorithm (SHA-1 specifically),
    which makes it computationally infeasible to come up with a key that
    maps to a given ID. The underlying network communication uses nonces
    and echo-back to rule out trivial IP spoofing attacks. On the other
    hand, some security concerns are set aside completely; for example, the
    authors argue that anonymous download is out of scope because it can be
    layered on top of CFS using proxying or anonymous remailers. The
    denial-of-service potential is also not very clear:
    nodes are saving data on behalf of other nodes. The one-way nature of
    SHA-1 makes it more difficult to control where a given block lands,
    although trial and error still works, since in practice there are only
    on the order of millions of nodes in the network: that space is still
    small enough to brute-force. Even when it is not easy to single out
    particular nodes for attack, all experience to date with P2P systems
    points to widespread free-riding, where most content is served by a
    small number of peers while others download at will without
    contributing. The CFS analog of this would be people who use the
    network for storage but contribute only a very small amount of disk
    space (or equivalently run very few virtual




