Review: Wide-Area Cooperative Storage with CFS

From: Raz Mathias (razvanma_at_exchange.microsoft.com)
Date: Wed Mar 03 2004 - 17:44:24 PST

  • Next message: Ian King: "Review: Dabek et al., Wide Area Cooperative Storage wth CFS"

    This paper presented a new peer-to-peer file system consisting of a file
    management layer called DHash and the corresponding block-lookup
    algorithm called Chord. Together the entire system consists of CFS, the
    Cooperative File System.

     

    The first component in CFS is the high-level file system abstraction
    implemented by DHash. The main purpose of DHash is to present a file
    system to clients of the network by managing the blocks stored by Chord.
    DHash caches blocks on the lookup path and replicates blocks to maintain
    fault tolerance. Similar to PAST, a node on the lookup path can cache
    the blocks it sees; an LRU algorithm is utilized for cache content
    replacement. Replication occurs by simply storing blocks in the k
    successors of the current node in the node id space. Just like in
    Pastry, subsequent nodes in the ID space do not have similar network
    characteristics; therefore they will tend to fail independently and
    fault-tolerance is achieved. Furthermore, the selection of replica
    nodes when a primary goes down is not special-cased in the lookup
    algorithm, simplifying the code significantly. Storage load is balanced
    by the use of individual blocks. This is a significant place in which
    PAST and CFS differ significantly. Large files can be better load
    balanced across the system with CFS than with PAST as we no longer have
    the problem of finding space large enough on any particular machine; in
    other words, utilization is improved in the PAST system. The paper
    makes clear the difference between caching and replication. Caching
    cannot make guarantees about the number of servers storing the content
    but are stored along the lookup path so are easy to locate; therefore
    the cache is better suited for performance purposes. Replication can
    guarantee a predictable number of copies but they are not as readily
    accessible as cached copies, which makes replication ideal for
    fault-tolerance. Capacity in the network is managed by a combination of
    quotas and the concept of virtual servers. Quotas are maintained by
    allowing any given client to only store a maximum percentage of the
    total space available. The virtual server is used to break a large
    machine into many small ones, thereby balancing load more evenly across
    nodes. Finally, to improve performance, precaching is done at the
    clients at the DHash level.

     

    Chord is the block-lookup algorithm for CFS. The basic algorithm maps
    block identifiers to server IP addresses. Chord is completely oblivious
    with respect to DHash. Each block stored by Chord is either a piece of
    file or a directory. The unique identifiers act as pointers to blocks
    in the DHash file system. The identifier space is circular in nature;
    each node knows about its neighbors. When a node enters a network, the
    correct subset of the node's predecessor blocks is stored in the node.
    When it leaves the network, the predecessor again controls those blocks.
    A structure called the "finger table" is utilized to make larger jumps
    around the identifier circle. To prevent attacks, a node cannot decide
    where it will join in the network. Its location is determined by a
    secure hash of its IP address. It is interesting to note how Chord
    depends upon randomness in the node addition for security and Pastry
    depends on randomness in the routing algorithm itself to protect itself
    from attacks.

     

    A novel aspect of this system (when compared with PAST) is the fact that
    storage and location were separated from the abstract file system. The
    idea of breaking this abstraction down makes me wonder whether a
    distributed object or distributed virtual memory system could be
    implemented over a peer-to-peer network just as easily as the file
    system. This subtle teasing apart (as opposed to integration) of issues
    seems to be what has made many of the papers in this class important and
    successful. Perhaps one difference between a good engineer and a good
    researcher is that the engineer integrates existing ideas whereas a
    researcher can differentiate between an idea's distinct constituents.

     


  • Next message: Ian King: "Review: Dabek et al., Wide Area Cooperative Storage wth CFS"

    This archive was generated by hypermail 2.1.6 : Wed Mar 03 2004 - 17:45:03 PST