Review: CFS

From: Honghai Liu (liu789_at_hotmail.com)
Date: Tue Mar 02 2004 - 23:48:29 PST

  • Next message: Brian Milnes: "CFS Review"

    Review by: Honghai Liu

    CFS is a peer-to-peer, read-only file storage system designed for high
    availability, scalability, load balance and robustness.

    Unlike PAST, which stores whole files, CFS stores data at the granularity
    of file blocks. In addition, CFS is read-only as far as the client is
    concerned. The focus is on finding the desired data quickly, tolerating
    fail-stop server failures, and balancing load among servers.

    There are three layers in the CFS system. Chord, at the bottom, maintains
    the routing tables used to find blocks. Above it, DHash stores unstructured
    data blocks reliably. At the top level, FS interprets blocks as files and
    serves as the interface to applications.

    In Chord, a hash function is used to map block identifiers to server
    identifiers. Each server has a 160-bit identifier, as does each block.
    These identifiers can be thought of as points on a circle. A block's ID
    (source) is tied to a server's ID (destination) through the successor
    relation: the block's successor is the server whose ID immediately follows
    the block's (hashed) ID on the circle. Obviously, for a given block ID, the
    successor can change as servers join or leave, or when the network fails.
    In fact, CFS tolerates such dynamic changes quite nicely.
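
    To make the successor relation concrete, here is a minimal sketch. It
    assumes SHA-1 as the hash that produces the 160-bit IDs; the names
    chord_id and successor, and the server addresses, are made up for
    illustration and are not taken from the CFS code.

```python
import hashlib
from bisect import bisect_left

def chord_id(data: bytes) -> int:
    """Map arbitrary bytes to a point on the 160-bit identifier circle."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

def successor(sorted_ids, key):
    """Return the first server ID at or after key, wrapping around the circle."""
    i = bisect_left(sorted_ids, key)
    return sorted_ids[i % len(sorted_ids)]

# Hypothetical servers, identified here by hashing made-up addresses.
servers = sorted(chord_id(f"10.0.0.{n}".encode()) for n in range(1, 9))
block_id = chord_id(b"example block contents")
owner = successor(servers, block_id)

# A server that joins just after the block's ID takes over as its successor.
newcomer = (block_id + 1) % (1 << 160)
new_owner = successor(sorted(servers + [newcomer]), block_id)
```

    The last two lines show why the successor of a fixed block ID is not
    fixed: once the newcomer joins, it is the first server after the block's
    ID (barring the unlikely case that an existing server already sits
    exactly at block_id).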

    It is interesting to see that CFS uses the public key as the identifier of
    the file system's root block. This naturally combines security with the
    identifier space and results in fast lookup. Generally, two data structures
    are used for lookup: a successor list for correctness and a finger table
    for efficiency. The algorithm guarantees that a lookup takes O(log N) hops
    for N servers, which gives the desired scalability. Server selection in
    Chord reduces lookup latency by letting the client contact the closest
    nodes in the network.
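
    The finger-table lookup can be sketched on a toy ring as follows. This is
    an illustration under simplifying assumptions, not the CFS implementation:
    the ring has 8-bit IDs instead of 160-bit ones, the tables are built with
    global knowledge rather than by the join protocol, there are no failures
    (so the successor list is reduced to a single successor), and server
    selection by latency is left out. The class and function names are
    hypothetical.

```python
M = 8            # identifier bits in this toy ring (CFS uses 160)
RING = 1 << M

def between(x, a, b, inclusive_right=False):
    """True if x lies in the circular interval (a, b), or (a, b] if asked."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b   # interval wraps past zero (or spans the ring if a == b)

class ChordRing:
    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))
        # Immediate successor of every node.
        self.succ = {n: self._successor(n + 1) for n in self.nodes}
        # Finger i of node n points at the successor of n + 2**i.
        self.fingers = {
            n: [self._successor(n + (1 << i)) for i in range(M)]
            for n in self.nodes
        }

    def _successor(self, key):
        """Brute-force successor, used only to build the tables."""
        key %= RING
        for n in self.nodes:
            if n >= key:
                return n
        return self.nodes[0]

    def lookup(self, start, key):
        """Route from start towards key; return (owner, hops taken)."""
        n, hops = start, 0
        while not between(key, n, self.succ[n], inclusive_right=True):
            # Jump to the farthest finger that still precedes the key.
            n = next(f for f in reversed(self.fingers[n]) if between(f, n, key))
            hops += 1
        return self.succ[n], hops

ring = ChordRing([5, 20, 61, 99, 130, 171, 200, 240])
owner, hops = ring.lookup(start=5, key=150)   # owner is 171
```

    Each hop uses a strictly smaller finger than the previous one, so a
    lookup on this ring never needs more than M hops; with IDs spread
    uniformly, that is where the O(log N) behaviour comes from.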

    The DHash layer handles replication, caching, load balance and quotas.
    Each block is replicated at the k servers immediately after the block's
    successor on the Chord ring, and it is cached at the servers along the
    lookup path to increase the chance of a hit. The uniformity of the hash
    function ensures load balance if the servers are homogeneous, and virtual
    servers give administrators a tool to compensate for differences in
    capacity in a heterogeneous server environment. Quotas are an important
    guard against malicious attempts to fill up the system's storage. One
    unique thing in CFS is that there is no explicit deletion, because all
    published data has an expiry time.
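
    The replica placement described above can be sketched like this. It is a
    simplified illustration with small integer IDs; replica_holders is a
    hypothetical helper, not a DHash function.

```python
from bisect import bisect_left

def replica_holders(sorted_ids, block_id, k):
    """Return the block's successor followed by the next k servers clockwise."""
    start = bisect_left(sorted_ids, block_id)
    count = min(k + 1, len(sorted_ids))   # cannot use more servers than exist
    return [sorted_ids[(start + i) % len(sorted_ids)] for i in range(count)]

servers = [10, 40, 70, 110, 150, 200]
holders = replica_holders(servers, block_id=65, k=2)      # [70, 110, 150]

# If the successor (70) fails, the next server becomes the new successor,
# and it already holds a copy of the block.
survivors = [s for s in servers if s != 70]
new_holders = replica_holders(survivors, block_id=65, k=2)  # [110, 150, 200]
```

    Placing replicas on the servers that follow the successor means that a
    lookup after a failure lands on a server that already has the block; only
    one new replica (here on server 200) has to be created to restore k
    copies.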

    CFS is an important system because of its completeness in achieving so
    many of the goals of a P2P system. Both live data and simulation support
    its claimed features; in particular, it performs reasonably well even
    when nearly half of the nodes fail. However, tolerating Byzantine
    failures would be more desirable, given that today's Internet is far from
    a perfect distributed environment.


