From: Honghai Liu (liu789_at_hotmail.com)
Date: Tue Mar 02 2004 - 23:48:29 PST
Review by: Honghai Liu
CFS is a peer-to-peer read-only file storage system that aims for high
availability, scalability, load balancing and robustness.
Unlike PAST, which stores whole files, CFS stores data at the granularity
of file blocks. In addition, CFS is read-only as far as the client is
concerned. The focus is on finding the desired data quickly, tolerating
fail-stop servers and balancing load among servers.
There are three layers in the CFS system. Chord, at the bottom, maintains
the routing tables used to find blocks. Above it, DHash stores unstructured
data blocks reliably. At the top level, FS interprets blocks as files and
serves as the interface to applications.
In Chord, a hash function is used to map block identifiers to server
identifiers. Each server has a 160-bit identifier, and so does each block.
These identifiers can be thought of as points on a circle. A block's ID
(source) is related to a server's ID (destination) through the successor,
which is the server whose ID immediately follows the (hashed) block ID on
the circle. Naturally, for a given block ID the successor can change when
servers join or leave, or when part of the network fails. In fact, CFS
tolerates such dynamic changes quite well.
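
The successor relationship can be illustrated with a short Python sketch.
This is only my own illustration, not code from CFS: the names chord_id and
successor are made up, the ring uses small hand-picked IDs instead of real
160-bit values, and SHA-1 is chosen simply because its output is 160 bits
wide.

```python
import bisect
import hashlib


def chord_id(data: bytes) -> int:
    """Map arbitrary bytes to a 160-bit identifier using SHA-1."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def successor(server_ids, key_id):
    """Return the first server ID at or after key_id, wrapping around.

    server_ids must be a sorted list of integer identifiers.
    """
    i = bisect.bisect_left(server_ids, key_id)
    return server_ids[i % len(server_ids)]


# Toy ring with small hand-picked IDs so the result is easy to follow.
ring = [10, 40, 90]
owner_before = successor(ring, 25)   # 40 is the first ID after 25
ring.remove(40)                      # server 40 leaves or fails
owner_after = successor(ring, 25)    # responsibility moves to 90
```

When server 40 disappears, only the keys it was responsible for move, and
they move to the next server on the circle. The other servers keep the keys
they already had.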
It is interesting to see that CFS uses a public key as the identifier of a
file system's root block. This naturally ties security to the identifier
space and results in fast lookup. In general, two data structures are used
for lookup: a successor list for correctness and a finger table for
efficiency. The algorithm guarantees that a lookup takes O(log N) steps,
where N is the number of servers, which gives the desired scalability.
Server selection in Chord reduces lookup latency by letting the client
contact the closest nodes in the network.
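
To make the role of the finger table concrete, here is a rough Python
sketch of a finger-based lookup on a simulated ring. It is a simplification
under my own assumptions: the identifier space is only 8 bits (the constant
M), every node's finger table is computed from a global list of nodes, and
there are no failures, successor lists or server selection. All function
names are hypothetical.

```python
import bisect

M = 8               # toy identifier size in bits (CFS uses 160)
SPACE = 2 ** M


def between(x, a, b, inclusive_right=False):
    """True if x lies in the ring interval (a, b), or (a, b] if requested."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b   # interval wraps past zero (or a == b)


def ring_successor(nodes, key):
    """First node at or after key on the ring; nodes must be sorted."""
    i = bisect.bisect_left(nodes, key % SPACE)
    return nodes[i % len(nodes)]


def build_fingers(nodes):
    """Finger i of node n is the successor of n + 2**i."""
    return {n: [ring_successor(nodes, n + 2 ** i) for i in range(M)]
            for n in nodes}


def lookup(fingers, start, key):
    """Return (node responsible for key, number of hops taken)."""
    n, hops = start, 0
    while True:
        succ = fingers[n][0]          # finger 0 is the immediate successor
        if between(key, n, succ, inclusive_right=True):
            return succ, hops
        # Jump to the farthest finger that still precedes the key.
        n = next((f for f in reversed(fingers[n]) if between(f, n, key)),
                 succ)
        hops += 1


nodes = [10, 40, 90, 200]
fingers = build_fingers(nodes)
result = lookup(fingers, 10, 95)      # one hop via node 90, answer is 200
```

Each hop jumps to the farthest finger that does not overshoot the key,
which is where the logarithmic hop count comes from on a large ring. The
successor pointer alone would still give the right answer, only more
slowly.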
The DHash layer provides replication, caching, load balancing and quotas.
Each block is replicated at the k servers immediately after the block's
successor on the Chord ring. A block is also cached at the servers along
the lookup path to increase the chance of a hit. The uniformity of the hash
function ensures load balance if the servers are homogeneous, and virtual
servers give administrators a way to compensate for differences in a
heterogeneous server environment. Quotas are an important way to guard
against malicious attempts to fill up the system's storage. One unique
thing in CFS is that there is no explicit deletion, because all published
data has an expiry time.
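
The replica placement rule can be sketched in a few lines of Python. Again
this is only an illustration with made-up names (block_holders) and a toy
ring of small IDs. I assume here that the successor holds the primary copy
and the next k servers hold the replicas.

```python
import bisect


def block_holders(server_ids, block_id, k):
    """Return the block's successor followed by the next k servers.

    server_ids must be sorted. On a ring with fewer than k + 1 servers,
    every server is returned once.
    """
    n = len(server_ids)
    i = bisect.bisect_left(server_ids, block_id)
    count = min(k + 1, n)
    return [server_ids[(i + j) % n] for j in range(count)]


ring = [10, 40, 90, 200]
holders_before = block_holders(ring, 25, k=2)   # [40, 90, 200]
ring.remove(40)                                 # the successor fails
holders_after = block_holders(ring, 25, k=2)    # [90, 200, 10]
```

After server 40 fails, the new successor for block 25 is server 90, which
already held a replica, so the block stays available without any server
having to be told about the failure in advance.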
CFS is an important system because of how completely it achieves the goals
of a P2P system. Both data from the live system and simulation support its
claimed features; in particular, it performs reasonably well even when
nearly half of the nodes fail. However, tolerating Byzantine failures would
be more desirable, given that today's Internet is far from a perfect
distributed environment.