From: Raz Mathias (razvanma_at_exchange.microsoft.com)
Date: Wed Mar 03 2004 - 17:44:24 PST
This paper presented a new peer-to-peer file system consisting of a file
management layer called DHash and the corresponding block-lookup
algorithm called Chord. Together the entire system consists of CFS, the
Cooperative File System.
The first component in CFS is the high-level file system abstraction
implemented by DHash. The main purpose of DHash is to present a file
system to clients of the network by managing the blocks stored by Chord.
DHash caches blocks on the lookup path and replicates blocks to maintain
fault tolerance. Similar to PAST, a node on the lookup path can cache
the blocks it sees; an LRU algorithm is utilized for cache content
replacement. Replication occurs by simply storing blocks in the k
successors of the current node in the node id space. Just like in
Pastry, subsequent nodes in the ID space do not have similar network
characteristics; therefore they will tend to fail independently and
fault-tolerance is achieved. Furthermore, the selection of replica
nodes when a primary goes down is not special-cased in the lookup
algorithm, simplifying the code significantly. Storage load is balanced
by the use of individual blocks. This is a significant place in which
PAST and CFS differ significantly. Large files can be better load
balanced across the system with CFS than with PAST as we no longer have
the problem of finding space large enough on any particular machine; in
other words, utilization is improved in the PAST system. The paper
makes clear the difference between caching and replication. Caching
cannot make guarantees about the number of servers storing the content
but are stored along the lookup path so are easy to locate; therefore
the cache is better suited for performance purposes. Replication can
guarantee a predictable number of copies but they are not as readily
accessible as cached copies, which makes replication ideal for
fault-tolerance. Capacity in the network is managed by a combination of
quotas and the concept of virtual servers. Quotas are maintained by
allowing any given client to only store a maximum percentage of the
total space available. The virtual server is used to break a large
machine into many small ones, thereby balancing load more evenly across
nodes. Finally, to improve performance, precaching is done at the
clients at the DHash level.
Chord is the block-lookup algorithm for CFS. The basic algorithm maps
block identifiers to server IP addresses. Chord is completely oblivious
with respect to DHash. Each block stored by Chord is either a piece of
file or a directory. The unique identifiers act as pointers to blocks
in the DHash file system. The identifier space is circular in nature;
each node knows about its neighbors. When a node enters a network, the
correct subset of the node's predecessor blocks is stored in the node.
When it leaves the network, the predecessor again controls those blocks.
A structure called the "finger table" is utilized to make larger jumps
around the identifier circle. To prevent attacks, a node cannot decide
where it will join in the network. Its location is determined by a
secure hash of its IP address. It is interesting to note how Chord
depends upon randomness in the node addition for security and Pastry
depends on randomness in the routing algorithm itself to protect itself
from attacks.
A novel aspect of this system (when compared with PAST) is the fact that
storage and location were separated from the abstract file system. The
idea of breaking this abstraction down makes me wonder whether a
distributed object or distributed virtual memory system could be
implemented over a peer-to-peer network just as easily as the file
system. This subtle teasing apart (as opposed to integration) of issues
seems to be what has made many of the papers in this class important and
successful. Perhaps one difference between a good engineer and a good
researcher is that the engineer integrates existing ideas whereas a
researcher can differentiate between an idea's distinct constituents.
This archive was generated by hypermail 2.1.6 : Wed Mar 03 2004 - 17:45:03 PST