From: Cem Paya (cemp_at_microsoft.com)
Date: Wed Mar 03 2004 - 16:32:54 PST
Review: CFS
CSE 551P, Cem Paya
This paper describes the Cooperative File System (CFS), a peer-to-peer
distributed file system designed for Internet-scale applications. CFS is
structured as three layers: at the core is the Chord distributed hash
table; layered on top of that is DHash, which manages storage and
retrieval of individual blocks, including support for caching and
replication. At the top level sits the interface that presents the
abstraction of an ordinary file system, except that the semantics are
somewhat unusual. The main distinction is that it is a read-only system
as far as ordinary users are concerned: each file has an owner and only
that owner can update it. That protection is very strong and is provided
by cryptographic means: the root block is signed with a private key and
any update must be signed with the same key.

CFS envisions a peer-to-peer system with identical, unmanaged nodes.
There is no server vs. client distinction, and it is expected that there
is no administration overhead beyond the one-time installation of the
CFS software, a realistic assumption if non-computer-savvy users are to
take part in the Chord network. Each physical server runs some number of
virtual servers depending on available space, and each one dedicates
some space to maintaining content for other nodes. Chord manages the
mapping of content to nodes. It is the algorithmic innovation of the
system, remarkable for overcoming a huge challenge: maintaining
up-to-date data about the whereabouts of files even as individual nodes
appear and disappear without notice, reflecting the connectivity state
of users on the Internet. Chord provides other guarantees such as even
load balancing and quick lookups, where the number of nodes queried
scales as the logarithm of the number of nodes in the network. One
downside is that there is no distributed search, which is what put P2P
on the map. Existing systems such as Gnutella or the FastTrack system
enable querying the network, but their protocols are extremely primitive
compared to Chord. Both use inefficient broadcast that floods the
network with queries because there is no a priori connection between
content (e.g. keys in the language of CFS) and the nodes hosting it.
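
To make the logarithmic lookup claim concrete, here is a minimal Python
sketch of Chord-style routing over a static ring. It is a toy under
stated assumptions, not the CFS implementation: the 32-bit identifier
space, the "node-i" and "block-k" names, and the Ring class are all
made up for illustration, and node joins, failures and stabilization
are left out entirely.

```python
import hashlib
from bisect import bisect_left

M = 32            # identifier bits in this toy ring (assumption; SHA-1 yields 160)
RING = 1 << M


def chord_id(name: str) -> int:
    """Map a name to an M-bit identifier by truncating its SHA-1 digest."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % RING


def in_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b), wrapping past zero."""
    if a < b:
        return a < x < b
    return x > a or x < b


def in_half_open(x: int, a: int, b: int) -> bool:
    """True if x lies in the ring interval (a, b]; a == b covers the whole ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b


class Ring:
    """A static set of nodes, each with a full finger table (hypothetical helper)."""

    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))
        # fingers[n][i] is the first node at or after n + 2^i on the ring.
        self.fingers = {
            n: [self.successor((n + (1 << i)) % RING) for i in range(M)]
            for n in self.nodes
        }

    def successor(self, ident: int) -> int:
        """First node clockwise at or after ident (global view, used as ground truth)."""
        i = bisect_left(self.nodes, ident)
        return self.nodes[i % len(self.nodes)]

    def lookup(self, start: int, key: int):
        """Route from start to the node responsible for key; return (node, hops)."""
        node, hops = start, 0
        while True:
            succ = self.fingers[node][0]
            if in_half_open(key, node, succ):
                return succ, hops
            # Forward to the farthest finger that still precedes the key.
            nxt = next(
                (f for f in reversed(self.fingers[node]) if in_open(f, node, key)),
                succ,
            )
            node, hops = nxt, hops + 1


def average_hops(n_nodes: int, n_keys: int = 200) -> float:
    """Average number of forwarding hops over n_keys lookups from one node."""
    ring = Ring(chord_id(f"node-{i}") for i in range(n_nodes))
    start = ring.nodes[0]
    hops = [ring.lookup(start, chord_id(f"block-{k}"))[1] for k in range(n_keys)]
    return sum(hops) / n_keys


if __name__ == "__main__":
    for n in (16, 128, 1024):
        print(n, average_hops(n))
```

Running it for growing ring sizes should show the average hop count
growing far more slowly than the node count, in contrast to a flooding
search, which has to touch a large fraction of the network.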

One vague aspect of CFS is the risk model. "C" stands for cooperative,
but there are some measures in place to defend against malicious nodes
fabricating content, flooding other nodes with replicated data, etc. For
example, each key is hashed using a cryptographically secure one-way
hash algorithm (SHA-1 specifically), which makes it computationally
infeasible to come up with a key that maps to a given ID. The underlying
network communication uses nonces and echo-back to rule out trivial IP
spoofing attacks. On the other hand, some security concerns are set
aside completely; for example, the authors argue that anonymous download
is out of scope because it can be layered on top of CFS using proxying
or anonymous remailers. The denial-of-service potential is also not very
clear, given that nodes are saving data on behalf of other nodes. The
one-way nature of SHA-1 makes it more difficult to control where a given
block lands, although trial and error still works, since in practice
there are only on the order of millions of nodes in the network: that
space is still small enough to brute force. Even when it is not easy to
single out particular nodes for attack, all experience to date with P2P
systems points to widespread free-riding, where most content is served
by a small number of peers while others download at will without
contributing. The CFS analog for this would be people who use the
network for storage but contribute only a very small amount of disk
space (or, equivalently, run very few virtual servers).
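
The trial-and-error point can be illustrated with a short Python
sketch. It assumes, purely for illustration, that a block is stored on
the first node whose SHA-1-derived ID follows the block's SHA-1 ID on
the ring; the node names, the "junk-" prefix and all helper functions
are hypothetical. An attacker cannot pick the ID directly, but can keep
generating candidate blocks until one lands on the chosen victim, and
the expected number of tries is about one over the fraction of the ID
space the victim owns, i.e. roughly the number of nodes.

```python
import hashlib
from bisect import bisect_left

SPACE = 1 << 160   # size of the SHA-1 identifier space


def sha1_id(data: bytes) -> int:
    """Interpret the SHA-1 digest of data as a 160-bit integer ID."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")


def owner(sorted_ids, key: int) -> int:
    """First node ID at or after key on the ring (wrapping past zero)."""
    i = bisect_left(sorted_ids, key)
    return sorted_ids[i % len(sorted_ids)]


def arc_fraction(sorted_ids, target: int) -> float:
    """Fraction of the ID space whose keys are stored on target."""
    if len(sorted_ids) == 1:
        return 1.0
    pred = sorted_ids[sorted_ids.index(target) - 1]
    return ((target - pred) % SPACE) / SPACE


def trials_to_hit(sorted_ids, target: int, prefix=b"junk-", limit=1_000_000):
    """Count candidate blocks hashed until one maps to target; None if limit is hit."""
    for n in range(1, limit + 1):
        if owner(sorted_ids, sha1_id(prefix + str(n).encode())) == target:
            return n
    return None


if __name__ == "__main__":
    ids = sorted(sha1_id(f"node-{i}".encode()) for i in range(1000))
    victim = max(ids, key=lambda t: arc_fraction(ids, t))
    print("expected tries ~", round(1 / arc_fraction(ids, victim)))
    print("actual tries   =", trials_to_hit(ids, victim))
```

Each try costs one hash, so even a ring of millions of nodes is well
within reach of a single machine.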
Cem