From: Nathan Dire (ndire_at_cs.washington.edu)
Date: Wed Mar 03 2004 - 17:17:47 PST
"Storage management and caching in PAST, a large-scale, persistent
peer-to-peer storage utility."
A. Rowstron, P. Druschel
PAST is an Internet-based, peer-to-peer global storage utility. PAST software
is installed on nodes which form a overlay network in the Internet. The nodes
in the system provide an access point for users as well as contributing to the
storage. Files stored in PAST are replicated and immutable. This paper
focuses on the storage management and caching in PAST.
PAST uses Pastry for routing. Each node maintains a routing table which maps
nodeId's to IP addresses, a leaf set which consists of nearby nodeId's, and a
neighborhood set which consists of nearby nodes according to the network
proximity metric. Since nodeId's are quasi-random, there's no correlation
between nodeId proximity and any other factor. Pastry also handles updating
the tables as nodes are added and removed.
The PAST system system supports three basic operations: addition of files,
retrieval of files, and a reclaim operation. When a file is added, it is
given a 160-bit 'fileId' which provides unique identification for the file. A
replica of the file is stored at k diverse nodes in the network, where k can
be specified by the user. The storage request must be accepted by k nodes in
order to complete. On lookup, the file will be found if any of the k nodes on
which it was stored is available. The reclaim operation is mostly important
for user's quota accounting. To avoid agreement protocols, the reclaim
operation does not guarantee that the file is no longer available.
The responsibilities of PAST's storage management are to balance the free
space among the nodes and to maintain the replication of files. These goals
are accomplished using replica diversion and file diversion. Replica
diversion entails storing a file in the leaf set of one of the k numerically
closest nodes to the fileId as an alternative location. File diversion
entails generating a new fileId so that the k replicas are stored in a new,
unrelated leaf set.
For caching, PAST nodes use up any local free space to store cached files.
This has the advantage of efficiently using the disk space, though caching
become ineffective as utilization increases. The cache replacement policy is
GreedyDual-Size, original developed for Web proxies.
The paper explains how replicas are maintained as nodes come and go, but
I didn't exactly see how a replica is replaced when the node storing it
goes away. I'm also curious about the configuration/bootstrapping of
system. Although the paper is focused on storage management and
caching, I thought the measurement could have included more about the
perceived performance from individual clients when streaming a file.
Overall, I think the ideas are clever, but I much prefered the CFS paper. I
don't think this paper was presented well, and I like the block level
approach better (party because that's what my company does :)).
This archive was generated by hypermail 2.1.6 : Wed Mar 03 2004 - 17:17:49 PST