From: Praveen Rao (psrao_at_windows.microsoft.com)
Date: Wed Mar 03 2004 - 15:58:58 PST
This paper discusses PAST which is a distributed peer-to-peer storage
management system. PAST nodes co-operatively route file queries, store
multiple replicas of files and cache copies of popular files. PAST does
storing of caching of whole files.
Storage nodes and files are assigned unique ids and file is stored at
nodes whose identifier matches that of the file most closely.
Each node is capable of initiating and routing client requests to insert
and retrieve files. Node can optionally contribute storage to PAST.
Inserted files are replicated across multiple nodes for availability.
PAST uses Pastry routing protocol. The number of nodes traversed and
messages sent to locate a file are logarithmic order of total number of
nodes. A client must know the file-id and (decryption key if applicable)
to locate a file. Local file-system of the nodes acts as a cache for the
PAST content.
The interface of PAST is as follows:
fileId = Insert (name, owner-credentials, k, file)
file = Lookup (fileId)
Reclaim (fileId, owner-credentials)
Reclaim indicates that file is no longer maintained by PAST (and Lookup
would fail) but Lookup may still succeed. This weaker semantic avoids
complex agreement protocols among the nodes.
The Pastry routing protocol maintains a leaf set, routing table and
neighborhood set on the nodes. Leaf set consists of node ids immediately
higher and lower the current node. The neighborhood set consists of
nodes close to the current node based on proximity metric.
PAST's storage management aims at allowing high global storage
utilization and graceful perf degradation as the system approaches full
storage load. For load balancing PAST uses replica diversion (where a
file is stored on a node that is not among the k closed nodes) and file
diversion (where a file is diverted to a different fileId space by using
a different salt in id generation).
PAST allows caching of the files along the file search path to minimize
fetch latencies. Caches do not change semantics and hence cached files
can be discarded anytime only costing perf.
PAST has limitations due to whole file caching - a very large file may
not find a node to be stored on even when the system on the whole has
sufficient storage space. Experiments show that failure rates jump
drastically once the system is beyond 90% utilization.
This archive was generated by hypermail 2.1.6 : Wed Mar 03 2004 - 15:58:50 PST