From: Sellakumaran Kanagarathnam (sellak_at_windows.microsoft.com)
Date: Wed Mar 03 2004 - 13:51:46 PST
The paper discusses storage management and caching in PAST. PAST is
primarily a storage utility that uses peer-to-peer technology to achieve
strong persistence, high availability, scalability and security.
Availability is achieved by replicating files across multiple nodes.
Scalability is achieved by using P2P and Pastry. Security is achieved by
using smartcards (on each node and each user). This paper focuses
primarily on storage management and caching.
Every node provides an access point for users and optionally it serves
as storage. PAST system exports a set of operations to its clients:
Insert, Lookup and Reclaim. In response to an insert request, a fileID
is computed as the hash code of the file's name, the client's public key
and a random salt. The insert operation also specifies the number of
places the file has to be stored. PAST also has a quota system where
each client is given a storage quota. The responsibilities of storage
management are to 1) balance remaining free storage among nodes in PAST
and 2) to ensure that the copies of a file are maintained by k nodes
closest to the fileID. PAST achieves this by two techniques: replica
diversion and file diversion. The purpose of replica diversion is to
balance the remaining free storage space among the nodes in a leaf set.
If a node A, on receipt of insert request message, cannot accommodate a
copy locally, it chooses a node B from its leaf set (B should not be in
k closest and it should not hold a replica of this file already) and
asks B to store the file on behalf of A. In this case there are two
failures that have to be take care of: a) failure of B b) failure of A
itself. a) Will be detected by the process of exchanging keep alive
messages. In that case, certain adjustments happen and another node may
become one of k closest nodes for certain files; to maintain the storage
invariant, replicas will have to be created on the new node. b) Is
handled by adding a pointer to B in the k+1 closest node of the file.
The next technique of file diversion is to balance the remaining free
storage space among different portions of the nodeID space in PAST.
When file insertion fails because k nodes could not hold accommodate the
file (even with replica diversion), a failure notice is sent to the
client. The client then retries with a different fileId. If three
retries fail, then the client can choose the decrease k and retry.
Goals of cache management is to minimize client access latencies (fetch
distance), to maximize the query throughput and to balance the query
load in the system. PAST nodes use the unused portion of their disk
space to cache files. A file that is routed thru a node as part of
lookup or insert operation is cached if its size is less than a fraction
c of the node's current cache size. The cache replacement policy is
based on GD-S policy. Experiments done in an emulated network
environment show that PAST is able to achieve global storage utilization
in the high 90s. Overall, I think that the paper is well organized. It
chooses two aspects and explains them clearly. The authors have given
just enough details to convey the approach on policies and their
implementation for these two aspects.
This archive was generated by hypermail 2.1.6 : Wed Mar 03 2004 - 13:51:51 PST