From: Honghai Liu (liu789_at_hotmail.com)
Date: Tue Mar 02 2004 - 20:06:02 PST
Reviewer by: Honghai Liu
The paper presents storage management and caching in PAST,
a peer-to-peer storage system which achieve scalability, strong
persistency high availability.
A peer to peer system is a distributed system in which all nodes
have identical capabilities and responsibilities and all communication
are symmetric.
The PAST uses an efficient routing scheme called Pastry to ensure
that client requests are reliably routed to the appropriate nodes.
Client requests to retrieve a file are routed with high probability to
a node that is close in the network. The algorithm has the property
that the nodes of PAST traversed and the number of message are
logarithmic in the total number of PAST nodes. Therefore Pastry is
the key to achieve scalability for PAST.
There are 3 basic operations in PAST: insertion, lookup and reclaim.
Each PAST node is assigned a 128-bit node identifier called a nodeId.
File stored in PAST is associated with fileId, generated at the time of
file insertion and files stored in PAST is immutable. NodeId and fileId
are uniformly distributed and their msb of the fields are associated to
determine the candidate locations of k PAST nodes to insert a file.
PAST's storage management tries to achieve high storage utilization
and graceful degradation when system reaches the maximum utilization.
The goal of balancing free storage space among nodes at maximum
utilization and maintain locality of filedId seem to be conflicting, however,
replica diversion and file diversion are used to balance them. The replica
diversion is to balance the remaining storage space among the nodes and
file diversion is to balance the reaming fee storage space among different
portions of the nodeId.
It is interesting to know that storing k complete copies of a file is not the
most efficient method to achieve high availability. Using Reed-Solomon
encoding, add m additional checksum block to n original data blocks
allows recovery from up to m losses of data or checksum blocks.
Nevertheless, the algorithm is not adopted by PAST due to high cost
of overhead.
Caching is used in PAST to achieve minimal client access latencies, high
throughput and balanced query load. The cache replacement policy used
in PAST is based on GD-S policy, where is originally developed for
caching web proxies.
In summary, PAST is an interesting pure P2P system without any centralized
data storage and management model, which has the potential to meet many Internet
This archive was generated by hypermail 2.1.6 : Tue Mar 02 2004 - 20:06:04 PST