From: Raz Mathias (razvanma_at_exchange.microsoft.com)
Date: Wed Mar 03 2004 - 17:09:00 PST
This paper presents the PAST system for peer-to-peer file storage. The
system stores whole files on individual nodes, uses a routing scheme
called Pastry to locate them, and provides fault tolerance and
performance through replication and caching, respectively. The PAST
file system exposes three major APIs: Insert, Lookup, and Reclaim.
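A rough sketch of that interface in Python (the signatures follow my reading of the paper, which derives the fileId from the file name, the owner's credentials, and a random salt; the class and its bodies are purely illustrative, not PAST's actual code):

    import hashlib, os

    class Past:
        """Hypothetical stub of the PAST client interface."""

        def insert(self, name, owner_credentials, k, file_bytes):
            """Store k replicas of the file; return the fileId used for lookups.
            owner_credentials is treated here as raw bytes for hashing."""
            salt = os.urandom(8)
            file_id = hashlib.sha1(name.encode() + owner_credentials + salt).hexdigest()
            # ... route the file via Pastry to the k nodes closest to file_id ...
            return file_id

        def lookup(self, file_id):
            """Return a copy of the file from a node holding a replica."""
            ...

        def reclaim(self, file_id, owner_credentials):
            """Best-effort reclamation of the storage held by the file's replicas."""
            ...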
The central piece of this paper is the Pastry routing algorithm, which
implements the Lookup operation. The algorithm assigns globally unique
identifiers to both files and nodes; to locate a file, we route toward
the node whose identifier is numerically closest to the file's. Each
node maintains a routing table organized around its own identifier:
each successive row of the table matches that identifier one digit
(that is, one prefix digit) further. To route a message, a node
compares the message's identifier against this table, finds an entry
that matches at least one more digit than its own identifier does, and
forwards the request to that node. Routing takes O(log N) overlay hops,
where N is the number of nodes in the network. File replication is
achieved by storing copies of the file on the k nodes closest to the
file's identifier in the ID space, and caching is achieved by storing
copies of the file at nodes along the lookup path.
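A toy version of that prefix-matching step might look like the following (illustrative only; real Pastry works digit by digit in base 2^b, handles ring wraparound, and consults a separate leaf set and neighborhood set, none of which is modeled here):

    def shared_prefix_len(a, b):
        """Number of leading hex digits two IDs have in common."""
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        return n

    def next_hop(local_id, key, routing_table, leaf_set):
        """Choose the next node for a message addressed to `key`.

        routing_table[p][d] is assumed to hold the ID of some node that shares
        p leading digits with local_id and has d as its next digit (or None);
        leaf_set holds the node IDs numerically closest to local_id.
        """
        if key == local_id:
            return local_id                      # we are the destination
        p = shared_prefix_len(local_id, key)
        candidate = routing_table[p][int(key[p], 16)]
        if candidate is not None:
            return candidate                     # matches one more digit of the key
        # Otherwise fall back to any known node numerically closer to the key.
        return min(leaf_set + [local_id],
                   key=lambda node: abs(int(node, 16) - int(key, 16)))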
Insert operations store files in the distributed system and involve
computing a new file identifier for each insertion. A number of
mechanisms balance the storage load across the network. First, when a
node among the k closest to a file's identifier approaches its storage
capacity, it can divert the replica to a node in its leaf set that is
not among the k closest, keeping a pointer to the diverted copy; this
is called replica diversion. Second, the file identifier is derived in
part from a salt value that may be changed to place the file in a
different region of the ID space altogether if a particular identifier
region becomes saturated; this is called file diversion. As a matter of
policy, it is preferable to divert one large file rather than several
small ones, since this improves the lookup performance of the overall
network.
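File diversion can be sketched as a retry loop over the salt (the helper names, the try_insert callback, and the retry limit are my own assumptions, not PAST's code):

    import hashlib, os

    def make_file_id(name, owner_pubkey, salt):
        """fileId derived from the file name, the owner's public key, and a salt."""
        return hashlib.sha1(name.encode() + owner_pubkey + salt).hexdigest()

    def insert_with_diversion(name, owner_pubkey, k, data, try_insert, max_tries=3):
        """Retry the insert with a fresh salt if the target ID region is full.

        try_insert(file_id, k, data) is assumed to return True on success and
        False when the k closest nodes (and their leaf sets) reject the file.
        """
        for _ in range(max_tries):
            file_id = make_file_id(name, owner_pubkey, os.urandom(8))
            if try_insert(file_id, k, data):
                return file_id          # stored under this (salted) identifier
        raise RuntimeError("insert failed: the ID space near this file appears saturated")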
It is interesting that the network doesn't guarantee the successful
completion of delete operations. Instead, PAST provides an operation
called Reclaim, which is essentially a best-effort attempt to reclaim
the space that the file once occupied. A disadvantage here is that if a
node leaves the network and misses a reclaim operation, there is no
guarantee as to how long replicas may persist on that node. This is
wasteful in terms of disk space, but the paper acknowledges the fact
and assumes that, in the long term and across many nodes, disk space is
inexpensive. Files in the network are immutable. To store a new version
of a file, we simply reclaim the old version and insert the new one
under a new file identifier.
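In terms of the hypothetical Past stub sketched earlier, publishing a new version amounts to nothing more than this (again illustrative, not an API the paper defines):

    def publish_new_version(past, name, owner_credentials, k, new_bytes, old_file_id):
        """Files are immutable: reclaim the old copy (best effort only), then
        insert the new contents, which yields a fresh fileId."""
        past.reclaim(old_file_id, owner_credentials)   # may not free space on every node
        return past.insert(name, owner_credentials, k, new_bytes)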
It is striking how many key roles randomization plays in this paper. To
increase fault tolerance, adjacent nodes in the ID space should be
unrelated in network location, so node identifiers are assigned at
random with respect to network location. Randomization also provides
fault tolerance by helping routes avoid persistent routing loops in the
network. Finally, randomization is used to preserve security: during
routing, the next hop is chosen at random from among the eligible
candidates, so that no single node in the routing path has the
opportunity to control the routing process and cause significant harm
to the network.
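Reusing shared_prefix_len from the routing sketch above, that randomized choice might be modeled like this (how exactly "eligible" is defined below is my assumption; the paper only requires that the chosen node still make progress toward the key):

    import random

    def randomized_next_hop(local_id, key, candidates):
        """Among known nodes that make progress toward the key (share at least
        as long a prefix with it as local_id does), pick one at random so no
        single node can predict or steer the route."""
        p = shared_prefix_len(local_id, key)
        eligible = [n for n in candidates
                    if n != local_id and shared_prefix_len(n, key) >= p]
        return random.choice(eligible) if eligible else None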