From: Jeff Duzak (jduzak_at_exchange.microsoft.com)
Date: Tue Mar 02 2004 - 21:59:17 PST
This paper describes PAST, a peer-to-peer file storage system. The
paper gives a brief overview of Pastry, a distributed routing scheme
upon which PAST is built. However, most of the paper is devoted to the
mechanisms that govern the placement of files, and fine-tuning of the
parameters of these mechanisms. Last, a description of the caching
mechanism is given.
The section describing the Pastry lookup scheme was very familiar, as it
is very close to the Chord routing scheme. The beginning of this
section was somewhat tedious to read, as detailed formulas were given
to describe a number of characteristics of the system. The description
of the scheme itself was straightforward enough. Essentially, each node
forwards a message to another node whose id shares a longer prefix
(more leading digits) with the target id than its own id does, until the
message reaches the node whose id is numerically closest to the target id.
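A minimal sketch of the prefix-routing idea, assuming 4-digit hex ids and a single global node list. That list is a deliberate simplification standing in for each node's real routing table and leaf set, and the function names (shared_prefix_len, next_hop, route) are hypothetical. Without a leaf set, the sketch stops at the node that shares the longest prefix with the target, choosing the numerically closest among those. Real Pastry uses its leaf set to finish at the numerically closest node overall.

```python
from __future__ import annotations


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits that ids a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def distance(a: str, b: str) -> int:
    """Numeric distance between two hex ids (no ring wraparound, for simplicity)."""
    return abs(int(a, 16) - int(b, 16))


def next_hop(current: str, target: str, nodes: list[str]) -> str | None:
    l = shared_prefix_len(current, target)
    # Usual case: a node sharing at least one more digit with the target.
    # Prefer the smallest gain, mimicking one routing-table row per hop.
    longer = [n for n in nodes if shared_prefix_len(n, target) > l]
    if longer:
        return min(longer, key=lambda n: (shared_prefix_len(n, target), distance(n, target)))
    # Fallback: same prefix length, but numerically closer to the target.
    closer = [n for n in nodes
              if shared_prefix_len(n, target) == l
              and distance(n, target) < distance(current, target)]
    if closer:
        return min(closer, key=lambda n: distance(n, target))
    return None  # no better node known: current node is the destination


def route(start: str, target: str, nodes: list[str]) -> list[str]:
    path = [start]
    while (hop := next_hop(path[-1], target, nodes)) is not None:
        path.append(hop)
    return path


nodes = ["1a3f", "65a1", "d462", "d46a", "d471", "d4e2"]
print(route("65a1", "d46c", nodes))  # ['65a1', 'd471', 'd46a']
```

Each hop either lengthens the shared prefix or keeps it and moves numerically closer, so the route always terminates.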
The paper then describes the file storage mechanisms of PAST. The
system has mechanisms for diverting the storage of a single replica of a
file to a nearby node, as well as for diverting the storage of all replicas
of a file to an entirely new location. The replica diversion mechanism
introduces some complexity, as it breaks the invariant that a file
will be stored on the k nodes nearest to the file's id, and it also
complicates recovery from node failures.
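A minimal sketch of replica diversion, assuming the k replicas go to the k nodes whose ids are numerically nearest the file id. The Node type, the acceptance test (reject when the file would use too large a fraction of remaining free space), and the threshold names and values t_pri/t_div are hypothetical illustrations, not PAST's exact policy. If a primary node refuses a replica, the sketch hands it to the nearest node that does not already hold one. If no such node accepts, it returns None, which is the point at which a caller would fall back to file diversion.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Node:
    node_id: int
    free: int  # free storage in bytes


def accepts(node: Node, size: int, threshold: float) -> bool:
    # Hypothetical acceptance test: reject a file that would consume too
    # large a fraction of the node's remaining free space.
    return node.free > 0 and size / node.free <= threshold


def nearest(nodes: list[Node], key: int) -> list[Node]:
    # All nodes ordered by numeric distance from key (no ring wraparound).
    return sorted(nodes, key=lambda n: abs(n.node_id - key))


def place_replicas(nodes, file_id, size, k, t_pri=0.1, t_div=0.05):
    primaries = nearest(nodes, file_id)[:k]
    taken = {n.node_id for n in primaries}
    plan = {}  # primary id -> node that actually stores the replica
    for p in primaries:
        if accepts(p, size, t_pri):
            plan[p.node_id] = p
            continue
        # Replica diversion: a nearby node that is not already a replica
        # holder stores it instead, under the stricter threshold t_div.
        for cand in nearest(nodes, p.node_id):
            if cand.node_id not in taken and accepts(cand, size, t_div):
                taken.add(cand.node_id)
                plan[p.node_id] = cand
                break
        else:
            return None  # no home for this replica: caller would try file diversion
    # Commit only after every replica has a home (each holder appears once).
    for holder in plan.values():
        holder.free -= size
    return {pid: h.node_id for pid, h in plan.items()}


nodes = [Node(10, 1000), Node(20, 1000), Node(30, 50), Node(40, 1000), Node(50, 1000)]
print(place_replicas(nodes, file_id=29, size=20, k=3))  # {30: 10, 20: 20, 40: 40}
```

Node 10 now holds a replica even though it is not among the three nodes nearest id 29. That is exactly the broken invariant the review mentions, and it is why a failure of node 30 (which would keep only a pointer) becomes harder to recover from.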
One possible reason PAST needs replica and file diversion for load
balancing is that it stores files intact. It seems more intuitive to
break files into fixed-size blocks, as the CFS system does. Fixed-size
blocks would yield a more uniform load distribution among servers.
Further, breaking large files into pieces allows the pieces to be
fetched in parallel from multiple servers.
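A minimal sketch of block-based placement, assuming each block is named by the SHA-1 hash of its contents and stored on the successor of that id on a Chord-style ring (CFS runs on Chord). The 8192-byte block size, the node ids, and the function names are hypothetical, and parallel fetching is not shown. The sketch only computes how many bytes of one file each node ends up storing.

```python
from __future__ import annotations
import hashlib
from bisect import bisect_left
from collections import Counter


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    # Fixed-size blocks; the last block may be shorter.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_id(block: bytes) -> int:
    # Content-hash id for a block (SHA-1 digest read as a 160-bit integer).
    return int.from_bytes(hashlib.sha1(block).digest(), "big")


def successor(ring: list[int], key: int) -> int:
    # First node id >= key on a sorted ring, wrapping around to the smallest id.
    i = bisect_left(ring, key)
    return ring[i % len(ring)]


def block_load(data: bytes, block_size: int, ring: list[int]) -> Counter:
    # Bytes each node stores when the file is split into blocks.
    load = Counter()
    for b in split_blocks(data, block_size):
        load[successor(ring, block_id(b))] += len(b)
    return load


ring = sorted(block_id(f"node-{i}".encode()) for i in range(16))
data = bytes(i % 251 for i in range(25600))
load = block_load(data, 8192, ring)
print(sum(load.values()))  # 25600
```

Stored intact, the whole 25600-byte file would land on a single node. Split into blocks, no node stores more than the blocks that hash to it, and those blocks are spread by the hash.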
The section describing experimental results first describes differences
in performance (in terms of file storage success rate and system
utilization) that occur when configuration parameters affecting replica
and file diversion are varied. Optimal parameters are chosen from this
analysis. Further analysis shows the performance of the system at close
to full utilization. The results seem fine, although, unfortunately,
without any other system to compare against, it is difficult to know
how good they are. Further, some of the reported testing details were
not very interesting, such as the fact that in the filesystem test
workload, 679 files are larger than the storage capacity of the
smallest server. Results for those files are not shown anyway.