From: Tarik Nesh-Nash (tarikn_at_microsoft.com)
Date: Wed Mar 03 2004 - 11:29:06 PST
This paper presents an interesting storage management and caching
mechanism for a P2P storage utility. PAST aims to get strong
persistence, high availability, scalability and security of the
contents. It is built on symmetric, diverse nodes that are chosen at
random; this obviates the need for physical transport of storage media
and for explicit mirroring as a backup. It also enables sharing and
increases the available bandwidth.
Every node in the PAST system can be used as an access point or a
storage location; it is identified by a nodeId that is randomly selected
when the node is created, so the nodeId has no correlation with the
node's geographical location. This makes nodes with close nodeIds
excellent candidates to store replicas: they are likely to be diverse in
location, and the system should be probabilistically balanced.
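To make the placement idea concrete, here is a minimal sketch of picking
replica holders by numeric closeness of ids. It is an illustration under
my own assumptions, not PAST's code: the 16-bit id space is a toy size
chosen for readability, and the function names (ring_distance,
make_node_ids, replica_holders) are hypothetical.

```python
import random

ID_BITS = 16            # assumption: toy id space, chosen only for readability
ID_SPACE = 1 << ID_BITS


def ring_distance(a, b):
    """Distance between two ids on the circular id space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def make_node_ids(count, seed=0):
    """Draw distinct nodeIds uniformly at random, independent of location."""
    rng = random.Random(seed)
    return rng.sample(range(ID_SPACE), count)


def replica_holders(node_ids, file_id, k):
    """Return the k nodeIds numerically closest to file_id."""
    return sorted(node_ids, key=lambda n: ring_distance(n, file_id))[:k]
```

Because the ids are drawn at random, the k holders of any given file are
an effectively random sample of the nodes, which is what makes them
reasonable replica sites.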
PAST is built on Pastry, a P2P routing system that provides most of
PAST's scalability, fault resilience and self-organization. In fact, I
believe that PAST's success is mainly due to Pastry's architecture.
The storage management aims to stay robust as the system approaches its
maximum storage capacity, while keeping the goal of storing copies on
the nearest nodes.
Two solutions are presented: replica diversion and file diversion.
Replica diversion is used to balance the remaining free storage among
the nodes of a leaf set. I'm wondering whether the operation should be
recursive until enough space is found or a certain depth is reached. I'm
also concerned about the fragmentation that this diversion will cause
after long use of the system: outdated diverted replicas will be spread
across the nodes, and that may degrade performance. A cleanup process
would avoid some of these storage problems by removing unnecessary old
replicas. If replica diversion fails, the file diversion mechanism aims
to balance the remaining free storage across different portions of the
nodeId space; this is done by using a different salt value when
generating the fileId.
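The two-level fallback can be sketched roughly as follows. This is a
simplified illustration under my own assumptions, not the paper's
implementation: the leaf set is approximated as the leaf_size nodes
nearest the file's id, the acceptance rule (refuse a file whose size is
large relative to the node's free space) and the threshold and retry
defaults are illustrative parameters, the pointer that a diverting node
would keep is omitted, and all names are hypothetical.

```python
import hashlib

ID_BITS = 16                      # assumption: toy id space for readability
ID_SPACE = 1 << ID_BITS


class Node:
    def __init__(self, node_id, free):
        self.node_id = node_id
        self.free = free          # remaining free storage, arbitrary units
        self.stored = {}          # file id -> size of the replica held here


def ring_distance(a, b):
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


def file_id_for(name, salt):
    """Hash name and salt to an id; a new salt lands elsewhere in id space."""
    digest = hashlib.sha1(f"{name}:{salt}".encode()).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def accepts(node, size, threshold):
    """Assumed policy: refuse files that are large relative to free space."""
    return node.free > 0 and size / node.free <= threshold


def place_replicas(nodes, file_id, size, k, leaf_size, t_pri, t_div):
    """Plan k replicas near file_id, diverting within the leaf set if needed.

    Returns the chosen nodes, or None if some replica cannot be placed.
    """
    by_distance = sorted(nodes, key=lambda n: ring_distance(n.node_id, file_id))
    primaries = by_distance[:k]
    spare = by_distance[k:leaf_size]   # leaf-set nodes outside the k closest
    chosen = []
    for node in primaries:
        if accepts(node, size, t_pri):
            chosen.append(node)
            continue
        # Replica diversion: use the spare leaf-set node with most free space.
        candidates = [n for n in spare
                      if n not in chosen and accepts(n, size, t_div)]
        if not candidates:
            return None
        chosen.append(max(candidates, key=lambda n: n.free))
    return chosen


def insert_file(nodes, name, size, k=3, leaf_size=6,
                t_pri=0.1, t_div=0.05, max_salts=4):
    """File diversion: try successive salts until all k replicas fit."""
    for salt in range(max_salts):
        fid = file_id_for(name, salt)
        chosen = place_replicas(nodes, fid, size, k, leaf_size, t_pri, t_div)
        if chosen is not None:
            for node in chosen:
                node.free -= size
                node.stored[fid] = size
            return fid, salt
    return None                        # every salt failed: reject the insert
```

Note that this sketch diverts only one hop, which is exactly where my
question about recursion comes in: a recursive variant would let the
chosen spare node divert again instead of returning None.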
A cache management mechanism is implemented to reduce latency and
maximize throughput. This is done by copying popular files close to
the client clusters that request them.
PAST seems an attractive solution for maximizing storage utilization and
availability. Considerable work on performance could still improve the
system. It is, however, a limitation that the system is used solely for
storage and cannot be used as a general-purpose file system.