"Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility" Review

From: Tarik Nesh-Nash (tarikn_at_microsoft.com)
Date: Wed Mar 03 2004 - 11:29:06 PST


    This paper presents an interesting storage management and caching
    mechanism for a P2P storage utility. PAST aims to provide strong
    persistence, high availability, scalability and security for stored
    content. It is built on a symmetric set of diverse nodes, and replicas
    are placed on effectively random nodes; this obviates the physical
    transport of storage media and explicit mirroring for backup. It also
    enables sharing and increases the available bandwidth.

    Every node in the PAST system can serve as an access point or a
    storage location. Each node is identified by a nodeId that is chosen
    randomly when the node is created, so the nodeId has no correlation
    with the node's geographical location. This makes nodes with adjacent
    nodeIds excellent candidates for storing replicas: they are unlikely
    to share a location or fail together, and the system should be
    probabilistically balanced.
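
    The placement rule can be illustrated with a minimal sketch. It
    assumes a circular 128-bit id space, SHA-1 as the source of
    pseudo-random ids, and k replicas per file; the helper names
    (make_id, ring_distance, replica_nodes) and the sample node and file
    names are hypothetical and not taken from the paper.

```python
import hashlib

ID_BITS = 128            # assumed identifier width for this sketch
RING = 1 << ID_BITS      # size of the circular id space


def make_id(data: bytes) -> int:
    """Map arbitrary bytes to a pseudo-random identifier on the ring."""
    digest = hashlib.sha1(data).digest()            # 160-bit digest
    return int.from_bytes(digest, "big") >> (160 - ID_BITS)


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two identifiers on the circular space."""
    d = abs(a - b)
    return min(d, RING - d)


def replica_nodes(file_id: int, node_ids: list[int], k: int) -> list[int]:
    """Return the k nodeIds numerically closest to file_id."""
    return sorted(node_ids, key=lambda n: ring_distance(n, file_id))[:k]


# Hypothetical network of 50 nodes; the names are placeholders.
nodes = [make_id(f"node-{i}".encode()) for i in range(50)]
fid = make_id(b"report.pdf")
chosen = replica_nodes(fid, nodes, k=3)
```

    Because the ids come from a hash, the three chosen nodes are
    neighbours in id space only, which is the property the replication
    scheme relies on.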

    PAST is built on a P2P routing system, Pastry, which provides most of
    PAST's scalability, fault resilience and self-organization. In fact, I
    believe that PAST's success is mainly due to Pastry's architecture.

    The storage management aims to remain robust as the system approaches
    its maximum storage capacity, while keeping the goal of storing
    replicas on the nodes whose nodeIds are nearest to the fileId.

    Two solutions are presented: replica diversion and file diversion.
    Replica diversion is used to balance the remaining free storage among
    the nodes of a leaf set. I'm wondering whether the operation should be
    recursive until enough space is found or a certain depth is reached.
    I'm also concerned about the fragmentation that this diversion will
    cause after long use of the system: outdated diverted replicas will be
    spread across the nodes, and that may degrade performance. A cleanup
    process would avoid some of these storage problems by removing
    unnecessary old replicas. If replica diversion fails, the file
    diversion mechanism aims to balance the remaining free storage across
    different portions of the nodeId space; this is done by using a
    different salt value when generating the fileId.
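
    How the two mechanisms fit together can be sketched roughly as
    follows. This is a simplification under stated assumptions, not the
    paper's algorithm: a node accepts a replica whenever it has enough
    free space (no acceptance thresholds), diversion is a single step
    (not recursive), the salt is a simple counter, and the names Node,
    insert_replicas, insert_file, leaf_size and max_salts are all
    hypothetical.

```python
import hashlib

RING = 1 << 128  # assumed circular 128-bit id space


def make_id(name: str, salt: int) -> int:
    """Derive an id from a name and a salt (counter salt is an assumption)."""
    digest = hashlib.sha1(f"{name}:{salt}".encode()).digest()
    return int.from_bytes(digest, "big") >> 32


def dist(a: int, b: int) -> int:
    d = abs(a - b)
    return min(d, RING - d)


class Node:
    def __init__(self, node_id: int, capacity: int):
        self.node_id = node_id
        self.free = capacity
        self.files = {}      # file_id -> size of the replica stored here
        self.pointers = {}   # file_id -> node holding our diverted replica

    def try_store(self, file_id: int, size: int) -> bool:
        if size > self.free:
            return False
        self.free -= size
        self.files[file_id] = size
        return True


def insert_replicas(nodes, file_id, size, k, leaf_size) -> bool:
    """Place k replicas near file_id, diverting inside the leaf set if needed."""
    ordered = sorted(nodes, key=lambda n: dist(n.node_id, file_id))
    primaries, leaf_set = ordered[:k], ordered[:leaf_size]
    placed = []  # (holder, primary that points to it, or None)
    for p in primaries:
        if p.try_store(file_id, size):
            placed.append((p, None))
            continue
        # Replica diversion: use the non-primary leaf-set node with the
        # most free space that does not already hold this file.
        candidates = [n for n in leaf_set
                      if n not in primaries and file_id not in n.files]
        target = max(candidates, key=lambda n: n.free, default=None)
        if target is not None and target.try_store(file_id, size):
            p.pointers[file_id] = target
            placed.append((target, p))
            continue
        # Diversion failed: undo the partial placement and report failure.
        for holder, owner in placed:
            holder.free += holder.files.pop(file_id)
            if owner is not None:
                del owner.pointers[file_id]
        return False
    return True


def insert_file(nodes, name, size, k=3, leaf_size=8, max_salts=4):
    """File diversion: retry with a fresh salt, i.e. a new fileId."""
    for salt in range(max_salts):
        file_id = make_id(name, salt)
        if insert_replicas(nodes, file_id, size, k, leaf_size):
            return file_id
    return None  # the insert is rejected after max_salts attempts


# Hypothetical 20-node network with uniform capacity.
nodes = [Node(make_id(f"node-{i}", 0), capacity=100) for i in range(20)]
fid = insert_file(nodes, "video.bin", size=60)
```

    The pointers kept by the primary nodes are exactly the state that a
    cleanup process would have to track once diverted replicas become
    outdated.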

    A cache management mechanism is implemented to reduce latency and
    maximize throughput. This is done by copying popular files to nodes
    close to the client clusters that request them.
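
    The paragraph above does not name a replacement policy. As a rough
    illustration of how a node might decide which cached copies to keep,
    here is a minimal sketch that assumes a GreedyDual-Size style policy
    with a fixed cost of 1 per file, so that small and recently used files
    are favoured. The class name, capacities and file names are
    hypothetical.

```python
class GreedyDualSizeCache:
    """Simplified GreedyDual-Size cache; cost per file is fixed at 1."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0
        self.inflation = 0.0   # weight of the most recently evicted entry
        self.entries = {}      # file_id -> (size, weight)

    def _weight(self, size: int) -> float:
        # Smaller files get a larger weight and are therefore kept longer.
        return self.inflation + 1.0 / size

    def lookup(self, file_id) -> bool:
        entry = self.entries.get(file_id)
        if entry is None:
            return False
        size, _ = entry
        # A hit refreshes the weight relative to the current inflation.
        self.entries[file_id] = (size, self._weight(size))
        return True

    def insert(self, file_id, size: int) -> bool:
        if size > self.capacity:
            return False                    # never cacheable on this node
        if file_id in self.entries:
            return self.lookup(file_id)
        while self.used + size > self.capacity:
            # Evict the entry with the smallest weight.
            victim = min(self.entries, key=lambda f: self.entries[f][1])
            victim_size, victim_weight = self.entries.pop(victim)
            self.used -= victim_size
            self.inflation = victim_weight
        self.entries[file_id] = (size, self._weight(size))
        self.used += size
        return True


cache = GreedyDualSizeCache(capacity=100)
cache.insert("big", 80)
cache.insert("small", 10)
cache.insert("medium", 30)   # forces an eviction; "big" has the lowest weight
```

    In this run the large file is evicted first, which frees the most
    space for the least loss in expected hits under the stated assumption.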

    PAST seems an attractive solution for maximizing storage utilization
    and availability. Considerable work on performance could still improve
    the system. It is, however, a limitation that the system is used solely
    for storage and cannot be used as a general-purpose file system.

     

     


