PAST Review

From: Honghai Liu (liu789_at_hotmail.com)
Date: Tue Mar 02 2004 - 20:06:02 PST

  • Next message: Jeff Duzak: "Review of "PAST""

    Reviewer by: Honghai Liu

    The paper presents storage management and caching in PAST,
    a peer-to-peer storage system which achieve scalability, strong
    persistency high availability.

    A peer to peer system is a distributed system in which all nodes
    have identical capabilities and responsibilities and all communication
    are symmetric.

    The PAST uses an efficient routing scheme called Pastry to ensure
    that client requests are reliably routed to the appropriate nodes.
    Client requests to retrieve a file are routed with high probability to
    a node that is close in the network. The algorithm has the property
    that the nodes of PAST traversed and the number of message are
    logarithmic in the total number of PAST nodes. Therefore Pastry is
    the key to achieve scalability for PAST.

    There are 3 basic operations in PAST: insertion, lookup and reclaim.
     Each PAST node is assigned a 128-bit node identifier called a nodeId.
     File stored in PAST is associated with fileId, generated at the time of
    file insertion and files stored in PAST is immutable. NodeId and fileId
    are uniformly distributed and their msb of the fields are associated to
     determine the candidate locations of k PAST nodes to insert a file.

    PAST's storage management tries to achieve high storage utilization
    and graceful degradation when system reaches the maximum utilization.
    The goal of balancing free storage space among nodes at maximum
    utilization and maintain locality of filedId seem to be conflicting, however,
    replica diversion and file diversion are used to balance them. The replica
    diversion is to balance the remaining storage space among the nodes and
    file diversion is to balance the reaming fee storage space among different
     portions of the nodeId.

    It is interesting to know that storing k complete copies of a file is not the
    most efficient method to achieve high availability. Using Reed-Solomon
     encoding, add m additional checksum block to n original data blocks
    allows recovery from up to m losses of data or checksum blocks.
    Nevertheless, the algorithm is not adopted by PAST due to high cost
    of overhead.

    Caching is used in PAST to achieve minimal client access latencies, high
    throughput and balanced query load. The cache replacement policy used
     in PAST is based on GD-S policy, where is originally developed for
     caching web proxies.

    In summary, PAST is an interesting pure P2P system without any centralized
     data storage and management model, which has the potential to meet many Internet


  • Next message: Jeff Duzak: "Review of "PAST""

    This archive was generated by hypermail 2.1.6 : Tue Mar 02 2004 - 20:06:04 PST