Review of PAST

From: Muench, Joanna (jmuench_at_fhcrc.org)
Date: Wed Mar 03 2004 - 15:16:42 PST

  • Next message: Praveen Rao: "PAST review"

    Rowstron and Druschel (2001) present PAST, a persistent peer-to-peer storage
    utility implemented in Java and run on a Compaq AlphaServer. Their stated
    goal is to develop a scalable and self-organized storage system with strong
    persistence and a high degree of reliability. The designed system seems in
    general to meet those goals through the use of some innovative techniques,
    especially involving scalable load balancing.

    While inspired by applications such as Napster, PAST does not attempt to
    provide searchable storage. Once a file is stored in PAST, retrieving it
    requires the unique fileId generated at storage. The fileId provides more
    than just a unique identifier for the file. When looked up in Pastry, the
    routing
    substrate, the fileId identifies the nodes where the file is most likely to
    reside. PAST nodes are arranged in a circular namespace designed to avoid
    correlation between the nodeId and any external node properties such as
    geographic location, ownership, etc. Nodes track the status of other nodes
    in their neighborhood, updating as needed when nodes fail or are replaced.
    Routing between nodes is randomized to decrease vulnerability to malicious
    or failed nodes.
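
    As a rough illustration of how a fileId can point at storage nodes, the
    sketch below picks the k nodes whose nodeIds are closest to the fileId
    on a circular namespace. It assumes a global view of all nodeIds and a
    small 8-bit id space for readability; the function names and the value
    of k are hypothetical, and this is not Pastry's actual routing, which
    reaches such nodes hop by hop.

```python
def ring_distance(a, b, bits=128):
    """Shortest distance between two ids on a circular namespace of 2**bits ids."""
    size = 1 << bits
    d = abs(a - b) % size
    return min(d, size - d)


def replica_nodes(file_id, node_ids, k=3, bits=128):
    """Return the k nodeIds closest to file_id on the ring (ties broken by id).

    Assumption for illustration: the caller knows every nodeId. A real
    deployment would only know a small neighborhood of the ring.
    """
    ranked = sorted(node_ids, key=lambda n: (ring_distance(n, file_id, bits), n))
    return ranked[:k]


if __name__ == "__main__":
    nodes = [10, 80, 150, 200, 250]
    # With an 8-bit ring, id 250 is only 11 steps from id 5 (wrapping past 255).
    print(replica_nodes(5, nodes, k=2, bits=8))
```

    Because distance wraps around the ring, a node with a numerically large
    nodeId can still be a close neighbor of a small fileId.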

    The paper focuses largely on efficient storage policies that work well at a
    high level of utilization. PAST uses both replica diversion and file
    diversion to deal with the inevitable imbalances associated with a
    heterogeneous set of nodes and statistical variation in fileId assignments.
    The authors identify three important considerations for replica diversion:
    1) don't balance if utilization is low, 2) divert large files in preference
    to small ones and 3) move replicas from nodes with below-average free space
    to nodes with above-average free space. They
    combine these considerations into a single metric. Multiple replica declines
    will in turn spur a file diversion. Caching is also important to achieve
    adequate performance retrieving popular files.
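
    The review does not spell out the combined metric, so the following is
    only a sketch of one plausible form: a node accepts a replica when the
    ratio of file size to its remaining free space stays under a threshold.
    The function names, the two threshold values and the "pick the neighbor
    with the most free space" rule are assumptions made for illustration.

```python
def accepts_replica(file_size, free_space, threshold):
    """Accept when file_size / free_space does not exceed the threshold.

    Low utilization (lots of free space) makes the ratio small, so nothing
    is diverted; large files raise the ratio and are diverted first.
    """
    if free_space <= 0:
        return False
    return file_size / free_space <= threshold


def place_replica(file_size, primary_free, neighbor_free,
                  t_primary=0.1, t_diverted=0.05):
    """Return "primary", the name of a neighbor taking the replica, or None.

    neighbor_free maps hypothetical neighbor names to their free space.
    None means every node declined, which would count toward file diversion.
    """
    if accepts_replica(file_size, primary_free, t_primary):
        return "primary"
    # Diverted replicas face a stricter threshold (an assumed policy here).
    willing = {name: free for name, free in neighbor_free.items()
               if accepts_replica(file_size, free, t_diverted)}
    if not willing:
        return None
    # Prefer the neighbor with the most free space.
    return max(willing, key=willing.get)


if __name__ == "__main__":
    print(place_replica(10, 1000, {}))
    print(place_replica(200, 1000, {"a": 3000, "b": 8000}))
    print(place_replica(200, 1000, {"a": 3000}))
```

    A single ratio covers all three considerations at once: it stays small
    while nodes are mostly empty, grows with file size, and pushes replicas
    toward nodes with more free space.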

    The results section illustrates how well the system handles a high level of
    utilization. The authors do note a trade-off between the success rate and
    the level of utilization, but the rates are so high (above 90%) that
    criticism would seem nit-picking.

    The PAST system clearly fulfills its goals of scalability,
    self-organization, strong persistence and reliability. However, the system
    has some restrictions, especially the write-once nature of the files. This
    makes the system ideal for storing immutable objects (music and movies
    perhaps?) but not useful as a general file system.

    The largest surprise about this paper is how clearly written it
    is despite the lack of any obvious affiliation with the UW CS department.
    The organization is excellent, although (like this review) it suffers from
    extensive use of the passive voice.


