Review: A. Rowstron and P. Druschel. Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility.

From: Richard Jackson (richja_at_expedia.com)
Date: Wed Mar 03 2004 - 12:57:28 PST

    This 2001 SOSP paper by Rowstron and Druschel describes PAST, a
    peer-to-peer (P2P) overlay network that reliably and persistently
    stores files across a network of unreliable nodes. The paper focuses
    on storage management and caching, but it also gives general
    background on the system and its routing layer, Pastry.
     
    The paper is organized into these sections: 1) overview of PAST, 2)
    Pastry routing mechanism, 3) storage management, 4) caching, 5)
    experimental results.
     
    The first section gives an overview of PAST. Simply put, it is a P2P
    network of cooperating nodes that stores files in a reliable,
    persistent way, even though the underlying nodes may enter and leave
    the network at any time. The paper describes this as an "overlay
    network", meaning that it is built on top of existing Internet
    infrastructure and protocols. A client of PAST deals with only a
    single node and can issue three operations: Insert, Lookup, and
    Reclaim. Insert adds a file to the PAST network. Lookup finds a file
    on the PAST network and returns it to the client. Reclaim asks PAST
    to free the storage held by a file's replicas; it is deliberately
    weaker than a delete and makes no guarantee that the file becomes
    unreachable. All operations are performed on whole files, unlike
    CFS, which deals with file blocks. Each file is identified by a
    unique fileId, computed as a SHA-1 hash of the file name, the
    owner's public key, and a client-supplied random salt.
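     
    To make the client interface concrete, here is a small sketch of how
    a fileId might be derived and how the three operations could be
    wrapped. The SHA-1 inputs (file name, owner's public key, random
    salt) come from the paper; the class and method names, and the
    node.route() call, are my own invention for illustration.
     
        import hashlib
        import os

        def make_file_id(file_name: str, owner_public_key: bytes) -> bytes:
            """fileId = SHA-1 over the file name, the owner's public key,
            and a freshly chosen random salt (160 bits total)."""
            salt = os.urandom(20)              # client-supplied randomness
            h = hashlib.sha1()
            h.update(file_name.encode())
            h.update(owner_public_key)
            h.update(salt)
            return h.digest()

        class PastClient:
            """Hypothetical wrapper around the single PAST node a client talks to."""
            def __init__(self, node):
                self.node = node

            def insert(self, file_name, owner_key, payload, k):
                file_id = make_file_id(file_name, owner_key)
                self.node.route("insert", file_id, payload, replicas=k)
                return file_id

            def lookup(self, file_id):
                return self.node.route("lookup", file_id)

            def reclaim(self, file_id, owner_key):
                # best effort: storage may be reused, but PAST does not
                # promise the file becomes unreachable
                self.node.route("reclaim", file_id, owner_key)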
     
    Pastry is the routing infrastructure that PAST uses to locate and
    insert files. The idea is that each node and each file has a
    quasi-random identifier (a nodeId or fileId) derived from a secure
    hash, so the identifiers are assumed to be uniformly distributed.
    Using the 128 most significant bits of a fileId as the routing key,
    Pastry forwards a request hop by hop to a known node whose nodeId
    shares a longer prefix with the key, or failing that is numerically
    closer to it, until it reaches the node whose nodeId is numerically
    closest. To support this, each Pastry node maintains a fairly
    complex set of routing state: 1) a routing table - multiple rows of
    node entries, organized by the length of the shared id prefix, 2) a
    leaf set - the nodes whose nodeIds are numerically closest to this
    node's, 3) a neighborhood set - the nodes that are closest according
    to some proximity metric such as network latency. The first two are
    used for routing; the third is used when nodes join or leave. The
    concept of a proximity metric seemed weak to me.
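     
    As a rough illustration of the per-hop routing decision (my own
    simplification, not the paper's code; real Pastry indexes its
    routing table by base-2^b digits rather than individual bits), the
    sketch below picks the next hop for a 128-bit key:
     
        KEY_BITS = 128

        def shared_prefix_len(a: int, b: int, bits: int = KEY_BITS) -> int:
            """Number of leading bits two identifiers have in common."""
            for i in range(bits):
                shift = bits - 1 - i
                if (a >> shift) & 1 != (b >> shift) & 1:
                    return i
            return bits

        def next_hop(my_id: int, key: int, routing_table: list, leaf_set: list) -> int:
            # 1. If the key falls within the leaf set's range, deliver to the
            #    numerically closest node (possibly ourselves).
            if leaf_set and min(leaf_set) <= key <= max(leaf_set):
                return min(leaf_set + [my_id], key=lambda n: abs(n - key))
            # 2. Otherwise prefer a routing-table entry whose id shares a
            #    strictly longer prefix with the key than ours does.
            p = shared_prefix_len(my_id, key)
            longer = [n for n in routing_table if shared_prefix_len(n, key) > p]
            if longer:
                return min(longer, key=lambda n: abs(n - key))
            # 3. Fallback: any known node that is numerically closer to the key.
            closer = [n for n in routing_table + leaf_set
                      if abs(n - key) < abs(my_id - key)]
            return min(closer, key=lambda n: abs(n - key)) if closer else my_id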
     
    Next, storage management is described. The main design goal in this
    area is to balance storage across nodes as evenly as possible, so
    that a high fraction of the total capacity can actually be used. To
    do this, the authors introduce two mechanisms: 1) replica diversion,
    2) file diversion. Replica diversion means that when one of the
    responsible nodes cannot store a replica of a file, the replica is
    stored instead on another node within that node's leaf set, and the
    responsible node keeps a pointer to the diverted copy. This balances
    storage fairly well within a leaf set. When an entire leaf set is
    too full, file diversion is used: the client chooses a new salt,
    which yields a new fileId, and retries the insert in the hope of
    hitting a destination node (and leaf set) that can accommodate the
    file. The process becomes somewhat trial-and-error once most nodes
    are full. Another key requirement is that all k nodes whose nodeIds
    are closest to the fileId must hold a replica, either directly or
    via diversion, or the insert fails. This replication factor k is the
    invariant PAST maintains so that files survive node failures.
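     
    To show how the two mechanisms compose, here is a toy sketch of the
    insert path. The Node class, its fields, and the retry bound are
    invented for illustration (the real system applies acceptance
    policies this sketch omits, and would also roll back partially
    placed replicas before retrying):
     
        import hashlib, os

        class Node:
            """Toy PAST node: an id, a free-space budget, stored files, pointers."""
            def __init__(self, node_id, capacity):
                self.id, self.free = node_id, capacity
                self.files, self.pointers = {}, {}
                self.leaf_set = []        # filled in by the (omitted) overlay setup

            def try_store(self, file_id, size):
                if self.free >= size:
                    self.files[file_id] = size
                    self.free -= size
                    return True
                return False

        def k_closest(nodes, file_id, k):
            return sorted(nodes, key=lambda n: abs(n.id - file_id))[:k]

        def insert(nodes, name, owner_key, size, k, max_diversions=4):
            """Insert with replica diversion, falling back to file diversion."""
            for _ in range(max_diversions):
                salt = os.urandom(20)
                file_id = int.from_bytes(
                    hashlib.sha1(name.encode() + owner_key + salt).digest(), "big")
                replicas = 0
                for node in k_closest(nodes, file_id, k):
                    if node.try_store(file_id, size):
                        replicas += 1
                        continue
                    # replica diversion: a leaf-set neighbour holds the copy,
                    # the responsible node only keeps a pointer to it
                    donor = next((m for m in node.leaf_set
                                  if m.try_store(file_id, size)), None)
                    if donor is not None:
                        node.pointers[file_id] = donor
                        replicas += 1
                if replicas == k:
                    return file_id        # all k replicas placed; invariant holds
                # file diversion: retry under a fresh fileId (new salt)
            raise RuntimeError("insert failed after repeated file diversions")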
     
    In the next section, caching is briefly discussed. The main idea is
    that when a node routes a file during an insert or a successful
    lookup, it may also cache a copy of that file locally in its unused
    storage. Future queries can then be satisfied from the cache instead
    of being forwarded on to the nodes that actually store the file. Two
    cache replacement policies are discussed: 1) GreedyDual-Size (GD-S),
    2) Least Recently Used (LRU). The results show that GD-S performs
    slightly better, but the algorithm is noticeably more complex.
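     
    For contrast with LRU, here is a compact sketch of the
    GreedyDual-Size idea (my own rendering, not the paper's code): each
    cached file carries a value H = L + cost/size, where L is a global
    inflation floor; the file with the smallest H is evicted and L is
    raised to that value, so small and recently referenced files tend to
    survive. With the cost fixed at 1, as below, large files are the
    preferred eviction victims.
     
        class GreedyDualSizeCache:
            """Minimal GreedyDual-Size cache keyed by file id."""
            def __init__(self, capacity):
                self.capacity = capacity
                self.used = 0
                self.L = 0.0              # inflation floor
                self.entries = {}         # file_id -> (size, H)

            def _touch(self, file_id, size, cost=1.0):
                # on insert or hit, (re)set the entry's value above the floor
                self.entries[file_id] = (size, self.L + cost / size)

            def get(self, file_id):
                if file_id in self.entries:
                    size, _ = self.entries[file_id]
                    self._touch(file_id, size)
                    return True
                return False

            def put(self, file_id, size):
                if file_id in self.entries or size > self.capacity:
                    return                # already cached, or too big to cache
                while self.used + size > self.capacity:
                    victim = min(self.entries, key=lambda f: self.entries[f][1])
                    vsize, v_h = self.entries.pop(victim)
                    self.used -= vsize
                    self.L = v_h          # raise the floor to the evicted value
                self._touch(file_id, size)
                self.used += size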
     
    The last section gives experimental results. Each portion of the
    system was tested to demonstrate its performance under load. A
    simulator was used to model 2250 PAST nodes, and two separate, large
    datasets were used as input workloads. The tests focused on
    replication, caching, and file insertion failures. Overall, each
    case performed as expected and the results were meaningful and
    intuitive. They show that a well-tuned PAST system can achieve
    global storage utilization of approximately 98% while failing only
    about 5% of inserts.
     
    Overall, this system seems to be well designed, and the authors have
    thought through many difficult cases to make it as effective as
    possible. However, the system is quite complicated with respect to
    its utilization policies and the required diversion operations. I
    wonder if the system could be simplified while still retaining its
    strong utilization properties. Also, their assumption that node
    storage capacities differ by no more than two orders of magnitude
    seemed naive. I don't think their system needed this assumption, and
    they should not have relied on it.
     

