Review: Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility.

From: Sellakumaran Kanagarathnam (sellak_at_windows.microsoft.com)
Date: Wed Mar 03 2004 - 13:51:46 PST

  • Next message: Cliff Schmidt: "Review of Rowstron and Druschel, "Storage management and caching in PAST""

    The paper discusses storage management and caching in PAST. PAST is
    primarily a storage utility that uses peer-to-peer technology to achieve
    strong persistence, high availability, scalability and security.
    Availability is achieved by replicating files across multiple nodes.
    Scalability is achieved by using P2P and Pastry. Security is achieved by
    using smartcards (on each node and each user). This paper focuses
    primarily on storage management and caching.

    Every node provides an access point for users and optionally it serves
    as storage. PAST system exports a set of operations to its clients:
    Insert, Lookup and Reclaim. In response to an insert request, a fileID
    is computed as the hash code of the file's name, the client's public key
    and a random salt. The insert operation also specifies the number of
    places the file has to be stored. PAST also has a quota system where
    each client is given a storage quota. The responsibilities of storage
    management are to 1) balance remaining free storage among nodes in PAST
    and 2) to ensure that the copies of a file are maintained by k nodes
    closest to the fileID. PAST achieves this by two techniques: replica
    diversion and file diversion. The purpose of replica diversion is to
    balance the remaining free storage space among the nodes in a leaf set.
    If a node A, on receipt of insert request message, cannot accommodate a
    copy locally, it chooses a node B from its leaf set (B should not be in
    k closest and it should not hold a replica of this file already) and
    asks B to store the file on behalf of A. In this case there are two
    failures that have to be take care of: a) failure of B b) failure of A
    itself. a) Will be detected by the process of exchanging keep alive
    messages. In that case, certain adjustments happen and another node may
    become one of k closest nodes for certain files; to maintain the storage
    invariant, replicas will have to be created on the new node. b) Is
    handled by adding a pointer to B in the k+1 closest node of the file.
    The next technique of file diversion is to balance the remaining free
    storage space among different portions of the nodeID space in PAST.
    When file insertion fails because k nodes could not hold accommodate the
    file (even with replica diversion), a failure notice is sent to the
    client. The client then retries with a different fileId. If three
    retries fail, then the client can choose the decrease k and retry.

    Goals of cache management is to minimize client access latencies (fetch
    distance), to maximize the query throughput and to balance the query
    load in the system. PAST nodes use the unused portion of their disk
    space to cache files. A file that is routed thru a node as part of
    lookup or insert operation is cached if its size is less than a fraction
    c of the node's current cache size. The cache replacement policy is
    based on GD-S policy. Experiments done in an emulated network
    environment show that PAST is able to achieve global storage utilization
    in the high 90s. Overall, I think that the paper is well organized. It
    chooses two aspects and explains them clearly. The authors have given
    just enough details to convey the approach on policies and their
    implementation for these two aspects.


  • Next message: Cliff Schmidt: "Review of Rowstron and Druschel, "Storage management and caching in PAST""

    This archive was generated by hypermail 2.1.6 : Wed Mar 03 2004 - 13:51:51 PST