PAST Review

From: Nathan Dire (ndire_at_cs.washington.edu)
Date: Wed Mar 03 2004 - 17:17:47 PST

  • Next message: Prasanna Kumar Jayapal: "Review of the PAST paper (by Rowstron & Druschel)"

          "Storage management and caching in PAST, a large-scale, persistent
                            peer-to-peer storage utility."
                            A. Rowstron, P. Druschel

    PAST is an Internet-based, peer-to-peer global storage utility. PAST software
    is installed on nodes which form a overlay network in the Internet. The nodes
    in the system provide an access point for users as well as contributing to the
    storage. Files stored in PAST are replicated and immutable. This paper
    focuses on the storage management and caching in PAST.

    PAST uses Pastry for routing. Each node maintains a routing table which maps
    nodeId's to IP addresses, a leaf set which consists of nearby nodeId's, and a
    neighborhood set which consists of nearby nodes according to the network
    proximity metric. Since nodeId's are quasi-random, there's no correlation
    between nodeId proximity and any other factor. Pastry also handles updating
    the tables as nodes are added and removed.

    The PAST system system supports three basic operations: addition of files,
    retrieval of files, and a reclaim operation. When a file is added, it is
    given a 160-bit 'fileId' which provides unique identification for the file. A
    replica of the file is stored at k diverse nodes in the network, where k can
    be specified by the user. The storage request must be accepted by k nodes in
    order to complete. On lookup, the file will be found if any of the k nodes on
    which it was stored is available. The reclaim operation is mostly important
    for user's quota accounting. To avoid agreement protocols, the reclaim
    operation does not guarantee that the file is no longer available.
     
    The responsibilities of PAST's storage management are to balance the free
    space among the nodes and to maintain the replication of files. These goals
    are accomplished using replica diversion and file diversion. Replica
    diversion entails storing a file in the leaf set of one of the k numerically
    closest nodes to the fileId as an alternative location. File diversion
    entails generating a new fileId so that the k replicas are stored in a new,
    unrelated leaf set.

    For caching, PAST nodes use up any local free space to store cached files.
    This has the advantage of efficiently using the disk space, though caching
    become ineffective as utilization increases. The cache replacement policy is
    GreedyDual-Size, original developed for Web proxies.

    The paper explains how replicas are maintained as nodes come and go, but
    I didn't exactly see how a replica is replaced when the node storing it
    goes away. I'm also curious about the configuration/bootstrapping of
    system. Although the paper is focused on storage management and
    caching, I thought the measurement could have included more about the
    perceived performance from individual clients when streaming a file.

    Overall, I think the ideas are clever, but I much prefered the CFS paper. I
    don't think this paper was presented well, and I like the block level
    approach better (party because that's what my company does :)).


  • Next message: Prasanna Kumar Jayapal: "Review of the PAST paper (by Rowstron & Druschel)"

    This archive was generated by hypermail 2.1.6 : Wed Mar 03 2004 - 17:17:49 PST