Review of "PAST"

From: Jeff Duzak (jduzak_at_exchange.microsoft.com)
Date: Tue Mar 02 2004 - 21:59:17 PST

  • Next message: Ian King: "Review: Rowstron & Druschel, Storage management and caching in PAST"

    This paper describes PAST, a peer-to-peer file storage system. The
    paper gives a brief overview of Pastry, a distributed routing scheme
    upon which PAST is built. However, most of the paper is devoted to the
    mechanisms that govern the placement of files, and to fine-tuning the
    parameters of these mechanisms. Finally, the paper describes the
    caching mechanism.
     
    The section describing the Pastry lookup scheme was very familiar, as it
    closely resembles the Chord routing scheme. The beginning of this
    section was a bit of a pain to read, as detailed formulas were given
    to describe a number of characteristics of the system. The description
    of the scheme itself was straightforward enough. Essentially, each node
    forwards a message to another node whose id shares a longer prefix of
    digits with the target id, until the message reaches the node whose id
    is numerically closest to the target id.
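
    To make the forwarding rule concrete, here is a minimal sketch of
    prefix routing in the spirit of Pastry. It is not the paper's code and
    it simplifies heavily: ids are equal-length hex strings, distance is
    linear rather than circular, and every node can see the full node list
    (real Pastry consults a per-node routing table plus a leaf set). The
    function names are hypothetical.

```python
from __future__ import annotations

# Simplified Pastry-style prefix routing (illustrative sketch only).
# Assumptions: equal-length hex ids, linear (not circular) distance,
# and full knowledge of all node ids instead of routing table + leaf set.


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading digits that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def numeric_distance(a: str, b: str) -> int:
    return abs(int(a, 16) - int(b, 16))


def next_hop(current: str, target: str, nodes: list[str]) -> str:
    """Prefer a node sharing a longer prefix with target; otherwise a node
    with an equally long prefix that is numerically closer; otherwise stay."""
    p = shared_prefix_len(current, target)
    longer = [n for n in nodes if shared_prefix_len(n, target) > p]
    if longer:
        return min(longer, key=lambda n: numeric_distance(n, target))
    closer = [
        n for n in nodes
        if shared_prefix_len(n, target) >= p
        and numeric_distance(n, target) < numeric_distance(current, target)
    ]
    if closer:
        return min(closer, key=lambda n: numeric_distance(n, target))
    return current


def route(start: str, target: str, nodes: list[str]) -> list[str]:
    """Return the sequence of node ids a message visits."""
    path = [start]
    while True:
        nxt = next_hop(path[-1], target, nodes)
        if nxt == path[-1]:
            break
        path.append(nxt)
    # Final delivery: a stand-in for the leaf-set check, which hands the
    # message to the node whose id is numerically closest to the target.
    closest = min(nodes, key=lambda n: numeric_distance(n, target))
    if closest != path[-1]:
        path.append(closest)
    return path
```

    For example, with nodes 10, 7e and 8f and target 80, the route is
    10, 8f, 7e: prefix matching first reaches 8f (shared digit "8"), and
    the final leaf-set-style step delivers to 7e, which is numerically
    closer to 80 even though it shares no prefix with it.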
     
    The paper then describes the file storage mechanisms of PAST. The
    system has mechanisms for diverting the storage of a single replica of a
    file to a nearby node, as well as for diverting the storage of all
    replicas of a file to an entirely new location. The replica diversion
    mechanism introduces some complexity, as it weakens the invariant that a
    file is stored on the k nodes whose ids are nearest to the file's id,
    and it complicates recovery from node failures.
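
    The two diversion mechanisms can be sketched as a single placement
    decision. The sketch assumes an acceptance test in which a node rejects
    a file whose size is too large a fraction of its remaining free space,
    with a looser threshold (t_pri) for the k closest nodes and a stricter
    one (t_div) for nodes taking a diverted replica. The threshold values,
    class and function names here are hypothetical, and the fileId helper
    is a simplification (an owner string stands in for a public key).

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
T_PRI = 0.1   # used by a node among the k closest to the fileId
T_DIV = 0.05  # used by a node asked to hold a diverted replica


@dataclass
class Node:
    node_id: str
    free: int  # free storage in bytes


def accepts(node: Node, file_size: int, threshold: float) -> bool:
    """Reject a file that would use too large a fraction of free space."""
    return node.free > 0 and file_size / node.free <= threshold


def place_replica(primary: Node, candidates: list[Node],
                  file_size: int) -> tuple[Node, bool] | None:
    """Return (storing node, diverted?) or None if no node will take it.

    candidates: nearby nodes that are NOT among the k closest to the fileId.
    """
    if accepts(primary, file_size, T_PRI):
        return primary, False
    # Replica diversion: offer the replica to the candidate with most free space.
    if candidates:
        best = max(candidates, key=lambda n: n.free)
        if accepts(best, file_size, T_DIV):
            return best, True  # the primary would keep a pointer to `best`
    return None  # caller falls back to file diversion


def file_id(name: str, owner: str, salt: int) -> str:
    """File diversion: a new salt yields a new fileId, hence new k nodes."""
    return hashlib.sha1(f"{name}|{owner}|{salt}".encode()).hexdigest()
```

    A None result is where file diversion would take over: the client
    retries the insert with a different salt, which moves all k replicas to
    a different region of the id space.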
     
    One possible reason that replica and file diversion are needed for load
    balancing is that files are stored whole. It seems more intuitive to
    break files into fixed-size blocks, as the CFS system does. Fixed-size
    blocks would yield a more uniform distribution of load among servers.
    Further, breaking large files into pieces allows the pieces to be
    fetched in parallel from multiple servers.
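
    A minimal sketch of the block-based alternative, assuming a CFS-like
    design in which each fixed-size block is keyed by the SHA-1 hash of its
    contents and stored on the successor of that key on a Chord-style ring.
    The block size and function names are hypothetical, and replication and
    caching are not modeled.

```python
from __future__ import annotations

import bisect
import hashlib

BLOCK_SIZE = 8192  # illustrative block size in bytes


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_key(block: bytes) -> int:
    """Content hash of a block, used as its position on the ring."""
    return int.from_bytes(hashlib.sha1(block).digest(), "big")


def successor(ring: list[int], key: int) -> int:
    """First node id >= key on a sorted ring, wrapping to the smallest id."""
    i = bisect.bisect_left(ring, key)
    return ring[i % len(ring)]


def place_file(data: bytes, node_ids: list[int]) -> dict[int, int]:
    """Return the number of bytes each node stores for this file."""
    ring = sorted(node_ids)
    load = {n: 0 for n in ring}
    for block in split_blocks(data):
        load[successor(ring, block_key(block))] += len(block)
    return load
```

    Because a large file becomes many small, independently hashed blocks,
    its bytes are spread over many nodes instead of all landing on the k
    nodes nearest a single fileId, which is the load-balancing effect the
    paragraph above argues for.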
     
    The section describing experimental results first describes differences
    in performance (in terms of file storage success rate and system
    utilization) that occur when configuration parameters affecting replica
    and file diversion are varied. Optimal parameters are chosen from this
    analysis. Further analysis shows the performance of the system near
    full utilization. The results seem fine, but unfortunately, without any
    other system with which to compare it, it is difficult to know how good
    the results are. Further, some of the reported testing details were not
    very interesting, such as the fact that, in the filesystem workload, 679
    files are larger than the storage capacity of the smallest server.
    Results for those files are not shown anyway.



    This archive was generated by hypermail 2.1.6 : Tue Mar 02 2004 - 21:59:23 PST