PAST Paper Review

From: Reid Wilkes (reidw_at_microsoft.com)
Date: Wed Mar 03 2004 - 11:03:27 PST

    The topic of peer-to-peer systems in research is a very interesting one
    to me - not so much because I'm interested in peer-to-peer systems
    themselves, but because I find it interesting that the research
    community has taken such a strong interest in an idea that began as a
    somewhat underground way to violate copyright laws. This was actually
    my first technical look at how a peer-to-peer system would be
    implemented, and I had little understanding of the problems such
    systems face going in. Although the paper never explicitly lays out the
    problems involved in designing such a system, many of them can be
    inferred from the description of how the system was designed.

    The authors chose to tackle a relatively confined set of issues in this
    paper - notably, PAST stores only immutable files. In addition, the
    main focus of the design seemed to be resiliency, both to technical
    failures and to political and social factors and natural disasters. It
    seemed to me that the majority of the authors' goals were achieved by
    virtue of their choice of Pastry as the routing mechanism. Pastry
    provides the underlying node identification scheme, which assigns
    numbers to nodes in a pseudo-random fashion irrespective of their
    location or ownership. Identifiers for files in PAST are also
    pseudo-random, and are created when a file is inserted into the system.
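
    To make that concrete, both kinds of identifier can be produced with a
    cryptographic hash. As I understand the paper, nodeIds come from
    something like a hash of the node's public key, and fileIds from a hash
    of the file name, the owner's public key, and a random salt chosen at
    insert time. The sketch below is my own illustration of that idea, not
    code from the paper, and the exact inputs should be read as
    assumptions.

        import hashlib
        import os

        def node_id(public_key: bytes) -> int:
            # 128-bit nodeId: a truncated SHA-1 of (for example) the node's
            # public key, giving a pseudo-random id unrelated to location.
            return int.from_bytes(hashlib.sha1(public_key).digest()[:16], "big")

        def file_id(name: str, owner_key: bytes, salt=None) -> int:
            # 160-bit fileId: SHA-1 of the file name, the owner's key, and a
            # random salt, computed once when the file is inserted.
            if salt is None:
                salt = os.urandom(8)
            digest = hashlib.sha1(name.encode() + owner_key + salt).digest()
            return int.from_bytes(digest, "big")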

    A portion of the bits in the fileId is used to place a number of
    replicas of the file across the nodes in the network. The bits taken
    from the fileId for this placement match the width of the node
    identifiers, and if k replicas need to be placed, the nodes chosen to
    store them are the k nodes whose nodeIds are numerically closest to the
    fileId. This relatively simple idea yields a very even, random
    distribution of file replicas across the network, precisely because of
    the pseudo-random assignment of nodeIds.
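
    Placement, in other words, is a nearest-match in the id space. Here is
    a minimal, global-view sketch of the selection; the real system reaches
    these nodes through Pastry routing and the destination node's leaf set
    rather than by sorting every nodeId, and the function name and the
    128/160-bit widths are my assumptions.

        def replica_nodes(file_id: int, node_ids: list[int], k: int) -> list[int]:
            # Truncate the 160-bit fileId to the 128-bit nodeId width, then
            # pick the k nodes numerically closest to that key.
            key = file_id >> 32
            return sorted(node_ids, key=lambda n: abs(n - key))[:k]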

    Much of the discussion in the paper focuses on techniques for using the
    storage capacity of the system more effectively. Without any work in
    this area, some nodes would fill up much more quickly than others
    because of variations in file sizes and in the capacities nodes
    contribute. The paper describes replica diversion, in which a node that
    cannot accept a replica forwards it to a neighbor in the id space that
    potentially has more room, keeping a pointer to the diverted copy.
    Another option is file diversion, where the file is given a new
    randomly generated fileId so that its replicas land in a different
    region of the id space. The experimental results show that these
    techniques substantially improve the utilization of the network's
    storage.
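
    To see how the two ideas fit together, here is a toy, global-view
    simulation of an insert - entirely my own sketch, with made-up names
    like Node.accept, a fixed spare-node window, and a simple retry count
    standing in for the policies the paper actually uses. A replica that
    does not fit on one of the k closest nodes is diverted to a nearby node
    with spare room; if the whole insert still fails, a fresh salt gives
    the file a new fileId and the insert retries elsewhere (file
    diversion).

        import hashlib
        import os
        from dataclasses import dataclass, field

        @dataclass
        class Node:
            node_id: int
            capacity: int                      # bytes this node contributes
            used: int = 0
            files: dict = field(default_factory=dict)

            def accept(self, fid: int, size: int) -> bool:
                # Store the replica locally if there is room for it.
                if self.used + size <= self.capacity:
                    self.files[fid] = size
                    self.used += size
                    return True
                return False

        def insert(nodes: list[Node], name: str, owner_key: bytes, size: int,
                   k: int = 4, retries: int = 3) -> int:
            for _ in range(retries):
                # File diversion on retry: a fresh salt yields a new
                # pseudo-random fileId in a different region of the id space.
                salt = os.urandom(8)
                fid = int.from_bytes(
                    hashlib.sha1(name.encode() + owner_key + salt).digest(), "big")
                key = fid >> 32                # truncate to the nodeId width
                by_distance = sorted(nodes, key=lambda n: abs(n.node_id - key))
                targets, spares = by_distance[:k], by_distance[k:2 * k]
                placed = 0
                for node in targets:
                    if node.accept(fid, size):
                        placed += 1
                    # Replica diversion: a full node hands the replica to a
                    # nearby node in the id space (and would keep a pointer).
                    elif any(fid not in s.files and s.accept(fid, size)
                             for s in spares):
                        placed += 1
                if placed == k:
                    return fid                 # all k replicas were stored
            raise RuntimeError("insert rejected even after file diversion")

    In the paper the decision to divert is governed by tunable thresholds
    rather than the fixed spare window and retry count used here; this is
    only meant to show the shape of the mechanism.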