Review of Rowstron and Druschel, "Storage management and caching in PAST"

From: Cliff Schmidt (cliff_at_bea.com)
Date: Wed Mar 03 2004 - 14:59:02 PST

  • Next message: David Coleman: "PAST review"

    The PAST system is a P2P, self-organizing storage utility, which focuses
    on providing high availability, scalability, and security. The basic
    operations are inserting, retrieving, and deleting files.

    The system is designed to allow some number, k, replications of each
    file. Unlike CFS, each file is stored as one unit on any particular
    server. Each file is given a 160-bit id upon insertion that is a hash
    of the owner's public key, the file name, and a randomly chosen salt.
    Each node in the system is given a 128-bit id upon joining that is a
    hash of the node's public key. The interesting thing about these two
    ids is that the most significant 128 bits of the file id is used to
    determine the numerical proximity to a node. The closest k node ids
    are where the k replicas of the file can be found. Since these ids
    are not based on location, network connectivity, ownership, or
    jurisdiction, the nodes across the replication should be sufficiently
    independent.

    Pastry is the routing substrate for PAST. It can route to the
    numerically closest node in less than log (base 2^b) N steps, where N
    is the total number of nodes and b is a configuration parameter. It
    enables this by having each node keep track of a routing table plus
    a leaf set and neighborhood set. The routing table points to the
    other nodes that share the first set of digits (from only the first
    digit through all but the last digit). The leaf set points to the
    numerically closest l nodes, where l is a configured parameter that
    allows for replication diversion, which is a technique used to improve
    load balancing. The neighborhood set points to the closest nodes
    based on network proximity; this is useful for node addition and
    recovery. One of the interesting things about Pastry is that it
    dynamically maintains all this info pretty efficiently. When a
    new node enters the system, it contacts a nearby node and hands it
    its id. That node sends out a message in search of the closest
    node numerically to the id and returns the leaf set and a row
    of the routing table from each node it was routed through.

    There's a lot more to PAST that is too much to summarize here,
    so I'll conclude with a few thoughts:

    - this is not as complete of a file system solution as CFS.
    - it could have been more clear about pointing out what it was
    not designed to do, which was something I thought CFS did a
    very good job with (for instance, noting the lack of a search
    feature).
    - while it appears to be easy to adjust N, it was never
    described what would be involved if one wanted to configure
    l or b. I suspect this could not be done without a very
    inefficient disruption of the system.
    - the enforcement of the quota system could have been explained
    a little more. Can any node issue a file certificate? I would
    assume so for a true P2P system, but if so, could a bad node
    issue an unlimited quota for another bad node?
    - there was a distinct decision to store entire files on a
    server, rather than follow a Reed-Solomon encoding, which would
    allow the same recovery ability with less storage. This was
    given up in favor of access time. CFS chose the other
    direction.

    changing b and l
    no search


  • Next message: David Coleman: "PAST review"

    This archive was generated by hypermail 2.1.6 : Wed Mar 03 2004 - 14:59:03 PST