From: Reid Wilkes (reidw_at_microsoft.com)
Date: Wed Mar 03 2004 - 11:03:27 PST
The topic of peer-to-peer systems in research is a very interesting one
to me - not so much because I'm interested in peer-to-peer systems
themselves, but because I find it interesting that the research community
has ended up taking great interest in an idea that began as a
somewhat underground way to violate copyright laws. This was actually my
first technical look at how a peer-to-peer system would be implemented,
and I had no understanding of the problems such systems face going
into it. Although the paper never explicitly laid out the problems
involved in designing such a system, many of these issues could be
inferred from the description of how the system was designed.
The authors chose to tackle a relatively confined set of issues with this
paper - notably, the system stores only immutable files. In addition, the
main focus of the design seemed to be resiliency, to technical failures
but also to political and social pressures and to natural disasters.

It seemed to me that the majority of the authors' goals in the paper were
achieved by virtue of their choice of Pastry as the routing mechanism.
Pastry provides the underlying node identification scheme, which assigns
numbers to nodes in a pseudo-random fashion irrespective of their location
or ownership. Identifiers for files in PAST are also pseudo-random,
generated when a file is inserted into the system. A portion of the bits
of the file ID - the same number of bits as a node identifier - is used to
place the file's replicas across the network: if k replicas need to be
placed, the nodes chosen to store them are the k nodes whose node IDs are
numerically closest to the file ID. This relatively simple idea
distributes file replicas across the network essentially at random,
precisely because node IDs themselves are assigned pseudo-randomly.
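To make the placement rule concrete, here is a minimal Python sketch of my
own (not code from the paper, and the names are made up): a file ID is
drawn pseudo-randomly by hashing the file name with a random salt, and the
k replicas go to the nodes whose IDs are numerically closest to it. I'm
simplifying - the paper's fileId computation also covers the owner's
public key, and the real identifier space is circular, which this plain
absolute-distance comparison ignores.

    import hashlib
    import os
    import random

    ID_BITS = 128  # Pastry-style node ID width; file IDs are compared at this width

    def make_file_id(filename: str) -> int:
        # Pseudo-random file ID: hash the file name plus a random salt and
        # truncate to the node ID width. (Simplified; see caveats above.)
        digest = hashlib.sha1(filename.encode() + os.urandom(8)).digest()
        return int.from_bytes(digest, "big") >> (160 - ID_BITS)

    def replica_nodes(file_id: int, node_ids: list[int], k: int) -> list[int]:
        # The k nodes whose IDs are numerically closest to the file ID hold
        # the replicas. Plain absolute distance, ignoring wrap-around.
        return sorted(node_ids, key=lambda n: abs(n - file_id))[:k]

    # Example: 100 nodes with pseudo-random IDs, 4 replicas of one file.
    node_ids = [random.getrandbits(ID_BITS) for _ in range(100)]
    print(replica_nodes(make_file_id("report.pdf"), node_ids, k=4))

Because node IDs and file IDs are uniform over the same space, each
insertion effectively picks k unrelated machines, which is where the
resiliency comes from: the replicas land on nodes with no correlation in
location or ownership.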
Much of the discussion in the paper focused on techniques for using the
system's storage capacity more effectively. Without any work in this
area, some nodes would fill up much more quickly than others because of
variations both in file sizes and in node capacities. The paper describes
replica diversion, in which a node that lacks space forwards a replica to
a neighbor in the address space that has more room, keeping a pointer to
the diverted copy. Another option is file diversion, where the file is
assigned a new randomly generated file ID so that it is stored in a
different area of the address space altogether. Experimental results show
that these techniques greatly improve the achievable utilization of the
network's storage capacity.
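As a rough sketch of how the insert path might combine the two techniques
(reusing make_file_id and replica_nodes from the sketch above, with a
plain dict from node ID to free capacity standing in for real nodes; the
names and retry policy are my own, and I'm glossing over the pointers and
acceptance heuristics the paper actually specifies):

    def insert(file_id: int, size: int, capacity: dict[int, int], k: int,
               max_retries: int = 3):
        # capacity: node ID -> free space remaining (toy model of the network).
        for _ in range(max_retries):
            targets = replica_nodes(file_id, list(capacity), k)
            placed = []
            for nid in targets:
                if capacity[nid] >= size:
                    placed.append(nid)
                    continue
                # Replica diversion: a full node hands its replica to a nearby
                # node in the ID space (outside the k closest) that has room.
                alt = next((m for m in sorted(capacity, key=lambda n: abs(n - nid))
                            if m not in targets and m not in placed
                            and capacity[m] >= size), None)
                if alt is not None:
                    placed.append(alt)
            if len(placed) == k:
                for nid in placed:
                    capacity[nid] -= size
                return file_id, placed
            # File diversion: re-salt for a fresh file ID so the whole file
            # lands in a different, hopefully less loaded, part of the space.
            file_id = make_file_id(f"diverted-{file_id}")
        raise RuntimeError("insert failed even after file diversion")

Even in this toy version the division of labor is visible: replica
diversion smooths out local imbalances one replica at a time, while file
diversion is the heavier fallback that relocates the entire file.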