From: Muench, Joanna (jmuench_at_fhcrc.org)
Date: Mon Mar 08 2004 - 15:07:43 PST
Gummadi et al. (2003) discuss observations of peer-to-peer file-sharing
workload and present an alternative routing strategy to decrease bandwidth
use. The authors note the marked change in Internet traffic over the last
several years, as WWW traffic gives way to p2p sharing, especially on
college campuses.
Given the changing nature of Internet traffic, the paper analyzes the
distribution of size and types of requests by studying 203 days of Kazaa
requests on the UW network. The distribution of request sizes appears to be
bimodal: the majority of requests fall between 1 and 10 MB, and a smaller
peak lies above 100 MB. Most surprising is the patience of Kazaa users, with
download times easily exceeding a week.
The most important point in the entire paper is that Kazaa clients fetch
objects at most once. Between the existence of large local storage devices,
slow download times and immutability of the media, users have no incentive
to access the same file repeatedly. This results in a non-Zipf popularity
distribution, unlike most WWW traffic. The non-Zipf-ness of the distribution
is not at all surprising, considering that the Zipf distribution was
originally used to describe word use in language. Fortunately we re-use
words frequently; therefore the Zipf distribution represents a type of
sampling with replacement. However, the fetch-at-most-once dynamics of p2p
file access are closer to a sampling-without-replacement scheme, and
therefore could be expected to follow a different distribution. This part of the paper
would have benefited from collaboration with a statistician.
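To make the sampling argument concrete, here is a minimal sketch in Python. It is not the authors' model: the function names, the Zipf exponent of 1.0, and the population sizes are all assumptions chosen for illustration. Every client draws objects from the same Zipf-like popularity ranking, either with replacement (WWW-style) or without replacement per client (fetch-at-most-once).

```python
import random
from collections import Counter
from itertools import accumulate


def zipf_weights(n_objects, alpha=1.0):
    """Unnormalized Zipf weights: the object of rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n_objects + 1)]


def simulate(n_clients, n_objects, requests_per_client, fetch_once, seed=0):
    """Count requests per object (index 0 is the most popular object).

    Hypothetical toy model. Assumes requests_per_client <= n_objects,
    otherwise the fetch-at-most-once loop could never finish.
    """
    rng = random.Random(seed)
    objects = list(range(n_objects))
    cum_weights = list(accumulate(zipf_weights(n_objects)))
    counts = Counter()
    for _ in range(n_clients):
        fetched = set()  # objects this client already has
        made = 0
        while made < requests_per_client:
            obj = rng.choices(objects, cum_weights=cum_weights)[0]
            if fetch_once and obj in fetched:
                continue  # already downloaded: draw again (no replacement)
            fetched.add(obj)
            counts[obj] += 1
            made += 1
    return counts


if __name__ == "__main__":
    www = simulate(100, 1000, 50, fetch_once=False)
    p2p = simulate(100, 1000, 50, fetch_once=True)
    # Compare request counts for the five most popular objects.
    for rank in range(5):
        print(rank + 1, www[rank], p2p[rank])
```

Under fetch-at-most-once no object can receive more requests than there are clients, so the head of the curve is flattened while the tail still follows the underlying ranking. That is the qualitative departure from Zipf the review describes, although the toy parameters here say nothing about the magnitudes in the UW trace.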
The authors model the p2p file-sharing workloads based on their observations
from the UW Kazaa trace. Like any model, some input parameters require
estimation, such as the number of distinct objects. And like most modelers,
the authors make an estimate and then fail to discuss how varying that
estimate would alter the results. They do come to the interesting
observation that the system evolves towards one with no locality. However,
that observation rests on the assumption that client request rates remain
constant over time, so that aging clients request a growing number of
objects from the unpopular tail. This contradicts earlier observations in
Figure 3 that clients request fewer objects as they age. But the model does
provide some useful insights into the important influence of the rate at
which new objects become available.
The authors conclude the paper with a potential new request routing scheme
that exploits the locality of the file-sharing workload. Because caching of
files has potential legal ramifications, an organization-based,
locality-aware mechanism offers a potential means to significantly reduce
bandwidth use.
They also summarize the importance of increased availability for the most
popular servers.
This paper provides an excellent overview of the issues involved in p2p file
sharing. Unlike the designers of earlier distributed systems such as
Grapevine, the authors
have the benefit of being able to observe user activity before proposing a
system architecture. Their observations are contrary to commonly accepted
wisdom, but consistent with a fetch-at-most-once scenario.