From: Ian King (iking_at_killthewabbit.org)
Date: Sun Mar 07 2004 - 00:17:42 PST
The authors describe the results of empirical analysis of Kazaa peer-to-peer
file sharing activity as measured at the network boundary between a major
university and the Internet; they develop and evaluate a model to describe the
observed activity, and ultimately demonstrate a departure from commonly held
beliefs regarding the nature of the studied traffic. In particular, this paper
shows that unlike webpage traffic, multimedia file traffic over peer-to-peer
file sharing networks does not follow a Zipf distribution.
It was quite a surprise to learn that P2P traffic is such a substantial
percentage of overall Internet traffic. The authors describe the nature of the
files that generate the traffic: multimedia files, with audio files typically
between a fraction of a megabyte and a few megabytes in size, and video files
usually in the tens to hundreds of megabytes. One of the observations is that
Kazaa users are very patient folk; these transfers can often require days to
complete. The authors contrast this with the "instant gratification" observed
in webpage consumption - if a page does not render relatively quickly, users
often abandon the attempt.
Some have claimed that multimedia file transfer follows the same pattern as
webpage traffic, namely a Zipf distribution; in the webpage scenario, a few
popular pages draw most of the accesses, and access frequency falls off roughly
in inverse proportion to popularity rank. The authors demonstrate that the
different nature of access -
'each client accesses once', rather than 'each client accesses many times' -
translates into a curve flatter than a Zipf distribution for the most popular
files. This is intuitively obvious: popularity wanes as "everyone has one" of
files that do not change over time (Star Wars is always Star Wars). This has
implications for caching behavior: while popular web pages remain popular
because of new content at the same address, a given set of popular multimedia
files will quickly be replaced by a new set. If that replacement proceeds at a
sufficient pace, caching can still contribute to efficiency, as newly popular
files displace previously popular ones. The arrival of new clients can also
maintain the relevance of a given set of files, but that effect is far less
prominent, and the required client growth would quickly exhaust network
bandwidth.
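
The flattening of the curve can be illustrated with a small simulation. This is
only a rough sketch and not the authors' model or code: the function names, the
Zipf exponent of 1.0, the client and file counts, and the choice to simply skip
a request for a file the client already holds are all assumptions made for
illustration. Every client draws files from the same Zipf-like popularity
distribution; the only difference between the two runs is whether a repeat
request by the same client counts as another transfer.

```python
import random
from itertools import accumulate


def zipf_weights(num_files, alpha=1.0):
    """Unnormalized Zipf-like weights: weight of rank r is 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, num_files + 1)]


def simulate(num_clients, num_files, requests_per_client, at_most_once, seed=0):
    """Return per-file transfer counts, sorted from most to least transferred.

    at_most_once=False: every request is a transfer (web-like behavior).
    at_most_once=True: a client transfers a given file only the first time
    it asks for it; later requests for that file are skipped (assumption).
    """
    rng = random.Random(seed)
    cum_weights = list(accumulate(zipf_weights(num_files)))
    files = range(num_files)
    counts = [0] * num_files
    for _ in range(num_clients):
        owned = set()  # files this client has already fetched
        for _ in range(requests_per_client):
            f = rng.choices(files, cum_weights=cum_weights, k=1)[0]
            if at_most_once:
                if f in owned:
                    continue  # already have it, so nothing crosses the network
                owned.add(f)
            counts[f] += 1
    return sorted(counts, reverse=True)


if __name__ == "__main__":
    repeat = simulate(200, 1000, 50, at_most_once=False)
    once = simulate(200, 1000, 50, at_most_once=True)
    print("fetch-repeatedly, top 5 counts:  ", repeat[:5])
    print("fetch-at-most-once, top 5 counts:", once[:5])
```

In the at-most-once run no file can be transferred more often than there are
clients, so the head of the ranked curve is clipped while the tail is largely
unaffected; that clipping is the "flatter than Zipf" shape described above.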
One motivation for this study is the cost of Internet connectivity at the
boundary between an institution and the general network; with P2P traffic
becoming a major portion of traffic, and billing often being based on usage,
some institutions have taken steps to limit or curb use of P2P protocols. The
authors propose that perhaps P2P sharing protocols should be more intelligent,
and allow clients to discover local clients that have already downloaded the
file they seek; by adding locality to the protocol, the "boundary crossing" is
minimized. Given the modeled behavior of quickly waning popularity, as
supported by the empirical traffic, this strategy seems quite appropriate, and
can be implemented either through proxy caching policy or changes to the P2P
protocol. However, the legal challenges to some applications of P2P file
sharing will cause many institutions to demur, lest they be accused of aiding in
unlawful activities. Perhaps Shakespeare was right....