From: David Coleman (dcoleman_at_cs.washington.edu)
Date: Sun Mar 07 2004 - 19:41:56 PST
Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing
Workload was very different from the other papers assigned this quarter.
This paper describes and breaks down the Internet traffic generated by
the KaZaa file-sharing system on the UW campus.
This paper answered a basic question I had about Zipf workload
distributions. I had heard of Zipf curves in earlier papers but had
never seen a definition and had not taken the time to look it up
(something about being too busy reading papers).
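For my own notes, the definition turns out to be simple: in a Zipf
(power-law) popularity distribution, the i-th most popular object is
requested with probability proportional to 1/i^alpha, with alpha around
1.0 for classic web workloads. A tiny sketch of my own (the catalog size
and alpha are illustrative values, not numbers from the paper) shows how
heavily such a distribution skews toward the most popular objects:

    # My own illustration of a Zipf popularity curve, not code from the paper.
    # The i-th most popular of n_objects is requested with probability
    # proportional to 1 / i**alpha; alpha ~ 1.0 is the classic web value.
    def zipf_probabilities(n_objects, alpha=1.0):
        weights = [1.0 / (i ** alpha) for i in range(1, n_objects + 1)]
        total = sum(weights)
        return [w / total for w in weights]

    probs = zipf_probabilities(10000)
    # The skew is the whole point: a handful of objects dominate requests.
    print("top 10 objects get %.1f%% of requests" % (100 * sum(probs[:10])))
    print("top 100 objects get %.1f%% of requests" % (100 * sum(probs[:100])))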
There were three things in this paper that surprised me. First, the
amount of bandwidth consumed by peer-to-peer file-sharing services. I
suspect that the University's demographics play a significant part in
skewing bandwidth usage toward file-sharing, but I still would never
have guessed that it would essentially overwhelm conventional network
traffic. Second, the amount of time users are willing to wait. I would
not have guessed that they would wait more than a few hours beyond the
average time needed to download a file. Finally, the quantity of video
files being transferred really surprised me. I expected the volume of
audio files to be as large as it was, but not the video.
I take minor issue with the definition of web pages as mutable objects
and multimedia files as immutable. This is not a significant detail for
me, but it strikes me that a web page that is updated nearly constantly
could just as easily be considered a stream of immutable objects, and
thus a series of fetch-at-most-once objects. If the cnn.com home page
never updated, it would be an immutable object, but it would not be
fetched much once it went out of date. The fact that most sites let the
user search through archives of older versions of the same page gives a
little more credence to the idea of a stream of immutable objects.
I suspect that the size of the University population significantly
increases the effectiveness of caching in a proxy server or of
redirecting requests internally. Although a smaller client population
would generate fewer requests overall, I suspect that a much larger
percentage of those requests could not be fulfilled locally. Of course,
the smaller client population might also generate fewer requests for
older, less-popular objects, which might pull the local hit rate back
toward the results in the paper. A toy simulation of this intuition
follows.
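This is only a back-of-the-envelope sketch of my own; the catalog size,
requests per client, and alpha are invented parameters, not the paper's
model. It simulates fetch-at-most-once clients drawing from a Zipf
popularity curve and counts a request as a proxy hit when some earlier
client has already pulled that object through the shared cache:

    # Toy simulation (my own, parameters invented): shared-proxy hit rate
    # as a function of client population under a Zipf workload with
    # fetch-at-most-once clients.
    import random
    from itertools import accumulate

    def simulate_hit_rate(n_clients, requests_per_client=100,
                          n_objects=10000, alpha=1.0, seed=0):
        rng = random.Random(seed)
        cum = list(accumulate(1.0 / (i ** alpha)
                              for i in range(1, n_objects + 1)))
        objects = range(n_objects)
        cached = set()          # objects already fetched through the proxy
        hits = total = 0
        for _ in range(n_clients):
            seen = set()        # fetch-at-most-once: never re-request an object
            while len(seen) < requests_per_client:
                obj = rng.choices(objects, cum_weights=cum)[0]
                if obj in seen:
                    continue    # client already has it; no request is made
                seen.add(obj)
                total += 1
                if obj in cached:
                    hits += 1   # served locally
                else:
                    cached.add(obj)
        return hits / total

    for population in (100, 1000, 10000):
        print("%5d clients -> proxy hit rate %.2f"
              % (population, simulate_hit_rate(population)))

If the sketch is right, the hit rate climbs with population because later
clients mostly ask for objects someone else has already fetched, which is
the effect I would expect the University's size to be exploiting.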
This was an interesting deviation from our routine and provided some
real insight into, and solutions for, the network traffic generated by
peer-to-peer file-sharing systems. Unfortunately, I believe most
corporate-style networks will simply ban the activity rather than work
to keep the traffic internal, due to potential legal liability and
network usage issues. Roxio (owner of the Napster brand, the service
that started the p2p revolution) has banned it internally for just these
reasons.