From: Brian Milnes (brianmilnes_at_qwest.net)
Date: Mon Mar 08 2004 - 16:38:10 PST
Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload -
Gummadi et al.
The authors study the performance of the Kazaa peer-to-peer file-sharing
system at the University of Washington, characterize its traffic, and show
that it can be improved by exploiting locality. They traced 20 TB of Kazaa
traffic over a 200-day period, modeled it, and ran trace-driven simulations
to optimize it.
They measured an anonymized trace of all Kazaa traffic at the boundary of
the University of Washington. Kazaa is shown to be a batch-oriented system,
with many small requests taking over a day and up to 20% of large object
downloads taking a week or more. As clients age they request less data, but
they still request something each week with a probability of about 50%.
Kazaa's workload is shown to be a mix of many small audio clips, a smaller
intermediate mix of 10-100 MB objects, and video files of over 100 MB,
which account for the largest share of bytes transferred.
The immutability of multimedia files causes Kazaa users to download an
object at most once in 94% of cases and at most twice in 99% of cases. The
most popular objects are new, and their popularity is fleeting. About three
quarters of requests go to objects older than one month.
The distribution of objects accessed is not Zipf, unlike that of WWW pages.
The authors posit that this is caused by the immutability of multimedia
objects and the fetch-at-most-once semantics of Kazaa users. They
demonstrate this by simulating their 40,000 most popular downloads in two
cases: one in which clients can request an object repeatedly and one in
which they cannot. The fetch-repeatedly case fits a Zipf distribution, and
the authors compare this to other multimedia workload distributions.
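To make the flattening concrete, here is a minimal sketch in Python (not
the authors' simulator): clients draw from the same Zipf popularity
ranking, once with repeated requests allowed and once with
fetch-at-most-once semantics. The 40,000-object count follows the paper's
simulation; the Zipf parameter, client count, and requests per client are
illustrative assumptions of mine.

    import random
    from itertools import accumulate
    from collections import Counter

    # Contrast fetch-repeatedly and fetch-at-most-once clients drawing
    # from the same Zipf popularity ranking. Only the object count is
    # taken from the paper; the other constants are assumptions.
    N_OBJECTS = 40_000
    ZIPF_ALPHA = 1.0
    N_CLIENTS = 1_000
    REQS_PER_CLIENT = 100

    weights = [1.0 / (rank + 1) ** ZIPF_ALPHA for rank in range(N_OBJECTS)]
    cum = list(accumulate(weights))

    def simulate(at_most_once):
        counts = Counter()
        for _ in range(N_CLIENTS):
            seen = set()
            draws = random.choices(range(N_OBJECTS), cum_weights=cum,
                                   k=REQS_PER_CLIENT)
            for obj in draws:
                if at_most_once and obj in seen:
                    continue  # object is immutable: no reason to re-fetch
                seen.add(obj)
                counts[obj] += 1
        return counts

    for label, mode in [("repeat", False), ("at-most-once", True)]:
        counts = simulate(mode)
        top = sum(c for _, c in counts.most_common(10))
        share = top / sum(counts.values())
        print(f"{label:12s}: top-10 objects get {share:.0%} of requests")

Under fetch-at-most-once, a client's duplicate draws of a popular object
collapse into a single download, so the head of the request distribution
flattens while the tail is largely unchanged.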
They next constructed a simplified model of Kazaa usage. They selected
files for download from a Zipf distribution, but inserted new objects at
random positions in that distribution and allowed clients to download each
object at most once. They studied the addition of a Kazaa cache to this
model and show that it quickly becomes less effective as clients download
the popular objects and move on to a large space of less popular ones. The
exception is that new popular objects are quickly cached.
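A toy version of that model, sketched below with assumed constants rather
than the paper's parameters, should show the qualitative effect: a shared
boundary cache's hit rate decays as clients exhaust the popular head, while
each newly arrived popular object is cached after a single miss.

    import random
    from itertools import accumulate

    # Clients draw from a Zipf popularity ranking, fetch each object at
    # most once, and new objects arrive at random ranks over time. An
    # unbounded shared cache sits at the organization boundary. All of
    # these constants are illustrative assumptions.
    N_OBJECTS = 10_000
    ZIPF_ALPHA = 1.0
    N_CLIENTS = 500
    ROUNDS = 50
    DRAWS_PER_ROUND = 5
    ARRIVALS_PER_ROUND = 20

    popularity = list(range(N_OBJECTS))  # popularity[rank] -> object id
    next_id = N_OBJECTS
    weights = [1.0 / (rank + 1) ** ZIPF_ALPHA for rank in range(N_OBJECTS)]
    cum = list(accumulate(weights))

    client_seen = [set() for _ in range(N_CLIENTS)]
    cache = set()

    for round_no in range(ROUNDS):
        hits = misses = 0
        for c in range(N_CLIENTS):
            for rank in random.choices(range(N_OBJECTS), cum_weights=cum,
                                       k=DRAWS_PER_ROUND):
                obj = popularity[rank]
                if obj in client_seen[c]:
                    continue  # fetch-at-most-once: client already has it
                client_seen[c].add(obj)
                if obj in cache:
                    hits += 1
                else:
                    misses += 1
                    cache.add(obj)  # first requester warms the cache
        # new objects displace old ones at random popularity ranks
        for _ in range(ARRIVALS_PER_ROUND):
            popularity.insert(random.randrange(N_OBJECTS), next_id)
            popularity.pop()
            next_id += 1
        if hits + misses:
            print(f"round {round_no:2d}: hit rate {hits / (hits + misses):.2f}")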
They finally studied the use of locality in Kazaa. A proxy cache could
serve up to 86% of the bytes transferred but might have bad legal
consequences. They propose instead redirecting requests at the boundary or
using locality-aware clients. The unasked question is why redirecting is
legally any better than a cache: the IT department is still helping
organize the illegal download of copyrighted material. And why would using
locality not compromise Kazaa's primary goal of sharing illegal content
safely?
They use their trace information and the assumption of perfect clients to
show that locality awareness would save about 63% of traffic leaving the
network. Why is this an interesting measure when the University of
Washington's huge size makes it perhaps one in fifty of the largest client
organizations? Would the results carry over to smaller universities? Nor do
they describe how the Kazaa architecture would have to change to achieve a
realistic fraction of this sharing.
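The savings calculation itself is simple under the perfect-client
assumption: only the first download of each object must cross the boundary,
and every later request is served by a peer inside the network. A sketch
over a hypothetical four-request trace (the record format is mine, not the
paper's):

    from collections import namedtuple

    # Each record is one completed download observed at the boundary.
    Request = namedtuple("Request", ["client", "object_id", "nbytes"])

    trace = [
        Request("c1", "videoA", 700_000_000),
        Request("c2", "videoA", 700_000_000),  # servable by c1 internally
        Request("c3", "songB", 4_000_000),
        Request("c1", "songB", 4_000_000),     # servable by c3 internally
    ]

    total = sum(r.nbytes for r in trace)
    seen = set()
    external = 0
    for r in trace:
        if r.object_id not in seen:  # first fetch must leave the network
            external += r.nbytes
            seen.add(r.object_id)

    print(f"external bytes saved: {1 - external / total:.0%}")

Run over the real trace under these assumptions, this is the style of
calculation that yields the paper's 63% figure.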
I'd say that this is a very nice paper in that it shows and analyzes a
"flat-head Zipf" distribution in a real system. However, it's far from a
finished piece of work in terms of showing the benefits of locality in
peer-to-peer systems.