From: Jeff Duzak (jduzak_at_exchange.microsoft.com)
Date: Mon Mar 08 2004 - 11:19:45 PST

This paper describes the results of the analysis of a 200-day-long trace
of Kazaa requests from within UW. The analysis finds that the P2P
file-sharing workload is significantly different from the typical web
workload in a number of ways. The authors then show how the Kazaa
workload benefits less from caching than a web workload. Last, the
paper demonstrates how a system of file sharing among users within UW
can reduce the load of requests sent to peers outside UW.

The first few sections describe a number of characteristics of the Kazaa
workload. Many of these are intuitive, such as the often long
transaction times and the drop-off in bytes requested by clients as the
clients age. The fact that the frequency of requests by a client does
not also drop off with a client's age surprised me. The paper goes on
to describe how large file requests dominate the workload of the system,
and how the most frequently requested objects are the most recently
"born". These findings, likewise, are fairly intuitive.

Next, the paper shows how the Kazaa workload, as well as the workloads
of multimedia systems described in previous papers, does not follow the
Zipf distribution. A model is used to show that the fetch-at-most-once
nature of multimedia objects explains the non-Zipf distribution of a
multimedia workload. Again, this is reasonable.
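
A minimal sketch of the fetch-at-most-once effect follows. It is not the paper's model or its parameters: the function name `simulate_requests`, the client and object counts, and the redraw rule are all assumptions chosen for illustration. Each client draws objects from a Zipf popularity law; a fetch-at-most-once client redraws whenever it picks an object it already has.

```python
import random
from collections import Counter
from itertools import accumulate


def simulate_requests(n_clients, n_objects, requests_per_client,
                      fetch_at_most_once, alpha=1.0, seed=0):
    """Count requests per object under a Zipf(alpha) popularity law.

    All parameters are illustrative assumptions, not the paper's settings.
    Object 0 is the most popular, object n_objects - 1 the least.
    """
    if fetch_at_most_once and requests_per_client > n_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    # Cumulative Zipf weights: the object at index i has weight 1 / (i + 1)^alpha.
    cum_weights = list(accumulate(1.0 / (rank ** alpha)
                                  for rank in range(1, n_objects + 1)))
    counts = Counter()
    for _client in range(n_clients):
        fetched = set()
        for _request in range(requests_per_client):
            while True:
                obj = rng.choices(range(n_objects), cum_weights=cum_weights)[0]
                # A fetch-at-most-once client redraws anything it already has.
                if not (fetch_at_most_once and obj in fetched):
                    break
            fetched.add(obj)
            counts[obj] += 1
    return counts


if __name__ == "__main__":
    for mode in (False, True):
        counts = simulate_requests(200, 500, 100, fetch_at_most_once=mode)
        label = "fetch-at-most-once" if mode else "fetch-repeatedly"
        # Request counts for the five most popular objects.
        print(label, [counts[i] for i in range(5)])
```

Under fetch-at-most-once, no object can be requested more often than there are clients, so the head of the popularity curve is capped while the requests that would have gone to it spill over to less popular objects.
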
One part of the model that seems a bit unreasonable is the assumption
that clients will continue to make requests at a constant rate (2
objects / day) even after they have already fetched the most popular
objects. I would expect the client request rate to drop as the popular
objects are exhausted. I believe the assumed constant request rate
exaggerates the rate of requests for unpopular objects, and therefore
exaggerates the decrease in cache hit rate. Nonetheless, the conclusion that
fetch-at-most-once behavior causes decreased cache efficiency seems
valid, and is a surprise.
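
The interaction between a constant request rate and the cache hit rate can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's simulation: the function name `hit_rates`, the fixed object population with no new objects arriving, the cache pinned to the `cache_size` most popular objects, and the rule of one request per client per round are all hypothetical choices.

```python
import random
from itertools import accumulate


def hit_rates(fetch_at_most_once, n_clients=100, n_objects=2000,
              rounds=200, cache_size=100, seed=1):
    """Return the per-round hit rate of a cache holding the top objects.

    Assumptions (illustrative only): Zipf(1.0) popularity, a fixed object
    population, a static cache pinned to the `cache_size` most popular
    objects, and exactly one request per client per round.
    """
    if fetch_at_most_once and rounds > n_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    # Object 0 is the most popular; weights fall off as 1 / rank.
    cum_weights = list(accumulate(1.0 / rank
                                  for rank in range(1, n_objects + 1)))
    fetched = [set() for _ in range(n_clients)]
    rates = []
    for _round in range(rounds):
        hits = 0
        for seen in fetched:
            while True:
                obj = rng.choices(range(n_objects), cum_weights=cum_weights)[0]
                # Fetch-at-most-once clients redraw objects they already hold.
                if not (fetch_at_most_once and obj in seen):
                    break
            seen.add(obj)
            if obj < cache_size:
                hits += 1
        rates.append(hits / n_clients)
    return rates


if __name__ == "__main__":
    for mode in (False, True):
        rates = hit_rates(mode)
        label = "fetch-at-most-once" if mode else "fetch-repeatedly"
        early = sum(rates[:20]) / 20
        late = sum(rates[-20:]) / 20
        print(f"{label}: early hit rate {early:.2f}, late hit rate {late:.2f}")
```

Because every client keeps issuing one request per round even after it has fetched the popular objects, its later requests are forced onto unpopular objects that the cache does not hold. A variant in which the request rate decays as popular objects run out would generate fewer of these misses, which is the concern raised above.
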
Last, the paper describes a system of redirecting object requests
originating within the UW user community to peers within the same
community. The system achieves a large reduction in requests to peers
external to UW. More interestingly, tests of this system show that the
most active clients are the most important for achieving this caching
effect.

Overall, the paper presented a number of interesting results. However,
it repeated the point that a P2P file-sharing workload is not Zipf more
often than necessary. I wish the paper had spent more time discussing
the implications of this finding.