From: Praveen Rao (psrao_at_windows.microsoft.com)
Date: Mon Mar 08 2004 - 17:39:01 PST
In this paper the authors discuss the nature of P2P workloads and
contrast them with HTTP workloads on the Internet. The authors use
measurements of Kazaa traffic to analyse P2P workloads.
The major differences the authors cite between Kazaa and web workloads
are:
* Kazaa objects are immutable and are fetched 'at most once' per client
* the popularity distribution of Kazaa objects deviates significantly
from the Zipf curve
* while the web workload is driven by document change, the Kazaa
workload is driven by the addition of new objects and users
As for the Kazaa users, the findings are as follows:
* Kazaa users are patient; some large objects can take on the order of
weeks to download
* Users slow down as they age (they ask for less, even though they use
the system as often as new clients)
The Kazaa workload can be classified into small (<10 MB) and large (over
10 MB, possibly GBs) objects. Even though small objects consume less
bandwidth overall, they are important for user experience.
Kazaa objects have the following dynamics:
* fetch once - clients fetch most objects just once
* the popularity of Kazaa objects is short-lived
* the most popular objects tend to be newly born objects (new multimedia
content)
* most requests are for old objects: despite the previous finding,
requests for old objects make up a significant portion of the total
requests (I guess people catch up slowly with the new content)
Kazaa objects do not follow the Zipf curve. The difference is most
marked for the most popular objects: while the Zipf curve would suggest
extremely high popularity for them, the Kazaa curve has a flattened
head, i.e. the most popular multimedia object is significantly less
popular than Zipf would predict.
The authors state that Kazaa objects do not follow the Zipf curve for
two main reasons:
* immutability of objects
* fetch once semantics (unlike web content)
The authors present a new model for P2P file-sharing workloads. In this
model there are three factors at work:
1) Underlying Zipf distribution
2) The way new objects are inserted into the system
3) The client's fetch-once behavior
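To make the interaction of factors 1 and 3 concrete, here is a minimal
sketch (mine, not the authors' simulator). It assumes every client draws
requests from the same Zipf distribution and, in fetch-once mode, simply
drops a draw for an object it already has. The function names, the
parameters and the population sizes are all illustrative assumptions,
and the insertion of new objects (factor 2) is not modeled.

```python
import random
from collections import Counter

def zipf_weights(n_objects, alpha=1.0):
    """Unnormalized Zipf weights: the object at rank r gets 1 / r**alpha."""
    return [1.0 / rank ** alpha for rank in range(1, n_objects + 1)]

def simulate(n_clients, n_objects, requests_per_client, fetch_once, seed=0):
    """Count fetches per object (index 0 is the most popular rank)."""
    rng = random.Random(seed)
    weights = zipf_weights(n_objects)
    objects = list(range(n_objects))
    counts = Counter()
    for _ in range(n_clients):
        already_fetched = set()  # per-client history
        for _ in range(requests_per_client):
            obj = rng.choices(objects, weights=weights, k=1)[0]
            if fetch_once and obj in already_fetched:
                continue  # assumption: a repeated draw generates no fetch
            already_fetched.add(obj)
            counts[obj] += 1
    return counts

# Illustrative sizes only: 200 clients, 100 objects, 50 draws per client.
web_like = simulate(200, 100, 50, fetch_once=False)
kazaa_like = simulate(200, 100, 50, fetch_once=True)
```

With fetch-once enabled, no object can be fetched more often than there
are clients, so the head of the popularity curve is capped, while the
web-like run lets the top-ranked object be fetched repeatedly by the
same client.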
Another important aspect of P2P workloads is that the cache hit ratio
goes down as clients age. This worsens performance (bandwidth demands
rise).
The arrival of new objects improves performance. New objects
counterbalance the effect of fetch-once behavior and make sharing
beneficial again. New clients, on the other hand, do not compensate for
the performance loss due to aging clients. This is because new clients
cannot counteract the hit-rate penalty of client aging, which occurs at
the same rate.
The authors show measurements to highlight the benefits of local caching
of P2P objects. Caching can tremendously reduce the external bandwidth
consumed.
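As a rough illustration of how such a saving could be estimated from a
request trace, here is a minimal sketch. It assumes an ideal local cache
(unlimited size, never evicts) and a hypothetical trace format of
(object_id, size_in_bytes) pairs; the function names and the example
trace are made up for illustration and are not data from the paper.

```python
def external_bytes(trace, use_cache):
    """Bytes fetched from outside the organization for a request trace.

    trace: iterable of (object_id, size_in_bytes) tuples, one per request.
    With use_cache=True, an object crosses the external link only the
    first time anyone requests it (assumed ideal cache, no eviction).
    """
    cached = set()
    total = 0
    for object_id, size in trace:
        if use_cache and object_id in cached:
            continue  # served locally, no external traffic
        total += size
        cached.add(object_id)
    return total

def bandwidth_saved(trace):
    """Fraction of external bytes avoided by the ideal cache."""
    trace = list(trace)
    without = external_bytes(trace, use_cache=False)
    with_cache = external_bytes(trace, use_cache=True)
    return 1.0 - with_cache / without if without else 0.0

# Hypothetical trace: one 700 MB video requested three times, one 5 MB song.
MB = 10**6
trace = [("video", 700 * MB), ("song", 5 * MB),
         ("video", 700 * MB), ("video", 700 * MB)]
saved = bandwidth_saved(trace)
```

In this toy trace the cache avoids two of the three video transfers,
which is where nearly all of the bytes are; the small object barely
matters for bandwidth.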
This paper was a great analysis of P2P workloads, which I always thought
to be a 'little different' from web workloads (primarily due to
multimedia content and its size). I hadn't thought about all the
implications the paper talks about.
The organization of the paper was also very good, with a summary
presented in each section.