From: Honghai Liu (liu789_at_hotmail.com)
Date: Sun Mar 07 2004 - 21:39:01 PST
Reviewer: Honghai Liu
Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload
The paper presents a detailed analysis and model of the workload of Kazaa, a
typical P2P file-sharing system. Specifically, it compares the traffic patterns
and models of the P2P system with those of the Web and reports some interesting
findings.
Web traffic has been studied extensively, and a typical Zipf distribution is
observed: a small number of web sites are extremely popular, followed by a long
tail of unpopular sites. The curve is a power law in which popularity is
proportional to (rank)^(-a); therefore, on log-log axes, the curve looks like a
straight line heading down to the right. Caches and proxies, such as DNS or HTTP
caches, are heavily used to take advantage of this distribution because they are
very effective for the popular web sites.
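To make the Zipf argument concrete, here is a minimal Python sketch. It is an illustration only and not taken from the paper: the object count (10,000), the exponent a = 1.0, the cache size (100), and all function names are assumptions chosen for the example. It builds popularity proportional to (rank)^(-a), checks that the log-log slope is -a, and computes the share of requests that a cache holding only the most popular items would absorb.

```python
import math

def zipf_weights(n, a=1.0):
    """Request probabilities for ranks 1..n, with p(rank) proportional to rank^(-a)."""
    raw = [r ** (-a) for r in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def loglog_slope(weights):
    """Least-squares slope of log(probability) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(weights) + 1)]
    ys = [math.log(w) for w in weights]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def top_k_hit_share(weights, k):
    """Fraction of requests served by a cache holding the k most popular items."""
    return sum(weights[:k])

# Hypothetical parameters, chosen only for illustration.
weights = zipf_weights(10_000, a=1.0)
slope = loglog_slope(weights)          # straight line on log-log axes, slope -a
share = top_k_hit_share(weights, 100)  # share of requests hitting the top 1% of items

print(f"log-log slope: {slope:.3f}")
print(f"requests served by top 100 of 10000 items: {share:.1%}")
```

Under these assumed parameters, caching 1% of the items covers roughly half of all requests, which is why caches work so well on a Zipf workload.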
P2P systems such as Kazaa's file-sharing system typically store content very
different from web content. The files in P2P systems are normally multimedia
video or audio clips, which are several orders of magnitude larger than web
objects. As a result, users usually fetch a given clip only once. The paper
observes that the popularity distribution of the P2P workload, especially for
the most popular files, does not follow a Zipf curve. Moreover, in Kazaa, new
object arrivals play an important role, while on the Web, updates to existing
pages prevail.
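The effect of fetching each clip only once can be sketched with a small simulation. This is a toy model under stated assumptions, not the paper's model: the numbers of objects, clients, and requests, the exponent, and the function name `simulate` are all hypothetical. Every client draws requests from the same Zipf popularity; in the Web-like case repeats are allowed, while in the P2P-like case a client redraws whenever it picks an object it has already fetched.

```python
import random
from itertools import accumulate

def simulate(num_objects, num_clients, requests_per_client, a, fetch_once, seed=0):
    """Count requests per object (index 0 is the most popular object)."""
    if fetch_once and requests_per_client > num_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    ranks = range(num_objects)
    # Cumulative Zipf weights: object at index r has weight (r + 1)^(-a).
    cum = list(accumulate((r + 1) ** (-a) for r in ranks))
    counts = [0] * num_objects
    for _ in range(num_clients):
        seen = set()
        for _ in range(requests_per_client):
            obj = rng.choices(ranks, cum_weights=cum)[0]
            # Fetch-at-most-once: redraw until the object is new to this client.
            while fetch_once and obj in seen:
                obj = rng.choices(ranks, cum_weights=cum)[0]
            seen.add(obj)
            counts[obj] += 1
    return counts

# Hypothetical workload: 500 objects, 200 clients, 50 requests per client.
web = simulate(500, 200, 50, a=1.0, fetch_once=False)
p2p = simulate(500, 200, 50, a=1.0, fetch_once=True)

print("top-5 request counts, repeats allowed:    ", web[:5])
print("top-5 request counts, fetch-at-most-once: ", p2p[:5])
```

In the fetch-at-most-once case no object can receive more requests than there are clients, so the head of the curve is capped and flattened while the tail still resembles Zipf.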
It is interesting that cache misses in P2P differ in origin from those in a web
proxy. Proxy cache misses are due to updates to the content of web pages,
whereas Kazaa's cache misses are due to newly arriving clips. However, new
clients and new arrivals cannot compensate for the effect of client aging.
Similarly, locality awareness is as important to P2P as it is to web proxies. In
fact, even with a conservative availability estimate, a distributed cache would
achieve a traffic saving of 63%.
The only question I have about the paper is that, it seems to me, most of its
conclusions can be applied to other file systems, such as an FTP service. So I
wonder what the real difference is between a traditional file system and a P2P
shared file system with respect to distribution, cache misses, and popularity of
files.