Review: Measurement, Modeling, and Analysis of a P2P File-Sharing Workload

From: Honghai Liu (liu789_at_hotmail.com)
Date: Sun Mar 07 2004 - 21:39:01 PST

    Reviewer: Honghai Liu

    Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

    The paper presents a detailed measurement study and model of the workload of
    Kazaa, a typical P2P file-sharing system. Specifically, it compares the traffic
    patterns and models of the P2P system with those of the Web, with some
    interesting findings.

    Web traffic has been studied extensively and typically follows a Zipf
    distribution: a small number of web sites are extremely popular, followed by
    a long tail of unpopular sites. The curve is a power law, with popularity
    proportional to (rank)^(-a), so on log-log axes it looks like a straight line
    heading down to the right. Caches and proxies, such as DNS or HTTP caches,
    have been heavily used to take advantage of this distribution because they are
    very effective for the popular web sites.
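
    As a rough illustration of the Zipf behavior described above, the following
    Python sketch (not from the paper) builds popularities proportional to
    rank^(-a), fits the slope on log-log axes, and measures the share of requests
    a cache holding the top k objects would absorb. The function names, a = 1.0,
    the 10,000-object catalog, and k = 100 are all illustrative assumptions.

```python
import math

def zipf_weights(n, a=1.0):
    """Normalized popularity for ranks 1..n, proportional to rank**(-a)."""
    raw = [rank ** (-a) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def loglog_slope(weights):
    """Least-squares slope of log(popularity) against log(rank)."""
    xs = [math.log(rank) for rank in range(1, len(weights) + 1)]
    ys = [math.log(w) for w in weights]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def top_k_hit_rate(weights, k):
    """Share of requests served by a cache holding the k most popular objects."""
    return sum(weights[:k])

# Assumed parameters, chosen only for illustration.
weights = zipf_weights(10_000, a=1.0)
slope = loglog_slope(weights)            # close to -a: a straight line on log-log axes
hit_rate = top_k_hit_rate(weights, 100)  # share of requests hitting the top 100 objects
print(f"log-log slope: {slope:.3f}, top-100 hit rate: {hit_rate:.2f}")
```

    Under these assumed parameters, the 100 most popular of 10,000 objects draw a
    bit over half of all requests, which is why a small cache works so well for
    Zipf-like workloads.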

    P2P systems such as Kazaa's file-sharing system typically store content very
    different from Web content. The files in P2P systems are normally multimedia
    video or audio clips that are several orders of magnitude larger than Web
    objects. As a result, users usually fetch each clip only once. The paper
    observes that the popularity distribution of Kazaa files, especially for the
    most popular files, does not follow a Zipf curve. Moreover, in Kazaa, new
    object arrivals play an important role, while in the Web, updates to existing
    pages prevail.
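
    To see why fetching each clip only once pulls the popular end of the curve
    away from Zipf, here is a small simulation sketch in Python. It is my own
    simplification, not the paper's model: every client draws requests from the
    same Zipf distribution, and in the fetch-once variant a repeated draw is
    simply dropped. The client count, catalog size, request count, a = 1.0, and
    the seed are all assumed values.

```python
import random
from collections import Counter
from itertools import accumulate

def simulate(num_clients, num_objects, requests_per_client,
             a=1.0, at_most_once=True, seed=0):
    """Count downloads per object index (index 0 is the most popular object)."""
    rng = random.Random(seed)
    objects = list(range(num_objects))
    # Cumulative Zipf weights so that each draw is a fast binary search.
    cum_weights = list(accumulate((i + 1) ** (-a) for i in objects))
    counts = Counter()
    for _ in range(num_clients):
        fetched = set()  # objects this client has already downloaded
        for _ in range(requests_per_client):
            obj = rng.choices(objects, cum_weights=cum_weights, k=1)[0]
            if at_most_once and obj in fetched:
                continue  # assumption: a repeated draw is dropped, not redrawn
            fetched.add(obj)
            counts[obj] += 1
    return counts

# Assumed parameters, chosen only for illustration.
zipf_counts = simulate(200, 1000, 100, at_most_once=False)
once_counts = simulate(200, 1000, 100, at_most_once=True)

# With fetch-at-most-once, no object can be downloaded more often than there
# are clients, so the head of the popularity curve is capped and flattens.
print("top object, plain Zipf:   ", zipf_counts[0])
print("top object, fetch-once:   ", once_counts[0])
```

    In the fetch-once run the count for any object is bounded by the number of
    clients, while the plain Zipf run has no such bound, so only the popular head
    of the curve changes shape and the tail is largely unaffected.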

    It is interesting that cache misses in P2P systems differ from those in Web
    proxies. Misses in a Web proxy are due to updates to the content of web pages,
    whereas Kazaa's cache misses are due to newly arrived clips. However, new
    clients and new object arrivals cannot compensate for the effect of client
    aging.

    Similarly, locality awareness is as important to P2P systems as it is to Web
    proxies. In fact, even with a conservative availability estimate, the
    distributed cache would reduce traffic by 63%.

    The only question I have about the paper is that most of its conclusions seem
    to apply to other file systems as well, such as an FTP service. So I wonder
    what the real difference is between a traditional file system and a P2P shared
    file system with respect to distribution, cache misses, and file popularity.

