P2P Measurement Paper Review

From: Reid Wilkes (reidw_at_microsoft.com)
Date: Sun Mar 07 2004 - 23:28:44 PST

  • Next message: Jeff Duzak: "Review of "Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload""

    Peer-to-peer traffic is different from web traffic because any
    object in a peer-to-peer system is generally requested at most once
    by a given user, whereas in web workloads the same documents are
    requested many times by the same user. This idea is central to the
    paper by Gummadi et al., which sets out to explain why peer-to-peer
    traffic on the Internet differs from well-understood web traffic.

    First, the authors describe a trace of P2P activity on the Kazaa
    system at the University of Washington, collected over 200 days in
    2002. The data confirms that the popularity of P2P objects does not
    follow the Zipf curve of normal web traffic. The authors then build
    a model of P2P traffic, using the trace data to generate appropriate
    parameters. By running simulations on the model, the authors again
    show that P2P traffic does not follow the Zipf curve. Although
    object popularity is assumed (and the data seems to support this
    assumption) to follow a Zipf distribution in terms of how many
    unique users request each object, the authors maintain that the
    request-once nature of the content accounts for the non-Zipf-like
    distribution of requests. Requests happen at most once because
    content in a P2P network is immutable and clients have large caches,
    so a client never needs to fetch the same item again.
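
    To convince myself of this argument, I sketched a tiny simulation
    (entirely my own, with made-up parameters; it is not taken from the
    paper). Object popularity is Zipf in both runs, but in one run
    clients may re-fetch objects while in the other each client fetches
    any given object at most once. The request counts for the most
    popular objects flatten out in the second run, which is the
    non-Zipf head the authors describe.

        import random
        from collections import Counter

        random.seed(0)
        NUM_OBJECTS = 1000           # illustrative sizes, not from the trace
        NUM_CLIENTS = 500
        REQUESTS_PER_CLIENT = 100
        ZIPF_ALPHA = 1.0

        # Zipf-like popularity: the object of rank r has weight 1 / r^alpha.
        objects = list(range(NUM_OBJECTS))
        weights = [1.0 / (r ** ZIPF_ALPHA) for r in range(1, NUM_OBJECTS + 1)]

        web_counts = Counter()       # clients may fetch an object repeatedly
        p2p_counts = Counter()       # each client fetches an object at most once

        for _ in range(NUM_CLIENTS):
            already_fetched = set()
            for obj in random.choices(objects, weights=weights,
                                      k=REQUESTS_PER_CLIENT):
                web_counts[obj] += 1
                if obj not in already_fetched:   # request-once behavior
                    already_fetched.add(obj)
                    p2p_counts[obj] += 1

        # Web-like counts track the Zipf weights; P2P-like counts are capped
        # at NUM_CLIENTS, so the head of the distribution flattens.
        for rank in (1, 2, 5, 10, 50, 100):
            print(rank, web_counts[rank - 1], p2p_counts[rank - 1])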

    The paper also discusses some possible techniques for reducing the
    amount of external traffic generated by a P2P system within an
    organization. The basic idea is caching, whether by using a
    traditional proxy cache at the organizational boundary or by
    installing redirectors that send requests bound for an external
    server to a local host able to offer the same data. In either case,
    caching looks likely to provide large bandwidth savings.
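
    As a rough sketch of the redirection idea (my own illustration, not
    code from the paper; the names are invented), a redirector mostly
    needs an index from content hash to the internal hosts that already
    hold a copy:

        class Redirector:
            """Tracks which internal hosts hold a copy of each object and
            points requests at them rather than at external peers."""

            def __init__(self):
                # content hash -> internal hosts known to hold a complete copy
                self.internal_copies = {}

            def record_download(self, content_hash, internal_host):
                """Note that an internal host finished downloading an object."""
                self.internal_copies.setdefault(content_hash, set()).add(
                    internal_host)

            def route(self, content_hash, external_host):
                """Return the host a client should fetch the object from."""
                holders = self.internal_copies.get(content_hash)
                if holders:
                    # An internal copy exists: serve it locally and save
                    # external bandwidth.
                    return next(iter(holders))
                # No internal copy yet, so the request goes outside as usual.
                return external_host

        redirector = Redirector()
        print(redirector.route("hash-abc", "peer.example.net"))  # external
        redirector.record_download("hash-abc", "host-a.internal")
        print(redirector.route("hash-abc", "peer.example.net"))  # internal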

    One interesting point made about caching in P2P systems, however,
    is that caching is so useful precisely because new content
    constantly enters the P2P system and becomes the new "most popular"
    content. Without this effect, caching would not be very useful:
    with a static set of objects and request-once behavior, the most
    popular objects (and therefore the cached objects) would quickly be
    requested once by all clients and thereafter would be requested no
    more. The remaining requests would be for less-popular items, which
    would be harder to satisfy with cache hits.
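
    The same toy setup makes this dynamic visible (again my own sketch,
    with invented parameters): a boundary cache holds the currently
    most popular objects, clients fetch each object at most once, and
    the hit rate decays when the object set is static but holds up when
    new content keeps arriving at the popular end of the ranking.

        import random

        def run(new_per_round, rounds=40, clients=200, initial_objects=2000,
                draws_per_round=2000, cache_size=100, alpha=1.0, seed=1):
            """Per-round hit rate of a cache holding the most popular objects
            (simplification: popular objects are assumed already cached)."""
            rng = random.Random(seed)
            ranking = list(range(initial_objects))    # ranking[0] = most popular
            next_id = initial_objects
            fetched = [set() for _ in range(clients)] # request-once state per client
            hit_rates = []
            for _ in range(rounds):
                # New content enters at the popular end, pushing old content down.
                ranking = list(range(next_id, next_id + new_per_round)) + ranking
                next_id += new_per_round
                weights = [1.0 / (r ** alpha) for r in range(1, len(ranking) + 1)]
                cached = set(ranking[:cache_size])    # the current most popular
                hits = requests = 0
                for obj in rng.choices(ranking, weights=weights, k=draws_per_round):
                    client = rng.randrange(clients)
                    if obj in fetched[client]:
                        continue                      # this client already has it
                    fetched[client].add(obj)
                    requests += 1
                    hits += obj in cached
                hit_rates.append(hits / max(requests, 1))
            return hit_rates

        print("static object set:", [round(h, 2) for h in run(0)[::10]])
        print("new content added:", [round(h, 2) for h in run(20)[::10]])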

    Overall, I thought this paper was quite interesting; once again I
    find papers from UW to be, on the whole, much better written and
    more intelligible than average. However, this particular paper may
    have run a bit long and become somewhat redundant: I felt the basic
    idea, which can be expressed in a few sentences, was being driven
    home over and over again with slightly different lead-ins each
    time. Further, although the paper quotes statistics showing that at
    one point in time file-sharing traffic consumed a majority of
    Internet traffic, I wonder whether the same is true today given the
    collapse of many P2P networks under legal scrutiny. More pointedly,
    has the phenomenon of file sharing now been replaced by fixed
    content servers that distribute content to paying customers? Even
    if it has, I would suspect that many of the observations made in
    this paper still hold.

