From: Reid Wilkes (reidw_at_microsoft.com)
Date: Sun Mar 07 2004 - 23:28:44 PST
Peer-to-peer traffic is different from web traffic because any object in
a peer-to-peer system is generally requested at most once by a given
user, whereas in web workloads the same documents are requested many
times by the same user. This idea is made abundantly clear in the paper
by Gummadi et al., which sets out to explain why peer-to-peer traffic on
the internet is different from well-understood web traffic. First, the
authors describe a trace of P2P activity on the Kazaa system, collected
at the University of
Washington over 200 days in 2002. The data confirms that the P2P traffic
does not follow the Zipf popularity curve of normal web traffic. The
authors then build a model of P2P traffic using the trace data to
generate appropriate parameters for the model. By running simulations on
the model the authors again show that P2P traffic does not follow the
Zipf curve. Although the popularity of objects is assumed (and the data
seems to support this assumption) to follow a Zipf distribution in terms
of how many unique users request individual objects, the authors
maintain that the request-once nature of the content accounts for the
non-Zipf-like distribution of requests. This request-once behavior
arises because content in a P2P network is immutable and clients all
have large caches, so they never need to request the same item again.
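
To make the flattening argument concrete, here is a rough simulation
sketch. It is not the paper's model, and the parameters (object count,
requests per user, a Zipf exponent of 1.0) are made-up numbers; it
simply contrasts a web-like workload, in which users freely re-request
objects drawn from a Zipf popularity distribution (the object at rank r
is chosen with probability proportional to 1/r), with a
fetch-at-most-once workload over the same popularity distribution:

    import random
    from itertools import accumulate
    from collections import Counter

    # Hypothetical parameters - not taken from the paper's trace.
    NUM_OBJECTS = 10_000
    NUM_USERS = 1_000
    REQUESTS_PER_USER = 100
    ZIPF_ALPHA = 1.0

    # Zipf popularity: the object at rank r has weight proportional to 1 / r^alpha.
    objects = list(range(NUM_OBJECTS))
    weights = [1.0 / (r ** ZIPF_ALPHA) for r in range(1, NUM_OBJECTS + 1)]
    cum_weights = list(accumulate(weights))

    def web_like_workload():
        """Users may re-request objects, so aggregate counts stay Zipf-like."""
        counts = Counter()
        for _ in range(NUM_USERS):
            counts.update(random.choices(objects, cum_weights=cum_weights,
                                         k=REQUESTS_PER_USER))
        return counts

    def fetch_at_most_once_workload():
        """Each user requests any given object at most once (immutable content,
        large client-side cache), which clips the most popular ranks."""
        counts = Counter()
        for _ in range(NUM_USERS):
            seen = set()
            while len(seen) < REQUESTS_PER_USER:
                obj = random.choices(objects, cum_weights=cum_weights, k=1)[0]
                if obj not in seen:        # skip anything this user already has
                    seen.add(obj)
                    counts[obj] += 1
        return counts

    if __name__ == "__main__":
        for name, counts in [("web-like", web_like_workload()),
                             ("fetch-at-most-once", fetch_at_most_once_workload())]:
            top5 = [c for _, c in counts.most_common(5)]
            print(f"{name:>20}: top-5 request counts = {top5}")

In the web-like run the most popular objects soak up an outsized share
of all requests, while in the fetch-at-most-once run no object can be
requested more than once per user, so the head of the popularity curve
is clipped at the size of the user population - which is the flattening
the authors observe in the Kazaa trace.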

The paper also discusses some possible techniques for reducing the
amount of external traffic generated by a P2P system within an
organization. The basic idea is caching - whether by using a traditional
proxy cache at the organizational boundary or by installing redirectors
that send requests bound for an external server to a local server able
to offer the same data. In either case, caching looks likely to provide
large bandwidth savings. One interesting point made about caching in P2P
systems, however, is that caching is so useful because new content
constantly enters the P2P system and becomes the new "most popular"
content. Without this effect, caching would not be very useful
because with a static set of objects and request-once behavior, the most
popular objects (and therefore the cached objects) would quickly be
requested once by all clients and thereafter would be requested no more.
The remaining requests would be for less-popular items which would be
harder to satisfy with cache hits.
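
To see why this matters for caching, here is a second rough sketch,
again with made-up numbers and an idealized cache that simply holds the
100 currently most popular objects (rather than the paper's proxy-cache
or redirector designs). It compares a static object set against one in
which new objects keep arriving at the head of the popularity ranking:

    import random
    from itertools import accumulate

    # Hypothetical parameters - not taken from the paper's trace or model.
    NUM_USERS = 200
    INITIAL_OBJECTS = 20_000
    ROUNDS = 200
    CACHE_TOP_K = 100      # idealized cache: the 100 currently most popular objects
    ZIPF_ALPHA = 1.0

    def simulate(new_objects_per_round):
        """Fetch-at-most-once clients in front of an idealized cache that always
        holds the CACHE_TOP_K currently most popular objects. Each round every
        user fetches one object it does not already have; newly arriving objects
        are inserted at the head of the popularity ranking. Returns the cache
        hit rate in the first and last quarter of the run."""
        objects = list(range(INITIAL_OBJECTS))     # position == popularity rank
        next_id = INITIAL_OBJECTS
        have = [set() for _ in range(NUM_USERS)]   # objects each user has fetched
        quarter = ROUNDS // 4
        stats = {"first quarter": [0, 0], "last quarter": [0, 0]}  # [hits, requests]

        for rnd in range(ROUNDS):
            # New content arrives and becomes the new "most popular" content.
            for _ in range(new_objects_per_round):
                objects.insert(0, next_id)
                next_id += 1
            weights = [1.0 / (r ** ZIPF_ALPHA) for r in range(1, len(objects) + 1)]
            cum_weights = list(accumulate(weights))
            cached = set(objects[:CACHE_TOP_K])

            if rnd < quarter:
                period = "first quarter"
            elif rnd >= ROUNDS - quarter:
                period = "last quarter"
            else:
                period = None

            for user in range(NUM_USERS):
                # Re-draw until we find something this user has not fetched yet.
                while True:
                    obj = random.choices(objects, cum_weights=cum_weights, k=1)[0]
                    if obj not in have[user]:
                        break
                have[user].add(obj)
                if period:
                    stats[period][1] += 1
                    stats[period][0] += obj in cached   # True counts as 1

        return {p: round(h / n, 3) for p, (h, n) in stats.items()}

    if __name__ == "__main__":
        print("static object set   :", simulate(0))
        print("20 new objects/round:", simulate(20))

With a static catalog, clients quickly exhaust the popular objects the
cache holds and their later requests drift into the sparse tail, so the
hit rate falls off over the run; with a steady stream of arrivals the
cache keeps holding things clients still want. That is the dynamic
described above: new content becoming the new "most popular" content is
what keeps caching effective.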

Overall I thought this paper was quite interesting - once again I find
papers from UW to be, on the whole, much better written and more
intelligible than average. However, this particular paper may have run a
bit long and become somewhat redundant. I
felt the basic idea, which can be expressed in a few sentences, was
being driven home over and over again with slightly different lead-ins
each time. Further, although the paper quotes statistics showing that at
one point in time file sharing traffic consumed a majority of internet
traffic, I wonder if the same is true today given the collapse of many
P2P networks under legal scrutiny. More pointedly - has the phenomenon of
file sharing now been replaced by fixed content servers that distribute
content to paying customers? Even if this is true, I would suspect that
many of the observations made in this paper would still hold true.