Review: Saroiu et al., A Measurement Study of Peer-to-Peer File Sharing Systems

From: Ian King (iking_at_killthewabbit.org)
Date: Sun Mar 07 2004 - 00:17:42 PST

  • Next message: David Coleman: "P2P Measurement Review"

    The authors describe the results of empirical analysis of Kazaa peer-to-peer
    file sharing activity as measured at the network boundary between a major
    university and the Internet; they develop and evaluate a model to describe the
    observed activity, and ultimately demonstrate a departure from commonly held
    beliefs regarding the nature of the studied traffic. In particular, this paper
    shows that unlike webpage traffic, multimedia file traffic over peer-to-peer
    file sharing networks does not follow a Zipf distribution.

    It was quite a surprise to learn that P2P traffic is such a substantial
    percentage of overall Internet traffic. The authors describe the nature of the
    files that generate the traffic: multimedia files, with audio files typically
    ranging from a fraction of a megabyte to a few megabytes in size, and video
    files usually in the tens to hundreds of megabytes. One of the observations is that
    Kazaa users are very patient folk; these transfers can often require days to
    complete. The authors contrast this with the "instant gratification" observed
    in webpage consumption - if a page does not render relatively quickly, users
    often abandon the attempt.

    Some have claimed that multimedia file transfer follows the same pattern as
    webpage traffic, namely a Zipf distribution; in the webpage scenario, a few
    popular pages draw most of the accesses, with request frequency falling off
    roughly as the inverse of popularity rank. The authors demonstrate that the
    different nature of access -
    'each client accesses once', rather than 'each client accesses many times' -
    translates into a curve flatter than a Zipf distribution for the most popular
    files. This is intuitively obvious: popularity wanes as "everyone has one" of
    files that do not change over time (Star Wars is always Star Wars). This has
    implications for caching behavior: while popular web pages remain popular
    because of new content at the same address, a given set of popular multimedia
    files will quickly be replaced by a new set. If that replacement is at a
    sufficient pace, caching can still make a contribution to efficiency, as newly
    popular files replace previously popular files. The appearance of new clients
    can also maintain the relevance of a given set of files, but the effect is far
    less prominent, and the required client growth would quickly exhaust network
    bandwidth.
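
    To make the flattening effect concrete, here is a minimal simulation sketch.
    It is not the authors' model: the function names, the parameter values, and
    the choice to simply drop a repeated draw are all assumptions made for
    illustration. Every client draws objects from the same Zipf-like popularity
    distribution; in the 'accesses once' mode a client never transfers an object
    it already holds, so no object can be fetched more times than there are
    clients.

```python
import itertools
import random
from collections import Counter


def zipf_cum_weights(n_objects, alpha=1.0):
    """Cumulative Zipf-like weights: the weight of rank r is 1 / r**alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_objects + 1)]
    return list(itertools.accumulate(weights))


def simulate(n_clients, n_objects, draws_per_client, fetch_once, seed=0):
    """Return per-object transfer counts, indexed by popularity rank - 1.

    fetch_once=False: every draw is a transfer (web-like repeated access).
    fetch_once=True:  a draw for an object the client already holds is
                      dropped (hypothetical simplification of the
                      'each client accesses once' behaviour).
    """
    rng = random.Random(seed)
    cum_weights = zipf_cum_weights(n_objects)
    objects = range(n_objects)
    counts = Counter()
    for _ in range(n_clients):
        owned = set()
        for _ in range(draws_per_client):
            obj = rng.choices(objects, cum_weights=cum_weights, k=1)[0]
            if fetch_once and obj in owned:
                continue  # client already has this file; no new transfer
            owned.add(obj)
            counts[obj] += 1
    return [counts[obj] for obj in objects]


if __name__ == "__main__":
    repeat = simulate(200, 500, 50, fetch_once=False)
    once = simulate(200, 500, 50, fetch_once=True)
    # Compare how steeply popularity falls between rank 1 and rank 10.
    print("repeated access, rank1/rank10:", repeat[0] / repeat[9])
    print("access once,     rank1/rank10:", once[0] / once[9])
```

    Under these assumptions the count for the most popular object is capped at
    the number of clients in the 'accesses once' run, which is what flattens the
    head of the curve relative to the repeated-access run.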

    One motivation for this study is the cost of Internet connectivity at the
    boundary between an institution and the general network; with P2P traffic
    becoming a major portion of traffic, and billing often being based on usage,
    some institutions have taken steps to limit or curb use of P2P protocols. The
    authors propose that perhaps P2P sharing protocols should be more intelligent,
    and allow clients to discover local clients that have already downloaded the
    file they seek; by adding locality to the protocol, the "boundary crossing" is
    minimized. Given the modeled behavior of quickly waning popularity, as
    supported by the empirical traffic, this strategy seems quite appropriate, and
    can be implemented either through proxy caching policy or changes to the P2P
    protocol. However, the legal challenges to some applications of P2P file
    sharing will cause many institutions to demur, lest they be accused of aiding in
    unlawful activities. Perhaps Shakespeare was right....
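
    The bandwidth argument for locality can be sketched in a few lines. This is
    an idealized illustration, not the authors' evaluation: the trace format and
    the function name are hypothetical, and it assumes that once any local client
    has downloaded a file, some local copy is always available to later
    requesters.

```python
def external_bytes(trace, locality_aware):
    """Bytes that cross the institution's boundary for a request trace.

    trace: iterable of (client, object_id, size_in_bytes) tuples for
           requests made by clients inside the institution.
    locality_aware: if True, a request for an object that some local
           client has already downloaded is served locally (idealized
           assumption: the local copy is always reachable).
    """
    seen_locally = set()
    total = 0
    for _client, object_id, size in trace:
        if locality_aware and object_id in seen_locally:
            continue  # served by a local peer; no boundary crossing
        total += size
        seen_locally.add(object_id)
    return total


trace = [("a", "f1", 100), ("b", "f1", 100), ("c", "f2", 50)]
print(external_bytes(trace, locality_aware=False))  # 250
print(external_bytes(trace, locality_aware=True))   # 150
```

    The same counting applies whether the local copy comes from a proxy cache or
    from a nearby peer; only the first fetch of each file crosses the boundary.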



    This archive was generated by hypermail 2.1.6 : Sun Mar 07 2004 - 00:31:35 PST