Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

From: Sellakumaran Kanagarathnam (sellak_at_windows.microsoft.com)
Date: Mon Mar 08 2004 - 16:14:57 PST

  • Next message: Brian Milnes: "Gummadi paper review"

    This paper analyzes the P2P workload seen in a 200-day trace of Kazaa
    P2P traffic at the University of Washington. The authors then develop a
    model of multimedia workloads and use it to study and confirm the
    various conclusions from the trace. Finally, the authors explore and
    suggest ways of utilizing locality-awareness to save external
    bandwidth. The paper presents a large amount of data in various graphs.

    There is a key point stressed in more than one place in the paper:
    Kazaa's workload is driven by immutable multimedia objects, which leads
    to fetch-at-most-once behavior; in contrast, a regular web page, like
    CNN or Google, may be fetched over and over again. There are two
    important traits of file-sharing systems: they use a P2P design, and
    the shared files are predominantly multimedia files. The authors define
    three goals for themselves: a) to understand the fundamental properties
    of file-sharing systems, b) to explore the forces driving P2P
    workloads, and c) to demonstrate performance optimization by exploiting
    untapped locality in the workload.

    The 200-day trace was collected at the University of Washington between
    May 28th and Dec 17th, 2002. Both hardware and software were used to
    collect the traces, installed at the network border between the
    university and the internet. The software used a kernel packet filter
    that delivered TCP packets to a user-level process. This process
    identified the HTTP requests (by looking for Kazaa-specific HTTP
    headers like X-Kazaa-IP). KazaaLite, which did not supply usernames in
    an HTTP header, was special-cased. Kazaa users are patient (nearly 20%
    of users are willing to wait a week to download a large file); on the
    web, by contrast, users expect instant gratification and are
    unforgiving if they do not receive it. Users slow down as they age, for
    two reasons: attrition, and older clients having slower request rates.
    Average session lengths are typically small (2.41 minutes for small
    files).
    There are three types of Kazaa objects: small (up to 10MB, typically
    audio files), medium (between 10MB and 100MB), and large (over 100MB,
    typically video files). The majority of requests (91%) are for small
    objects, but the majority of bytes transferred (65%) are due to large
    files. Further observations: Kazaa clients fetch objects at most once,
    the popularity of Kazaa objects is short-lived, the most popular Kazaa
    objects tend to be recently born, most requests are for old objects,
    and Kazaa popularity does not follow a Zipf distribution.
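    The trace methodology above can be sketched in a few lines. This is a
    minimal illustration, not the authors' actual collection code: it
    assumes HTTP headers have already been reconstructed into a dict, and
    uses the paper's X-Kazaa-* header heuristic and its size classes
    (small &lt; 10MB, medium 10-100MB, large &gt; 100MB).

```python
# Minimal sketch (not the authors' code): classify a reconstructed HTTP
# request as Kazaa traffic via the Kazaa-specific headers the paper
# mentions (e.g. X-Kazaa-IP), and bucket objects into the paper's three
# size classes.

def is_kazaa_request(headers):
    """Heuristic from the paper: Kazaa clients add X-Kazaa-* headers."""
    return any(h.lower().startswith("x-kazaa-") for h in headers)

def size_class(num_bytes):
    """Paper's classes: small <10MB (audio), medium 10-100MB, large >100MB (video)."""
    mb = num_bytes / (1024 * 1024)
    if mb < 10:
        return "small"    # typically audio
    elif mb <= 100:
        return "medium"
    else:
        return "large"    # typically video

print(is_kazaa_request({"X-Kazaa-IP": "1.2.3.4", "Host": "peer"}))  # True
print(size_class(5 * 1024 * 1024))     # small
print(size_class(700 * 1024 * 1024))   # large
```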

    The authors then discuss why Kazaa does not follow a Zipf distribution
    and compare Kazaa with other non-Zipf workloads. Next, the authors
    describe a model of P2P file-sharing behavior, with client requests
    based on underlying Zipf popularity distributions. Analysis shows the
    following:

    1) Fetch-at-most once client behavior, caused by the immutability of
    objects in P2P file-sharing systems, leads to significant deviation from
    Zipf popularity distributions

    2) As a result, without the introduction of new objects and clients,
    P2P file-sharing system performance decreases over time (bandwidth
    demands rise), because client requests "slide down the Zipf curve"

    3) The introduction of new objects in P2P file-sharing systems acts
    as a rejuvenating force that counter-balances the impact of
    fetch-at-most-once client behavior

    4) Introducing new clients does not have a similar effect: because new
    clients age at the same rate, they cannot counteract the hit-rate
    penalty of client aging.
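    The core of the model can be illustrated with a toy simulation. This
    is a sketch with assumed parameters (1,000 objects, 500 clients, 50
    requests each, Zipf exponent 1), not the authors' simulator: clients
    draw requests from an underlying Zipf popularity distribution, but
    under fetch-at-most-once a client never re-requests an object it has
    already fetched, so the head of the observed popularity curve flattens
    relative to fetch-repeatedly (Zipf) behavior.

```python
# Toy fetch-at-most-once model (assumed parameters, not the paper's
# simulator). Compares observed request counts for the most popular
# object under fetch-repeatedly vs. fetch-at-most-once behavior.
import random
from itertools import accumulate

random.seed(0)
N = 1000        # number of objects
CLIENTS = 500   # number of clients
REQS = 50       # requests per client
weights = [1.0 / (rank + 1) for rank in range(N)]   # Zipf(1) popularity
cum = list(accumulate(weights))                     # for fast sampling
objects = list(range(N))

def simulate(fetch_at_most_once):
    counts = [0] * N
    for _ in range(CLIENTS):
        seen = set()
        for _ in range(REQS):
            # Redraw until we get an object this client hasn't fetched
            # (only under fetch-at-most-once behavior).
            while True:
                obj = random.choices(objects, cum_weights=cum)[0]
                if not fetch_at_most_once or obj not in seen:
                    break
            seen.add(obj)
            counts[obj] += 1
    return sorted(counts, reverse=True)

zipf_counts = simulate(False)   # clients may re-fetch (Zipf-like)
famo_counts = simulate(True)    # fetch-at-most-once
# Each client requests the top object at most once under FAMO, so its
# count is capped at CLIENTS; under pure Zipf it is far higher.
print(zipf_counts[0], famo_counts[0])
```

    The most popular object's request count is capped at the number of
    clients under fetch-at-most-once, which is exactly the flattened
    "non-Zipf" head the paper observes in the Kazaa trace.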

    Because P2P file-sharing consumes a large portion of internet
    bandwidth, many organizations curb P2P bandwidth. Instead, exploiting
    locality could be helpful. Locality exploitation refers to using the
    content already available within an organization before going to the
    internet. A simulated ideal proxy cache would result in external
    bandwidth savings of 86%. Given the legal issues, deploying a proxy
    cache would not be feasible, so there are two potential implementations
    of a locality-aware architecture: centralized request redirection and
    decentralized request redirection.
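    The ideal-cache measurement above can be sketched as follows. This is
    an illustrative toy (hypothetical five-request trace, not the paper's
    data or its 86% figure): with an infinite, organization-wide cache,
    only the first fetch of each object costs external bandwidth, and
    every repeat is served internally.

```python
# Sketch of an "ideal" proxy-cache replay (hypothetical trace, not the
# paper's data): external bytes = bytes of first-time fetches only;
# savings = fraction of total bytes served from inside the organization.
def external_bandwidth_savings(trace):
    """trace: list of (object_id, size_bytes) requests, in order."""
    total = external = 0
    seen = set()
    for obj, size in trace:
        total += size
        if obj not in seen:       # cold miss: fetched from the internet
            seen.add(obj)
            external += size
    return 1.0 - external / total

trace = [("a", 100), ("b", 50), ("a", 100), ("a", 100), ("c", 200)]
# 350 of 550 bytes are cold misses, so 200/550 ~ 36% is saved.
print(external_bandwidth_savings(trace))
```

    Replaying the real trace through this kind of ideal cache is how the
    paper arrives at its 86% external-bandwidth-savings figure; the two
    redirection architectures approximate the same effect without a
    legally risky central store of content.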



    This archive was generated by hypermail 2.1.6 : Mon Mar 08 2004 - 16:15:04 PST