From: Sellakumaran Kanagarathnam (sellak_at_windows.microsoft.com)
Date: Mon Mar 08 2004 - 16:14:57 PST
This paper analyzes P2P workloads by looking at a 200-day trace of Kazaa
P2P traffic at the University of Washington. The authors then develop a
model of multimedia workloads and use it to study and confirm the
various conclusions from the trace. At the end, the authors explore and
suggest ways for utilizing locality-awareness to improve external
bandwidth savings. The paper presents a lot of data in various graphs.
One key point is stressed in more than one place in the
paper: Kazaa's workload is driven by immutable multimedia objects, which
leads to fetch-at-most-once behavior; in contrast, a regular web page
such as CNN or Google may be fetched over and over again. File-sharing
systems have two important traits: a P2P design, and shared files that
are predominantly multimedia. The authors set themselves three goals:
a) to understand the fundamental properties of file-sharing systems,
b) to explore the forces driving P2P workloads, and c) to demonstrate
performance optimizations that exploit untapped locality in the workload.
The 200-day trace was collected at the University of Washington between
May 28th and Dec 17th 2002. Both hardware and software were used to
collect the trace, and these were installed at the network border
between the university and the Internet. The software used a kernel packet
filter that delivered TCP packets to a user-level process. This process
identified Kazaa HTTP requests by looking for Kazaa-specific HTTP headers
such as X-Kazaa-IP. KazaaLite, which did not supply usernames as an HTTP
header, was special-cased. Kazaa users are patient: nearly 20% of users
are willing to wait a week to download a large file. On the Web, by
contrast, users expect instant gratification and are unforgiving
if they do not receive it. Users slow down as they age, for two
reasons: attrition, and slower request rates among older clients. Average
session lengths are typically short (2.41 minutes for small files).
Kazaa objects fall into three size classes: small (up to 10MB, typically
audio files), medium (between 10MB and 100MB), and large (over 100MB,
typically video files). The majority of requests (91%) are
for small objects, but the majority of bytes transferred (65%) are due to
large files. The trace analysis yields several findings: Kazaa clients
fetch objects at most once; the popularity of Kazaa objects is
short-lived; the most popular Kazaa objects tend to be recently born,
while most requests are for old objects; and the Kazaa workload
does not follow a Zipf distribution.
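To make the request-versus-byte split concrete, here is a minimal Python sketch that buckets objects by the size thresholds above and computes each class's share of requests and of bytes. The helper names, the use of decimal megabytes, and the toy trace are all assumptions for illustration; none of them come from the paper.

```python
from collections import Counter

MB = 10**6  # assumption: decimal megabytes; the review does not say which

def size_class(num_bytes):
    """Bucket an object: small <= 10 MB, medium <= 100 MB, large above that."""
    if num_bytes <= 10 * MB:
        return "small"
    if num_bytes <= 100 * MB:
        return "medium"
    return "large"

def shares(request_sizes):
    """Return (request share, byte share) per size class for a list of sizes."""
    req_count = Counter()
    byte_count = Counter()
    for size in request_sizes:
        cls = size_class(size)
        req_count[cls] += 1
        byte_count[cls] += size
    total_reqs = sum(req_count.values())
    total_bytes = sum(byte_count.values())
    req_share = {c: n / total_reqs for c, n in req_count.items()}
    byte_share = {c: n / total_bytes for c, n in byte_count.items()}
    return req_share, byte_share

# Made-up toy trace: nine 4 MB audio files and one 700 MB video.
req_share, byte_share = shares([4 * MB] * 9 + [700 * MB])
print(req_share)
print(byte_share)
```

Even in this toy trace the small objects dominate the request count while the single large object dominates the bytes, which is the same shape as the 91% and 65% figures reported above.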
The authors then discuss why Kazaa does not follow a Zipf distribution
and compare Kazaa with other non-Zipf workloads. Next, the authors
describe a model of P2P file-sharing behavior, with client requests
based on underlying Zipf popularity distributions. Analysis shows the
following:
1) Fetch-at-most-once client behavior, caused by the immutability of
objects in P2P file-sharing systems, leads to significant deviation from
Zipf popularity distributions.
2) As a result, without the introduction of new objects and clients,
P2P file-sharing system performance decreases over time (bandwidth
demands rise), because client requests "slide down the Zipf curve".
3) The introduction of new objects in P2P file-sharing systems acts
as a rejuvenating force that counter-balances the impact of
fetch-at-most-once client behavior.
4) Introducing new clients does not have a similar effect, because
they cannot counteract the hit rate penalty of client aging, which
occurs at the same rate.
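The first point can be illustrated with a small simulation. The sketch below is not the authors' model, only a simplified stand-in: every client draws from the same Zipf popularity law (exponent 1), and in fetch-at-most-once mode a client redraws whenever it picks an object it already has. The function name, the parameter values, and the redraw rule are assumptions made for this example.

```python
import random
from collections import Counter
from itertools import accumulate

def simulate(num_clients, num_objects, requests_per_client, at_most_once, seed=0):
    """Count requests per object rank when every client draws from one Zipf law."""
    if at_most_once and requests_per_client > num_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    ranks = range(num_objects)
    # Zipf with exponent 1: the object of rank r has weight 1 / (r + 1).
    cum_weights = list(accumulate(1.0 / (r + 1) for r in ranks))
    counts = Counter()
    for _ in range(num_clients):
        fetched = set()
        for _ in range(requests_per_client):
            obj = rng.choices(ranks, cum_weights=cum_weights)[0]
            # Fetch-at-most-once: redraw until the object is new to this client.
            while at_most_once and obj in fetched:
                obj = rng.choices(ranks, cum_weights=cum_weights)[0]
            fetched.add(obj)
            counts[obj] += 1
    return counts

plain = simulate(100, 500, 50, at_most_once=False)
once = simulate(100, 500, 50, at_most_once=True)
print("rank 0 requests, repeat fetches allowed:", plain[0])
print("rank 0 requests, fetch-at-most-once:   ", once[0])
```

With fetch-at-most-once, no object can receive more requests than there are clients, so the head of the popularity curve flattens and the remaining requests move to less popular objects. That is the "slide down the Zipf curve" effect described in point 2.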
Because P2P file sharing consumes a large portion of Internet bandwidth,
many organizations curb P2P bandwidth. Exploiting locality
could be a helpful alternative. Locality exploitation means using content
already available within an organization before going to the Internet. A
simulated ideal proxy cache would yield external bandwidth
savings of 86%. Given the legal issues, deploying a proxy cache would
not be feasible, which leaves two potential implementations of a
locality-aware architecture: centralized request redirection and
decentralized request redirection.
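As a rough illustration of the centralized variant, the sketch below keeps a single index of which internal peers hold which object and sends a request outside the organization only when no internal peer has the object. The Redirector class, its methods, and the example names are hypothetical, and the sketch assumes every peer stays online and keeps everything it downloads, which a real deployment could not rely on.

```python
from collections import defaultdict

class Redirector:
    """Hypothetical central index of which internal peers hold which object."""

    def __init__(self):
        self.holders = defaultdict(set)  # object id -> internal peers holding it
        self.internal_bytes = 0
        self.external_bytes = 0

    def request(self, client, obj, size):
        """Serve obj from an internal peer if one holds it; return the source used."""
        peers = self.holders[obj] - {client}
        if peers:
            source = min(peers)  # deterministic pick; any holder would do
            self.internal_bytes += size
        else:
            source = "external"
            self.external_bytes += size
        # Assumption: the requester keeps the object and can serve it later.
        self.holders[obj].add(client)
        return source

    def savings(self):
        """Fraction of all bytes that were served from inside the organization."""
        total = self.internal_bytes + self.external_bytes
        return self.internal_bytes / total if total else 0.0

r = Redirector()
first = r.request("peer-a", "song", 4)     # nobody inside has it yet
second = r.request("peer-b", "song", 4)    # peer-a can serve it
third = r.request("peer-c", "movie", 700)  # must go outside
print(first, second, third, r.savings())
```

A decentralized variant would replace the single index with a lookup spread across the peers themselves; the bookkeeping of internal versus external bytes would stay the same.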