From: Richard Jackson (richja_at_expedia.com)
Date: Mon Mar 08 2004 - 16:45:02 PST
This paper, from the October 2003 SOSP conference, analyzes the key
factors that influence the Kazaa peer-to-peer (P2P) network. The paper
presents extensive quantitative analysis which provides new ideas about
the large-scale workings of P2P systems.
The paper is divided into the following sections: 1) measurement using
a trace, 2) Zipf's law analysis, 3) a model for P2P workloads, 4)
locality awareness.
This paper focuses on measurement and analysis of the widely-used Kazaa
P2P system. The paper attempts to thoroughly understand the workloads,
and also how various factors influence the workloads. A majority of
the findings are based on a 200-day trace that was taken during 2002 at
the University of Washington. The trace measured Kazaa HTTP requests
that were initiated within the University and were targeted outside the
University. The most important properties observed in the P2P traffic were: 1)
objects are immutable, 2) objects are downloaded at most once, 3) large
objects consume most of the bandwidth, 4) popularity of objects changes
frequently.
The section on Zipf attempted to understand the observed non-Zipf
properties of the P2P traffic. The conclusion is that the non-Zipf
behavior is caused by the P2P "download at most once" semantics. To me,
this really means that the queries are Zipf-like, but after the first
download, they are loaded from local cache instead of downloaded again.
If a user downloaded a file, used it, then deleted it, they would be
forced to download it again the next time they wanted to use it. The
Zipf properties would then apply. The "download at most once" property
is what makes P2P distinctly different from most Web traffic. The
immutable property is what makes this "download at most once" property
feasible.
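
A rough sketch of this effect, not taken from the paper: each client draws requests from a Zipf-like popularity distribution, but only the first request for a given object counts as a download. The function names, parameter values, and the exponent `alpha` are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def zipf_weights(n_objects, alpha=1.0):
    """Unnormalized Zipf weights: the object at rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n_objects + 1)]


def simulate(n_clients=200, n_objects=500, requests_per_client=100, seed=0):
    """Return (requests, downloads) counters keyed by object id (0 = most popular).

    Hypothetical toy simulation: requests follow Zipf, but a client
    downloads any given object at most once.
    """
    rng = random.Random(seed)
    objects = list(range(n_objects))
    weights = zipf_weights(n_objects)
    requests = Counter()
    downloads = Counter()
    for _ in range(n_clients):
        owned = set()  # objects this client already holds locally
        for obj in rng.choices(objects, weights=weights, k=requests_per_client):
            requests[obj] += 1
            if obj not in owned:
                owned.add(obj)
                downloads[obj] += 1
    return requests, downloads


if __name__ == "__main__":
    requests, downloads = simulate()
    for obj in range(5):
        print(obj, requests[obj], downloads[obj])
```

In this sketch the download count of any object can never exceed the number of clients, so the head of the download distribution is capped even though the underlying requests are Zipf-distributed.
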
A generic model was developed to allow P2P experiments under varying
parameters. For example, the paper focused on the arrival
rate for new objects and new clients. The authors found that 1) sharing
effectiveness diminishes with age - because all popular/cached files
have already been downloaded, 2) new object arrivals actually improve
performance - as they upset the previous popularity distributions. Some
components of the model include: # of clients, # of objects, object
birth rate, client birth rate, etc.
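
To make the shape of such a model concrete, here is a minimal sketch under stated assumptions. It is not the authors' model: the parameter names, the rule that new objects are born at a uniformly random popularity rank, the use of a shared unbounded cache as the measure of sharing effectiveness, and the omission of client births are all simplifications made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ModelParams:
    # All fields are hypothetical knobs for this sketch.
    n_clients: int = 100
    n_objects: int = 1000             # initial catalogue size
    object_births_per_round: int = 0  # new objects added each round
    requests_per_round: int = 50      # total requests issued each round
    alpha: float = 1.0                # Zipf exponent


def run_model(params, rounds=20, seed=0):
    """Return the per-round hit rate of a shared, unbounded cache."""
    rng = random.Random(seed)
    catalogue = list(range(params.n_objects))  # list index = popularity rank - 1
    owned = [set() for _ in range(params.n_clients)]
    cached = set()
    next_id = params.n_objects
    hit_rates = []
    for _round in range(rounds):
        # Object births: insert each new object at a random popularity rank.
        for _birth in range(params.object_births_per_round):
            catalogue.insert(rng.randrange(len(catalogue) + 1), next_id)
            next_id += 1
        weights = [1.0 / (r ** params.alpha) for r in range(1, len(catalogue) + 1)]
        hits = downloads = 0
        for _req in range(params.requests_per_round):
            client = rng.randrange(params.n_clients)
            obj = rng.choices(catalogue, weights=weights, k=1)[0]
            if obj in owned[client]:
                continue  # download-at-most-once: no new transfer
            owned[client].add(obj)
            downloads += 1
            if obj in cached:
                hits += 1
            else:
                cached.add(obj)
        hit_rates.append(hits / downloads if downloads else 0.0)
    return hit_rates
```

Comparing `run_model(ModelParams(object_births_per_round=0))` against a run with a positive birth rate is one way to explore the two findings above, though this toy version makes no claim to reproduce the paper's numbers.
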
Regarding locality awareness, the authors did a study to determine how
much efficiency could be gained by caching P2P traffic within an
environment such as the University of Washington. They found that
methods such as a local caching proxy server or a locality-aware P2P
redirector could provide at least a 63% cache hit ratio within the
controlled network. The cost savings of this are obvious and
significant.
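
As an illustration of how a hit ratio like this could be computed from a trace, the sketch below replays requests through an idealized proxy cache. It assumes an unbounded cache and a hypothetical trace format of `(client_id, object_id, size_bytes)` tuples; neither is taken from the paper.

```python
def proxy_hit_ratio(trace):
    """Replay a trace through an idealized, unbounded proxy cache.

    trace: iterable of (client_id, object_id, size_bytes) tuples
           (a hypothetical format for this sketch).
    Returns (request_hit_ratio, byte_hit_ratio).
    """
    cached = set()
    hits = total = 0
    hit_bytes = total_bytes = 0
    for _client, obj, size in trace:
        total += 1
        total_bytes += size
        if obj in cached:
            # Served from inside the network; no external transfer needed.
            hits += 1
            hit_bytes += size
        else:
            # First request for this object: fetch externally, then cache it.
            cached.add(obj)
    request_ratio = hits / total if total else 0.0
    byte_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return request_ratio, byte_ratio


if __name__ == "__main__":
    example = [("a", "x", 100), ("b", "x", 100), ("c", "y", 300), ("a", "z", 500)]
    print(proxy_hit_ratio(example))
```

Reporting both ratios is a deliberate choice here: since large objects consume most of the bandwidth, the byte hit ratio can differ noticeably from the request hit ratio.
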
About the statement that "Users Slow Down As They Age," I think this may
be due to the users becoming more literate in the capabilities of the
P2P system. That is, a more experienced user will make careful
decisions about what they want to download based on latency statistics
or other factors. A naive, new user is likely to download a huge number
of objects (even many duplicates) without any awareness of the
likely outcome.
Overall, this paper provided a large amount of valuable analysis regarding
P2P traffic. Much of the discussion was novel and many interesting
questions were raised. The section on Zipf was very good, as the
authors questioned previous work that has been widely adopted. I think
this paper was done at the right time and provided some key information
that will be difficult to obtain again, as P2P systems are growing
and mutating at a fast pace. One aspect of the paper that seemed weak
was the effect of local caching within the University. How much did
this contribute to the long-term scalability results?