Review: Gummadi et al., Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload.

From: Richard Jackson (richja_at_expedia.com)
Date: Mon Mar 08 2004 - 16:45:02 PST

  • Next message: Prasanna Kumar Jayapal: "Review of Measurements, Modelling, and Analysis of a P2P System (by Gummadi et al)"

    This paper, from the October 2003 SOSP conference, analyzes the key
    factors that influence workloads on the Kazaa peer-to-peer (P2P)
    network. Its extensive quantitative analysis offers new insight into
    how large-scale P2P systems behave.
     
    The paper is divided into the following sections: 1) measurement using
    a trace, 2) Zipf's law analysis, 3) a model for P2P workloads, and 4)
    locality awareness.
     
    This paper focuses on measuring and analyzing the widely used Kazaa P2P
    system. It attempts to characterize the workload thoroughly and to
    understand how various factors influence it. Most of the findings are
    based on a 200-day trace collected in 2002 at the University of
    Washington. The trace captured Kazaa HTTP requests initiated by clients
    inside the University and directed to peers outside it. The most
    important properties of the P2P traffic were: 1) objects are immutable,
    2) each client downloads a given object at most once, 3) large objects
    consume most of the bandwidth, and 4) object popularity changes
    frequently.
     
    The Zipf section attempts to explain why the observed P2P traffic does
    not follow Zipf's law. The conclusion is that the non-Zipf behavior is
    caused by the P2P "download at most once" semantics. To me, this means
    that the underlying requests are Zipf-like, but after the first
    download a user reads the object from local storage instead of
    downloading it again. If a user downloaded a file, used it, and then
    deleted it, they would be forced to download it again the next time
    they wanted it, and the Zipf properties would then apply. The "download
    at most once" property is what makes P2P traffic distinctly different
    from most Web traffic, and the immutability of objects is what makes
    this property feasible.
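
    To make this argument concrete, here is a small Python simulation
    sketch. It is not the paper's model: the function name, the Zipf
    exponent, and the rule that a client who draws an object it already
    holds simply draws again are all assumptions made for illustration. It
    compares per-object download counts when clients may re-fetch objects
    with the counts under "download at most once".

```python
import itertools
import random
from collections import Counter


def simulate_downloads(num_clients, num_objects, requests_per_client,
                       at_most_once, alpha=1.0, seed=0):
    """Count downloads per object (object 0 is the most popular).

    Hypothetical model: each request draws an object from a Zipf(alpha)
    distribution. With at_most_once=True, a client that draws an object it
    already holds keeps redrawing until it gets a new one (an assumption)."""
    if at_most_once and requests_per_client > num_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    # Cumulative Zipf weights: the object at rank r has weight 1 / r**alpha.
    cum = list(itertools.accumulate(1.0 / r ** alpha
                                    for r in range(1, num_objects + 1)))
    objects = range(num_objects)
    downloads = Counter()
    for _ in range(num_clients):
        held = set()  # objects this client has already downloaded
        for _ in range(requests_per_client):
            obj = rng.choices(objects, cum_weights=cum)[0]
            while at_most_once and obj in held:
                obj = rng.choices(objects, cum_weights=cum)[0]
            held.add(obj)
            downloads[obj] += 1
    return downloads


if __name__ == "__main__":
    repeat = simulate_downloads(100, 1000, 50, at_most_once=False)
    once = simulate_downloads(100, 1000, 50, at_most_once=True)
    for rank in range(5):
        print(f"rank {rank + 1}: re-fetch {repeat[rank]:4d}, "
              f"at most once {once[rank]:4d}")
```

    Under at-most-once semantics, no object can be downloaded more times
    than there are clients. In this sketch, that caps the head of the
    popularity curve well below what the re-fetch run produces.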
     
    The authors developed a generic model that allows P2P experiments with
    varying parameters. Its components include the number of clients, the
    number of objects, the object birth rate, and the client birth rate,
    among others. The paper focused in particular on the arrival rates of
    new objects and new clients, and found that 1) sharing effectiveness
    diminishes over time, because clients have already downloaded the
    popular objects, and 2) new object arrivals actually improve
    performance, because they disrupt the existing popularity distribution.
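
    The interaction between aging clients and object births can be explored
    with a toy simulation. The Python sketch below is not the authors'
    model: the function name, the parameter values, and the rule that newly
    born objects take the most popular ranks are hypothetical
    simplifications. For each period, it reports the fraction of requests
    that an unbounded shared cache inside the organization could serve.

```python
import itertools
import random


def simulate_hit_rates(num_clients, num_objects, periods, requests_per_period,
                       births_per_period, alpha=1.0, seed=0):
    """Return the per-period hit rate of a shared, unbounded proxy cache.

    Hypothetical assumptions of this sketch: popularity is Zipf(alpha) over a
    rank list, each client fetches an object at most once (redrawing on a
    repeat), and newly born objects take the most popular ranks."""
    rng = random.Random(seed)
    ranking = list(range(num_objects))  # ranking[0] is the most popular object
    next_id = num_objects
    held = [set() for _ in range(num_clients)]  # objects each client holds
    cache = set()  # objects already fetched into the organization
    rates = []
    for _ in range(periods):
        # Object births: the new objects are inserted at the top ranks.
        ranking = list(range(next_id, next_id + births_per_period)) + ranking
        next_id += births_per_period
        cum = list(itertools.accumulate(1.0 / r ** alpha
                                        for r in range(1, len(ranking) + 1)))
        ranks = range(len(ranking))
        hits = served = 0
        for _ in range(requests_per_period):
            client = rng.randrange(num_clients)
            if len(held[client]) == len(ranking):
                continue  # this client already holds every object
            obj = ranking[rng.choices(ranks, cum_weights=cum)[0]]
            while obj in held[client]:
                obj = ranking[rng.choices(ranks, cum_weights=cum)[0]]
            held[client].add(obj)
            served += 1
            if obj in cache:
                hits += 1  # another client already brought it inside
            else:
                cache.add(obj)
        rates.append(hits / served if served else 0.0)
    return rates


if __name__ == "__main__":
    static = simulate_hit_rates(50, 2000, 10, 500, births_per_period=0)
    growing = simulate_hit_rates(50, 2000, 10, 500, births_per_period=20)
    for period, (a, b) in enumerate(zip(static, growing), start=1):
        print(f"period {period}: no births {a:.2f}, with births {b:.2f}")
```

    Comparing the two printed columns under different parameters is one
    way to see how object births interact with fetch-at-most-once clients.
    The numbers depend entirely on the assumed parameters and are not the
    paper's results.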
     
    Regarding locality awareness, the authors studied how much efficiency
    could be gained by caching P2P traffic within an organization such as
    the University of Washington. They found that approaches such as a
    local caching proxy server or a locality-aware P2P redirector could
    achieve a cache hit ratio of at least 63% within the controlled
    network. The resulting cost savings are obvious and significant.
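
    In essence, an ideal-cache analysis like this replays the trace and
    counts requests for objects that some client inside the network has
    already fetched. The Python sketch below assumes a simplified trace of
    hypothetical (client, object, size) records and an unbounded cache. It
    does not reproduce the paper's numbers or its redirector design.

```python
from typing import Iterable, Tuple


def ideal_cache_hit_ratios(
        trace: Iterable[Tuple[str, str, int]]) -> Tuple[float, float]:
    """Return (request hit ratio, byte hit ratio) for an unbounded shared cache.

    Each trace record is a hypothetical (client_id, object_id, size_bytes)
    tuple. A request is a hit if any client inside the network fetched the
    same object earlier in the trace."""
    seen = set()
    requests = hits = 0
    total_bytes = hit_bytes = 0
    for _client, obj, size in trace:
        requests += 1
        total_bytes += size
        if obj in seen:
            hits += 1
            hit_bytes += size
        else:
            seen.add(obj)  # the first fetch must cross the external link
    request_ratio = hits / requests if requests else 0.0
    byte_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return request_ratio, byte_ratio


if __name__ == "__main__":
    toy_trace = [
        ("a", "song", 5), ("b", "song", 5),
        ("c", "movie", 700), ("d", "movie", 700), ("e", "movie", 700),
    ]
    print(ideal_cache_hit_ratios(toy_trace))
```

    Sizes are in arbitrary units here. Because large objects consume most
    of the P2P bandwidth, the byte hit ratio is arguably the more relevant
    measure of the external bandwidth saved.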
     
    As for the finding that "Users Slow Down As They Age," I think this may
    be because users become more familiar with the capabilities of the P2P
    system. A more experienced user makes careful decisions about what to
    download based on latency statistics or other factors, whereas a naive
    new user is likely to download a huge number of objects (even many
    duplicates) without any awareness of the likely outcome.
     
    Overall, this paper provides a large amount of valuable analysis of P2P
    traffic. Much of the discussion was novel, and it raised many
    interesting questions. The Zipf section was especially good, because
    the authors questioned previous work that has been widely adopted. I
    think this paper came at the right time and captured key information
    that will be difficult to obtain again, since P2P systems are growing
    and changing rapidly. One aspect of the paper that seemed weak was its
    treatment of local caching within the University: how much did it
    contribute to the long-term scalability results?
     
     
     

