P2P Measurements review

From: Praveen Rao (psrao_at_windows.microsoft.com)
Date: Mon Mar 08 2004 - 17:39:01 PST

  • Next message: David V. Winkler: "Review: Measurement, Modeling, and Analysis of a Peer-to-Peer File Sharing Workload"

    In this paper the authors discuss the nature of P2P workloads and
    contrast them with HTTP workloads on the Internet. They use
    measurements of Kazaa traffic to analyse P2P workloads.

    The major differences the authors cite between Kazaa and web workloads
    are that:
    * Kazaa objects are immutable and are fetched 'at most once' per client
    * the popularity distribution of Kazaa objects deviates significantly
    from a Zipf curve
    * while the web workload is driven by document change, the Kazaa
    workload is driven by the addition of new objects and users

    As for the Kazaa users, the findings are:
    * Kazaa users are patient: some large objects can take on the order of
    weeks to download
    * Users slow down as they age (they ask for less, even though they use
    the system as often as new clients)

    The Kazaa workload can be classified into small (<10 MB) and large
    (over 10 MB, possibly GBs) objects. Even though small objects consume
    less bandwidth overall, they are important for user experience.
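
    To make the small/large split concrete, here is a minimal sketch that
    buckets a request trace at the 10 MB boundary and compares each bucket's
    share of requests with its share of bytes. The trace is made up for
    illustration (it is not data from the paper), the function names are
    hypothetical, and treating "10 MB" as 10 * 1024 * 1024 bytes (with an
    object of exactly that size counted as large) is my assumption.

```python
# Assumed meaning of "10 MB"; the review does not say whether MB is 10**6 or 2**20.
SMALL_LIMIT_BYTES = 10 * 1024 * 1024


def split_by_size(requests):
    """Tally requests and bytes per size class.

    requests: iterable of (object_id, size_in_bytes) pairs, one pair per request.
    """
    stats = {
        "small": {"requests": 0, "bytes": 0},
        "large": {"requests": 0, "bytes": 0},
    }
    for _object_id, size in requests:
        bucket = "small" if size < SMALL_LIMIT_BYTES else "large"
        stats[bucket]["requests"] += 1
        stats[bucket]["bytes"] += size
    return stats


def shares(stats):
    """Convert raw tallies into fractions of all requests and all bytes.

    Assumes the trace was non-empty (otherwise the totals are zero).
    """
    total_requests = sum(b["requests"] for b in stats.values())
    total_bytes = sum(b["bytes"] for b in stats.values())
    return {
        name: {
            "request_share": b["requests"] / total_requests,
            "byte_share": b["bytes"] / total_bytes,
        }
        for name, b in stats.items()
    }


MB = 1024 * 1024
# Made-up trace: three requests for small objects, one for a large object.
trace = [
    ("song-a", 4 * MB),
    ("song-b", 5 * MB),
    ("song-a", 4 * MB),
    ("movie-x", 700 * MB),
]
result = shares(split_by_size(trace))
```

    In this toy trace the small objects account for three of the four
    requests but only a small fraction of the bytes, which is the kind of
    split that makes small objects matter for user experience even when
    large objects dominate bandwidth.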

    Kazaa objects have the following dynamics:
    * fetch once - clients fetch most objects just once
    * the popularity of Kazaa objects is short-lived
    * the most popular objects tend to be new-born objects (new multimedia
    content)
    * most requests are for old objects: despite the previous finding,
    requests for old objects are a significant portion of the total
    requests (I guess people catch up slowly with new content)

    Kazaa objects do not follow a Zipf curve. The difference is most marked
    for the most popular objects: while a Zipf curve would suggest
    extremely high popularity for them, the Kazaa curve has a flattened
    head, i.e. the most popular multimedia object is significantly less
    popular than Zipf would predict.

    The authors give two main reasons why Kazaa objects do not follow a
    Zipf curve:
    * immutability of objects
    * fetch-once semantics (unlike web content)

    The authors present a new model for P2P file-sharing workloads. In this
    model there are three factors at work:
    1) An underlying Zipf distribution
    2) The way new objects are inserted into the system
    3) The client's fetch-once behavior
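
    The interaction of factors 1 and 3 can be sketched with a small
    simulation. This is only my illustration, not the authors' model: it
    leaves out factor 2 (new object arrivals) entirely, and the function
    names, the Zipf exponent of 1.0 and the population sizes are all
    assumptions. Each client draws objects from a Zipf distribution; with
    fetch_once=True a client redraws whenever it picks an object it already
    has.

```python
import random
from collections import Counter


def zipf_weights(n_objects, alpha=1.0):
    """Unnormalized Zipf weights: the object of rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n_objects + 1)]


def simulate_requests(n_clients, n_objects, requests_per_client, fetch_once, seed=0):
    """Return a Counter mapping object index (0 = most popular) to request count."""
    if fetch_once and requests_per_client > n_objects:
        raise ValueError("a fetch-once client cannot fetch more objects than exist")
    rng = random.Random(seed)
    weights = zipf_weights(n_objects)
    objects = list(range(n_objects))
    counts = Counter()
    for _ in range(n_clients):
        fetched = set()  # objects this client already holds
        for _ in range(requests_per_client):
            obj = rng.choices(objects, weights=weights)[0]
            if fetch_once:
                # Redraw until the client picks something it has not fetched yet.
                while obj in fetched:
                    obj = rng.choices(objects, weights=weights)[0]
                fetched.add(obj)
            counts[obj] += 1
    return counts


# Hypothetical sizes: 100 clients, 200 objects, 20 requests per client.
plain = simulate_requests(100, 200, 20, fetch_once=False)
once = simulate_requests(100, 200, 20, fetch_once=True)
```

    Under fetch-once, no object can be requested more often than there are
    clients, so the request count of the top-ranked object is capped, while
    without it the same object can be requested many times by one client.
    Comparing plain[0] with once[0] shows the flattened head in this toy
    setting.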

    Another important aspect of P2P workloads is that the cache hit ratio
    goes down as clients age. This worsens performance (bandwidth demands
    rise).

    The arrival of new objects improves performance. New objects
    counterbalance the effect of fetch-once behavior and make sharing
    beneficial again.

    New clients, on the other hand, do not compensate for the performance
    loss due to aging clients. This is because new clients cannot
    counteract the hit penalty of clients aging, which occurs at the same
    rate.

    The authors show measurements to highlight the benefits of local
    caching of P2P objects, which can greatly reduce the external bandwidth
    consumed.
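
    As a rough illustration of why a local cache helps, the sketch below
    counts the bytes that have to cross the external link for a request
    trace, with and without a shared cache. It assumes an unbounded cache
    that stores whole objects and never evicts, which is my simplification;
    the trace and the function name are hypothetical and not taken from the
    paper.

```python
def external_bytes(trace, use_cache):
    """Count bytes fetched from outside the local network.

    trace: list of (client_id, object_id, size_in_bytes) in request order.
    With use_cache=True, any object already fetched by some local client
    is served from the shared cache and costs no external bandwidth.
    """
    cached = set()
    external = 0
    for _client, obj, size in trace:
        if use_cache and obj in cached:
            continue  # cache hit: served locally
        external += size
        cached.add(obj)
    return external


# Made-up trace: objects "a" and "b" are each requested by two clients.
trace = [
    ("c1", "a", 100),
    ("c2", "a", 100),
    ("c3", "b", 50),
    ("c1", "b", 50),
    ("c2", "c", 10),
]
without_cache = external_bytes(trace, use_cache=False)
with_cache = external_bytes(trace, use_cache=True)
saved_fraction = (without_cache - with_cache) / without_cache
```

    Because Kazaa objects are immutable, a cached copy never goes stale,
    so the only misses in this sketch are first-time fetches of an object.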

    This paper was a great analysis of P2P workloads, which I always thought
    to be a 'little different' from web workloads (primarily due to
    multimedia content and its size). I hadn't thought about all the
    implications the paper talks about.

    The organization of the paper was also very good, with a summary
    presented in each section.


