Gummadi paper review

From: Brian Milnes (brianmilnes_at_qwest.net)
Date: Mon Mar 08 2004 - 16:38:10 PST


    Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload -
    Gummadi et al

                The authors study the performance of the Kazaa peer-to-peer
    file-sharing system at the University of Washington, characterize this
    traffic, and show that it can be improved by exploiting locality. They
    traced 20 TB of Kazaa traffic over a 200-day period, modeled it, and ran
    trace-driven simulations to optimize it.

    They measured an anonymized trace of all Kazaa traffic at the boundary of
    the University of Washington. Kazaa is shown to be a batch-oriented system,
    with many small requests taking over a day and up to 20% of large-object
    downloads waiting a week to complete. As clients age they request less
    data, but still request something each week with a probability of about
    50%. Kazaa's workload is shown to be a mix of many small audio clips, a
    smaller intermediate mix of 10-100 MB objects, and video files over 100 MB,
    which account for the largest share of bytes transferred.

    The immutability of multimedia files causes Kazaa users to download an
    object at most once in 94% of cases and at most twice in 99% of cases. The
    most popular objects are new, and their popularity is fleeting; about three
    quarters of requests go to objects older than one month.

    Unlike that of WWW pages, the distribution of objects accessed is not Zipf.
    The authors posit that this is caused by the immutability of multimedia
    objects and the fetch-at-most-once behavior of Kazaa users. They
    demonstrate this by simulating requests to their 40,000 most popular
    objects in two cases: one in which clients can request an object repeatedly
    and one in which they cannot. The fetch-repeatedly case fits a Zipf
    distribution, and the authors compare this to other multimedia workload
    distributions.
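
    A minimal sketch of this effect (my own illustration, not the authors'
    simulator; all parameters are made up): sample requests from a Zipf
    distribution, once allowing repeated fetches and once enforcing
    fetch-at-most-once, and compare how much of the traffic the most popular
    objects receive.

        import random
        from collections import Counter

        def zipf_weights(n, alpha=1.0):
            # Unnormalized Zipf weights: the object of rank i has weight 1/i^alpha.
            return [1.0 / (i ** alpha) for i in range(1, n + 1)]

        def simulate(n_objects, n_clients, reqs_per_client, at_most_once):
            weights = zipf_weights(n_objects)
            counts = Counter()
            for _ in range(n_clients):
                seen = set()
                for _ in range(reqs_per_client):
                    obj = random.choices(range(n_objects), weights=weights)[0]
                    if at_most_once and obj in seen:
                        continue  # the client already holds this immutable object
                    seen.add(obj)
                    counts[obj] += 1
            return counts

        def top_share(counts, k=10):
            return sum(c for _, c in counts.most_common(k)) / sum(counts.values())

        # Under fetch-at-most-once the head of the distribution flattens:
        # the top objects receive a visibly smaller share of all requests.
        print("top-10 share, repeats allowed:", top_share(simulate(1000, 500, 100, False)))
        print("top-10 share, at most once:   ", top_share(simulate(1000, 500, 100, True)))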

    They next constructed a simplified model of Kazaa usage. Files are selected
    for download from a Zipf distribution, but new objects are inserted at
    random locations in this distribution and clients may download each object
    at most once. They studied the addition of a Kazaa cache to the model and
    show that it quickly becomes less effective as clients download the popular
    objects and move on to a large space of unpopular ones. The exception is
    that new popular objects are quickly cached.
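
    A rough sketch of why the cache decays (again my own construction with
    invented parameters, not the paper's simulator): run the model in rounds
    with an unbounded boundary cache, fetch-at-most-once clients, and new
    objects inserted at random ranks, and watch the hit rate fall once the
    popular objects have been fetched.

        import random
        from itertools import count

        def run_cache_model(n_objects=1000, n_clients=200, rounds=20,
                            reqs_per_round=20, arrivals_per_round=10, alpha=1.0):
            ids = count()
            ranked = [next(ids) for _ in range(n_objects)]  # ranked[i] has Zipf rank i
            cache = set()                                   # unbounded boundary cache
            seen = [set() for _ in range(n_clients)]
            for r in range(rounds):
                # New objects arrive at random ranks, pushing older objects down.
                for _ in range(arrivals_per_round):
                    ranked.insert(random.randrange(len(ranked)), next(ids))
                weights = [1.0 / ((i + 1) ** alpha) for i in range(len(ranked))]
                hits = misses = 0
                for c in range(n_clients):
                    for _ in range(reqs_per_round):
                        obj = random.choices(ranked, weights=weights)[0]
                        if obj in seen[c]:
                            continue          # fetch-at-most-once: client has it
                        seen[c].add(obj)
                        if obj in cache:
                            hits += 1
                        else:
                            misses += 1
                            cache.add(obj)    # new popular objects cache quickly
                print(f"round {r:2d}: hit rate {hits / (hits + misses):.2f}")

        run_cache_model()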

    They finally studied the use of locality in Kazaa. A proxy cache could
    absorb up to 86% of the bytes transferred but might have bad legal
    consequences. They propose instead redirecting requests at the boundary or
    using locality-aware clients. The unasked question is why redirecting is
    legally any better than a cache: the IT department is still helping
    organize the illegal download of copyrighted material. And why would
    exploiting locality not compromise Kazaa's primary goal of sharing illegal
    content safely?
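
    To make the redirection idea concrete, here is a toy sketch of a boundary
    redirector (the class, its method names, and the passive indexing scheme
    are my own guesses, not a design from the paper):

        from typing import Dict, List, Optional

        class BoundaryRedirector:
            """Toy locality-aware redirector at the organizational border."""

            def __init__(self) -> None:
                # Object hash -> internal peers believed to hold a copy,
                # learned passively from transfers crossing the boundary.
                self.index: Dict[str, List[str]] = {}

            def observe_transfer(self, obj_hash: str, internal_peer: str) -> None:
                self.index.setdefault(obj_hash, []).append(internal_peer)

            def redirect(self, obj_hash: str) -> Optional[str]:
                # Send the requester to an internal peer when one exists;
                # otherwise the request leaves for external peers as usual.
                peers = self.index.get(obj_hash)
                return peers[0] if peers else None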

    They use their trace data and the assumption of perfectly cooperating
    clients to show that locality awareness would save about 63% of the traffic
    leaving the network. Why is this an interesting measure when the University
    of Washington's huge size makes it roughly 1 in 50 of the largest clients?
    Would the result scale down to smaller universities? Nor do they describe
    how to change the Kazaa architecture to achieve a realistic fraction of
    this sharing.

    I'd say that this is a very nice paper in that it shows and analyzes a
    "flat-head Zipf" distribution in a real system. However, it is far from a
    finished piece of work in terms of demonstrating the benefits of locality
    in peer-to-peer systems.

