review of P2P measurement paper

From: Muench, Joanna (jmuench_at_fhcrc.org)
Date: Mon Mar 08 2004 - 15:07:43 PST

    Gummadi et al. (2003) discuss observations of a peer-to-peer file-sharing
    workload and present an alternative routing strategy to decrease bandwidth
    use. The authors note the marked change in Internet traffic over the last
    several years, as WWW traffic gives way to p2p sharing, especially on
    college campuses.

    Given the changing nature of Internet traffic, the paper analyzes the
    distribution of size and types of requests by studying 203 days of Kazaa
    requests on the UW network. The distribution of request sizes appears to be
    bimodal, with the majority of requests between 1 and 10 MB and a smaller
    peak above 100 MB. Most surprising is the patience of Kazaa users, with
    download times easily exceeding a week.

    The most important point in the entire paper is that Kazaa clients fetch
    objects at most once. Between the existence of large local storage devices,
    slow download times and immutability of the media, users have no incentive
    to access the same file repeatedly. This results in a non-Zipf popularity
    distribution, unlike most WWW traffic. The non-Zipf-ness of the distribution
    is not at all surprising, considering that the Zipf distribution was
    originally used to describe word use in language. Fortunately we re-use
    words frequently; therefore the Zipf distribution represents a type of
    sampling with replacement. However, the fetch-at-most-once dynamics of p2p
    file access are closer to a sampling-without-replacement scheme, and
    therefore could be expected to follow a different distribution. This part of the paper
    would have benefited from collaboration with a statistician.
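
    The with/without-replacement contrast can be illustrated with a small
    simulation. This is only a sketch under my own assumptions, not the
    paper's model: the function names, the Zipf exponent of 1.0, the
    population sizes, and the redraw-on-repeat rule are all hypothetical
    choices made for illustration.

```python
import random
from collections import Counter
from itertools import accumulate


def simulate_requests(n_clients, n_objects, requests_per_client,
                      at_most_once, alpha=1.0, seed=0):
    """Count requests per object when every client draws from a Zipf law.

    All parameters are illustrative assumptions, not values from the paper.
    With at_most_once=True a client redraws until it picks an object it has
    not fetched before (sampling without replacement per client).
    """
    if at_most_once and requests_per_client > n_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    objects = list(range(n_objects))  # object 0 is the most popular
    # Cumulative Zipf weights: weight of rank r is 1 / r**alpha.
    cum_weights = list(accumulate(1.0 / rank ** alpha
                                  for rank in range(1, n_objects + 1)))
    counts = Counter()
    for _ in range(n_clients):
        fetched = set()
        for _ in range(requests_per_client):
            obj = rng.choices(objects, cum_weights=cum_weights, k=1)[0]
            if at_most_once:
                while obj in fetched:  # reject repeats for this client
                    obj = rng.choices(objects, cum_weights=cum_weights, k=1)[0]
                fetched.add(obj)
            counts[obj] += 1
    return counts


if __name__ == "__main__":
    once = simulate_requests(500, 200, 20, at_most_once=True)
    repeat = simulate_requests(500, 200, 20, at_most_once=False)
    for rank in range(5):
        print(rank + 1, repeat[rank], once[rank])
```

    Under fetch-at-most-once, no object can be requested more often than
    there are clients, so the head of the popularity curve is capped while
    the same total number of requests spreads further into the tail.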

    The authors model the p2p file-sharing workloads based on their observations
    from the UW Kazaa trace. Like any model, some input parameters require
    estimation, such as the number of distinct objects. And like most modelers,
    the authors make an estimate and then fail to discuss how varying that
    estimate would alter the results. They do come to the interesting
    observation that the system evolves towards one with no locality. However,
    that observation rests on the assumptions that client request rates remain
    constant over time and that clients, with time, begin requesting a greater
    number of objects from the unpopular tail. The assumption of constant
    request rates contradicts the earlier observation in Figure 3 that clients
    request fewer objects as they age. But the model does
    provide some useful insights into the important influence of the rate at
    which new objects become available.
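
    The kind of sensitivity check that is missing could look like the sketch
    below, which varies the assumed number of distinct objects and reports
    the hit rate of an ideal, infinite shared cache. Everything here is a
    hypothetical stand-in rather than the authors' model: the function name,
    the Zipf exponent, the client and request counts, and the
    redraw-on-repeat rule are my own assumptions.

```python
import random
from itertools import accumulate


def ideal_cache_hit_rate(n_objects, n_clients=200, requests_per_client=10,
                         alpha=1.0, seed=0):
    """Hit rate of an infinite shared cache under fetch-at-most-once clients.

    Illustrative assumptions only: Zipf popularity with exponent alpha, and
    each client redraws until it picks an object it has not fetched before.
    """
    if requests_per_client > n_objects:
        raise ValueError("a client cannot fetch more distinct objects than exist")
    rng = random.Random(seed)
    objects = list(range(n_objects))
    cum_weights = list(accumulate(1.0 / rank ** alpha
                                  for rank in range(1, n_objects + 1)))
    cached = set()  # objects already fetched by some client
    hits = 0
    total = 0
    for _ in range(n_clients):
        fetched = set()  # objects this client already has
        while len(fetched) < requests_per_client:
            obj = rng.choices(objects, cum_weights=cum_weights, k=1)[0]
            if obj in fetched:
                continue  # fetch-at-most-once: redraw
            fetched.add(obj)
            total += 1
            if obj in cached:
                hits += 1
            else:
                cached.add(obj)
    return hits / total


if __name__ == "__main__":
    # Sweep the uncertain input parameter and report how the result moves.
    for n_objects in (100, 1_000, 10_000):
        print(n_objects, round(ideal_cache_hit_rate(n_objects), 3))
```

    Reporting a sweep like this alongside the chosen estimate would show how
    strongly the conclusions depend on the assumed number of distinct objects.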

    The authors conclude the paper with a potential new request routing scheme
    that utilizes the locality of the file-sharing workload. Because caching of
    files has potential legal ramifications, an organization-based,
    locality-aware mechanism offers a potential means to significantly reduce
    bandwidth use.
    They also summarize the importance of increased availability for the most
    popular servers.

    This paper provided an excellent overview of the issues involved in p2p file
    sharing. Unlike the designers of earlier distributed systems such as
    Grapevine, the authors have the benefit of being able to observe user
    activity before proposing a system architecture. Their observations are
    contrary to commonly accepted
    wisdom, but consistent with a fetch-at-most-once scenario.

