Review of An Analysis of Internet Content Delivery Systems

From: Alan L. Liu (aliu@cs.washington.edu)
Date: Mon Nov 08 2004 - 00:16:20 PST

    The paper examined UW's inbound and outbound HTTP traffic, specifically
    looking at the share of it consumed by the Web and certain P2P applications.

    The paper breaks down its nine-day trace into WWW, Akamai, Kazaa, and
    Gnutella traffic. One strength of the paper is how it picks out the
    significance of the findings, namely that the number of P2P requests
    (at least those that are HTTP-driven) is two orders of magnitude smaller
    than the number of WWW requests, but the average size of a requested
    object is three orders of magnitude larger, so the result is that there
    are actually twice as many open P2P connections as WWW connections.
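        A rough back-of-envelope sketch of that arithmetic (the rates,
    sizes, and throughput below are invented for illustration, not taken
    from the trace): by Little's law, the number of open connections is
    roughly the request rate times the transfer duration, so far fewer but
    far larger transfers can still dominate the connection count.

        # Illustrative numbers only; the paper's measured factor of two
        # depends on the actual request rates and per-connection throughput.
        www_req_rate = 100.0                 # WWW requests/sec (made up)
        p2p_req_rate = www_req_rate / 100    # two orders of magnitude fewer
        www_obj_size = 10e3                  # ~10 KB per WWW object (made up)
        p2p_obj_size = www_obj_size * 1000   # three orders of magnitude larger
        per_conn_bw  = 50e3                  # bytes/sec per connection (made up)

        # Little's law: open connections ~= arrival rate * transfer duration
        www_conns = www_req_rate * (www_obj_size / per_conn_bw)
        p2p_conns = p2p_req_rate * (p2p_obj_size / per_conn_bw)
        print(www_conns, p2p_conns)  # the much larger objects win out
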
            What is also significant is that adding a modest number of new P2P
    clients (assuming they have the same usage characteristics as
    preexisting clients) would markedly increase UW's total network traffic.
    This suggests that network administrators at UW and at similar
    institutions may need to brace for sharply increased bandwidth demands.

    I felt the weakest part of the paper was its description of how caching
    could help alleviate P2P bandwidth usage. It was much shorter than the
    exhaustive sections analyzing the traces, which, although somewhat
    interesting, contained a lot of obvious findings. Certainly there is a
    place for collecting data to back up intuition, but I felt too much time
    was spent on non-earth-shattering findings (such as that popular video
    files are approximately the size of a CD-R's capacity, 700 MB, which
    anybody who uses a P2P program for an hour will intuit). It felt like
    the paper merely threw a bone to those hoping to read more about
    caching. For example, UW is a net provider of content, so the paper
    suggests that a reverse cache to satisfy outbound traffic might be
    more beneficial. But what does "beneficial" mean? In the current
    context, the campus bottleneck link is probably the one from the campus
    border to the rest of the Internet. A reverse cache would only lessen
    UW's intra-network traffic, while having a negligible effect on usage
    of that bottleneck link. On the other hand, a forward cache that could
    divert inbound traffic, turning it into an intra-network connection,
    would have a dramatic effect on freeing up the scarcest resource.
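        To make the "beneficial" question concrete, here is a sketch with
    invented volumes and hit rates (none of these figures come from the
    paper) of where each kind of cache actually saves bytes:

        # Hypothetical volumes and hit rates, purely to illustrate the point.
        inbound_bytes  = 20e12  # inbound P2P bytes over a trace (made up)
        outbound_bytes = 40e12  # outbound P2P bytes over a trace (made up)
        fwd_hit_rate   = 0.3    # inbound bytes a forward cache could serve (guess)
        rev_hit_rate   = 0.8    # outbound bytes a reverse cache could serve (guess)

        # A forward cache keeps repeat inbound fetches off the border link.
        border_saved_by_forward = inbound_bytes * fwd_hit_rate

        # A reverse cache still ships every byte of an external request
        # across the border; it only spares UW's internal servers and links.
        border_saved_by_reverse = 0.0
        intra_saved_by_reverse  = outbound_bytes * rev_hit_rate

        print(border_saved_by_forward, border_saved_by_reverse,
              intra_saved_by_reverse)
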
            Another potential problem with the caching section is its claim that
    an increase in the P2P client population would have a positive effect on
    cache hit rates. The authors work under the assumption that the
    distribution of object requests stays fixed as the population grows, but
    with a larger sample of clients, wouldn't a wider variety of requested
    objects ensue? Greater variety would dilute the overlap that a shared
    cache depends on.
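        A quick simulation sketch of the disagreement (the catalog sizes,
    client counts, and Zipf parameter are all invented): with a fixed set of
    objects, a larger population does raise the hit rate of a shared cache,
    but if the catalog of requested objects grows along with the population,
    much of that gain can evaporate.

        import random

        def hit_rate(num_clients, reqs_per_client, num_objects, zipf_s=1.0):
            # Zipf-like popularity over a fixed catalog; unbounded shared cache.
            weights = [1.0 / rank ** zipf_s for rank in range(1, num_objects + 1)]
            requests = random.choices(range(num_objects), weights=weights,
                                      k=num_clients * reqs_per_client)
            cached, hits = set(), 0
            for obj in requests:
                if obj in cached:
                    hits += 1
                else:
                    cached.add(obj)
            return hits / len(requests)

        # Fixed catalog: the hit rate climbs with population (the paper's claim).
        print(hit_rate(100, 20, 50000), hit_rate(1000, 20, 50000))
        # Catalog that grows with the population: the gain shrinks (my worry).
        print(hit_rate(1000, 20, 500000))
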
            One limitation of applying the paper to current Internet traffic is
    that BitTorrent now accounts for a plurality of the bandwidth used and is not
    HTTP-based, nor is there a requirement to use standard ports for
    connections (see http://in.tech.yahoo.com/041103/137/2ho4i.html). In
    fact, because of pressure from ISPs to restrict bandwidth used by BT,
    the usage of nonstandard ports is actually encouraged by the community.
    Possible future work could be in identifying these more elusive P2P
    applications. In BT's case, the openness of most trackers makes this
    possible. I would also be curious whether these applications do a better
    job of directing peers to nearby objects, which would really obviate P2P
    caches since the nearby peers essentially *are* the cache.
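        One concrete handle on these more elusive applications: the
    BitTorrent peer wire protocol opens every connection with a fixed
    handshake (a length byte of 19 followed by the ASCII string "BitTorrent
    protocol"), so a monitor could classify flows by payload prefix rather
    than by port number. A minimal sketch, with the flow-capture side
    assumed rather than shown:

        # Port-agnostic BitTorrent detection by handshake prefix.
        BT_HANDSHAKE_PREFIX = bytes([19]) + b"BitTorrent protocol"

        def looks_like_bittorrent(first_payload_bytes):
            # True if a TCP flow's opening payload matches the BT handshake.
            return first_payload_bytes.startswith(BT_HANDSHAKE_PREFIX)

        # Example: a flow captured on an arbitrary, nonstandard port.
        sample = BT_HANDSHAKE_PREFIX + bytes(8) + bytes(40)  # reserved + hashes
        print(looks_like_bittorrent(sample))  # True, regardless of the port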

