internet content delivery systems

From: Chandrika Jayant (cjayant@cs.washington.edu)
Date: Mon Nov 08 2004 - 01:07:09 PST

  • Next message: Kevin Wampler: "Content Delivery review"

    “An Analysis of Internet Content Delivery Systems”
    Written by Saroiu, Gummadi, Dunn, Gribble, and Levy
    Reviewed by Chandrika Jayant

                This paper analyzes Internet content delivery with respect to four systems: HTTP web traffic, the Akamai CDN, Kazaa P2P, and Gnutella P2P (HTTP is used for P2P downloads, a non-HTTP protocol for P2P searches). The traces cover all incoming and outgoing Internet traffic at the University of Washington (a population of over 60,000 people) over a span of 9 days. The paper is very timely, as traffic patterns have changed drastically over the past few years, and some surprising observations result. It’s really amazing how much bandwidth P2P traffic hogs; I knew it took up a lot, but I had no idea how proportionally blatant it was. Kazaa alone accounted for almost 2/5 of observed TCP traffic!

                P2P traffic accounts for the majority of transferred HTTP bytes. P2P documents are 3 orders of magnitude larger than web objects, which isn’t surprising, but it IS surprising that such a low P2P request rate and small population of peers still generates twice the flows of web traffic. A small number of large objects accounts for a very large fraction of observed P2P traffic. So many concurrent requests are being serviced because P2P transfers take so long (on the order of 1,000 times longer than transfers of web objects). In P2P, clients and servers are not clearly divided; load travels “similarly” in both directions. However, load is very poorly spread out: a small number of clients/servers makes up most of the observed traffic. I’m surprised that the load is not inherently balanced better, especially in Kazaa, which has supernodes.

                A problem with growing P2P file sharing is that it doesn’t seem to scale well at all. The bandwidth cost of each Kazaa peer turned out to be about 90 times that of a web client, so each added peer significantly increases the load on the network.

                The authors present caching as a possible solution to the problems of growing CDN and P2P systems. They conclude that since much CDN content is static (unlike general web content), a local web proxy could reduce the need for a separate CDN. This is barely discussed and almost glossed over compared to the discussion of P2P networks; perhaps another paper should explore whether this would even be useful at all.

    Proxy caching could also help in P2P systems, though the authors present only a very preliminary proposal. P2P traffic appears to be a good candidate for caching since it is highly repetitive. If such caching were successfully deployed, wide-area bandwidth demands would be greatly reduced. The idea seems natural the way the authors present it, but why haven’t other people thought of this yet (or why isn’t prior work mentioned)?
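Why repetitive traffic is such a good caching candidate can be shown with a toy simulation (my own sketch with a hypothetical request trace, not the authors’ methodology): with an idealized infinite cache, only the first request for each object misses, so a stream dominated by a few popular objects yields a very high hit rate.

```python
# Toy illustration: an idealized infinite proxy cache serving a repetitive
# P2P-like request stream. Only the first request for each distinct object
# is a miss; every repeat is a hit.
from collections import Counter

def ideal_hit_rate(requests):
    """Hit rate of an infinite cache: repeated requests / total requests."""
    counts = Counter(requests)
    unique = len(counts)          # each distinct object misses exactly once
    total = len(requests)
    return (total - unique) / total

# Hypothetical trace: a few popular objects requested many times,
# plus a tail of one-off requests.
trace = ["a"] * 50 + ["b"] * 30 + ["c"] * 15 + ["d", "e", "f", "g", "h"]
print(round(ideal_hit_rate(trace), 2))  # 0.92
```

A real cache has finite capacity and an eviction policy, so this is an upper bound, but it shows why the paper’s observation that a small number of large objects dominates P2P traffic makes caching so attractive.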

                The lack of generality in the paper bothered me. First, the authors picked Akamai as their CDN, and Kazaa and Gnutella as their P2P systems, without explaining how representative these specific systems are of their categories (CDN vs. P2P) in general. Also, the whole paper is written in the context of a large university setting; there is no discussion of what the results mean in non-university or smaller settings. I would assume that in non-university settings, P2P traffic wouldn’t be nearly as prevalent. Also, since a university has better bandwidth than a typical home, many people will try to download files from that kind of setting, possibly biasing the outbound/inbound traffic model presented here. I appreciate the value of this paper in its specific setting, and would even believe that it could model parts of other networks quite well, but I would need to be convinced.

                The fact that, in a similar study three years prior to this one, video traffic had increased by almost 400% and MP3 traffic by 300% speaks for itself: this is an exciting new area of networking. Obviously, P2P sharing needs to be handled in a drastically different way for it ever to scale as populations and networks grow. Caching seems like a plausible way to battle this inevitable problem.

               



    This archive was generated by hypermail 2.1.6 : Mon Nov 08 2004 - 01:07:15 PST