From: Alan L. Liu (aliu@cs.washington.edu)
Date: Mon Nov 08 2004 - 00:16:20 PST
The paper examined UW's inbound and outbound HTTP traffic, specifically
looking at the share of bandwidth consumed by the Web and certain P2P
applications.
The authors break down their nine-day trace into WWW, Akamai, Kazaa, and
Gnutella traffic. One strength of the paper is how it draws out the
significance of its findings: the number of P2P requests (at least those
that are HTTP-driven) is two orders of magnitude smaller than the number
of WWW requests, but the average requested object is three orders of
magnitude larger, with the result that there are actually twice as many
open P2P connections as WWW connections.
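To see why this follows, here is a quick Little's-law calculation (my
own sketch; every number below is invented, chosen only to match the
relative scales the paper reports):

    # Back-of-envelope check, by Little's law: open connections equal the
    # request arrival rate times the mean transfer duration. All numbers
    # below are hypothetical.
    www_req_rate = 100.0   # WWW requests/sec (hypothetical)
    p2p_req_rate = 1.0     # two orders of magnitude fewer (hypothetical)

    www_duration = 0.5     # sec to fetch a small Web object (hypothetical)
    p2p_duration = 100.0   # huge objects over slow peers take far longer

    www_open = www_req_rate * www_duration   # ~50 concurrent WWW connections
    p2p_open = p2p_req_rate * p2p_duration   # ~100 concurrent P2P connections
    print(www_open, p2p_open)

The exact multiple depends on per-connection throughput and how often
transfers are aborted, but the qualitative point holds: far fewer
requests can still mean more simultaneous connections.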
What is also significant is that adding a modest number of new P2P
clients (assuming they share the usage characteristics of existing
clients) would markedly increase UW's total network traffic. This
suggests that network administrators at UW, and at places like it, may
need to brace for sharply increased bandwidth demands.
I felt the weakest part of the paper was its description of how caching
could help alleviate P2P bandwidth usage. That discussion is much
shorter than the exhaustive sections analyzing the traces, which,
although somewhat interesting, contain a lot of obvious findings.
Certainly there is a place for collecting data to back up intuition, but
too much time is spent on the non-earth-shattering (such as the
observation that popular video files are approximately the size of a
CD-R's 700MB capacity, which anybody who uses a P2P program for an hour
will intuit).
It felt like the paper merely threw a bone to those hoping to read more
about caching. For example, UW is a net provider of content, so the
paper suggests that a reverse cache to satisfy outbound traffic might be
more beneficial. But what does "beneficial" mean? In the current
context, the campus bottleneck link is probably the one from the campus
border to the rest of the Internet. A reverse cache would only lessen
UW's intra-network traffic while having a negligible effect on usage of
that bottleneck link. On the other hand, a forward cache that diverts
inbound traffic, turning it into intra-network connections, would have a
dramatic effect on freeing up the scarcest resource.
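A back-of-envelope comparison makes the asymmetry concrete (again my own
sketch; the volume and hit rate are invented):

    # Border-link savings under each cache placement. The inbound volume
    # and byte hit rate are hypothetical.
    inbound_p2p_gb = 500.0    # hypothetical inbound Kazaa bytes per day
    hit_rate = 0.35           # hypothetical byte hit rate

    # Forward cache: repeat inbound requests are served on campus, so
    # hits never cross the border link at all.
    border_saved_forward = hit_rate * inbound_p2p_gb   # 175 GB/day saved

    # Reverse cache: external clients are served from the border instead
    # of from internal peers, but every byte still crosses it outbound.
    border_saved_reverse = 0.0

    print(border_saved_forward, border_saved_reverse)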
Another potential problem with the caching section is its claim that an
increase in the P2P client population would have a positive effect on
cache hits. With a larger population, though, wouldn't the set of
distinct objects being requested grow as well? The authors work under
the assumption that the distribution of object requests remains fixed,
but more clients plausibly means a longer tail of unique requests, which
works against hit rates.
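One way to probe this is a toy simulation (my own sketch, not from the
paper): draw requests from a fixed Zipf-like popularity distribution and
measure the hit rate of an infinite cache as the population grows. Under
the paper's fixed-distribution assumption the hit rate does climb; the
open question is whether the distribution really stays fixed as more
clients join.

    # Toy cache simulation: does hit rate rise with client population if
    # the popularity distribution is held fixed? (Parameters invented.)
    import random

    def hit_rate(n_clients, reqs_per_client=100, n_objects=100_000, alpha=1.0):
        # Zipf(alpha) popularity over object ranks 1..n_objects.
        weights = [1.0 / rank ** alpha for rank in range(1, n_objects + 1)]
        reqs = random.choices(range(n_objects), weights=weights,
                              k=n_clients * reqs_per_client)
        seen = set()  # infinite cache: an object hits once it was fetched
        hits = 0
        for obj in reqs:
            if obj in seen:
                hits += 1
            else:
                seen.add(obj)
        return hits / len(reqs)

    for n in (10, 100, 1000):
        print(n, "clients ->", round(hit_rate(n), 3))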
One limitation in applying the paper to current Internet traffic is that
BitTorrent now accounts for a plurality of the bandwidth used, and it is
neither HTTP-based nor required to use standard ports for its
connections (see http://in.tech.yahoo.com/041103/137/2ho4i.html). In
fact, because of pressure from ISPs to restrict the bandwidth used by
BT, the community actually encourages the use of nonstandard ports.
Possible future work could be identifying these more elusive P2P
applications. In BT's case, the openness of most trackers makes this
possible; see the sketch below. I would also be curious whether these
applications do a better job of directing peers to nearby objects, which
would largely obviate P2P caches, since the nearby peers essentially
*are* the cache.
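As a rough illustration of what tracker openness buys a measurement
study (my own sketch; the tracker URL and info_hash are hypothetical
placeholders), a tool can enumerate a swarm's peers with an ordinary
HTTP announce request using the compact peer format:

    # Sketch: ask an open BitTorrent tracker for a swarm's peer list.
    import socket
    import struct
    import urllib.parse
    import urllib.request

    TRACKER = "http://tracker.example.com:6969/announce"  # hypothetical
    INFO_HASH = bytes(20)  # placeholder: SHA-1 of the torrent's info dict
    PEER_ID = b"-MEAS01-" + bytes(12)  # arbitrary 20-byte client ID

    params = {
        "info_hash": INFO_HASH,
        "peer_id": PEER_ID,
        "port": 6881,
        "uploaded": 0,
        "downloaded": 0,
        "left": 0,
        "compact": 1,  # request the compact 6-bytes-per-peer format
    }
    url = TRACKER + "?" + urllib.parse.urlencode(params)
    body = urllib.request.urlopen(url).read()

    # Crude extraction of the compact peer list from the bencoded reply:
    # "...5:peers<len>:<bytes>..." where each peer is a 4-byte IPv4
    # address followed by a 2-byte big-endian port.
    key = b"5:peers"
    start = body.index(key) + len(key)
    colon = body.index(b":", start)
    length = int(body[start:colon])
    peers = body[colon + 1 : colon + 1 + length]
    for i in range(0, len(peers), 6):
        ip = socket.inet_ntoa(peers[i : i + 4])
        (port,) = struct.unpack("!H", peers[i + 4 : i + 6])
        print(ip, port)

Resolving the announced IPs against campus address blocks would then
identify BT traffic that port-based classification misses.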