From: Raz Mathias (razvanma_at_exchange.microsoft.com)
Date: Mon Mar 08 2004 - 13:49:34 PST
Today's paper examined the usage patterns of peer-to-peer networks (specifically Kazaa) and compared them to the usage patterns of the web and of streaming video.
The paper argues that the central cause of the observed behavior is that data on peer-to-peer networks is immutable, so clients fetch each object at most once. In contrast, web pages change frequently, and clients repeatedly fetch the same object over and over again. As a result, the popularity of web pages follows a Zipf distribution, in which the popularity of the i'th most popular object is proportional to 1 / (i ^ alpha). The paper demonstrates that this distribution does not hold for Kazaa objects and argues that the fetch-at-most-once behavior (as opposed to fetch-repeatedly) is the cause: the most popular files on Kazaa are requested far less often than a Zipf distribution would predict. Also, the popularity of a given P2P object decreases over time as new objects enter and old ones age, whereas the popularity of web pages remains fairly constant. The paper argues that peer-to-peer systems are driven by new content, whereas the web is driven by content change.
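To make the fetch-at-most-once argument concrete, here is a minimal simulation sketch in Python. The object counts, client counts, and alpha are parameters I invented for illustration, not numbers from the paper. Every client draws requests from the same Zipf distribution, but in fetch-at-most-once mode a client never re-requests an object it already holds; comparing the two request histograms shows the flattened head the paper observes.

    import random
    from collections import Counter

    def zipf_weights(n, alpha=1.0):
        # Weight of the i'th most popular object is proportional to 1 / i^alpha.
        return [1.0 / (i ** alpha) for i in range(1, n + 1)]

    def simulate(fetch_at_most_once, num_objects=1000, num_clients=500,
                 reqs_per_client=200, alpha=1.0, seed=0):
        rng = random.Random(seed)
        weights = zipf_weights(num_objects, alpha)
        ids = range(num_objects)
        counts = Counter()
        for _ in range(num_clients):
            seen = set()
            for obj in rng.choices(ids, weights=weights, k=reqs_per_client):
                if fetch_at_most_once and obj in seen:
                    continue  # client already holds this immutable object
                seen.add(obj)
                counts[obj] += 1
        return counts

    web = simulate(fetch_at_most_once=False)  # Zipf head stays intact
    p2p = simulate(fetch_at_most_once=True)   # head is capped at one fetch per client
    print("top-5 web-style counts:", [c for _, c in web.most_common(5)])
    print("top-5 p2p-style counts:", [c for _, c in p2p.most_common(5)])

In fetch-at-most-once mode the most popular object can be counted at most once per client (here, at most 500 times), whereas in fetch-repeatedly mode it accumulates an order of magnitude more requests, which is exactly the truncated head the paper reports.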
The paper then turns to the problem of optimizing the external bandwidth of an organization whose users run peer-to-peer clients. It argues that caches holding the most popular data are ineffective over time because the set of most popular objects itself keeps changing. I believe the lack of caching efficacy is overstated because the paper assumes a Zipf model of popularity for the underlying objects; of course, under that assumption, as new objects enter the network at the highest popularity level, the cache quickly becomes invalidated. The assumption of a Zipf popularity distribution over the objects is a bit weak and unconvincing, and it is a brittle foundation on which to evaluate caching: the smaller the popularity gap between the i'th and (i+1)'th objects, and the longer an object remains popular, the more effective the organization's cache will be. A toy illustration of this decay appears below.
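Here is a sketch of why a static most-popular cache decays under the paper's model (again Python, with invented parameters): the cache pins the initially most popular objects, while each round a batch of new objects enters at the very top of the popularity ranking, so the hit rate falls round over round.

    import random

    def cache_decay(num_objects=1000, cache_size=50, rounds=10,
                    new_per_round=25, reqs_per_round=5000, alpha=1.0, seed=0):
        rng = random.Random(seed)
        # Rank i (0-based) is requested with weight 1 / (i + 1)^alpha.
        weights = [1.0 / ((i + 1) ** alpha) for i in range(num_objects)]
        ranking = list(range(num_objects))  # position 0 = most popular object
        cache = set(ranking[:cache_size])   # pin today's most popular objects
        next_id = num_objects
        for r in range(rounds):
            hits = 0
            for idx in rng.choices(range(num_objects), weights=weights,
                                   k=reqs_per_round):
                if ranking[idx] in cache:
                    hits += 1
            print(f"round {r}: hit rate {hits / reqs_per_round:.2f}")
            # New objects enter at the top of the ranking; old ones age out.
            ranking = (list(range(next_id, next_id + new_per_round))
                       + ranking[:num_objects - new_per_round])
            next_id += new_per_round

    cache_decay()

With alpha = 1 the head of the distribution is so heavy that displacing the cached objects by even a few ranks destroys most of the hit rate, which is the paper's point; with a flatter distribution or slower object turnover, the same cache holds up much better, which is my point.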
The paper presented an alternative to caching: redirection. I thought the paper could have been more explicit about what makes redirection different from caching. Does the redirection process do a better job of estimating the underlying popularity distribution? Why couldn't a cache simply hold all of the data that the redirector redirects? I'm not completely sure I understood why it would be a bad idea to emulate the redirector with a cache.
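One way I picture the difference: a cache must store the bytes itself, while a redirector only keeps an index of which internal peers already hold each object and bounces requests to a local copy. A hypothetical sketch (the class and method names are mine, not from the paper):

    from collections import defaultdict

    class Redirector:
        """Tracks which internal peers hold which objects and redirects
        requests to a local copy when one exists; no bytes are stored here."""

        def __init__(self):
            self.locations = defaultdict(set)  # object_id -> set of peer ids

        def on_download_complete(self, peer, object_id):
            self.locations[object_id].add(peer)  # peer now serves this object

        def on_peer_leave(self, peer):
            for peers in self.locations.values():
                peers.discard(peer)  # departed peers can no longer serve

        def route(self, requester, object_id):
            local = self.locations[object_id] - {requester}
            if local:
                return ("internal", next(iter(local)))  # serve from a local peer
            return ("external", None)  # must fetch over the external link

Under this reading, the redirector's effective capacity is the aggregate disk space of all internal peers rather than a fixed cache size, and its coverage shrinks as peers depart; a cache emulating it would have to store a copy of every redirected object itself. That may be the distinction the paper intended, though it does not spell it out.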
An interesting issue that's brought up is the legal one. The sheer volume of traffic that peer-to-peer networks generate may cause their demise: the increased load on the network must somehow be managed by the organization, yet deploying a redirector has the side effect of indexing all the (possibly pirated) content, which is what led to Napster's downfall. A university that wants to control its external network costs is thus not left with many options.