CSE-561 Notes (10/30/02)
Probability Distributions:
- Commonly occurring distributions: uniform, normal, gamma, exponential,
Poisson, Pareto, log-normal, Weibull
- the gamma distribution arises as the waiting time until the k-th event of a
Poisson process (the exponential distribution is the k = 1 case)
- the logarithm of a log-normally distributed variable is normally
distributed, so a log-normal plotted on a log scale becomes a normal curve
- the Pareto distribution shows features of Zipf curves (which occur in the
context of word frequencies)
- Of these, the Pareto, log-normal, and (for shape < 1) Weibull distributions
are heavy-tailed.
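A minimal sketch (not from the lecture; uses numpy with arbitrary parameters)
illustrating two of the facts above: the log of log-normal samples is normally
distributed, and the Pareto tail dwarfs the exponential tail at high quantiles.

    import numpy as np

    rng = np.random.default_rng(0)  # fixed seed for reproducibility

    # Log-normal fact: the log of log-normal samples is ~ N(0, 1) here.
    logs = np.log(rng.lognormal(mean=0.0, sigma=1.0, size=100_000))
    print(f"log-samples: mean={logs.mean():.3f}, std={logs.std():.3f}")

    # Heavy tails show up at extreme quantiles: compare a Pareto
    # distribution (shape 1.5, minimum 1) with an exponential one.
    pareto = rng.pareto(1.5, size=100_000) + 1.0
    expo = rng.exponential(scale=1.0, size=100_000)
    for q in (0.5, 0.99, 0.9999):
        print(f"q={q}: pareto={np.quantile(pareto, q):.1f}, "
              f"exponential={np.quantile(expo, q):.1f}")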
On the Scale and Performance of Cooperative Web Proxy Caching
1. Results of the paper:
- Research methodology: It's important to understand the problem before solving it
- Cooperative web caching isn't worth it at large scale
- Up to a certain (modest) population size it is worth it
- At any reasonable scale, all cooperation schemes perform about the same
- Semantic grouping doesn't work: it performs pretty much the same as random grouping, assuming a 5% hit-rate improvement is insignificant
- The paper assumed everything to be static => is this assumption reasonable?
- The data is from 1999 => are the results still valid? (Note that the paper treats data from 1996 as outdated)
- The paper came up with an analytical model
- One thing that the paper doesn't address is that there might be differences between university users and residential/home users
2. Whether or not to install a proxy cache is a decision for the ISP
3. How did they collect the necessary data?
Basically they did passive monitoring of routers. (Neil pointed out
that this is quite difficult: e.g. with fragmented IP packets you have
to collect and reassemble the fragments, sometimes one request is
spread across multiple flows, etc.)
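A minimal sketch (assumed trace format, not their actual tool) of the fragment
bookkeeping that such passive monitoring needs: IPv4 fragments of one datagram
share (src, dst, protocol, IP ID), and only the reassembled datagram can be
parsed for the HTTP request.

    from collections import defaultdict

    frags = defaultdict(dict)  # (src, dst, proto, ip_id) -> {offset: payload}

    def on_fragment(src, dst, proto, ip_id, offset, more_frags, payload):
        """Store one fragment; return the reassembled datagram when done."""
        key = (src, dst, proto, ip_id)
        frags[key][offset] = payload
        if more_frags:
            return None  # more pieces still to come
        # last fragment seen; simplification: assume the earlier pieces
        # have all arrived, and stitch them together in offset order
        pieces = frags.pop(key)
        return b"".join(pieces[off] for off in sorted(pieces))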
4. Using the URL to count possible cache hits (or to count how often
pages were requested) can be inaccurate, e.g. due to mirrors
=> they used these numbers only to calculate an upper bound on
the fraction of requests that could be served from a cache.
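A minimal sketch of that upper-bound idea (the trace format here is made up):
if every repeated request for a URL could in principle be a cache hit, the
best achievable hit rate is (requests - distinct URLs) / requests.

    from collections import Counter

    def ideal_hit_rate(urls):
        """Upper bound on hit rate from a list of requested URLs."""
        counts = Counter(urls)
        total = sum(counts.values())
        # only the first request for each URL must miss
        return (total - len(counts)) / total

    print(ideal_hit_rate(["a", "b", "a", "c", "a", "b"]))  # -> 0.5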
5. How can you do peer-to-peer web caching?
You can just intercept peer-to-peer traffic using a proxy. This could
also serve other goals, for instance finding a better source for the
requested data.
6. After these discussions we turned to the graphs of the paper, asking
what we can conclude from each of them:
- Figure 1: The knee is important. A good graph for pointing this out.
- Figure 4: One interesting point that isn't addressed in the text: the lines never tail off as one might expect but stay straight.
- Figure 5: One could consider not putting so many bars in one graph.
- Figure 6: This graph is very powerful in its conclusion. Note that the authors changed the scale of the second graph in order to emphasize the gaps.
- Figure 8: The fat bars make the graph stand out => good graph.
7. Conclusion: Probably the best way to conclude is to consider the
effect this paper had as a very strong negative result: it essentially
shut down a whole area of research.
Revealing ISP Topologies Using Rocketfuel
Why don't ISPs reveal their topologies?
- Customers might be biased when choosing their ISP
- Security: ISP networks would become more vulnerable to attacks
Concerns while mapping ISP topologies:
- Number of traceroutes employed should be minimized
- Should be able to detect whether two different IP addresses belong to the
same router (alias resolution; see the sketch after this list)
- Should not take long (as ISP maps change over time)
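A minimal sketch of one alias-resolution idea (in the spirit of Rocketfuel's
Ally tool, though not its actual code): many routers fill the IP ID field from
a single shared counter, so alternating probes to two addresses of the same
router elicit IP IDs that merge into one nearly consecutive sequence. Sending
the probes is assumed to happen elsewhere; wraparound is ignored for brevity.

    def likely_aliases(ids_a, ids_b, max_gap=200):
        """ids_a, ids_b: IP ID values from alternating probes to the two
        candidate addresses. True if they look like one shared counter."""
        merged = sorted(ids_a + ids_b)
        gaps = [b - a for a, b in zip(merged, merged[1:])]
        # one counter => successive IDs are close together; two
        # independent counters => large jumps between the two series
        return all(g <= max_gap for g in gaps)

    print(likely_aliases([100, 104, 109], [102, 106, 111]))     # True
    print(likely_aliases([100, 104, 109], [9000, 9010, 9020]))  # False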
Issues with Validation:
- Are ISPs honest about their feedback on the maps constructed by Rocketfuel?
- The technique should be validated on a simulated network, where the true
topology is known.
Discussions about figures in the paper:
- Figure 10: Cute figure (innovative use of bar graphs)
- Figure 11: This type of graph is commonly seen
- Figure 16: Shows that router outdegree follows a power law.
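A hedged sketch (made-up degree data, not the paper's) of how such a power law
can be eyeballed: plot the CCDF of the out-degrees on log-log axes, where a
power law appears as a roughly straight line.

    import numpy as np
    import matplotlib.pyplot as plt

    # synthetic out-degrees with a Pareto (power-law) tail
    degrees = np.sort(np.random.default_rng(1).pareto(1.2, size=500) + 1)
    ccdf = 1.0 - np.arange(len(degrees)) / len(degrees)  # P[degree >= d]

    plt.loglog(degrees, ccdf, ".", markersize=3)
    plt.xlabel("router out-degree d")
    plt.ylabel("P[degree >= d]")
    plt.show()  # straight line on log-log axes => power-law tail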