CSE-561 Notes (10/30/02)
Probability Distributions:
- Commonly occurring distributions: uniform, normal, gamma, exponential,
Poisson, Pareto, log-normal, Weibull
- the gamma distribution arises as the waiting time until the k-th event of a
Poisson process (the exponential distribution is the k = 1 case)
- the logarithm of a log-normally distributed variable is normally
distributed, so a log-normal plotted on a log scale becomes a normal curve
- the Pareto distribution shows features of Zipf curves (which occur in the
context of word frequencies)
- Of these, the Pareto, log-normal, and (for shape < 1) Weibull distributions
are heavy-tailed.
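A minimal sketch (not from the lecture; uses numpy with arbitrary parameters)
illustrating two of the facts above: the log of log-normal samples is normally
distributed, and the Pareto tail dwarfs the exponential tail at high quantiles.

    import numpy as np

    rng = np.random.default_rng(0)  # fixed seed for reproducibility

    # Log-normal fact: the log of log-normal samples is ~ N(0, 1) here.
    logs = np.log(rng.lognormal(mean=0.0, sigma=1.0, size=100_000))
    print(f"log-samples: mean={logs.mean():.3f}, std={logs.std():.3f}")

    # Heavy tails show up at extreme quantiles: compare a Pareto
    # distribution (shape 1.5, minimum 1) with an exponential one.
    pareto = rng.pareto(1.5, size=100_000) + 1.0
    expo = rng.exponential(scale=1.0, size=100_000)
    for q in (0.5, 0.99, 0.9999):
        print(f"q={q}: pareto={np.quantile(pareto, q):.1f}, "
              f"exponential={np.quantile(expo, q):.1f}")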
On the Scale and Performance of Cooperative Web Proxy Caching
1. Results of the paper:
- Research methodology: It's important to understand the problem before solving it
- Cooperative web caching isn't worth it at large scale
- Up to a certain (modest) population size it is worth it
- At any reasonable scale, all cooperation schemes perform about the same
- Semantic grouping doesn't work: it performs pretty much the same as random grouping, assuming a 5% hit-rate improvement is insignificant
- The paper assumed everything to be static => is this assumption reasonable?
- The data is from 1999 => are the results still valid? (Note that the paper treats data from 1996 as outdated)
- The paper came up with an analytical model
- One thing that the paper doesn't address is that there might be differences between university users and residential/home users
2. Whether or not to install a proxy cache is a decision for the ISP
3. How did they collect the necessary data?
Basically they did passive monitoring of routers. (Neil pointed out
that this is quite difficult: e.g. with fragmented IP packets you have
to collect and reassemble the fragments, sometimes one request is
spread across multiple flows, etc.)
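A minimal sketch (assumed trace format, not their actual tool) of the fragment
bookkeeping that such passive monitoring needs: IPv4 fragments of one datagram
share (src, dst, protocol, IP ID), and only the reassembled datagram can be
parsed for the HTTP request.

    from collections import defaultdict

    frags = defaultdict(dict)  # (src, dst, proto, ip_id) -> {offset: payload}

    def on_fragment(src, dst, proto, ip_id, offset, more_frags, payload):
        """Store one fragment; return the reassembled datagram when done."""
        key = (src, dst, proto, ip_id)
        frags[key][offset] = payload
        if more_frags:
            return None  # more pieces still to come
        # last fragment seen; simplification: assume the earlier pieces
        # have all arrived, and stitch them together in offset order
        pieces = frags.pop(key)
        return b"".join(pieces[off] for off in sorted(pieces))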
4. Using the URL to count possible cache hits (or to count how often
pages were requested) can be inaccurate, e.g. due to mirrors
=> they used these numbers only to calculate an upper bound on
the fraction of requests that could be served from a cache.
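A minimal sketch of that upper-bound idea (the trace format here is made up):
if every repeated request for a URL could in principle be a cache hit, the
best achievable hit rate is (requests - distinct URLs) / requests.

    from collections import Counter

    def ideal_hit_rate(urls):
        """Upper bound on hit rate from a list of requested URLs."""
        counts = Counter(urls)
        total = sum(counts.values())
        # only the first request for each URL must miss
        return (total - len(counts)) / total

    print(ideal_hit_rate(["a", "b", "a", "c", "a", "b"]))  # -> 0.5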
5. How can you do peer-to-peer web caching?
You can just intercept peer-to-peer traffic using a proxy. This could
also serve other goals, for instance finding a better source for the
requested data.
6. After these discussions we turned to the graphs of the paper, asking
what we can conclude from each of them:
- Figure 1: The knee is important. A good graph for pointing this out.
- Figure 4: One interesting point that isn't addressed in the text: the lines never tail off as one might expect but stay straight.
- Figure 5: One could consider not putting so many bars in one graph.
- Figure 6: This graph is very powerful in its conclusion. Note that the authors changed the scale of the second graph in order to emphasize the gaps.
- Figure 8: The fat bars make the graph stand out => good graph.
7. Conclusion: Probably the best way to conclude is to consider the
effect this paper had as a very strong negative result: it essentially
shut down a whole area of research.
Revealing ISP Topologies Using Rocketfuel
Why don't ISPs reveal their topologies?
- Customers might be biased when choosing their ISP
- Security: ISP networks would become more vulnerable to attacks
Concerns while mapping ISP topologies:
- Number of traceroutes employed should be minimized
- Should be able to detect whether two different IP addresses belong to the
same router (alias resolution; see the sketch after this list)
- Should not take long (as ISP maps change over time)
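A minimal sketch of one alias-resolution idea (in the spirit of Rocketfuel's
Ally tool, though not its actual code): many routers fill the IP ID field from
a single shared counter, so alternating probes to two addresses of the same
router elicit IP IDs that merge into one nearly consecutive sequence. Sending
the probes is assumed to happen elsewhere; wraparound is ignored for brevity.

    def likely_aliases(ids_a, ids_b, max_gap=200):
        """ids_a, ids_b: IP ID values from alternating probes to the two
        candidate addresses. True if they look like one shared counter."""
        merged = sorted(ids_a + ids_b)
        gaps = [b - a for a, b in zip(merged, merged[1:])]
        # one counter => successive IDs are close together; two
        # independent counters => large jumps between the two series
        return all(g <= max_gap for g in gaps)

    print(likely_aliases([100, 104, 109], [102, 106, 111]))     # True
    print(likely_aliases([100, 104, 109], [9000, 9010, 9020]))  # False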
Issues with Validation:
- Are ISPs honest about their feedback on the maps constructed by Rocketfuel?
- The technique should be validated on a simulated network, where the true
topology is known.
Discussions about figures in the paper:
- Figure 10: Cute figure (innovative use of bar graphs)
- Figure 11: This type of graph is commonly seen
- Figure 16: Shows that router outdegree follows a power law.
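A hedged sketch (made-up degree data, not the paper's) of how such a power law
can be eyeballed: plot the CCDF of the out-degrees on log-log axes, where a
power law appears as a roughly straight line.

    import numpy as np
    import matplotlib.pyplot as plt

    # synthetic out-degrees with a Pareto (power-law) tail
    degrees = np.sort(np.random.default_rng(1).pareto(1.2, size=500) + 1)
    ccdf = 1.0 - np.arange(len(degrees)) / len(degrees)  # P[degree >= d]

    plt.loglog(degrees, ccdf, ".", markersize=3)
    plt.xlabel("router out-degree d")
    plt.ylabel("P[degree >= d]")
    plt.show()  # straight line on log-log axes => power-law tail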