Janet's Lecture

**Goal of this lecture
0. Questions on projects: use her office hours.
1. She wants discussion-oriented classes and expects more involvement from students.
2. This week's papers are measurement papers. She will go through them focusing on:
   (0) What method did they use?
   (1) What are fair assumptions to make when setting up the measurement environment and analyzing the results?
   (2) What kinds of settings would be reasonable for validating measurement studies?
   (3) Learn about graphs. Good? Bad?

**Difference between 1st paper and 2nd paper
1st paper: V. Paxson paper | 2nd paper: S. Savage paper (Sting)
Criteria
1. Scope
   Paxson: The paper is about Internet packet dynamics. It characterizes packet dynamics through many measurements, such as bottleneck bandwidth, packet loss, packet corruption, etc. It tries to convey a complete picture of Internet pathologies.
   Sting: The paper only discusses measurement of packet loss.
2. Measurement Scale
   Paxson: Measurement was conducted twice, on 35 sites. The data sets cover only two runs.
   Sting: Experiments were carried out at the University of Washington for 24 hours, and validation was missing. It would also be good to compare with other packet loss measurement techniques.

**Questions raised by students and Janet
Q1. What do you need to do for a longitudinal measurement study?
A. Somewhat straightforward answer: we need to collect data over a long time span.
Q2. What kind of change do you expect from conducting the same experiment on large HTTP traffic?
Q3. Do you think the conclusions of these measurement papers apply to web traffic or P2P traffic?
Q4. What is the difference between TCP and ICMP for measuring loss?
A. The TCP congestion control mechanism can affect the packet loss rate.

**Understanding Sting measurement method
Q. Large-packet hole filling does not seem to work.
A. Correct, it does not work, because the missing byte is overwritten by the next packet.
We found an error in this paper!

**Graphs of V. Paxson paper
First three graphs
- same kind: packet time series diagrams
- Point of the graphs: to help readers understand the paper and to convince them. They are more of an explanation than a presentation of actual results.

Figure 1. Out-of-order delivery with two distinct slopes.
Q. Why do you think the authors plotted a massive reordering event as points, without connecting them with a line?
A. They want readers to notice the same thing they did. If you look at it carefully, you can see the difference in slope. It is also important to note that there is a time gap between the out-of-order deliveries. This implies that there were route changes.

Figure 4. Histogram of single bottleneck estimates for N2.
Note that the x-axis is not linear. It is a semi-log scale.
Q. What does the author try to convey to readers?
A. The bottleneck bandwidths he found for sites in N2. This graph is not good at showing the accuracy of the bottleneck bandwidth measurement tool, because the actual bottleneck bandwidths of the links in N2 are not presented. (The author was trying to show that they found many T1 connections, which were present in N2.) They used a self-validation method, comparing the measurement results with known facts about the network.
Q. Why do you think they used a semi-log scale for the x-axis?
A. Probably to save space.
Q. Why did the author not annotate the graph?
A. Maybe it was too obvious.

Figure 5. N2 loss rates for data packets and acks.
It is a typical CDF graph. CDF means cumulative distribution function. The y-axis of a CDF graph is cumulative probability.
Q. What message does the author try to deliver?
A. The ack loss rate falls in the middle of the three cases and represents the inherent loss rate.

Table 1. Conditional ack loss rates for different regions.
Q. Why is this table misleading?
A. Because it uses the same percentage (%) metric for columns that represent different kinds of ratios.
The second and third columns give the proportion of all connections that were quiescent in N1 and N2. The fourth and fifth columns give the proportion of acks lost during busy periods, and the final column summarizes the relative change in these figures.
Q. How can we improve this table?
A. 1. Draw separate tables for the different kinds of columns.
   2. Annotate the table with "percentage of time" and "loss rate".

Figure 7. Distribution of packet loss outage durations exceeding 200 msec.
Note that the x-axis is log scale and the y-axis is cumulative probability.

Figure 8. Log-log complementary distribution plot of N2 ack outage durations.
Q. Why did the author not annotate that the x-axis of Figure 7 is log scale?
A. The omission is confusing to readers, because Figure 8 is explicitly labeled as log-log scale.
Q. What is a Pareto distribution?
A. It is a continuous distribution with a significantly heavy tail, so we cannot ignore the mass on the tail side. It has infinite variance when the shape parameter a <= 2 and infinite mean when a <= 1.

**Graphs of S. Savage paper
Time of Day Graph
- the right way of graphing data to show the effect
- scatter plot (both dimensions important)
- hard to see the diagonal effect and the percent loss
CDF of Loss Rate
- misleading, since the y-axis does not start at zero
- trade-off: harder to get the overall scale, easier to see details

**Validation methods for a measurement study
1. Compare with another measurement tool.
2. Use a network emulator, and check whether the tool discovers the parameters of the emulated network.
3. If you are (or know) the network administrator, you can get data from them and compare your results with it.

**Things to consider when designing a measurement tool
1. Is clock synchronization required?
2. Does it require instrumenting a router? (End-to-end measurement, or not?)
3. Does it require cooperation from a receiver?
4. How can we minimize the perturbation caused by measurement probe traffic?
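
**Sketch: log-log complementary distribution of a Pareto sample
The discussion of Figure 8 and the Pareto distribution can be illustrated with a small sketch. It is not code from either paper. It assumes synthetic outage durations drawn from a Pareto distribution with a made-up shape a = 1.5 and a minimum of 0.2 sec (chosen to echo the 200 msec threshold of Figure 7), and the helper names pareto_samples, empirical_ccdf, and loglog_slope are hypothetical. For an ideal Pareto sample, the complementary distribution P[X > x] is a straight line on log-log axes with slope -a, which is why a log-log plot is a natural way to check for a heavy tail.

```python
import math
import random


def pareto_samples(alpha, x_min, n, seed=0):
    """Draw n Pareto(alpha, x_min) samples by inverse-transform sampling."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], so the negative power is always defined.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def empirical_ccdf(samples):
    """Return (x, P[X > x]) pairs, one per sorted sample."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (n - i - 1) / n) for i, x in enumerate(xs)]


def loglog_slope(points):
    """Least-squares slope of log(p) against log(x), skipping p == 0."""
    logs = [(math.log(x), math.log(p)) for x, p in points if p > 0]
    mean_x = sum(lx for lx, _ in logs) / len(logs)
    mean_y = sum(ly for _, ly in logs) / len(logs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in logs)
    sxx = sum((lx - mean_x) ** 2 for lx, _ in logs)
    return sxy / sxx


ALPHA = 1.5  # assumed shape parameter; in the infinite-variance range
durations = pareto_samples(ALPHA, x_min=0.2, n=20000)  # synthetic data, in seconds
ccdf = empirical_ccdf(durations)
slope = loglog_slope(ccdf)
print(f"fitted log-log slope: {slope:.2f} (ideal Pareto: {-ALPHA})")
```

The fitted slope should land close to -a. On real measurements only the upper tail would normally be fitted, since the body of the distribution need not follow a Pareto law.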