CSE/EE 461: Introduction to Computer-Communication Networks, Autumn 2007
    Final Homework: You Tube
Out: Wednesday November 28
Due: Wednesday December 5

Preliminaries

"The value of this assignment is in the having done it, not the answers."

By now I'm sure you recognize sentences like that as saying that both the questions and the correctness of potential answers are a bit ill-defined.

Overview

We're going to use a trace of network traffic to try to estimate some network characteristics of interest. The trace was taken using tcpdump to record all packets into and out of a local machine. While that was going on, I interacted with only one application, Firefox, and it was used only to fetch and view a You Tube video. (Details on all this a little later.)

Using the packet trace, try to estimate the following network characteristics. Note that all of your estimates are going to be just that, and in some cases a proper conclusion might be "insufficient data to make any reasonable estimate of this quantity."

  1. What is the RTT between the local machine (the one on which the trace was captured) and each other machine pertinent to viewing the You Tube video?

  2. What is the "bottleneck bandwidth" between the local machine and each other machine, in each direction?

  3. To what extent does the TCP connection manage to maintain the bandwidth-delay product number of bytes on the wire?

  4. To overcome variability in network delays, the Flash player buffers some video before it starts playing. Suppose, unrealistically, that it starts playing when 64KB of video has been received. Characterize the sources of delay (from the time the user requested the video until it starts playing), using the following bins, as well as any others that you think are useful:
    • Software architecture - delays due to the way software is built, e.g., contacting services that are required before we can even begin to fetch the actual video data (e.g., DNS).
    • Internet path bandwidth - what fraction of the delay is due to bandwidth limitations imposed by the Internet path between the source and destination machine?
    • Internet path queueing - what fraction is due to queueing at the intermediate routers?
    • Internet path latency - what fraction...
    • Network losses - the penalties associated with retransmitting (including any retransmissions based on mistakenly assuming a packet was lost).
    • TCP slow start - suppose TCP could remember the appropriate send rate from the source to the local machine, and started at that rate, rather than using slow start to ramp up the transmission rate.
    • Steady state behavior - suppose TCP were able to keep a fixed rate, rather than repeatedly engaging in AIMD behavior. How much worse is what TCP actually does, relative to this fixed rate?
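Questions 1 through 3 mostly reduce to arithmetic on trace timestamps and packet sizes. The Python sketch below shows one way to do that arithmetic. It assumes you have already pulled the relevant timestamps and lengths out of the tcpdump output, and all function names are hypothetical. The handshake RTT and packet-pair bandwidth methods are two common textbook techniques, not the only valid ones. The slow-start count is an idealized model (no losses, no delayed ACKs, receiver window never limiting) that may help with the slow start bin of question 4.

```python
import statistics
from typing import List, Tuple


def handshake_rtt(t_syn_sent: float, t_synack_received: float) -> float:
    """RTT as seen by the local (client) machine: SYN sent -> SYN-ACK received.

    Assumes the remote host replies instantly (the "infinitely fast hosts" assumption).
    """
    return t_synack_received - t_syn_sent


def packet_pair_bandwidth(arrivals: List[Tuple[float, int]]) -> float:
    """Bottleneck bandwidth estimate in bytes/second.

    `arrivals` holds (timestamp, length_in_bytes) for consecutive data packets
    received from one sender. If two packets left the sender back to back, the
    bottleneck link spaces them by the time needed to transmit the second one,
    so length / gap estimates the bottleneck rate. Taking the median over all
    pairs reduces (but does not remove) the influence of pairs that were not
    actually sent back to back.
    """
    estimates = []
    for (t_prev, _), (t_next, length) in zip(arrivals, arrivals[1:]):
        gap = t_next - t_prev
        if gap > 0:
            estimates.append(length / gap)
    if not estimates:
        raise ValueError("need at least two packets with increasing timestamps")
    return statistics.median(estimates)


def bdp_bytes(bandwidth_bytes_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the path full."""
    return bandwidth_bytes_per_s * rtt_s


def slow_start_rounds(total_bytes: int, mss: int, initial_cwnd_segments: int = 2) -> int:
    """Round trips that idealized slow start needs to deliver total_bytes.

    The congestion window starts at initial_cwnd_segments and doubles every RTT;
    losses, delayed ACKs, and receiver-window limits are ignored.
    """
    segments_needed = -(-total_bytes // mss)  # ceiling division
    sent, cwnd, rounds = 0, initial_cwnd_segments, 0
    while sent < segments_needed:
        sent += cwnd
        cwnd *= 2
        rounds += 1
    return rounds
```

For example, assuming a 1460-byte MSS and an initial window of 2 segments, `slow_start_rounds(65536, 1460)` returns 5, so delivering the first 64KB costs roughly 5 RTTs under this model; compare that with 64KB divided by your bottleneck-bandwidth estimate to size the slow start bin. Packet pair only works for packets the sender really emitted back to back, so treat any single estimate skeptically.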

More Overview

The traces may not conform to your expectations of how TCP should behave. For some reason the You Tube, FireFox, Linux, and other such people didn't keep in mind that we might be trying to use traces of their code's behavior when they built it.

The point is to use what you know about networking to do as good a job as you can, using only the information in the trace.

If you find that you need to make some assumption about machine processing speeds/delays, you should make whatever assumption is most convenient for you. Usually this is that the end hosts are infinitely fast.

This should be incredibly fun, by the way...

Trace Capture Procedure

The procedure to capture the trace was:
  1. Start capturing all the packets into/from my machine:
    $ tcpdump -i eth0 -w youtubeTrace.tcpdump host rocketship.cs.washington.edu
    Note that I wasn't actively running anything else at this point, but I didn't make any special effort to quiesce the machine.
  2. Launch FireFox. For whatever reason, I don't seem to have a home page, and so it looked like FireFox was not fetching a page just yet.
  3. Navigate to
    www.youtube.com/watch?v=68072V5iEx4
    (It really doesn't matter what this video is, but there's the URL in case you want to make your own trace.)
  4. "Watch" the entire video. Playing time is 4:33.
  5. Shortly after the video ended, shut down FireFox.
  6. Shortly after that, shut down tcpdump.

Trace Details

There are actually two trace files:
  • youtubeTrace.tcpdump
    Captured on my office Linux box, rocketship. It is connected directly to a gigabit Ethernet switch, and then into the department networking infrastructure.

  • youtubeTrace2.tcpdump
    Captured on one of my home Linux boxes. As I don't run any kind of name server, it is known as 192.168.0.107. It is connected through two gigabit Ethernet switches to my DSL gateway/router, known as 192.168.0.1.

When I took the trace on my home machine, it looked to me like all video data had finished downloading at about 1:12 into playback. I didn't notice when loading finished on the office machine run.

You should answer the questions for one of the two traces; which one is up to you. There are two mainly because I thought it might be interesting to have more than one.

Resources

tcpdump

tcpdump can print "human readable" versions of packets, as well as capturing raw packet data. The man page explains how to do this, including how to cause it to read a previously captured trace file as input (rather than watching the live network). An example invocation, used in a script described below, is

tcpdump -tt -vv -r youtubeTrace.tcpdump
and here are the first few packets output:

1196198593.413534 IP (tos 0x0, ttl 64, id 53241, offset 0, flags [DF], proto UDP (17), length 74) rocketship.cs.washington.edu.32812 > ns2.cs.washington.edu.domain: [bad udp cksum c95f!] 32079+ A? rocketship.cs.washington.edu. (46)
1196198593.415159 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 317) ns2.cs.washington.edu.domain > rocketship.cs.washington.edu.32812: 32079* q: A? rocketship.cs.washington.edu. 1/6/6 rocketship.cs.washington.edu.[|domain]
1196198595.484362 arp who-has ns2.cs.washington.edu tell rocketship.cs.washington.edu
1196198595.484495 arp reply ns2.cs.washington.edu is-at 00:b0:d0:f0:17:e8 (oui Unknown)
1196198596.427464 IP (tos 0x0, ttl 64, id 56255, offset 0, flags [DF], proto UDP (17), length 61) rocketship.cs.washington.edu.32812 > ns2.cs.washington.edu.domain: [bad udp cksum ccee!] 11021+ AAAA? www.youtube.com. (33)
1196198596.428871 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 120) ns2.cs.washington.edu.domain > rocketship.cs.washington.edu.32812: 11021 q: AAAA? www.youtube.com. 0/1/0 ns: youtube.com. SOA[|domain]
The first field is a timestamp: the time, in seconds since the Unix epoch (that's what -tt selects), at which tcpdump captured the packet. Outgoing packets appear in the trace too, so this is not only the time packets were received. What follows are the IP header, the TCP header (if it's a TCP/IP packet), and then maybe some bit of the payload. The man page will help you understand the details, as will your knowledge of what information is contained in IP and TCP headers.
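If you would rather script your analysis than read tcpdump output by eye, a line-oriented parser is the first step. This is a minimal Python sketch that assumes the single-line `tcpdump -tt -vv` format shown in the sample above. Other tcpdump versions may wrap verbose output onto indented continuation lines, which this does not handle. The regular expression and the `Packet`, `parse_line`, and `split_endpoint` names are hypothetical, written just for this example.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Matches the single-line "tcpdump -tt -vv" format shown above:
#   <epoch seconds> IP (<ip header fields>, length N) <src>.<port> > <dst>.<port>: ...
IP_LINE = re.compile(
    r"^(?P<ts>\d+\.\d+) IP \(.*?length (?P<iplen>\d+)\) "
    r"(?P<src>\S+) > (?P<dst>[^\s:]+):"
)


class Packet(NamedTuple):
    ts: float    # capture time, seconds since the epoch
    ip_len: int  # total IP datagram length in bytes
    src: str     # "host.port" of the sender
    dst: str     # "host.port" of the receiver


def parse_line(line: str) -> Optional[Packet]:
    """Return a Packet for an IP line, or None for ARP and other non-IP lines."""
    m = IP_LINE.match(line)
    if m is None:
        return None
    return Packet(float(m["ts"]), int(m["iplen"]), m["src"], m["dst"])


def split_endpoint(endpoint: str) -> Tuple[str, str]:
    """Split 'host.port' into (host, port); tcpdump joins them with a final '.'."""
    host, _, port = endpoint.rpartition(".")
    return host, port
```

On the first sample line this yields timestamp 1196198593.413534, IP length 74, source `rocketship.cs.washington.edu.32812`, and destination `ns2.cs.washington.edu.domain`; the ARP lines return None.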

On Windows, tcpdump is called WinDump.

mkgraph.pl

Use of this script is completely optional.

This is a home-brewed Perl script that produces a slew of graphs from the TCP traffic in the trace. (That is, it ignores non-TCP traffic.) It turns out to be hard to find a sensible way to graph so much data, especially because of range problems. What the script does is only marginally useful. The questions posed above can be answered using just tcpdump to examine the data; the graphs are probably useful mainly for identifying places in the trace that deserve closer scrutiny.

Documentation on the script is provided by a comment at the top. Here's a brief rundown, though.

Invocation is something like this:

$ ./mkgraph.pl youtubeTrace.tcpdump
This will:
  • delete any existing directory named youtubeTrace.tcpdump-output
  • create a directory with that name
  • convert the binary data in the trace file into characters, using the tcpdump command in the first part of this section. That output is left as a file in the new directory.
  • scan that output file looking for TCP packets
  • partition that data into one-way streams, by which I mean transmissions from a particular host/port to a particular host/port. (Note that this isn't necessarily accurate - the script doesn't notice connections being closed and reopened, so if that occurs the data for multiple connections would show up in a single graph.)
  • create a graph (in the new subdirectory) for each one-way flow showing sequence number + packet data length of each packet sent.
Time (the x-axis in the graphs) has been normalized to start at 0 by subtracting the timestamp of the first record seen from all timestamps.
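The partitioning and time normalization described above can be sketched in a few lines of Python. This is not the script's code (which is Perl); it is a hypothetical reimplementation of the described steps, assuming you already have one record per TCP packet, in capture order, with its timestamp, endpoints, starting sequence number, and payload length. The `TcpRecord` and `one_way_streams` names are made up for this example. Like the script, it keys flows only on (source, destination) endpoints, so a connection closed and reopened on the same ports lands in one flow.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class TcpRecord(NamedTuple):
    ts: float      # tcpdump timestamp (seconds)
    src: str       # "host.port" of the sender
    dst: str       # "host.port" of the receiver
    seq: int       # starting sequence number of the segment
    data_len: int  # payload bytes carried


def one_way_streams(records: List[TcpRecord]) -> Dict[Tuple[str, str], List[Tuple[float, int]]]:
    """Group records into one-way flows keyed by (src, dst).

    Each flow maps to a list of (normalized time, seq + data_len) points, the
    quantity plotted per packet. Time is measured from the first record in
    the whole trace, not from the start of each flow.
    """
    if not records:
        return {}
    t0 = records[0].ts
    flows: Dict[Tuple[str, str], List[Tuple[float, int]]] = defaultdict(list)
    for r in records:
        flows[(r.src, r.dst)].append((r.ts - t0, r.seq + r.data_len))
    return dict(flows)
```

Plotting each flow's points with time on the x-axis gives roughly the kind of sequence-number graph the script produces; flat stretches and backward jumps are the places worth examining in the raw tcpdump output.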

A tiny bit of effort has gone into this script. It's even flakier than tcpdump, so view it skeptically. A small fraction of the tiny effort has gone into testing the script using WinDump. (The script is inherently fragile because it works by parsing the output of tcpdump/WinDump. The two don't produce character-for-character identical output, among other issues.)

Wireshark

Wireshark is a graphical packet analyzer that can also read tcpdump trace files. I've never used it.

Capturing Your Own Traces

You might want to capture a trace of your own, more likely out of interest than because that will help with this assignment. On Unix, you must be root to capture data (you can run tcpdump as a normal user to examine an existing trace file). This means you probably have to own/manage a Unix box to capture your own trace.

I haven't tried capturing data with WinDump, but I suspect that (a) it doesn't require any special privileges, except maybe under Vista, and that (b) if you're using Vista you've already decided to turn off all security features, so it should work there as well.

Files

All available (e.g., by scp) at attu:/cse/courses/cse461/07au/:
  • youtubeTrace.tcpdump
  • youtubeTrace2.tcpdump
  • mkgraph.pl

What To Hand In

A short report that for each question tells us:
  • How you went about getting an answer
  • What answer you got, if you got one
  • Why you couldn't get one, if one can't be had
I think it's better to consider the assignment as "tell us what you can, given that you have only X hours to spend on this" than "give us the most accurate and complete answers conceivable." What is 'X'? I'd guess 1 to 10, with high likelihood of being within a factor of two.

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA  98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to zahorjan at cs.washington.edu]