Final Homework: YouTube
Out: Wednesday November 28
Due: Wednesday December 5
Preliminaries
"The value of this assignment is in the having done it, not the answers."
By now I'm sure you recognize sentences like that as saying
that both the questions and the correctness of potential answers are
a bit ill-defined.
Overview
We're going to use a trace of network traffic to try to
estimate some network characteristics of interest.
The trace was taken using tcpdump to record
all packets into and out of a local machine. While that
was going on, I interacted with only one application, Firefox,
and it was used only to fetch and view a YouTube
video. (Details on all this a little later.)
Using the packet trace, try to estimate the following network
characteristics. Note that all of your estimates are going to be just that,
and in some cases a proper conclusion might be "insufficient data to make
any reasonable estimate of this quantity."
- What is the RTT between the local machine (the one on which
the trace was captured) and each other machine pertinent to viewing
the YouTube video?
- What is the "bottleneck bandwidth" between the local machine and
each other machine, in each direction?
- To what extent does the TCP connection manage to maintain
the bandwidth-delay product number of bytes on the wire?
- To overcome variability in network delays, the Flash player buffers
some video before it starts playing. Suppose, unrealistically,
that it starts playing when 64KB of video has been received.
Characterize the sources of delay (from the time the user requested the
video until it starts playing), using the following bins, as well as
any others that you think are useful:
- Software architecture - delays due to the way software is built, e.g.,
contacting services that are required
before we can even begin to fetch the actual video data (e.g., DNS).
- Internet path bandwidth - what fraction of the delay is due to bandwidth
limitations imposed by the Internet path between the source and destination machine?
- Internet path queueing - what fraction is due to queueing at the intermediate
routers?
- Internet path latency - what fraction...
- Network losses -- the penalties associated with retransmitting (including
any retransmissions based on mistakenly assuming a packet was lost)
- TCP slow start - suppose TCP could remember the appropriate send rate
from the source to the local machine, and started at that rate, rather than
using slow start to ramp up the transmission rate.
- Steady state behavior - suppose TCP were able to keep a fixed rate,
rather than repeatedly engaging in AIMD behavior. How much does what TCP
is actually doing lose, relative to this fixed rate?
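Here is a minimal Python sketch of the arithmetic behind the first few questions. It is not a prescribed method. The function names are made up for illustration, the numbers at the bottom are hypothetical rather than taken from either trace, and the packet-pair spacing trick is only one common way to guess at a bottleneck bandwidth.

```python
def rtt_from_handshake(syn_sent_ts, synack_received_ts):
    """RTT estimate: time between sending a SYN and capturing its SYN-ACK."""
    return synack_received_ts - syn_sent_ts


def packet_pair_bandwidth(packet_bytes, arrival_gap_sec):
    """Bottleneck estimate in bits/s from two packets sent back to back.

    The bottleneck link spreads the pair out, so the second packet arrives
    roughly (packet size / bottleneck rate) after the first.
    """
    return packet_bytes * 8 / arrival_gap_sec


def bandwidth_delay_product(bandwidth_bps, rtt_sec):
    """Bytes that must be in flight to keep the path full."""
    return bandwidth_bps * rtt_sec / 8


def time_until_bytes_received(arrivals, threshold_bytes):
    """Capture time at which cumulative payload first reaches threshold_bytes.

    arrivals is a list of (timestamp, payload_bytes) in capture order.
    Retransmitted bytes are counted again, so the result is only approximate.
    Returns None if the threshold is never reached.
    """
    total = 0
    for timestamp, payload_bytes in arrivals:
        total += payload_bytes
        if total >= threshold_bytes:
            return timestamp
    return None


# Hypothetical numbers, chosen only to show the arithmetic.
rtt = rtt_from_handshake(10.000, 10.050)        # SYN at 10.000 s, SYN-ACK at 10.050 s
bandwidth = packet_pair_bandwidth(1500, 0.008)  # 1500-byte packets arriving 8 ms apart
bdp = bandwidth_delay_product(bandwidth, rtt)
print(f"RTT ~{rtt:.3f} s, bottleneck ~{bandwidth:.0f} bit/s, BDP ~{bdp:.0f} bytes")
```

Comparing the bytes actually unacknowledged at some instant against the BDP is one way to approach the third question. Calling time_until_bytes_received with a threshold of 64 * 1024 gives a starting point for the playback-delay question.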
More Overview
The traces may not conform to your expectations of how TCP should behave.
For some reason the YouTube, Firefox, Linux, and other such people didn't keep in mind that we might be trying
to use traces of their code's behavior when they built it.
The point is to use what you know about networking to do as good a job
as you can, using only the information in the trace.
If you find that you need to make some assumption about machine processing speeds/delays, you should make whatever assumption is most convenient for you.
Usually this is that the end hosts are infinitely fast.
This should be incredibly fun, by the way...
Trace Capture Procedure
The procedure to capture the trace was:
- Start capturing all the packets into/from my machine:
$ tcpdump -i eth0 -w youtubeTrace.tcpdump host rocketship.cs.washington.edu
Note that I wasn't actively running anything else at this point, but I
didn't make any special effort to quiesce the machine.
- Launch Firefox. For whatever reason, I don't seem to have a home
page, and so it looked like Firefox was not fetching a page just yet.
- Navigate to
www.youtube.com/watch?v=68072V5iEx4
(It really doesn't matter what this video is, but there's the URL in
case you want to make your own trace.)
- "Watch" the entire video. Playing time is 4:33.
- Shortly after the video ended, shut down Firefox.
- Shortly after that, shut down tcpdump.
Trace Details
There are actually two trace files:
- youtubeTrace.tcpdump
Captured on my office Linux box, rocketship.
It is connected directly to a
gigabit Ethernet switch, and then into the department networking
infrastructure.
- youtubeTrace2.tcpdump
Captured on one of my home Linux boxes. As I don't run
any kind of name server, it is known as 192.168.0.107.
It is connected through two gigabit Ethernet switches to my
DSL gateway/router, known as 192.168.0.1.
When I took the trace on my home machine, it looked to me like
all video data was finished downloading at about 1:12 into
playback. I didn't notice at what time it finished loading in
the office machine run.
You should answer the questions for one of the two traces; which one is
up to you. There are two mainly because I thought it might be interesting
to have more than one.
Resources
tcpdump
tcpdump can print "human readable" versions of
packets, as well as capturing raw packet data. The man page
explains how to do this, including how to cause it to read
a previously captured trace file as input (rather than watching
the live network). An example invocation, used in a script
described below, is
tcpdump -tt -vv -r youtubeTrace.tcpdump
and here are the first few packets output:
1196198593.413534 IP (tos 0x0, ttl 64, id 53241, offset 0, flags [DF], proto UDP (17), length 74) rocketship.cs.washington.edu.32812 > ns2.cs.washington.edu.domain: [bad udp cksum c95f!] 32079+ A? rocketship.cs.washington.edu. (46)
1196198593.415159 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 317) ns2.cs.washington.edu.domain > rocketship.cs.washington.edu.32812: 32079* q: A? rocketship.cs.washington.edu. 1/6/6 rocketship.cs.washington.edu.[|domain]
1196198595.484362 arp who-has ns2.cs.washington.edu tell rocketship.cs.washington.edu
1196198595.484495 arp reply ns2.cs.washington.edu is-at 00:b0:d0:f0:17:e8 (oui Unknown)
1196198596.427464 IP (tos 0x0, ttl 64, id 56255, offset 0, flags [DF], proto UDP (17), length 61) rocketship.cs.washington.edu.32812 > ns2.cs.washington.edu.domain: [bad udp cksum ccee!] 11021+ AAAA? www.youtube.com. (33)
1196198596.428871 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto UDP (17), length 120) ns2.cs.washington.edu.domain > rocketship.cs.washington.edu.32812: 11021 q: AAAA? www.youtube.com. 0/1/0 ns: youtube.com. SOA[|domain]
The first field is a timestamp - the time at which tcpdump captured the packet, in seconds since the Unix epoch (that is what -tt asks for).
What follows are the IP and TCP (if it's a TCP/IP packet) headers,
and then maybe some bit of the payload.
The man page will help understand the details,
as will your knowledge of what information
is contained in IP and TCP headers.
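As a rough illustration of working with this output, here is a small Python sketch that pulls the timestamp and endpoints out of IP lines like the samples above. The regular expression and the parse_ip_line name are my own assumptions, checked only against the sample lines shown here. The sketch assumes each endpoint ends in a port or service name, and it returns None for anything else (such as the arp lines).

```python
import re
from decimal import Decimal

# Timestamp, the parenthesized IP header summary, then "source > destination:".
IP_LINE = re.compile(
    r"^(?P<ts>\d+\.\d+) IP \(.*?\) (?P<src>\S+) > (?P<dst>\S+?): "
)


def parse_ip_line(line):
    """Return timestamp and endpoints for an IP line, or None if it doesn't match."""
    match = IP_LINE.match(line)
    if match is None:
        return None
    # The last dotted component is the port (or a service name like "domain").
    src_host, src_port = match.group("src").rsplit(".", 1)
    dst_host, dst_port = match.group("dst").rsplit(".", 1)
    return {
        # Decimal keeps the microsecond digits exact when subtracting.
        "timestamp": Decimal(match.group("ts")),
        "src_host": src_host,
        "src_port": src_port,
        "dst_host": dst_host,
        "dst_port": dst_port,
    }


# The first two sample lines above: a DNS query and its reply.
query = parse_ip_line(
    "1196198593.413534 IP (tos 0x0, ttl 64, id 53241, offset 0, flags [DF], "
    "proto UDP (17), length 74) rocketship.cs.washington.edu.32812 > "
    "ns2.cs.washington.edu.domain: [bad udp cksum c95f!] 32079+ A? "
    "rocketship.cs.washington.edu. (46)"
)
reply = parse_ip_line(
    "1196198593.415159 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], "
    "proto UDP (17), length 317) ns2.cs.washington.edu.domain > "
    "rocketship.cs.washington.edu.32812: 32079* q: A? "
    "rocketship.cs.washington.edu. 1/6/6 rocketship.cs.washington.edu.[|domain]"
)
elapsed = reply["timestamp"] - query["timestamp"]
print(f"query to reply: {elapsed} s")  # 0.001625 s, i.e. about 1.6 ms
```

The gap between a request and its matching reply, as here, is one crude round-trip sample. It includes whatever time the remote machine spent producing the answer.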
On Windows, tcpdump is called WinDump.
mkgraph.pl
Use of this script is completely optional.
This is a home-brewed Perl script that produces a slew of graphs
from the TCP traffic in the trace. (That is, it ignores non-TCP
traffic.) It turns out to be hard to find a sensible way to graph
so much data, especially due to range problems.
What the script does is only marginally useful. The questions posed above
can be answered using just tcpdump to examine the data; the graphs are probably
useful mainly to identify places in the trace that deserve your closer
scrutiny.
Documentation on the script is provided by a comment at the top. Here's a
brief rundown, though.
Invocation is something like this:
$ ./mkgraph.pl youtubeTrace.tcpdump
This will:
- delete any existing directory named youtubeTrace.tcpdump-output
- create a directory with that name
- convert the binary data in the trace file into characters, using
the tcpdump command in the first part of this section. That output is
left as a file in the new directory.
- scan that output file looking for TCP packets
- partition that data into one-way streams, by which I mean transmissions
from a particular host/port to a particular host/port. (Note that this
isn't necessarily accurate - the script doesn't notice connections being
closed and reopened, so if that occurs the data for multiple connections
would show up in a single graph.)
- create a graph (in the new subdirectory) for each one-way flow
showing
sequence number + packet data length of each packet sent.
Time (the x-axis in the graphs) has been normalized to start at 0 by
subtracting the timestamp of the first record seen from all timestamps.
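The partitioning and normalization steps can be sketched in a few lines of Python. This is not the Perl script's actual code, only an illustration of the idea. The one_way_flows name and the tuple layout of the already-parsed packets are assumptions, and the sketch normalizes against the first packet it is given rather than the first record in the whole trace.

```python
from collections import defaultdict


def one_way_flows(packets):
    """Group TCP packets into one-way flows and build the points to plot.

    packets: iterable of (timestamp, src, dst, seq, payload_len) tuples in
    capture order, where src and dst are "host.port" strings.
    Returns {(src, dst): [(seconds since first packet, seq + payload_len), ...]}.
    """
    flows = defaultdict(list)
    first_ts = None
    for timestamp, src, dst, seq, payload_len in packets:
        if first_ts is None:
            first_ts = timestamp  # time zero for every graph
        flows[(src, dst)].append((timestamp - first_ts, seq + payload_len))
    return dict(flows)


# Hypothetical packets: one outgoing segment, then two incoming data segments.
packets = [
    (100.0, "local.40000", "server.80", 1, 0),
    (100.5, "server.80", "local.40000", 1, 1448),
    (101.0, "server.80", "local.40000", 1449, 1448),
]
flows = one_way_flows(packets)
for (src, dst), points in flows.items():
    print(src, "->", dst, points)
```

Like the script, this keys a flow only on its endpoints, so a connection that is closed and reopened between the same host/port pairs lands in the same list.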
A tiny bit of effort has gone into this script. It's even flakier than
tcpdump, so view it skeptically. A small fraction of the tiny effort has gone
into testing the script using WinDump. (The script is inherently fragile
because it works by parsing the text output of tcpdump/WinDump. The two don't
produce character-by-character identical output, and there are other issues as well.)
WireShark
It's here. I've never used it.
Capturing Your Own Traces
You might want to capture a trace of your own, more likely out of interest
than because that will help with this assignment. On Unix, you must
be root (to capture data -- you can run tcpdump to examine a trace file
as a normal user). This means you probably have to own/manage a Unix
box to capture your own trace.
I haven't tried capturing data with WinDump, but I suspect
that (a) it doesn't require any special privileges, except maybe under
Vista, and that (b) if you're using Vista you've already decided to
turn off all security features, so it should work there as well.
Files
All available (e.g., by scp) at attu:/cse/courses/cse461/07au/ :
youtubeTrace.tcpdump
youtubeTrace2.tcpdump
mkgraph.pl
What To Hand In
A short report that, for each question, tells us:
- How you went about getting an answer
- What answer you got, if you got one
- Why you couldn't get one, if one can't be had
I think it's better to consider the assignment as "tell us what you can,
given that you have only X hours to spend on this" than "give us the most
accurate and complete answers conceivable." What is 'X'? I'd guess 1 to 10,
with high likelihood of being within a factor of two.