CSE 461 15sp: Homework 1

Out: Monday, March 30, 2015
Due: Monday, April 6, 2015, 2:00 PM
Turnin: Online, course dropbox
Teams: No, do this individually

Wireshark

Requirements

Wireshark
Wireshark is a software tool that can capture and examine packet traces. A packet trace is a record of traffic at a location on the network, that is, the traffic seen by some network interface (e.g., an Ethernet or WiFi adapter). The trace is a log of the bits that make up each packet seen, along with timestamps indicating when they were seen. Capturing traces requires root privilege. It's easy enough to do if you have root, so you might want to try it on your own machine. For this assignment, we start from an already captured trace file and use Wireshark only to help investigate its contents.

Wireshark is already installed on CSE machines. If it is not already installed on your machine, your standard mechanism for installing software might provide it, or else you can download it from www.wireshark.org.

Trace File
The trace file is here.

Warm-up: Inspecting the Trace

The Wireshark GUI has three main sections, as shown in the figure below. In the top panel is a list of the packets in the trace. The bottom two panels show details on a single packet, selected by clicking on one in the top panel. The middle panel shows header fields - each protocol layer adds a header that encapsulates the information passed down by the layer above. The bottom panel shows the raw bytes that make up the packet. (Note that we're using "packet" as a general term here. Strictly speaking, a unit of information at the link layer (which is what is captured in the trace) is called a frame. At the network layer (IP), the unit is called a packet, at the transport layer a datagram or a segment (depending), and at the application layer a message. We'll often use packet out of habit, though.)

Figure 1: Example Wireshark session

Select a packet for which the Protocol column is HTTP and the Info column says it is a GET. This is a packet that carries a web (HTTP) request, for instance as sent from your browser to a web server. Let's have a closer look to see how the packet structure reflects the protocols that are in use.

Since we are fetching a web page, we know that the protocol layers being used are as shown below. That is, HTTP is the application layer protocol used to fetch URLs. Like many Internet applications, it runs on top of the TCP transport layer, which itself runs on top of the IP network layer. IP runs on top of some link/physical layer protocols, depending on the physical network. These are typically combined by Wireshark and displayed as Ethernet, if the trace was captured on a wired interface, or 802.11, if it was captured on a wireless interface.

Figure 2: Protocol stack for a web fetch

With the HTTP GET packet selected, you can examine the protocol header for each layer using the middle panel. You can expand the information for each layer by clicking on the + expander icon beside it to see details about the information it provides.

The first Wireshark block is the Frame. Its bits (header plus payload (data)) are all the bits of that transmission unit. Expanding the Frame line (in the middle panel) provides overall information about the packet, including when it was captured and how many bits long it is.
The second block is Ethernet. Expanding it provides details about the contents of this frame's Ethernet header.
Then come IP, TCP, and HTTP, in that order. Each layer transmits a header, used to communicate between the protocol agents at that layer at each end, and encapsulates the data and headers of the layers above. Note that the order of the headers is from the bottom of the protocol stack upwards. This is because as packets are passed down the stack, the header information of each lower layer protocol is added to the front of the information from the higher layer protocol. That is, the lower layer protocols come first in the packet on the wire.
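
The header ordering described above can be sketched in a few lines of Python. This is only an illustration, not part of the assignment: the function name header_ranges is made up here, and the sketch assumes an Ethernet II frame with no VLAN tag that carries IPv4 and then TCP. It reads the two length fields that tell a receiver where each header ends.

```python
ETHERNET_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + type (2)

def header_ranges(frame: bytes) -> dict:
    """Return (start, end) byte offsets, end exclusive, for each layer.

    Assumes Ethernet II + IPv4 + TCP; no checks are made that this holds.
    """
    eth_end = ETHERNET_HEADER_LEN
    # Low 4 bits of the first IPv4 byte: header length in 32-bit words.
    ip_header_len = (frame[eth_end] & 0x0F) * 4
    ip_end = eth_end + ip_header_len
    # Bytes 2-3 of the IPv4 header: total length of the IP packet in bytes.
    ip_total_len = int.from_bytes(frame[eth_end + 2:eth_end + 4], "big")
    # High 4 bits of TCP byte 12: TCP header length in 32-bit words.
    tcp_header_len = (frame[ip_end + 12] >> 4) * 4
    tcp_end = ip_end + tcp_header_len
    return {
        "ethernet": (0, eth_end),
        "ip": (eth_end, ip_end),
        "tcp": (ip_end, tcp_end),
        "payload": (tcp_end, eth_end + ip_total_len),
    }
```

The lowest layer occupies the first bytes and each higher layer follows it, which matches the order of the blocks in Wireshark's middle panel.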

Now find another HTTP message, the response from the server to your computer, and look at the structure of its packets. The packet whose Info field contains "200 OK" is the final packet carrying the server's HTTP response message. (That is, the HTTP response is so large that it is carried as many network packets.) In our trace, there are two blocks synthesized by Wireshark and shown in its middle panel, as seen in the next figure. These blocks provide you, the Wireshark user, with potentially useful information, but the information they contain is not encoded in this particular packet (or, at least, not fully contained in this packet).

The first extra block, "[11 reassembled TCP segments ]", is a "reassembly" of the complete HTTP response using all the packets into which it was broken. Each of these packets is shown earlier in the trace.
The second extra block, "Line-based text data", describes the HTTP data contained in the HTTP message. (The full message is an HTTP header plus this data.) In our case the data is of type text/html. (Wireshark understands this by parsing the HTTP header, the same way that your browser understands it.) The data is displayed using a format appropriate for its type.

Figure 3: Inspecting an HTTP 200 OK response

Question 1: Packet Structure Diagram

Draw a figure of the HTTP GET packet (packet 4 in the trace) that shows the position and size in bytes of the HTTP, TCP, IP and Ethernet protocol headers. Your figure can simply show the overall packet as a long, thin rectangle. Leftmost elements are the first sent on the wire. On this drawing, show for each protocol layer the byte range containing the protocol header. If the topmost layer has data, it will be contained in the final segment of the packet. In that case, show its byte range as well. (So, your diagram partitions the bytes of the packet into several protocol headers and possibly one data segment.)

To work out sizes, observe that when you select a protocol block in the middle panel by clicking on it, Wireshark highlights the bytes it corresponds to in the packet in the lower panel and displays their length at the bottom of the window.

Q1: Hand in your packet drawing.

Questions 2-3: Protocol Overhead

Estimate the download protocol overhead, or percentage of the download bytes taken up by protocol overhead. To do this, consider HTTP data (headers and message) to be useful data for the network to carry, and lower layer headers (TCP, IP, and Ethernet) to be the overhead. We would like this overhead to be small, so that most bits are used to carry content that applications care about. To work this out, first look at only the packets in the download direction for a single web fetch. (The GET travels upstream. The other direction is downstream.) The packets should start with a short TCP packet described as a SYN ACK, which is the beginning of a connection. They will be followed by mostly longer packets in the middle (of roughly 1 to 1.5KB), of which the last one is an HTTP packet. This is the main portion of the download. And they will likely end with a short TCP packet that is part of ending the connection. For each packet, you can inspect how much overhead it has in the form of Ethernet / IP / TCP headers, and how much useful HTTP data it carries in the TCP payload. You may also look at the HTTP packet in Wireshark to learn how much data is in the TCP payloads over all download packets.
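
As a rough sketch of the arithmetic, the overhead defined above is the lower layer header bytes divided by all download bytes. The function name overhead_percent and the byte counts in the usage lines are invented for illustration; they are not taken from the trace.

```python
def overhead_percent(packets):
    """Percentage of download bytes spent on Ethernet/IP/TCP headers.

    packets: iterable of (header_bytes, http_bytes) pairs, one pair per
    download packet. http_bytes counts HTTP headers and message together.
    """
    packets = list(packets)
    header_total = sum(header for header, _ in packets)
    http_total = sum(http for _, http in packets)
    return 100.0 * header_total / (header_total + http_total)

# Made-up byte counts, NOT from the assignment trace.
example = [(60, 0), (50, 950), (50, 890)]
print(overhead_percent(example))
```

A packet that carries no HTTP data at all (such as one that only sets up or tears down the connection) still contributes its header bytes to the total.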

Q2: Estimate the download protocol overhead on packet 7 in the given trace.

Q3: Estimate the download protocol overhead for the entire HTTP response, as defined above.

Questions 4-5: Demultiplexing Keys

When an Ethernet frame arrives at a computer, the Ethernet layer must hand the packet it contains to the next higher layer to be processed. There can be many "next higher layers" installed on any particular system, and the act of finding the right one to hand any particular incoming packet to is called demultiplexing. We know that in our case the higher layer is IP. But how does the Ethernet protocol know this? We have the same issue at the IP layer -- IP must be able to determine that the contents of an IP message is a TCP packet so that it can hand it to the TCP protocol to process. The answer is that protocols have fields in their headers indicating what the next higher level protocol is. These fields, called demultiplexing keys, are filled in by the protocol layer on the sender side and are read by the protocol layer on the receiving side, since the path up through the layers on the receiver should be the same as the path down through the layers on the sender.
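
The idea of a demultiplexing key can be illustrated with a toy model in which each unit is a (key, payload) pair and each layer looks the key up in a table of handlers. Everything here is hypothetical: the names make_layer, transport_a, transport_b, and other_network, and the key values "N1", "N2", 1, and 2, are placeholders and are not the values Ethernet or IP actually use.

```python
def make_layer(handlers):
    """Build a receive function for one layer.

    handlers maps a demultiplexing key to the upper-layer function that
    should process the payload.
    """
    def receive(unit):
        key, payload = unit          # the "header" here is just the key
        handler = handlers.get(key)
        if handler is None:
            return ("dropped", key)  # no upper layer registered for this key
        return handler(payload)
    return receive

def transport_a(data):
    return ("transport_a", data)

def transport_b(data):
    return ("transport_b", data)

def other_network(data):
    return ("other_network", data)

# Placeholder keys: the network layer chooses between two transports,
# the link layer chooses between two network protocols.
network = make_layer({1: transport_a, 2: transport_b})
link = make_layer({"N1": network, "N2": other_network})

# The sender filled in "N1" at the link layer and 2 at the network layer.
print(link(("N1", (2, b"hello"))))
```

Each layer reads only its own key, so the receiver retraces the sender's path through the layers without any layer needing to understand the payload it hands up.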

Look at the Ethernet and IP headers of a download packet in detail to answer the following questions:

Q4: Which Ethernet header field is the demultiplexing key indicating that the next higher layer is IP? What value is used in this field to indicate IP?

Q5: Which IP header field is the demultiplexing key indicating that the next higher layer is TCP? What value is used in this field to indicate TCP?

Optional No-credit Bonus: Why doesn't TCP's header contain a demultiplexing key? How can TCP know to deliver the data to HTTP on the receiving side?