CSEP-561 Spring'22

Project 3: Bufferbloat

Turnin: Online
Teams: Teams of 1 or 2
Due: June 8, 2022 @ 11:59PM PDT.

Overview

In this project, we will use Mininet to study the bufferbloat phenomenon. We will compare the performance of TCP Reno and TCP BBR over a network with a slow uplink connection.

For part 1, you will set up a mininet VM and clone the starter code for running the experiment. For part 2, you will implement a mininet network in which the nodes connect over TCP Reno connections. After you complete the “TODO” fields in the skeleton code, you will generate experiment results using the framework, and answer the related questions. For part 3, you will rerun the experiment using TCP BBR.

Background

Introduction

In this project we will study the dynamics of TCP in home networks. Take a look at the figure below which shows a “typical” home network with a Home Router connected to an end host. The Home Router is connected via Cable or DSL to a Headend router at the Internet access provider’s office. We are going to study what happens when we download data from a remote server to the End Host in this home network.

A network diagram of the Home network host connected to a home router which is connected using a DSL cable to the Headend router in the cloud connected to the target server.

In a real network it’s hard to measure cwnd (because it’s private to the server) and the buffer occupancy (because it’s private to the router). To ease our measurement, we are going to emulate the network in Mininet.

Goals

  • Learn first-hand the dynamics of TCP sawtooth and router buffer occupancy in a network.

  • Learn why large router buffers can lead to poor performance. This problem is often called “bufferbloat”.

  • Learn the difference between TCP Reno and TCP BBR and how they perform compared to each other.

  • Learn how to use Mininet to run reproducible experiments with a traffic generator, collect statistics, and plot them.

  • Practice packaging your experiments so it’s easy for others to run your code.

Part 1: Setup

This project re-uses your existing Mininet environment from projects 1 and 2. If you need to re-create it, follow the instructions here to set up the Vagrant VM. This VM is based on Ubuntu 20.04 and includes a default set of Mininet binaries and example scripts.

You will need to make the following modifications to the environment:

  1. As the vagrant user in the vagrant home directory of the virtual machine, clone the new starter code repository:
cd ~/
git clone https://gitlab.cs.washington.edu/561p-course-staff/project-3-starter project-3

Starter Code

Do not run the starter code until you have filled in the code that creates the topology in the BBTopo class; otherwise, it will fail.

In the folder, you should find the files below. bufferbloat.py and run.sh are the only files you need to modify.

File              Purpose
run.sh            Runs the experiment and generates all graphs in one go.
bufferbloat.py    Creates the topology, measures cwnd, queue sizes, and RTTs, and spawns a webserver.
monitor.py        Monitors the queue length.
plot_queue.py     Plots the queue occupancy at the bottleneck router.
plot_ping.py      Parses and plots the RTT reported by ping.
plot_defaults.py  Utility functions for creating pretty plots.
helper.py         Utility functions for the plot scripts.
webserver.py      The script that starts the web server.
index.html        The index file for our server.
README.md         Where you will write instructions for running the scripts and your answers to the questions.
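
For context, queue-length monitoring of this kind usually polls the tc statistics of the bottleneck interface. The sketch below is only an illustration under assumed names (the interface s0-eth2 and the sampling interval are not from the starter code); the provided monitor.py may work differently.

# Illustrative only: sample the bottleneck queue by polling tc statistics.
# The interface name is an assumption; the provided monitor.py may differ.
import re
import time
from subprocess import check_output

def sample_queue(dev='s0-eth2', interval=0.1):
    """Yield (timestamp, packets_in_queue) pairs for the given interface."""
    while True:
        out = check_output(['tc', '-s', 'qdisc', 'show', 'dev', dev]).decode()
        # tc reports a line such as "backlog 45000b 30p requeues 0".
        match = re.search(r'backlog\s+\S+\s+(\d+)p', out)
        if match:
            yield time.time(), int(match.group(1))
        time.sleep(interval)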

We will be using python3 for this project. Install the necessary packages using the commands below in the virtual machine:

sudo apt-get update
sudo apt install python3-pip
sudo python3 -m pip install mininet matplotlib

Part 2: TCP Reno

Within Mininet, create the following topology. Here h1 is your home computer, which has a fast (1 Gb/s) connection to your home router; the router has a slow (1.5 Mb/s) uplink to h2. The round-trip propagation delay, i.e. the minimum RTT between h1 and h2, is 20 ms. The router buffer can hold 100 full-sized Ethernet frames (about 150 kB with an MTU of 1500 bytes).

Mininet topology showing the h1 home computer that has a fast connection to home router with a slow uplink connection to h2.
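
For orientation, a minimal sketch of what such a topology class might look like is shown below. The host/switch names, the per-link delays, and the queue_size parameter are illustrative assumptions; fill in the actual TODOs in the starter code's BBTopo as required.

# A minimal sketch, not the official solution: a BBTopo-style topology with a
# fast home link and a slow, buffered uplink. TCLink parameters: bw in Mbit/s,
# delay as a tc-style string, max_queue_size in packets. The network must be
# created with link=TCLink for these parameters to take effect.
from mininet.topo import Topo

class BBTopo(Topo):
    def build(self, queue_size=100):
        h1 = self.addHost('h1')    # home computer
        h2 = self.addHost('h2')    # remote server
        s0 = self.addSwitch('s0')  # home router

        # Fast access link: 1 Gb/s, 5 ms each way.
        self.addLink(h1, s0, bw=1000, delay='5ms')
        # Slow uplink (the bottleneck): 1.5 Mb/s, 5 ms each way, with a buffer
        # of queue_size packets. 2 * (5 ms + 5 ms) = 20 ms round-trip delay.
        self.addLink(s0, h2, bw=1.5, delay='5ms', max_queue_size=queue_size)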

Then do the following:

  • Start a long-lived TCP flow sending data from h1 to h2 using iperf/iperf3.

  • Start a ping train from h1 to h2 at 10 pings per second and record the RTTs.

  • Plot the time series of the following:

    • The RTT reported by ping
    • Queue size at the bottleneck
  • Spawn a webserver on h1. Periodically download the index.html web page from h1 (three times every five seconds, waiting for each download to finish before starting the next) and measure how long it takes to fetch it on average. The starter code has some hints on how to do this. Make sure that 1) the webpage download data travels in the same direction as the long-lived flow, and 2) the curl command successfully fetches the webpage.

  • The long-lived flow, ping train, and webserver downloads should all be happening simultaneously; a rough sketch of how they might be launched follows this list.
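
The sketch below shows one way these three traffic sources could be started. It assumes a started Mininet network net with hosts h1 and h2, that a web server serving index.html is already running on h1 (e.g. via webserver.py), and that it listens on the default HTTP port; the starter code's hints and argument names take precedence.

# Hedged sketch only; the starter code's hints take precedence.
def start_traffic(net, duration=60):
    h1, h2 = net.get('h1'), net.get('h2')
    # Long-lived TCP flow: iperf server on h2, client on h1 for `duration` seconds.
    h2.popen(['iperf', '-s', '-w', '16m'])
    h1.popen(['iperf', '-c', h2.IP(), '-t', str(duration)])
    # Ping train at 10 probes per second; RTTs go to ping.txt for plot_ping.py.
    h1.popen(['ping', '-i', '0.1', h2.IP()], stdout=open('ping.txt', 'w'))

def fetch_times(net, n=3):
    """Time n sequential downloads of index.html from h1's web server."""
    h1, h2 = net.get('h1'), net.get('h2')
    times = []
    for _ in range(n):
        # Data flows h1 -> h2, the same direction as the long-lived flow.
        # Adjust the URL/port if webserver.py does not listen on port 80.
        out = h2.cmd("curl -o /dev/null -s -w '%%{time_total}' http://%s/index.html"
                     % h1.IP())
        times.append(float(out))
    return times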

Repeat the above experiment and replot both graphs with a smaller router buffer size (Q=20 packets).

Note:

  • Always run the script using sudo: sudo ./run.sh
  • If your Mininet script does not exit cleanly due to an error (or if you pressed Control-C), you may want to issue a clean command sudo mn -c before you start Mininet again.

Part 2 Questions

Include your answers to the following questions in your README file. Remember to keep answers brief.

  1. What is the average webpage fetch time and its standard deviation when q=20 and q=100?

  2. Why do you see a difference in webpage fetch times with small and large router buffers? What about the increased buffer size leads to the change in observed application-level load time?

  3. Bufferbloat can occur in other places such as your network interface card (NIC). Check the output of ifconfig eth0 of your mininet VM. What is the (maximum) transmit queue length on the network interface reported by ifconfig? For this queue size, if you assume the queue drains at 100Mb/s, what is the maximum time a packet might wait in the queue before it leaves the NIC?

You can run ifconfig in the mininet shell by inserting CLI(net) in your script and running it. When the shell launches, run, for example, h1 ifconfig to execute the command on host h1.

  4. How does the RTT reported by ping vary with the queue size? Describe the relation between the two.

  5. Identify and describe two ways to mitigate the bufferbloat problem.

Part 3: TCP BBR

In this part, we will try to mitigate the bufferbloat problem by using TCP BBR. TCP BBR is a TCP congestion control algorithm developed by Google in 2016. It uses bottleneck bandwidth and round-trip propagation time as indicators of congestion, in contrast to loss-based algorithms, which use packet loss.

  • Create a new copy of run.sh and call it run_bbr.sh.

  • Modify the shell script to pass bbr as the --cong argument (congestion control algorithm) to bufferbloat.py; in part 2, TCP Reno was the default value. Also, do not forget to modify the plot file names. A sketch of how the congestion control setting is typically applied follows this list.

  • Run the script and answer the questions below.
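
For reference, the congestion control algorithm for new TCP connections is normally selected through the net.ipv4.tcp_congestion_control sysctl. The sketch below shows how a --cong value might be checked and applied; the starter's bufferbloat.py may already handle this for you, so treat it as an illustration only.

# Illustrative sketch: apply a congestion control algorithm system-wide.
from subprocess import check_output

def set_congestion_control(algo):
    # The tcp_bbr kernel module may need to be loaded first: `modprobe tcp_bbr`.
    available = check_output(
        ['sysctl', '-n', 'net.ipv4.tcp_available_congestion_control']).decode()
    assert algo in available.split(), '%s not supported by this kernel' % algo
    # Select the algorithm for all new TCP connections (requires root).
    check_output(['sysctl', '-w', 'net.ipv4.tcp_congestion_control=%s' % algo])

set_congestion_control('bbr')   # or 'reno' for Part 2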

Part 3 Questions

  1. What is the average webpage fetch time and its standard deviation when q=20 and q=100?

  2. Compare the webpage fetch time between q=20 and q=100 from Part 3. Which queue length gives a lower fetch time? How is this different from Part 2?

  3. Do you see a difference between the queue size graphs from Part 2 and Part 3? Give a brief explanation for the result you see.

  4. Do you think we have completely solved the bufferbloat problem? Explain your reasoning.

Deliverables

  • Final Code: Remember, one of the goals of this assignment is for you to build a system that is easy to run to reproduce results. Therefore, your final code MUST be runnable as a single shell command (sudo ./run.sh and sudo ./run_bbr.sh). Please include all the given files in your zip.

  • README: A file named README.md with instructions to reproduce the results as well as the answers to the questions in the previous section. Please identify your answers with the question number, and please keep your answers brief.

  • Plots: There should be 8 plots in total, 4 for part 2 and 4 for part 3: for each part, a queue-occupancy plot and an RTT plot for each of the two router buffer sizes (100 and 20 packets). They MUST have the following names and be in the top level directory of your submission folder.

    • reno-buffer-q100.png, reno-rtt-q100.png
    • reno-buffer-q20.png, reno-rtt-q20.png
    • bbr-buffer-q100.png, bbr-rtt-q100.png
    • bbr-buffer-q20.png, bbr-rtt-q20.png

Submission

  1. Archive all the materials (your project-3 directory and everything in it) in a single .zip file named partner1netid_partner2netid.zip.
  2. (Optional) If you’re submitting any extensions, make sure you submit them in addition to your regular submission materials as additional files in your archive, not overwriting the regular parts of the assignment!
  3. Submit the partner1netid_partner2netid.zip file to Gradescope, and be sure to add all your group members to the Gradescope submission!

Extension 1: Packet loss, delay, and jitter

Reminder: This part of the assignment is for intellectual curiosity only, and is neither part of the grade nor extra credit. As an extension it’s less fully documented than the main assignment, so please ask questions as needed!

The standard assignment focuses on understanding how to measure bufferbloat and its impact on TCP performance in standard wired network links. With wireless links, there is often additional packet loss and variance in delay (jitter) which can have a big impact on the performance of different end-to-end congestion control protocols.

Using the configuration options for the TCLink provided by the underlying TCIntf, generate plots of performance across several levels of packet loss, delay, and jitter. Which protocol is more performant in the presence of increased random loss? Which is more performant in the presence of jitter? Which is more performant in the presence of a large end-to-end delay?
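
As a starting point, the sketch below shows a hypothetical topology (the class name, parameter values, and structure are assumptions, not part of the starter code) that exposes loss, delay, and jitter as parameters you could sweep from extension1_run.sh.

# Hypothetical extension-1 topology; TCIntf accepts loss (in percent) and
# delay/jitter (tc-style time strings) in addition to bw and max_queue_size.
from mininet.topo import Topo

class LossyBBTopo(Topo):
    def build(self, loss=1, delay='10ms', jitter='3ms', queue_size=100):
        h1, h2 = self.addHost('h1'), self.addHost('h2')
        s0 = self.addSwitch('s0')
        self.addLink(h1, s0, bw=1000, delay='5ms')
        # Non-ideal bottleneck link: random loss plus delay variance (jitter).
        self.addLink(s0, h2, bw=1.5, delay=delay, jitter=jitter, loss=loss,
                     max_queue_size=queue_size)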

Deliverables

  • An extension1_bufferbloat.py file setting up your topologies with loss, jitter, and delay.
  • An extension1_run.sh script for generating performance plots across variable loss, jitter, and delay.
  • Answers to the high-level questions above in an extension 1 section of your README.md file.

Extension 2: A more realistic link model

While the basic loss, delay, and jitter configurations in Mininet let you roughly approximate a non-ideal link, they don’t capture the loss correlation present in today’s high-bandwidth wireless links. Fortunately, the Linux kernel supports specifying more complex behavior that you can bring into your Mininet network!

The netem traffic control queuing discipline allows you to specify correlated loss, reordering, and slotted delay, in addition to more basic random loss, jitter, and delay. You can use the .tc() method of the Mininet TCLink to call directly into the Linux traffic-control subsystem and set up a more complex netem-based link.

Using tc, replace the random-loss qdisc from extension 1 with a qdisc that has bursty loss (using either the loss state or loss gemodel configuration) and slotted delay. Examine the impact of bursty loss on the performance of TCP Reno and TCP BBR, as well as the impact of a realistic (for LTE) slotted delay parameter of 10ms.

You might need to consult the man pages for tc (man tc) and netem (man tc-netem) for information, and can experiment outside mininet to learn how the tools work.
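
One possible shape for this, shown purely as a hedged sketch: once the network is up, overwrite the netem qdisc that Mininet installed on the bottleneck interface. The interface name, the parent/handle values, and all of the gemodel and slot parameters below are assumptions to be checked against tc qdisc show and the man pages.

# Hedged sketch only; verify handles and parameters on your own setup first.
def apply_bursty_loss(net, dev='s0-eth2'):
    s0 = net.get('s0')
    # Replace the existing netem qdisc with Gilbert-Elliott (bursty) loss and
    # a 10 ms slotted delay, keeping a small per-packet base delay.
    s0.cmd('tc qdisc change dev %s parent 5:1 handle 10: netem '
           'delay 5ms slot 10ms 10ms '
           'loss gemodel 1%% 10%% 70%% 0.1%%' % dev)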

Note: Even fully configured netem in mininet is only a coarse approximation of the actual behavior of real-world wireless links. For higher-fidelity testing there are actual wireless emulators available both for network simulation environments like NS3, or as standalone hardware.

Deliverables

  • An extension2_bufferbloat.py file setting up your topologies with correlated loss and slotted delay.
  • An extension2_run.sh script for generating performance plots across variable correlated loss and delays.
  • Your general thoughts on the relative performance of tcp-bbr and tcp-reno in these different conditions in an extension 2 section of your README.md file.