CSEP551 -- Programming Assignment #2
Out: Thursday October 15th, 2009
Due: Thursday November 5th, 2009, before class
Overview
In this assignment, you will investigate the impact of system call
overhead on application performance. Your ultimate goal is to produce
a set of graphs that look something like the following:
[figure: application-level benchmark performance as a function of system call latency]
This graph shows application-level benchmark performance as a
function of system call latency. To generate it, you will
modify the Linux kernel so that you can add a specifiable amount of
overhead to every system call. By varying this overhead and measuring
benchmark performance at each setting, you can generate the curve.
Choice of OS: Note that this assignment
is written with the assumption that you'll modify the Linux kernel.
If you'd prefer to do this assignment on another OS, such as Windows,
you are welcome to do so. However, you will need to be careful of
intellectual property and confidentiality issues. If you modify
Windows kernel code, then instead of submitting your code patches,
give us a little more written explanation of how your patches work
and their performance and coding-complexity implications.
Do NOT submit any confidential information. Talk to your
instructor or TA if you have concerns.
Assignment details
Here are the steps we recommend you follow:
- Set up an environment in which you can install Linux and compile
Linux kernels. I strongly recommend using VMware for this. (It is true
that VMware will affect your benchmark performance, but let's ignore
that for this assignment.) You'll need to:
- Find a computer on which you can install VMware. VMware offers
VMware Player for free, and you can also get trial versions
of VMware Workstation or VMware Server for various environments.
- Install VMware.
- Prepare a Linux virtual machine. I recommend you browse and
download a Linux "virtual appliance" from VMware, but
feel free to install Linux from scratch if you feel up to a
challenge. You can get virtual appliances here.
- Download and install the Linux kernel source code. You can get
source code for different kernel versions from www.kernel.org.
- Practice compiling and installing a new kernel based on the
kernel source code you downloaded. If you've never
compiled/installed a kernel, you can find plenty of help on the Web,
for example, from here.
- Modify the Linux kernel so that you can add overhead to every
system call that occurs. To do this, you'll need to:
- Figure out how system calls work on Linux, and trace them
through the Linux kernel source code. There are plenty of Web pages
that should help with this, but as a hint, Linux 2.6 uses the
sysenter and sysexit instructions to do efficient user
to kernel crossings. See, for example, here.
- Add your code to the system call path. You'll have to decide
where to do this; there is a quick way that requires a small
amount of assembly hacking in one location, and a more onerous way
that requires only C programming but involves patching a number of
files and routines.
- Your code should introduce a variable amount of overhead,
presumably by doing computation in a loop, and varying the loop
count.
- Make it possible for user-level programs to change the amount
of system call overhead introduced by your code. I suggest that you
add something to the /proc virtual file system to do this -- i.e.,
writing a number to a file you create under /proc will set
the amount of loop overhead your code introduces. There
are plenty of web pages that will help you learn how to add
something to the /proc filesystem; see, for example, here.
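As one possible shape for that /proc interface, here is a sketch using the 2.6-era create_proc_entry() API with read_proc/write_proc handlers. It is written as module init code for readability, but you can call the same functions from your kernel patch; the names "syscall_overhead" and syscall_delay_loops are illustrative, and you should check the API against the exact kernel version you are patching:

```c
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <asm/uaccess.h>

/* Defined wherever your delay loop lives. */
extern volatile unsigned long syscall_delay_loops;

/* echo N > /proc/syscall_overhead sets the loop count. */
static int overhead_write(struct file *file, const char __user *buf,
                          unsigned long count, void *data)
{
    char kbuf[32];

    if (count >= sizeof(kbuf))
        return -EINVAL;
    if (copy_from_user(kbuf, buf, count))
        return -EFAULT;
    kbuf[count] = '\0';
    syscall_delay_loops = simple_strtoul(kbuf, NULL, 10);
    return count;
}

/* cat /proc/syscall_overhead reads the current loop count back. */
static int overhead_read(char *page, char **start, off_t off, int count,
                         int *eof, void *data)
{
    *eof = 1;
    return sprintf(page, "%lu\n", syscall_delay_loops);
}

static int __init overhead_init(void)
{
    struct proc_dir_entry *e = create_proc_entry("syscall_overhead", 0666, NULL);
    if (!e)
        return -ENOMEM;
    e->read_proc = overhead_read;
    e->write_proc = overhead_write;
    return 0;
}
module_init(overhead_init);
```

With this in place, "echo 50000 > /proc/syscall_overhead" changes the overhead on a running system without a reboot, which makes sweeping the overhead values for your graphs much faster.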
- Calibrate your system call overhead by measuring the latency
of a simple system call (e.g., close(100)) as a function of your
variable overhead.
- Measure the performance of the following three benchmarks
as a function of system call overhead:
- compiling the Linux kernel source tree
- the maximum throughput of apache serving a small, static file
- something of your choice
Be careful of caching effects when running your benchmarks, in particular,
of the file system buffer cache!
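One way to control for the buffer cache between runs is to drop it explicitly, so every run starts cold (alternatively, do a discarded warm-up run first so every run starts warm; either works if you are consistent). A minimal sketch, assuming a kernel new enough to have /proc/sys/vm/drop_caches (added in 2.6.16) and root privileges:

```c
#include <stdio.h>
#include <unistd.h>

/* Drop the page cache plus dentries and inodes. Returns 0 on success,
 * -1 on failure (e.g., not running as root). */
int drop_caches(void)
{
    FILE *f;

    sync();                                   /* flush dirty pages first */
    f = fopen("/proc/sys/vm/drop_caches", "w");
    if (!f)
        return -1;
    fputs("3", f);                            /* 3 = pagecache + dentries/inodes */
    fclose(f);
    return 0;
}
```

Call this (or the equivalent shell one-liner) between benchmark iterations, and note in your writeup which cold/warm policy you chose.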
Bonus
If you want some extra credit, find a way to add a similar delay loop
(including another /proc interface) on the Ethernet packet reception
and transmission paths. Calibrate this delay loop. Devise an
experiment to measure the impact of this delay on the throughput and
latency of serving a small, static web page from Apache. Devise
another experiment to measure the impact of this delay on the
throughput and latency of serving a large, static web page from
Apache. Explain your results.
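One plausible pair of hook points, sketched below, is netif_rx() for the receive path and dev_queue_xmit() for the transmit path in net/core/dev.c; this is an unverified illustration against 2.6-era source, not a drop-in diff, and net_delay_loops is an invented name for a second /proc-controlled counter:

```c
/* In net/core/dev.c -- illustrative sketch of where the delay goes. */
volatile unsigned long net_delay_loops = 0;  /* set via a second /proc file */

static void net_delay(void)
{
    volatile unsigned long i;
    for (i = 0; i < net_delay_loops; i++)
        ;
}

int netif_rx(struct sk_buff *skb)
{
    net_delay();        /* added: delay every received packet */
    /* ... original netif_rx() body unchanged ... */
}

/* dev_queue_xmit() gets the same one-line net_delay() call at its top. */
```

Note that a per-packet delay affects large and small web pages differently (a large transfer crosses the packet path many more times per request), which is exactly the contrast the two experiments above should expose.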
Partnering up
You have the choice of doing this assignment solo, or in teams of two.
If you'd like to do it in a team, first try to find your own
teammate. If you can't, then try posting a note to the course
message board seeking a partner. If nobody bites, talk to your
instructor or TA to see if they can help you find a partner.
What to turn in
You should submit your assignment using the following "dropbox"
URL:
https://catalysttools.washington.edu/collectit/dropbox/gribble/7604
Your submission should be a single .tar.gz or .zip file, containing the
following elements:
- a description of your Linux environment, including whether you used
VMware, which Linux kernel version you modified, etc.
- any source code you generated, including files that you
modified in the linux kernel tree. (If you modified a
proprietary OS and have confidentiality or IP restrictions, skip
this step and do not submit any code.)
- a writeup that includes:
- your name (include both team members if you're working with a partner)
- a brief description of how you added overhead (i.e., explain the design / implementation of your code)
- a brief description of the mechanism you introduced that allows user-level code to vary the system call overhead
- a graph showing the calibration of your introduced system call overhead
- the graphs of your benchmark results, and a description of your measurement method
- a short analysis of the graphs -- why are they shaped the way they are, why are they differently shaped for the different benchmarks, and what high-level lessons do you draw from this?