CSE 551 Project Ideas, Spring 2004
[Almost by definition, any idea of your own trumps any idea of ours!
Don't feel constrained by these!]
Core OS ideas
- (hard) Look into Linux and hyperthreading. This would involve getting your hands on
a hyperthreading-enabled machine, running Linux on it, and doing
some kind of performance analysis of some discrete chunk of the OS
to see how it interacts with hyperthreading. If possible, propose and evaluate
code changes to better take advantage to hyperthreading. Here
is a decent introduction to hyperthreading.
- (hard) In the 1990s, there were several papers published that
examined the performance bottlenecks of operating systems
and how architectural and technology trends affected them:
- Anderson, Levy, Bershad, and Lazowska. "The Interaction
of Architecture and Operating System Design", ASPLOS 1991.
- Ousterhout. "Why Aren't Operating Systems Getting Faster
As Fast As Hardware?", USENIX Summer Conference, 1990.
- Brown and Seltzer. "Operating System Benchmarking in
the Wake of LMbench: A Case Study of the Performance of
NetBSD on the Intel x86 Architecture", 1997 SIGMETRICS.
- Bradley Chen et al. "The Measured Performance of
Personal Computer Operating Systems", ACM TOCS 1995.
Repeat an aspect of one of these studies on modern hardware,
and come to conclusions about the performance of today's
operating systems on today's hardware.
- Analyze the "structure" of some large pieces of software
in an attempt to draw some conclusions about the cleanliness of
architecture, the degree of modularity, and the degree of code
reuse. Here are some ways you might analyze the structure:
- Use static analysis to build a function call-graph, and
analyze some properties of the graph.
- If the program is written in C, use the graph of header
file #includes as a proxy for graph structure.
- Use some sort of diff or grep like tool to look for
snippets of cut-and-paste code repeated throughout the
codebase to look for poor coding instances. You could also
look for instances of cut-and-pasting across different code
bases.
Some software you might analyze this way: the linux kernel,
the apache web server, the NetBSD kernel.
Distributed systems and networking ideas
- Design and evaluate a better transfer protocol for distributing
large P2P objects. Ideas you can incorporate are: swarming
(transferring pieces of the file from many sources in parallel),
erasure or tornado coding (making it so you don't care which pieces
you grab, only that you grab "enough"), chaining (a "bucket brigade"
such that somebody who downloads a block hands it off to somebody
else that wants it), and possibly scheduling (making people who
share interests wait a little while so they can organize into an
overlay multicast tree (an advanced bucket brigade). Evaluate using
simulation, emulation on Emulab, or measurement of a real
implementation on PlanetLab.
- Play around with stp, the safe transport protocols in cyclone
framework described in the "Upgrading Transport Protocols using
Mobile Code" SOSP 2003 paper. There are bound to be many os issues
that come up implementing transports in the kernel, and the cyclone
angle may be interesting. Practically, building any transport
variations would be good, e.g., tcp + or - feature X. Some
hypotheses you might explore:
- code size: tcp is bloated. One hypothesis is that this is so
primarily because each side needs to be able to handle whatever
set of functionality the other possesses, e.g., maybe it will do
timestamps, maybe it won't. STP uses a setup protocol to get two
sides with the same code. Does that mean that the code can be
much smaller/simpler? Try it for some tcp variations to see how
much gunk you can strip out.
- TCP w/ fine-grained failover. Design a mechanism to allow an
ongoing tcp connection to work well in a situation in which the
packets are "rerouted" through a proxy using some kind of source
routing plus encapsulation. (The idea here is to deflect packets
through an alternate route if your default route suffers some
kind of path failure.) tcp-migrate is one choice, but i think
there are other designs that might not involve breaking and
restablishing an end-to-end TCP connection.
- Rethink the "congestion manager" paper from SIGCOMM 99,
taking advantage of stp.
Wild ideas
- Figure out if you can apply call-graph analysis ideas or
backtracking ideas to design a system that detects patterns
of host and network behavior that correspond to spyware.
For instance, you might look for patterns of network activity in
which an HTTP page request (e.g., to Yahoo or Google) is immediately
followed with an HTTP page request that contains a certain pattern
(i.e., a spyware program reporting or requesting a signature). Or,
you might look for data flowing from "susceptible" programs such
as IE, Netscape, or the login prompt to some other program that then
communicates over the network sometime later. Some papers that
contain call-graph analysis and backtracking techniques:
- the "Backtracking Intrustions" paper from SOSP 2003.
- the "Performance Debugging for Distributed Systems of
Black Boxes" paper from SOSP 2003.
- the "Magpie" project by Rebecca Isaacs et al.
- the "Pinpoint" work by Mike Chen et al.
- Build a public domain benchmarking/test tool for intrusion
detection systems. Collect a historical list of attacks that
have occurred in the past, and build an engine that, given a
"signature", squirts traffic that looks like that attack at
a designated host. You should be able to find and "reverse"
signatures from existing IDS systems to seed your tool.
Quarter-baked ideas
- Do something using PlanetLab.
- Do something with wireless (802.11, mobile nodes, etc.)
- (risky, since you're dealing with somebody else's research code.) Dream up, implement, and evaluate some novel "virtual machine
service" using the microDenali virtual machine monitor. See the
NSDI 2004 paper on microDenali for more details on what I mean by
virtual machine service. Alternatively, you can use the open-source
Xen virtual machine monitor from the University of Cambridge.
Some potential service areas:
- Wrap an untrusted piece of code in a VM. For example, write
a modified mail server (IMAP or pop daemon) that looks for
attachments, strips them out of email, and plunks them into their
own private VM that the mail client can download and run.
Challenges: incorporating copy-on-write or hash-caching
techniques to minimize the size of the image sent between the
mail server and client, figuring out how to help the user
move the attachment between the sandbox VM and their real
environment once the attachment is deemed safe, etc.
- Write the "file-system trace" virtual disk. In a nutshell,
make it possible to run an OS plus apps in a VM, and to have
the virtual disk of the VM log all disk activity. Then, use
this log to perform some sort of experiment, such as perhaps
figuring out how to optimize the layout of the virtual disk
on the physical disk.
- Write the "speculate and commit/undo" VM service. Users often
want to try out some risky activity (such as installing a new
program, installing a new kernel image, trying some new loadable
kernel module, migrating from one DB format to another, etc.),
see whether or not it works by playing with it for a while,
and then "committing" of it works and "undoing" if it doesn't.
Use the ability to perform VM checkpoints to build a service and
user interface that allows users to do cvs-style check-in/check-out
and checkpoint-tree management of their entire system. Users should,
for example, be able to simultaneously boot and compare two different
"forks" of their system, and pick which to keep.
Idea generator