CSE 551 Spring 2004 Project Ideas

CSE 551 Project Ideas, Spring 2004

[Almost by definition, any idea of your own trumps any idea of ours! Don't feel constrained by these!]

Core OS ideas

(hard) Look into Linux and hyperthreading. This would involve getting your hands on a hyperthreading-enabled machine, running Linux on it, and doing some kind of performance analysis of some discrete chunk of the OS to see how it interacts with hyperthreading. If possible, propose and evaluate code changes to better take advantage to hyperthreading. Here is a decent introduction to hyperthreading.
(hard) In the 1990s, there were several papers published that examined the performance bottlenecks of operating systems and how architectural and technology trends affected them:
- Anderson, Levy, Bershad, and Lazowska. "The Interaction of Architecture and Operating System Design", ASPLOS 1991.
- Ousterhout. "Why Aren't Operating Systems Getting Faster As Fast As Hardware?", USENIX Summer Conference, 1990.
- Brown and Seltzer. "Operating System Benchmarking in the Wake of LMbench: A Case Study of the Performance of NetBSD on the Intel x86 Architecture", 1997 SIGMETRICS.
- Bradley Chen et al. "The Measured Performance of Personal Computer Operating Systems", ACM TOCS 1995.
Repeat an aspect of one of these studies on modern hardware, and come to conclusions about the performance of today's operating systems on today's hardware.
Analyze the "structure" of some large pieces of software in an attempt to draw some conclusions about the cleanliness of architecture, the degree of modularity, and the degree of code reuse. Here are some ways you might analyze the structure:
- Use static analysis to build a function call-graph, and analyze some properties of the graph.
- If the program is written in C, use the graph of header file #includes as a proxy for graph structure.
- Use some sort of diff or grep like tool to look for snippets of cut-and-paste code repeated throughout the codebase to look for poor coding instances. You could also look for instances of cut-and-pasting across different code bases.
Some software you might analyze this way: the linux kernel, the apache web server, the NetBSD kernel.

Distributed systems and networking ideas

Design and evaluate a better transfer protocol for distributing large P2P objects. Ideas you can incorporate are: swarming (transferring pieces of the file from many sources in parallel), erasure or tornado coding (making it so you don't care which pieces you grab, only that you grab "enough"), chaining (a "bucket brigade" such that somebody who downloads a block hands it off to somebody else that wants it), and possibly scheduling (making people who share interests wait a little while so they can organize into an overlay multicast tree (an advanced bucket brigade). Evaluate using simulation, emulation on Emulab, or measurement of a real implementation on PlanetLab.
Play around with stp, the safe transport protocols in cyclone framework described in the "Upgrading Transport Protocols using Mobile Code" SOSP 2003 paper. There are bound to be many os issues that come up implementing transports in the kernel, and the cyclone angle may be interesting. Practically, building any transport variations would be good, e.g., tcp + or - feature X. Some hypotheses you might explore:
- code size: tcp is bloated. One hypothesis is that this is so primarily because each side needs to be able to handle whatever set of functionality the other possesses, e.g., maybe it will do timestamps, maybe it won't. STP uses a setup protocol to get two sides with the same code. Does that mean that the code can be much smaller/simpler? Try it for some tcp variations to see how much gunk you can strip out.
- TCP w/ fine-grained failover. Design a mechanism to allow an ongoing tcp connection to work well in a situation in which the packets are "rerouted" through a proxy using some kind of source routing plus encapsulation. (The idea here is to deflect packets through an alternate route if your default route suffers some kind of path failure.) tcp-migrate is one choice, but i think there are other designs that might not involve breaking and restablishing an end-to-end TCP connection.
- Rethink the "congestion manager" paper from SIGCOMM 99, taking advantage of stp.

Wild ideas

Figure out if you can apply call-graph analysis ideas or backtracking ideas to design a system that detects patterns of host and network behavior that correspond to spyware. For instance, you might look for patterns of network activity in which an HTTP page request (e.g., to Yahoo or Google) is immediately followed with an HTTP page request that contains a certain pattern (i.e., a spyware program reporting or requesting a signature). Or, you might look for data flowing from "susceptible" programs such as IE, Netscape, or the login prompt to some other program that then communicates over the network sometime later. Some papers that contain call-graph analysis and backtracking techniques:
- the "Backtracking Intrustions" paper from SOSP 2003.
- the "Performance Debugging for Distributed Systems of Black Boxes" paper from SOSP 2003.
- the "Magpie" project by Rebecca Isaacs et al.
- the "Pinpoint" work by Mike Chen et al.
Build a public domain benchmarking/test tool for intrusion detection systems. Collect a historical list of attacks that have occurred in the past, and build an engine that, given a "signature", squirts traffic that looks like that attack at a designated host. You should be able to find and "reverse" signatures from existing IDS systems to seed your tool.

Quarter-baked ideas

Do something using PlanetLab.
Do something with wireless (802.11, mobile nodes, etc.)
(risky, since you're dealing with somebody else's research code.) Dream up, implement, and evaluate some novel "virtual machine service" using the microDenali virtual machine monitor. See the NSDI 2004 paper on microDenali for more details on what I mean by virtual machine service. Alternatively, you can use the open-source Xen virtual machine monitor from the University of Cambridge. Some potential service areas:
- Wrap an untrusted piece of code in a VM. For example, write a modified mail server (IMAP or pop daemon) that looks for attachments, strips them out of email, and plunks them into their own private VM that the mail client can download and run. Challenges: incorporating copy-on-write or hash-caching techniques to minimize the size of the image sent between the mail server and client, figuring out how to help the user move the attachment between the sandbox VM and their real environment once the attachment is deemed safe, etc.
- Write the "file-system trace" virtual disk. In a nutshell, make it possible to run an OS plus apps in a VM, and to have the virtual disk of the VM log all disk activity. Then, use this log to perform some sort of experiment, such as perhaps figuring out how to optimize the layout of the virtual disk on the physical disk.
- Write the "speculate and commit/undo" VM service. Users often want to try out some risky activity (such as installing a new program, installing a new kernel image, trying some new loadable kernel module, migrating from one DB format to another, etc.), see whether or not it works by playing with it for a while, and then "committing" of it works and "undoing" if it doesn't. Use the ability to perform VM checkpoints to build a service and user interface that allows users to do cvs-style check-in/check-out and checkpoint-tree management of their entire system. Users should, for example, be able to simultaneously boot and compare two different "forks" of their system, and pick which to keep.

Idea generator

When all else fails, this is how to choose a topic