Project 2: Programming with threads

Out: Friday, January 25th
Due: Wednesday, February 6th at 10:00 AM

This assignment is designed to give you experience working with Java threads in a wide variety of scenarios. This project does not involve VMWare. You should do this project in your project teams.

Start early! This project does not require a huge amount of code, but it does require some background reading and a good bit of thinking.


Preleminaries

For this project, you can use any Linux machine, as long as it is a multi-processor . How can you tell if a machine is a multi-processor? The file /proc/cpuinfo has information about the processor(s) on a system. You can list its contents using the command 'more /proc/cpuinfo'.

You have all been given accounts on a 4-way multiprocessor, amlia.cs.washington.edu. In addition, most or all lab Linux machines are multi-processors.

To get started, first download the scaffolding files . Use the command 'tar xvf os-proj2.tar' to unpack this tar file. There are four sub-directories under src/, corresponding to the four parts of the assignment. Each sub-directory has a makefile, which supports the "make" target (for building code) and the "make run" target (for running code). Also, 'make clean' deletes .class files and emacs backup files.

Note that any new Java files will be automatically compiled. You should not need to modify the Makefiles.

Requirements

To get a good grade on this project, your programs must use appropriate synchronization to ensure thread-safe operation. Remember, a program is not thread-safe unless it is correct under all possible schedule interleavings . In other words, no matter what the scheduler does, your program must do the right thing. This is why writing concurrent programs is so difficult.

Note that bugs in concurrent programs are often called Heisenbugs . This is because they have an annoying tendency to disappear or mutate under inspection. This is why it is particularly important to get the program right the first time.

How do you write thread-safe programs? A good start is to follow the two rules we laid down in class: 1) Any shared, mutable state must be properly synchronized; and 2) Compound actions must be protected by a single lock.

You are free to use any built-in Java functionality. You should check with Andrew or one of the TAs before using any external packages.


Part 1: Two-lock Queue

A major goal when using threads is to improve performance.  Unfortunately, the use of locks can ruin performance by disabling concurrency inside of critical sections.  One notorious example of excessive locking was the Big Kernel Lock, which was used by early Linux systems to run safely on multiple processor machines.  The BKL made Linux safe, but it hurt performance by only allowing a single process to execute in the kernel at at time.

Of course, we can't simply get rid of locks.  But, what we can do in some cases is to reduce the granularity of locks.  It is better to have many small critical sections (each protected by their own lock) rather than one giant critical section (protected by a single lock).  Or, if you prefer, it's better to lock each room individually rather than lock the entire house.

In this problem, you are given a thread-safe queue that uses a single lock for concurrency.  Your task is to decompose that single lock into two locks: one for enqueue and one for dequeue. This improves concurrency: one thread can be enqueuing while another is dequeing.  You must also fix the isEmpty and getSize methods of the SimpleQueue interface.

This exact problem was described in: Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms, by Maged M. Michael and Michael L. Scott.  This is probably worth checking out...

You can execute a simple test routine by executing make run in the twolockqueue directory. Always keep in mind, however, that it is very difficult to exhaustively test non-deterministic programs.  Also, this particular test has the unfortunate effect of superimposing a single global lock.  Bottom line: don't rely on this test (or any test) to prove correctness.


Part 2: Parallel 15 Puzzle Solver

The 15-puzzle is a well-known brainteaser. The goal is to unjumble a set of squares by moving tiles into the available gap.

As computer scientists, we realize that solving the 15-puzzle by hand is time-consuming and tedious.  As such, many search strategies have been applied to this problem.   In this problem, we give you a 15-puzzle solver based on the well-known A* search algorithm.  Your task is to parallelize the program so it can run on multiple processors. 

It is helpful to understand a few details of A* search.  The search itself is brute force: we try all moves, then all successors to those moves, etc. A* uses a strategy called iterative deepening.  At each iteration, we consider all solutions of size N.  If no solution is found, then we move on to consider solutions of size N+1.  This sounds wasteful, and it is, but not in a way that affects the asymptotic runtime of the algorithm.  More details are available online, or just read the code.

Step 1: Multi-threaded Puzzle Solver

Your first task is implement a multi-threaded version of the 15-puzzle solver.  The number of threads to spawn is defined by an argument to the program:

java FifteenPuzzleSolver n

Where n is the number of threads. At present, any argument other than 1 causes the program to crash.  (no argument defaults to 1 thread).   Obviously, it is not enough to simply spawn threads: the threads should do the work of solving the 15-puzzle.  In the benchmark step (described below), you will verify whether the threads had any effect.

Note that the "random" board generator in Board.java uses a fixed random number seed.  This is good, because it allows for repeated testing of the same board.  On my MacBook, the single-threaded algorithm can solve the given puzzle in about a minute.  If it requires much longer, then something has probably gone haywire.

You may assume that a FifteenPuzzleSolver instance is only used once.  Also, it is permissable to have extra "helper" threads, as long as they aren't doing any useful work towards solving the puzzle. 

Clarifiation (added 2/1/08): You should make sure that there are always N active worker threads.  This can be accomplished with a thread pool, or by ensuring that any "dead" worker thread is replaced by a new thread.

Step 2: Benchmark

Your next task is to benchmark the 15-puzzle solver using a variable number of threads.  Your writeup should include a graph with threads on the x-axis and latency on the y-axis. The number of threads should range from 1 to roughly 10.

In addition, your writeup should provide an explanation of this graph. In particular, you should explain your results in terms of Amdahl's law.  Roughly speaking, what are N and F for this experiment?

If you are running on a shared machine, make sure your results are not skewed by someone else's experiment. One way to guard against this is to run the UNIX top application, which shows which processes are consuming CPU resources.

Step 3 (Extra Credit): Implement another solution

For extra credit, you can implement a different approach to parallelizing the 15-puzzle. You should compare the performance of your second approach with the first, and provide an explanation of the differences.

To get the most points, your second technique should be both interesting and different from your first approach.  Changing data structures is not very interesting.  Also, we are primarily concerned with parallelizing the existing algorithm. Don't try to turn this into a CSE421 problem by (for example) inventing a clever way to record and recognize previously computed states.

The writeup is critical for this.  A brilliant idea with a bad writeup will not score well.

If you have any questions, contact Andrew, Kevin, or Marissa.

 


Part 3: Multi-threaded web server

Web servers are frequently implemented as multi-threaded applications. This is done for two reasons. First, multiple threads allow the web server to utilize multiple processors. Second, using threads provides a convenient way to allow each processor to handle multiple requests in parallel. For the latter reason, it makes sense to run a multi-threaded web server, even on a uniprocessor machine.

For general web server background, you should look at the following overview . Note that the web server code in scaffolding is derived from this web server.

For this part of the project, we provide you with a single-threaded web server. You are required to convert it into a multi-threaded web server.

Web server operation

To run the web server, execute 'make run' in the web server directory. By default, the web server runs on port 8080. So, you should be able to access the homepage at the following URL:

http://hostname:8080/

... where "hostname" is replaced by the name of the host you are running on. If you are running on the same machine as another team, you may encounter a port conflict. In this case, simply change your port number to something else. I would make sure this works before doing anything else.

The web server serves files from the webroot/ subdirectory. You can add files here if you wish. Caveats: the web server is primitive: it only supports text files and html files. Also, any file larger than 64K will be truncated.

Step 1: A multi-threaded web server

Your first step is to convert the single-threaded web server into a multi-threaded web server. Each thread should be capable of handling a request. That is, each worker thread should execute the request processing functionality, which is currently defined in HttpServer's await method.

Your solution should utilize a static thread pool , which is created at system startup. After the system starts up, you should not create (or destroy) any threads. For benchmarking (described below), the size of the thread pool should be easily adjustable. How you do this is up to you.

Step 2: Benchmark

Using the web benchmark described below, measure the throughput and response time for the web server. For your tests, you should vary the number of worker threads from 1 to roughly 20 (you do not need to test every value in this range). You can use any file for the workload (/index.html is just fine). You should use 25 simulated clients (-n 25).

You should submit a writeup that includes a graph with threads on the x-axis and throughput (in bytes/sec or req/sec) on the y-axis. For output, you can use the web benchmark output. Explain the results! In particular, you should discuss how many threads the web server can utilize, as compared to the 15-puzzle application.

Using the Web Benchmark

The WebStone web benchmark tool is installed on forkbomb as /cse451/projects/tools/webclient. It measures the throughput and latency of a webserver under varying loads. It simulates a number of active clients, all sending requests in parallel (it does this by forking several times). Each client requests a set of files, possibly looping through that set multiple times. When the test is complete, each client outputs the connection delay, response time, bytes transfered, and throughput it observed (depending on the server, the clients may all observe very similar results, or the data may vary widely). The tool takes the following arguments:

All of the above parameters are required. The URLLIST file should contain one relative URL (just the file name) per line followed by a space and then a number (the number is the weight representing how often to request the file relative to other files in the list - most likely, 1 is the value you want).

For example, to test a webserver on almond.cs.washington.edu, with two simulated clients each fetching the index.html file twice, one would run:

/cse451/projects/tools/webclient -w almond.cs.washington.edu -p 12703 -l 2 -n 2 -u ./urls

Where the file urls would contain the following single line:

/index.html 1


Part 4: GUI Hacking

Graphical user interface frameworks are usually single-threaded. This means that all GUI-related processing is done in a single thread. Unfortunately, this design can cause graphical applications to become unresponsive if there are long-running computations that occur in response to GUI events. For example, a very long spell-check operation could cause a word processor to freeze up. For this reason, long-lived tasks are usually deferred to a separate thread. This improves responsiveness, but introduces synchronization issues.

For this part of the project, we provide you with a basic GUI which calculates the first N fibonacci numbers. Your task is to implement the "cancel" button, which stops a long-lived Fibonacci computation in its tracks. This is trickier than it sounds, because it requires digging into the details of how Java's GUI framework (called Swing) manages threads and concurrency.

GUI operation

If you are Linux remotely from a Windows machine, make sure Reflection is running. Next, run the existing GUI by typing "make run" in the fibogui directory. Try typing in a small number like 5 and pressing return. This should print out the first five fibonacci numbers. Next, try typing in a large number like 45. You should notice that the GUI becomes completely unresponsive. In fact, you may need to kill the application from the command-line with ctrl-c.

(You might notice that the implementation of Fibonnaci is particularly slow. This was done intentionally to make the system do more work. You shouldn't try to optimize the provided fibonacci algorithm. This is operating systems, not algorithms...)

(BTW, no fair making fun of the GUI. This is operating systems, not HCI...)

Part 1: Task cancellation

Your job is to implement the cancel button. Task cancellation should stop the current computation, but should display the results that have been obtained so far. In other words, don't throw away work that has been done; it is OK to display a partial result on the screen.  Your program should only allow one active Fibonacci calculation. A new calculation can start only when the previous calculation has ended (or has been cancelled).

Part 2: Continuous update

The current fibogui displays its results all at once. Modify the GUI to display results incrementally (as they are calculated). In this case, the cancel button should simply stop computing new numbers.

How to get started

To get started, you should read Sun's documentation on thread-safe GUIs. In particular, you should understand the method SwingUtilities.invokeLater. What does it do, and why is it necessary? Only once you deeply understand this should you attempt to modify code.

In addition, you will need to understand how thread interruption works in Java. Read Sun's documentation and ask your TAs or instructor if you have questions.

 


What to handin

You should handin the os-proj2/ directory, which contains your modified Java files. You do not need to turn in the .class files (although it is not a problem if you do).   You can run make clean in each directory to clear out unnecessary files.

In addition, you should submit a writeup that contains an analysis of the 15-puzzle and the multi-threaded web server. This writeup can be in Word format, pdf format, or plain text. If you have graphs in separate files, please indicate this at the top of your writeup.

We will test your solutions by invoking 'make run'.   So, make sure your solutions do something sensible on startup.

Grading

Two-lock Queue : 20 points
15 Puzzle: 30 Points, 10 bonus points
Web server: 25 points
Fibonacci GUI: 25 points