Project 2: Programming with threads

Out: Monday, October 16th
Due: Wednesday, November 1st at noon

This assignment is designed to give you experience working with Java threads in a wide variety of scenarios. This project does not involve VMWare. You should do this project in your project teams.

Start early! This project does not require a huge amount of code, but it does require some background reading and a good bit of thinking.

Preleminaries

For this project, you can use any Linux machine, as long as it is a multi-processor . How can you tell if a machine is a multi-processor? The file /proc/cpuinfo has information about the processor(s) on a system. You can list its contents using the command 'more /proc/cpuinfo'.

You have all been given accounts on a 4-way multiprocessor, amlia.cs.washington.edu. In addition, most or all lab Linux machines are multi-processors.

To get started, first download the scaffolding files . Use the command 'tar xvf os-project2.tar' to unpack this tar file. There are four sub-directories under src/, corresponding to the four parts of the assignment (the jars directory contains helper functionality; you do not need to look in there). Each sub-directory has a makefile, which supports the "make" target (for building code) and the "make run" target (for running code). Also, 'make clean' deletes .class files and emacs backup files.

Note that any new Java files will be automatically compiled. You should not need to modify the Makefiles.

Requirements

To get a good grade on this project, your programs must use appropriate synchronization to ensure thread-safe operation. Remember, a program is not thread-safe unless it is correct under all possible schedule interleavings . In other words, no matter what the scheduler does, your program must do the right thing. This is why writing concurrent programs is so difficult.

Note that bugs in concurrent programs are often called Heisenbugs . This is because they have an annoying tendency to disappear or mutate under inspection. This is why it is particularly important to get the program right the first time.

How do you write thread-safe programs? A good start is to follow the two rules we laid down in class: 1) Any shared, mutable state must be properly synchronized; and 2) Compound actions must be protected by a single lock.

Part 1: Semaphores

A semaphore is a well-known synchronization primitive, which is heavily used throughout the Linux kernel. A semaphore represents a thread-safe integer, which supports methods for incrementing and decrementing its value (up and down, respectively). In addition, any attempt to decrement the semaphore below zero will cause the decrementing thread to block. Incrementing the semaphore causes a single sleeping thread to awaken. You can read more about semaphores in Silbershatz section 6.5 (note that the author refers to "up" and "down" as "signal" and "wait").

Your assignment for part 1 is to implement semaphores using built-in Java functionality (e.g., locks and condition variables). You can use any feature of Java, except java.util.concurrent.Semaphore (or, any other off-the-shelf implementation of Semaphores). However, the Java documentation does provide some useful documentation on Semaphore behavior.

For this part, the scaffolding code in Semaphore.java provides the semaphore interface, which you are required to fill in. In addition, the make run command invokes a simple test routine. Note that the semaphore in this example should behave exactly like a lock (i.e., only one thread in the critical section at a time).

Part 2: Approximating π

For this problem, you will approximate the mathematical constant π using a simple, randomized algorithm. You will measure the performance of the algorithm using a variable number of threads.

Consider a circle or radius r inscribed in a square:

From basic geometry, we know the areas of the circle and the square (π r^2 and 4 r^2, respectively).

In addition, we can use a large number of random points to estimate the ratio of the circle's area to the square's area, as shown in the following pseudocode:

int pointsInCircle = 0;
for i=0..NUM_SAMPLES {
  Point p = chooseRandomPointInSquare();
  if (p.insideCircle())
    pointsInCircle++;
}

float areaRatio = (float) pointsInCircle / NUM_SAMPLES;

Step 1: Calculate π

Using the information provided above, write a multi-threaded program that approximates π. The program should function by generating a large number of sample points. Each thread should perform some subset of the total work. Your program should replace the stub code in StubCalculator.java:

public double calculate(int numSamples, int numThreads) {
    return 0.0; // TODO: implement this!
}

We will test your π calculator by invoking 'make test'. The test functionality is contained in PiCalculatorTest.java. This allows you to use the 'make run' target for the benchmarking step.

Step 2: Benchmark

Finally, benchmark your π calculator using a variable number of threads. You should submit a writeup that includes a graph with threads on the x-axis and latency on the y-axis. The number of threads should range from 1 to roughly 10.

Key point: The number of random samples should be a fixed constant that does not depend on the number of threads. So, with K samples and N threads, each thread does K/N random samples.

In addition, your writeup should provide an explanation of this graph. In particular, you should explain your results in terms of Amdahl's law (what are N and F for this experiment?).

Hints

I would start by implementing a single-threaded pi calculator. You can do this by simply ignoring the numThreads argument. Once you have this working, you can parallelize the work across multiple threads.
You may need a lot of samples to get a good estimate. Start with around 20 million.
You should build on Java's random number generation in java.util.Random
...and Java timer functionality in System.currentTimeMillis()
If you are running on a shared machine, make sure your results are not skewed by someone else's experiment. One way to guard against this is to run the UNIX top application, which shows which processes are consuming CPU resources.
For processor information, read /proc/cpuinfo on Linux.

Part 3: Multi-threaded web server

Web servers are frequently implemented as multi-threaded applications. This is done for two reasons. First, multiple threads allow the web server to utilize multiple processors. Second, using threads provides a convenient way to allow each processor to handle multiple requests in parallel. For the latter reason, it makes sense to run a multi-threaded web server, even on a uniprocessor machine.

For general web server background, you should look at the following overview . Note that the web server code in scaffolding is derived from this web server.

For this part of the project, we provide you with a single-threaded web server. You are required to convert it into a multi-threaded web server. In addition, you are required to add implement a "statistics" page, which provides visibility into the web server's behavior.

Web server operation

To run the web server, execute 'make run' in the web server directory. By default, the web server runs on port 8080. So, you should be able to access the homepage at the following URL:

http://hostname:8080/

... where "hostname" is replaced by the name of the host you are running on. If you are running on the same machine as another team, you may encounter a port conflict. In this case, simply change your port number to something else. I would make sure this works before doing anything else.

The web server serves files from the webroot/ subdirectory. You can add files here if you wish. Caveats: the web server is primitive: it only supports text files and html files. Also, any file larger than 64K will be truncated.

Step 1: A multi-threaded web server

Your first step is to convert the single-threaded web server into a multi-threaded web server. Each thread should be capable of handling a request. That is, each worker thread should execute the request processing functionality, which is currently defined in HttpServer's await method.

Your solution should utilize a static thread pool , which is created at system startup. After the system starts up, you should not create (or destroy) any threads. For benchmarking (described below), the size of the thread pool should be easily adjustable. How you do this is up to you.

Step 2: Add a "statistics" page

Internet companies like Amazon need visibility into the health of their machines. For example, an operator should be able to determine if a machine is up or down, and whether the machine is responding to user requests.

For this step, you will add a bit of visibility into the operation of the webserver. The /STATS URL has been coded to return a special "statistics" page (see Response.java). You need to augment this page to return the following information:

Request rate over the previous 10 seconds
The number of active threads (threads working on a request)
The number of idle threads
Anything else you think would be useful

Recall that we used timer functionality in the lecture on condition variables. You can also utilize java.util.Timer . Note that in either case, timers run in a seperate thread, and appropriate synchronization is required.

Step 3: Benchmark

Using the web benchmark described below, measure the throughput and response time for the web server. For your tests, you should vary the number of worker threads from 1 to roughly 20 (you do not need to test every value in this range). You can use any file for the workload (/index.html is just fine). You should use 25 simulated clients (-n 25).

You should submit a writeup that includes a graph with threads on the x-axis and throughput (in bytes/sec or req/sec) on the y-axis. For output, you can use the web benchmark output, or you can utilize the statistics information you implemented in step 2. Explain the results! In particular, you should discuss how many threads the web server can utilize, as compared to the pi calculator application.

Using the Web Benchmark

The WebStone web benchmark tool is installed on forkbomb as /cse451/projects/tools/webclient. It measures the throughput and latency of a webserver under varying loads. It simulates a number of active clients, all sending requests in parallel (it does this by forking several times). Each client requests a set of files, possibly looping through that set multiple times. When the test is complete, each client outputs the connection delay, response time, bytes transfered, and throughput it observed (depending on the server, the clients may all observe very similar results, or the data may vary widely). The tool takes the following arguments:

-n CLIENTS specify the number of parallel clients to simulate.
-l LOOPS specify the number of times each client should fetch the files.
-w SERVER specify the hostname of the webserver (when sioux is started, it prints a URL including the hostname).
-p PORT specify the port number (when sioux is started, it prints a port number).
-u URLLIST specify a file containing a list of URLs to fetch, one per line, each followed by a weight (e.g. 1).

All of the above parameters are required. The URLLIST file should contain one relative URL (just the file name) per line followed by a space and then a number (the number is the weight representing how often to request the file relative to other files in the list - most likely, 1 is the value you want).

For example, to test a webserver on almond.cs.washington.edu, with two simulated clients each fetching the index.html file twice, one would run:

/cse451/projects/tools/bin/webclient -w almond.cs.washington.edu -p 12703 -l 2 -n 2 -u ./urls

Where the file urls would contain the following single line:

/index.html 1

Part 4: GUI Hacking

Graphical user interface frameworks are usually single-threaded. This means that all GUI-related processing is done in a single thread. Unfortunately, this design can cause graphical applications to become unresponsive if there are long-running computations that occur in response to GUI events. For example, a very long spell-check operation could cause a word processor to freeze up. For this reason, long-lived tasks are usually deferred to a separate thread. This improves responsiveness, but introduces synchronization issues.

For this part of the project, we provide you with a basic GUI which calculates the first N fibonacci numbers. Your task is to implement the "cancel" button, which stops a long-lived Fibonacci computation in its tracks. This is trickier than it sounds, because it requires digging into the details of how Java's GUI framework (called Swing) manages threads and concurrency.

GUI operation

If you are Linux remotely from a Windows machine, make sure Reflection is running. Next, run the existing GUI by typing "make run" in the fibogui directory. Try typing in a small number like 5 and pressing return. This should print out the first five fibonacci numbers. Next, try typing in a large number like 45. You should notice that the GUI becomes completely unresponsive. In fact, you may need to kill the application from the command-line with ctrl-c.

Your job is to implement the cancel button. Task cancellation should stop the current computation, but should display the results that have been obtained so far. In other words, don't throw away work that has been done; it is OK to display a partial result on the screen.

Added 10/16/06: Your program should only allow one active Fibonacci calculation. A new calculation can start only when the previous calculation has ended (or has been cancelled).

(You might notice that the implementation of Fibonnaci is particularly slow. This was done intentionally to make the system do more work. You shouldn't try to optimize the provided fibonacci algorithm. This is operating systems, not algorithms...)

(BTW, no fair making fun of the GUI. This is operating systems, not HCI...)

How to get started

To get started, you should read Sun's documentation on thread-safe GUIs. In particular, you should understand the method SwingUtilities.invokeLater. What does it do, and why is it necessary? Only once you deeply understand this should you attempt to modify code.

In addition, you will need to understand how thread interruption works in Java. Read Sun's documentation, and ask your TAs or instructor if you have questions.

Bonus: Incremental Display

The current fibogui displays its results all at once. Modify the GUI to display results incrementally (as they are calculated). In this case, the cancel button should simply stop computing new numbers.

What to handin

You should handin the src/ directory, which contains your modified Java files. You do not need to turn in the .class files (although it is not a problem if you do). In addition, you should submit a writeup that contains an analysis of the π calculator and the multi-threaded web server. This writeup can be in Word format, pdf format, or plain text. If you have graphs in separate files, please indicate this at the top of your writeup.

We will test your solutions by invoking 'make run'. So, make sure this does something sensible for each of your solutions. Exception: we will test the pi-calculator by invoking 'make test'.

Grading

Semaphores: 20 points
π calculator: 25 points
Web server: 30 points
Fibonacci GUI: 25 points (plus 10 possible bonus points)