CSE 451 Spring 2003: Project 3

Project 3 - Threads

Administrivia: For this project, we encourage you to work with the same partner you had in the previous one. However, if you'd like to switch partners, you are welcome to do this but you need to let us know by Tuesday, April 22 in an email. Note that for future team projects everyone will be required to find a different partner, with whom they have not worked until then.

Parts 1 and 2:
      Out: Friday, April 18, 2003
      Due: Tuesday, April 29, 11:59 PM
Parts 3, 4 and 5:
      Out: Friday, April 25, 2003
      Due: Tuesday, May 6, 11:59 PM
Writeup:
      Due: Tuesday, May 6, 11:59 PM (electronic) and Wednesday, May 7, beginning of lecture (paper copy)

Tasks:

Implement a user-level thread manager.
Add a set of common synchronization primitives to your thread package.
Build a thread pool in C
Turn a single-threaded server into a multithreaded server

Assignment Goals

To gain understanding of how thread packages are implemented, including data structures used, the interface provided, and common problems they face.
To understand some of the problems with user-level threads.
To make use of common synchronization primitives such as locks and condition variables.
To gain experience in writing and debugging multithreaded code. In particular, you will be exposed to code that can deadlock or have race conditions.
To understand the concept of overlapping I/O and computation.
To practice discussion and analysis of your designs.

Background

As I'm sure you know, the Internet has exploded in popularity over the past decade. Driving much of this explosion is the proliferation of networked servers, which are programs that run "in the guts" of the Internet. Client programs (such as web browsers) communicate with these servers over the network, sending requests and reading back the servers' responses. For example, a web browser will send a web server a request for a particular web page; the server will process this request, and then respond by sending back the page's HTML, which the web browser reads, parses, and displays.

One of the challenges in building Internet services is dealing with concurrency. In particular, many independent clients can simultaneously send requests to a given server. A server therefore must be designed to deal with concurrent requests. There are many different strategies for doing this; we will explore two in this project.

A very simple design strategy is to build the server as a single-threaded program. The server's single thread waits until a new request arrives, reads the request, processes the request, and then writes a response back to the originating client. In the meantime, if another request arrives while the server is processing the first, that other request is ignored until the first has been dealt with fully. An example interaction between a server and two clients is illustrated in the following figure.

A more sophisticated strategy would be to build the server as a multithreaded program. When a request arrives at the server, a thread is dispatched to handle it. If multiple tasks arrive concurrently, then multiple server threads will execute concurrently. Each dispatched thread does exactly what the single-threaded server did previously; it reads the request, processes it, then writes a response:

For the purposes of this example, we assume that the server allows a maximum of 2 threads at a time. As the picture above shows, even though request #2 arrives while request #1 is being processed by the first server thread, the second server thread is available to deal with request #2 concurrently. As a result, while the first server thread is processing request #1 (hence using the CPU), the second server thread can read request #2 over the network. Thus, network I/O and computation are overlapped. This has many beneficial effects: the CPU is more effectively utilized, the throughput of the server is increased, and the response time of the server (as seen by the clients) is decreased.

Programmers realized that they could implement multithreading support entirely at the user level, without modifying the kernel at all. The first part of this assignment will require you to implement the thread manager and locking primitives for a user-level thread package.

As we will see, there have been problems with threads implemented entirely at the user level, which motivated kernel developers to include thread support into the kernel. For the second phase of the assignment, you will be using the thread support provided by the kernel, and not using the package that you created. You will be given the source code to a very simple single-threaded networked server. Your goal is to convert this single-threaded server into a multithreaded server, by building a "thread pool" and integrating it into the server. A thread pool is an object that contains a fixed number of threads and supports two operations: dispatch, which causes one thread from the pool to wake up and enter a specified function, and destroy, which kills off all of the threads in the pool and frees up any memory associated with the pool.

Simplethreads

You will be using the simplethreads package for the first part of the assignment. It is available on spinlock/coredump in /cse451/doug/simplethreads-1.02.tar.gz

This project, since it does not involve modifying the kernel, does not require VMWare. The simplethreads package should work on any Linux 2.4 machine (it will not work on Linux 2.2 due to missing support for a necessary syscall), and other varieties of UNIX that support pthreads as well. (Please do not use tahiti, fiji, ceylon, or sumatra.)

To begin, copy that file to your work directory (/cse451/LOGIN) and run
tar -xvzf simplethreads-1.02.tar.gz
to expand it (if you are working on a different machine, see scp(1)).

Simplethreads contains a lot of files, but most are safe to ignore. Pay attention to:

Dir/File Contents

lib/ The simplethreads thread library itself.

lib/sthread_user.c Your part 1 and part 2 implementations go here.

lib/sthread_ctx.h Support for creating new stacks and switching between them.

lib/sthread_queue.h A simple queue that you may find useful.

include/ Contains sthread.h, the public API to the library
(the functions available for apps using the library).

test/ Test programs for the library.

Dir/File	Contents
`lib/`	The simplethreads thread library itself.
`lib/sthread_user.c`	Your part 1 and part 2 implementations go here.
`lib/sthread_ctx.h`	Support for creating new stacks and switching between them.
`lib/sthread_queue.h`	A simple queue that you may find useful.
`include/`	Contains `sthread.h`, the public API to the library (the functions available for apps using the library).
`test/`	Test programs for the library.

Like many UNIX programs, simplethreads uses a configure script to determine parameters of the machine needed for compilation (in fact, you'll find many UNIX packages follow exactly the same steps as simplethreads for compilation). In the simplethreads-1.02 directory, run ./configure to test all these parameters.

Finally, build the package by typing make. You can also build an individual part, such as the library, by running make in a subdirectory. Or, if you just want to compile one file, run make myfile.o (from within the directory containing myfile.c).

Note that, as you make changes, it is only necessary to repeat the last step (make).

One interesting feature of simplethreads is that applications built using it can use either the native, kernel-provided threads or the user-level threads that you'll implement. When running the test programs, use --sthread-pthread to select kernel threads or --sthread-user to select user threads (they default to kernel threads).

In summary, the steps are:

Copy the tar.gz source to your directory.
tar -xvzf simplethreads-1.02.tar.gz
cd simplethreads-1.02
./configure
make
Run the programs in test/

To Add a Source File

If you add a new source file, do the following:

Edit the Makefile.am in the directory containing the new file, adding it to the _SOURCES for the library/executable the file is a part of. E.g., to add a new file in the lib/ directory that will become part of the sthread library, add it to the libsthread_la_SOURCES line.
From the top-level directory (simplethreads-1.02), run automake.
Also from the top-level directory, run ./configure.
Your file is now added; run make as usual to build it.

To Add a New Test Program

Edit test/Makefile.am. Add your program to the list bin_PROGRAMS. Create new variables prog_SOURCES and prog_LDADD following the examples of the programs already there. For example, if the new program is named test-silly, add:
test_silly_SOURCES = test-silly.c other sources here
test_silly_LDADD = $(ldadd)
Follow steps 2-4 above.

The Assignment

Part 1: Implement Thread Scheduling

For part 1, we give you:

An implementation of a simple queue (sthread_queue.h).
A context-switch function that, given a new stack pointer, will switch contexts (sthread_ctx.h).
A new-stack function that will create and initialize a new stack such that it is ready to run.
Skeleton routines for a user-level thread system (sthread_user.c).

It is your job to complete the thread system, implementing:

Data structures to represent threads (struct _sthread in sthread_user.c).
A thread creation routine (sthread_user_create()).
A thread destruction routine (sthread_user_exit()).
A simple non-preemptive thread scheduler.
A mechanism for a thread to voluntarily yield, allowing another thread to run (sthread_user_yield()).
A routine to initialize your data structures (sthread_user_init()).

The routines in sthread_ctx.h do all of the stack manipulation, PC changing, register storing, etc. Rather than focusing on that, this assignment focuses on managing the thread contexts. Viewed as a layered system, you need to implement the grey box below:

At the top layer, applications will use the sthread package (through the API defined in sthread.h). sthread.c will then call either the routines in sthread_pthread.c or your routines in sthread_user.c (it chooses based on the value passed to sthread_init()). Your sthread_user.c, in turn, builds on the sthread_ctx functions (as described in sthread_ctx.h).

Applications (the top-layer) may not use any routines except those listed in sthread.h. They must not know how the threads are implemented; they simply request that threads be created (after initializing the library), and maybe request yeilds/exits. For example, they have no access to sthread_queue. Nor do they keep lists of running threads around; that is the job of the grey box, to encapsulate and hide from the applications.

Similarly, your grey box - sthread_user.c - should not need to know how sthread_ctx is implemented, nor should it modify any functionality in that file. Do not attempt to modify the sthread_ctx_t directly; use the routines declared in sthread_ctx.h.

Recommended Procedure

Figure out what each function you need to implement does. Look at some of the test programs to see usage examples.
Examine the supporting routines we've provided (primarily in sthread_queue.h and sthread_ctx.h).
Design your threading algorithm: When are threads put on the ready queue? When are threads removed? Where is sthread_switch called?
Figure out how to start a new thread and what to do about the initial thread (the one that calls sthread_user_init).
Talk to your fellow students and the TAs about your design.
Implement it.
Test it. The test programs provided are not adequate; for full confidence in your implementation, you'll need to create some of your own.

Hints

sthread_create should not immediatly run the new thread.
Use the routines provided in sthread_ctx.h. You don't need to write any assembly or try to directly manipulate registers for this assignment, nor is an understanding of the exact layout of the stack required (in fact, the routines provided go to great lengths to avoid requiring such knowledge).
If the routine passed to sthread_create() finishes (returns), you need to make sure the thread's resources are released (i.e. sthread_exit() is called).
The start routine passed to sthread_new_ctx does not take any arguments (unlike the routine passed to your sthread_user_create). So you can't create a new stack directly with the user-supplied routine; you need to supply a routine that takes no arguments but somehow winds up invoking the user's routine with the user's argument.
You should free any resources allocated when a thread exits (whether it exits explicitly, by calling sthread_exit(), or implicitly, because the start routine returns). However, you should not attempt to free the stack of a running thread (note: to free a stack, use shread_free_ctx()). This requires a few tricks.
Dealing with the initial thread is tricky. You need to make sure that an sthread_t struct is created for it at some point, so it can be scheduled like the other threads. However, it wasn't created by your sthread_user_create() function. Remember that, while a thread is running, the state stored in the sthread_t is garbage (though it is probably important that you have a such a struct, so you've got some place to put the state when you want to stop the thread).
Be very careful using any local variables after calling sthread_switch() as their values are quite likely different from what they were before (you're on a different stack).
While global variables are bad in general, there will be places in this assignment where they are necessary.
To debug with gdb(1) or any other debugger, run the configure script with --disable-shared (and then re-run make from the top level). This disables shared-libraries, which cause trouble for the debugger. When you run the project, gdb will stop each time you create a new thread with a message like "Program received signal SIGUSR1, User defined signal 1." This is OK - just type continue.

Part 2: Implement Mutexes and Condition Variables

For part 2, you'll use the thread system you wrote in part 1, extending it to provide support for mutexs (a.k.a. locks) and condition variables. Skeleton functions are again provided in lib/sthread_user.c. You need to:

Design data structures for mutexes (struct _sthread_mutex) and condition variables (struct _sthread_cond).
Implement the mutex operations (sthread_user_mutex_*()).
Implement the condition variable operations (sthread_user_cond_*()).

The non-preemptive nature of this assignment aids you greatly in your tasks (since it gives you atomic critical sections). However, you should think about what you would need to do if your threads were preemptive; add comments to the routines indicating precisely where the critical sections are and what appropriate protection might be applicable (e.g. how you would use a spinlock or other atomic operation).

Hints

Figure out how to block a thread, making it wait on some queue. How do you get the currently running thread? How do you switch out of it? How will you wind up back in it?
Locks do not immediately switch threads when unlocked.

Part 3: Build a thread pool in C

Before you begin, copy the following file into your directory:

/cse451/doug/server.tar.gz

This tar archive contains the source code to the single-threaded server that you will modify, as well as some other files that we will describe below. To extract the files from the archive, use the following command:

tar -xvzf server.tar.gz

Your first task is to design, implement, and test an abstraction called a thread pool. Your thread pool implementation should consist of two files: threadpool.h and threadpool.c. In the server.tar.gz archive, we've provided you with skeleton implementations of these files. You must not modify threadpool.h in any way. To build your thread pool, you should only add to the threadpool.c file that we've given you.

As you should see from threadpool.h, your thread pool implementation must support the following three operations:

threadpool_t create_threadpool(int num_threads_in_pool);

void dispatch(threadpool_t pool, dispatch_fn dispatch_to_here, void *arg);

void destroy_threadpool(threadpool_t pool);
The create_threadpool() function should create and return a threadpool. The threadpool must have exactly num_threads_in_pool threads inside of it. If the threadpool can't be created for some reason, this function should return NULL.
The dispatch() function takes a threadpool as an argument, as well as a function pointer and a (void *) argument to pass into the function. If there are available threads in the threadpool, dispatch( ) will cause exactly one thread in the pool to wake up and invoke the supplied function with the supplied argument. (Once the function call returns, the dispatched thread will re-enter the thread pool.) If there are no available threads in the pool, i.e. they have all been dispatched, then dispatch( ) must block until one becomes available by returning from its previous dispatch.
In either case, as soon as a thread has been dispatched, dispatch( ) must return.
The destroy_threadpool function will cause all of the threads in the threadpool to "commit suicide", after which any memory allocated for the threadpool should be freed. (I found this to be the trickiest function to get right.)

You will be using the pthreads thread library to build your multithreaded code. To learn about pthreads, type:

man -k pthread | less

This lists all of the routines present in the pthreads library (each one has its own man page). Don't worry, there are many more routines than you need to know about. In the server.tar.gz archive, we've provided you with some example multithreaded code that uses pthreads. All of the pthreads routines that you need to know about are used in the example program, which is called example_thread.c. To run the example, first type make, then run the program example_thread.

In addition to this example code, we've also provided you with some code (threadpool_test.c) that makes use of a thread pool. Type the command make from inside the server/ directory to compile all the code in there, including your threadpool source code and test code. An executable called threadpool_test should be created. When you have a working thread pool implementation, running the test code should result in output that looks like this. Note that just because you get this output while using your thread pool implementation, it doesn't mean that your thread pool works. Your implementation may have subtle race conditions that only show up occasionally. We will be reading your source code to look for such races.

Requirement #1: there should be NO BUSYWAITING in any of your code.

Requirement #2: there should be NO DEADLOCKS and NO RACE CONDITIONS in your code.

Hint #1: pthreads condition variables have the same operations as semaphores (signal and wait), but unlike semaphores, condition variables do not have history. If a thread calls signal, and there is no other thread blocked on wait, then the signal will be lost.

Part 4: Use your thread pool to turn the single-threaded server
into a multithreaded server.
In the server.tar.gz archive, we have provided you with a working implementation of a single-threaded network server. The server source code is in the file called server.c. The server implementation makes use of a network library that implements all of the routines needed to read and write data over the Internet. (We've provided you with the source code to this network library in case you're curious. The source code is in the SocketLibrary/ subdirectory. But you shouldn't have to modify an of that code.)

The server is designed to do the following:

Create a "listening socket" on a network port specified as a command-line argument to the server. Because the server has created a listening socket, clients can now open connections to the server and send it data. In fact, multiple clients can simultaneously open multiple connections to the server. As explained in the background at the top of this web page, the single-threaded server doesn't handle multiple connections in parallel, but just works on them one at a time.
Perpetually loop, doing the following:
- Accept a new connection from a client. (If there are multiple connections that have arrived, only one will be accepted. The rest will queue up inside the OS, and will be accepted by the server in subsequent iterations through this loop.)
- Read data from the new connection. (Our server reads 10 bytes.)
- Process the request. Our server simply does some mindless computation on the 10 bytes. (How much computation it does can be altered by changing the NUM_LOOPS constant in the server source code.)
- Write a response back to the client. (Our server writes 10 bytes.)
- Close the connection to the client.

What you must do is change the server to work in the following way:

Create a thread pool. (How big the pool is should be specified as a command-line argument to your server.)
Create a listening socket.
Perpetually loop, doing the following:
- Accept a new connection from a client.
- Dispatch the connection to a thread from the thread pool. (Yes, if the thread pool is empty, the server's main thread will block in the dispatch until a thread becomes available.)

The dispatch function should do the following:

Read data from the dispatched connection.
Process the request.
Write a response back to the client.
Close the connection to the client.

Thus, each time a new connection arrives at the server, the main thread dispatches a thread from the threadpool to handle the connection.

We've also provided you with an implementation of a network client. You should not need to modify or understand the implementation of this client. The client is single-threaded: it loops forever, and in each loop iteration, it opens a connection to the server, writes a request, reads a response, and closes the connection. Therefore, to fully test your multithreaded server, you will need to run multiple clients in parallel.

To launch the single-threaded server, give the following command (on spinlock, coredump, or inside VMware):

./server 4324

Here, 4324 is the "port" that the server listens to. A port is like a phone number: a given workstation can have many servers running on it, each listening to a different port. Because many of your classmates might be running their server code on coredump or spinlock, they might have already taken your port. Just try again with a different port; you can choose any number between 1025 and 65535.

To launch a client, give the following command (in another window):

./client servername 4324

Here, servername is the IP address or hostname of the workstation that the server is running on. For example, if you've run the server on the machine coredump.cs.washington.edu, you'd specify "coredump". The name localhost always refers to the local machine. So, if you're running the client on the same workstation as the server, you could launch a client like:

./client localhost 4324

To run multiple clients simultaneously, you could issue the following commands:

bash$ ./client localhost 4324 &
bash$ ./client localhost 4324 &
bash$ ./client localhost 4324 &
etc..

To fully test your server out, you'll need to launch at least as many clients as there are threads in your threadpool, but preferably more. (Think about why.)

Requirement #1: You must implement your multithreaded server by changing the source code to the single-threaded server (server.c).

Requirement #2: You must not create or modify any other source files (besides your threadpool implementation, of course).

Hint #1: You shouldn't need to change or add much code in the existing single-threaded server. We've nicely structured the single-threaded server so that you can reuse most of its functions.

Hint #2: You shouldn't need to understand how network programming works if you don't want to. By heeding hint #1, you will be shielded from all of the details of network programming. Don't be afraid to look under the covers, though!

Hint #3: The code we've provided doesn't ever print anything out. While you're debugging your code, you might want to add some print statements at various places in the client or server to help you figure out what's happening. Make sure you disable the print statements before doing part 5 of the assignment.

Part 5: Measure the performance of your multithreaded server

Now that you have a working multithreaded server, it's time to put it through its paces. You will measure the throughput of your server as a function of the number of threads in the threadpool, and the amount of computation performed in the server. Throughput is simply defined as the number of requests per second that the server can handle.
This means that you need to instrument your server with code that periodically displays the throughput that the server is handling. To do this, just have the main thread (i.e. the code that calls dispatch on the threadpool) increment a counter for every dispatch it issues. Every now and then (i.e., whenever the counter increases by some value, say 50 or 100), take a timestamp using the gettimeofday() system call. Use these timestamps to calculate and print out the server's throughput. (Don't worry about keeping more than 2 timestamps around.)
To change how much computation the server performs, just alter the value of the NUM_LOOPS constant in the server source code. Higher numbers mean more computation. In fact, I suggest you make this another command-line argument to your server.
What you need to do is measure and report the throughput of the server in the following 36 conditions:

# threads in pool = 1 and NUM_LOOPS=1, 100, 1000, 10000, 100000, 500000
# threads in pool = 2 and NUM_LOOPS=1, 100, 1000, 10000, 100000, 500000
# threads in pool = 4 and NUM_LOOPS=1, 100, 1000, 10000, 100000, 500000
# threads in pool = 8 and NUM_LOOPS=1, 100, 1000, 10000, 100000, 500000
# threads in pool = 16 and NUM_LOOPS=1, 100, 1000, 10000, 100000, 500000
# threads in pool = 32 and NUM_LOOPS=1, 100, 1000, 10000, 100000, 500000
(You should run at least as many clients as you have threads in the server's threadpool. For example, for # threads in pool = 4, you should run at least 4 clients. Report the number of clients used in each experiment too.)
To gather this data, run your clients and server inside VMware on a lab workstation. This will guarantee that nobody else is running their server on the same machine as you, thereby disturbing your experiment.
Using your favorite graphing package (such as Microsoft Excel), plot a graph that looks something like:
In other words, your graph should plot the throughput of your server as a function of the number of threads in the thread pool. You should plot a separate line for each different value of the NUM_LOOPS variable. You should play with logarithmic-scale axes (if necessary) in order to get your graph to reveal as much information as possible.

Turnin

Parts 1 and 2

Your writeup should include the following:

Design description of your user-level threads and synchronization primitives. Discuss the goals of each, how well your design meets those goals, and the tradeoffs that were made. Each person needs to put this in a readme-username.txt for the part 1 & 2 turnin.
Note: Each student (including those working together in a group) needs to write their own independent write-up.

In your top-level simplethreads directory, run make dist. This will produce a file named simplethreads-1.02.tar.gz. Submit this file and your two readme-username.txt using the turnin(1L) program under project name proj3a by 11:59pm on the day it is due. turnin will not work on coredump/spinlock, so you'll need to use one of the general-purpose machines (sumatra, fiji, ceylon, or tahiti).

If you have added any files, run tar -tzf simplethreads-1.02.tar.gz before you submit and check to make sure your new files are listed.

Parts 3, 4 and 5

The files that you will need to turn in for this portion are your modified threadpool.c and server.c files, along with two writeup-username.txt files (limited to 2 pages, single-spaced, with 11 pt font) for each person's writeup (see below). Follow the same instructions as for parts 1 & 2; use the project name proj3b (not to be confused with proj3a!).

Writeup

In a hardcopy write-up (a printout of what you have submitted electronically the night before) to be handed in in class on Wednesday, May 7, explain what you did. Make sure to include the same description in your electronic turn-in the night before.
Note: Each student (including those working together in a group) needs to write their own independent write-up.

Your writeup should include the following:

A verbal description of the structure of your threadpool implementation, including a list of any design decisions you made.
A list of critical sections inside your threadpool code (and why they are critical sections).
A description of the synchronization inside your threadpool. What condition variables did you need, where, and why? Under what conditions do threads block, and which thread is responsible for waking up a blocked thread?
The graph you produced in part 5 (make sure it is in a well known file format), as well as an explanation of what the graph shows and why it is shaped the way it is. Focus your discussion on the interesting features in your graph.
Discuss conclusions you have reached from this project. What recommendations do you have for designers of thread systems?