Thread-safe Libraries
In the beginning (and, really, up till a few years ago), UNIX machines
only allowed a single thread per process. During this period, many
libraries were written (including the standard C library) and the
authors naturally assumed that processes had one thread each. So they
stored data in global variables and didn't bother with
synchronization.
Threads were introduced, and people started writing multi-threaded
programs. Quite often, several threads would be using library
functions at the same time. This caused problems: though the
programmer had thought they were being safe, the library was
corrupting things (remember that libraries are just pieces of code
that you don't have to write; otherwise, they're the same as if you
had written the code yourself).
For example, let's imagine scanf(3) had been written as follows:
scanf() |
static char buf[256];
int scanf(char *format, ...) {
    read(STDIN, buf, 255);
    work on buf to pull out matches
    return matches
}
|
What's wrong? Well, buf
is a static variable, and is thus
shared across the entire program. If two threads call scanf at the
same time, they'll wind up overwriting buf with incorrect
values.
Now, this example is fairly trivial to fix (we can just allocate a new
buffer on each call to scanf), but others were not. The simplest
solution was often just to add a lock around the entire function:
scanf() |
static char buf[256];
static lock_t scanf_lock;
int scanf(char *format, ...) {
    acquire scanf_lock;
    read(STDIN, buf, 255);
    work on buf to pull out matches
    release scanf_lock;
    return matches
}
|
The addition of the lock makes this function thread-safe: it
will produce the correct results if it is accessed by multiple threads
simultaneously.
However, it isn't very efficient; there isn't really any reason a
thread should have to wait for all the other threads to be done before
it can use scanf (well, actually, there might be interesting problems
with read(2)
- but we'll ignore those). This means the
function is not MP-efficient.
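The trivial fix mentioned above (a new buffer per call) is what makes a function both thread-safe and MP-efficient. A minimal sketch of that idea, assuming a hypothetical my_scanf_from with the real match logic replaced by a stand-in word counter:

```c
#include <unistd.h>

/* Stand-in for the real match logic (elided in the notes): here it
 * just counts whitespace-separated words in buf. */
static int parse_matches(const char *buf)
{
    int matches = 0;
    while (*buf) {
        while (*buf == ' ' || *buf == '\n')
            buf++;
        if (*buf) {
            matches++;
            while (*buf && *buf != ' ' && *buf != '\n')
                buf++;
        }
    }
    return matches;
}

/* Reentrant sketch: each call gets its own stack buffer, so no lock
 * is needed and concurrent callers never wait on each other. */
int my_scanf_from(int fd)
{
    char buf[256];                 /* per-call, not static */
    ssize_t n = read(fd, buf, 255);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return parse_matches(buf);     /* work on buf to pull out matches */
}
```

Since nothing is shared between calls, no thread ever blocks another; the remaining sharing (the file descriptor itself) is exactly the read(2) issue we agreed to ignore.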
pthreads vs. LinuxThreads
pthreads refer to "POSIX threads," meaning threads that behave as
specified in the POSIX standard 1003.1c. LinuxThreads is a library
that provides POSIX threads for Linux. It uses the
clone(2)
syscall to create new threads and also provides
the synchronization functions needed to implement the POSIX 1003.1c
specification. While I'll use functions from pthreads, the concepts
here are general across all systems (in fact, I copied my notes from
another class which did not use pthreads and just substituted the
pthread functions :).
Synchronization Primitives
We'll discuss two concepts that are independent parts of synchronization:
- Mutual exclusion: Protection of some data from simultaneous
access by multiple threads.
- Inter-thread scheduling: Communication between threads
that some event happened; blocking for events in other threads.
General points about synchronization:
- Necessary when multiple threads have access to the same
data.
- Can't be used in interrupt handlers (except
sem_post(3)
),
because interrupt handlers must not block (same goes for UNIX
signal handlers)
- Don't forget to release the lock, re-enable interrupts,
etc. Tricky with multiple exits (e.g. exceptions).
- Hardware generally provides some help. Disable/enable interrupts, test
& set, etc.
- Synchronization bugs can be very difficult to find - so make
sure you understand your design, and try to keep it simple.
Locks/Mutexes
- Provides mutual exclusion.
- Simple to use:
pthread_mutex_lock(3)
and pthread_mutex_unlock(3)
.
- Make a critical section (note that a single critical section may
be discontiguous if a single lock is acquired/released in multiple functions).
- One of the few constructs which is actually "held" by an
identifiable thread.
- Usually 1 lock associated with each data object or collection
type (e.g. a lock per list).
- Granularity of locking and order of acquiring/releasing locks is
an issue in all systems because of deadlock.
- What happens if you call
pthread_mutex_lock
with
the same lock twice in a row?
Thread Exit/Join
- Provides inter-thread scheduling.
- Join waits for a particular thread to exit (either by
explicitly calling
pthread_exit(3)
or by returning
from the thread's initial function).
- Example: Thread A creates thread B and sets B
off computing some value. A then computes on its own, but
eventually needs the value B was computing. So it joins with B,
to make sure B has actually finished.
pthread_exit(3)
and pthread_join(3)
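The A/B example above can be sketched as follows (the function names and the "value" B computes are mine, just for illustration):

```c
#include <pthread.h>
#include <stdint.h>

/* Thread B: compute some value (here, just the sum 1..100). */
static void *compute(void *arg)
{
    (void)arg;
    long sum = 0;
    for (int i = 1; i <= 100; i++)
        sum += i;
    return (void *)(intptr_t)sum;  /* returning == implicit pthread_exit */
}

/* Thread A: create B, do its own work, then join to collect B's value. */
long run_example(void)
{
    pthread_t b;
    void *result;

    pthread_create(&b, NULL, compute, NULL);
    /* ... A computes on its own here ... */
    pthread_join(b, &result);      /* blocks until B has exited */
    return (long)(intptr_t)result;
}
```

Note that join both waits for B and retrieves the pointer B returned (or passed to pthread_exit).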
Semaphores
- Provides both mutual exclusion and inter-thread scheduling.
- Have memory of past (current value of counter).
- Other sync. constructs only know current state.
- In general, more complicated than they're worth. Why?
- Because they mix two functions (scheduling and mutual exclusion),
thus, you can run into deadlock/other problems easily.
- Monitors split these functions making them easier to debug.
sem_wait(3)
and sem_post(3)
.
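The "memory of the past" point can be sketched with POSIX unnamed semaphores (Linux-style sem_init; the names here are mine). Unlike a condition-variable signal, a post is remembered even if it happens before anyone waits:

```c
#include <pthread.h>
#include <semaphore.h>

static sem_t items;        /* counts events that have happened */
static int shared_value;

static void *producer(void *arg)
{
    (void)arg;
    shared_value = 42;
    sem_post(&items);      /* increment; also legal in signal handlers */
    return NULL;
}

/* Consumer blocks until the producer has posted at least once; if
 * the post already happened, sem_wait returns immediately. */
int consume_one(void)
{
    pthread_t p;

    sem_init(&items, 0, 0);            /* initial count 0 */
    pthread_create(&p, NULL, producer, NULL);
    sem_wait(&items);                  /* decrement, blocking at 0 */
    pthread_join(p, NULL);
    sem_destroy(&items);
    return shared_value;
}
```

This works regardless of which thread runs first, which is exactly the "memory" the other constructs lack.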
Monitors/Condition Variables
- Idea in monitors is to separate
concerns: use locks for mutual exclusion and condition variables for
scheduling constraints. (CS162 Lecture Notes, UC
Berkeley)
- Condition variable: a queue of threads waiting for something inside a critical section
- condition variable(s) + lock = monitor.
- Lock protects the data
structure; typically 1 lock per object.
- Condition variables help the scheduler synchronize the actions
on the object efficiently. Possibly multiple CVs per object (all
using the same lock).
- Can only use the methods of the CV while holding the lock.
pthread_cond_wait(3)
: Atomically do [release lock and sleep].
pthread_cond_signal(3)
: Wake a single waiter, if any
currently waiting.
pthread_cond_broadcast(3)
: Wake all currently waiting waiters.
Condition Variables - Why?
So, why do we need these condition variable things anyway? We'll use a
common example, an unlimited-size buffer, and attempt to solve it
without the condition variable.
We'll assume the following code is used to initialize the system appropriately:
Initialization |
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t data_available = PTHREAD_COND_INITIALIZER;
buffer = empty buffer;
|
Here is the correct version, with condition variables. We assume that two
threads are using the code, one calling AddToBuffer and another calling
RemoveFromBuffer.
AddToBuffer | RemoveFromBuffer |
pthread_mutex_lock(&lock);
put item in buffer;
pthread_cond_signal(&data_available);
pthread_mutex_unlock(&lock);
|
pthread_mutex_lock(&lock);
while (nothing in buffer) {
    pthread_cond_wait(&data_available, &lock);
}
remove item from buffer;
pthread_mutex_unlock(&lock);
return item;
|
First, let's try just removing the condition variable:
AddToBuffer | RemoveFromBuffer |
pthread_mutex_lock(&lock);
put item in buffer;
pthread_mutex_unlock(&lock);
|
pthread_mutex_lock(&lock);
while (nothing in buffer) {
    ;
}
remove item from buffer;
pthread_mutex_unlock(&lock);
return item;
|
Obviously no good: the while loop holds the lock, so there is no way to
run AddToBuffer, and the while loop will spin forever....
OK, so release the lock in the while loop (but remember to reacquire
it before checking if anything is in the buffer):
AddToBuffer | RemoveFromBuffer |
pthread_mutex_lock(&lock);
put item in buffer;
pthread_mutex_unlock(&lock);
|
pthread_mutex_lock(&lock);
while (nothing in buffer) {
    pthread_mutex_unlock(&lock);
    pthread_mutex_lock(&lock);
}
remove item from buffer;
pthread_mutex_unlock(&lock);
return item;
|
This code will work (surprisingly), but very slowly - if one thread
enters RemoveFromBuffer (with an empty buffer) and then another thread
enters AddToBuffer, the only way to make progress is to context switch
out of RemoveFromBuffer in-between the two lock statements. That's no good.
Well, how about we stick a sched_yield(2)
in there to speed things
up a bit - that way, at least the scheduler will have a better chance
of context switching us there. (That the algorithm is correct but
very inefficient is a good sign that we have an inter-thread scheduling
problem, not a mutual exclusion problem.)
AddToBuffer | RemoveFromBuffer |
pthread_mutex_lock(&lock);
put item in buffer;
pthread_mutex_unlock(&lock);
|
pthread_mutex_lock(&lock);
while (nothing in buffer) {
    pthread_mutex_unlock(&lock);
    sched_yield();
    pthread_mutex_lock(&lock);
}
remove item from buffer;
pthread_mutex_unlock(&lock);
return item;
|
That is looking a little better, but a yield isn't quite right - it is
quite likely we will have to go around the while loop many, many times
before some thread actually adds anything to the buffer. What we
really want is to sleep until there is data available, and have
AddToBuffer wake us up.
Let's try to do that by adding a queue to hold waiting threads
(call it waiters).
AddToBuffer will take the first thread off of the queue and wake it up.
We can then change the sched_yield(2)
to a sleep(3)
,
since AddToBuffer should wake us up...
AddToBuffer | RemoveFromBuffer |
pthread_mutex_lock(&lock);
put item in buffer;
if (waiters not empty)
    waiters.next().sendAlarm(); // wakeup time
pthread_mutex_unlock(&lock);
|
pthread_mutex_lock(&lock);
while (nothing in buffer) {
1       pthread_mutex_unlock(&lock);
2       waiters.enqueue(this thread);
3       sleep(1000000); // sleep a long time
4       pthread_mutex_lock(&lock);
}
remove item from buffer;
pthread_mutex_unlock(&lock);
return item;
|
OK, that looks good. Where is the problem? Consider a context
switch from thread A between lines 1 and 2 in
RemoveFromBuffer to thread B just entering AddToBuffer. Thread B
would successfully add the item to the buffer, and then check if any
waiters were available. Since thread A has not yet added itself to the
waiters queue, it wouldn't find any, and it would return. At some
point, thread A would run again, and would sleep - even though there
is actually data available in the buffer.
What we need is an atomic version of lines 1 through
4. Hmm, that happens to be exactly what a condition variable does.
Note that we were able to solve the locking/consistency part of the problem
without condition variables, but to get the scheduling part right, we needed
them.
You may also be wondering: when would we ever need more than one
condition variable? Here is a quick example where we add another CV,
buffer_empty:
AddToBuffer | RemoveFromBuffer | HandleAnEmptyBuffer |
pthread_mutex_lock(&lock);
put item in buffer;
pthread_cond_signal(&data_available);
pthread_mutex_unlock(&lock);
|
pthread_mutex_lock(&lock);
while (nothing in buffer) {
    pthread_cond_wait(&data_available, &lock);
}
remove item from buffer;
if (buffer empty)
    pthread_cond_broadcast(&buffer_empty);
pthread_mutex_unlock(&lock);
return item;
|
pthread_mutex_lock(&lock);
while (something in buffer) {
    pthread_cond_wait(&buffer_empty, &lock);
}
yell at roommate for hosing the network;
pthread_mutex_unlock(&lock);
|
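The two-CV version above can be sketched with real pthread calls; here the bookkeeping is reduced to a counter, and broadcast is used for buffer_empty since every watcher of that condition cares when it becomes true:

```c
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t data_available = PTHREAD_COND_INITIALIZER;
static pthread_cond_t buffer_empty = PTHREAD_COND_INITIALIZER;
static int count = 0;      /* items currently in the buffer */

void AddToBuffer(void)
{
    pthread_mutex_lock(&lock);
    count++;                               /* put item in buffer */
    pthread_cond_signal(&data_available);  /* one consumer suffices */
    pthread_mutex_unlock(&lock);
}

void RemoveFromBuffer(void)
{
    pthread_mutex_lock(&lock);
    while (count == 0)
        pthread_cond_wait(&data_available, &lock);
    count--;                               /* remove item from buffer */
    if (count == 0)
        pthread_cond_broadcast(&buffer_empty); /* wake ALL watchers */
    pthread_mutex_unlock(&lock);
}

void HandleAnEmptyBuffer(void)
{
    pthread_mutex_lock(&lock);
    while (count != 0)
        pthread_cond_wait(&buffer_empty, &lock);
    /* yell at roommate for hosing the network */
    pthread_mutex_unlock(&lock);
}
```

Both condition variables share the one lock that protects count; the CVs differ only in which predicate a waiter is waiting on.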
Basically, anytime you have multiple different conditions, use another
condition variable - pretty simple, really.