CSE 374, Lecture 25: More Concurrency
Where we left off
Last Friday, we were talking about concurrency and the problems that arise with executing code on multiple "threads" of execution at once when they access shared memory. We had a BankAccount class and figured out that in order to prevent problems, we should use "locks" to allow only one thread at a time to use/modify the data.
- In C++, a lock is called a "mutex". We "lock()" the mutex every time we enter a function, and "unlock()" it right before we return.
- If one function that locks the mutex tries to call another function that locks the mutex, we will have a deadlock since the second function will not be able to lock the mutex until the first function returns, but the first function won't return until the second function finishes executing. We're stuck! To prevent this, we often add a private helper function that doesn't lock, which we can call instead.
- We need to make sure to unlock the mutex before throwing an exception as well, since otherwise we will never release the lock.
All of this locking/unlocking is tricky, and it is easy to forget. As an alternative, C++ provides something called a "lock guard" which simplifies the act of using a mutex:
void deposit(double amount) {
std::lock_guard<std::mutex> lock(m_); // locks mutex m_ in the lock_guard constructor
// mutex is now locked
setBalanceWithLock(getBalance() + amount);
// When deposit() returns, the stack-allocated lock_guard goes out of scope,
// running its destructor and releasing the mutex.
}
A "lock guard" is a special type of object that locks the mutex in the constructor, and unlocks the mutex in the destructor. If we allocate the lock guard on the stack, then as soon as it is created, we can guarantee that we have locked the mutex, and the destructor will automatically be called when the lock guard goes out of scope. We don't have to remember to unlock in all cases! This even works for the exception case.
Race conditions
A "race condition" happens when the result of a computation depends on the scheduling of multiple threads, i.e., the order in which the processor executes their instructions. We've seen two types of race conditions:
- Bad interleavings. A bad interleaving is when the code exposes bad intermediate state. This is like we saw with our bank account example: the getBalance() -> setBalance() calls exposed intermediate state. Bad interleavings are incorrect from the programmatic logical perspective: in the bank example, we lost money or allowed balances to go below 0.
- Data races. Even if we can't have a line-by-line interleaving, we can still have race conditions through something called a "data race". Something like getBalance() - which so far we haven't locked - seems like it is fine: we may read an old value of the balance, but we can't come up with a bad interleaving that will do something incorrect. However, things can still go wrong! This is because what seems like an "atomic" operation, like setting "balance_ = amount" or "return balance_", is actually NOT guaranteed to be an atomic operation at the compiled machine-code level. The compiler can do any number of weird unexpected optimizations with odd temporary state, and thus we have the potential to read or write incorrectly if there is more than one thread doing so. Data races are a little hard to understand - because we don't think in terms of the optimized machine-level code - but the takeaway is that whenever you have the potential to read+write or write+write on different threads, you MUST synchronize access to the shared memory (with a lock or similar).
std::atomic
What about the static accountCount_ variable that we are using to generate account IDs? This is also a problem! What if two accounts are created at the same time? The ++ operation is not necessarily safe in a multi-threaded setting - this is an example of a data race! We could fix this by adding a static mutex to protect the static count:
static std::mutex accountCountMutex_;
static int accountCount_;
and then lock it in the constructor before we set the account id:
BankAccount::BankAccount() {
accountCountMutex_.lock();
accountId_ = ++accountCount_;
accountCountMutex_.unlock();
balance_ = 0;
}
This is kind of a pain, but luckily the standard library provides a simple tool for exactly this use case! We just want a lock around all reads/modifications of the integer, and the std::atomic wrapper does just that: it makes operations on the integer "atomic". The ++ is performed as one indivisible step, so we can guarantee that no two threads get the same value of accountCount_.
// In h:
static std::atomic<int> accountCount_;
// In cpp:
BankAccount::BankAccount() {
accountId_ = ++accountCount_;
balance_ = 0;
}
Deadlocks
How about if we want to write a transferTo function to transfer an amount from account A to account B?
void transferTo(double amount, BankAccount& other) {
m_.lock();
other.m_.lock();
setBalanceInternal(getBalance() - amount);
other.setBalanceInternal(other.getBalance() + amount);
other.m_.unlock();
m_.unlock();
}
Since now we are dealing with two different accounts, we will have to lock the mutexes of both accounts before doing any balance transfer. So first we call lock() on the current object's mutex, then we call lock() on the other object's mutex.
Unfortunately, this logic can produce a deadlock. Why? Consider that we have account A and account B, and on thread T1, we are trying to transfer $50 from A to B and on thread T2, we are trying to transfer $20 from B to A. We can model this with a bad interleaving:
Thread T1: A.transferTo(50, B);          Thread T2: B.transferTo(20, A);

m_.lock();        // T1 locks A's mutex
                                         m_.lock();        // T2 locks B's mutex
other.m_.lock();  // T1 waits for B's mutex
                                         other.m_.lock();  // T2 waits for A's mutex
Thread T1 locks A's mutex, while thread T2 locks B's mutex. Then each thread waits for the other account's mutex to become available - which will never happen, because each mutex is held by the other thread, and neither thread can finish! We have a DEADLOCK.
Solutions to this situation:
- Use smaller critical sections. We might lock A's mutex only around the modification of A's balance, and lock B's mutex only when modifying B's balance. But this is not ideal, because it exposes an intermediate state in which A's account has been debited but the funds haven't yet been deposited in B's account - we've temporarily lost money, which isn't great.
- Use larger critical sections. We could add a single lock for all bank accounts that must be acquired before doing any multi-account transaction. This is correct, but it means we can only do one transaction at a time across the entire bank, even if the accounts involved are unrelated. This is a performance loss.
- Always lock mutexes in a consistent order. For instance, in this case, we can choose to always lock the mutex of the account with the lower account id first, then the mutex of the account with the higher account id. This works because account ids are unique and immutable, so we can rely on them without synchronization. Something like this:
void transferTo(double amount, BankAccount& other) {
bool thisIsFirst = getAccountId() < other.getAccountId();
if (thisIsFirst) {
m_.lock();
other.m_.lock();
} else {
other.m_.lock();
m_.lock();
}
setBalanceInternal(getBalance() - amount);
other.setBalanceInternal(other.getBalance() + amount);
if (thisIsFirst) {
other.m_.unlock();
m_.unlock();
} else {
m_.unlock();
other.m_.unlock();
}
}
Other types of synchronization primitives
There are other types of locks and primitives that are useful, besides the regular mutex, lock guard, and std::atomic:
- Reentrant locks. We had a problem earlier where one function that locked the mutex tried to call another function that would lock the same mutex, but this didn't work because the first function already had the lock! There is a type of lock that allows this behavior, and it's called a "reentrant lock": the same thread may re-lock the same lock any number of times. The lock will be released to a different thread once all of the lock() calls have been correspondingly unlock()'ed. Re-entrant locks can make it difficult to reason about lock state, however, so we suggest using them sparingly.
- Reader-writer locks. All of the problems that we've seen so far have resulted from read/write or write/write combinations of calls. If two threads are reading the same value, there's no problem, and there is no need for a lock; it's only the writes that make things complicated. To improve efficiency of your program, you might use "reader-writer locks": these special locks allow multiple threads to read the same data at a time, but if any thread tries to write, it will make sure that no other thread is either reading or writing at the same time. This improves the performance of reads (allowing them to happen at once) while still maintaining correctness of the program.
- Condition variables. Let's say you are trying to dequeue from a queue, but there's no data in the queue at the moment. You want to wait until some other thread inserts into the queue, then you can wake up and dequeue that element! How can we implement this kind of waiting behavior? We can use something called a "condition variable": a primitive that can be used to block a thread until another thread notifies the condition variable that the waiting condition has been satisfied.
Wisdom
For every memory location, you should obey at least one of the following:
- Make it thread-local. Whenever possible, avoid sharing resources between threads - make a copy for each thread. If threads do not need to communicate with each other through the shared resource (for example, a random-number generator), then make it thread-local. In typical concurrent programs, the vast majority of objects should be thread-local.
- Shared-memory should be rare - minimize it.
- Make it immutable. Whenever possible, do not update objects; make new objects instead. If a location is only read (never written), then no synchronization is necessary. Simultaneous reads are not data races, and not a problem.
- In practice, programmers over-use mutation - minimize it.
- Make access synchronized, ie use locks and other primitives to prevent race conditions.
When it comes to synchronization, there are several guidelines to follow:
- No data races. Never allow two threads to read/write or write/write a location at the same time. In C++, a program with a data race is almost always wrong.
- Think about which operations need to be atomic. Consider atomicity first, then figure out how to implement it with locks.
- Consistent locking. For each location that should be synchronized, have a lock that is ALWAYS locked when reading or writing that location. The same lock may (and often should) be used to guard multiple locations/pieces of memory. Clearly document with comments the mutex that guards a particular piece of memory.
- Start with coarse-grained locking; move to finer-grained locking only if blocking for locks becomes an issue. Coarse-grained locking is the practice of having fewer locks: one for the whole data structure, or one for all bank accounts. It is simpler to implement, but performance can be bad (fewer operations can be done at the same time). But if there isn't a lot of concurrent access, then coarse locking is probably fine. Fine-grained locking is the practice of having more locks, each guarding less data: one lock per data element, or one lock per field in the bank account. Fine-grained locking is trickier to get correct, requires more programming, and has more overhead (more locks to lock), but it can be more performant if there is lots of concurrent access, since we can do more things at once. Move to fine-grained locks if performance is a problem.
- Don't do expensive computations or I/O in critical sections, but also don't introduce race conditions. This balances performance with correctness.
- Use built-in libraries whenever possible. Concurrency is extremely tricky and difficult to get right; experts have spent countless hours building tools for you to use to make your code safe.