CSE 374, Lecture 24: Concurrency

Intro to parallelism

All programming we've done so far has been sequential programming, in which code progresses linearly, line by line, along a single flow of control. And for a long time in the history of computing, this was just fine: the number of transistors on a chip doubled about every two years (a phenomenon known as Moore's Law), and processing power grew with it. Since about 2005, however, that free speedup has stalled. What happened? As more transistors are packed onto a chip, the power it draws and the heat it produces increase, and we've reached a point at which we can't push the power much higher. As a result, chip clock frequencies have stagnated, so we can no longer make a single core faster just by adding transistors. If we want our computations to go faster, we now have to divide them across multiple cores. This is what we call "multicore programming."

There are two main ways to use multiple cores:

  1. Run multiple programs at the same time, like one student running emacs on klaatu while another student runs their C benchmark program. We can technically also do this on a single core with a technique called "time slicing", in which programs take turns running, but it fits much more naturally on multiple cores.
  2. Do multiple things at once in a single program. We divide the work or computation among the cores and do the pieces in parallel. The cores have to communicate with each other somehow (e.g., through shared memory). One caveat: the speedup from multiple cores will not be perfect - there is overhead in managing the cores, dividing the work, and combining the results - so two cores will not quite be twice as fast as one.

There are many categories of uses of parallelism.

In programming, a "thread" is a single sequential execution path - the familiar model of a sequential program that we've used all along, except that now there may be multiple threads running at the same time.
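
For a concrete taste, here is a minimal sketch of multithreading in C++ using std::thread. The two-way split and the names partialSum, left, and right are just for illustration: the program divides a summation between two threads and then combines the halves.

    #include <cstddef>
    #include <functional>
    #include <iostream>
    #include <numeric>
    #include <thread>
    #include <vector>

    // Each thread runs its own copy of this function on part of the data.
    void partialSum(const std::vector<int>& data, std::size_t lo,
                    std::size_t hi, long* out) {
      *out = std::accumulate(data.begin() + lo, data.begin() + hi, 0L);
    }

    int main() {
      std::vector<int> data(1000000, 1);  // one million 1s
      long left = 0, right = 0;
      std::size_t mid = data.size() / 2;

      // Start two threads, each summing half of the vector at the same time.
      std::thread t1(partialSum, std::cref(data), 0, mid, &left);
      std::thread t2(partialSum, std::cref(data), mid, data.size(), &right);

      t1.join();  // wait for both threads to finish...
      t2.join();
      std::cout << (left + right) << std::endl;  // ...then combine: 1000000
      return 0;
    }

Each std::thread starts running its function as soon as it is constructed, and join() makes main wait for it to finish. On most systems this compiles with g++ -pthread.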

Concurrency

Parallelism and concurrency are related but distinct ideas. Roughly, parallelism is about using extra resources (like more cores) to get an answer faster, while concurrency is about correctly managing simultaneous access to shared resources.

We can make an analogy to a kitchen. Our previous model of sequential programming was like a single cook who can do one thing at a time - chop vegetables, heat up the pan, add vegetables, stir, etc. In this analogy, parallelism would be the act of hiring sous-chefs and handing out potatoes and knives so they can all cut potatoes in parallel and speed up the operation of making mashed potatoes. Concurrency would be a situation in which we are not the only cook in the kitchen - there are many cooks making different things but only four burners on the stove. We need to manage the access to the burners so that we can use the burners as much as possible without spilling or burning any of the food.

Bank accounts

The canonical example of concurrency is the bank account.

Let's say we have a simple BankAccount class in C++:

    class BankAccount {
     public:
      void deposit(double amount);
      void withdraw(double amount);
      double getBalance();
      void setBalance(double amount);

     private:
      double balance_;  // the current account balance
    };

The withdraw function might be implemented like so:

    void BankAccount::withdraw(double amount) {
      double b = getBalance();
      if (amount > b) {
        // std::invalid_argument (from <stdexcept>) requires a message
        throw std::invalid_argument("insufficient funds");
      }
      setBalance(b - amount);
    }

The BankAccount class is used in a program belonging to a huge bank that serves many, many customers, and to keep up with all of them, the bank's program uses multiple threads that are all performing transactions on the bank's accounts. Suppose we have a bank account x with a balance of $150. Suppose further that thread T1 calls x.withdraw(100) and thread T2 calls x.withdraw(100) right afterwards. These two transactions target the same account, and what SHOULD happen is that one of them succeeds in withdrawing $100, and the other throws an exception because the remaining balance of $50 is insufficient to satisfy a second withdrawal.

However, these two threads are running at the same time, and because we cannot guarantee the exact speed at which each thread runs, we could get into a bad situation. We can model it as follows, with time increasing top-to-bottom:

         | Thread T1:                            Thread T2:
         |
         | double b = getBalance();
         |                                       double b = getBalance();
         |                                       if (amount > b) {
         |                                         throw std::invalid_argument(...);
    time |                                       }
     |   |                                       setBalance(b - amount);
     v   | if (amount > b) {
         |   throw std::invalid_argument(...);
         | }
         | setBalance(b - amount);

This kind of diagram is called an "interleaving": the operations of the two threads (running on different cores, or taking turns on the same core) are arranged in time, and this particular arrangement produces a conflict. What is that conflict? T1 reads the balance (150) and stores it in the local variable b. Then thread T2 executes completely, deducting 100 from the account to leave a balance of 50. Then the rest of the function on T1 executes: it compares 150 with 100 (no exception) and sets the balance to 150 - 100 = 50. What has happened? We've lost a transaction! We withdrew $200 total but the balance only dropped by $100 - we created money! From the bank's perspective this is a very bad thing. It happened because T1 stored the balance in a temporary variable (b), which let the other thread get in the middle and act before that stored value went stale.
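
The lost-update pattern is easy to reproduce outside the bank example. The following small program - an illustration of the same read-modify-write problem, not the lecture's code - has two threads each increment a shared counter 100,000 times; runs usually print a total well short of the expected 200000 because increments from the two threads overwrite each other:

    #include <iostream>
    #include <thread>

    int counter = 0;  // shared by both threads, with no protection

    void addMany() {
      for (int i = 0; i < 100000; ++i) {
        counter = counter + 1;  // read counter, add 1, write it back
      }
    }

    int main() {
      std::thread t1(addMany);
      std::thread t2(addMany);
      t1.join();
      t2.join();
      // Each thread added 100000, so we "should" see 200000, but
      // interleaved read-modify-writes lose updates. (Formally this is
      // a data race, so the behavior is undefined; the short total is
      // the typical observable symptom.)
      std::cout << counter << std::endl;
      return 0;
    }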

Since the temporary variable seemed to cause the problem, we might try to solve it by removing the temporary variable:

    void BankAccount::withdraw(double amount) {
      if (amount > getBalance()) {
        throw std::invalid_argument("insufficient funds");
      }
      setBalance(getBalance() - amount);
    }

But this is still wrong! We can find an interleaving that puts us in a bad state:

         | Thread T1:                            Thread T2:
         |
         | if (amount > getBalance()) {
         |   throw std::invalid_argument(...);
         | }
    time |                                       if (amount > getBalance()) {
     |   |                                         throw std::invalid_argument(...);
     v   |                                       }
         |                                       setBalance(getBalance() - amount);
         | setBalance(getBalance() - amount);

What happens in this example? Thread T1 confirms that no exception is needed (100 is not more than 150), then T2 confirms the same and debits the account (leaving a balance of 50), then T1 comes back, reads the new balance (50), and sets the balance to 50 - 100 = -50. We've ended up with a negative balance, which violates exactly the guarantee the check was supposed to enforce!

So how do we fix this problem? We want only one thread at a time to be able to execute the withdraw function! We might try adding a "busy" boolean to the BankAccount class: if the flag is set, we wait until it isn't, then set it ourselves and execute withdraw. We can think about this like a phone booth: if there's someone in the phone booth, we have to wait until the phone booth is free, and then we can enter it - marking it occupied by setting our "busy" flag to true.

    bool BankAccount::busy_ = false;
    void BankAccount::withdraw(double amount) {
      while (busy_) {}  // spin until no other thread is mid-withdraw
      busy_ = true;
      if (amount > getBalance()) {
        throw std::invalid_argument("insufficient funds");
      }
      setBalance(getBalance() - amount);
      busy_ = false;
    }

This is a good idea, but it's still wrong. We can see why with another interleaving:

         | Thread T1:                            Thread T2:
         |
         | while (busy_) {}
         |                                       while (busy_) {}
         |                                       busy_ = true;
         | busy_ = true;
         | if (amount > getBalance()) {
         |   throw std::invalid_argument(...);
         | }
    time |                                       if (amount > getBalance()) {
     |   |                                         throw std::invalid_argument(...);
     v   |                                       }
         |                                       setBalance(getBalance() - amount);
         | setBalance(getBalance() - amount);

We've still got a problem! T2 can slip in between T1's while loop and the point at which T1 sets busy_ = true - checking the flag and setting it are two separate steps - so a bad interleaving leaves us in the same situation as before.

It turns out that what we want is an "atomic operation": we want to test whether busy_ is false and then set it to true, as a single operation that cannot be interrupted. When we execute instructions in C or C++, different lines (or even different parts of the same line) are not atomic. To accomplish this test-and-set operation, we actually need help from the hardware. The hardware provides an instruction that performs the test-and-set in one uninterruptible step, and the programming language then wraps this hardware primitive.
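
As a concrete illustration, C++ exposes such a primitive as std::atomic_flag, whose test_and_set member atomically reads the old value and sets the flag in one uninterruptible step. Here is a minimal sketch (the enter/leave names are just for illustration) of the busy-flag idea rebuilt on top of it:

    #include <atomic>

    std::atomic_flag busy = ATOMIC_FLAG_INIT;  // starts cleared (false)

    void enter() {
      // test_and_set returns the OLD value and sets the flag to true,
      // all as one atomic hardware operation. If the old value was true,
      // some other thread beat us to it, so keep trying (busy-wait).
      while (busy.test_and_set()) {}
    }

    void leave() {
      busy.clear();  // atomically reset the flag to false
    }

Because the test and the set happen as one atomic step, no thread can slip in between them the way T2 did in the interleaving above.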

One kind of wrapping of the hardware primitive is called a lock. A lock is an object that can be locked and unlocked atomically. If you have called lock() successfully, you can be sure that NO ONE ELSE owns the lock. This allows us to write correct code. In C++ the lock is called a "mutex" (short for "mutual exclusion"), and it will fix our problem:

    std::mutex BankAccount::m_;  // requires #include <mutex>
    void BankAccount::withdraw(double amount) {
      m_.lock();
      if (amount > getBalance()) {
        throw std::invalid_argument("insufficient funds");
      }
      setBalance(getBalance() - amount);
      m_.unlock();
    }

The lock() call will wait until it can return successfully; we call this "blocking". A thread that calls lock() blocks until it can claim the mutex, and no other thread can claim the mutex until unlock() is called. With the mutex in place, there are no bad interleavings of withdraw!

There are still some problems with our implementation, however:

  1. If the exception is thrown, unlock() is never called, so the mutex stays locked and every later call blocks forever.
  2. The other methods (deposit, setBalance, getBalance) don't acquire the mutex, so they can still interleave badly with withdraw.
  3. We have to remember to call unlock() on every path out of every function, which is easy to get wrong.
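
The second problem has a straightforward fix: every method that touches balance_ must acquire the same mutex. Here is a sketch of deposit following the same pattern as withdraw, assuming the mutex m_ from above:

    void BankAccount::deposit(double amount) {
      m_.lock();  // the same mutex withdraw uses, so a deposit can
                  // never interleave badly with a withdrawal
      setBalance(getBalance() + amount);
      m_.unlock();
    }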

We'll discuss more, and better, options for locking next lecture.