Section Notes - 1/23/03

Section Outline: Introduction to synchronization | Synchronization exercises

Introduction to synchronization

In past discussions, we've seen that resource sharing (e.g. shared application-level data structures, shared memory) is one of the key benefits of threading. We've also seen the potential for a single misbehaving thread to corrupt the shared resources for every thread. The same is true for cooperating processes, that is, processes that can affect, or be affected by, other processes in the system. To ensure correct and (hopefully) efficient cooperation between threads and/or processes, we introduce the idea of synchronization. We'll now briefly introduce a few formalisms, which are also covered in Ch. 7 and which will be in Friday's lecture.

Synchronization exercises
When dealing with any sort of asynchronous code (be it threads, hardware, or some other such system), you need the concept of an atomic operation for shared data. Atomic operations are operations that cannot be interrupted partway through execution. Without them, it is pretty near impossible to ensure consistency inside a data structure (let alone correctness).
Assume we have a global variable declared:
int i;
Assume we also have 2 threads, A and B, that execute the following lines of code, A first, then B:
Thread A: i = 3;
Thread B: i = 4;

What is the value of i if read in a third thread C?
The problem here is that you're looking at it from "too high a level." When you are dealing with asynchronous code, you need to know which operations can be assumed to be atomic when you are accessing an otherwise unprotected (not in a critical section) piece of shared data.
In this case, you want to assume that the line "i = 3" is atomic. That is, I want to be able to count on the fact that immediately after that line executes, i holds the value 3, and all other concurrent lines of execution will see 3 when they read i.
Unfortunately, on a computer, the granularity at which operations can be considered atomic is a machine instruction, not a line of C source code. Given the C source code line "i = 3;", you don't know which instructions the compiler generates. So, the answer to the question above is: "You don't know. It could be 3, 4, or some other value." In short, the compiler may generate code that caches the value of i in a register.
When we think of i as a shared resource, we think of it as a location in memory that two lines of execution may read from or write to at any point. If the compiler generated instructions that wrote 3 directly to that memory location every time we said "i = 3;", and generated instructions that read from that memory location every time we read the value of i, this model would be valid.
However, let's say the code looked something like this:
i = 3;
a = i + 2;
if (i == 5) {
    ...
}
Reading from memory is really slow compared to reading from a register. So, the compiler may generate code that loads the value of i into a register and just refers to the register for the subsequent instructions. This means we now have 2 versions of i: one in memory, and one in a register.
If we have 2 threads accessing i, then we want the changes from one thread to be immediately visible to the other thread. That means we can't let the compiler cache the value of i in a register. If we do, thread A may not see the changes to i that thread B makes, since the changes are stuck in thread B's register context, which gets saved away when thread A is switched in. This means you never really know what the value of i is, and that just sucks when programming.
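To make the failure concrete, here is a minimal sketch (the names done, thread_a, and thread_b are ours, purely for illustration) of a flag-waiting loop that can hang when the compiler caches the flag in a register:

int done = 0;   /* shared flag -- note: NOT volatile */

/* Thread A: spin until thread B signals completion.  Because done is
   not volatile, the compiler is free to load it into a register once
   and never re-read memory, turning this into an infinite loop. */
void thread_a(void)
{
    while (!done)
        ;   /* may spin forever on a stale register copy */
}

/* Thread B: signal completion.  The store may land in memory, but
   thread A may never look there again. */
void thread_b(void)
{
    done = 1;
}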
How do we prevent the compiler from caching i in a register? The best answer is: learn about the machine architecture, and write the code for reading and writing shared variables in assembly. However, using gcc on an x86 architecture, you have another option.

volatile to the rescue?
In the case of gcc on x86, yes. There are very few keywords that force the compiler to listen to you; volatile is one of them. This keyword tells the compiler to keep its grubby little hands off (in terms of optimization) the variable it describes. That means the compiler can't assume the following if-block will execute:
i = 3;
if (i == 3) {
    ...
}
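For instance, here's a minimal sketch (the function name f is ours, for illustration) of why the compiler must keep that test when i is volatile:

volatile int i;

void f(void)
{
    i = 3;
    /* Because i is volatile, the compiler must re-load it from memory
       here; another thread may have changed it between the store and
       the test, so this if-block may or may not execute. */
    if (i == 3) {
        /* ... */
    }
}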
On x86, volatile also has the side effect that values never get cached in registers, as the compiler always wants to try to get the most recent version of i that it can.
Sounds ideal? Kind of. The problem with using keywords like this and making assumptions about low-level code generation is that those assumptions may or may not be correct, depending on the architecture and the compiler implementation. So, using gcc on an x86 processor, you should be safe in assuming that reads and writes of a word-sized volatile variable are atomic. Other architectures and other compilers, I don't know.
Thus, looking back at the first case study, if the declaration for i were:

volatile int i;

and Thread A executed first, then Thread B, you would know that i was 4.
Having atomic reads and writes alone only lets you react to possibly outdated values of a variable: in a read-modify-write cycle, another thread may change the variable between your read and your write. This is often not enough, since many times you want to conditionally assign to a variable based on the variable's current value. Such operations are called atomic Test-And-Set (TAS) operations. To be able to do such an operation, you basically need hardware support. A test-and-set instruction usually has the API:

bool test_and_set(compare, value, atomic_value)

where atomic_value gets assigned value if atomic_value is currently equal to compare. The function returns true or false depending on whether or not atomic_value was assigned value.
There are 2 methods that are generally used in hardware for such an operation. In one method, there is a single instruction that executes atomically on the processor (the Intel Pentium architecture does this). In the other method, there is a pair of coupled load and store instructions: if the memory bus is used to modify a particular address between the load and the store, the store instruction fails (the PowerPC does this).
On i586, they have a cmpxchg (compare and exchange) instruction that takes a location in memory (atomic_value), compares it with a number (compare) stored in the register EAX, and if they are equal, assigns atomic_value <- value.
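As a hedged sketch, a compare-and-swap wrapper around cmpxchg using gcc inline assembly might look like this (the name cas and the exact constraints are our choices, not a standard API; the lock prefix keeps the instruction atomic across processors):

/* Sketch: returns 1 if *atomic_value equaled compare and was set to
   value, 0 otherwise.  cmpxchg implicitly compares against EAX. */
static inline int cas(volatile int *atomic_value, int compare, int value)
{
    unsigned char ok;
    __asm__ __volatile__(
        "lock; cmpxchgl %3, %1\n\t"
        "sete %0"                         /* ok = (ZF == 1) */
        : "=q" (ok), "+m" (*atomic_value), "+a" (compare)
        : "r" (value)
        : "memory", "cc");
    return ok;
}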
On the PPC, there are 2 instructions: lwarx (Load Word And Reserve Indexed) and stwcx (Store Word Conditional Indexed). When a value is loaded from memory using lwarx, the memory bus is instructed to "snoop" for accesses to that memory location, so the processor knows if the value gets changed by someone else. When stwcx is executed, it checks whether the location has been modified. If the location has not been modified, stwcx succeeds; otherwise, it fails.
Atomic TAS operations are very powerful for synchronization. However, keep in mind that atomic operations are generally more complex than their non-atomic counterparts, and thus often slower (different instructions can take different amounts of time to execute). It may be worth your time to think about which classes of functions/operations can be implemented with which atomic operations, and which classes can get away without using them.
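For example, here is a minimal sketch of building an atomic increment out of such an operation by retrying until the store succeeds (it uses the illustrative cas() from above; the same retry pattern works with lwarx/stwcx):

/* Atomically add 1 to *counter.  If another thread changes the counter
   between our read and our cas(), the cas() fails and we re-read and
   try again. */
void atomic_increment(volatile int *counter)
{
    int old;
    do {
        old = *counter;                      /* read the current value */
    } while (!cas(counter, old, old + 1));   /* retry if it changed    */
}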