Single Producer Consumer on a bounded array problem

Analyzing shared data problems.

When designing a solution/convention for sharing data between any asynchronous execution entities, the first thing to do is analyze who will touch which data, and when. Specifically, you want to know how many entities may concurrently attempt conflicting shared data operations.

The analysis of the problem should yield information as to what exactly the issues involved with the concurrency are. If most of your execution entities are doing reads and not modifying state, then you can often simplify your assumptions. If they are only doing writes, again you may be able to simplify things. If you can reduce things down to one producer or one consumer, your problem complexity drops drastically. So before trying to solve any shared data problem, analyze where concurrency occurs and try to see if you can reduce the number of concurrent operations.

Avoiding locks

Locking is an important concept and is often necessary to avoid corruption of shared data in asynchronous systems. If you have something that is dynamically sized, has more than one producer, has more than one consumer, or requires iteration, you are almost assured that you need to use locks. However, locks are slow and clunky. They also basically say "for this moment, we will stop all concurrency in this function." This causes problems in many areas including deadlock, realtime deadlines, responsiveness, etc.

If you've ever tried using Microsoft Outlook XP when it tries to sync with an Exchange server that has gone down, you'll notice that all your Office and IE windows freeze and stop refreshing for the minute or so that Outlook spends trying to find Exchange. This is because it is holding some global lock on resources. Why does it do this? There are probably a number of reasons, including some combination of ensuring data consistency and having stupid programmers.

Rampant use of locks is bad. Try to keep your critical sections small. Try to make the number of steps required to update a structure small. It also cuts down the number of errors. If you can get your structure to run without locks, you can usually sidestep a large number of issues.

Single Producer Consumer on a bounded array problem analysis and solution

Problem statement

So, let's propose a simple problem. We have a bounded buffer that has one producer (which enqueues data) and one consumer (which dequeues data). These two execution entities run asynchronously. Your enqueue and dequeue functions need to ensure that they do not try to enqueue onto a full queue or dequeue from an empty queue.

Required State

The state of this queue requires at least three pieces of information:

  1. A head index (the position of the next item to dequeue)
  2. A tail index (the position where the next item will be enqueued)
  3. The bounded array of data itself

All other data about the queue (like the size) can be inferred from the data above if we are a bit smart about our head and tail usage. The basic trick is to define the empty state as head == tail and the full state as (tail+1)%size == head, which wastes one spot but allows for the exclusion of a size variable.

Function implementation pseudo code

queue:
	atomic_t head
	atomic_t tail
	T data[10]

enqueue(queue, item) {
	int tail = ATOMIC_READ(queue.tail);
	int head = ATOMIC_READ(queue.head);
	int newTail = (tail+1) % (sizeof(queue.data)/sizeof(T));

	if (newTail == head) {
		/* We're full. Handle appropriately */
	} else {
		queue.data[tail] = item;
		ATOMIC_SET(queue.tail, newTail);
	}
}

item dequeue(queue) {
	int tail = ATOMIC_READ(queue.tail);
	int head = ATOMIC_READ(queue.head);

	if (tail == head) {
		/* We're empty. Handle appropriately */
	} else {
		T item = queue.data[head];
		head = (head+1) % (sizeof(queue.data)/sizeof(T));
		ATOMIC_SET(queue.head, head);
		return item;
	}
}

Analysis of concurrency

There are only two execution entities involved with this structure: a producer and a consumer. The producer only enqueues and the consumer only dequeues. These two do not interfere with each other.

If you look at the "code" above, the enqueue function only ever reads the head pointer. It, however, reads and modifies the tail pointer. Thus the producer only reads the head state, but it modifies the tail state.

The dequeue function, on the other hand, only ever reads the tail pointer, but it reads and modifies the head pointer. Thus the consumer only reads the tail state, but it modifies the head state.

All updates to head and tail are done atomically so there is never a worry of a variable being read when it is midway between updates. (We still have not proven that our updates don't cause multiple variables to have conflicting information. However, each variable will at least be self-consistent.)

The "data" array of the queue is only ever evaluated (read) between the head and tail pointers. Thus, the region outside of [head,tail) isn't marked as valid, and no one cares what we do with it. The data only has to be consistent within [head,tail).

Proof of correctness

In the enqueue function, the only error state occurs if the queue is full. We don't care if we jump the gun a bit (that is, if the queue becomes unfull in the middle of execution) because we will only corrupt data if we try to enqueue onto a full queue. So, as long as we can guarantee that the part of our function that actually does the enqueuing only happens while the queue has space, we are good.

The full state is defined by

(tail+1) % (sizeof(queue.data)/sizeof(T)) == head

The code above reads head and tail when entering. We are the only ones who change tail (this is assured by us being the only producer and by enqueue being the only function that mutates tail), so the value we read will be valid until we decide to change it.

As for head, the dequeue function changes it, and that function may execute at any point in our code. However, the dequeue function only moves the head closer to the tail (opening up more space in our queue). Thus, we don't care if our value for head is a bit old; that won't screw up the consistency of our data structure.

Knowing this, we can check for "fullness" and be sure that our enqueuing code only executes if there is space. Now, we just have to verify that our enqueueing code is valid.

We write to the position queue.data[tail]. This is okay because index tail lies outside our valid region of [head,tail), and no one else is writing to it (we are the only producer). This line is safe.

We update tail with ATOMIC_SET(queue.tail, newTail), which atomically grows our valid region from [head,tail) to [head,tail+1), wrapping around where necessary. Thus, our update happens atomically and we are safe.

The proof for dequeue is essentially the same.

Proving non-locking synchronization

So there are a couple of basic principles here for creating a non-locking data structure. You have to prove that, by code structure, you implicitly have exclusive modification access to the variables you want to modify. If some other concurrent execution contexts read these variables, you need to prove that no read can happen midway through your update of a variable (your update is atomic). And if you are reading variables that may be changed under your nose, you have to prove that your program can work correctly on outdated information. In list form:

  1. All variables you write to are modified exclusively by you
  2. Your updates must be atomic if they may be read by another execution unit during your execution
  3. If you read a variable that may be modified by another execution unit, make sure your correctness holds even with an outdated value of that variable.