A note on thread synchronization in the face of the Timer

The presense and behavior of the Timer class has an interesting interaction with the use of long-lived threads using Java's thread synchronization primitives. Failure to understand what the timer is doing can lead to a variety of deadlock and livelock situations, in which simulated time is either blocked from advancing when it must (deadlock) or advances too fast, leaving other processing in the dust (which with many JVMs looks like a form of livelock). The fundamental rule for all threads in the face of the timer is that they must be in one of two states.
  1. Waiting for simulated time to advance (or for any event which will happen in the simulated future). In this case the thread must not block the advancement of simulated time or deadlock will result.
  2. Actively processing or runnable. In this case the thread must block the advancement of simulated time, or else the timeout processing will continue to run and tend to starve this thread.
In my experience there is no middle ground where it is unimportant whether a thread is or is not blocking the timer. Worse than that, attepts to fight case 2 often lead to inadvertant violations of case 1.

The primitives that the Timer provides client programs wishing to synchronize with it are the following:

These interfaces are designed for threads which simply wish to wait until a specific future simulated time, and then be woken. WaitFor and WaitUntil will block the caller until the specified time (specified either as a delta from the current time or absolutely). Before calling WaitFor or WaitUntil, a thread must register itself using RegisterThread. A registered thread must unregister itself before it dies. The meaning of registering a thread is discussed in the next section.

Registered threads

The current interface for blocking the advancement of simulated time is controlled by thread registration. The rule is that simultated time may only advance when all registered threads are blocked inside a WaitFor or WaitUntil call. Otherwise it is assumed that those threads are processing actions that need to take place before time can advance.

The usual problem case is a thread which wishes to synchronize not with the timer but with some other thread in the program, and therefore wishes to use the primitive Java wait() and notify() calls. Basically there are two things that can go wrong:

  1. The waiting thread might be registered when it calls wait(). This would prevent simulated time from advancing while the thread is waiting. Unfortunately, it is usually the case that the event being waited for cannot happen unless simultated time does advance. Thus you will generally need to call UnregiserThread (Thread.currentThread ()) before waiting.

  2. After the waiting thread is woken (i.e. when some other thread calls notify()), the waiting thread might not be registered. This would allow simulated time to continue to advance. Since Java uses Mesa-style monitors, notifying a waiting thread does not cause it to immediately run. However, if the timer thread gets to run first, it will continue to advance time, and the woken thread will not get a chance to run. Worse, it is not sufficient for the waiting thread to use the code:
             timer.UnregisterThread (Thread.currentThread ());
             wait ();
             timer.RegisterThread (Thread.currentThread ());
    
    because although this is logically correct, with Mesa-style monitors the woken thread does not get the change to immediately re-register itself, and therefore the simulated timing behavior is highly dependant on thread scheduling (i.e. a race condition for whether the timeout processing thread or the woken thread runs first).
The solution can be found by looking at the provided code which does interact with the timer. Note that this solution is something of a hack, and I will probably provide a cleaner interface for future assignments. The code for the SendBuffer in sender.java uses the following two code snippets to wait and wake the sending thread. For waiting, it does (inside a while loop which checks the wait condition, as is normal with monitors):
         try
            {
            waitingThread = Thread.currentThread ();
            timer.UnregisterThread (waitingThread);
            wait ();
            }
         catch (InterruptedException e)
            {
            }
and to wake that thread it does:
         if (waitingThread != null)
            {
            timer.RegisterThread (waitingThread);
            waitingThread = null;
            notifyAll ();
            } 
Note by the way that it is not an error to register a thread which is already registered, nor is it an error to unregister a thread which is not registered. Registering or unregistering a thread which is sitting inside a WaitFor or WaitUntil call however is a bad idea.

The idea here is that a thread should unregister itself before it blocks to make sure that it won't deadlock, but that the thread which wakes a blocked thread should be responsible for knowing that that thread needs to be re-registered. This ensures that the timer will not be runnable to conflict with the woken thread, and ensures correct simulated time behavior. It also eliminates some of the niceness of monitors, because the waking thread needs to know exactly which thread is being woken. Hopefully the next version will fix this up by encapsulating this metaphor in a set of timer-specific wait and notify calls.

The timeout processing loop

The process by which time is advanced in the timer is the action of the timeout processing loop. All future events are stored on a priority queue inside the timer. Whenever it is okay for time to advance, the timeout processing thread wakes up and advances the simulated time to the time of the next event. This is correct since all threads which care about time have blocked waiting for future events, so the next thing that can logically happen is the first event on the queue. Events are processed (i.e. waiting threads woken or timeout callbacks made) until it is no longer legal for time to advance (which usually happens after only one event has been dispatched).

The consequence of this behavior is that if a thread does not block the timer when it must, the timer will always have work that it can do, because in general there is always another event on the queue. Since this is a simulated timer, simulated time can advance arbitrarily fast, which is why all processes which wish to do even partially realistic processing need to block the timer to make sure that they compute in zero simulated time. It is this property that makes programming the timer so tricky: basically threads which don't register themselves correctly break the assumption that is being made here that time may be advanced in large increments because nothing interesting was going to happen in the skipped region anyhow. Any thread which assumes that the timer thread will ever simply be idle is mistaken: if it is not explicitly blocked then it is very much runnable, always.

A note on this document

This document is basically a cross between trying to convince you not to do threading yourself and to use the primitives that I have given you, and an attempt to help people who do understand synchronization to get up to speed on how the Timer works. Everything I have said here is hard-learned experience gained by writing the two buffers and the network interface classes.

If you want to do low-level programming of the simulator, there are two resources to help you:

Good luck and have fun.

Andy


acollins@cs.washington.edu