A note on thread synchronization in the face of the
Timer
The presense and behavior of the Timer class has an
interesting interaction with the use of long-lived threads using
Java's thread synchronization primitives. Failure to understand what
the timer is doing can lead to a variety of deadlock and livelock
situations, in which simulated time is either blocked from advancing
when it must (deadlock) or advances too fast, leaving other processing
in the dust (which with many JVMs looks like a form of livelock). The
fundamental rule for all threads in the face of the timer is that they
must be in one of two states.
- Waiting for simulated time to advance (or for any event which
will happen in the simulated future). In this case the thread
must not block the advancement of simulated time or
deadlock will result.
- Actively processing or runnable. In this case the thread
must block the advancement of simulated time, or else the
timeout processing will continue to run and tend to starve this
thread.
In my experience there is no middle ground where it is unimportant
whether a thread is or is not blocking the timer. Worse than that,
attepts to fight case 2 often lead to inadvertant violations of case
1.
The primitives that the Timer provides client programs
wishing to synchronize with it are the following:
- public void RegisterThread (Thread t)
- public void UnegisterThread (Thread t)
- public void WaitFor (long time, TimeUnit units)
- public void WaitUntil (long time, TimeUnit units)
These interfaces are designed for threads which simply wish to wait
until a specific future simulated time, and then be woken.
WaitFor and WaitUntil will block the caller until
the specified time (specified either as a delta from the current time
or absolutely). Before calling WaitFor or
WaitUntil, a thread must register itself using
RegisterThread. A registered thread must unregister itself
before it dies. The meaning of registering a thread is discussed in
the next section.
Registered threads
The current interface for blocking the advancement of simulated time
is controlled by thread registration. The rule is that simultated
time may only advance when all registered threads are blocked inside a
WaitFor or WaitUntil call. Otherwise it is assumed
that those threads are processing actions that need to take place
before time can advance.
The usual problem case is a thread which wishes to synchronize not
with the timer but with some other thread in the program, and
therefore wishes to use the primitive Java wait() and
notify() calls. Basically there are two things that can go
wrong:
- The waiting thread might be registered when it calls
wait(). This would prevent simulated time from advancing
while the thread is waiting. Unfortunately, it is usually the case
that the event being waited for cannot happen unless simultated time
does advance. Thus you will generally need to call
UnregiserThread (Thread.currentThread ()) before waiting.
- After the waiting thread is woken (i.e. when some other thread
calls notify()), the waiting thread might not be
registered. This would allow simulated time to continue to advance.
Since Java uses Mesa-style monitors, notifying a waiting thread does
not cause it to immediately run. However, if the timer thread gets to
run first, it will continue to advance time, and the woken thread will
not get a chance to run. Worse, it is not sufficient for the waiting
thread to use the code:
timer.UnregisterThread (Thread.currentThread ());
wait ();
timer.RegisterThread (Thread.currentThread ());
because although this is logically correct, with Mesa-style monitors
the woken thread does not get the change to immediately re-register
itself, and therefore the simulated timing behavior is highly
dependant on thread scheduling (i.e. a race condition for whether the
timeout processing thread or the woken thread runs first).
The solution can be found by looking at the provided code which does
interact with the timer. Note that this solution is something of a
hack, and I will probably provide a cleaner interface for future
assignments. The code for the SendBuffer in
sender.java uses the following two code snippets to wait and
wake the sending thread. For waiting, it does (inside a while loop
which checks the wait condition, as is normal with monitors):
try
{
waitingThread = Thread.currentThread ();
timer.UnregisterThread (waitingThread);
wait ();
}
catch (InterruptedException e)
{
}
and to wake that thread it does:
if (waitingThread != null)
{
timer.RegisterThread (waitingThread);
waitingThread = null;
notifyAll ();
}
Note by the way that it is not an error to
register a thread which is already registered, nor is it an error to
unregister a thread which is not registered. Registering or
unregistering a thread which is sitting inside a WaitFor or
WaitUntil call however is a bad idea.
The idea here is that a thread should unregister itself before it
blocks to make sure that it won't deadlock, but that the thread which
wakes a blocked thread should be responsible for knowing that that
thread needs to be re-registered. This ensures that the timer will
not be runnable to conflict with the woken thread, and ensures correct
simulated time behavior. It also eliminates some of the niceness of
monitors, because the waking thread needs to know exactly which thread
is being woken. Hopefully the next version will fix this up by
encapsulating this metaphor in a set of timer-specific wait and notify
calls.
The timeout processing loop
The process by which time is advanced in the timer is the action of
the timeout processing loop. All future events are stored on a
priority queue inside the timer. Whenever it is okay for time to
advance, the timeout processing thread wakes up and advances the
simulated time to the time of the next event. This is correct since
all threads which care about time have blocked waiting for future
events, so the next thing that can logically happen is the first event
on the queue. Events are processed (i.e. waiting threads woken or
timeout callbacks made) until it is no longer legal for time to
advance (which usually happens after only one event has been
dispatched).
The consequence of this behavior is that if a thread does not block
the timer when it must, the timer will always have work that it
can do, because in general there is always another event on the
queue. Since this is a simulated timer, simulated time can advance
arbitrarily fast, which is why all processes which wish to do even
partially realistic processing need to block the timer to make sure
that they compute in zero simulated time. It is this property that
makes programming the timer so tricky: basically threads which don't
register themselves correctly break the assumption that is being made
here that time may be advanced in large increments because nothing
interesting was going to happen in the skipped region anyhow. Any
thread which assumes that the timer thread will ever simply be idle is
mistaken: if it is not explicitly blocked then it is very much
runnable, always.
A note on this document
This document is basically a cross between trying to convince you not
to do threading yourself and to use the primitives that I have given
you, and an attempt to help people who do understand synchronization
to get up to speed on how the Timer works. Everything I have
said here is hard-learned experience gained by writing the two buffers
and the network interface classes.
If you want to do low-level programming of the simulator, there are
two resources to help you:
- Debugging stuff built in. If the timer deadlocks, it will print
out a status report including the queue and a list of registered
threads indicating which are inside timer wait functions. In
addition, several of the modules have debug messages you can turn on
to trace the functions being called. From this, with a fair chunk of
thought, you can usually determine who is at fault for a deadlock.
- If you get into trouble doing stuff like this, go ahead and send
me email, I should be able to help you out.
Good luck and have fun.
Andy
acollins@cs.washington.edu