Section Notes - 1/30/03

Section Notes Outline:


Things to note about C in this project

You will see a lot of "typedef struct _sthread* sthread_t" and stuff like that all over the place. This is done to produce an abstraction layer in the API and is a standard technique in Object Oriented C. If you're curious about how/why/when these constructs should be used, drop by office hours and bug us.

All operations in C should be considered non-atomic unless you have reason to believe otherwise. Read the section notes from last week to get a feel for how to force C to do something atomically. Also, remember, atomic_t is your friend. Look at the functions in atomic.h. You do not have an atomic_test_and_set in this part of the assignment!!! If you _really_ want one, come talk to us about it.

Don't use C++. Why? Well...I don't know what interaction that thing may have with stuff.

Fundamental components and concepts of a scheduler

You are writing what is essentially an operating system scheduler. Thus you can view the scheduler as kind of a pseudo-kernel. A scheduler always needs these fundamental items:

The scheduling unit representation

In our assignment, the scheduling unit is a thread and it is represented by a struct _sthread. Its execution context (PC, registers, and stack) are all handled for you by the functions provided in sthread_user_ctx.h. You should not need to look in the implementation file for the ctx functions.

The timer construct

The preemptive nature of the scheduler itself is implemented via the Unix signal mechanism. Specifically, it is implemented using SIGVTALRM which is a signal that can be scheduled to go off at after some time interval has passed (look at the man pages for setitimer). This signal has the handler schedule_handler register to be called when this signal comes in. Imagine this as having the "schedule_handler" function called during the execution of your code. It goes on your stack and when it returns, your code continues execution like normal. Now, if you switch contexts in the middle of your signal handler, the same concept applies...but you need to be careful about how you think about things. Remember, after you switch contexts, have a different stack and execution state.

We use SIGVTALRM because this "virtual alarm" only decrements while your program is running. That means if the process using your thread library is swapped out by the operating system, the timer freezes until it is swapped back in. This makes it so that you don't have to care about what the OS scheduler is doing to you; if you give a thread a 5 millisecond quanta, it will get at least 5 milliseconds of cpu time (SIGVTALRM isn't exact. It just guarantees that you will get interrupted after 5 milliseconds has passed). You will probably need to use setitimer at least once in your init function to get the scheduler running.

Data structure representing the ordering of these units

There has to be a data structure that represents what threads exist in the system. This data structure depends on what scheduling algorithm you want to use. Essentially, the scheduler core will boil down to something like:

newThread = getNextFromDataStructure()
insertThreadIntoStructure(current);
switch context

To simplify things, let's just make it a round robin scheduler, so this data structure is just a queue of sthread_ts. Heh, we even give you an implementation of a link-list based queue of these guys. I (albert) even added an iteration construct in it so you could implement join easier using this data structure. Use the sucker.

The main thing is that this data structure should only be manipulated by the scheduler. Locks are expensive. Not sharing data is a good way to avoid using them. Besides, it's just cleaner. There is nothing wrong with global variables except that it encourages people to do stupid stuff like modify them everywhere with abandon. The beauty of local variables is that you know, by constraint of language, which pieces of code can change the variable. For globals, you should rely on convention. Even if you use global variables, you want to limit the places these variables get modified.

You don't need your create functions to modify this run queue. You don't even want it to ever touch the stupid thing. Let the scheduler be the only one that touches it. Actually, what you really want is a static local variable. If you know what that means, good. If you don't, just make a global queue and don't touch it anywhere other than the scheduler.

Persistent state over invocations of scheduling code

If you think of the scheduler, it is just a piece of code that executes periodically outside of your system. It isn't a thread. It doesn't get its context switched in and switched out quite the same as other threads. But, it is still just code that runs.

A problem with the scheduler is that it needs to preserve information across executions of its code. This isn't too easy as there is no "main" function that it can ask to store data for it; you can't keep the state of the scheduler on the stack between calls -- especially since the scheduler changes what the current stack is when it does a context switch. Persistent state is represented using static or global or even static global variables.

Concept of the current thread

The scheduler must have some conception of what thread is currently running. If nothing else, it must know what its swapping out during context switch. This data must be kept in some persistent form. Let's assume it is a global variable that is a sthread_t name current (hint hint).

There are some very interesting properties about this variable. First off, keep in mind that its mutation (that is, changing of which object it points to) is handled exclusively by the scheduler. Second, notice that for every thread, this variable refers to a different object. Third, all threads can access this variable.

These properties mean that the data structure pointed to by current is a kind of "thread global" context. That is, it is a repository of global variables that only your thread (and the scheduler) can see. That makes it ideal for passing of messages between the scheduler and the thread.

Issues involved in design

Synchronization issues

All shared data must be synchronized. This means all operations on this data must ensure mutual exclusion until the data is put back into a determinate state. In terms of non-locking operations, that means each operation must cause a complete, but atomic, update of shared data.

Asynchronous communication: message passing

When you are dealing with asynchronous code, you no longer have the clean semantics of "I call you, and you return to me when you are done" that a function call gives you. Instead, you're stuck with a mechanism akin to smoke signals. "I hold up a big sign to let you know of some condition and wait until I see you hold up a big sign back to tell me your reaction was." How do you ensure that all signs are handled? That you don't try to read something while I'm half-way through painting an old sign? That you don't take the sign away before I've read it? You do the same things that you do int all good programming; you keep good, well-defined, and strict conventions. Too often, people rely on language facilities to catch mistakes. Stop that. It's a bad habit.

So, let's say you use a shared variable to communicate state. You must ensure that this value can be updated in mutual exclusion of both readers and writers. One very simple, uncomplicated solution is to use atomic operations on this variable and make its state fully changeable with one atomic operations. This does limit the types of messages you can pass, but in many circumstances, it is enough. You will need this concept to build up more complex synchronization primitives like locks, etc.

Where does the work gets done?

It's a funny thing to deal with global variables for the first time. A temptation is to manipulate them everywhere. Bad idea. In all coding, you usually want to centralize your core of manipulations in an engine somewhere. That way, one thing controls your whole program. If you do it differently, you usually just make things more complicated.

Do most of your stuff in the scheduler. Your other functions should do no more than pass messages and data that the scheduler needs to do work (like sthread_user_create should just pass the scheduler the newly created and initialized thread struct for the new thread. It should NOT try to manipulate scheduler data structures).

The main thread

The main thread is, well, essentially your "main" function (int main(void)). This thread is special. It wasn't created by sthread_create. Should it be killable with sthread_cancel? What happens if it calls sthread_exit? What if another thread attempts to join it? What if it returns? Issues to think about.

Wherever possible, reduce concurrent access to data

If you want to pass data from one thread to the scheduler, do it via the object referred to by the current variable rather than a global. Now you've constricted access to only one 2 things: your thread and the scheduler.

Use invoke_scheduler

This function was meant for you to be able to explicitly call the scheduler. There are some oddities while suing this function. Remember that if you write something like:

current->state = NewState; invoke_scheduler(); fprintf(stderr,"Hi!\n");

That this may end up calling the scheduler more than once before Hi is reached. This shouldn't be too hard to work around, but keep things in mind.

Context switches and complications during thread startup

When you switch contexts, you change your stack. So, don't rely on local variables having the values before and after you call switch_context.

Also, this is going to be a bit confusing to think about first, but you should work out how the code gets executed during context switches. So, in my do_schedule code, I call switch context between two threads A and B. B is now current running. The scheduler invokes sometime later and I switch back to A, where does the code resume in A? Later, the scheduler invokes again and I switch back to B. Where does the code resume in B, and where is the code frozen (again) in A?

The above scenario represents the standard execution path across a context switch. If you work it out, during all context switches in "normal" operation source and return to the middle of do_schedule. The return of do_schedule then goes to schedule_handler which calls setitimer. Setitimer is what sets up the next invocation of the scheduler. If the context_switch ever goes to a place other than do_scheduler, then setitimer isn't called and the scheduler effectively freezes. There is one place that this happens: sthread_user_create. A new thread didn't get its state saved in the scheduler. When the scheduler swaps it in, it starts in its starter routine. Uh oh. Keep that in mind when coding up the starter routine. You need to call setitimer somewhere there.