From: Raz Mathias (razvanma_at_exchange.microsoft.com)
Date: Mon Jan 26 2004 - 16:43:04 PST
This paper sought to solve a very clear technical problem. Kernel threads
have the advantage of being able to block independently of each other on
I/O and page faults, whereas user-level threads all block simultaneously
when the underlying kernel thread blocks. Kernel threads are therefore
advantageous in memory-constrained and I/O-heavy situations, while
user-mode threads are an order of magnitude faster in most other
situations. The paper investigates an interesting compromise between the
two.
The paper makes clear that user-level threads are faster than kernel
threads for most operations, for a number of reasons. Kernel threads must
employ heavy-handed protection mechanisms to ensure the kernel's
integrity, and they pay an additional cost for their generality. In
absolute terms, neither of these overheads presents a problem.
Relatively speaking, though, user-level threads are close enough to
procedure calls in overhead that kernel threads are more than an order of
magnitude slower.
The key to understanding the solution presented in the paper (I thought)
lies in the following two phrases: "...the kernel needs access to
user-level scheduling information" and "the user-level support software
needs to be aware of kernel events." In other words, more communication
(one might even go so far as saying coordination) needs to occur between
user and kernel modes with regards to thread scheduling.
The mechanism is fairly straightforward. The kernel maintains "scheduler
activations," each a pair of kernel- and user-mode stacks that the kernel
uses to notify the user-mode scheduler when user threads call into the
kernel or when the kernel processes an event (e.g., an I/O or a page
fault completes). Each scheduler activation represents a "virtual
processor" on top of which the user-level thread scheduler schedules user
threads. When a user-mode thread issues a blocking I/O, the kernel
performs an "upcall" into the user-mode scheduler, which can then move
the other user-mode threads (which would otherwise also have blocked on
the I/O) onto a different virtual processor. In this way, the user
scheduler has the opportunity to intercept kernel calls and keep user
threads scheduled on unblocked execution contexts. Likewise, when an I/O
completes, the user-mode scheduler is upcalled again and has the option
of continuing the thread that issued the I/O or continuing another
thread. In addition, the user-mode scheduler can request more virtual
processors from the kernel, allowing information to flow in the opposite
direction as well.
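This upcall flow can be sketched in a few lines. The sketch below is a toy
model, not the paper's implementation: the class, method names, and the
string thread/processor identifiers are all hypothetical, and Python
method calls stand in for the real kernel-to-user upcalls.

```python
from collections import deque

class UserScheduler:
    """Toy user-level scheduler driven by kernel upcalls (names hypothetical)."""
    def __init__(self):
        self.ready = deque()   # runnable user threads awaiting a virtual processor
        self.running = {}      # virtual processor id -> user thread currently on it

    def run_on(self, vp, thread):
        self.running[vp] = thread

    # Upcall: the thread on `vp` blocked in the kernel; the kernel hands the
    # scheduler a fresh virtual processor `new_vp` so other work can continue.
    def upcall_blocked(self, vp, new_vp):
        del self.running[vp]
        if self.ready:
            self.running[new_vp] = self.ready.popleft()

    # Upcall: the I/O issued by `thread` completed; the scheduler decides to
    # resume it now, sending whatever was on `vp` back to the ready list.
    def upcall_unblocked(self, thread, vp):
        current = self.running.pop(vp, None)
        if current is not None:
            self.ready.append(current)
        self.running[vp] = thread

sched = UserScheduler()
sched.ready.extend(["t2", "t3"])
sched.run_on("vp0", "t1")             # t1 runs, then blocks on I/O
sched.upcall_blocked("vp0", "vp1")    # kernel upcall: t2 now runs on vp1
sched.upcall_unblocked("t1", "vp1")   # I/O done: t1 resumes, t2 is requeued
```

The point of the sketch is only that scheduling decisions (who runs after a
block, who runs after a completion) are made by user-level code that the
kernel merely notifies.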
Some interesting variations on the idea also appear in the paper. For
example, the kernel may take idle virtual processors away from a process,
or may redistribute them to other processes with higher priorities; this
is accomplished with the same upcall mechanism, informing the user-mode
scheduler that a processor is disappearing. Another interesting nuance is
that a virtual processor may be preempted, causing a switch into the
user-mode scheduler, which allows the scheduler to implement priority
scheduling among its threads. A fair amount of the paper deals with the
problem of locks in user-level threads. I found the following line
particularly crisp and intuitive in explaining why blocking can cause
deadlock even when there are other runnable threads: "because a kernel
thread blocks when its user-level thread blocks, an application can run
out of kernel threads to serve as execution contexts, even when there are
runnable user-level threads and processors."
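The quoted failure mode can be illustrated with toy arithmetic. All the
numbers and thread names below are hypothetical; the sketch models a
many-to-few system without upcalls, where a blocked user thread pins its
kernel thread for the duration of the I/O.

```python
# Hypothetical many-to-few setup: 4 user-level threads multiplexed onto a
# fixed pool of 2 kernel threads, with no upcall mechanism.
KERNEL_THREADS = 2
user_threads = ["t1", "t2", "t3", "t4"]

# t1 and t2 are blocked on I/O, each occupying a kernel thread.
blocked_in_kernel = {"t1", "t2"}

free_kernel_threads = KERNEL_THREADS - len(blocked_in_kernel)
runnable = [t for t in user_threads if t not in blocked_in_kernel]

# t3 and t4 are runnable, and a physical processor may be idle, yet no
# execution context remains to run them: free_kernel_threads is 0.
print(free_kernel_threads, runnable)
```

If t1 additionally holds a lock that t3 needs before t1's I/O can be
serviced, this starvation hardens into the deadlock the paper describes.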
I believe it is interesting to compare the solution presented here with
I/O completion ports and fibers (for scheduling user threads) in Windows.
The Windows approach is simply to create one thread per virtual processor
waiting on an asynchronous I/O completion queue, and to run the user-mode
scheduler (using lightweight fibers to switch between user threads) on
those kernel threads. Whereas the paper has the kernel decide how many
virtual processors are available to a process, I/O completion ports let
the process itself determine how many kernel threads wait on a queue.
With both solutions, an underlying queue must exist to hold I/O
completions for an indefinite period of time. The method presented in the
paper asserts that all sorts of kernel transitions (page faults, etc.)
can cause an upcall into the user-mode scheduler; such is not the case
(in a general sense) with IOCP. Despite their differences, though, the
similarities are striking.
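The IOCP shape, a process-chosen number of threads draining one completion
queue and running user-level continuations, can be sketched as follows.
This is a toy analogue in Python, not the Win32 API: the queue, worker
names, and continuation scheme are all stand-ins.

```python
import queue
import threading

# Toy "completion port": completed I/Os are posted here as
# (continuation, data) pairs; workers dequeue and run the continuation
# of the user-level thread that issued the I/O (the fiber-switch analogue).
completions = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    while True:
        item = completions.get()
        if item is None:            # sentinel: shut this worker down
            return
        continuation, data = item
        with results_lock:
            results.append(continuation(data))

# Unlike scheduler activations, the process (not the kernel) picks this count.
NUM_WORKERS = 2
workers = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

# Simulate two completed I/Os being posted to the port.
completions.put((lambda d: "read:" + d, "hello"))
completions.put((lambda d: "write:" + d, "world"))
for _ in workers:
    completions.put(None)
for w in workers:
    w.join()

print(sorted(results))  # handling order across workers is nondeterministic
```

The design point the comparison turns on is visible here: the kernel never
upcalls anyone; it only feeds the queue, and the process sizes the pool.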
I could see the inspiration for Windows' I/O completion ports possibly
originating from this paper. In our class discussion, I would like to
find out whether that was the case.