From: Praveen Rao (psrao_at_windows.microsoft.com)
Date: Mon Jan 26 2004 - 17:38:29 PST
The authors argue that user-level management of parallelism is
inherently better than kernel-level management, and that this advantage
is not an artifact of current kernel implementations. However, if
parallelism is managed at user level without kernel involvement,
performance suffers and even correctness can be jeopardized.
Systems with user-level threads treat each process as a virtual
processor. This hides the underlying realities of page faults, I/O,
multiprogramming, etc., and leads to poor performance and even incorrect
behavior. User-level threads can't be integrated well with the system
(even when built on top of kernel threads) because:
- kernel threads block, resume, and are preempted without notifying
user mode
- kernel threads are scheduled obliviously with respect to user-level
thread state
In contrast, kernel threads are aware of all these realities, but they
are heavyweight and inherently perform worse than user-level threads.
Kernel threads can't be made as efficient as user-level threads because
of:
- the cost of accessing thread management functions, which live in the
kernel (every operation crosses the user/kernel protection boundary)
- the cost of generality: e.g., the kernel implements preemptive
priority scheduling as a general solution, but an application may
benefit from simpler LIFO scheduling
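To make the "cost of generality" point concrete, here is a minimal
Python sketch (all names hypothetical) of a user-level ready list whose
scheduling order, LIFO or FIFO, is chosen by the application itself.
Both operations are plain user-space data-structure updates; no kernel
trap is involved.

```python
from collections import deque


class ReadyList:
    """Hypothetical user-level ready list; the application picks the policy."""

    def __init__(self, policy="lifo"):
        if policy not in ("lifo", "fifo"):
            raise ValueError("policy must be 'lifo' or 'fifo'")
        self.policy = policy
        self._threads = deque()

    def make_ready(self, thread):
        # Enqueueing is an ordinary user-level operation: no kernel trap.
        self._threads.append(thread)

    def next_thread(self):
        """Return the next thread to run, or None if nothing is ready."""
        if not self._threads:
            return None
        # LIFO runs the most recently readied thread; FIFO runs the oldest.
        if self.policy == "lifo":
            return self._threads.pop()
        return self._threads.popleft()


lifo, fifo = ReadyList("lifo"), ReadyList("fifo")
for t in ("t1", "t2", "t3"):
    lifo.make_ready(t)
    fifo.make_ready(t)
print(lifo.next_thread(), fifo.next_thread())  # t3 t1
```

Swapping policies is a one-line change in user space, whereas a kernel
scheduler has to serve every application with one general policy.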
The proposed solution combines the two approaches to get the best of
both: user-level threads, with the relevant kernel events exposed to the
user-level thread manager.
The proposed system has the following features:
- the kernel controls how many physical processors to give to a
process's virtual multiprocessor
- the user-level thread system controls which threads run on the
allocated virtual processors
- the kernel notifies the user-level thread system whenever it changes
the number of processors assigned or an I/O event occurs
- the user-level thread system notifies the kernel when it needs more or
fewer processors (this doesn't happen very often)
- the application programmer sees no difference from programming
directly with kernel threads, except in performance
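The division of labor above can be sketched in Python under a
simplified event model: the kernel delivers upcalls (processor added,
processor preempted, activation blocked, activation unblocked) and the
user-level system replies with occasional downcalls ("add more
processors", "processor is idle"). The event names follow my reading of
the paper, but the class and method names are hypothetical, and
preemption is simplified to a single notification per reclaimed
processor.

```python
from collections import deque


class UserLevelScheduler:
    """Hypothetical user-level thread system driven by kernel upcalls.

    The kernel decides which processors this address space owns; this
    object only decides which ready thread runs on each of them.
    Upcall and downcall names are illustrative, not a real API.
    """

    def __init__(self, threads):
        self.ready = deque(threads)    # runnable user-level threads
        self.running = {}              # processor id -> thread, or None if idle
        self.blocked = set()           # threads blocked inside the kernel
        self.downcalls = []            # messages sent back to the kernel
        self.pending_request = False   # already asked the kernel for processors?

    def _dispatch(self, cpu):
        # Choose the next thread purely at user level (LIFO order here).
        self.running[cpu] = self.ready.pop() if self.ready else None
        if self.running[cpu] is None:
            self.downcalls.append(("processor_is_idle", cpu))

    def _notify_kernel(self):
        # Downcalls are rare: ask for processors only if runnable work is
        # waiting and no request is already outstanding.
        if self.ready and not self.pending_request:
            self.downcalls.append(("add_more_processors", len(self.ready)))
            self.pending_request = True

    # ---- upcalls delivered by the kernel ----

    def processor_added(self, cpu):
        self.pending_request = False
        self._dispatch(cpu)
        self._notify_kernel()

    def processor_preempted(self, cpu):
        # The kernel reclaimed `cpu`; the thread that was running there
        # is still runnable, so it goes back on the ready list.
        thread = self.running.pop(cpu)
        if thread is not None:
            self.ready.append(thread)
        self._notify_kernel()

    def activation_blocked(self, cpu):
        # The thread on `cpu` blocked in the kernel (e.g. for I/O). The
        # kernel gives us a fresh activation on the same processor, so we
        # can run another ready thread instead of losing the processor.
        self.blocked.add(self.running[cpu])
        self._dispatch(cpu)

    def activation_unblocked(self, thread):
        # The blocked thread's I/O completed; it is runnable again.
        self.blocked.discard(thread)
        self.ready.append(thread)
        self._notify_kernel()


s = UserLevelScheduler(["a", "b", "c"])
s.processor_added(0)         # runs "c"; asks the kernel for 2 more processors
s.activation_blocked(0)      # "c" blocks on I/O; "b" runs on processor 0
s.activation_unblocked("c")  # "c" is ready again; request already outstanding
s.processor_preempted(0)     # kernel takes processor 0; "b" is requeued
print(s.running, list(s.ready), s.downcalls)
# {} ['a', 'c', 'b'] [('add_more_processors', 2)]
```

Note that the kernel never chooses a user-level thread: it only adds or
removes processors and reports blocking events, and every scheduling
decision stays in user space.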
The paper also discusses the problem of a user-level thread being
preempted or blocked while it is inside a critical section (i.e.,
holding a lock). This can cause poor performance and even deadlock. The
proposed solution is to let the thread continue running, via a
user-level context switch, until it exits the critical section and
releases the lock.
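The recovery rule above can be modeled with a minimal Python sketch.
Threads are simulated as lists of steps ("acquire", "release", or
ordinary work), and the upcall handler keeps running a stopped thread
until it holds no locks before putting it back on the ready list.
SimThread and handle_stopped_thread are hypothetical names; a real
implementation resumes machine state, not a step list.

```python
class SimThread:
    """Hypothetical user-level thread, simulated as a list of steps.

    Steps are "acquire", "release", or any other string for ordinary work.
    `locks_held` counts locks currently held by the thread.
    """

    def __init__(self, name, steps):
        self.name = name
        self.locks_held = 0
        self.trace = []
        self._steps = iter(steps)

    def step(self):
        """Run one step; return False once the thread has no steps left."""
        op = next(self._steps, None)
        if op is None:
            return False
        if op == "acquire":
            self.locks_held += 1
        elif op == "release":
            self.locks_held -= 1
        self.trace.append(op)
        return True

    def in_critical_section(self):
        return self.locks_held > 0


def handle_stopped_thread(thread, ready_list):
    """Upcall handler for a thread the kernel preempted or that just unblocked.

    If the thread stopped inside a critical section, keep running it (a
    user-level context switch into it) until it releases its locks, so
    other threads never wait on a lock whose holder is not running.
    """
    while thread.in_critical_section():
        if not thread.step():
            return  # thread ended while holding a lock; nothing to requeue
    ready_list.append(thread)


t = SimThread("t1", ["acquire", "work", "work", "release", "work"])
t.step()
t.step()  # kernel preempts t1 here, while it holds the lock
ready = []
handle_stopped_thread(t, ready)
print(t.trace, t.locks_held, [th.name for th in ready])
# ['acquire', 'work', 'work', 'release'] 0 ['t1']
```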
The implementation modified the Topaz kernel to support user-level
thread management, and modified the FastThreads user-level thread
package to exploit the kernel support.
The modifications were:
Topaz modifications:
- make upcalls where the kernel formerly blocked, resumed, or preempted
threads
- explicit allocation of processors to processes
FastThreads modifications:
- process upcalls
- resume interrupted critical sections
- provide Topaz with information about its processor allocation
Performance optimizations were also made: a no-cost check for whether a
preempted thread was in a critical section, and reuse of pre-allocated
and discarded scheduler activations.
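As I recall from the paper (the review does not spell it out), the
"no cost" check works by making an exact copy of each low-level critical
section: lock acquire and release set no flags, and only on a preemption
upcall is the thread's program counter compared against the
critical-section address ranges. If it falls inside one, the thread
continues at the same offset in the copy, which ends by yielding back to
the upcall handler. The Python sketch below models only that address
lookup; CriticalSectionTable and its (start, end, copy_start) layout are
hypothetical.

```python
import bisect


class CriticalSectionTable:
    """Hypothetical table mapping critical-section code to its duplicate.

    Each entry is (start, end, copy_start): original code occupies
    [start, end) and an exact copy begins at copy_start. Lock acquire and
    release do no extra bookkeeping; the only cost is this lookup, paid in
    the rare preemption upcall.
    """

    def __init__(self, sections):
        # Sort by start address (ranges must not overlap) for binary search.
        self._sections = sorted(sections)
        self._starts = [start for start, _, _ in self._sections]

    def resume_pc(self, pc):
        """Return where to continue a thread stopped at `pc`.

        If `pc` lies inside a critical section, the result is the matching
        address in the copy (whose end would yield back to the upcall
        handler); otherwise None, meaning the thread holds no such lock.
        """
        i = bisect.bisect_right(self._starts, pc) - 1
        if i >= 0:
            start, end, copy_start = self._sections[i]
            if start <= pc < end:
                return copy_start + (pc - start)
        return None


table = CriticalSectionTable([(0x200, 0x220, 0x980), (0x100, 0x140, 0x900)])
print(hex(table.resume_pc(0x110)))  # 0x910
print(table.resume_pc(0x150))       # None
```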
I liked how even the small performance degradation (compared to
unmodified FastThreads) was carefully explained. Upcall performance was
worse than the authors expected; they attributed this to not using
hand-tuned assembly and to reusing existing kernel code rather than
starting from scratch.
This archive was generated by hypermail 2.1.6 : Mon Jan 26 2004 - 17:38:41 PST