From: David Coleman (dcoleman_at_cs.washington.edu)
Date: Mon Jan 19 2004 - 21:33:01 PST
Resource management has traditionally been considered a basic kernel
function and has resided in the kernel. There are two basic reasons for
this design: the kernel is a protected environment, which allows it to
enforce security policy, and the kernel provides a fair and balanced
approach to multiplexing resources, which in theory prevents individual
applications from hogging them. Unfortunately, this design does not allow for
application-specific optimization of resources. An exokernel (Xok)
system is one in which resources are protected by the kernel but managed
by the applications using them. The kernel provides the protection
system to ensure that unintentional sharing or corruption does not take
place but leaves the management of resources up to the individual
applications. An operating system kernel typically also provides
abstractions, such as files and virtual memory, that free applications
from having to implement them. Unfortunately, in current operating system
designs, an application that could provide this functionality better
itself has no way to do so. So that applications need not implement
operating system abstractions they do not care to customize, libraries
(LibOSes) can be built to provide the relevant abstractions. The paper
presents one such library, ExOS, which provides the UNIX operating system
abstractions.
Protected sharing is implemented via four mechanisms: software regions
(analogous to segments in earlier readings), hierarchically named
capabilities, wakeup predicates, and critical sections. The capabilities are
more like UNIX protection mechanisms than traditional capabilities.
Wakeup predicates are interesting in that they are written in a simple
Xok-specific language, downloaded into the kernel, and compiled there.
Thus the kernel can, in theory, ensure that they are safe to
run. Critical sections are used instead of locks because locks
require mutual effort. I’m not sure how much thought was given to
multiprocessor issues when critical sections were chosen over locks.
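To make the wakeup predicate idea concrete, here is a minimal sketch in Python. It is not the Xok predicate language, whose details I have not reproduced; the term format and the names ALLOWED_OPS, MAX_TERMS, check_predicate, and should_wake are all hypothetical. The point is only that a predicate restricted to a small, checkable form can be validated once when it is downloaded and then evaluated cheaply by the scheduler.

```python
import operator

# Hypothetical restricted form: a predicate is a conjunction of
# (address, op, constant) terms. This is an assumption for illustration,
# not the actual Xok language.
ALLOWED_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}
MAX_TERMS = 8  # assumed bound so that evaluation time stays predictable


def check_predicate(terms, readable):
    """Download-time check: reject a predicate that is too long, uses an
    unknown operator, or reads an address the process may not read."""
    if len(terms) > MAX_TERMS:
        return False
    for addr, op, const in terms:
        if op not in ALLOWED_OPS or addr not in readable:
            return False
        if not isinstance(const, int):
            return False
    return True


def should_wake(terms, memory):
    """Scheduler-time check: true when every term holds for current memory."""
    return all(ALLOWED_OPS[op](memory[addr], const) for addr, op, const in terms)


# Usage: sleep until the word at 0x1000 becomes nonzero.
memory = {0x1000: 0}
predicate = [(0x1000, "!=", 0)]
if check_predicate(predicate, readable={0x1000}):
    asleep = not should_wake(predicate, memory)
```

Because the checked form has no loops and a bounded number of terms, the kernel in this sketch never has to trust the process that supplied the predicate.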
The disk subsystem, XN, is described in some detail. It appears that
basic disk block access is a two-step process: bind a disk block to a
memory page via a Buffer Cache Registry and then map that page into your
address space. I/O to that block is then handled via reads/writes to the
page, in effect making all disk access memory-mapped. I thought the
concept of template-based descriptions of metadata in place of a trusted
kernel file system was an interesting approach to providing protection
of disk blocks. However, it requires (I believe) that the entire tree
above the disk block be in-memory and previously parsed. Also, I felt
the Multics rule of checking rights on every access was more secure
(albeit significantly slower) than only checking on binding. The rules
for preventing file system corruption are good rules to always use when
building a file system. However, these rules seem exceedingly
difficult to implement in practice. The tainted block approach
seems to require that virtually the entire directory structure be in
memory in order to determine whether a block is tainted. Because there
seems to be a block-level reference count, it is possible that this is
used to determine tainted blocks. If so, that would significantly simplify
identifying tainted blocks.
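The two-step access path described above (bind, then map) can be sketched roughly as follows. This is a toy model in Python, not XN's interface: the class BufferCacheRegistry, its bind/map/flush methods, and the single-owner rights model are all assumptions made for illustration. It also shows the concern raised above, that rights are checked when the block is bound and not on each later access.

```python
class BufferCacheRegistry:
    """Toy registry mapping disk block numbers to in-memory pages
    (hypothetical; names and rights model are assumptions)."""

    def __init__(self, disk, block_owner):
        self.disk = disk                # block number -> bytes (stand-in for the disk)
        self.block_owner = block_owner  # block number -> owning principal
        self.pages = {}                 # block number -> bytearray page

    def bind(self, principal, block):
        """Step 1: check rights once, then bind the block to a memory page."""
        if self.block_owner.get(block) != principal:
            raise PermissionError(f"{principal} may not bind block {block}")
        if block not in self.pages:
            self.pages[block] = bytearray(self.disk[block])
        return block  # handle used for the later map and flush calls

    def map(self, handle):
        """Step 2: map the bound page. No further rights check happens here."""
        return self.pages[handle]

    def flush(self, handle):
        """Write the page contents back to the stand-in disk."""
        self.disk[handle] = bytes(self.pages[handle])


# Usage: all I/O to block 7 is done as reads and writes on the mapped page.
disk = {7: b"old!"}
registry = BufferCacheRegistry(disk, block_owner={7: "alice"})
handle = registry.bind("alice", 7)
page = registry.map(handle)
page[0:3] = b"new"
registry.flush(handle)
```

In this sketch, anything holding the mapped page can keep writing to it after bind returns, which is the trade-off against a check-on-every-access rule.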
The concept of a LibOS is also an interesting one. The burden it removes
from the traditional OSes is protecting multiple processes from one
another. Because the OS is basically statically linked, it is really
only supporting a single process. Because of that, the abstractions it
provides can be much more efficient. In reality, the LibOSes are
dynamically linked, but by putting the management of the abstractions in
the private process address space, each process appears to be the only
process on the machine. This turned out to be a significant performance
gain.
Physical resource sharing becomes much more difficult on devices that
require parameter settings prior to use. Examples of these types of
devices include serial ports, sound cards (recording/input),
optical/magneto-optical drives, most other removable media devices, and
others. These types of devices generally require exclusive access and
thus cannot be shared.
A common optimization in file systems is to have separate data areas and
metadata areas, or, more generally, to optimize the layout of files on
disk. Because the layout can no longer be guaranteed with multiple file
systems operating on the same physical disk (the file system can no
longer assume it has a large fairly contiguous extent of blocks to work
with), I suspect pre-allocation of blocks would take place. This would
be a de-facto partitioning of the disk.
While I was skeptical of the concept at the beginning of the paper, I
could see some advantages for limited classes of resources. In fact, I
can think of an extension we implemented in our kernel file system
driver that would have been better managed by an
application. Unfortunately, it wouldn’t have fully worked for a variety
of reasons, but the concept might have been able to at least improve our
implementation. The performance gains described were significant.
However, performance often drops, sometimes by a large percentage, when
a complete implementation is produced. As such, final judgment on the
performance of the Xok system must be withheld until a more complete
product is produced.