From: Chuck Reeves (creeves_at_windows.microsoft.com)
Date: Wed Jan 21 2004 - 15:17:44 PST
The paper "Improving the Reliability of Commodity Operating Systems"
was submitted to SOSP in October of 2003. It was written by a number of
researchers from the University of Washington. Its content describes
the design and function of a reliability system called Nooks. Nooks was
designed to improve the reliability of an OS by isolating the execution
of device drivers into a sphere of execution that protects the kernel.
The approach establishes backward compatibility with the large base of
existing device drivers as a requirement right up front, making this
pertinent for consideration in commercial technology. The general
approach is to isolate the execution of extensions (drivers), detect
improper behavior before it has a chance to impact the integrity of the
kernel and then take configured corrective action to restore the
extension to a functioning state. The system is designed to detect and
correct coding mistakes that exist in extensions. It is not designed to
prevent malicious action on the part of extensions.
To accomplish all this, the designers constructed a library of wrapper
stubs to proxy the interaction between the kernel and extensions. These
wrappers provide binary-compatible interfaces for kernel functions and
extension callbacks. When called, they validate and audit the kernel
object references and memory references being passed, and marshal them
between the kernel and the extension. If an error is detected, they
initiate the configured recovery action for the violating extension.
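
A minimal sketch of the wrapper idea described above, written in C. It
is not taken from the Nooks source: the names (struct domain,
current_domain, kernel_copy, wrapped_copy, recover) are hypothetical,
and a simple bounds check stands in for whatever validation the real
wrappers perform. The wrapper keeps the same signature as the function
it proxies, checks the extension-supplied pointer against the
extension's memory range, and starts the recovery action instead of
calling into the kernel when the check fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical protection domain: the memory an extension may hand out. */
struct domain {
    uint8_t *base;
    size_t   len;
    int      faulted;   /* set once a wrapper has rejected a call */
};

/* Domain of the extension currently executing (assumed to be set by the
 * isolation layer before the extension runs). */
static struct domain *current_domain;

/* Stand-in for a kernel function that trusts its caller's pointers. */
static long kernel_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    memcpy(dst, src, n);
    return (long)n;
}

/* Stand-in for the configured recovery action (restart, unload, ...). */
static void recover(struct domain *d)
{
    d->faulted = 1;
}

/* True if [p, p + n) lies entirely inside the domain's memory. */
static int in_domain(const struct domain *d, const uint8_t *p, size_t n)
{
    uintptr_t lo = (uintptr_t)d->base;
    uintptr_t a  = (uintptr_t)p;
    return n <= d->len && a >= lo && a - lo <= d->len - n;
}

/* Wrapper stub: same signature as kernel_copy, so callers need no change.
 * Returns -1 and starts recovery if the source range is not valid. */
static long wrapped_copy(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(current_domain != NULL);
    if (!in_domain(current_domain, src, n)) {
        recover(current_domain);
        return -1;
    }
    return kernel_copy(dst, src, n);
}
```

In this sketch an extension would be linked against wrapped_copy in
place of kernel_copy. A call whose source range falls outside the
domain never reaches the kernel function and leaves the domain marked
as faulted.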
Recovery of resources associated with failed extensions is aided through
the maintenance of an object-tracking table. Populated by the wrapper
functions, this facility tracks the list of kernel resources in use by
each extension and the corresponding copies in the extension's protection
domain. The recovery facility does require that many of the kernel
resources (disk, file system) behave properly.
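
The object-tracking idea can be sketched as follows, again in C and
again with hypothetical names (struct tracked, track, untrack,
release_all, mark_released). The fixed-size table and the per-entry
release callback are assumptions made to keep the example short, not a
description of the actual Nooks data structure. Wrappers would call
track() when they hand a kernel resource to an extension, and the
recovery path would call release_all() for the failed extension.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 32

typedef void (*release_fn)(void *kernel_obj);

/* One entry per kernel resource currently held by an extension. */
struct tracked {
    int        ext_id;      /* owning extension; 0 marks a free slot */
    void      *kernel_obj;  /* the kernel's object */
    void      *ext_copy;    /* copy in the extension's domain, if any */
    release_fn release;     /* how to give the resource back */
};

static struct tracked table[MAX_TRACKED];

/* Record that ext_id now holds kernel_obj. Returns -1 if the table is full. */
static int track(int ext_id, void *kernel_obj, void *ext_copy,
                 release_fn release)
{
    assert(ext_id != 0 && release != NULL);
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (table[i].ext_id == 0) {
            table[i] = (struct tracked){ ext_id, kernel_obj, ext_copy,
                                         release };
            return 0;
        }
    }
    return -1;
}

/* Forget an object the extension released normally. Returns -1 if absent. */
static int untrack(int ext_id, void *kernel_obj)
{
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (table[i].ext_id == ext_id && table[i].kernel_obj == kernel_obj) {
            table[i].ext_id = 0;
            return 0;
        }
    }
    return -1;
}

/* Recovery: release everything still held by a failed extension.
 * Returns the number of resources released. */
static int release_all(int ext_id)
{
    int released = 0;
    for (size_t i = 0; i < MAX_TRACKED; i++) {
        if (table[i].ext_id == ext_id) {
            table[i].release(table[i].kernel_obj);
            table[i].ext_id = 0;
            released++;
        }
    }
    return released;
}

/* Example release callback: treats the object as an int counter of
 * how many times it has been released. */
static void mark_released(void *kernel_obj)
{
    (*(int *)kernel_obj)++;
}
```

Because every entry is keyed by its owning extension, recovery touches
only the resources of the extension that failed, and anything the
extension already released through untrack() is left alone.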
The test results presented in this paper indicate that this approach is
quite effective in detecting spurious driver faults and avoiding system
crashes. The overall performance measurements indicated mixed results.
The overhead of the wrapper functions and extension procedure calls
executing in the kernel had a significant impact on the performance of
both compilation and the kernel-level web server.
The effectiveness of this effort in reducing system crashes is
remarkable. There must certainly be something here to help avoid the
dreaded BSOD. Additionally, the fact that most of the work was
accomplished in 18 man-months and was mostly automated makes this effort
notable.