Improving the Reliability of Commodity Operating Systems Review

From: Chuck Reeves (creeves_at_windows.microsoft.com)
Date: Wed Jan 21 2004 - 15:17:44 PST


    The paper "Improving the Reliability of Commodity Operating Systems"
    was submitted to SOSP in October 2003. It was written by a number of
    researchers from the University of Washington. Its content describes
    the design and function of a reliability subsystem called Nooks. Nooks
    was designed to improve the reliability of an OS by isolating the
    execution of device drivers within a protection domain that shields the
    kernel. The approach makes backward compatibility with the large base
    of existing device drivers a requirement right up front, which makes
    it pertinent for consideration in commercial technology. The general
    approach is to isolate the execution of extensions (drivers), detect
    improper behavior before it has a chance to compromise the integrity of
    the kernel, and then take the configured corrective action to restore
    the extension to a functioning state. The system is designed to detect
    and correct coding mistakes in extensions; it is not designed to
    prevent malicious action on the part of extensions.

    To accomplish all this, the designers constructed a library of wrapper
    stubs that proxy the interaction between the kernel and extensions.
    These wrappers provide binary-compatible interfaces for kernel
    functions and extension callbacks. When called, they validate their
    arguments and perform audited marshalling of references to kernel
    objects and memory. If an error is detected, they initiate the
    configured recovery action for the offending extension.
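    To make the wrapper idea concrete, here is a minimal user-space sketch
    in Python. It is not Nooks code (Nooks wraps C kernel interfaces), and
    every name in it (KERNEL_OBJECTS, AUDIT_LOG, Extension, recover,
    wrap_kernel_call, call_extension) is hypothetical. It assumes that
    "validation" means rejecting references the extension was never handed,
    that "marshalling" means giving the extension a private copy and copying
    back only fields it is allowed to write, and that the configured
    recovery action is simply to disable the extension.

```python
import copy
import functools
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List, Set, Tuple


class ExtensionFault(Exception):
    """Raised when a wrapper stub detects misbehavior by an extension."""


@dataclass
class Extension:
    name: str
    # Kernel objects this extension has legitimately been handed.
    granted: Set[str] = field(default_factory=set)
    failed: bool = False


# Hypothetical kernel state: object id -> mutable fields.
KERNEL_OBJECTS: Dict[str, Dict[str, Any]] = {}
# Audit trail of every kernel/extension crossing: (extension, operation, detail).
AUDIT_LOG: List[Tuple[str, str, str]] = []


def recover(ext: Extension, reason: str) -> None:
    """Stand-in for the configured recovery action: log and disable the extension."""
    AUDIT_LOG.append((ext.name, "fault", reason))
    ext.failed = True


def wrap_kernel_call(kernel_fn: Callable[..., Any]) -> Callable[..., Any]:
    """Extension -> kernel: reject references the extension was never given."""
    @functools.wraps(kernel_fn)
    def wrapper(ext: Extension, obj_id: str, *args: Any) -> Any:
        if ext.failed:
            raise ExtensionFault(f"{ext.name} is disabled")
        if obj_id not in KERNEL_OBJECTS or obj_id not in ext.granted:
            recover(ext, f"bad reference {obj_id!r}")
            raise ExtensionFault(f"{ext.name} passed bad reference {obj_id!r}")
        AUDIT_LOG.append((ext.name, kernel_fn.__name__, obj_id))
        return kernel_fn(KERNEL_OBJECTS[obj_id], *args)
    return wrapper


def call_extension(ext: Extension, callback: Callable[[Dict[str, Any]], None],
                   obj_id: str, writable: Iterable[str]) -> None:
    """Kernel -> extension: run the callback on a private copy of the object,
    then copy back only the fields the extension is allowed to modify."""
    if ext.failed:
        raise ExtensionFault(f"{ext.name} is disabled")
    ext.granted.add(obj_id)
    private = copy.deepcopy(KERNEL_OBJECTS[obj_id])
    AUDIT_LOG.append((ext.name, callback.__name__, obj_id))
    try:
        callback(private)
    except Exception as exc:  # a crash inside the extension is contained here
        recover(ext, repr(exc))
        raise ExtensionFault(f"{ext.name} crashed in {callback.__name__}") from exc
    for key in writable:
        if key in private:
            KERNEL_OBJECTS[obj_id][key] = private[key]
```

    For example, if a hypothetical driver's open callback sets both "mtu"
    and "refcount" on its copy of a device object, but only "mtu" is listed
    as writable, the kernel's "refcount" is left untouched; a later call
    through a wrapped kernel function naming an object the driver never
    received marks the driver as failed instead of corrupting kernel state.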

    Recovery of resources associated with failed extensions is aided by
    an object-tracking table. Populated by the wrapper functions, this
    table records the kernel resources in use by each extension and the
    corresponding copies in the extension's protection domain. The
    recovery facility does, however, require that many kernel services
    (such as the disk and file system) still behave properly.
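    A per-extension tracking table of this kind can be sketched as follows.
    This is an illustrative Python model, not the Nooks data structure: the
    names ObjectTracker, TrackedObject, track, untrack, held_by and reclaim
    are hypothetical, and releasing resources newest-first during recovery
    is an assumption. Like the real facility, it presumes that each release
    routine (i.e. the underlying kernel service) still works correctly.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class TrackedObject:
    kernel_ref: str                   # identifier of the kernel resource
    ext_copy: Optional[dict]          # the extension's private copy, if any
    release: Callable[[str], None]    # how the kernel frees this resource


class ObjectTracker:
    """Per-extension record of kernel resources, populated by wrapper stubs."""

    def __init__(self) -> None:
        # extension name -> {kernel_ref: TrackedObject}, in allocation order
        self._table: Dict[str, Dict[str, TrackedObject]] = defaultdict(dict)

    def track(self, ext: str, kernel_ref: str,
              release: Callable[[str], None],
              ext_copy: Optional[dict] = None) -> None:
        """Called by a wrapper when it hands a kernel resource to an extension."""
        self._table[ext][kernel_ref] = TrackedObject(kernel_ref, ext_copy, release)

    def untrack(self, ext: str, kernel_ref: str) -> None:
        """Called by a wrapper when the extension frees the resource normally."""
        self._table.get(ext, {}).pop(kernel_ref, None)

    def held_by(self, ext: str) -> List[str]:
        """Resources the extension currently holds, sorted for stable output."""
        return sorted(self._table.get(ext, {}))

    def reclaim(self, ext: str) -> List[str]:
        """On failure, release everything the extension still holds, newest first."""
        entries = self._table.pop(ext, {})
        released = []
        for ref in reversed(list(entries)):
            entries[ref].release(ref)
            released.append(ref)
        return released
```

    Because the wrappers record every allocation as it crosses the
    boundary, recovery does not need any cooperation from the failed
    driver: the kernel walks the table and frees whatever is left.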

    The test results presented in this paper indicate that this approach is
    quite effective at detecting spurious driver faults and avoiding system
    crashes. The performance measurements were mixed: the overhead of the
    wrapper functions and of the extension procedure calls (XPCs) executing
    in the kernel had a significant impact on the performance of both the
    compilation benchmark and the kernel-level web server.

     

    The effectiveness of this effort in reducing system crashes is
    remarkable. There must certainly be something here to help avoid the
    dreaded BSOD. Additionally, the fact that most of the work was
    accomplished in 18 man-months, and was largely automated, makes this
    effort notable.

     


