Review: Improving the Reliability of Commodity Operating Systems.

From: Sellakumaran (ksella_at_hotmail.com)
Date: Wed Jan 21 2004 - 22:41:35 PST

  • Next message: Gang Zhao: "On behalf of David Winkler --Review: Improving the Reliability of Commodity Operating Systems"

    This paper describes Nooks, a reliability subsystem that attempts to
    improve the reliability of Linux by isolating the OS from failures in its
    extensions. The approach is simple and effective, and the paper is written
    in simple, readable language. It quickly catches the reader's attention by
    stating some interesting facts about Windows XP and Linux and by setting
    out a practical approach and goal for the problem. In today's popular
    OSes, the whole system crashes if there is a fault in an OS extension.
    Nooks isolates such failures by running each extension in a lightweight
    kernel protection domain.

     

    The Nooks architecture is based on two core principles:

    1) Design for fault resistance, not fault tolerance

    2) Design for mistakes, not abuse

    With these principles, the Nooks architecture has the following goals:

    1) Isolation (of failures)

    2) Recovery

    3) Backward Compatibility

     

    These goals are achieved by creating a reliability layer that separates
    kernel extensions from the kernel.

    Isolation is achieved by

    1) Lightweight protection domains

    2) Extension Procedure Calls (XPC)

    3) Copy-in/Copy-out

    4) Wrappers
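
    To make the copy-in/copy-out and wrapper ideas concrete, here is a
    minimal user-space sketch in Python. It is only an analogy and not code
    from the paper: Nooks works inside the Linux kernel, and the names
    xpc_call and ExtensionFault and the two toy drivers are my own
    assumptions. The extension works on a private copy of a kernel object,
    and its changes are copied back only if the call returns normally.

```python
import copy


class ExtensionFault(Exception):
    """Raised when the isolated extension fails (hypothetical name)."""


def xpc_call(extension_fn, kernel_obj):
    """Call an extension on a private copy of a kernel object.

    Copy-in: the extension only ever sees a deep copy.
    Copy-out: changes are written back only if the call succeeds.
    """
    shadow = copy.deepcopy(kernel_obj)       # copy-in
    try:
        result = extension_fn(shadow)
    except Exception as exc:                 # the fault stops at the wrapper
        raise ExtensionFault(str(exc)) from exc
    kernel_obj.clear()
    kernel_obj.update(shadow)                # copy-out
    return result


def good_driver(dev):
    """Toy extension that updates the object and returns normally."""
    dev["packets"] += 1
    return "ok"


def buggy_driver(dev):
    """Toy extension that corrupts its private copy and then faults."""
    dev["packets"] = -999
    raise RuntimeError("simulated bad pointer")


device = {"name": "eth0", "packets": 0}
status = xpc_call(good_driver, device)

try:
    xpc_call(buggy_driver, device)
    survived = False
except ExtensionFault:
    survived = True                          # the kernel object is untouched
```

    After both calls the device still reports one packet, because the buggy
    driver only ever damaged its own copy.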

     

    Recovery in Nooks consists of two parts. After a fault occurs, the recovery
    manager releases the resources in use by the extension, and the user-mode
    agent coordinates recovery and determines what course of action to take.
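
    The two-part recovery described above can be sketched roughly as
    follows. This is a toy model under my own assumptions, not the paper's
    implementation: the class RecoveryManager, the function user_mode_agent,
    the restart limit and the resource names are all hypothetical.

```python
class RecoveryManager:
    """Toy recovery manager: remembers what each extension holds."""

    def __init__(self):
        self.resources = {}    # extension name -> list of held resources
        self.released = []     # log of everything released so far

    def track(self, ext, resource):
        self.resources.setdefault(ext, []).append(resource)

    def release_all(self, ext):
        # Part 1: give back everything the failed extension was holding.
        for res in self.resources.pop(ext, []):
            self.released.append(res)


def user_mode_agent(ext, fault_count, max_restarts=3):
    """Part 2: a policy decision taken outside the kernel (assumed policy)."""
    return "restart" if fault_count <= max_restarts else "disable"


def handle_fault(manager, ext, fault_count):
    manager.release_all(ext)
    return user_mode_agent(ext, fault_count)


manager = RecoveryManager()
manager.track("e1000", "irq 11")
manager.track("e1000", "dma buffer")

first_action = handle_fault(manager, "e1000", fault_count=1)
later_action = handle_fault(manager, "e1000", fault_count=4)
```

    Keeping the policy in a separate function mirrors the split between the
    in-kernel recovery manager and the user-mode agent.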

    Nooks achieves transparency for extensions by

    1) Wrapper stubs for every function call in the extension-kernel
    interface

    2) Object-tracking code for every object type that passes between the
    extension and the kernel.
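
    As a rough illustration of how wrapper stubs and object tracking keep
    the extension code unchanged, consider the sketch below. It assumes
    that the tracker keeps one extension-side copy per kernel object. The
    names ObjectTracker, wrapper_stub and ext_open are hypothetical and are
    not taken from Nooks.

```python
import copy
import functools


class ObjectTracker:
    """Maps each kernel object to its extension-side copy (assumed design)."""

    def __init__(self):
        self._table = {}   # id(kernel object) -> extension-side copy

    def to_extension(self, kernel_obj):
        key = id(kernel_obj)
        if key not in self._table:
            self._table[key] = copy.deepcopy(kernel_obj)
        return self._table[key]

    def tracked(self):
        return len(self._table)


tracker = ObjectTracker()


def wrapper_stub(fn):
    """Stub placed on an interface function; the body of fn is unchanged."""
    @functools.wraps(fn)
    def stub(kernel_obj):
        return fn(tracker.to_extension(kernel_obj))
    return stub


@wrapper_stub
def ext_open(dev):
    # Extension code, written as if it received the kernel object directly.
    dev["open_count"] += 1
    return dev["open_count"]


netdev = {"name": "eth0", "open_count": 0}
first = ext_open(netdev)
second = ext_open(netdev)
```

    Both calls reach the same tracked copy, so the extension sees its own
    earlier update while the kernel's object is left as it was.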

     

     

    The paper then describes the test methodology used to demonstrate the
    system's reliability, which is excellent: Nooks recovers automatically
    from 99% of the faults caused by modules in Linux. The experiments use
    synthetic fault injection to rapidly insert faults into Linux kernel
    extensions.
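
    The shape of such a fault-injection experiment can be sketched as
    below. This is a toy harness under my own assumptions (the function
    names, the fault rate and the trial count are made up), and its counts
    say nothing about the paper's measured results. The same injected
    faults are replayed against a run without isolation and a run with it.

```python
import random


def extension(value, inject_fault):
    """Toy extension; it fails whenever a fault has been injected."""
    if inject_fault:
        raise RuntimeError("injected fault")
    return value * 2


def run_native(value, inject_fault):
    """No isolation: the fault propagates and 'crashes' the system."""
    return extension(value, inject_fault)


def run_isolated(value, inject_fault):
    """Isolation: the fault is caught and reported as a recovery."""
    try:
        return extension(value, inject_fault)
    except Exception:
        return "recovered"


def campaign(runner, trials, fault_rate, seed):
    """Run the trials with a seeded RNG so both campaigns see equal faults."""
    rng = random.Random(seed)
    outcome = {"ok": 0, "recovered": 0, "crashed": 0}
    for i in range(trials):
        inject = rng.random() < fault_rate
        try:
            result = runner(i, inject)
        except Exception:
            outcome["crashed"] += 1
            continue
        outcome["recovered" if result == "recovered" else "ok"] += 1
    return outcome


native = campaign(run_native, trials=200, fault_rate=0.25, seed=1)
isolated = campaign(run_isolated, trials=200, fault_rate=0.25, seed=1)
```

    Because the seed is shared, every crash in the native run corresponds
    to a recovery in the isolated run, which is the comparison the
    experiments are after.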

    Nooks does carry a performance cost, but considering the increased
    reliability, I think it does a wonderful job.

     

    Nooks takes a practical approach with modest goals and achieves
    excellent reliability.

     

