Review of Reliability of Commodity OS

From: Honghai Liu (liu789_at_hotmail.com)
Date: Mon Jan 19 2004 - 13:54:01 PST

  • Next message: Manish Mittal: "Improving the Reliability of Commodity Operating Systems"

    Reviewer: Honghai Liu

     

     

    One of the most annoying things people experience with their desktop computers is unexpected crashes. These are mainly caused by failures in kernel extensions, such as device drivers. The paper addresses one of the most important issues in today's commodity operating systems: how to improve reliability by isolating extension failures (which would otherwise crash the system) and recovering from them, without sacrificing backward compatibility.

     

    It is surprising to see so many distinct approaches to the issue, including capabilities, microkernels, and language support. However, Nooks, the system the paper presents, is distinctive: it tries to recover from failures while minimizing changes to existing extension code.

     

    Nooks is implemented on Linux, a monolithic system. The architecture inserts an Isolation Manager layer between the OS kernel and the kernel extensions. The Isolation Manager has four components: Isolation (prevents extension errors from damaging the kernel), Interposition (transparently integrates existing extensions into the Nooks environment), Object Tracking (oversees all kernel resources used by extensions), and Recovery (detects and recovers from extension faults).
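
    To make the four roles concrete, here is a minimal user-space sketch in Python. It is only an analogy and not how Nooks itself is built: every name in it (IsolationManager, FlakyDriver, ExtensionFault, alloc, call) is hypothetical, and a caught exception merely stands in for whatever fault-containment mechanism the real system uses inside the kernel.

```python
class ExtensionFault(Exception):
    """Raised by the toy extension to simulate a driver bug (hypothetical)."""


class IsolationManager:
    """Toy stand-in for the layer between kernel and extension."""

    def __init__(self, load_extension):
        self._load = load_extension   # factory used to load and reload the extension
        self._ext = load_extension()
        self.tracked = set()          # object tracking: resources the extension holds
        self.recoveries = 0

    def alloc(self, name):
        """Kernel service offered to the extension; records what it acquires."""
        self.tracked.add(name)
        return name

    def call(self, op, *args):
        """Interposition: every kernel-to-extension call passes through here."""
        try:
            return getattr(self._ext, op)(self, *args)
        except ExtensionFault:
            # Isolation: the fault stops here instead of reaching the caller.
            self._recover()
            return None

    def _recover(self):
        """Recovery: release tracked resources, then reload the extension."""
        self.tracked.clear()
        self._ext = self._load()
        self.recoveries += 1


class FlakyDriver:
    """Hypothetical extension that fails on one particular input."""

    def send(self, kernel, packet):
        kernel.alloc(f"buf:{packet}")
        if packet == "bad":
            raise ExtensionFault("simulated driver bug")
        return len(packet)


def demo():
    mgr = IsolationManager(FlakyDriver)
    ok = mgr.call("send", "hello")        # normal call succeeds
    tracked_before = len(mgr.tracked)     # one buffer is being tracked
    failed = mgr.call("send", "bad")      # fault is contained and recovered
    return ok, tracked_before, failed, len(mgr.tracked), mgr.recoveries
```

    In this sketch the caller of call() keeps running after the faulty request: the tracked buffers are released and a fresh driver instance is loaded, which mirrors the division of labor among the four components described above.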

     

    It is interesting to see that extensions can be classified into two categories: interrupt-oriented and process-oriented. In Linux, errors in interrupt-oriented code are treated as fatal and lead to crashes. Errors in process-oriented code (initiated by system calls) are not regarded as fatal. Results show that Nooks can substantially reduce crashes caused by errors in interrupt-oriented extensions, as well as recover from errors and unstable states caused by process-oriented extensions. Specifically, Nooks can automatically recover from 99% of the faults that caused Linux to crash, with a performance degradation of between 10% and 60%.

     

    The success of Nooks on a monolithic system like Linux suggests a promising future for applying it to other operating systems (e.g., a microkernel, where extensions already have a separate address space and talk to the kernel across that boundary).


    This archive was generated by hypermail 2.1.6 : Mon Jan 19 2004 - 13:54:17 PST