Review of "Improving the Reliability of Commodity Operating Systems"

From: Muench, Joanna (jmuench_at_fhcrc.org)
Date: Wed Jan 21 2004 - 12:30:38 PST

  • Next message: ahemavathy: "Nooks review"

    Swift et al. (2003) present Nooks as a subsystem designed to improve the
    reliability of an OS by isolating driver failures. Given that approximately
    85% of reported failures in Windows XP are due to drivers (with similar
    problems in Linux), it is easy to motivate the need for such a subsystem.
    While research over the past 30 years has developed many new ideas on
    increasing system reliability, few of these ideas have been incorporated in
    the most widely used operating systems, Linux and Windows. Hence it makes
    sense to create a system that will work within a popular OS. For this paper
    the authors have chosen to implement their ideas on the Linux platform.

    The paper presents the essential concept of Nooks, which is to wrap a layer
    around each Linux extension. The wrapping code re-routes control flow
    between the extension and the kernel through the XPC, an asymmetric
    control transfer mechanism. It also manages data transfer between
    the extension and the kernel with an object-tracking mechanism. In addition
    to the wrapping, or interposition, and object tracking, the reliability
    layer also includes isolation mechanisms to prevent damage to the kernel and
    recovery functions for handling extension faults. The authors mention that
    the code to wrap each extension can be generated semi-automatically and
    that the overall implementation time was relatively short.
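
    As a rough illustration of the interposition and object-tracking ideas
    described above, the toy sketch below routes every call into an
    "extension" through a wrapper, records the objects handed to it, and
    contains a fault by disabling the extension and reclaiming those objects.
    It is not the paper's implementation: the names Wrapper, ObjectTracker,
    ExtensionFault and BuggyDriver are hypothetical, a Python exception stands
    in for a hardware-detected fault, and disabling the extension stands in
    for whatever recovery policy a real system would apply.

```python
# Toy illustration only. All names here are hypothetical and do not come
# from the Nooks paper or the Linux kernel.


class ExtensionFault(Exception):
    """Raised to the caller when a wrapped extension call fails."""


class ObjectTracker:
    """Records objects handed to an extension so they can be reclaimed."""

    def __init__(self):
        self.live = set()

    def acquire(self, name):
        self.live.add(name)

    def release_all(self):
        released = sorted(self.live)
        self.live.clear()
        return released


class Wrapper:
    """Interposes on every call from the 'kernel' into one extension."""

    def __init__(self, extension):
        self.extension = extension
        self.tracker = ObjectTracker()
        self.failed = False

    def call(self, func_name, *args):
        if self.failed:
            raise ExtensionFault(f"{func_name}: extension is disabled")
        try:
            # Stand-in for the control transfer into the extension's domain.
            return getattr(self.extension, func_name)(self.tracker, *args)
        except Exception as exc:
            # Contain the fault: disable the extension, reclaim its objects.
            self.failed = True
            released = self.tracker.release_all()
            raise ExtensionFault(
                f"{func_name} failed; released {released}"
            ) from exc


class BuggyDriver:
    """Hypothetical extension with a bug on its read path."""

    def open(self, tracker, dev):
        tracker.acquire(f"buffer:{dev}")
        return "ok"

    def read(self, tracker, dev):
        raise ZeroDivisionError("bug in driver")
```

    With this sketch, Wrapper(BuggyDriver()).call("open", "eth0") succeeds,
    while a later call("read", "eth0") surfaces as an ExtensionFault in the
    caller instead of an unhandled error, and the tracked buffer is released.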

    Extensive testing showed that the Nooks subsystem significantly decreased
    the number of crashes, but at some cost in performance. The authors note
    that this cost must be weighed against how heavily the CPU is utilized,
    although clearly the dependability requirements of the system are another
    factor to be considered.

    I was interested in some of the design decisions discussed in the paper.
    Clearly deciding to protect against bugs but not malicious code was an
    important decision and contributed to the relatively minor performance hit
    of using Nooks. Rather than sniffing out every possible way in which an
    extension could damage a kernel, the authors use only lightweight protection
    domains. While this is satisfactory in the domain the authors have chosen to
    address, it does leave open the issue of malicious damage.

    This paper gives a satisfyingly practical approach to improving reliability
    with a good discussion of the trade-offs involved. While 100% fault
    reduction would probably be prohibitively expensive to implement on a Linux
    system (both in terms of development and performance), the paper presents a
    99% solution. Given the focus on bugs within extensions, I would have liked
    to see some discussion of the implications of introducing bugs into the
    wrapper code, since presumably this code would be generated by the same
    cadre of less experienced extension programmers.



    This archive was generated by hypermail 2.1.6 : Wed Jan 21 2004 - 12:32:05 PST