Review of Nooks paper

From: Praveen Rao (psrao_at_windows.microsoft.com)
Date: Wed Jan 21 2004 - 00:50:07 PST

  • Next message: Ian King: "Review: Swift, Bershad & Levy, Improving the Reliability of Commodity Operating Systems"

    This paper discusses an approach to making commodity operating systems
    more reliable by managing the execution environment of kernel
    extensions, namely drivers. It notes that driver-related bugs are a
    major cause of system failures (85% in the case of XP).

     

    The authors contrast their approach with several other approaches taken
    to address this issue:

    * Hardware-based approaches: protection rings, capability-based
    architectures, etc. The latter require hardware changes (currently
    popular architectures do not use capabilities), and in general these
    approaches don't address recovery.
    * Micro-kernel approach to isolation: a separate address space for
    drivers. This does not address recovery either. In general, perf
    concerns keep commodity OSes from adopting this approach despite
    improvements in IPC.
    * Type-safe languages and runtimes: these haven't found acceptance
    for system code and require starting over.

    In the past, virtual memory techniques have been used to protect
    databases and file systems. Nooks, the system discussed in this paper,
    attempts to extend these techniques to the OS. Nooks takes the approach
    of virtualizing the interface between the kernel and its extensions.

     

    It is designed for

    * Fault resistance, not fault tolerance
    * Protecting against mistakes, not malicious code

    The authors state the following goals for Nooks:

    1. Isolation
    2. Recovery
    3. Backward compatibility

    To achieve these goals, Nooks performs the following functions:

    1. Isolation of extensions: isolation of address space and remoted
    (XPC) calls to the kernel
    2. Interposition: integrating existing extensions into the Nooks
    environment using interposition
    3. Object Tracking: keeping track of the data structures that the
    extension touches
    4. Recovery: from software faults (bad params, excessive resource
    usage) and hardware faults (the processor generates an exception)

    Isolation

    Isolation has two aspects:

    1) memory management - to implement lightweight protection domains with
    virtual memory protection
    2) XPC - to transfer control safely between extension and the kernel
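
    The protection-domain idea can be pictured with a toy user-space model.
    This is only a sketch under stated assumptions, not the paper's
    implementation: the names Domain, Memory and ProtectionFault are
    hypothetical, pages are plain integers, and the model assumes reads are
    unrestricted while writes are checked against the writing domain.

```python
class ProtectionFault(Exception):
    """Raised when a domain writes to a page it does not own (toy model)."""


class Domain:
    # Hypothetical: a protection domain is just a name plus its writable pages.
    def __init__(self, name, writable_pages):
        self.name = name
        self.writable = set(writable_pages)


class Memory:
    def __init__(self):
        self.pages = {}

    def write(self, domain, page, value):
        # Assumption: writes are checked per domain, as page protections would do.
        if page not in domain.writable:
            raise ProtectionFault(f"{domain.name} may not write page {page}")
        self.pages[page] = value

    def read(self, page):
        # Assumption: reads are not restricted in this sketch.
        return self.pages.get(page)


mem = Memory()
kernel = Domain("kernel", {0, 1, 2, 3})   # kernel may write everywhere
extension = Domain("extension", {2, 3})   # extension may write only its own pages

mem.write(kernel, 0, "kernel data")
mem.write(extension, 2, "driver data")

try:
    mem.write(extension, 0, "stray write")
    fault_caught = False
except ProtectionFault:
    fault_caught = True   # the stray write is stopped before it lands
```

    In this model the stray write never reaches page 0, which is the
    property the isolation mechanism is meant to give the kernel.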

    Interposition:

    * of control transfers - loader modification for kernel and
    function pointer fixup for extension
    * and of some data references - shadow copy optimization is used
    for frequently touched data
    * wrappers do the following 3 tasks:

                    1. parameter validation
                    2. provide call-by-value semantics
                    3. facilitate XPC
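
    The three wrapper tasks listed above can be sketched as follows. This
    is a minimal illustration, not the Nooks code: wrap, xpc,
    validate_packet and buggy_extension_send are hypothetical names, the
    "XPC" is an ordinary function call standing in for a real domain
    switch, and call-by-value is approximated with a deep copy.

```python
import copy


class ValidationError(Exception):
    """Raised when a wrapper rejects an argument before the call crosses domains."""


def xpc(target, *args):
    # Hypothetical stand-in: a real XPC would switch protection domains here.
    return target(*args)


def wrap(target, validate):
    """Build a wrapper that performs the three tasks in order."""
    def wrapper(obj):
        validate(obj)                      # 1. parameter validation
        private_copy = copy.deepcopy(obj)  # 2. call-by-value: callee gets a copy
        return xpc(target, private_copy)   # 3. hand the call to the XPC layer
    return wrapper


def validate_packet(pkt):
    length = pkt.get("length")
    if not isinstance(length, int) or length < 0:
        raise ValidationError("bad packet length")


def buggy_extension_send(pkt):
    pkt["length"] = -1   # a bug: the extension scribbles on its argument
    return "sent"


kernel_packet = {"length": 64, "data": b"x" * 64}
send = wrap(buggy_extension_send, validate_packet)
status = send(kernel_packet)
```

    Because the extension only ever sees the copy, kernel_packet still has
    length 64 after the call even though the extension overwrote its own
    argument.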

    Object Tracking:

    * Manages writes to kernel data made by extensions
    * Differentiates between objects used in a single XPC call and
    long-term objects. Most long-term objects follow a predictable pattern:
    the alloc/dealloc calls of the extension are known, and in some cases
    the semantics are known
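
    The split between per-call and long-term objects can be sketched with a
    small tracker. This is an assumed data structure for illustration only;
    ObjectTracker and its method names are hypothetical and objects are
    identified by plain strings.

```python
class ObjectTracker:
    """Toy tracker: per-call objects live on a stack, long-term ones in a table."""

    def __init__(self):
        self.long_term = {}   # extension name -> set of object ids
        self.per_call = []    # one set per XPC currently in flight

    def begin_call(self):
        self.per_call.append(set())

    def note_transient(self, obj_id):
        # Object is only valid for the duration of the current call.
        self.per_call[-1].add(obj_id)

    def end_call(self):
        # Everything noted during the call can be forgotten on return.
        return self.per_call.pop()

    def note_alloc(self, ext, obj_id):
        self.long_term.setdefault(ext, set()).add(obj_id)

    def note_free(self, ext, obj_id):
        self.long_term.get(ext, set()).discard(obj_id)

    def outstanding(self, ext):
        # What would have to be released if this extension failed right now.
        return set(self.long_term.get(ext, set()))


tracker = ObjectTracker()
tracker.note_alloc("demo_driver", "buffer0")
tracker.note_alloc("demo_driver", "timer7")
tracker.note_free("demo_driver", "buffer0")

tracker.begin_call()
tracker.note_transient("arg0")
dropped = tracker.end_call()

leftover = tracker.outstanding("demo_driver")
```

    Here leftover holds only "timer7", the one long-term object the
    extension allocated and has not yet freed, while "arg0" is dropped as
    soon as its call returns.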

    Recovery

    The recovery manager releases resources when a failure occurs and
    restarts the extension according to policy.
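
    A release-then-restart loop of this kind might look like the sketch
    below. It is an assumption-laden illustration, not the paper's recovery
    manager: Extension, recover and restart_up_to are hypothetical names,
    and the policy shown (a restart limit) is just one possible policy.

```python
class Extension:
    # Hypothetical record of a loaded extension and the resources it holds.
    def __init__(self, name):
        self.name = name
        self.resources = set()
        self.restarts = 0
        self.loaded = True


def recover(ext, release, policy):
    """Release everything the failed extension held, then ask the policy
    whether to restart it. Returns True if the extension is loaded again."""
    for res in sorted(ext.resources):
        release(res)
    ext.resources.clear()
    ext.loaded = False
    if policy(ext):
        ext.restarts += 1
        ext.loaded = True
    return ext.loaded


def restart_up_to(limit):
    # Example policy (assumed): allow at most `limit` restarts.
    return lambda ext: ext.restarts < limit


ext = Extension("demo_nic")
ext.resources = {"irq11", "buf0"}
released = []

first = recover(ext, released.append, restart_up_to(1))    # restarted
second = recover(ext, released.append, restart_up_to(1))   # limit reached
```

    With a limit of one, the first failure leads to a restart and the
    second leaves the extension unloaded; in both cases the held resources
    are released first.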

    The paper states that Nooks works well when there is a narrow,
    well-defined interface for interaction, which makes extensions a good
    fit. Extensions also deal with opaque data and often batch calls, which
    further suits the Nooks implementation.

    Nooks is better at dealing with fatal failures (which are easy to
    detect) than with non-fatal failures. The paper argues that in the case
    of non-fatal failures (e.g. deadlock or data corruption) the failure is
    localized anyway. I am not convinced of this. The extension could
    corrupt system structures and hang. Also, a deadlock in an extension
    needs to be recovered from, as it might disable significant
    functionality of the system (e.g. what if my display driver hangs?), in
    which case keeping the system alive may not serve much purpose.

    There is a detailed discussion of the perf impact of Nooks (something
    that system implementers would want to know if they were to consider
    incorporating it). It is argued that most of the perf impact (when there
    is any) comes from TLB flushes caused by frequent XPC calls.

    The paper makes the point that this approach requires no modification to
    the extensions, which is very useful given the number of existing
    extensions.

    I contrast the approach of Nooks with the approach of streamlining
    drivers so that minimal work is done in kernel mode and most logic is
    in user mode. In user mode, process isolation can be provided (which is
    much easier to achieve given the current state of commodity OSes), and
    such an approach would work well for most non-performance-critical
    drivers.

    Though it is argued that Nooks is easy to implement, I feel that it does
    require moderately complex tinkering with the OS.

     



    This archive was generated by hypermail 2.1.6 : Wed Jan 21 2004 - 00:52:22 PST