From: Gail Rahn (grahn_at_cs.washington.edu)
Date: Wed Jan 21 2004 - 15:53:55 PST
Review of Nooks
This paper introduces Nooks, an OS subsystem that enhances reliability by
isolating device drivers from other operating system resources. With Nooks,
a device driver crash no longer jeopardizes the rest of the running
operating system. Nooks assumes that extensions are generally
well-intentioned: they are well-behaved but may fail due to errors in
design and implementation. It therefore does not protect against a
malicious extension that directly addresses kernel memory. At about 20,000
lines of code, Nooks does not introduce extensive complexity into the
operating system.
Device drivers are faulty because they are, generally, written by
third-party developers inexperienced in operating system design. These
developers are good at interfacing with a specific piece of hardware, but
not necessarily experienced at producing software with the reliability
desired of operating-system components. For a solution to effectively handle
and recover from device driver crashes, it must run efficiently and
integrate into existing systems. Nooks is unique because it attempts to
manage OS extension failures in existing systems, using existing drivers
and the existing software development environment (C). Previous solutions force
"newness" somewhere in the cycle - a new architecture, a new programming
language, etc.
The architecture of Nooks isolates the kernel from driver failures. Nooks
implements an isolation manager (NIM) that handles isolation, tracking and
recovery by intercepting all communication between the kernel and its
extensions. Nooks implements the Extension Procedure Call (XPC) to manage
this control transfer. By managing extension/kernel communication much as an
RPC system marshals and transports calls, Nooks is able to capture and track
the calls made between the OS and an extension. Nooks also interposes on
references to kernel data structures.
Because Nooks tracks function calls and data structure references between
the operating system and its extensions, Nooks can attempt to recover
resources in use by an extension at the time of its failure.
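The tracking-and-recovery idea can be illustrated with a toy model. This is
only a sketch in Python, not Nooks code (Nooks itself is written in C inside
the kernel); the class IsolationManager, its methods acquire/xpc/recover, and
the driver name "netdrv" are all hypothetical names chosen for illustration.

```python
class IsolationManager:
    """Toy model of interposition: every call into an extension goes
    through xpc(), and every resource it acquires is recorded."""

    def __init__(self):
        self.resources = {}   # extension name -> list of held resources
        self.released = []    # resources reclaimed during recovery

    def acquire(self, ext, resource):
        # Record ownership so the resource can be reclaimed if the
        # extension later fails.
        self.resources.setdefault(ext, []).append(resource)
        return resource

    def xpc(self, ext, func, *args):
        # Transfer control to the extension; treat any exception as a
        # driver fault and recover instead of propagating the crash.
        try:
            return func(self, *args)
        except Exception:
            self.recover(ext)
            return None

    def recover(self, ext):
        # Release everything the failed extension still holds.
        for resource in self.resources.pop(ext, []):
            self.released.append(resource)


def buggy_driver(nim, size):
    # Hypothetical driver entry point that faults after acquiring resources.
    nim.acquire("netdrv", f"buffer[{size}]")
    nim.acquire("netdrv", "irq 11")
    raise RuntimeError("simulated driver fault")


nim = IsolationManager()
result = nim.xpc("netdrv", buggy_driver, 1500)
```

Because every acquisition passes through the manager, the fault in
buggy_driver leaves nothing behind: both resources end up in nim.released
and the caller simply sees a failed call.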
An interesting part of Nooks' recovery is its set of configuration files,
which determine how to recover from a specific extension failure. Nooks
knows the default recovery policy for an extension, which can include
notifying the system manager. Nooks can also notice when a driver fails too
frequently and escalate the recovery policy, up to and including disabling
the extension.
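One way to picture the escalation behavior is a sliding-window failure
counter. The sketch below is an assumption about how such a policy could be
expressed, not the format of the actual Nooks configuration files; the class
RecoveryPolicy, its thresholds, and the action names "restart",
"restart+notify" and "disable" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryPolicy:
    max_failures: int = 3          # failures tolerated inside the window
    window_seconds: float = 60.0   # length of the sliding window
    failures: list = field(default_factory=list)

    def on_failure(self, now):
        # Keep only failures that fall inside the sliding window.
        self.failures = [t for t in self.failures
                         if now - t < self.window_seconds]
        self.failures.append(now)
        if len(self.failures) > self.max_failures:
            return "disable"          # failing too often: give up on it
        if len(self.failures) > 1:
            return "restart+notify"   # repeated failure: tell the admin
        return "restart"              # first failure: default recovery


policy = RecoveryPolicy()
actions = [policy.on_failure(t) for t in (0, 10, 20, 30)]
later = policy.on_failure(200)
```

With these assumed thresholds, four failures within a minute escalate from a
plain restart to disabling the extension, while an isolated failure much
later falls back to the default action.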
A notable point about this transparent protection mechanism is how widely
its benchmark results vary. Nooks has a different performance footprint for
each protected extension. Using Nooks, the performance of some extensions
dropped by less than 10%. With others, including a kernel-mode webserver,
the performance penalty was almost 60%. That variability implies that the
decision to protect a specific kernel extension using Nooks should be made
on a case-by-case basis and after thorough benchmarking. Couldn't this
performance penalty be a drawback for acceptance? Is there any evidence that
the extensions that are expensive to protect are precisely the set of
extensions that should be managed?
Gail Rahn
grahn_at_cs.washington.edu