From: Ian King (iking_at_killthewabbit.org)
Date: Wed Jan 21 2004 - 09:36:51 PST
This paper describes Nooks, an infrastructure to enhance the stability of an
operating system by protecting the OS core from "extensions", such as device
drivers and file systems. Nooks is designed to supplement an existing operating
system and extensions with minimal or no modification. For this discussion,
Nooks is implemented on the Linux operating system.
The authors cite a well-known problem in the software industry: extensible
operating systems are rendered vulnerable primarily through their extensions.
For instance, the paper claims that 85% of system failures in the Windows
operating system are caused by defective device drivers; this matches my
understanding and experience. The authors observe that writers of device
drivers are not typically as skilled in the programming art as those who write
operating systems; one inference is that adding complexity to the task of
writing device drivers is not likely to improve the resulting quality. Further,
while some have advocated the use of type-safe languages as the solution (and
shown it can be effective), C is the most commonly used programming language for
system-level components; Nooks is targeted at this environment.
Nooks sets itself modest goals: rather than "making everything perfect
everywhere", the goal of Nooks is to provide "better" reliability. This is a
distinct departure from many papers, which seek to attain an ideal goal, and
identify how the resulting work product falls short of that goal. The Nooks
approach is intended to provide protection in the event of simple mistakes, and
makes no effort to guard against intentional acts. This paper is quite pragmatic
in its goals and its approach.
An important element of the tool set employed is the concept of protection
domains, and the implementation is similar to that of Opal; however, the
intentions are more modest, and the feature set is likewise less comprehensive.
Rather than expressing rich sharing semantics through protection domains (as in
Opal), Nooks uses the principle primarily to restrain extensions from abusing
their privileged position in kernel space.
Nooks also employs protective entry points, which are added to the kernel and
the extensions by means of "wrappers." At its simplest, this means that an
extension wishing to communicate with the kernel instead calls a wrapper
function; the wrapper validates the call and passes it to the kernel. This
mechanism is used in both directions of communication. By validating
parameters, such common errors as passing null pointers or dangling references
are caught before they can lead to serious damage to kernel structures.
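
To make the wrapper idea concrete, here is a minimal sketch in C. It is not
code from the paper: kernel_copy_name, wrapped_copy_name and WRAP_EINVAL are
hypothetical names invented for illustration. The sketch only rejects null
pointers and zero-length buffers; catching dangling references would need
additional bookkeeping (for example, a table of known-valid objects), which
is not attempted here.

```c
#include <stddef.h>
#include <string.h>

#define WRAP_EINVAL (-1)   /* hypothetical error code returned by the wrapper */

/* Hypothetical stand-in for a kernel routine; it trusts its arguments. */
int kernel_copy_name(char *dst, size_t dst_len, const char *src)
{
    strncpy(dst, src, dst_len - 1);
    dst[dst_len - 1] = '\0';   /* always terminate, even on truncation */
    return 0;
}

/* Wrapper the extension calls instead of the kernel routine.
 * It validates the arguments first and only then forwards the call. */
int wrapped_copy_name(char *dst, size_t dst_len, const char *src)
{
    if (dst == NULL || src == NULL || dst_len == 0)
        return WRAP_EINVAL;    /* reject before kernel state is touched */
    return kernel_copy_name(dst, dst_len, src);
}
```

An extension that passes a null pointer gets an error code back from
wrapped_copy_name instead of faulting inside the kernel routine. The same
pattern can be applied in the other direction, for calls from the kernel into
the extension.
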
Nooks policy can be decided on a per-extension basis: some extensions
might be safely restarted after a failure (for instance, a serial port driver),
while others may require further action before they can be safely used (imagine
the situation where a driver defect has corrupted a filesystem). One of the
benefits of Nooks' wrapper structure is that the wrapper entry points also
provide an opportunity for tracking of all resources passed across that
boundary; upon failure of a driver, the Nooks infrastructure can use that data
to clean up resources that might otherwise be used in a corrupted state or lost
(leaked).
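
The resource tracking described above might look roughly like the following
sketch. It is an assumption-laden illustration, not the Nooks implementation:
struct tracker, tracked_alloc, tracked_free and tracker_recover are
hypothetical names, and a fixed-size table with malloc/free stands in for
whatever kernel objects a real system would track.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 64     /* arbitrary table size chosen for the sketch */

/* Per-extension record of every object handed across the boundary. */
struct tracker {
    void  *objs[MAX_TRACKED];
    size_t count;
};

/* Allocate on behalf of the extension and remember the object. */
void *tracked_alloc(struct tracker *t, size_t size)
{
    assert(t != NULL);
    if (t->count == MAX_TRACKED)
        return NULL;                    /* table full: refuse the request */
    void *p = malloc(size);
    if (p != NULL)
        t->objs[t->count++] = p;
    return p;
}

/* Free an object only if the tracker knows about it. */
void tracked_free(struct tracker *t, void *p)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        if (t->objs[i] == p) {
            t->objs[i] = t->objs[--t->count];  /* swap-remove the entry */
            free(p);
            return;
        }
    }
    /* Unknown pointer: ignore it rather than free memory we do not own. */
}

/* After the extension fails, release everything it still holds.
 * Returns the number of objects that were reclaimed. */
size_t tracker_recover(struct tracker *t)
{
    assert(t != NULL);
    size_t released = t->count;
    while (t->count > 0)
        free(t->objs[--t->count]);
    return released;
}
```

Because every allocation passes through the wrapper, the tracker always knows
what the extension still holds; calling tracker_recover after a failure
reclaims those objects so they are not leaked.
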
The authors point out that the intent of this protection is to allow the kernel
to remain functional; there is really nothing inherent in the Nooks strategy
that protects the objects accessed through the extensions, such as filesystems
or network streams. However, by allowing the kernel to maintain its integrity,
the opportunity exists for the kernel to recover the system in many cases.
(Note that this strategy is also used in certain existing operating systems such
as QNX, which is a microkernel architecture.)
The test strategy was discussed in considerable detail (a pleasant surprise)
and was quite clever. The introduction of deterministic, pseudo-random
amendments to the test applications allowed for considerable variety in failure
modes. Recognizing that there are in fact commonly observed classes of defects,
the authors also introduced examples of such defects outside the automated
environment. The results were significant, with a large number of crashes being
avoided. There is still some question as to whether the system was left in a
truly stable state after the recovery, but elsewhere the paper discusses this
problem and states that Nooks as implemented cannot fully solve it.
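
As an illustration of why "deterministic, pseudo-random" matters for this
kind of testing, here is a small sketch of seeded fault injection. It is not
the authors' injector, whose details are not given here; lcg_next and
inject_bit_flips are hypothetical names, and flipping bits in a byte buffer is
only a stand-in for corrupting the code under test. The point is that a fixed
seed reproduces exactly the same faults, so a crash can be replayed.

```c
#include <stddef.h>
#include <stdint.h>

/* Small linear congruential generator (Numerical Recipes constants).
 * Good enough for repeatable test input; not for anything security related. */
uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Flip `nfaults` pseudo-randomly chosen bits in `buf`.
 * The same seed always selects the same bits. */
void inject_bit_flips(uint8_t *buf, size_t len, uint32_t seed, unsigned nfaults)
{
    uint32_t state = seed;

    if (len == 0)
        return;
    for (unsigned i = 0; i < nfaults; i++) {
        /* Discard the low bits of the LCG output; they are the least random. */
        size_t  byte = (lcg_next(&state) >> 8) % len;
        uint8_t mask = (uint8_t)(1u << ((lcg_next(&state) >> 8) % 8));
        buf[byte] ^= mask;
    }
}
```

Running inject_bit_flips twice with the same seed on identical buffers yields
identical corruption, while varying the seed varies the failure mode.
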
One might be concerned that all this isolation and validation would impact
performance, and the authors set out to determine that impact with real-world
scenarios. There was always some degradation of performance, from negligible to
significant, with a 60% decrease in throughput for a Web server application
being the most severe. While this scenario was probably the most contrived (a
Web server IN the kernel?), it also demonstrated that Nooks could not make a bad
performance scenario better; the CPU utilization in this case was already very
high, and there is nothing inherent in Nooks that addresses performance.
On the positive side, Nooks definitely accomplished its goal of minimal impact
to the existing code base. Nearly all drivers were run as written, and the
modifications to the kernel were small and well-bounded. Moreover, for those
exceptionally inefficient scenarios, there is no reason to run them within the
Nooks protection scheme; unprotected extensions can coexist with Nooks-wrapped
extensions, and the Nooks modifications have no effect on the unprotected
modules.
Nooks presents a set of strategies to reduce the impact of defective device and
service drivers on an operating system. As with most protection or security
strategies, there is a tradeoff between safety and performance.