From: Muench, Joanna (jmuench_at_fhcrc.org)
Date: Wed Jan 21 2004 - 12:30:38 PST
Swift et al. (2003) present Nooks as a subsystem designed to improve the
reliability of an OS by isolating driver failures. Given that approximately
85% of reported failures in Windows XP are due to drivers (with similar
problems in Linux), it is easy to motivate the need for such a subsystem.
While research over the past 30 years has developed many new ideas on
increasing system reliability, few of these ideas have been incorporated in
the most widely used operating systems, Linux and Windows. Hence it makes
sense to create a system that will work within a popular OS. In this paper,
the authors implement their ideas on Linux.
The essential concept of Nooks is to wrap a layer around each Linux
extension. The wrapping code re-routes control flow between the extension
and the kernel through XPC (extension procedure call), an asymmetric control
transfer mechanism, and it manages data transfer between the extension and
the kernel with an object-tracking mechanism. In addition to wrapping
(interposition) and object tracking, the reliability layer includes
isolation mechanisms that prevent extensions from damaging the kernel and
recovery functions that allow the system to recover from extension faults.
The authors mention that the code to wrap each extension can be generated
semi-automatically and that the overall implementation effort was relatively
modest.
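To make the wrapping idea concrete, here is a minimal Python sketch of the control flow described above: every call into an extension passes through a wrapper that tracks the objects lent to it, contains a fault, reclaims those objects, and restarts the extension before the next call. This is not Nooks' code, which is C inside the Linux kernel and relies on hardware memory protection and XPC domain switches that are not modeled here. The names (WrappedExtension, ObjectTracker, ExtensionFault, buggy_driver) are hypothetical, and a Python exception stands in for a processor fault.

```python
class ExtensionFault(Exception):
    """Hypothetical stand-in for a processor fault raised inside an extension."""


class ObjectTracker:
    """Records kernel objects currently lent to an extension."""

    def __init__(self):
        self.live = {}

    def track(self, obj_id, obj):
        self.live[obj_id] = obj

    def untrack(self, obj_id):
        self.live.pop(obj_id, None)

    def release_all(self):
        # On a fault, the kernel reclaims everything the extension still held.
        released = sorted(self.live)
        self.live.clear()
        return released


class WrappedExtension:
    """Hypothetical wrapper: every kernel-to-extension call goes through call()."""

    def __init__(self, entry):
        self.entry = entry          # the extension's (unmodified) entry point
        self.tracker = ObjectTracker()
        self.failed = False
        self.recoveries = 0
        self.last_released = []

    def call(self, obj_id, obj):
        if self.failed:
            self.recover()
        self.tracker.track(obj_id, obj)   # object tracking on the way in
        try:
            result = self.entry(obj)      # control transfer into the extension
        except ExtensionFault:
            # Isolation: the fault stops here instead of crashing the "kernel".
            self.last_released = self.tracker.release_all()
            self.failed = True
            return None
        self.tracker.untrack(obj_id)      # object handed back to the kernel
        return result

    def recover(self):
        # Recovery: a real system would unload and restart the driver here.
        self.failed = False
        self.recoveries += 1


def buggy_driver(buf):
    """Hypothetical driver entry point with a bug on one input."""
    if buf == b"bad":
        raise ExtensionFault("simulated bad pointer dereference")
    return len(buf)


drv = WrappedExtension(buggy_driver)
print(drv.call(1, b"hello"))  # 5
print(drv.call(2, b"bad"))    # None: fault contained
print(drv.last_released)      # [2]
print(drv.call(3, b"ok"))     # 2, after one recovery
print(drv.recoveries)         # 1
```

The point of the sketch is that the extension itself is untouched: isolation, object tracking and recovery all live in the wrapper, which is why the paper can apply the same layer to many existing drivers.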
Extensive testing showed that Nooks significantly reduced the number of
crashes, at some cost in performance. The authors note that this cost must
be weighed against how heavily the CPU is utilized, although the
dependability requirements of the system are clearly another factor to
consider.
I was interested in some of the design decisions discussed in the paper.
The decision to protect against bugs but not against malicious code was
clearly an important one, and it contributed to the relatively minor
performance hit of using Nooks. Rather than sniffing out every possible way
in which an extension could damage the kernel, the authors use only
lightweight protection domains. While this is satisfactory for the problem
the authors have chosen to address, it leaves open the issue of malicious
damage.
This paper gives a satisfyingly practical approach to improving reliability,
with a good discussion of the trade-offs involved. While eliminating 100% of
faults would probably be prohibitively expensive on a Linux system (in terms
of both development and performance), the paper presents a 99% solution.
Given the focus on bugs within extensions, I would have liked to see some
discussion of the implications of bugs in the wrapper code itself, since
presumably this code would be generated by the same cadre of less
experienced extension programmers.
This archive was generated by hypermail 2.1.6 : Wed Jan 21 2004 - 12:32:05 PST