From: Richard Jackson (richja_at_expedia.com)
Date: Wed Jan 21 2004 - 16:07:52 PST
This 2003 paper by Swift, Bershad and Levy describes a novel method for
preventing device drivers from causing system failure. The system is
called Nooks, and is implemented within the Linux operating system.
While similar methods have been described in other papers, the method
presented here treats backwards compatibility as a key design constraint,
which makes the system extremely interesting as a practical solution to
current problems.
The authors take an interesting approach to the huge problem of buggy
device drivers: instead of fixing the bad drivers, they create a
software layer that handles driver failures gracefully and can also
restart the drivers after a failure. Overall, this concept seems much
more likely to succeed than simply asking every device driver developer
to fix their buggy code.
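The idea of a layer that contains a driver failure and then restarts the
driver can be illustrated with a small user-space simulation. This is a
hypothetical sketch, not Nooks code: `BuggyDriver`, `DriverSupervisor`,
and the method names are invented for illustration, and a Python
exception stands in for a real kernel fault such as a NULL-pointer
dereference.

```python
class BuggyDriver:
    """Hypothetical driver whose transmit() 'crashes' on a None packet."""

    def init(self):
        # Fresh driver state; a restart discards everything from before.
        self.sent = []

    def transmit(self, packet):
        if packet is None:
            # Stands in for a real kernel fault such as a NULL dereference.
            raise RuntimeError("simulated driver fault")
        self.sent.append(packet)
        return len(packet)


class DriverSupervisor:
    """Hypothetical recovery layer: contains a driver failure, then restarts it."""

    def __init__(self, driver_factory):
        self._factory = driver_factory
        self.restarts = 0
        self.driver = self._start()

    def _start(self):
        driver = self._factory()
        driver.init()
        return driver

    def call(self, method, *args):
        try:
            return getattr(self.driver, method)(*args)
        except Exception:
            # Contain the failure: replace the driver instead of crashing the caller.
            self.driver = self._start()
            self.restarts += 1
            return None  # the caller sees an error result, not a crash


sup = DriverSupervisor(BuggyDriver)
print(sup.call("transmit", b"abc"))   # 3
print(sup.call("transmit", None))     # None (driver restarted)
print(sup.call("transmit", b"xy"))    # 2
print(sup.restarts, sup.driver.sent)  # 1 [b'xy']
```

The point of the sketch is that the caller keeps running after the fault,
and the restarted driver starts from clean state rather than whatever the
failed instance left behind.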
In general, this paper describes a method for protecting the kernel's
operation from device drivers that may contain bugs or even malicious
code. While device drivers are the main source of this problem, the
paper observes that the same methodology can be used for other software,
such as web servers or file systems.
The paper introduces a component called the Nooks Isolation Manager
(NIM), a software layer that isolates the kernel from kernel extensions.
Extensions such as device drivers communicate with the kernel through
the NIM rather than calling it directly. Calls that cross this boundary
are called XPCs (Extension Procedure Calls), which the authors
distinguish somewhat from LRPC and PPC.
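The core of an XPC-style boundary is that every call into an extension
enters the extension's protection domain and always returns to the
caller's domain afterwards, even if the callee fails. The simulation
below is a hypothetical sketch that models only that domain bookkeeping,
not the paper's actual hardware-protection mechanics; `DomainState`,
`xpc`, `kernel_alloc`, `driver_open`, and the domain name `net_driver`
are all invented for illustration.

```python
from contextlib import contextmanager


class DomainState:
    """Tracks which (hypothetical) protection domain is currently active."""

    def __init__(self):
        self.current = "kernel"
        self.log = []

    @contextmanager
    def enter(self, domain):
        previous = self.current
        self.log.append(f"{previous}->{domain}")
        self.current = domain
        try:
            yield
        finally:
            # Always return to the caller's domain, even if the callee raised.
            self.log.append(f"{domain}->{previous}")
            self.current = previous


def xpc(state, domain, fn, *args):
    """Run fn(*args) inside `domain`, then switch back to the caller's domain."""
    with state.enter(domain):
        return fn(*args)


state = DomainState()


def kernel_alloc(size):
    # Hypothetical kernel service that a driver calls back into.
    return bytearray(size)


def driver_open():
    # Hypothetical driver entry point; it calls the kernel via another XPC.
    return len(xpc(state, "kernel", kernel_alloc, 16))


print(xpc(state, "net_driver", driver_open))  # 16
print(state.current)                          # kernel
```

Calls nest in both directions (kernel to driver, and driver back to
kernel), which is why the sketch saves and restores the previous domain
instead of assuming the caller is always the kernel.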
A large section of the paper covers implementation details. The primary
goal of the implementation is to integrate transparently with the
system, so that user applications do not need to change. This section
also discusses weaknesses of Nooks. One of these seemed especially bad:
because extensions still run in kernel mode, a malicious extension could
deliberately corrupt system state. The authors acknowledge that the
system isn't perfect, but argue that the reliability gains are still
significant.
The reliability evaluation shows that Nooks solves the main problem:
99% of faults were handled without causing a system crash. A large
number of non-fatal failures were also handled, and the authors suggest
creating a 'nanny' service to further improve the handling of non-fatal
errors.
Regarding performance, my first impression was that it would be much
worse than under a traditional device driver scheme. It turns out that
performance is generally acceptable, but application-dependent.
Applications that make many XPCs may perform very poorly, up to 60%
worse, while others fare much better, with a performance loss of 10% or
less. The paper presents performance analysis for various components
across the spectrum of possible uses of Nooks.
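Why the slowdown depends so heavily on the application can be seen with
a rough cost model. This is an assumption, not the paper's analysis: it
treats the workload as CPU-bound and charges a fixed, hypothetical cost
per XPC on top of each operation's native work. The function name
`throughput_loss` and the sample numbers are illustrative only, not
measurements from the paper.

```python
def throughput_loss(work_us, xpcs_per_op, cost_per_xpc_us):
    """Fraction of throughput lost when each operation pays XPC overhead.

    Assumes a CPU-bound workload where an operation takes work_us
    microseconds natively and isolation adds xpcs_per_op crossings of
    cost_per_xpc_us each. All parameters are hypothetical inputs.
    """
    isolated_us = work_us + xpcs_per_op * cost_per_xpc_us
    return 1.0 - work_us / isolated_us


# Illustrative numbers only, not measurements from the paper.
print(round(throughput_loss(100.0, 2, 1.0), 3))  # few crossings: 0.02
print(round(throughput_loss(2.0, 1, 3.0), 3))    # crossing-dominated: 0.6
```

Under this model the loss is governed by the ratio of crossing cost to
useful work per operation, so an application doing little work between
many XPCs suffers most, consistent with the spread of results described
above.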
Overall, I found this paper to be refreshing. I think this design would
work very well to prevent many of the errors that happen today. The
only major concern I have is that Nooks could cause developers of device
drivers to become lazy and create even buggier code.