Reliability paper review

From: Reid Wilkes (reidwilkes_at_verizon.net)
Date: Sun Jan 25 2004 - 21:58:16 PST


    This paper described "Nooks", a technology to improve the reliability of
    what the paper terms "commodity operating systems". This paper was quite
    refreshing to read - mostly I think because it deals with current and
    familiar technologies and also because it takes a decidedly "industry"
    approach. Rather than devising a solution to the problem of reliability by
    creating an entirely new system architecture or programming language, the
    authors in some sense "hack" a solution into existing operating systems.
    This seems to be a much more realistic approach to problems in today's
    commercial computer industry, and one much more likely to gain acceptance in
    industry. The idea behind Nooks is to create a protected area in which
    loadable kernel modules (a.k.a. drivers) can run, so that their
    malfunctions are less likely to bring down the entire system. Clearly the need
    to help isolate driver failures from the rest of the system is huge, given
    some of the data the authors present on the proportion of crashes on Windows
    XP and Linux which are caused by driver malfunction. The authors also take
    another quite refreshing stand in the paper by stating quite clearly at the
    outset that the goal of the project was not to prevent drivers from ever being
    able to bring down the system, but rather to help improve on the common
    failure cases. This again seems to be a highly practical approach and one
    which in my experience is more likely to be successful (at least in
    industry). The architecture of the system seems quite straightforward, as it
    essentially interposes proxies in the communication paths between the
    modules and kernel. These proxies are then able to check parameters and
    detect when a driver is malfunctioning. The system then allows failing
    modules to be reloaded and restarted. Most interesting to me about the
    architecture is the assertion made that very little kernel or driver code
    has to be modified to employ the Nooks system. Again, this engineering feat
    is quite important for the realities of the commodity OS market. The final
    section of the paper presents quantitative results of reliability and
    performance testing. The reliability results are quite impressive and
    certainly show promise for the concepts behind the system. The performance
    results are a little disappointing, but maybe not surprising overall.
    Adding extra code to these kernel paths is bound to slow things down,
    although the slowdown in some cases was quite dramatic. It was interesting
    to see that the authors had traced this slowdown to TLB flushing; it stands
    to reason that a similar system integrated more tightly into the kernel,
    at the cost of more substantial kernel code changes, could remove much of
    this performance hit.
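
    To make the proxy idea concrete, here is a minimal user-space sketch in
    Python. It is only an illustration of the pattern described above (check
    parameters on the way in, treat a failure as a driver malfunction, reload
    and carry on); it is not how Nooks itself is implemented. ToyDriver,
    DriverProxy, the "trigger-bug" input, and the -1 error value are all
    hypothetical names invented for this example.

```python
class ToyDriver:
    """Hypothetical driver whose internal state can become corrupted."""

    def __init__(self):
        self.healthy = True

    def write(self, buf):
        # Simulated bug: one particular input corrupts the driver's state,
        # after which every call fails until the driver is reloaded.
        if buf == b"trigger-bug":
            self.healthy = False
        if not self.healthy:
            raise RuntimeError("driver state corrupted")
        return len(buf)


class DriverProxy:
    """Hypothetical proxy that sits between callers and the driver."""

    def __init__(self, load_driver, max_len=4096):
        self._load_driver = load_driver  # callable returning a fresh driver
        self._max_len = max_len
        self._driver = load_driver()
        self.restarts = 0

    def write(self, buf):
        # Check parameters before they ever reach the driver.
        if not isinstance(buf, (bytes, bytearray)) or len(buf) > self._max_len:
            raise ValueError("rejected by proxy: bad buffer argument")
        try:
            return self._driver.write(buf)
        except Exception:
            # Treat any exception as a driver malfunction: reload the driver
            # and fail this one request instead of letting the error spread.
            self._driver = self._load_driver()
            self.restarts += 1
            return -1


proxy = DriverProxy(ToyDriver)
print(proxy.write(b"abc"))          # normal call passes through
print(proxy.write(b"trigger-bug"))  # driver fails; proxy reloads it
print(proxy.write(b"abcd"))         # fresh driver instance serves the call
```

    Run as-is, this prints 3, -1 and 4: the caller sees one failed request,
    and the next one is served by a freshly loaded driver. Every call pays
    for the extra hop through the proxy, which is the same trade-off noted
    above for the performance results.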


