Review: Denali isolation kernel

From: Cem Paya 98 (Cem.Paya.98_at_Alum.Dartmouth.ORG)
Date: Mon Mar 01 2004 - 15:21:10 PST

    Paper review: Denali isolation kernel

    CSE 551P, Cem Paya


    This paper describes an architecture for sandboxing
    applications by running dozens or even hundreds of virtual
    machines on the same commodity hardware. In effect it
    pushes the containment mechanism of the sandbox into the
    lowest layer possible. The granularity of sandboxing in OS
    design has traditionally been the process: mutually
    distrustful processes run side by side, and a security
    policy attempts to isolate them from each other. One level
    up are sandboxes implemented at the programming language or
    runtime level, as in the Java virtual machine or
    Microsoft's Common Language Runtime. The authors argue that
    the latter suffer from “layer-below” attacks, where
    malicious code sidesteps the restrictions by directly
    accessing a lower-layer API than the one exposed in the
    sandboxed environment. A well-designed sandbox does not
    allow this, barring bugs. Nevertheless there is a
    compelling argument here that shrinking the surface of
    what lies below defends against such oversights. Denali
    goes directly to the hardware layer, emulating
    (approximately) the bare metal. That does not leave much
    room for hidden layers below with opportunities for
    overriding the sandbox policy.

    This is very promising, but the risk model is left
    unspecified. The stated goal is “making it possible for
    untrusted software services to be pushed safely into 3rd
    party hosting infrastructure.” It’s not clear what assets
    are being defended against what threats here. For example,
    running two services on different VMs may offer complete
    isolation within the physical machine boundary, but it does
    not prevent one from corrupting a backend database the
    other depends on, or from DoSing the other application by
    taking advantage of the high-speed emulated Ethernet.
    Neither is there any defense against bugs that allow an
    application running inside a VM to be compromised and used
    as a stepping stone for attacking other systems, including
    those on the same 3rd party infrastructure. This is exactly
    why OS-level sandboxing mechanisms, such as the ability to
    restrict use of shared resources like memory, bandwidth and
    disk I/O, are important. Isolation can effectively stop one
    VM from even naming the resources of another, but if they
    are running on the same hardware there is still indirect
    interaction: contention for shared resources, covert
    channels etc.
    Another assumption, that running services on dedicated
    hardware is not cost-effective, is too broad: sometimes
    infrastructure providers do in fact prefer to map unrelated
    services to distinct physical hardware. This can simplify
    administration, deployment and maintenance. The cost of
    acquiring an additional PC (recall these are headless
    servers without monitors) is rarely the limiting factor.


    Architecturally Denali is like an operating system (in fact
    the authors point out Exokernel as an analogy) that
    multiplexes the hardware between multiple VMs. Each VM is
    free to run its own mini-operating system that is linked
    into the specific application, similar to library OSes in
    Xok. VMs are managed much as a traditional OS manages
    processes; for instance, they can be paged out to disk.
    Scaling is very impressive: some of the performance
    benchmarks described involve up to 4000 (!!) VMs on the
    same 1.7GHz Pentium. The usual hardware interface is
    altered slightly: a new I/O model reduces the number of
    port-I/O calls needed for basic operations such as sending
    one Ethernet packet, and a semantically different interrupt
    model batches physical interrupts that arrive while a VM is
    not executing and delivers them asynchronously. The memory
    model is also a little different: a virtual MMU is in the
    works but not implemented yet, each VM currently gets only
    16MB (which seems barely enough to fit the code segment of
    an Internet service these days), and swap space is
    allocated per VM. Denali is very economical with memory:
    the isolation kernel uses only about 8.5K per VM, most of
    it for the stack. There is also a virtualized network
    interface, which appears to use MAC address spoofing to
    give the appearance of multiple Ethernet cards while
    multiplexing the network hardware. In performance
    benchmarks the system does very well: in one
    representative scenario for HTTP web serving, the overhead
    for VMs is minor, about 5% compared to native BSD.
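
    The batched interrupt model described above can be
    illustrated with a toy simulation. This is only a sketch of
    the idea as this review understands it, not Denali's actual
    code; the class and method names (BatchingKernel,
    raise_interrupt, schedule) are made up for illustration.

```python
from collections import defaultdict, deque


class BatchingKernel:
    """Toy model: queue interrupts per VM, deliver them when the VM runs."""

    def __init__(self):
        # vm_id -> interrupts that arrived while the VM was off the CPU
        self.pending = defaultdict(deque)

    def raise_interrupt(self, vm_id, irq):
        # A physical interrupt arrives for vm_id. Rather than switching
        # to that VM right away, the kernel only records the event.
        self.pending[vm_id].append(irq)

    def schedule(self, vm_id):
        # When vm_id next gets the CPU, hand over everything that
        # accumulated since it last ran, as a single batch.
        batch = list(self.pending[vm_id])
        self.pending[vm_id].clear()
        return batch


kernel = BatchingKernel()
kernel.raise_interrupt("vm1", "net_rx")
kernel.raise_interrupt("vm2", "timer")
kernel.raise_interrupt("vm1", "disk_done")
print(kernel.schedule("vm1"))  # ['net_rx', 'disk_done']
print(kernel.schedule("vm1"))  # []
```

    The point of the design is that an interrupt for a
    descheduled VM costs a queue append instead of a context
    switch, which presumably matters a great deal once
    thousands of VMs share one CPU.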


    Denali allows no sharing between different VMs, trying to
    create the appearance of “air space”, i.e. of different
    physical machines, on the same hardware. Any interaction
    happens through the network, which is emulated in memory
    but semantically looks like an Ethernet, just as if the VMs
    were physically isolated PCs. In criticizing other
    sandboxing approaches such as OS-level ones, the authors
    argue that they still allow interaction because namespaces
    are shared. Seen from a different angle, this points to a
    lack of flexibility in the Denali sandbox: higher-level
    approaches can express a wide spectrum of policies
    (including 100% isolation as a special case), whereas
    Denali hard-wires one.
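
    To make the "special case" remark concrete, here is a
    minimal sketch in which a sandbox policy is just a
    predicate. Everything in it (the Policy signature, the
    "pub/" prefix, the function names) is hypothetical and not
    taken from the paper.

```python
from typing import Callable

# Hypothetical signature: may `subject` touch `resource` owned by `owner`?
Policy = Callable[[str, str, str], bool]


def full_isolation(subject: str, owner: str, resource: str) -> bool:
    # The one policy a Denali-style design offers, in this toy model:
    # a sandbox can only touch what it owns.
    return subject == owner


def shared_public_dir(subject: str, owner: str, resource: str) -> bool:
    # A more permissive point on the spectrum: anything under "pub/"
    # is visible to every sandbox, everything else stays private.
    return subject == owner or resource.startswith("pub/")


def check(policy: Policy, subject: str, owner: str, resource: str) -> str:
    return "allow" if policy(subject, owner, resource) else "deny"


print(check(full_isolation, "vm1", "vm2", "pub/motd"))     # deny
print(check(shared_public_dir, "vm1", "vm2", "pub/motd"))  # allow
print(check(shared_public_dir, "vm1", "vm2", "etc/keys"))  # deny
```

    A policy-driven sandbox can be handed either function;
    a design that fixes the first one cannot express the
    second.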


    The most controversial feature by far is the explicit
    design choice not to emulate the underlying hardware with
    100% fidelity. This is partly because the reigning
    architectures are either not virtualizable (the authors
    cite a paper published at the 9th Usenix Security
    Symposium, whose presentation this reviewer attended, that
    focused on the x86 instruction set and identified scores
    of instructions that could not be virtualized) or can be
    virtualized only with serious performance inefficiency.
    For this reason Denali presents an altered view of the
    hardware, with additional registers, special instructions
    (particularly the idle-with-timeout primitive) etc., on
    which existing operating systems cannot run. The authors
    describe this as giving up support for “legacy operating
    systems”, but legacy is a misleading adjective here: that
    word implies migration towards a new architecture and
    deprecation of an existing one. Contemporary OS developers
    have been perfectly content with architectures such as
    x86, Alpha or IA64, which were not designed for efficient
    emulation, and are not moving to a different model
    compatible with Denali. Not even a brand-new OS written
    yesterday for those architectures would be compatible with
    Denali.
    Granted, not running commodity off-the-shelf OSes (and
    applications for that matter) is not a problem in
    principle, but it can severely limit the applicability of
    Denali to solving immediate problems such as app hosting
    for Internet services. It also means reinventing the wheel:
    all of the scaling experiments required building a new
    guest OS, porting the BSD TCP/IP stack etc. One piece of
    good news here is that existing development tools such as
    gcc continue to work (although changes were required to the
    linker “ld”), so existing apps with source code can be
    ported trivially with a simple recompilation. Modifying an
    existing OS kernel to support Denali is more involved and
    not for the faint of heart. There is a deeper problem: the
    VM is no longer transparent, and applications can easily
    detect whether they are running on “real” x86 or the
    emulated Denali variant. It is no longer obvious that an
    application working correctly on native hardware is free
    of subtle bugs that reproduce only in the VM, whereas with
    a transparent VM this is the a priori expectation.
    Malicious code or an attacker trapped in a honeypot can
    infer that they are not running on bare metal and alter
    their behavior.
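
    As an aside on the idle-with-timeout primitive mentioned
    above: its semantics, as this review understands them, are
    "give up the CPU until an interrupt is pending or a timeout
    expires". The sketch below models that with an ordinary
    thread primitive. It is an analogy only; VirtualCPU and its
    methods are invented names, not Denali's interface.

```python
import threading


class VirtualCPU:
    """Toy stand-in for a guest's view of idle-with-timeout."""

    def __init__(self):
        self._interrupt = threading.Event()

    def post_interrupt(self):
        # Called by the (hypothetical) kernel when a virtual
        # interrupt becomes pending for this guest.
        self._interrupt.set()

    def idle_with_timeout(self, timeout_s):
        # Block until an interrupt is posted or timeout_s elapses.
        # Returns True if woken by an interrupt, False on timeout.
        woke = self._interrupt.wait(timeout_s)
        self._interrupt.clear()
        return woke


cpu = VirtualCPU()
# Nothing pending: the call returns after the timeout.
print(cpu.idle_with_timeout(0.05))
# An interrupt posted shortly afterwards wakes the guest early.
threading.Timer(0.01, cpu.post_interrupt).start()
print(cpu.idle_with_timeout(1.0))
```

    A guest written for real hardware has no reason to call
    anything like this in its idle loop, which is one concrete
    way the altered interface leaks into the guest OS.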


