From: Cem Paya 98 (Cem.Paya.98_at_Alum.Dartmouth.ORG)
Date: Mon Mar 01 2004 - 15:21:10 PST
Paper review: Denali isolation kernel
CSE 551P, Cem Paya
This paper describes an architecture for sandboxing
applications by running dozens or even hundreds of virtual
machines on the same commodity hardware. In effect it
pushes the containment mechanism of the sandbox into the
lowest layer possible. Granularity of sandboxing in OS
design has traditionally been at the process level.
Mutually distrustful processes can be run and security
policy attempts to isolate them from each other. One level
up are sandboxes implemented at the programming-language or
runtime level, as in the Java virtual machine or the
Microsoft Common Language Runtime (CLR). The authors argue
that the latter suffer from “layer-below” attacks, where
malicious code is
able to sidestep the restriction by directly accessing a
lower layer API than what is exposed in the sandboxed
environment. A well-designed sandbox does not allow this,
barring bugs; nevertheless there is a compelling argument
here that by shrinking the surface of what lies below, one
can defend against such oversights.
Denali goes directly to the hardware layer, emulating
(approximately) the bare metal of hardware—that does not
leave much room for hidden layers below with opportunities
for overriding the sandbox policy.
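To illustrate the idea in the familiar OS-process setting: a
sandbox that interposes on a library call can be sidestepped
by invoking the layer beneath it. The sketch below is purely
illustrative (a Linux-specific toy of my own, not from the
paper): a monitor wrapping the libc open() function never
sees a file opened via a direct system call.

    /* Toy illustration of a "layer-below" attack: a sandbox
     * interposing on the libc open() wrapper (e.g. via
     * LD_PRELOAD) never sees a direct system call. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void) {
        /* open("/etc/passwd", ...) would hit the interposed
         * wrapper; syscall() drops below the library layer. */
        int fd = (int) syscall(SYS_open, "/etc/passwd", O_RDONLY);
        if (fd >= 0) {
            printf("bypassed the library-level sandbox, fd=%d\n", fd);
            close(fd);
        }
        return 0;
    }

Denali's answer is to put the enforcement boundary at the
lowest layer, so that there is no “below” left to call into.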
This is very promising, but the risk model is left
unspecified. The stated goal is “making it possible for
untrusted software services to be pushed safely into 3rd
party hosting infrastructure.” It is not clear what assets
are being defended against what threats. For example,
running two services in different VMs may offer complete
isolation within the physical machine boundary, but it does
not prevent one from corrupting a backend database the other
depends on, or from DoSing the other application over the
high-speed emulated Ethernet. Nor is there any defense
against bugs that allow an application running inside a VM
to be compromised and used as a stepping stone for attacking
other systems, including those
on the same 3rd party infrastructure. This is exactly why
OS-level sandboxing mechanisms, such as the ability to
restrict use of shared resources like memory, bandwidth and
disk I/O, are important. Isolation can effectively stop one
VM from even naming another's resources, but if they run on
the same hardware there is indirect interaction: contention
for shared resources, covert channels, and so on.
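As a toy sketch of what such indirect interaction can look
like (my illustration, not the paper's; a real channel would
need careful calibration): one VM modulates pressure on a
shared cache, and another infers bits from its own access
latency.

    /* Toy contention-based covert channel between two
     * "isolated" parties sharing hardware; the names and
     * thresholds here are made up for illustration. */
    #include <stdint.h>
    #include <stdlib.h>
    #include <time.h>

    #define BUF (8 * 1024 * 1024)  /* larger than the shared cache */

    static uint64_t now_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
    }

    /* Sender: transmit a 1 by thrashing the cache for one
     * time slot, a 0 by staying idle. */
    void send_bit(volatile char *buf, int bit, uint64_t slot_ns) {
        uint64_t end = now_ns() + slot_ns;
        while (now_ns() < end)
            if (bit)
                for (size_t i = 0; i < BUF; i += 64) buf[i]++;
    }

    /* Receiver: sweep its own buffer; high latency means the
     * sender was thrashing during the slot, i.e. a 1 bit. */
    int recv_bit(volatile char *buf, uint64_t threshold_ns) {
        uint64_t t0 = now_ns();
        for (size_t i = 0; i < BUF; i += 64) buf[i]++;
        return (now_ns() - t0) > threshold_ns;
    }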
Another assumption, regarding the cost-effectiveness of
running services on dedicated hardware, is too broad:
sometimes infrastructure providers do in fact prefer to map
unrelated services onto distinct physical hardware. This can
simplify administration, deployment and maintenance. The
cost of acquiring an additional PC (recall these are
headless servers without monitors) is rarely the limiting factor
here.
Architecturally, Denali is like an operating system (in fact
the authors point to Exokernel as an analogy) that
multiplexes the hardware among multiple VMs. Each VM is free
to run its own mini operating system linked into the
specific application, similar to the library OSes of Xok.
VMs are managed much as a traditional OS manages processes;
they can even be paged out to disk. Scaling is very
impressive: some of the performance benchmarks described
involve up to 4000 (!!) VMs on the same 1.7GHz Pentium.
The usual hardware interface is altered slightly: a new I/O
model reduces the number of port-I/O calls needed for basic
operations such as sending a single Ethernet packet, and a
semantically different interrupt model batches multiple
physical interrupts that arrive while a VM is not executing,
delivering them asynchronously when it next runs.
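A sketch of what the guest side of such batching might look
like (the register and handler names below are my invention,
not the paper's): on resume, the guest reads a mask of
everything that arrived while it was descheduled and
dispatches each pending source once.

    /* Hypothetical guest-side dispatch for a batched
     * interrupt model; read_pending_virq_mask() stands in
     * for whatever virtual register the isolation kernel
     * actually exposes. */
    #include <stdint.h>

    uint32_t read_pending_virq_mask(void);  /* assumed virtual register */

    typedef void (*virq_handler)(void);
    static virq_handler handlers[32];       /* timer, ethernet, disk, ... */

    void on_vm_resume(void) {
        /* One notification may stand for many physical
         * interrupts coalesced while this VM was not running. */
        uint32_t pending = read_pending_virq_mask();
        while (pending) {
            int irq = __builtin_ctz(pending);  /* lowest set bit */
            pending &= pending - 1;            /* clear it */
            if (handlers[irq])
                handlers[irq]();
        }
    }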
The memory model is also a little different (a virtual MMU
is in the works but not yet implemented; each VM currently
gets only 16MB, which seems barely enough to fit the code
segment of an Internet service these days), and swap space
is allocated to each VM. Denali is very economical with
memory: only about 8.5KB per VM is used by the isolation
kernel, most of it for the stack.
There is also a virtualized Ethernet device that appears to
use MAC-address spoofing to give the appearance of multiple
Ethernet cards on top of the multiplexed network hardware.
In performance benchmarks the system does very well: in one
representative scenario, HTTP web serving, the overhead of
running in a VM is a minor ~5% compared to native BSD.
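A minimal sketch of how such multiplexing could work (my
reconstruction, assuming the kernel routes inbound frames by
destination MAC; none of these names are from the paper):

    /* Illustrative demultiplexer: each VM's virtual NIC owns
     * a MAC address, and inbound frames are routed by the
     * destination MAC (bytes 0-5 of the Ethernet header). */
    #include <stdint.h>
    #include <string.h>

    #define MAX_VMS 4096

    struct vm {
        uint8_t mac[6];                      /* this VM's virtual MAC */
        void (*deliver)(const uint8_t *frame, size_t len);
    };

    static struct vm vms[MAX_VMS];
    static int nvms;

    void demux_frame(const uint8_t *frame, size_t len) {
        if (len < 14) return;                /* runt: no full header */
        for (int i = 0; i < nvms; i++)
            if (memcmp(frame, vms[i].mac, 6) == 0) {
                vms[i].deliver(frame, len);
                return;
            }
        /* no owner: drop (broadcast would be flooded to all) */
    }

A real implementation would presumably hash on the MAC
rather than scan thousands of entries linearly.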
Denali allows no sharing between different VMs, trying to
create the appearance of “air space”, i.e. of different
physical machines, on the same hardware. Any interaction
happens through the network—which is emulated in memory but
semantically looks like an Ethernet—just as if the VMs were
physically isolated PCs. In criticizing other sandboxing
approaches, such as OS-level ones, the authors argue that
they still allow interaction because namespaces are shared.
Seen from a different angle, this points to an inflexibility
in the Denali sandbox: higher-level approaches can express a
wide spectrum of policies—including 100% isolation as a
special case—whereas Denali hard-wires one.
The most controversial feature by far is the explicit design
choice not to emulate the underlying hardware with 100%
fidelity. This is partly because the reigning architectures
are either not virtualizable (the authors cite a paper from
the 9th USENIX Security Symposium—this reviewer attended the
presentation—that focused on the x86 instruction set and
identified scores of instructions that could not be
virtualized) or virtualizable only at a serious performance
cost. For this reason Denali presents an altered view of the
hardware, with additional registers, special instructions
(particularly the idle-with-timeout primitive), etc., and
consequently it cannot run an existing OS. The authors
describe this as giving up support for “legacy operating
systems”, but legacy is a misleading adjective here—that
word implies migration towards a new architecture and
deprecation of an existing one. Contemporary OS developers
have been perfectly content with architectures such as x86,
Alpha or IA-64, which were not designed for efficient
emulation, and they are not moving to a different model
compatible with Denali. Not even a brand-new OS written
yesterday, targeting real hardware, would be compatible with
Denali.
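The classic example of why x86 defeats trap-and-emulate:
some instructions read privileged state from user mode
without faulting, so a virtual machine monitor never gets a
chance to intercept them and substitute virtualized state.
On the processors of that era, SMSW is one such instruction:

    /* SMSW reads the low bits of CR0 (the machine status
     * word) and, on x86 of that era, executes in ring 3
     * without trapping—so a VMM cannot intercept it and
     * present virtualized state.  x86-only; gcc syntax. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint16_t msw;
        __asm__ volatile ("smsw %0" : "=r"(msw));  /* no trap in user mode */
        printf("machine status word: 0x%04x\n", msw);
        return 0;
    }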
Granted, not running a commodity off-the-shelf OS (and
applications, for that matter) is not a problem in
principle, but it can severely limit the applicability of
Denali to immediate problems such as application hosting for
Internet services. It also means reinventing the wheel: all
of the scaling experiments required building a new guest OS,
porting the BSD TCP/IP stack, and so on. One piece of good
news is that existing development tools such as gcc continue
to work (though changes were required to the linker, “ld”),
so existing applications with source code can be ported
trivially by simple recompilation. Modifying an existing OS
kernel to support Denali is more involved and not for the
faint of heart. There is a deeper problem: the VM is no
longer transparent—applications can easily detect whether
they are running on “real” x86 or the emulated Denali
variant. It is no longer obvious that an application working
correctly on native hardware cannot have subtle bugs that
reproduce only in the VM—when the VM is transparent, that is
the a priori expectation. Malicious code, or an attacker
trapped in a honeypot, can infer that it is not running on
bare metal and alter its behavior.
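Under Denali such detection is trivial, since the ISA
visibly differs. A hypothetical probe, assuming the special
instructions occupy opcodes that are undefined on real x86:
execute one under a SIGILL handler and see whether it faults
(the opcode bytes below are placeholders, not Denali's
actual encoding).

    /* Hypothetical VM-detection probe; the opcode here is a
     * placeholder (ud2, which always faults on real x86),
     * standing in for one of Denali's extra instructions. */
    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>

    static sigjmp_buf env;

    static void on_sigill(int sig) {
        (void)sig;
        siglongjmp(env, 1);
    }

    int main(void) {
        signal(SIGILL, on_sigill);
        if (sigsetjmp(env, 1))
            printf("faulted: bare x86, not the virtualized ISA\n");
        else {
            __asm__ volatile (".byte 0x0f, 0x0b");  /* placeholder opcode */
            printf("executed: running on the extended (virtual) ISA\n");
        }
        return 0;
    }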
Cem