From: Richard Jackson (richja_at_expedia.com)
Date: Mon Mar 01 2004 - 17:03:13 PST
This 2002 paper by Whitaker, Shaw, and Gribble discusses Denali, an
isolation kernel. Like a VMM, an isolation kernel lets systems or
applications run inside a virtual sandbox, preventing problems within
the sandbox from affecting the rest of the system. The paper
nevertheless distinguishes isolation kernels from related systems,
including VMMs, exokernels, and microkernels. The main difference
between an isolation kernel and a VMM such as Disco is that backward
compatibility is not a goal; the design focuses instead on scalability,
performance, and simplicity.
The paper is divided into three parts: 1) the case for isolation
kernels, 2) an overview of the Denali isolation kernel, and 3) test
results.
The first section argues the need for such a system. The main
arguments are: 1) supporting dynamic delivery of content in Content
Distribution Networks (CDNs), 2) allowing new services to be pushed onto
the existing internet infrastructure without risk, and 3) providing a
framework/container for internet measurement using systems such as
Chord. Of these, #2 seemed the most plausible, as most internet
services would presumably like to run untrusted code in a sandbox-like
environment to prevent large-scale failures. The other points seemed
to be variations of #2.
The paper then discusses some key design principles for isolation
kernels: 1) expose low-level resources rather than high-level
abstractions, since high-level services are prone to "layer-below"
attacks, 2) prevent sharing between namespaces - each virtual machine
gets its own private namespace, 3) account for Zipf's law when
designing for scale - most services are unpopular, yet their data is
collectively large and must still be paged in efficiently (see the
numeric sketch after this list), and 4) simplify the typical VMM
architecture to reach the goals of simplicity, scale, and performance -
mainly by eliminating the backward-compatibility requirement.
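To make the Zipf point concrete, here is a minimal numeric sketch of my
own (the parameters are assumptions, not from the paper): with 10,000
services whose popularity falls off as 1/rank, the 100 most popular
services absorb only about half of all requests, leaving a long tail of
rarely-used services whose data is collectively large.

    #include <stdio.h>

    int main(void)
    {
        const int N = 10000;            /* services, ranked by popularity */
        double total = 0.0, top100 = 0.0;
        for (int r = 1; r <= N; r++) {
            double share = 1.0 / r;     /* classic Zipf weight: ~ 1/rank */
            total += share;
            if (r <= 100)
                top100 += share;
        }
        /* prints roughly: top 100 of 10000 services get 53% of requests */
        printf("top 100 of %d services get %.0f%% of requests\n",
               N, 100.0 * top100 / total);
        return 0;
    }

The other half of the requests hit the remaining 9,900 services, which
is why cheap paging of idle, unpopular VMs matters so much at scale.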
The second section discusses the Denali kernel itself: its virtual
instruction set, its memory architecture, and its I/O devices. For the
instruction set, they used a subset of the x86 instruction set and
added two key virtual instructions: an idle-with-timeout instruction,
and an instruction that allows the VM to be terminated.
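As a rough sketch of how a guest might use these additions, consider
the following C fragment; the wrapper names, the tick unit, and the
stub bodies are my own assumptions, not the paper's interface:

    /* Hypothetical guest-side wrappers for Denali's two added virtual
     * instructions; in a real guest these would trap into the
     * isolation kernel rather than being C stubs. */
    static void denali_idle_with_timeout(unsigned int max_ticks)
    {
        (void)max_ticks;    /* stub: yield the CPU until IRQ or timeout */
    }

    static void denali_terminate(int status)
    {
        (void)status;       /* stub: ask the kernel to destroy this VM */
    }

    /* A guest idle loop built on the timeout form: instead of spinning,
     * the VM gives up the physical CPU until work arrives. This is what
     * lets one machine host thousands of mostly-idle VMs. */
    static void guest_idle_loop(void)
    {
        for (;;) {
            /* ... dispatch any pending virtual interrupts here ... */
            denali_idle_with_timeout(10 /* ticks; unit is assumed */);
        }
    }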
For memory, each virtual machine has a 32-bit virtual address space
plus a swap region that is striped across multiple physical disks. The
kernel is mapped into each address space, so kernel accesses are not
expensive.
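Striping the swap region comes down to simple block arithmetic; here is
a minimal sketch of a block-to-disk mapping, assuming round-robin
striping (my construction, not Denali's actual layout):

    /* Round-robin striping: swap block b lives on disk (b mod D) at
     * per-disk offset (b / D). With D disks, sequential swap traffic
     * is spread across all spindles. */
    struct location { int disk; long offset; };

    static struct location locate_swap_block(long block, int num_disks)
    {
        struct location loc;
        loc.disk   = (int)(block % num_disks);
        loc.offset = block / num_disks;
        return loc;
    }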
Regarding devices and interrupts, an interesting feature of Denali is
that interrupts are batched: if a VM has been swapped out to disk, the
isolation kernel accumulates its interrupts and delivers them all at
once when the VM is next scheduled.
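A minimal sketch of the batching idea, assuming a per-VM pending
bitmask (the data layout and function names are my guesses, not
Denali's code):

    #include <stdint.h>

    struct vm {
        int      resident;      /* is this VM in memory and runnable? */
        uint32_t pending_irqs;  /* one bit per virtual interrupt source */
    };

    static void raise_in_guest(struct vm *vm, int irq)
    {
        (void)vm; (void)irq;    /* stub: would vector into guest handler */
    }

    /* Record a virtual interrupt; if the VM is swapped out, nothing
     * else happens until it is scheduled again. */
    static void post_virtual_irq(struct vm *vm, int irq)
    {
        vm->pending_irqs |= 1u << irq;
    }

    /* On (re)scheduling the VM, deliver everything that accumulated
     * while it was out, in a single batch. */
    static void deliver_pending_irqs(struct vm *vm)
    {
        uint32_t batch = vm->pending_irqs;
        vm->pending_irqs = 0;
        for (int irq = 0; irq < 32; irq++)
            if (batch & (1u << irq))
                raise_in_guest(vm, irq);
    }

Batching trades interrupt latency for throughput, which fits Denali's
workload of many mostly-idle services.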
For testing, they ran a web server and a Quake II game server on
Denali. For the web server, they distinguished in-core from
out-of-core behavior. In the former, all data fits into memory and
performance was very good; in the latter, data must be read from disk
and performance is much worse. Still, in the out-of-core regime they
were able to scale to 10,000 virtual machines, which was one of their
main goals. While probably not very practical, this demonstrates the
system's ability to scale without major problems aside from extreme
slowness. Another key aspect of the results was the idle-with-timeout
instruction, which lets idle VMs yield the processor much more
efficiently. This feature seemed central to Denali's success.
The major benefits of this paper were: 1) an alternative VM design that
works on commodity x86 hardware (unlike the Disco VMM), and 2) better
integration with the guest OS, achieved by eliminating the
backward-compatibility requirement and building an idealized guest OS
called Ilwaco.
The weaknesses of this paper were: 1) the first part of the paper
spends much time on using isolation kernels to run untrusted code in a
sandbox-like environment, but the rest of the paper seems to diverge
from this premise and focus instead on VMM-specific issues, 2)
backward compatibility was not part of the design, 3) the Quake server
example was questionable because of the self-imposed 100ms delay that
the server added, and 4) the achieved goal of 10,000 services seems
questionable, since it was not clear how well the overall system
performed with that many VMs.