From: Cliff Schmidt (cliff_at_bea.com)
Date: Mon Mar 01 2004 - 15:52:00 PST
Disco is a layer between hardware and commodity operating systems to
support scalable SMPs without the large costs required to rearchitect
existing operating systems or design new ones. It accomplishes this by
applying the virtual machine monitor idea and adding to it shared copy-
on-write disks (among other things).
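
To make the shared copy-on-write disk idea concrete, here is a minimal sketch in Python. It is not Disco's implementation; the class name `CowDisk` and the block-level overlay are assumptions chosen only to illustrate how several virtual machines can share one base image while keeping their writes private.

```python
class CowDisk:
    """Toy copy-on-write disk (hypothetical): VMs share one read-only base image."""

    def __init__(self, base_blocks):
        self.base = base_blocks   # shared image, never modified
        self.private = {}         # block number -> this VM's modified copy

    def read(self, n):
        # Prefer the VM's private copy; fall back to the shared base block.
        return self.private.get(n, self.base[n])

    def write(self, n, data):
        # Writes never touch the shared image; they land in the private overlay.
        self.private[n] = data


# An immutable tuple stands in for the shared disk image.
base = (b"kernel", b"libc", b"data")
vm_a, vm_b = CowDisk(base), CowDisk(base)
vm_a.write(2, b"data-a")   # only vm_a sees this change
```

After the write, `vm_a` reads its own copy of block 2 while `vm_b` still reads the shared one, and unmodified blocks are backed by the same object in both virtual machines.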
One of the interesting ideas behind this paper, which I didn't think
was emphasized enough, is that Disco appears (unless I misunderstood)
to allow for multiple operating systems on the same node as well as
multiple nodes sharing the same monitor and resources. The former
scenario allows for flexibility, fault isolation, and heterogeneity,
while the latter scenario is more focused on scalability. I also felt
that the paper didn't draw a very strong distinction between applying
Disco to loosely coupled vs. tightly coupled multiprocessors; but
then, maybe there is no reason to make this distinction.
I know that we've talked briefly about NUMAs before, but my grasp of
them wasn't quite solid enough to appreciate the importance of
Disco hiding the "NUMA-ness" of a machine.
The paper points out each of the main challenges of virtual machines
and proceeds to explain how Disco handles them. The issues
are mainly the overhead of virtualizing hardware resources,
resource management, and sharing/communication. Disco
addresses these issues with only a modest performance hit,
especially compared to traditional virtual machines. Some of the ways
Disco chooses to deal with these overheads are:
- special addresses for load and store instructions, in place of
trapping and emulating accesses to privileged registers.
- a shared cache for read-only pages across virtual machines.
- dynamic page migration and replication, plus familiar techniques
such as copy-on-write: DMA operations are handled so that a shared
page is marked read-only, and a write attempt results in a
copy-on-write fault.
- use of a process control block-like data structure to represent
virtual CPUs
- use of TLBs, and addition of a secondary TLB, to provide quick
physical address to machine address mapping. (Note that the MIPS
architecture bypasses the TLB for some direct accesses, which required
relinking the OS.)
- most of the changes needed within an OS were limited to the HAL.
However, I was a little skeptical about the statement that the changes
were "simple enough that they are unlikely to introduce a bug in the
software". I would expect the testing effort around this to be enormous,
and it wasn't mentioned much in the paper.
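
The TLB bullet above can be sketched roughly as follows. This is a simplified model under my own assumptions, not Disco's actual data structures: `VirtualMMU`, `pmap`, `guest_pt`, and the LRU eviction policy are hypothetical names and choices. It only shows the extra physical-to-machine level of translation and how a larger secondary TLB can absorb misses from a small hardware TLB.

```python
from collections import OrderedDict


class VirtualMMU:
    """Toy model (hypothetical): virtual -> physical -> machine translation."""

    def __init__(self, pmap, guest_pt, hw_entries=2, l2_entries=8):
        self.pmap = pmap            # monitor's map: physical page -> machine page
        self.guest_pt = guest_pt    # guest OS's map: virtual page -> physical page
        self.hw_tlb = OrderedDict()   # small "hardware" TLB: virtual -> machine
        self.l2_tlb = OrderedDict()   # larger secondary TLB: virtual -> machine
        self.hw_entries = hw_entries
        self.l2_entries = l2_entries
        self.stats = {"hw_hit": 0, "l2_hit": 0, "walk": 0}

    @staticmethod
    def _insert(tlb, vpage, mpage, capacity):
        # Insert as most recently used; evict the least recently used entry.
        tlb[vpage] = mpage
        tlb.move_to_end(vpage)
        if len(tlb) > capacity:
            tlb.popitem(last=False)

    def translate(self, vpage):
        if vpage in self.hw_tlb:
            self.hw_tlb.move_to_end(vpage)
            self.stats["hw_hit"] += 1
            return self.hw_tlb[vpage]
        if vpage in self.l2_tlb:
            # Miss in the small TLB, but the secondary TLB avoids a full walk.
            self.stats["l2_hit"] += 1
            mpage = self.l2_tlb[vpage]
        else:
            # Full path: guest page table, then the monitor's pmap.
            self.stats["walk"] += 1
            mpage = self.pmap[self.guest_pt[vpage]]
            self._insert(self.l2_tlb, vpage, mpage, self.l2_entries)
        self._insert(self.hw_tlb, vpage, mpage, self.hw_entries)
        return mpage


mmu = VirtualMMU(pmap={0: 40, 1: 41, 2: 42}, guest_pt={10: 0, 11: 1, 12: 2})
results = [mmu.translate(v) for v in (10, 11, 12, 10)]
```

In this run the fourth access misses the two-entry hardware TLB (page 10 was evicted) but hits the secondary TLB, so the full two-level lookup happens only three times.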