Review: Disco

From: Raz Mathias (razvanma_at_exchange.microsoft.com)
Date: Mon Mar 01 2004 - 17:28:23 PST

  • Next message: David V. Winkler: "Review: Disco: Running Commodity Operating Systems on Scalable Multiprocessors."

    Today's paper was about a system named Disco that creates a thin
    interface layer between the machine and the operating system to allow
    multiple operating systems to safely run concurrently on one machine.
    Disco is a Virtual Machine Monitor (VMM), meaning that it virtualizes
    all the underlying physical resources of a machine.

     

    The paper compares the side-by-side approach to running multiple
    operating systems with clusters, where each individual application need
    not know it is in a cluster. In the case of Disco, it is each
    individual operating system that does not know its hardware is actually
    being virtualized. I thought that this comparison placed the VMM on a
    convenient continuum extending from the last batch of papers we read on
    clustering. The VMM provides several advantages that are almost
    completely transparent to the operating system itself. The benefits
    include more efficient sharing of data through a local virtual network,
    fault containment (a software failure in one virtual machine does not
    bring down the others, because the monitor isolates them), the ability
    to run specialized operating systems that confer performance advantages
    side by side with commodity operating systems, and the ability to make
    effective use of new hardware for which operating system support is not
    yet built in (e.g., NUMA systems). The disadvantages are a certain
    amount of memory overhead and a loss of performance. The performance
    situation here is analogous to the one for user-level thread packages:
    not enough information passes from the higher level down to the
    underlying system for the system to schedule the higher layer
    effectively on the available resources (e.g., the problem of an OS
    using up valuable CPU cycles by spinning). Likewise, the higher-level
    system has no logic to account for the thrashing of resources that may
    occur when other systems share the same VMM (i.e., the problem of
    making effective resource-management policy decisions).

     

    All the resources are virtualized in the system. The VMM schedules the
    CPU among the different virtual machines and emulates the execution of
    privileged instructions. Disco adds a new level of translation between
    the real machine memory and the "physical memory" that each virtual
    machine sees and then maps into its own virtual memory. Disco maintains
    its own physical-to-machine address translation and flushes the TLB
    when switching virtual processors. This turns out to be a major source
    of overhead, since TLB misses become both more frequent and more
    expensive to service (the paper softens the cost with a second-level
    software TLB). On NUMA hardware, Disco hides the underlying architecture
    so that it looks like a number of SMPs. An interesting aspect of the
    implementation is that pages heavily accessed by a remote CPU are either
    migrated to that CPU's local memory or, if they are read-shared by
    several CPUs, replicated; this makes effective use of the architecture
    even for operating systems that have no concept of NUMA at all.
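
    To make the extra translation layer concrete, here is a minimal Python
    sketch, not Disco's actual code. A guest page table maps guest-virtual
    pages to guest-"physical" pages, a per-VM table (the paper calls it the
    pmap) maps those to machine pages, and the monitor caches the composed
    virtual-to-machine result in a TLB that it flushes whenever it switches
    to a different VM. The class and method names, the 4 KB page size, and
    the dict-based TLB are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class VirtualMachine:
    """Hypothetical model of one VM's two layers of translation."""

    def __init__(self, guest_page_table, pmap):
        # Maintained by the guest OS: virtual page -> guest "physical" page.
        self.guest_page_table = guest_page_table
        # Maintained by the monitor: guest physical page -> machine page.
        self.pmap = pmap


class Monitor:
    """Hypothetical monitor with a TLB caching virtual -> machine pages."""

    def __init__(self):
        self.tlb = {}          # virtual page -> machine page, current VM only
        self.current_vm = None
        self.tlb_misses = 0

    def switch_to(self, vm):
        # Cached entries describe the previous VM's address space,
        # so they must all be discarded on a switch.
        if vm is not self.current_vm:
            self.tlb.clear()
            self.current_vm = vm

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.tlb:
            # Miss: compose the guest's mapping with the monitor's pmap.
            self.tlb_misses += 1
            ppn = self.current_vm.guest_page_table[vpn]
            self.tlb[vpn] = self.current_vm.pmap[ppn]
        return self.tlb[vpn] * PAGE_SIZE + offset


# Both VMs use guest physical page 5, but the monitor backs it with
# different machine pages, which keeps the VMs isolated.
vm_a = VirtualMachine(guest_page_table={0: 5}, pmap={5: 42})
vm_b = VirtualMachine(guest_page_table={0: 5}, pmap={5: 7})
monitor = Monitor()
monitor.switch_to(vm_a)
addr_a = monitor.translate(100)
monitor.switch_to(vm_b)
addr_b = monitor.translate(100)
monitor.switch_to(vm_a)
monitor.translate(100)  # misses again: the switch flushed vm_a's entry
print(addr_a, addr_b, monitor.tlb_misses)  # 172132 28772 3
```

    The third lookup repeats the first one but still misses, which is the
    cost of flushing on every virtual processor switch described above.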

     

    Virtual I/O intercepts all accesses to I/O devices, thereby multiplexing
    the underlying devices. A couple of interesting cases arise in which
    Disco can actually improve sharing among VMs. The first is the use
    of copy-on-write disks, in which a disk page is mapped to a single
    location in machine memory even though multiple VMs may be holding
    references to it. This is especially useful for kernel and application
    images, which do not require each virtual machine to maintain its own
    copy. When a VM writes to such a page, the VMM makes a private copy of
    the page for that virtual machine. Another interesting case is the
    virtualized network. High-speed sharing among different operating
    systems happens through a virtual Ethernet in which a transfer is simply
    a memory operation from one VM to another (Disco remaps pages instead of
    copying them where it can). This virtualization (together with page
    replication and migration) is the key to building highly scalable
    systems on NUMA machines using commodity operating systems that may be
    oblivious to the underlying memory architecture. The commodity
    operating system only needs to know how to act as a member of a cluster
    of machines.
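
    The copy-on-write disk behavior can be sketched as follows. This is a
    hypothetical Python model, not Disco's implementation: a cache maps disk
    block numbers to machine pages, a read of an already-cached block just
    maps the existing machine page read-only into the requesting VM, and
    only a write gives the writer a private copy. The names
    (SharedDiskCache, read_block, write_block) and the bytearray page
    representation are assumptions, and the sketch keeps modified data only
    in memory rather than modeling writes back to disk.

```python
class SharedDiskCache:
    """Hypothetical model of copy-on-write sharing of disk blocks across VMs."""

    def __init__(self, disk):
        self.disk = disk          # block number -> bytes (shared disk image)
        self.machine_pages = []   # machine memory: one bytearray per page
        self.block_to_page = {}   # block number -> shared machine page index
        self.mappings = {}        # (vm, block) -> (machine page index, writable)

    def _alloc(self, data):
        # Allocate a new machine page holding a copy of data.
        self.machine_pages.append(bytearray(data))
        return len(self.machine_pages) - 1

    def read_block(self, vm, block):
        if (vm, block) in self.mappings:
            page, _ = self.mappings[(vm, block)]
        else:
            # The first reader loads the block; later readers share its page.
            if block not in self.block_to_page:
                self.block_to_page[block] = self._alloc(self.disk[block])
            page = self.block_to_page[block]
            self.mappings[(vm, block)] = (page, False)  # mapped read-only
        return bytes(self.machine_pages[page])

    def write_block(self, vm, block, offset, data):
        self.read_block(vm, block)  # ensure the block is mapped for this VM
        page, writable = self.mappings[(vm, block)]
        if not writable:
            # Copy-on-write fault: give this VM its own private copy.
            page = self._alloc(self.machine_pages[page])
            self.mappings[(vm, block)] = (page, True)
        self.machine_pages[page][offset:offset + len(data)] = data


cache = SharedDiskCache({0: b"kernel"})
cache.read_block("vm1", 0)
cache.read_block("vm2", 0)
print(len(cache.machine_pages))    # 1: both VMs share one machine page
cache.write_block("vm2", 0, 0, b"K")
print(cache.read_block("vm1", 0))  # b'kernel'
print(cache.read_block("vm2", 0))  # b'Kernel'
print(len(cache.machine_pages))    # 2: vm2 now has a private copy
```

    Readers never pay for a copy; memory grows only for the pages a VM
    actually modifies, which is why shared kernel and application images
    are such a good fit.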

     

    My big question for this paper is, "How can I/O multiplexing possibly
    work?" In the case of multiplexing the keyboard, if the first character
    goes to the first VM, the second goes to the second VM, etc., how can
    this be correct (it isn't), and what am I missing from this picture?
