% Singularity

# kernel verification week

ensure your program will behave as intended

today: system/language co-design

next: design for verification

# OS/PL co-design

"Historically, OS development and programming language development went hand-in-hand. Although C is the workhorse language of the systems community today, novel approaches to OS construction based on new programming language ideas continue to be an active and important area of research. The connection between OS development and programming languages is both significant and current."

---[PLOS workshop](http://plosworkshop.org/)

# what's the goal of Singularity?

safe extensions / kernel specialization

recall from last week: browser plug-ins, kernel modules

q: can we just use SFI?

q: can we use hw page table protection?

# approaches

consider examples & safety/performance/programmability [draw table]

- run extensions directly within host
- run extensions in userspace / separate process
- write extensions in a domain-specific language
- write extensions in a general, safe language
- write proofs: if you don't trust the "safe" compiler

# "safe" languages & systems

- Lisp machines, Ada
- Cyclone (safe C; an influence on Rust)
- Standard ML: FoxNet (SIGCOMM'94)
- Modula-3: SPIN (SOSP'95)
- Haskell: House (ICFP'05)
- OCaml: Mirage (ASPLOS'13)
- C# & Singularity

# Singularity project

long-term project at Microsoft Research

inspired many other projects: Helios (SOSP'09), Verve (PLDI'10), ExpressOS (ASPLOS'13), Ironclad (OSDI'14)

structure: microkernel, SIP, IPC

# radical design

single address space (no paging / segments)

extensions in a separate process via IPC

q: perf implications (e.g., syscall, IPC, context switching)?

q: benchmarks to back up their claim (Table 1 and Figure 5)?

q: why do they use linked stacks?

# verification in Singularity

recall compiler-verifier from SFI

- compile: source → bytecode (MSIL)
- install & verify: bytecode → machine code
- run: verified machine code

. . .

q: why bytecode? why verify at install time?

q: what's the verifier doing?

# SIP: Software-Isolated Process

sealed: no self-modifying code, JIT, or shared libraries (why?)

how to prevent SIPs from directly accessing each other?

are manifests/contracts useful?

# IPC

channels & endpoints: capabilities?

"exchange heap": shared memory for message payload

risks: lifetime, concurrent reads/writes

single pointer; no access after send (ensured by verifier)

# what's the TCB in Singularity?

compiler, verifier, GC?

> Since Bartok is a large and highly optimizing compiler, it is likely
> to contain bugs, and some of these bugs might cause the compiler
> to translate safe MSIL code into unsafe native code.
> ... each Singularity garbage collector is currently written as
> unsafe Sing# code, and bugs in this code could undermine Singularity’s
> security.

q: how to improve (or should we worry about these)?

# some OS verification work

- 197x-1980 UCLA Unix security kernel: "Specification and verification of the UCLA Unix security kernel", CACM, 1980
- 1973-1983 Provably Secure Operating System: "PSOS Revisited", ACSAC, 2003
- 198x Kit: "Kit: A Study in Operating System Verification", 1988
- Recent work: Verisoft, seL4 (next class)

# Kit

```
----------------------------------
high-level language
----------------------------------
assembly language
----------------------------------
machine code interpreter
----------------------------------
gate-level register-transfer model
----------------------------------
```

each level is compiled down to the next & verified

"verified": correspondence between the behavior of two FSMs, proved using the Boyer-Moore theorem prover (ACL2's precursor)

# example: Kit high-level language

```
procedure MULT (var ANS: int; I, J: int) =
begin
  var K: int := 0;
  K := J;
  ANS := 0;
  loop
    if K le 0 then leave end;
    ANS := ANS + I;
    K := K - 1;
  end;
end; {mult}
```

# example: Kit core-image

```
...
B00000000000011111000001001000001
B00000000000011111000000000100010
B00000000000011111000001001011011
B00000000000011111000001001011011
B00000000000011111000000010011000
B00000000000000000000000000000001
B00000000000000110000000010000010
B00000000000011111000001001101100
B00000000000011111000000010111011
B00000000000000010000000010100101
B00000000000011111000000010011000
B00000000000000000000010001001101
B00000000000011111000000010001100
B00000000000001110000000010000101
B00000000000011111000001001101100
B00000000000011111000000010011000
B00000000000000000000000000000000
B00000000000000110000000010000010
...
```

# what's the guarantee?

high level → assembly → machine code → gates

hardware-level behavior is equivalent to the high-level behavior

. . .

q: can buffer overflows happen at the low level?

q: what if there's a bug in assembly model/spec?

q: what's the TCB compared to Singularity?
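
# example: checking correspondence by testing

To make "correspondence between two levels" concrete, here is a minimal sketch. It is not Kit's method: Kit proved correspondence with the Boyer-Moore prover, whereas this only tests a finite range of inputs. The MULT loop from the Kit high-level example is written once as a Python function and once as a program for a tiny register machine. The instruction set, `MULT_PROGRAM`, `run_machine`, and `check_correspondence` are all hypothetical names invented for illustration.

```python
def mult_high(i, j):
    """High-level MULT: repeated addition, mirroring the Kit example."""
    ans = 0
    k = j
    while True:
        if k <= 0:
            break
        ans = ans + i
        k = k - 1
    return ans


# Hypothetical low-level program for a made-up register machine.
# Each instruction is a tuple: (opcode, operands...).
MULT_PROGRAM = [
    ("mov", "K", "J"),    # 0: K := J
    ("movi", "ANS", 0),   # 1: ANS := 0
    ("jle", "K", 6),      # 2: if K <= 0 goto 6
    ("add", "ANS", "I"),  # 3: ANS := ANS + I
    ("subi", "K", 1),     # 4: K := K - 1
    ("jmp", 2),           # 5: goto 2
    ("halt",),            # 6: stop
]


def run_machine(program, regs, max_steps=10_000):
    """Interpret the program as a simple state machine over (pc, regs)."""
    pc = 0
    for _ in range(max_steps):
        op, *args = program[pc]
        if op == "halt":
            return regs
        if op == "mov":
            regs[args[0]] = regs[args[1]]
        elif op == "movi":
            regs[args[0]] = args[1]
        elif op == "add":
            regs[args[0]] += regs[args[1]]
        elif op == "subi":
            regs[args[0]] -= args[1]
        elif op == "jle":
            if regs[args[0]] <= 0:
                pc = args[1]
                continue
        elif op == "jmp":
            pc = args[0]
            continue
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    raise RuntimeError("step budget exhausted")


def mult_low(i, j):
    """Run the low-level program and read the result register."""
    regs = {"I": i, "J": j, "K": 0, "ANS": 0}
    return run_machine(MULT_PROGRAM, regs)["ANS"]


def check_correspondence(lo=-5, hi=6):
    """Return every (i, j) in [lo, hi) where the two levels disagree."""
    return [(i, j)
            for i in range(lo, hi)
            for j in range(lo, hi)
            if mult_high(i, j) != mult_low(i, j)]


if __name__ == "__main__":
    print(check_correspondence())
```

Both levels agree that MULT(3, -2) is 0 rather than -6: correspondence only says the low level behaves like the high level, not that the high level does what you wanted.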