% intro

# today's plan

course syllabus

what's an operating system

Mars Code

# staff

Xi Wang

Adriana Szekeres

everyone in this room

# course overview

30% paper reading & discussions

30% programming labs

40% research projects

work in groups of 2-3

no exams

no late days

# paper reading & discussions

focus on one paper per class

two presentation slots

post questions/comments before each class

# programming labs

week 3-4: node.js-like web server

week 5-6: logging & replay

open-ended

# research projects

week 2: proposal

week 7: midterm report

week 10: demo & final report

research you are already working on, or a project that leverages multiple classes

examples: [Altair](http://pdos.csail.mit.edu/~xi/papers/altair-fse09.pdf), [LXFI](http://pdos.csail.mit.edu/papers/lxfi:sosp11.pdf), [more](http://courses.cs.washington.edu/courses/cse551/15sp/labs/proposal.html)

* * * * * *

# what is an (operating) system

"A set of interconnected components that has an _expected behavior_ observed at the _interface_ with its environment"
-- Principles of Computer System Design, Saltzer & Kaashoek

. . .

examples: Linux, browser, geo-replicated systems

# why is building systems hard?

complexity: [millions of lines of code](http://www.informationisbeautiful.net/visualizations/million-lines-of-code/)

mitigation: modularity & abstraction

# example: system calls

enforce modularity: user space ↔ kernel

programming abstraction: file descriptor

expected behavior?

# AMD64 ABI

- The kernel interface uses `%rdi`, `%rsi`, `%rdx`, `%r10`, `%r8` and `%r9`.
- A system-call is done via the `syscall` instruction. The kernel destroys registers `%rcx` and `%r11`.
- The number of the syscall has to be passed in register `%rax`.
- System-calls are limited to six arguments, no argument is passed directly on the stack.
- Returning from the syscall, register `%rax` contains the result of the system-call.
  A value in the range between `-4095` and `-1` indicates an error, it is `-errno`.

# hello world

```c
#include <stddef.h>       /* size_t */
#include <sys/syscall.h>  /* __NR_write, __NR_exit */

void _start(void)
{
    int fd = 1;                      /* stdout */
    char buf[] = "hello world!\n";
    size_t count = sizeof(buf) - 1;  /* exclude the trailing NUL */
    long ret = __NR_write;           /* syscall number in, result out */

    /* write(fd, buf, count); the kernel overwrites %rax with the result */
    asm volatile ("syscall"
        : "+a"(ret) /* result ignored */
        : "D"(fd), "S"(buf), "d"(count)
        : "cc", "rcx", "r11", "memory"
    );

    /* exit(0); does not return */
    asm volatile ("syscall"
        : /* no output */
        : "a"(__NR_exit), "D"(0)
    );
}
```

"`gcc -nostdlib`"

# specifications: more than ABI

- `write(fd, buf, count)`
    - what if the machine crashes during/after the system call?
- `open(pathname, flags, ...)`
    - "The file descriptor returned by a successful call will be the lowest-numbered file descriptor not currently open for the process."
- `openat(fd, pathname, flags, ...)`
- is the file system the right abstraction for your application?

# this quarter

focus on interfaces

often under-specified

implications: scalability, performance, security, ...

see [schedule](http://courses.cs.washington.edu/courses/cse551/15sp/) for details

# optional reading: Mars Code

general principles

system-tool-language co-design

# challenges

hard to test: how to recreate the environment

serious consequences: reputation, funding, ...

do they apply to general systems? how to mitigate?

# mitigation overview

replication

coding discipline

code-review process

formal verification

# replication

hardware: dual-CPU

software: two different implementations

dual-CPU boot control algorithm: formal verification

# coding discipline

tool checkable

developer certification

6 levels

# 6 levels

- LOC-1: language compliance
    - C99, pass compiler/checker
- LOC-2: predictable execution
    - loops: verifiable upper bounds
- LOC-3: defensive coding
    - enough assertions: at least one within 10 lines of code, or 2%
    - still enabled at runtime: define safe mode
- LOC-4: code clarity
    - restrict use of preprocessors & pointers
- LOC-5: safety-critical
- LOC-6: human-rated

# code-review process

tool-based: Coverity, CodeSonar, Semmle, Uno

"surprisingly little overlap in the output from the various tools"

Scrub: unified interface

owner responses: agree/disagree/discuss

%     response
----- -----------------------------------
~84%  led to code changes
12.3% disagree (33% overruled eventually)
6.4%  discuss (60% changed)

2008-2012: 145 code reviews

10,000 peer comments/30,000 tool reports

# formal verification

high cost: critical code only

bounded model checking using SPIN

[multithreaded code](http://spinroot.com/dcas/)

dual-CPU boot control algorithm

flash file system: model checking after every code change

data-management subsystem: 45K lines of C → 1.6K lines of SPIN, translated manually

# summary

end-to-end goals

system-tool-language co-design: Linux? Google?
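
# example: coding discipline in C

A minimal sketch of what the LOC-2 and LOC-3 rules from the "6 levels" slide might look like in practice. It is not taken from the Mars Code paper: `MAX_ITEMS`, `CHECK`, and `bounded_sum` are hypothetical names chosen for illustration, and returning `false` stands in for whatever "safe mode" a real system defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MAX_ITEMS 64  /* hypothetical fixed bound, chosen for illustration */

/* Hypothetical always-on assertion: logs the failure and evaluates to false
 * instead of aborting, so the caller can fall back to a safe path. */
#define CHECK(e) \
    ((e) ? true : (fprintf(stderr, "%s:%d: check '%s' failed\n", \
                           __FILE__, __LINE__, #e), false))

/* Sums the first n entries of items into *out; returns false on a failed check. */
bool bounded_sum(const int *items, size_t n, long *out)
{
    /* LOC-3 flavor: validate inputs instead of trusting the caller. */
    if (!CHECK(items != NULL) || !CHECK(out != NULL) || !CHECK(n <= MAX_ITEMS))
        return false;

    long sum = 0;
    size_t i;
    /* LOC-2 flavor: the loop bound includes a constant, so a checker can
     * see that the loop runs at most MAX_ITEMS times. */
    for (i = 0; i < MAX_ITEMS && i < n; i++)
        sum += items[i];

    assert(i == n);  /* debug-only invariant; compiled out under NDEBUG */
    *out = sum;
    return true;
}
```

Unlike `assert`, which disappears under `NDEBUG`, the `CHECK` macro stays in the shipped build, which is one way to read "still enabled at runtime". A caller would test the return value, e.g. `if (!bounded_sum(a, 3, &s)) { /* enter safe mode */ }`.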