% intro
# today's plan
course syllabus
what's an operating system
Mars code
# staff
Xi Wang
Adriana Szekeres
everyone in this room
# course overview
30% paper reading & discussions
30% programming labs
40% research projects
work in groups of 2-3
no exams
no late days
# paper reading & discussions
focus on one paper per class
two presentation slots
post questions/comments before each class
# programming labs
week 3-4: node.js-like web server
week 5-6: logging & replay
open-ended
# research projects
week 2: proposal
week 7: midterm report
week 10: demo & final report
research you are already working on, or a project that leverages multiple classes
examples:
[Altair](http://pdos.csail.mit.edu/~xi/papers/altair-fse09.pdf),
[LXFI](http://pdos.csail.mit.edu/papers/lxfi:sosp11.pdf),
[more](http://courses.cs.washington.edu/courses/cse551/15sp/labs/proposal.html)
* * * * * *
# what is an (operating) system
"A set of interconnected components that has an _expected behavior_
observed at the _interface_ with its environment"
-- Principles of Computer System Design, Saltzer & Kaashoek
. . .
examples: Linux, browser, geo-replicated systems
# why is building systems hard?
complexity:
[millions of lines of code](http://www.informationisbeautiful.net/visualizations/million-lines-of-code/)
mitigation: modularity & abstraction
# example: system calls
enforce modularity: user space ↔ kernel
programming abstraction: file descriptor
expected behavior?
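A minimal sketch of the file descriptor abstraction, assuming a POSIX system: the same `write` call works whether the descriptor names a pipe, a file, or a device. The helpers `send_all` and `pipe_roundtrip` are hypothetical names invented for this illustration, and the round trip assumes the message fits in the pipe buffer.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the whole message to fd. The caller does not
 * need to know what kind of object the descriptor refers to. */
static ssize_t send_all(int fd, const char *msg) {
    size_t len = strlen(msg), done = 0;
    while (done < len) {
        ssize_t n = write(fd, msg + done, len - done);
        if (n <= 0)
            return -1;              /* error, or no progress */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Hypothetical helper: push msg through a pipe and read it back into out.
 * Assumes msg is short enough to fit in the pipe buffer. */
static int pipe_roundtrip(const char *msg, char *out, size_t cap) {
    int fds[2];
    assert(cap > 0);
    if (pipe(fds) != 0)
        return -1;
    ssize_t sent = send_all(fds[1], msg);
    close(fds[1]);
    ssize_t got = (sent < 0) ? -1 : read(fds[0], out, cap - 1);
    close(fds[0]);
    if (got < 0)
        return -1;
    out[got] = '\0';
    return 0;
}
```

`send_all` can be handed a descriptor from `open("/dev/null", O_WRONLY)` or from `pipe` without any change, which is the point of the abstraction.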
# AMD64 ABI
- The kernel interface uses `%rdi`, `%rsi`, `%rdx`, `%r10`, `%r8` and `%r9`
- A system-call is done via the `syscall` instruction. The kernel destroys
registers `%rcx` and `%r11`
- The number of the syscall has to be passed in register `%rax`
- System-calls are limited to six arguments, no argument is passed directly on
the stack
- Returning from the syscall, register `%rax` contains the result of the
system-call. A value in the range between `-4095` and `-1` indicates an error,
it is `-errno`
# hello world
```c
#include <stddef.h>       /* size_t */
#include <sys/syscall.h>  /* __NR_write, __NR_exit */
void _start(void) {
int fd = 1;
char buf[] = "hello world!\n";
size_t count = sizeof(buf) - 1;
asm volatile ("syscall"
: /* ignore output */
: "a"(__NR_write), "D"(fd), "S"(buf), "d"(count)
: "cc", "rcx", "r11", "memory"
);
asm volatile ("syscall"
: /* no output */
: "a"(__NR_exit), "D"(0)
);
}
```
"`gcc -nostdlib`"
# specifications: more than ABI
- `write(fd, buf, count)`
- what if the machine crashes during/after the system call?
- `open(pathname, flags, ...)`
- "The file descriptor returned by a successful call will be the
lowest-numbered file descriptor not currently open for the process."
- `openat(fd, pathname, flags, ...)`
- is the file system the right abstraction for your application?
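The lowest-numbered-descriptor rule quoted above can be observed directly. This is a sketch assuming a single-threaded POSIX process; `lowest_fd_is_reused` is a hypothetical helper name. In a multithreaded program another thread could grab the freed number first, which is one reason this part of the specification matters.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a descriptor number that was just
 * closed is handed out again by the next open(), 0 if not, -1 on error. */
static int lowest_fd_is_reused(const char *path) {
    int first = open(path, O_RDONLY);
    if (first < 0)
        return -1;
    int second = open(path, O_RDONLY);
    if (second < 0) {
        close(first);
        return -1;
    }
    /* Follows from the same rule: first was the lowest free number. */
    assert(first < second);

    close(first);                       /* free the lower-numbered slot */
    int third = open(path, O_RDONLY);   /* should land in that slot */
    int reused = (third < 0) ? -1 : (third == first);

    if (third >= 0)
        close(third);
    close(second);
    return reused;
}
```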
# this quarter
focus on interfaces
often under-specified
implications: scalability, performance, security, ...
see [schedule](http://courses.cs.washington.edu/courses/cse551/15sp/) for details
# optional reading: Mars Code
general principles
system-tool-language co-design
# challenges
hard to test: how to recreate the environment
serious consequences: reputation, funding, ...
do they apply to general systems?
how to mitigate?
# mitigation overview
replication
coding discipline
code-review process
formal verification
# replication
hardware: dual-CPU
software: two different implementations
dual-CPU boot control algorithm: formal verification
# coding discipline
tool checkable
developer certification
6 levels
# 6 levels
- LOC-1: language compliance
- C99, pass compiler/checker
- LOC-2: predictable execution
- loops: verifiable upper bounds
- LOC-3: defensive coding
- enough assertions: at least one within 10 loc, or 2%
- assertions stay enabled at runtime: a failure leads to a defined safe mode
- LOC-4: code clarity
- restrict use of the preprocessor & pointers
- LOC-5: safety-critical
- LOC-6: human-related
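A rough sketch of what LOC-2 and LOC-3 might look like in C. Everything here is an assumption made for illustration: `MAX_ITEMS`, `CHECK`, `enter_safe_mode`, and `sum_readings` are hypothetical names, and the real flight-software rules are more detailed than this.

```c
#include <assert.h>   /* only for contrast: assert() vanishes under -DNDEBUG */
#include <stddef.h>

#define MAX_ITEMS 64  /* assumed static bound on the input size */

static int safe_mode = 0;  /* hypothetical flag standing in for a real safe mode */

static void enter_safe_mode(void) { safe_mode = 1; }

/* LOC-3 sketch: an assertion that is never compiled out. On failure it
 * enters safe mode and makes the enclosing int-returning function fail. */
#define CHECK(cond) \
    do { if (!(cond)) { enter_safe_mode(); return -1; } } while (0)

/* Sum n readings into *out. Returns 0 on success, -1 if a check fails. */
static int sum_readings(const int *readings, size_t n, long *out) {
    CHECK(readings != NULL);
    CHECK(out != NULL);
    CHECK(n <= MAX_ITEMS);

    long total = 0;
    /* LOC-2 sketch: the loop cannot run more than MAX_ITEMS times, a bound
     * a tool can verify without knowing n. */
    for (size_t i = 0; i < MAX_ITEMS && i < n; i++)
        total += readings[i];

    *out = total;
    return 0;
}
```

The constant bound in the loop condition is redundant given the `CHECK` on `n`, but it keeps the upper bound checkable locally.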
# code-review process
tool-based: Coverity, CodeSonar, Semmle, Uno
"surprisingly little overlap in the output from the various tools"
Scrub: unified interface
owner responses: agree/disagree/discuss
%      response
-----  -----------------------------------
~84%   led to code changes
12.3%  disagree (33% overruled eventually)
6.4%   discuss (60% changed)
2008-2012: 145 code reviews
10,000 peer comments/30,000 tool reports
# formal verification
high cost: critical code only
bounded model checking using SPIN
[multithreaded code](http://spinroot.com/dcas/)
dual-CPU boot control algorithm
flash file system: model checking after every code change
data-management subsystem: 45K lines of C → 1.6K-line SPIN model, built manually
# summary
end-to-end goals
system-tool-language co-design: Linux? Google?