Lecture: OS organization

class structure

prereq: CSE 451 (ugrad OS) or equivalent
- basic OS concepts
- C and assembly
goals
- OS abstractions and ideas
- low-level systems: code/manual & debugging
structure
- lectures: review basic concepts & discuss more recent ideas
- labs: based on xv6, a Unix-style teaching OS on RISC-V
- project: form groups of 2-4
- no exams
see the course website for grades & policies
the first xv6 lab: due this week
systems research at UW

overview

what’s an OS / why do we need an OS
- a reusable library
- a set of programming abstractions
- a mediator between applications/hardware
- concerns: safety, performance, scalability
the computer
- examples: IBM T42, Raspberry Pi, HiFive Unleashed
- abstract machine model: CPU, Memory, and I/O
  - CPU interprets instructions: the IP (instruction pointer; "pc" on RISC-V) points to the next instruction in memory
  - memory stores instructions and data
  - I/O to external world: e.g., memory-mapped I/O (MMIO)
- in this class we use QEMU to emulate a RISC-V computer
how does printing “hello” work on this computer?
- example: a tiny hello “kernel” (tar and source code)
- run make qemu
  - UART (serial port) at physical memory address 0x1000_0000
  - perhaps the lowest-level “hello world”
- talk to devices through memory address space
  - example: chapter 4 (memory map) of the SiFive U74 manual
  - will see more examples in later labs
- what if we have two applications trying to print
  - how to avoid interleaved output - isolation & protection
  - how to print to a different device - abstraction (e.g., files)
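
The MMIO path above can be sketched in C as follows. This is a minimal sketch, not the code from the course tarball. It assumes a 16550-compatible UART (as QEMU's RISC-V "virt" machine provides), with the transmit register at offset 0 and the line status register at offset 5. The names uart_putc and uart_puts and the base-pointer parameter are illustrative.

```c
#include <stdint.h>

/* Assumed 16550 register layout (byte offsets from the UART base). */
#define UART_THR      0     /* transmit holding register */
#define UART_LSR      5     /* line status register */
#define UART_LSR_THRE 0x20  /* set when THR can accept another byte */

/* Send one byte by storing it to the memory-mapped transmit register. */
void uart_putc(volatile uint8_t *base, char c)
{
    while ((base[UART_LSR] & UART_LSR_THRE) == 0)
        ;  /* busy-wait until the transmitter is ready */
    base[UART_THR] = (uint8_t)c;
}

/* Send a NUL-terminated string, one byte at a time. */
void uart_puts(volatile uint8_t *base, const char *s)
{
    while (*s)
        uart_putc(base, *s++);
}
```

On the emulated machine the call would be uart_puts((volatile uint8_t *)0x10000000, "hello\n"). Passing the base as a parameter is only a convenience so the same code can be tried on a host against an ordinary byte array.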
a brief history
- Kirk McKusick - A Narrative History of BSD (play 2:22–9:50)
- Unix history diagram
- what’s the OS on your phone/laptop/…?

programming interface

recap: print to screen via MMIO (0x1000_0000)
history: punch cards - see Holzmann’s To Code Is Human
Unix system calls & files: we will use Linux as an example, via strace

example: hello

$ strace -f -- echo hello

output:

execve("/usr/bin/echo", ["echo", "hello"], [/* 43 vars */]) = 0
...
write(1, "hello\n", 6)                  = 6
...
exit_group(0)                           = ?

system calls: execve (first), write, exit_group (last)
what’s the first argument of write
what’s the first argument of exit_group
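
For comparison with the trace, here is a minimal C sketch that produces the same write call. The function name say_hello is illustrative, and the real echo binary does more work than this.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Write "hello\n" to the given file descriptor with a single write(2) call.
 * Returns the number of bytes written, or -1 on error. */
ssize_t say_hello(int fd)
{
    static const char msg[] = "hello\n";
    return write(fd, msg, sizeof msg - 1);  /* 6 bytes, no trailing NUL */
}
```

Calling say_hello(STDOUT_FILENO) from main and running the program under strace should show a write line like the one above.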

example: uptime

$ strace -- uptime

output:

execve("/usr/bin/uptime", ["uptime"], [/* 43 vars */]) = 0
...
open("/proc/uptime", O_RDONLY)          = 3
lseek(3, 0, SEEK_SET)                   = 0
read(3, "8827716.99 421503180.64\n", 8191) = 24
...
write(1, " 15:37:50 up 102 days,  4:08,  8"..., 72) = 72
...
exit_group(0)                                          = ?

system calls: open, lseek, read, …
file: /proc/uptime - is this a real file on disk? (source code)
you’ll implement a simpler version in lab fs
other pseudo files: e.g., procfs
- cat /proc/cpuinfo
- cat /proc/iomem
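
The open/read sequence in the trace can be sketched as below. This is an assumption-laden simplification of what uptime does (no lseek, a single read). The helper name read_once is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Open `path` read-only, read up to `cap` bytes into `buf`, then close it.
 * Returns the byte count from read(2), or -1 if open or read fails. */
ssize_t read_once(const char *path, char *buf, size_t cap)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, cap);
    close(fd);
    return n;
}
```

On Linux, read_once("/proc/uptime", buf, sizeof buf) fills buf with the two numbers seen in the trace. The same function works unchanged on an ordinary disk file, which is the point of the pseudo-file design.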

example: sh and uptime

$ strace -f -- sh -c uptime

output:

execve("/bin/sh", ["sh", "-c", "uptime"], [/* 43 vars */]) = 0
...
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f2813f27850) = 19224
strace: Process 19224 attached
[pid 19223] wait4(-1,  <unfinished ...>
[pid 19224] execve("/usr/bin/uptime", ["uptime"], /* 43 vars */) = 0
...
[pid 19224] exit_group(0)               = ?
[pid 19224] +++ exited with 0 +++
<... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 19224
...
exit_group(0)                           = ?

system calls: clone (old days: fork), wait4
- what do they do
- how do they communicate
child vs parent processes: what is copied/shared
if you don’t see clone (possibly with old bash), try:

$ strace -f -- sh -c "uptime && true"
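
The clone/execve/wait4 pattern in the trace roughly corresponds to the following C sketch. It is not how sh is implemented. The helper name run_and_wait is hypothetical, and it uses the portable fork/execvp/waitpid calls rather than clone and wait4 directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] in a child process and wait for it to finish.
 * Returns the child's exit status, or -1 on failure or abnormal exit. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();             /* typically appears as clone in strace */
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);      /* replace the child's program image */
        _exit(127);                 /* reached only if exec failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)   /* parent blocks until the child exits */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With char *argv[] = {"uptime", NULL}, calling run_and_wait(argv) under strace -f should show a child process that execs uptime while the parent waits.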

exercises

draw the system calls for the following:

$ uptime > file
$ uptime | tr '[a-z]' '[A-Z]'

hints: try strace -f -- sh -c "<cmd>".

summary

system calls (and “files”)
- why do we need them at all - just library functions? isolation & protection
- what is the “file” abstraction
  - as long as you can open/read/write/close
  - can support different “backends”: terminals, disk files, pipes, sockets
guess how strace is implemented
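
To make the "different backends" point concrete, here is a small sketch of a copy loop that only uses read and write. The name copy_fd is illustrative. Nothing in it knows whether the descriptors refer to a terminal, a disk file, a pipe, or a socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Copy everything from descriptor `in` to descriptor `out`.
 * Returns the total number of bytes copied, or -1 on error. */
ssize_t copy_fd(int in, int out)
{
    char buf[512];
    ssize_t n, total = 0;

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {           /* write may accept fewer bytes than asked */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

copy_fd(STDIN_FILENO, STDOUT_FILENO) behaves like a bare-bones cat, whatever the shell has connected to those two descriptors.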

isolation

goal: isolation & sharing
- a process should not be able to
  - corrupt the memory of the kernel or another process
  - consume all the CPU time
- while still being able to communicate with other processes & hardware
  - otherwise, just use a separate machine for each application
real-world example: how do we manage a large building like Gates
- partitioning (who can access which rooms)
- virtualization (“Bezos Seminar” for Gates G04)
memory
- keys
  - split memory into chunks and assign a “key” to each chunk
  - each process holds one or more keys
  - check (who? when?) if the process has the key to access the memory chunk
  - examples: IBM S/360, Intel’s MPK (memory protection keys), RISC-V PMP (physical memory protection)
- virtual memory
  - suppose process A writes some value to a memory address
    - can process B read the value from the same address?
    - can process B overwrite the value in A? why not?
  - naming issue
    - hide physical memory addresses from programs
    - name memory using <address space, virtual address> instead of a physical address
  - isolate two processes
    - need some virtual-to-physical address translation (by what?)
    - use a safe language: e.g., write process in Java (any other example?)
    - virtual machine: e.g., can xv6 in QEMU harm your laptop?
    - hardware: MMU - segfaults if violated
  - isolate kernel memory from processes
    - take a look at kernel/memlayout.h in xv6
    - can a process write to kernel? who checks?
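
The process A / process B question above can be tried from user space. This is a minimal sketch, assuming a Unix-like system with fork. The function name is made up. After fork, parent and child use the same virtual address for x, but each has its own copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the value the parent sees in x after the child has overwritten
 * its own copy at the same virtual address; -1 on failure. */
int value_after_child_write(void)
{
    volatile int x = 1;     /* same virtual address in parent and child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        x = 2;              /* changes the child's private copy only */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return x;               /* the parent still sees 1 */
}
```

The result is 1: the child's store went to a different physical location, even though both processes name it by the same virtual address.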
time
- cooperative scheduling
  - each process voluntarily gives back the control (to whom?)
  - what if they don’t
- virtual CPUs
  - each process thinks it has a dedicated CPU
  - kernel runs processes in turn on physical CPUs: save/restore state
    - example: take control back from a process that spins forever
    - timer interrupt & preempt
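
A user-space analogy for "take control back from a process that spins forever" is sketched below. It is not the kernel mechanism: a POSIX interval timer and a SIGALRM handler stand in for the hardware timer interrupt, and the names on_tick and spin_until_tick are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <string.h>
#include <sys/time.h>

static volatile sig_atomic_t ticked;

/* Runs asynchronously when the timer fires, interrupting the spin loop. */
static void on_tick(int signo)
{
    (void)signo;
    ticked = 1;
}

/* Arm a one-shot timer for `ms` milliseconds, then spin without yielding.
 * Returns 1 once the handler has interrupted the loop, -1 on setup failure. */
int spin_until_tick(unsigned ms)
{
    if (ms == 0)
        return -1;          /* a zero value would disarm the timer */

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_tick;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) < 0)
        return -1;

    struct itimerval tv;
    memset(&tv, 0, sizeof tv);
    tv.it_value.tv_sec = ms / 1000;
    tv.it_value.tv_usec = (ms % 1000) * 1000;
    ticked = 0;
    if (setitimer(ITIMER_REAL, &tv, NULL) < 0)
        return -1;

    while (!ticked)
        ;                   /* the loop never gives up control voluntarily */
    return 1;
}
```

The loop has no yield point, so under cooperative scheduling it would run forever. It terminates only because something outside the loop's control interrupts it, which is the role the timer interrupt plays for the kernel.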
many plans to achieve isolation
- this class focuses on isolation through kernel/user split
- will talk about other techniques in the second half of this quarter (think how Java/QEMU work)

kernel-user split

CPU support: kernel/user mode flag (ring)
- RISC-V privilege modes: see the priv spec, chapter 1
  - M (machine), S (supervisor), U (user)
  - draft: hypervisor modes
  - x86/arm have similar modes
- the kernel runs in S-mode & can execute privileged instructions
  - examples: change the address translation map
  - including changing the current privilege mode
- user processes run in U-mode and are unprivileged
how do unprivileged processes work
- they cannot directly access files, network, etc.
- the kernel can
- can they simply jump into a kernel address? no - what prevents them from doing so?
- system calls: controlled transfer
  - user → kernel: ecall instruction
  - kernel → user: sret instruction
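
From a user program's point of view, the controlled transfer looks like a function call that names a system call by number. The sketch below is Linux-specific and uses the libc syscall() function, which loads the number and arguments into registers and executes the trap instruction (ecall on RISC-V). The wrapper name raw_write is illustrative.

```c
#define _DEFAULT_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Invoke the write system call by number, bypassing the libc write() wrapper.
 * Returns the kernel's result: bytes written, or -1 with errno set. */
long raw_write(int fd, const void *buf, unsigned long len)
{
    return syscall(SYS_write, fd, buf, len);
}
```

The process chooses only the system call number and the arguments. Which kernel code runs, and at what address, is decided by the kernel's trap handler, not by the caller.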
what to put below/above the system call interface: a comparison
- monolithic kernel
  - big kernel (including file systems, network, etc.)
  - easy to develop applications
  - hard to isolate kernel components
- microkernel
  - kernel + user-space servers (e.g., file system, network)
  - applications talk to servers via IPCs
  - performance issues: IPCs may be slow
- exokernel (library OS)
  - end-to-end arguments
  - kernel: expose low-level abstractions to applications
  - applications can often do a better job
- most real-world kernels are mixed: Linux, macOS, Windows
- debate: example