Lecture: Concurrency and scheduling
preparation
- read the xv6 book: §5, Locking and §6, Scheduling
- take a look at lab lock
administrivia
overview
- multicore CPUs
- how to use multicore CPUs correctly and efficiently
- oftentimes, correctness and performance come down to this
- this class focuses on mechanisms: locks, threads, and scheduling
locks
- example: multi-threaded hash table (ph.c)
- why missing keys?
- data races: concurrent put()s all try to set table[0]->next; one winner, the others lose
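A minimal sketch of that race, modeled loosely on ph.c (the struct and names here are illustrative, not the handout's exact code):

```c
#include <stdlib.h>

#define NBUCKET 5

struct entry { int key, value; struct entry *next; };
struct entry *table[NBUCKET];        // one list head per bucket, shared by all threads

// Called concurrently from several threads with no lock.
void put(int key, int value) {
  int i = key % NBUCKET;
  struct entry *e = malloc(sizeof(*e));
  e->key = key;
  e->value = value;
  e->next = table[i];                // RACE: two threads can read the same head here...
  table[i] = e;                      // ...and both write it back; the slower store wins,
                                     // so the other thread's entry (and key) is lost
}
```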
- race detection using ThreadSanitizer
- how to fix the bug: use lock(s) to protect put()
- coarse-grained: one lock for the entire table
- fine-grained: per-bucket locks, or even per-entry locks
- trade-off: coarse is simpler, fine-grained allows more parallelism (see the sketch below)
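Continuing the sketch above, one fine-grained fix uses a pthread mutex per bucket (a coarse-grained version would wrap the same code in a single table-wide lock); this is a sketch, not the lab's reference solution:

```c
#include <pthread.h>

pthread_mutex_t bucket_lock[NBUCKET];    // init each with pthread_mutex_init() at startup

void put_locked(int key, int value) {
  int i = key % NBUCKET;
  struct entry *e = malloc(sizeof(*e));
  e->key = key;
  e->value = value;
  pthread_mutex_lock(&bucket_lock[i]);   // serialize updates to bucket i only;
  e->next = table[i];                    // puts to other buckets proceed in parallel
  table[i] = e;
  pthread_mutex_unlock(&bucket_lock[i]);
}
```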
- why __sync_fetch_and_add and __sync_bool_compare_and_swap for done?
- would done += 1; while (done < nthread); work?
- with gcc -O2: infinite loop - why? (see the sketch below)
- how about get()?
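A sketch of why the naive done barrier is broken and how the two builtins help: done += 1 is a non-atomic read-modify-write, and under gcc -O2 the compiler may assume done does not change inside the empty loop and hoist the load, so the loop spins forever. The builtins make the increment atomic and act as memory barriers. (Names and the exact fix are illustrative; ph.c's code may differ.)

```c
int done;                                   // shared counter of finished threads

void barrier_broken(int nthread) {
  done += 1;                                // load/add/store: concurrent increments can be lost
  while (done < nthread)                    // gcc -O2 may hoist this load out of the loop,
    ;                                       // so the thread keeps testing a stale value
}

int done2;

void barrier_fixed(int nthread) {
  __sync_fetch_and_add(&done2, 1);          // atomic increment + full memory barrier
  while (!__sync_bool_compare_and_swap(&done2, nthread, nthread))
    ;                                       // forces a fresh read of done2 each iteration
}
```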
- locks
- mutual exclusion: only one core can hold a given lock
- “serialize” critical section: hide intermediate state
- example: transfer money from account A to B: put(a + 100) and put(b - 100) must both take effect, or neither
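A sketch in pthread terms (the account array and lock here are made up for illustration): the lock hides the intermediate state in which the money has left one account but not yet arrived in the other.

```c
#include <pthread.h>

#define NACCT 16

int balance[NACCT];                                    // shared account balances
pthread_mutex_t xfer_lock = PTHREAD_MUTEX_INITIALIZER;

void transfer(int from, int to, int amount) {
  pthread_mutex_lock(&xfer_lock);    // both updates form one critical section:
  balance[from] -= amount;           // no other thread can observe the state
  balance[to]   += amount;           // between these two writes
  pthread_mutex_unlock(&xfer_lock);
}
```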
- lock implementation
- strawman: test the lock, then set it (two separate steps)
- hw: draw cores, caches, bus, RAM
- try it on ph.c: what can go wrong?
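The strawman spelled out in C (a sketch, not xv6 code): the test and the set are two separate memory operations, so two cores can both see the lock free before either marks it held.

```c
struct lock { int locked; };         // 0 = free, 1 = held

void acquire_broken(struct lock *l) {
  while (l->locked)                  // core A and core B can both read 0 here...
    ;
  l->locked = 1;                     // ...and then both store 1, so both enter the
}                                    // critical section at the same time

void release_broken(struct lock *l) {
  l->locked = 0;
}
```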
- atomic exchange: combine test and set into one atomic step
- gcc's __sync builtins
- __sync_lock_test_and_set(ptr, value): atomically write value to ptr and return the old value
- __sync_lock_release(ptr): write 0 to ptr
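Those two builtins are enough for a minimal spinlock (a sketch; xv6's acquire()/release() add interrupt handling, __sync_synchronize(), and debugging checks):

```c
struct spinlock { volatile unsigned int locked; };

void spin_acquire(struct spinlock *lk) {
  // Atomically write 1 and get the old value back; we hold the lock only if the
  // old value was 0, i.e. we were the one that flipped it from free to held.
  while (__sync_lock_test_and_set(&lk->locked, 1) != 0)
    ;
}

void spin_release(struct spinlock *lk) {
  __sync_lock_release(&lk->locked);  // write 0, with release semantics
}
```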
- show assembly code of acquire(): the xchg instruction
- if l->locked was 1, set it to 1 again & return 1
- if l->locked was 0, at most one xchg would see & return 0
- alternatives
- RISC-V: the amoswap.w instruction
- C11 atomics, assembly
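The same lock written against portable C11 atomics, as one of the alternatives above (a sketch):

```c
#include <stdatomic.h>

atomic_flag lk = ATOMIC_FLAG_INIT;

void c11_acquire(void) {
  // test-and-set is the C11 spelling of an atomic exchange on a one-bit flag;
  // acquire ordering keeps the critical section from being reordered before it.
  while (atomic_flag_test_and_set_explicit(&lk, memory_order_acquire))
    ;
}

void c11_release(void) {
  atomic_flag_clear_explicit(&lk, memory_order_release);
}
```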
- the problem is pushed down to hardware
- guess how xchg is implemented
- understand the performance overhead
- memory consistency models next week
- spinlocks in xv6
- show kernel/spinlock.h, kernel/spinlock.c
- what are push_off()/pop_off()? they turn interrupts off/on in the kernel (see the sketch after this block)
- see Figure 2: U54 Interrupt Architecture Block Diagram of the SiFive U-54 manual
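Why push_off()/pop_off() rather than plain intr_off()/intr_on(): acquires nest, so each CPU keeps a depth count and only restores interrupts when the outermost lock is released. A condensed sketch of the logic (see kernel/spinlock.c for the real code, which also panics on misuse):

```c
// Per-CPU state (struct cpu): noff = nesting depth of push_off(),
// intena = were interrupts enabled before the first push_off()?
void push_off(void) {
  int old = intr_get();              // interrupts currently enabled?
  intr_off();                        // always off while any spinlock is held
  if (mycpu()->noff == 0)
    mycpu()->intena = old;           // remember the state only at the outermost level
  mycpu()->noff += 1;
}

void pop_off(void) {
  struct cpu *c = mycpu();
  c->noff -= 1;
  if (c->noff == 0 && c->intena)
    intr_on();                       // re-enable only when the last lock is released
}
```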
threads & scheduling
- goal: virtualizing time
- thread = stack (state) + virtual CPU registers
- each thread thinks it has a dedicated CPU
- kernel runs each in turn on a physical CPU
- analogy: virtual memory vs. physical memory
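Concretely, the "virtual CPU registers" xv6 saves per kernel thread are just ra, sp, and the callee-saved registers; everything else lives on the thread's stack. A condensed copy of struct context (kernel/proc.h; uint64 is xv6's typedef for an unsigned 64-bit integer):

```c
// Saved registers for kernel context switches (swtch() saves/restores exactly these).
struct context {
  uint64 ra;                 // where swtch()'s ret will go in this thread
  uint64 sp;                 // this thread's kernel stack pointer
  // callee-saved registers; caller-saved ones are already on the stack per the C ABI
  uint64 s0, s1, s2, s3, s4, s5, s6, s7, s8, s9, s10, s11;
};
```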
- scheduling in xv6
- 1 user thread and 1 kernel thread per process
- 1 scheduler thread per processor
- locks to protect shared data structures and resources
- cooperative scheduling for kernel threads
- threads give up control by yielding
- two switches: thread 1 → scheduler → thread 2 (see the sketch below)
- scheduler() in kernel/proc.c
- swtch() in kernel/swtch.S
- what does swtch()'s ret return to?
- lab uthread
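A condensed sketch of the scheduler half of the two switches (see kernel/proc.c for the real code): a yielding thread calls sched(), which swtch()es to this CPU's scheduler context; scheduler() then picks another RUNNABLE process and swtch()es into it.

```c
// Per-CPU scheduler loop; it never returns.
void scheduler(void) {
  struct cpu *c = mycpu();
  for (;;) {
    intr_on();                               // allow device interrupts while idle
    for (struct proc *p = proc; p < &proc[NPROC]; p++) {
      acquire(&p->lock);
      if (p->state == RUNNABLE) {
        p->state = RUNNING;
        c->proc = p;
        swtch(&c->context, &p->context);     // run p; control comes back here when
        c->proc = 0;                         // p calls sched() and swtch()es away
      }
      release(&p->lock);
    }
  }
}
```

swtch() saves ra, sp, and the callee-saved registers into the old context and loads the new one, so its ret jumps to whatever return address the destination thread saved the last time it called swtch() - which answers the question above.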
- preemptive scheduling for user threads
- how to force a thread to give up control?
- per-processor timer interrupt (every 100 ms): user → kernel
- usertrap()/kerneltrap() (kernel/trap.c) → yield() (kernel/proc.c) → sched() (kernel/proc.c), condensed in the sketch below
- switch to a different thread, then kernel → user
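A condensed sketch of the preemption path in the trap handler (see kernel/trap.c for the real code; devintr() returning 2 means the trap was the timer interrupt):

```c
void usertrap(void) {
  // ... save user state, figure out why we trapped ...
  int which_dev = devintr();     // 2: this was the timer interrupt
  if (which_dev == 2)
    yield();                     // mark this process RUNNABLE and call sched(),
                                 // which swtch()es to the scheduler
  usertrapret();                 // eventually: restore user state, return to user space
}
```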
- now you have a complete picture of how the kernel works
- user space is running process p (say sh)
- p traps into the kernel on a timer interrupt (preemptive)
- the kernel switches from p to the scheduler (cooperative)
- the scheduler switches to q (say ls)
- q returns to user space and resumes execution
- scheduling policy
labs