Lecture: locking
preparation
administrivia
- anonymous feedback: office hours
overview
- multicore CPUs
- how to use them correctly and efficiently
- often times things come down to this
- this class focuses on mechanisms: locks (today), threads, and scheduling
example: multi-threaded hash table
- review pthread from CSE 333
- ph.c (html): put(), get()
- recall 333 lab: divide work for parallel speedup
- why missing keys
- data races: concurrent put()s
- example: all try to set table[0]->next
- one winner, others lose
- detection using ThreadSanitizer
- how to fix the bug - use lock(s) to protect put()
- coarse-grained: one lock for the entire table
- fine-grained: per-bucket locks, or even per-entry
- trade-off
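The fine-grained option can be sketched as follows. This is a minimal sketch, not the actual ph.c: the table layout, NBUCKET, NTHREAD, and the helper run_puts_then_count_missing() are assumptions made for illustration. Each bucket has its own mutex, so put()s to different buckets do not wait on each other, while put()s to the same bucket are serialized.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NBUCKET 64   /* hypothetical sizes, not taken from ph.c */
#define NTHREAD 4

struct entry {
  int key;
  int value;
  struct entry *next;
};

static struct entry *table[NBUCKET];
static pthread_mutex_t locks[NBUCKET];   /* one lock per bucket */

void table_init(void) {
  for (int i = 0; i < NBUCKET; i++) {
    table[i] = NULL;
    pthread_mutex_init(&locks[i], NULL);
  }
}

/* Insert at the head of the bucket's list; the lock covers the
 * read of table[i] and the write that replaces it. */
void put(int key, int value) {
  int i = (unsigned)key % NBUCKET;
  struct entry *e = malloc(sizeof(*e));
  assert(e != NULL);
  e->key = key;
  e->value = value;
  pthread_mutex_lock(&locks[i]);
  e->next = table[i];
  table[i] = e;
  pthread_mutex_unlock(&locks[i]);
}

/* Locked here so it is safe even if it overlaps with put(). */
struct entry *get(int key) {
  int i = (unsigned)key % NBUCKET;
  pthread_mutex_lock(&locks[i]);
  struct entry *e = table[i];
  while (e != NULL && e->key != key)
    e = e->next;
  pthread_mutex_unlock(&locks[i]);
  return e;
}

struct range { int start; int count; };

void *put_worker(void *arg) {
  struct range *r = arg;
  for (int k = r->start; k < r->start + r->count; k++)
    put(k, k * 2);
  return NULL;
}

/* Each thread inserts its own range of keys; afterwards count
 * the keys that cannot be found. */
int run_puts_then_count_missing(int per_thread) {
  pthread_t tid[NTHREAD];
  struct range r[NTHREAD];
  table_init();
  for (int t = 0; t < NTHREAD; t++) {
    r[t].start = t * per_thread;
    r[t].count = per_thread;
    pthread_create(&tid[t], NULL, put_worker, &r[t]);
  }
  for (int t = 0; t < NTHREAD; t++)
    pthread_join(tid[t], NULL);
  int missing = 0;
  for (int k = 0; k < NTHREAD * per_thread; k++) {
    struct entry *e = get(k);
    if (e == NULL || e->value != k * 2)
      missing++;
  }
  return missing;
}
```

The coarse-grained variant would replace locks[] with a single mutex: simpler, but every put() then waits on every other put().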
- why atomic_fetch_add and atomic_load for done?
- would done += 1; while (done < nthread); work?
- gcc -O2: infinite loop - why?
- how about get()?
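On the done counter: with a plain int, done += 1 from several threads is a data race, and because nothing tells the compiler that another thread may change done, an optimizing build is free to load it once and spin on the stale value. A minimal sketch of the atomic version follows. The names barrier_wait(), ready[], seen[], and run_barrier_demo() are hypothetical, and the barrier is one-shot.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

#define NTHREAD 4

static atomic_int done;       /* threads that have reached the barrier */
static int ready[NTHREAD];    /* plain data written before the barrier */
static int seen[NTHREAD];     /* ready[] flags each thread saw after it */

/* One-shot barrier: announce arrival, then spin until all arrive.
 * atomic_load forces a fresh read of done on every iteration. */
void barrier_wait(int nthread) {
  assert(nthread > 0);
  atomic_fetch_add(&done, 1);
  while (atomic_load(&done) < nthread)
    ;  /* spin */
}

void *worker(void *arg) {
  int id = (int)(intptr_t)arg;
  ready[id] = 1;               /* work done before the barrier */
  barrier_wait(NTHREAD);
  int n = 0;
  for (int i = 0; i < NTHREAD; i++)
    n += ready[i];             /* every thread's write is visible here */
  seen[id] = n;
  return NULL;
}

/* Returns 1 if every thread saw all NTHREAD writes after the barrier. */
int run_barrier_demo(void) {
  pthread_t tid[NTHREAD];
  atomic_store(&done, 0);
  for (int i = 0; i < NTHREAD; i++) {
    ready[i] = 0;
    seen[i] = 0;
  }
  for (int t = 0; t < NTHREAD; t++)
    pthread_create(&tid[t], NULL, worker, (void *)(intptr_t)t);
  for (int t = 0; t < NTHREAD; t++)
    pthread_join(tid[t], NULL);
  for (int t = 0; t < NTHREAD; t++)
    if (seen[t] != NTHREAD)
      return 0;
  return 1;
}
```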
locks
- mutual exclusion: only one core can hold a given lock
- protects against data races: concurrent accesses to the same memory location, at least one of them a write
- example: acquire(l); x = x + 1; release(l);
- “serialize” critical section: hide intermediate state
- another example: transfer money from account A to B
- put(a - 100) and put(b + 100) must both take effect, or neither
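The transfer example can be sketched with a single lock around both updates. This is a sketch under assumptions: the two balances are plain globals rather than hash table entries, and bank_lock, transfer(), total(), and run_transfers() are hypothetical names. The point is that no other thread holding the lock can observe the state where money has left A but not yet arrived in B.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREAD 4

static pthread_mutex_t bank_lock = PTHREAD_MUTEX_INITIALIZER;
static long a = 100000;   /* balance of account A */
static long b = 100000;   /* balance of account B */

/* Move amount from A to B as one critical section. */
void transfer(long amount) {
  assert(amount >= 0);
  pthread_mutex_lock(&bank_lock);
  a -= amount;
  /* intermediate state: a + b is short by amount, hidden by the lock */
  b += amount;
  pthread_mutex_unlock(&bank_lock);
}

/* Reads both balances under the lock, so the sum is always consistent. */
long total(void) {
  pthread_mutex_lock(&bank_lock);
  long t = a + b;
  pthread_mutex_unlock(&bank_lock);
  return t;
}

void *transfer_worker(void *arg) {
  long n = *(long *)arg;
  for (long i = 0; i < n; i++)
    transfer(1);
  return NULL;
}

/* NTHREAD threads each transfer 1 unit n times; returns A's balance. */
long run_transfers(long n) {
  pthread_t tid[NTHREAD];
  a = 100000;
  b = 100000;
  for (int t = 0; t < NTHREAD; t++)
    pthread_create(&tid[t], NULL, transfer_worker, &n);
  for (int t = 0; t < NTHREAD; t++)
    pthread_join(tid[t], NULL);
  pthread_mutex_lock(&bank_lock);
  long result = a;
  pthread_mutex_unlock(&bank_lock);
  return result;
}
```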
deadlocks
- assume per-bucket lock
- acquire the lock for bucket 1 and then the lock for bucket 2
- write two values
- release both locks
- deadlock
- thread 1: lock bucket 1; lock bucket 2
- thread 2: lock bucket 2; lock bucket 1
- approach
- programmers enforce partial order over locks
- always grab locks in pre-defined order
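One way to apply the pre-defined order, sketched under the assumption that buckets are identified by index and the order is "lower index first". The names lock_two(), unlock_two(), write_two(), and run_order_demo() are hypothetical. Thread 1 asks for buckets (1, 2) and thread 2 asks for (2, 1), but both end up locking bucket 1 before bucket 2, so neither can hold one lock while waiting for the other.

```c
#include <assert.h>
#include <pthread.h>

#define NBUCKET 5

static pthread_mutex_t locks[NBUCKET];
static int bucket[NBUCKET];

/* Always acquire the lower-numbered bucket lock first,
 * regardless of the order the caller names them in. */
void lock_two(int i, int j) {
  assert(i >= 0 && i < NBUCKET && j >= 0 && j < NBUCKET);
  int lo = i < j ? i : j;
  int hi = i < j ? j : i;
  pthread_mutex_lock(&locks[lo]);
  if (hi != lo)
    pthread_mutex_lock(&locks[hi]);
}

/* Release order does not matter for deadlock avoidance. */
void unlock_two(int i, int j) {
  pthread_mutex_unlock(&locks[i]);
  if (j != i)
    pthread_mutex_unlock(&locks[j]);
}

/* Write the same value to two buckets as one critical section. */
void write_two(int i, int j, int v) {
  lock_two(i, j);
  bucket[i] = v;
  bucket[j] = v;
  unlock_two(i, j);
}

struct job { int i, j, v, iters; };

void *job_worker(void *arg) {
  struct job *jb = arg;
  for (int n = 0; n < jb->iters; n++)
    write_two(jb->i, jb->j, jb->v);
  return NULL;
}

/* Two threads name the buckets in opposite orders.
 * Returns 1 if both buckets end up holding the same value. */
int run_order_demo(int iters) {
  pthread_t t1, t2;
  struct job j1 = {1, 2, 10, iters};
  struct job j2 = {2, 1, 20, iters};
  for (int i = 0; i < NBUCKET; i++) {
    pthread_mutex_init(&locks[i], NULL);
    bucket[i] = 0;
  }
  pthread_create(&t1, NULL, job_worker, &j1);
  pthread_create(&t2, NULL, job_worker, &j2);
  pthread_join(t1, NULL);
  pthread_join(t2, NULL);
  return bucket[1] == bucket[2];
}
```

If lock_two() instead locked i and then j as given, the two threads in run_order_demo() could each grab their first lock and wait forever on the second.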
lock implementation
- hw: draw cores, caches, bus, RAM
- try it on ph.c: what can go wrong
- atomic exchange: combine test and set into one atomic step
- show assembly code of acquire(): xchg instruction
- if l->locked was 1, set it to 1 again & return 1
- if l->locked was 0, at most one xchg would see & return 0
- the problem is pushed down to hardware
- guess how xchg is implemented
- understand the performance overhead
- memory models this Friday
- xv6/JOS: see spinlock.c
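A minimal user-space sketch of the atomic-exchange lock described above. It uses C11 atomic_exchange rather than the inline xchg assembly, which is an assumption of this sketch and not how spinlock.c is necessarily written; on x86, compilers typically lower atomic_exchange to an xchg instruction. The struct layout and run_counter_demo() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NTHREAD 4

struct spinlock {
  atomic_int locked;   /* 0 = free, 1 = held */
};

/* atomic_exchange writes 1 and returns the old value in one atomic
 * step. Old value 1: someone else holds the lock, keep spinning.
 * Old value 0: we are the one thread that took it. */
void acquire(struct spinlock *l) {
  while (atomic_exchange(&l->locked, 1) != 0)
    ;  /* spin */
}

void release(struct spinlock *l) {
  assert(atomic_load(&l->locked) == 1);   /* must be held */
  atomic_store(&l->locked, 0);
}

static struct spinlock lk;
static long x;   /* shared counter protected by lk */

void *incr_worker(void *arg) {
  long n = *(long *)arg;
  for (long i = 0; i < n; i++) {
    acquire(&lk);
    x = x + 1;
    release(&lk);
  }
  return NULL;
}

/* NTHREAD threads each increment x n times; returns the final x. */
long run_counter_demo(long n) {
  pthread_t tid[NTHREAD];
  atomic_store(&lk.locked, 0);
  x = 0;
  for (int t = 0; t < NTHREAD; t++)
    pthread_create(&tid[t], NULL, incr_worker, &n);
  for (int t = 0; t < NTHREAD; t++)
    pthread_join(tid[t], NULL);
  return x;
}
```

Without acquire()/release() around x = x + 1, the final count would usually fall short of NTHREAD * n because concurrent increments overwrite each other.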