Lecture 10: Locking
Preparation
- Read OSPP §5, Synchronizing Access to Shared Objects.
Multi-threaded hash table
- ph.c: put(), get()
- recall 333 lab: divide work for parallel speedup
- why are keys missing?
- data races: concurrent put()s
- example: all threads try to set table[0]->next
- one winner, others lose
- detection: valgrind --tool=helgrind (better on Linux); really slow
- lockset: see the Eraser paper
- use lock(s) to protect put()
- coarse-grained: one lock for the entire table
- fine-grained: per-bucket locks, or even per-entry
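- trade-off: one lock is simple but serializes every put(); per-bucket locks let put()s to different buckets run in parallel, at the cost of more locks to manage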
- why atomic_fetch_add and atomic_load?
- would done += 1; while (done < nthread); work?
- gcc -O2: infinite loop. Why?
- how about get() or del()?
- other approaches?
Locks
- mutual exclusion: only one core can hold a given lock
- prevents data races: concurrent accesses to the same memory location, at least one of them a write
- example: acquire(l); x = x + 1; release(l);
- “serialize” critical sections: other threads never see the intermediate state
- another example: transfer 100 from account A to B; put(a - 100) and put(b + 100) must both take effect, or neither
Deadlocks
- assume per-bucket locks
- acquire the lock for bucket 1 and then the lock for bucket 2
- write two values
- release both locks
- deadlock when two threads lock in opposite orders:
- thread 1: lock bucket 1; lock bucket 2
- thread 2: lock bucket 2; lock bucket 1
- approach
- programmers enforce a partial order over locks
- always grab locks in the pre-defined order
Lock implementation
- hw: draw cores, caches, bus, RAM
- try it on ph.c: what can go wrong?
- atomic exchange: combine test and set into one atomic step
- show assembly code: the xchg instruction
- if l->locked was 1, set it to 1 again and return 1
- if l->locked was 0, at most one xchg would see and return 0
- the problem is pushed down to hardware
- guess how xchg is implemented
- understand the performance overhead
- how would you design a scalable map?
- see also atomic_flag_test_and_set