Weak consistency and disconnected operation

1. Motivation
  How can we support highly available, low-latency, high-throughput updates to shared state?
  How can we support disconnected or weakly connected operation?

1.1 Applications
  Low-latency e-commerce sites (always OK to buy, despite failures)
  File synchronization across users or devices (Dropbox)
  Disconnected or intermittent connectivity (laptops, cell phones, developing regions)
  Source code control (git)

1.2 Consistency recap
  Concurrency forces us to think about the meaning of reads/writes
  Sequential consistency: everyone sees the same read/write order (cache coherence, Paxos)
  Release consistency: everyone sees writes in unlock order (x86/ARM)
    (more generally, reads/writes are forced to complete at memory barriers)
  But sequential/release consistency is slow:
    wait at memory barrier
    communication needed if recently modified by another node,
      or if writing and the local copy is not exclusive
    multiple rounds of communication in Paxos
  and unavailable:
    if there is no cached copy and the latest copy is down
    Paxos is unavailable unless a majority is up
    Paxos is unavailable during a network partition (to nodes on the wrong side of the partition)
  and none of these is suitable for disconnected operation.

1.3 Weak or optimistic consistency
  Do the operation now (e.g., read/write the cached copy)
  Check later whether it was OK
  Recover if not OK
  Benefit: write any copy, so writes are fast
  Drawback: harder for the application programmer (in some cases)
    concurrent reads/writes to different versions

2. Simplest example: Source code control
  Early source code control systems (CVS): client/server
  (Similar in many ways to file synchronization as in Dropbox)
  1. clients download a copy of the files from the central server
  2. clients upload changes when ready
  3. if the copy has changed in the meantime, the client merges changes
     -- to different lines of a file
     -- to different files
     -- to directories (rename, delete)
     -- if conflicting, manual merge
  4. once merged, retry
     -- might fail again and need a new merge

2.1 Implementation
  Server keeps a complete log of changes
    (needed for time travel with source code control)
  Client downloads the latest consistent version (along with the log timestamp)
  On upload, check whether there were any intervening updates
  If so, check whether the updates conflict or can be merged automatically

2.2 Dropbox
  What if we want to keep track of every change to the file system, not just periodic updates?
  Log could get very large!
  When can we garbage collect the log?
    On the client, need to keep track of updates made while disconnected.
    On the server, need to keep track of whoever is farthest behind (oldest resync).
    If someone goes on vacation, then the server log could get very big.

2.3 Timestamps
  Can we use timestamps instead of a log?
  Focus on an individual file. We need two things:
    -- apply the latest version
    -- detect if there is a conflicting update
  Just checking for the latest update time on a file isn't enough:
    H1: f->1  sync  f->2  sync
    H2: f->0  sync
  H2 has the latest timestamp, but we also want to be able to detect the conflict.
  Instead:
    Each client keeps the timestamp of the last time it synced with the server
    File system timestamps each file; conflict if (when merging) both server and client have a newer version

2.4 Limitations of central source code control/file synchronization
  Large organizations: central server is a performance/availability bottleneck
    sharding only helps if you don't need consistency across shards (independent projects)
    lots of manual work remerging
    can't work offline for long periods of time (may never catch up)
  Hard to do local integration
    E.g., at a coding party off the Internet
  Often difficult to create, manage, and remerge branches
    Example: one of the big cloud companies said they had a 5-person team
      just to integrate and test Linux changes into their production system

3. Peer to peer (git, Ficus)
  A set of replicas, each able to apply changes independently, but which can branch and merge directly with each other.
  Ex: a team can clone a repo, merge with each other through that repo,
    then merge that repo with the main branch
  Might also pull a set of changes from another team
    -- and either team might merge with the main branch first, but we don't want to reapply changes.
  Example:
    H1: f=1 -> H2
    H2: f=2 -> H3
    H3: f=3 -> H1
  Is there a conflict? Should there be a conflict?
  Definition: causal ordering
    OK for sync to copy version x2 over version x1 iff x2 includes all updates that are in x1.
  git's implementation: all updates are logged, and each merge copies the log entries
    (with appropriate compression)
  So in the example above:
    after first merge, H2 has f=1
    after change, H2 has f=1, f=2
    after second merge, H3 has f=1, f=2
    after change, H3 has f=1, f=2, f=3
    after third merge, easy to see that H1 is a subset of H3
  If there was a conflicting update in the meantime on H1, then do the merge and enter the merge operation into the log
    f=<1,0> -> f=<1,1>   f=<2,0>   f=<3,0>
    then, if unconflicting, merge to f=<3,1> and enter that into the log

3.1 Causal consistency
  What if H2 tries to merge with H1? Then it needs to apply a set of changes
    f=<1,1> f=<3,0> f=<3,1>
  What if H2 merged with H1's change to <1,1>, introducing a change to <2,1>,
    before H1 merged with H3, and then H2 merges with H1 or H3?
  Would like to end up at the same place!
    Even though updates get entered into the various logs in a different order.
  Solution: label each update with its origin, and apply updates in causal order
    if x causally precedes y, everyone sees x before y
  Example: everyone's log has H1's first update before all other updates.

3.2 Vector clocks (also called Vector Timestamps)
  Let's revisit the log compaction problem.
  Ficus does p2p file sync, but doesn't keep a log of every update to every file.
  Rather, it merges based on timestamps for each file (has there been an update?),
    but it wants to ignore updates if it has already merged them.
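  For contrast with the compact per-file timestamps that follow, here is a minimal sketch of the log-based test from section 3: copying is safe only when the incoming log contains every update in the local log. The (host, sequence number) labels and the names `includes` and `sync` are assumptions made for illustration; this is not how git actually stores or compares history.

```python
def includes(log_a, log_b):
    """True if log_a records every update that log_b records."""
    return set(log_b) <= set(log_a)


def sync(local_log, remote_log):
    """Return (new_local_log, outcome) after a replica is offered remote_log.

    Each log entry is a hypothetical (origin_host, per_host_sequence) label.
    """
    if includes(local_log, remote_log):
        # Nothing new on the remote side (this also covers identical logs).
        return list(local_log), "up to date"
    if includes(remote_log, local_log):
        # Remote includes all local updates, so copying it loses nothing.
        return list(remote_log), "copied"
    # Each side has an update the other lacks: a merge is needed.
    return list(local_log), "conflict"


# The H1 -> H2 -> H3 -> H1 chain from section 3.
h1 = [("H1", 1)]
h2 = h1 + [("H2", 1)]
h3 = h2 + [("H3", 1)]

print(sync(h1, h3)[1])                # copied
print(sync(h1 + [("H1", 2)], h3)[1])  # conflict
```

  The cost of this approach is the one the notes point out: every replica carries the whole update history, which is what the per-file vectors below avoid.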
  Vector Clock (VC) / Vector Timestamp (VT)
    Each node numbers its own actions
    VC is a vector of numbers, one slot per node
    VC[i]=x => sender had seen all updates from node i up through #x
  VC comparisons answer "is update A before update B?"
    four situations: a = b, a < b, a > b, a || b
    a < b if: forall i: a[i] <= b[i] AND exists j: a[j] < b[j]
      i.e. a summarizes a proper prefix of b
      i.e. a causally precedes b
    a > b is the mirror image (b < a); a = b if every slot matches
    a || b if: exists i,j: a[i] < b[i] and a[j] > b[j]
      i.e. neither summarizes a prefix of the other
      i.e. neither causally precedes the other
  Many systems use VC variants; in the reading list, both Ficus and Dynamo
    compact way to say "I've seen everyone's updates up to this point"
    compact way to agree whether event x preceded event y

  Ficus Constraint: No Lost Updates
    Only OK for sync to copy version x2 over version x1 if
      x2 includes all updates that are in x1.

  Example 1: Focus on a single file
    H1: f=1 ->H2 ->H3
    H2: f=2
    H3: ->H2
    What is the right thing to do?
    Is it enough to simply take the file with the latest modification time?
    Yes in this case, as long as you carry the modification times along correctly.
      I.e. H3 remembers the mtime assigned by H1, not the mtime of the sync.

  Example 2:
    H1: f=1 ->H2 f=2
    H2: f=0 ->H1
    H2's mtime will be bigger.
    Should the file synchronizer use "0" and discard "2"?
      No! They were conflicting changes. We need to detect this case.
    Modification times are not enough by themselves

  What if there were concurrent updates?
    So that neither version includes the other's updates?
    Copying would then lose one of the updates
    So sync doesn't copy, and declares a "conflict"
    Conflicts are a necessary consequence of optimistic writes

  How to decide if one version contains all of another's updates?
    We could record each file's entire modification history.
      List of hostname/localtime pairs.
      And carry the history along when synchronizing between hosts.
      For example 1:
        H2: H1/T1,H2/T2
        H3: H1/T1
      For example 2:
        H1: H1/T1,H1/T2
        H2: H1/T1,H2/T3
    Then it's easy to decide if version X supersedes version Y:
      If Y's history is a prefix of X's history.

  We can use VCs to compress these histories!
    Each host remembers a VC per file
    Number each host's writes to a file (or assign wall-clock times)
    Just remember the # of the last write from each host
    VC[i]=x => file version includes all of host i's updates through #x

  VCs for Example 1:
    After H1's change: v1=<1,0,0>
    After H2's change: v2=<1,1,0>
    v1 < v2, so H2 ignores H3's copy (no conflict since <)
    v2 > v1, so H1/H3 would accept H2's copy (again no conflict)

  VCs for Example 2:
    After H1's first change: v1=<1,0,0>
    After H1's second change: v2=<2,0,0>
    After H2's change: v3=<1,1,0>
    v3 neither < nor > v2
      thus neither has seen all the other's updates
      thus there's a conflict

  What if there *are* conflicting updates?
    VCs can detect them, but then what?
    Depends on the application.
    Easy: mailbox file with distinct immutable messages, just union.
    Medium: changes to different lines of a C source file (diff+patch).
    Hard: changes to the same line of C source.
    Reconciliation must be done manually for the hard cases.
    Today's paper is all about reconciling conflicts.
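
  The vector-clock comparison rules from section 3.2 can be sketched in a few lines. This is a minimal illustration, assuming a fixed number of hosts and plain tuples as per-file VCs; the names `Order`, `compare`, and `decide` are hypothetical and are not taken from Ficus or Dynamo.

```python
from enum import Enum


class Order(Enum):
    EQUAL = "a = b"
    BEFORE = "a < b"
    AFTER = "a > b"
    CONCURRENT = "a || b"


def compare(a, b):
    """Compare two vector clocks slot by slot (one slot per host)."""
    if len(a) != len(b):
        raise ValueError("vector clocks must have one slot per host")
    a_behind = any(x < y for x, y in zip(a, b))  # b has seen something a has not
    a_ahead = any(x > y for x, y in zip(a, b))   # a has seen something b has not
    if a_behind and a_ahead:
        return Order.CONCURRENT
    if a_behind:
        return Order.BEFORE
    if a_ahead:
        return Order.AFTER
    return Order.EQUAL


def decide(local, remote):
    """What a synchronizer holding version `local` does when offered `remote`."""
    order = compare(local, remote)
    if order is Order.BEFORE:
        return "accept remote"  # remote includes all local updates
    if order is Order.CONCURRENT:
        return "conflict"       # copying either way would lose an update
    return "keep local"         # remote is older or identical


# Example 1: v1=<1,0,0>, v2=<1,1,0>
print(decide((1, 1, 0), (1, 0, 0)))  # keep local
print(decide((1, 0, 0), (1, 1, 0)))  # accept remote

# Example 2: v2=<2,0,0> against v3=<1,1,0>
print(decide((2, 0, 0), (1, 1, 0)))  # conflict
```

  The sketch only detects the conflict; what happens next (union, diff+patch, or manual reconciliation) is up to the application, as noted above.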