Weak consistency and disconnected operation

1. Motivation

How can we support highly available/low latency/high throughput updates to shared state?
How can we support disconnected or weakly connected operation?

1.1 Applications:
  Low latency ecommerce sites (always ok to buy, despite failures)
  File synchronization across users or devices (Dropbox)
  Disconnected or intermittent connectivity (laptops, cell phones, 3rd world)
  source code control (git)

1.2 Consistency recap
  Concurrency forces us to think about the meaning of reads/writes
  Sequential consistency: everyone sees same read/write order (cache coherence, Paxos)
  Release consistency: everyone sees writes in unlock order (x86/ARM)
   (more generally, reads/writes forced to complete at memory barriers)

  But sequential/release consistency is slow:
   wait at memory barrier
   communication needed if recently modified by another node
    or if write and the local copy is not exclusive
   multiple rounds of communication in Paxos

  and unavailable:
   if no cached copy and latest copy is down
   Paxos unavailable unless a majority is up
   Paxos unavailable if network partition (to nodes on the wrong side of the partition)
  
  and none suitable for disconnected operation. 

1.3 Weak or optimistic consistency 
  Do the operation now (e.g., read/write cached copy)
  Check if it was OK later
  Recover if not OK

  Benefit: write any copy, so writes are fast
  Drawback: harder for application programmer (in some cases) 
   concurrent reads/writes to different versions

2. Simplest example: Source code control

Early source code control systems (CVS): client/server
(Similar in many ways to file synchronization as in Dropbox)

1. clients download copy of files from central server
2. clients upload changes when ready
3. if copy has changed in the meantime, client merges changes
  -- to different lines of a file
  -- to different files
  -- to directories (rename, delete)
  -- if conflicting, manual merge
4. once merged, retry -- might fail again and need new merge

2.1 Implementation

Server keeps complete log of changes (needed for time travel with source code control)

Client downloads latest consistent version (along with log timestamp)

On upload, check whether there were any intervening updates

If so, check if updates conflict or can be automatically merged
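
The check-in path above can be sketched roughly as follows. This is a minimal
illustration, not CVS's actual protocol: the `Server` class, `checkout`, and
`try_commit` are hypothetical names, the log position stands in for the log
timestamp, and a change is assumed to be a dict from line number to new text.

```python
class Server:
    """Hypothetical central server holding a complete log of changes."""

    def __init__(self):
        # Each log entry is a dict {line_number: new_text}.
        # The index into the log plays the role of the log timestamp.
        self.log = []

    def checkout(self):
        """Return the latest file contents and the log position they reflect."""
        contents = {}
        for change in self.log:
            contents.update(change)
        return contents, len(self.log)

    def try_commit(self, base, change):
        """Try to apply `change`, which was made against log position `base`.

        Returns True if accepted. Returns False if an intervening update
        touched the same lines, in which case the client must merge and retry.
        """
        for other in self.log[base:]:      # updates the client has not seen
            if set(other) & set(change):   # same line edited by both sides
                return False
        self.log.append(change)            # disjoint lines: automatic merge
        return True
```

Two clients that check out at the same log position and edit different lines
both succeed; the second client to edit the same line is rejected and has to
merge against the newer version before retrying.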

2.2 Dropbox 

What if we want to keep track of every change to the file system,
not just periodic updates?

Log could get very large! When can we garbage collect the log?
On client, need to keep track of updates while disconnected.
On server, need to keep track of whoever is farthest behind (oldest resync).

If someone goes on vacation, then server log could get very big.

2.3 Timestamps

Can we use timestamps instead of a log?

Focus on an individual file. We need two things:
-- apply the latest version
-- detect if there is a conflicting update

Just checking for the latest update time on a file isn't enough:

H1: f->1 sync f->2 sync

H2:                       f->0 sync

H2 has latest timestamp, but we also want to be able to detect conflict.

Instead:
Each client keeps timestamp of last time it sync'ed with the server
File system timestamps each file; conflict if (when merging) 
both server and client have a newer version
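
A minimal sketch of this rule for a single file, assuming all three timestamps
come from clocks that can be compared directly (a simplification; the function
name `sync_action` and its return strings are made up for illustration):

```python
def sync_action(last_sync, client_mtime, server_mtime):
    """Decide what to do with one file when a client syncs.

    last_sync    -- time this client last synced with the server
    client_mtime -- modification time of the client's copy
    server_mtime -- modification time of the server's copy
    """
    client_changed = client_mtime > last_sync
    server_changed = server_mtime > last_sync
    if client_changed and server_changed:
        return "conflict"   # both sides have a newer version
    if client_changed:
        return "upload"
    if server_changed:
        return "download"
    return "nothing"


# The H1/H2 scenario above: H2 last synced at time 0, H1's writes reached the
# server at time 3, and H2 wrote f->0 at time 5 without seeing them.
print(sync_action(last_sync=0, client_mtime=5, server_mtime=3))
```

Comparing against the last sync time, rather than just picking the larger
modification time, is what lets H2's sync report a conflict.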

2.4 Limitations of central source code control/file synchronization

Large organizations: 
  central server is performance/availability bottleneck
  sharding only helps if you don't need consistency across shards (independent projects)
  lots of manual work remerging
  can't work offline for long periods of time (may never catch up)

Hard to do local integration
  E.g., at a coding party off the Internet

Often difficult to create, manage, and remerge branches 

Example: one of the big cloud companies said they had a five-person team just to integrate and test Linux changes into their production system

3. Peer to peer (git, ficus) 

A set of replicas, each able to apply changes independently, but which
can branch and merge directly with each other.

Ex: a team can clone a repo, merge with each other through that repo, 
then merge that repo with the main branch 

Might also pull a set of changes from another team -- and either team might
merge with the main branch first, but don't want to reapply changes.


Example:

H1: f=1 ->H2

H2:          f=2 -> H3

H3:                    f=3 -> H1

Is there a conflict?  Should there be a conflict?

Definition: causal ordering
  OK for sync to copy version x2 over version x1 iff
    x2 includes all updates that are in x1.

git's implementation: all updates are logged, and each merge
copies the log entries (with appropriate compression)

So in the example above: 

after first merge, H2 has f=1
after change, H2 has f=1, f=2
after second merge, H3 has f=1, f=2
after change, H3 has f=1, f=2, f=3
after third merge, easy to see that H1's log is a subset of H3's log

If H1 made a conflicting update in the meantime, do the merge and append the merge operation to the log

f=<1,0> ->   f=<1,1>
f=<2,0>
f=<3,0>

then if unconflicting, merge to f=<3,1> and enter that into the log

3.1 Causal consistency

What if H2 tries to merge with H1? It then needs to apply a set of changes:
f=<1,1>
f=<3,0>
f=<3,1>

What if H2 merged with H1's change to <1,1> introducing a change to <2,1>, 
before H1 merged with H3, and then H2 merges with H1 or H3?

Would like to end up at the same place! Even though updates get entered
into the various logs in a different order.  

Solution: label each update with its origin, and apply updates in causal order

  if x causally precedes y, everyone sees x before y

Example: everyone's log has H1's first update before all other updates.
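
One way to picture this is the sketch below. It assumes each update carries an
ID of the form (origin, sequence number) plus the set of update IDs it was
created after; that labeling, the `causal_order` name, and the tie-break by ID
are illustrative assumptions, not how git or any particular system does it.

```python
def causal_order(updates):
    """Return update IDs in an order that respects causality.

    `updates` is an iterable of (uid, deps) pairs, where uid = (origin, seq)
    and deps is the set of uids this update causally depends on. Updates with
    no causal relationship are ordered by uid, so every replica computes the
    same order no matter how the updates arrived.
    """
    pending = {uid: set(deps) for uid, deps in updates}
    done = set()
    order = []
    while pending:
        # An update is ready once everything it depends on has been applied.
        ready = sorted(uid for uid, deps in pending.items() if deps <= done)
        if not ready:
            raise ValueError("missing or cyclic dependency")
        uid = ready[0]
        order.append(uid)
        done.add(uid)
        del pending[uid]
    return order


a, b, c, d = ("H1", 1), ("H2", 1), ("H3", 1), ("H1", 2)
arrived_at_x = [(c, {a, b}), (b, {a}), (a, set()), (d, {a})]
arrived_at_y = [(d, {a}), (a, set()), (c, {a, b}), (b, {a})]
print(causal_order(arrived_at_x) == causal_order(arrived_at_y))
```

Both replicas place H1's first update ahead of everything else, and they agree
on the rest even though the updates arrived in different orders.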

3.2 Vector clocks (also called Vector Timestamps)

Let's revisit the log compaction problem.  Ficus does p2p file sync, but
doesn't keep a log of every update to every file.  

Rather, it merges based on per-file timestamps: has there been an update?
It must also ignore any update that it has already merged.

Vector Clock (VC)/Vector Timestamp (VT)
  Each node numbers its own actions 
  VC is a vector of numbers, one slot per node
  VC[i]=x => sender had seen all updates from node i up through #x

VC comparisons to answer "is update A before update B?"
  four situations: a = b, a < b, b < a, a || b
  a < b if:
    forall i: a[i] <= b[i]  AND  exists j: a[j] < b[j]
    i.e. a summarizes a proper prefix of b
    i.e. a causally precedes b
  a || b if:
    exists i,j: a[i] < b[i] and a[j] > b[j]
    i.e. neither summarizes a prefix of the other
    i.e. neither causally precedes the other
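
The comparison rules above translate almost directly into code. A small
sketch, assuming a vector clock is a fixed-length tuple of integers with one
slot per node (the `compare` name and its return strings are illustrative):

```python
def compare(a, b):
    """Compare two vector clocks.

    Returns "equal", "before" (a < b), "after" (b < a),
    or "concurrent" (a || b).
    """
    if len(a) != len(b):
        raise ValueError("vector clocks must have one slot per node")
    a_behind = any(x < y for x, y in zip(a, b))  # some slot where a < b
    b_behind = any(x > y for x, y in zip(a, b))  # some slot where a > b
    if a_behind and b_behind:
        return "concurrent"
    if a_behind:
        return "before"
    if b_behind:
        return "after"
    return "equal"


print(compare((1, 0, 0), (1, 1, 0)))  # a is a proper prefix of b
print(compare((2, 0, 0), (1, 1, 0)))  # neither has seen the other's update
```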

Many systems use VC variants (in the reading list: both Ficus and Dynamo)
  compact way to say "I've seen everyone's updates up to this point"
  compact way to agree whether event x preceded event y

Ficus
Constraint: No Lost Updates
  Only OK for sync to copy version x2 over version x1 if
    x2 includes all updates that are in x1.

Example 1:
  Focus on a single file
  H1: f=1 ->H2       ->H3
  H2:            f=2
  H3:                       ->H2
  What is the right thing to do?
  Is it enough to simply take file with latest modification time?
  Yes in this case, as long as you carry them along correctly.
    I.e. H3 remembers mtime assigned by H1, not mtime of sync.

Example 2:
  H1: f=1 ->H2 f=2
  H2:                  f=0 ->H1
  H2's mtime will be bigger.
  Should the file synchronizer use "0" and discard "2"?
    No! They were conflicting changes. We need to detect this case.
    Modification times are not enough by themselves

What if there were concurrent updates?
  So that neither version includes the other's updates?
  Copying would then lose one of the updates
  So sync doesn't copy, declares a "conflict"
  Conflicts are a necessary consequence of optimistic writes

How to decide if one version contains all of another's updates?
  We could record each file's entire modification history.
  List of hostname/localtime pairs.
  And carry history along when synchronizing between hosts.
  For example 1:   H2: H1/T1,H2/T2   H3: H1/T1
  For example 2:   H1: H1/T1,H1/T2   H2: H1/T1,H2/T3
  Then it's easy to decide if version X supersedes version Y:
    If Y's history is a prefix of X's history.
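
The prefix test on full histories can be sketched as follows, assuming a
history is a list of (hostname, localtime) pairs in the order the writes were
applied. The function names are illustrative.

```python
def supersedes(x_history, y_history):
    """True iff Y's history is a prefix of X's history,
    i.e. version X contains every update that version Y does."""
    return x_history[:len(y_history)] == y_history


def conflict(x_history, y_history):
    """True iff neither version contains all of the other's updates."""
    return (not supersedes(x_history, y_history)
            and not supersedes(y_history, x_history))


# Example 1: H2's version supersedes H3's.
ex1_h2 = [("H1", "T1"), ("H2", "T2")]
ex1_h3 = [("H1", "T1")]
print(supersedes(ex1_h2, ex1_h3))

# Example 2: the histories diverge after H1/T1, so this is a conflict.
ex2_h1 = [("H1", "T1"), ("H1", "T2")]
ex2_h2 = [("H1", "T1"), ("H2", "T3")]
print(conflict(ex2_h1, ex2_h2))
```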

We can use VCs to compress these histories!
  Each host remembers a VC per file
  Number each host's writes to a file (or assign wall-clock times)
  Just remember # of last write from each host
  VC[i]=x => file version includes all of host i's updates through #x

VCs for Example 1:
  After H1's change: v1=<1,0,0>
  After H2's change: v2=<1,1,0>
  v1 < v2, so H2 ignores H3's copy (no conflict since <)
  v2 > v1, so H1/H3 would accept H2's copy (again no conflict)

VCs for Example 2:
  After H1's first change: v1=<1,0,0>
  After H1's second change: v2=<2,0,0>
  After H2's change: v3=<1,1,0>
  v3 neither < nor > v2
    thus neither has seen all the other's updates
    thus there's a conflict
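
Both examples can be replayed with a few lines of code. This is a sketch under
simple assumptions, not Ficus's implementation: hosts H1, H2, H3 are slots 0,
1, 2 of a tuple, and the helper names `bump`, `dominates`, and `sync` are made
up for illustration.

```python
H1, H2, H3 = 0, 1, 2


def bump(vc, host):
    """Return a copy of vc with host's slot incremented (a local write)."""
    new = list(vc)
    new[host] += 1
    return tuple(new)


def dominates(a, b):
    """True iff version a includes every update that version b includes."""
    return all(x >= y for x, y in zip(a, b))


def sync(local, remote):
    """Decide what a host holding `local` does when offered `remote`."""
    if dominates(local, remote):
        return "keep local"
    if dominates(remote, local):
        return "take remote"
    return "conflict"


# Example 1: H1 writes, H2 syncs from H1 and then writes.
v1 = bump((0, 0, 0), H1)   # <1,0,0>
v2 = bump(v1, H2)          # <1,1,0>
print(sync(v2, v1))        # H2 offered H3's copy
print(sync(v1, v2))        # H1 or H3 offered H2's copy

# Example 2: H1 writes twice; H2 writes after seeing only the first write.
v1 = bump((0, 0, 0), H1)   # <1,0,0>
v2 = bump(v1, H1)          # <2,0,0>
v3 = bump(v1, H2)          # <1,1,0>
print(sync(v2, v3))        # H1 offered H2's copy
```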

What if there *are* conflicting updates?
  VCs can detect them, but then what?
  Depends on the application.
  Easy: mailbox file with distinct immutable messages, just union.
  Medium: changes to different lines of a C source file (diff+patch).
  Hard: changes to the same line of C source.
  Reconciliation must be done manually for the hard cases.
  Today's paper is all about reconciling conflicts
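
The easy case above (a mailbox of distinct immutable messages) can be sketched
as a plain union. This assumes each message has a unique ID and that a
mailbox replica is a dict from message ID to body; both are illustrative
assumptions, and `merge_mailboxes` is a hypothetical name.

```python
def merge_mailboxes(a, b):
    """Union two mailbox replicas keyed by unique message ID.

    Messages are assumed immutable, so the same ID with different bodies
    means the assumption was violated and automatic merging is unsafe.
    """
    merged = dict(a)
    for msg_id, body in b.items():
        if msg_id in merged and merged[msg_id] != body:
            raise ValueError(f"message {msg_id!r} differs between replicas")
        merged[msg_id] = body
    return merged


laptop = {"m1": "hello", "m2": "lunch?"}
phone = {"m1": "hello", "m3": "running late"}
print(sorted(merge_mailboxes(laptop, phone)))
```

Because the merge is a set union, the result is the same whichever replica
initiates the sync, so no manual reconciliation is needed in this case.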