Lecture 8: Cache coherence

Caching is fundamental to obtaining reasonable performance in a distributed system -- both latency and throughput.

A fundamental question: can a program (or user) on a system with caches tell that it is using a cache, ignoring performance?

weakly consistent -- doesn't behave like a single system
eventually consistent -- if there are no further modifications to the state, all nodes eventually see the same state. A program may temporarily detect that the system is inconsistent with a single system. Of course, if the program acts on that, the divergence can be persistent.
sequentially consistent / serializable -- behaves like a single system, to programs running on it
linearizable -- behaves like a single system, to users/external observers

Today's goal: explain what I just said ;-)

Sidebar: the nomenclature is confusing. In plain English usage, one would think the words "linearizable" and "serializable" mean precisely the same thing -- that is, they are synonyms in English. But as technical jargon they mean slightly different things, and the difference has no relationship to the plain English meaning of the two terms. Please! When inventing jargon, don't use English words against their plain meaning! Blame database researchers for this one. (Of course, you can blame OS researchers for periodically redefining what "process" means.)

Lab 2b: we want the primary/backup system to appear as if it is one system, with higher availability. An app doing put/get can't tell whether the system is implemented using primary/backup or not.

One more bit of complexity: what should happen if multiple clients do operations at the same time? Say two clients PUT to the same key. If it were one system, the PUTs would occur in some order -- you just wouldn't know which came first. The fact that the clients PUT concurrently means the application hasn't told the system which must happen first.

A further complexity: operations take time. What if the operations overlap?

Q: One starts before the other starts, and finishes before the other finishes (but after the other starts). Do we know if one happens before the other?
A: No -- either order is allowed.

Q: What if one starts and ends entirely within the time the other operation started and ended?
A: Again, either order is allowed.

Q: What if one ended before the other started?
A: If the system acts like a single system to an observer (linearizable), then the first operation occurred before the second.

Further, even with overlaps, every node reading the data should see the values progressing through the same sequence. That is, you don't see A, B, C on one node and B, C, A on a different node.
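To make the answers above concrete, here is a toy encoding of the timing rule. The struct and function names (op, must_precede) are made up for illustration, and the "times" are just the client-observed start and finish of each operation:

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy encoding of the overlap rules above; made-up types and names. */
    struct op {
        const char *name;
        double start;    /* when the client issued the operation */
        double finish;   /* when the client received the response */
    };

    /* Linearizability only forces an order between two operations when
     * one finishes before the other starts; if their intervals overlap,
     * the system may apply them in either order. */
    static bool must_precede(struct op a, struct op b) {
        return a.finish < b.start;
    }

    int main(void) {
        struct op put1 = { "PUT(k,A)", 1.0, 4.0 };
        struct op put2 = { "PUT(k,B)", 2.0, 3.0 };  /* nested inside put1 */
        struct op get  = { "GET(k)",   5.0, 6.0 };  /* starts after both finish */

        printf("forced order PUT(k,A) -> PUT(k,B)? %d\n", must_precede(put1, put2)); /* 0 */
        printf("forced order PUT(k,B) -> GET(k)?   %d\n", must_precede(put2, get));  /* 1 */
        return 0;
    }

Either value of k is a legal result for the GET here, since either order of the overlapping PUTs is allowed; what is ruled out is different readers seeing the two PUTs in different orders.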
With weak consistency (as in NFS), the system doesn't appear like a single system. You might think: oh well, that's obviously wrong -- don't implement it like that. And you'd be right! But there are tradeoffs: as Ousterhout explains, it's faster and more scalable to be weakly or eventually consistent. These issues apply with primary/backup; they apply even more with caching.

Draw a picture of FB's architecture, as a series of steps. This is where the course is headed.

1. Start with one server.

2. Want fault tolerance, so do primary/backup.

3. Want one IP address (so failover is invisible to the client), so put a load balancer in front; it monitors the backends and acts like the viewserver.

4. Scale (1): shard the server. Each user's data is stored on one server; if they have friends on other servers, we need to do RPCs to get that data and assemble it into the web page. Why shards are interesting: does a memory with multiple shards act like a single memory? Not necessarily! So we probably also need a configuration server -- to let us know who is responsible for which portions of the user data.

5. Engineering simplicity: REST. Separate the web server front end from the storage server backend. Application logic runs on the web server; the storage server has a standard API that doesn't change much. The web server is stateless: the up-to-date version of the data is at the storage server. The front end assembles the requested page from the data store; if it crashes, just start again. Both storage servers and web servers are sharded. The load balancer monitors the front ends, with simple failover. The backend storage servers are sharded primary/backup.

6. Scale (2): memcache. Cache the results of storage queries and of web server computations. The caches are sharded. Weakly consistent, as in NFS. Causes problems! E.g., a post whose content is stored separately from the data it is displayed with: what if you get the latest copy of one but not the other?

7. Scale (3): shard across data centers.

So: a user's web page is constructed by a web server coalescing data from multiple sharded caches and storage servers.

Today: a conceptual model for how to think about caches. Monday: some implementation techniques.

We also use caches in other places, e.g., locally to avoid RPCs. Example: a web client caches objects; it also caches the lookup that finds FB's web address. These are also weakly consistent.

Examples of distributed systems that do caching (pretty much every distributed system!):
  Web
  Email
  git
  iPod sync
  Distributed file systems: many clients, one server
  DNS (Internet naming)
  Shared virtual memory
  Multicore architectures
  Distributed databases

As an example: ORCA. When you add money to your account, the buses aren't updated instantly.

One way to view this: caching is the inverse of an RPC. With RPC, we send the computation to the data; with caching, we bring the data to the computation. If we always send the computation to the data, the result is simple, if inefficient. (How inefficient? It would mean all of FB running on one server...) Caching provides an extra dimension of flexibility to the design -- location independence for where we put the data and where we put the computation (indeed, we can move things around depending on the needs of the application). Of course there are issues with security, fault tolerance, etc., that we'll punt on for now.

All caching systems face the same set of design issues:
1. What items to cache (data? results of computation?)
2. What to evict (if there isn't enough space to store everything)?
3. Where to look on a miss? Other clients? The server? Up to the system designer: what's the relative cost of LAN, WAN, disk?
4. What happens when there is an update? There are multiple copies of the state that might be stale.

Our focus today is this last question; see 451 for the first two.

We need to start a bit abstractly. Consider a memory, with the ability to load/store to that memory. We can think of that memory as a RESTful web server talking to a sharded object store backend, or as clients doing put/get in the labs, or as CPUs talking to DRAM, or as clients talking to a network-attached disk.

Where we started: does the system behave like a single system, or can we tell the difference?

Example: suppose we have 3 nodes, and they are sending results to a single backend store. Then we can make it more complex by adding sharding to the store, and caches to the nodes.
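Once we add caches to the nodes, each node's read path might look roughly like the sketch below. This is only an assumed illustration -- cached_get, store_get, and the fixed-size table are made up, not the Lab 2 or memcache API. Notice that nothing in it is ever told about other nodes' writes, which is exactly what goes wrong in the discussion that follows.

    #include <string.h>

    #define CACHE_SLOTS 64

    /* Stand-in for the RPC to the backend store (or to the right shard
     * of it); assume it returns the value most recently applied there. */
    static int store_get(const char *key) { (void)key; return 0; }

    /* A node-local cache: once a key has been read, later reads are
     * answered locally and never go back to the store. Nothing here is
     * ever invalidated or updated when some other node writes the key. */
    struct entry { char key[32]; int value; int used; };
    static struct entry cache[CACHE_SLOTS];

    static int cached_get(const char *key) {
        for (int i = 0; i < CACHE_SLOTS; i++)
            if (cache[i].used && strcmp(cache[i].key, key) == 0)
                return cache[i].value;              /* hit -- possibly stale */

        int v = store_get(key);                     /* miss -- ask the store */
        for (int i = 0; i < CACHE_SLOTS; i++)
            if (!cache[i].used) {                   /* remember it locally */
                strncpy(cache[i].key, key, sizeof cache[i].key - 1);
                cache[i].value = v;
                cache[i].used = 1;
                break;
            }
        return v;
    }

    int main(void) {
        return cached_get("done0");   /* first call misses; repeats would hit */
    }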
Note: the example below is written as C code, but it could be PUTs and GETs as in Lab 2, in place of loads/stores on a processor.

    node0:
        v0 = f0();
        done0 = true;

    node1:
        while (done0 == false)
            ;
        v1 = f1(v0);
        done1 = true;

    node2:
        while (done1 == false)
            ;
        v2 = f2(v0, v1);

Intuitive intent: node2 should execute f2() with the results from node0 and node1; waiting for node1 implies waiting for node0.

Problem A: Suppose every operation is done in order: you wait for each operation to complete before moving to the next one, and the data is all stored on the same server. Then we know that v0 has been written by the time done0 is written.

What if we want to speed things up? After all, we can issue the RPC to write done0 before waiting for the v0 RPC to complete. We're still OK if the RPCs are processed one at a time, in the order sent (e.g., with client-specific sequence #'s). This has a name: events occur in "processor order".

Now suppose v0 and done0 are stored on different shards. Where a value is stored shouldn't make any difference, right? We're OK if node1 sees the writes in the order they were issued. But suppose we don't wait for each store to complete before moving on to the next operation. Then node1 *might* observe done0 being true *before* v0 is initialized. We can prevent this problem by slowing everything down to a crawl: issue one write, and wait for it to complete, before issuing the next write.

What if we have caches? The copy of the data in a cache might be out of date -- the update might not have reached it yet. So if a node reads that copy, it will see the old value. And there might be a cached copy of v0 but not of done0 -- node1 might then see done0 as true (it's up to date) even when its copy of v0 is not up to date.

Problem B: node2 may see node1's writes before node0's writes; i.e., node2 and node1 disagree on the order of node0's and node1's writes.

Example: suppose we try to keep the caches up to date by sending the new data to every node. Does that help? No: the order of arrival might differ at different nodes. Rather, we need to keep the same order everywhere.
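One sketch of what "keep the same order everywhere" could mean in practice -- this is an assumption for illustration, not how any particular system (or the labs) does it. Some sequencer stamps every write with a global sequence number, and each cache applies write n only after it has applied writes 0..n-1, buffering anything that arrives early. The names (deliver, cache_apply) and the sequencer itself are made up.

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_PENDING 1024   /* assume at most this many writes in flight */

    /* A write as delivered to one cache, stamped with a global sequence
     * number by some sequencer (a made-up component for this sketch).
     * Assumes each sequence number is delivered exactly once. */
    struct update {
        long seq;
        char key[32];
        int  value;
        bool valid;
    };

    static struct update pending[MAX_PENDING];  /* buffered early arrivals */
    static long next_seq = 0;                   /* next number to apply */

    static void cache_apply(long seq, const char *key, int value) {
        /* Stand-in for updating this cache's copy of key. */
        printf("apply #%ld: %s = %d\n", seq, key, value);
    }

    /* Deliver one update to this cache. The network may reorder
     * arrivals, but updates are applied strictly in sequence-number
     * order, so every cache steps through the same sequence of values. */
    static void deliver(struct update u) {
        u.valid = true;
        pending[u.seq % MAX_PENDING] = u;
        while (pending[next_seq % MAX_PENDING].valid &&
               pending[next_seq % MAX_PENDING].seq == next_seq) {
            struct update *p = &pending[next_seq % MAX_PENDING];
            cache_apply(p->seq, p->key, p->value);
            p->valid = false;
            next_seq++;
        }
    }

    int main(void) {
        /* The writes arrive out of order: done0 before v0. They are
         * still applied as v0 first, then done0, at every cache. */
        deliver((struct update){ .seq = 1, .key = "done0", .value = 1 });
        deliver((struct update){ .seq = 0, .key = "v0",    .value = 7 });
        return 0;
    }

The point of the sketch is only the last line of the section above: it is the shared order, not merely the delivery of the data, that every copy has to agree on.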