Cache Coherence 2

0. Duality of RPC and caching

Caching is (in a sense) the inverse of RPC. With RPC, we send the computation to the data; with caching, we bring the data to the computation. If we always send the computation to the data, the result is simple, if inefficient. (How inefficient? It would mean all of Facebook running on one server...) Caching provides an extra dimension of flexibility in the design -- location independence for where we put the data and where we put the computation (indeed, we can move either around depending on the needs of the application). Of course, there are issues with security, fault tolerance, etc.

1. Is there a serializable system that is not linearizable?

Assign a timestamp to each operation, and apply operations in timestamp order. Linearizable means each operation must complete within its window (after the start of the call, before the return). Serializable allows an operation to complete after the return, as long as you can prove that no one reads the result.

An example from 451: multiprocessor locking. Operations within a critical section are buffered -- they complete on the local processor (e.g., if the processor reads its own writes, it will see the new value), but they don't complete on other nodes before the next instruction starts, so other nodes can see intermediate values. Thus, not linearizable. Now suppose the data structure is locked. The lock release forces all updates to be visible everywhere. As long as all threads follow the locking discipline, no thread can observe the updates in anything other than serial order, even though the updates arrive at other nodes in varying order. So it is serializable, but not linearizable.

2.
Example:

  node0: v0 = f0(); done0 = true;
  node1: while (done0 == false) ; v1 = f1(v0); done1 = true;
  node2: while (done1 == false) ; v2 = f2(v0, v1);

Intuitive intent: node2 should execute f2() with the results from node0 and node1; waiting for node1 implies waiting for node0.

Problem A: Does "node1 observes done0 as set" imply "node1 observes v0 as set"?
  Yes: if every operation is done in order at the server, and there is only one server.
  No: if the store is sharded and the RPCs are done in parallel.
  No: if a cached copy can be out of date. Suppose node1 has a cached copy of v0 but not of done0 -- node1 might then see done0 as true (it is up to date) even when its copy of v0 is not up to date.

Problem B: CPU2 may see CPU1's writes before CPU0's writes; i.e., CPU2 and CPU1 disagree on the order of CPU0's and CPU1's writes.
  Example: suppose we try to keep caches up to date by sending the new data to every node. Does that help? No: the order of arrival might differ at different nodes. Rather, we need to apply the same order of updates everywhere.

Thus, the behavior of the program depends on the memory model: linearizable, serializable, eventually consistent, or weakly consistent.

3. Very useful fact: a system is serializable if (i) operations are applied in processor order, and (ii) all operations to the same memory location (e.g., PUT/GET on a key) are serialized (single copy). In other words, it is OK to have multiple copies of the data for reads, but there must be only one copy of the data while it is being updated.

If there are no caches, there is only a single copy at the server, so the system is serializable if operations occur in processor order. If there are multiple copies -- e.g., caches -- the system can't finish a write operation until all copies have been updated or invalidated. Once the operation has finished, everyone must see the result of the write. But the updates have to be applied in the same order everywhere; otherwise, one reader might see a different intermediate result than another reader.
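One way to get the same order of updates everywhere is to run every write through a single sequencer and have each replica apply updates in sequence-number order, buffering any that arrive early. A minimal sketch (class and method names are hypothetical, not from the notes):

```python
# Sketch: a single sequencer assigns a global order to all updates, and
# replicas apply them in that order even if the network delivers them
# out of order.
import itertools

class Sequencer:
    def __init__(self):
        self._next = itertools.count()

    def order(self, update):
        # Assign a global sequence number to each update.
        return (next(self._next), update)

class Replica:
    def __init__(self):
        self.store = {}
        self._pending = {}   # out-of-order arrivals, buffered
        self._applied = 0    # next sequence number we expect

    def deliver(self, seq, update):
        # The network may deliver out of order; buffer until contiguous.
        self._pending[seq] = update
        while self._applied in self._pending:
            key, value = self._pending.pop(self._applied)
            self.store[key] = value
            self._applied += 1

seq = Sequencer()
r1, r2 = Replica(), Replica()
u1 = seq.order(("v0", 42))
u2 = seq.order(("done0", True))
# r1 receives the updates in order; r2 receives them reversed.
r1.deliver(*u1); r1.deliver(*u2)
r2.deliver(*u2); r2.deliver(*u1)
assert r1.store == r2.store == {"v0": 42, "done0": True}
```

Note that r2 never exposes done0 without v0: it buffers the later update until the earlier one arrives, which is exactly the property Problem A needs.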
So this says: send all updates to one place, have them put in order, and then multicast them to the rest of the system (e.g., by invalidating the caches, so that the new value is fetched on the next access).

4. Causal ordering

The example program works correctly in a third model: causal ordering. A read returns a causally consistent recent version of the data. That is, if I have received a message A from a node (directly, or indirectly through some other node), then I will see all updates that node made prior to A. This relaxes the ordering constraints even more, admitting a faster implementation, although at the cost of making the system more complex for the programmer to reason about.

(Question for the audience: is it possible for causal order to differ from serializability? They don't differ for the case we started with at the beginning.)

  CPU1: write a value
  CPU2: write a value (a bit later)
  CPU3: read value
  CPU4: read value

What values do CPU3 and CPU4 read? With linearizability, CPU3 and CPU4 read the same value, the value written by CPU2 (the later write). Serializability is weaker: if there are two writes by the same CPU, they appear to other CPUs in order, but it allows CPU1's and CPU2's writes to be in either order, provided every CPU sees them in the same order. So CPU3 and CPU4 can see either CPU1's write or CPU2's write, but both will see the same one.

With causal ordering, it would be possible for CPU3 to see the writes as CPU1 then CPU2, while CPU4 sees them in the other order, CPU2 then CPU1. This is not serializable -- not consistent with any single sequential order of operations. It isn't even eventually consistent. But it is causally consistent!

Any advantage to causal ordering? In most cases, it does what you want. For example, the code we started with should work correctly with causally ordered memory.

5. Weak consistency

Why consider weak consistency at all? Examples: NFS, DNS, the web. Why not always do sequential consistency or linearizability?
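Before moving on: the causal-ordering example above (CPU3 and CPU4 disagreeing about concurrent writes) can be made concrete with vector clocks, a standard mechanism for tracking causality that is not part of these notes. A minimal sketch:

```python
# Sketch: vector clocks distinguish causally ordered writes from concurrent
# ones. Concurrent writes may be seen in different orders by different readers
# under causal consistency; causally related writes may not.
def happens_before(a, b):
    # a happens-before b iff a <= b componentwise and a != b.
    return all(x <= y for x, y in zip(a, b)) and a != b

# CPU1 and CPU2 each write without having seen the other's write:
w1 = (1, 0)   # CPU1's write
w2 = (0, 1)   # CPU2's write (a bit later in real time, but causally concurrent)
assert not happens_before(w1, w2) and not happens_before(w2, w1)
# So causal consistency lets CPU3 apply [w1, w2] while CPU4 applies [w2, w1].

# By contrast, if CPU2 had first read CPU1's write, its clock would show it:
w2_after = (1, 1)
assert happens_before(w1, w2_after)
# Now every reader must order w1 before w2_after.
```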
We'll see that it simplifies the implementation tremendously to provide only eventual consistency, especially when there is cached, replicated, or sharded data. Then you don't need to instantly propagate every update to every replica -- you can do that in the background.

Another reason: nodes can become disconnected, or we may want to provide access to data even when the most up-to-date copy is unavailable. (Some would say that you need eventual consistency for any highly available system.) For example, Amazon's revenue is roughly $200K/minute, and customer purchase rates decline dramatically even for small increases in client response time. So for many sites, it is important to be always up and responsive, even if the data is not always consistent. [Example of airline reservations: overbook, and apologize if the seat isn't there in the end, vs. have a slow website?]

6. Implementation techniques

Table of write through/write back vs. coherence (none; weak lease; strong lease; callback):

  1) no caching (very early DFS, Novell)
  2) write through, weak lease (DNS, web)
  3) write back, weak lease (NFS)
  4) write through, strong lease (Chubby, ZooKeeper)
  5) write through, callback (AFS)
  6) write back, callback (Sprite, Coda, Ivy)

Illustrate the behavior of each quadrant.

7. Start simple. No caches -- what if every RPC goes to the server? What semantics does that provide? (Linearizability.)

8. Weak lease, or time to live (TTL). Allow the client to use a copy for some period of time (the lease). After the lease expires, the client throws away the cached copy and, on next use, goes back to the server to get the latest version.

Example: Lab 2 -- clients get a copy of the current view, and periodically recheck. This means they might do an RPC to an old primary; if so, we'd need to detect and fix that.

Example: web browser. The client web cache holds objects, each with a server-defined lease or TTL. On an access after the TTL has expired, ping the server with a hash of the object contents; it will either say "OK -- nothing's changed" or "refetch".
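The TTL scheme just described can be sketched as a small cache that serves entries until their lease expires and then revalidates with the server using a hash of the cached contents. All names here are hypothetical; this is a sketch, not any particular system's API:

```python
# Sketch of a TTL/weak-lease cache. Within the TTL, serve the cached copy;
# after the TTL, revalidate: the server answers "unchanged" (None) or
# returns a fresh value.
import hashlib
import time

class TTLCache:
    def __init__(self, fetch, revalidate, ttl=60.0):
        self.fetch = fetch            # key -> value (full fetch)
        self.revalidate = revalidate  # (key, digest) -> None or fresh value
        self.ttl = ttl
        self.entries = {}             # key -> (value, lease expiration time)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.entries.get(key)
        if entry is not None:
            value, expires = entry
            if now < expires:
                return value          # lease still valid: use the cached copy
            # Lease expired: ask the server whether our copy is still current.
            digest = hashlib.sha256(repr(value).encode()).hexdigest()
            fresh = self.revalidate(key, digest)
            if fresh is not None:     # server says "refetch"
                value = fresh
        else:
            value = self.fetch(key)
        self.entries[key] = (value, now + self.ttl)  # renew the lease
        return value

# Usage, with a toy "server" backed by a dict:
store = {"a": 1}
def fetch(key):
    return store[key]
def revalidate(key, digest):
    v = store[key]
    same = hashlib.sha256(repr(v).encode()).hexdigest() == digest
    return None if same else v

cache = TTLCache(fetch, revalidate, ttl=10)
assert cache.get("a", now=0) == 1
store["a"] = 2                       # server-side update
assert cache.get("a", now=5) == 1    # stale but within TTL: eventual consistency
assert cache.get("a", now=11) == 2   # TTL expired: revalidation picks up the change
```

The stale read at now=5 is exactly the weak-lease tradeoff: bounded staleness in exchange for no per-client state at the server.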
What semantics does this provide? (Eventual consistency.)

Advantages of the weak lease:
  a. No state needed at the server. The server does not need to keep track of who has a copy, since each client will (eventually) get the new version. [An anecdote: the NFS server. If it crashes, what does it need to do when it reboots? Nothing! Clients can simply retry any RPCs they had in progress, since there is no callback state at the server.]
  b. The server can always update its state, regardless of whether it can reach the other copies.

Disadvantages:
  a. Not serializable.
  b. Overhead of revalidations. The TTL allows a tradeoff between how stale the data can be and the overhead of revalidations. You can think of NFS as the pesky little brother repeatedly asking: has this item changed? Has this item changed? Until you want to scream: I'll tell you when it does!
  c. Potential for synchronized revalidation. [Anecdote: DNS uses TTL cache coherence, but a client checks only when a name is used. The USGS web site becomes very slow every time an earthquake hits California, because everyone's TTL will have expired.]

9. Strong lease

Can we make leases serializable? That means we need to wait to perform a write until all leases have expired. E.g., when the server gets a write request, it holds onto it but stops issuing new leases. Eventually every client's lease will expire; the server can then process the write and resume issuing read leases.

While waiting for all leases to expire, do clients need to stall? No: pick a time when you plan to perform the write, and give out new leases that expire just before that time.

Advantages:
  a. Serializable!
  b. The server can (eventually) reclaim a lease even if the network fails. It reclaims leases unilaterally at expiration, even if the client or network is down.

Problems with this model?
  a. Requires clients and servers to have loosely synchronized clocks.
  b. Per-client/per-object state at the server.
  c. Periodic traffic to revalidate leases.
  d. If the server goes down, clients cannot continue past the end of their lease.
  e.
     Writes stall for the length of the longest lease.

Fix? Add a twist: ask readers to release their leases (rather than waiting for them to expire).
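The strong-lease scheme can be sketched as a small simulation with explicit time. All names are hypothetical; a real server would also track per-client state and handle clock skew, which this sketch ignores:

```python
# Sketch of a strong-lease server: reads get time-limited leases; a write
# waits until every outstanding lease has expired before it applies, and
# while a write is pending, new leases are truncated to end just before it.
class LeaseServer:
    def __init__(self, lease_len=10.0):
        self.value = 0
        self.lease_len = lease_len
        self.lease_expirations = []   # expiration times of read leases
        self.pending_write = None     # (value, time at which it applies)

    def read(self, now):
        # While a write is pending, issue only leases that expire before it
        # applies, so readers need not stall.
        expires = now + self.lease_len
        if self.pending_write is not None:
            expires = min(expires, self.pending_write[1])
        self.lease_expirations.append(expires)
        return self.value, expires

    def write(self, value, now):
        # The write can apply only once every outstanding lease has expired.
        outstanding = [e for e in self.lease_expirations if e > now]
        apply_at = max([now] + outstanding)
        self.pending_write = (value, apply_at)
        return apply_at

    def tick(self, now):
        # Apply the pending write once its scheduled time arrives.
        if self.pending_write and now >= self.pending_write[1]:
            self.value = self.pending_write[0]
            self.pending_write = None

s = LeaseServer(lease_len=10)
v, exp = s.read(now=0)          # lease until t=10
apply_at = s.write(7, now=3)    # must wait for the outstanding lease
assert apply_at == 10           # the write stalls for the longest lease
v2, exp2 = s.read(now=4)        # new lease truncated to t=10: no reader stall
assert exp2 == 10
s.tick(now=10)
v3, _ = s.read(now=10)
assert v3 == 7
```

The assert on apply_at is the problem named above: the write stalls for the length of the longest lease, which is what asking readers to release their leases is meant to shorten.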