Lecture 8: Linearizability (and Consistency Models)

Agenda

Answer the question "What does it mean for a state-machine replication algorithm to be correct?"

It's actually sort of surprising that we haven't had to answer this question yet!
- After all, we have already described one state-machine replication algorithm (primary-backup), which you are implementing in lab 2.
Our hand-wavy answer so far has been that the replicas are "equivalent" to one machine from the client's perspective.
In other words, clients can pretend that there is just one copy of the Application, which just happens to be very fault tolerant.
- There are some (mostly irrelevant) details here: clients have to send their operations to the primary, which might change over time, so they have to be aware that there are multiple servers. But this detail is hidden by the client library that we, the authors of primary-backup, write.
- So the application layer's API to the state machine is actually the same as in lab 1: sendCommand()/getResult().
If we look at state-machine replication at a high level, we have:
- an opaque box in charge of the replication
- clients send requests into the box and wait to get responses back
- there can be multiple clients interacting with the system at once

If we think about the simplest case where we only have one client, and they only send one request at a time, then the system should evolve "linearly":
- the client retransmits its current request forever until it gets a response
- then it sends the next request, and so on
So in the situation with one client, the system is "equivalent" to executing the client's commands in order, starting from the initial state of the state machine/Application.
- Since the Application is (assumed to be) deterministic, there is only one right answer for a particular sequence of commands, so this tells us whether the system was correct or not (did it return the right answers?)
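
As a rough illustration of the single-client case, here is a minimal Python sketch that replays the client's commands against a fresh copy of a deterministic Application and compares the answers. The KVStore class, its command format, and the single_client_correct function are all hypothetical names made up for this example; they are not part of the lab code.

```python
class KVStore:
    """Hypothetical deterministic Application: a tiny key-value store."""

    def __init__(self):
        self.data = {}

    def execute(self, command):
        # Commands are tuples: ("put", key, value) or ("get", key).
        op = command[0]
        if op == "put":
            _, key, value = command
            self.data[key] = value
            return "ok"
        if op == "get":
            _, key = command
            return self.data.get(key)
        raise ValueError(f"unknown command: {command!r}")


def single_client_correct(make_app, history):
    """Check a one-client history against a sequential replay.

    history is a list of (command, observed_response) pairs in the
    order the client issued them. Because the Application is assumed
    to be deterministic, replaying from the initial state yields the
    only right answer for each command.
    """
    app = make_app()  # start from the initial state
    for command, observed in history:
        if app.execute(command) != observed:
            return False
    return True
```

For example, the history [(("put", "x", 1), "ok"), (("get", "x"), 1)] passes this check, while the same history with the last response changed to 2 does not.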

Now consider the case where there are two clients, and they each submit a request.
These requests can arrive at servers in the system in either order.
- Do we need to try to reorder them somehow?
Intuitively, no. Either order is ok.
- The reason is that the clients are prepared to handle delays in the network, so they cannot possibly be expecting one to definitely arrive first. So we are free to execute them in either order.
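
To make "either order is ok" concrete, here is a small Python sketch for the special case where every request overlaps in time with every other one, as in the two-client example. It accepts the observed responses if some execution order of the requests explains all of them. The AppendLog application and the concurrent_results_ok function are hypothetical, chosen only because the response of an append reveals the order in which the requests were executed.

```python
from itertools import permutations


class AppendLog:
    """Hypothetical deterministic Application: appends to a string."""

    def __init__(self):
        self.value = ""

    def execute(self, command):
        # The command is the string to append; the response is the
        # full contents after the append.
        self.value += command
        return self.value


def concurrent_results_ok(make_app, observed):
    """observed maps client name -> (command, observed_response).

    Assumes all requests were concurrent, so no order is ruled out
    by timing. Returns True if at least one execution order produces
    exactly the observed responses.
    """
    for order in permutations(observed):
        app = make_app()  # fresh initial state for each candidate order
        if all(app.execute(observed[c][0]) == observed[c][1] for c in order):
            return True
    return False
```

With client A appending "a" and client B appending "b", the outcomes (A sees "a", B sees "ab") and (A sees "ba", B sees "b") are both accepted. The outcome where A sees "a" and B sees "b" is rejected, because no single order of the two requests produces it.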

The above examples show us the flavor of a consistency model.
- Given an execution of a distributed system, a consistency model says whether or not it is "correct".
- Different consistency models have different definitions of correct.
For this discussion, we are going to consider two forms of execution:
- The bird's-eye space-time diagram model
- The request-response-execution model

We have seen a few space-time diagrams before.
Here we'll use a simplified version of this kind of diagram, where we only draw clients, not servers.
Instead of showing requests going into the state-machine replication box and responses coming out, we will abbreviate this using the "regions of time" notation.
- A region begins when the client first sends the request to the system. Labeled with the request.
- A region ends when the client first receives the response to the current request (clients only have one outstanding request). Labeled with the response.
We represent the timing information visually in the picture, but you could equivalently think of it as labeling the beginning and end of each region with a bird's-eye timestamp.
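
One possible way to write down the timestamped view of these regions is sketched below in Python. The Region class, its field names, and the two helper functions are assumptions made for illustration; the notes themselves only describe the picture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    client: str
    request: str    # label at the start of the region
    response: str   # label at the end of the region
    start: float    # bird's-eye time the client first sent the request
    end: float      # bird's-eye time the client first received the response


def overlaps(a, b):
    """True if the two regions share some interval of bird's-eye time."""
    return a.start < b.end and b.start < a.end


def well_formed(regions):
    """Check that each region ends no earlier than it starts and that
    each client has only one outstanding request at a time."""
    by_client = {}
    for r in regions:
        if r.end < r.start:
            return False
        by_client.setdefault(r.client, []).append(r)
    for rs in by_client.values():
        rs.sort(key=lambda r: r.start)
        for prev, nxt in zip(rs, rs[1:]):
            if nxt.start < prev.end:  # next request sent before the response arrived
                return False
    return True
```

A list of Region values carries the same information as the simplified diagram: the drawing shows start and end positions visually, while the list records them as numbers.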

So a consistency model takes one of these space-time diagrams as input and tells you either "yes, that is allowed" or "no, that is not allowed".

(These notes are incomplete... let me know if you want me to finish them, as that will help encourage me!)