Lecture 5: Distributed Transition Systems

Let's now use the technology of transition systems to model distributed systems.

How to model a distributed system with a transition system

Last time, we defined transition systems.
Remember that a transition system is three things: state space, initial states, and transitions.
Idea: describe the executions of a distributed system.
- State space: the set of global states of the distributed system.
  - remember that a global state is the combination of:
    - the current local state of every node
    - the set of timers at each node that have been set but not fired yet
    - the set of all messages that have ever been sent
- Initial state:
  - Initialize all the local states (according to their constructor, or whatever)
  - Set any timers that are supposed to be set initially
  - The set of messages is initially empty
- Transitions:
  - Deliver a message (and do what's supposed to happen when it is received)
  - Fire a timer (and do what's supposed to happen when it fires)
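
As a rough illustration, the three parts of a global state listed above could be bundled into one immutable record. This is only a sketch: the class name `GlobalState`, its field names, and the two-node example are assumptions made here for illustration, not part of any particular framework.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Tuple


@dataclass(frozen=True)
class GlobalState:
    """Hypothetical record for one global state of a distributed system."""

    # One (node name, local state) pair per node.
    local_states: Tuple[Tuple[str, Any], ...]
    # Timers that have been set but have not fired yet, as (node name, timer name).
    pending_timers: FrozenSet[Tuple[str, str]]
    # Every message that has ever been sent.
    sent_messages: FrozenSet[Any]


# An initial state for a made-up two-node system: local states are
# initialized, no timers are set, and no messages have been sent.
initial = GlobalState(
    local_states=(("client", None), ("server", 0)),
    pending_timers=frozenset(),
    sent_messages=frozenset(),
)
print(initial)
```

Making the record immutable and hashable means states can be stored in sets, which is convenient when exploring which states are reachable.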

An example: Incrementing a counter

Consider this extremely simplified RPC-like protocol:

Two nodes: one client and one server
The server keeps track of a counter, stored as a (mathematical) integer.
The client has no state.
There are no timers.
There is one message, called Inc, which the client can send to the server to tell it to increment its counter.
- When the server receives an Inc message, it increments its local counter.

There are no other messages. (The server does not even tell the client that it executed the Inc request!) I told you it was extremely simplified.

The example as a transition system

How can we convert this into a transition system?

First, think about the global states of the system.
- The client has no state. The server has just one mathematical integer as its state.
- There are no timers.
- There is only one possible message, Inc.
  - Initially, this message has not been sent.
  - After the client sends it the first time, our fault model already allows it to be delivered any number of times.
    - So the client sending it again doesn't do anything new.
  - So really we just need to keep track of whether the client has ever sent the message at all or not.
    - We could do this with a boolean, but that approach doesn't generalize to real protocols.
    - Instead, just keep a "set of messages", which, since there's only one message in this protocol, is either the empty set (initially) or the set containing just the message Inc (after the client sends it the first time).
- So to summarize, a global state looks like \((n, M)\) where \(n\in\mathbb{Z}\) is the server's count (an integer) and \(M\in \mathtt{Set\langle Msg\rangle}\) is the set of messages, where the only \(\mathtt{Msg}\) in this system is the message Inc.
The initial state is \((0, \{\})\).
- Let's say the server initializes the counter to 0.
- And there have been no messages sent yet, so the set is empty.
Finally, there are two kinds of transitions:
- The client can send an Inc message.
  - From any state \((n, M)\), this transition moves us into the state \((n, M\cup\{\mathtt{Inc}\})\).
  - In other words, the server's count doesn't change and we add the Inc message to the set of sent messages.
- The server can receive an Inc message.
  - From any state \((n, M)\) where \(\mathtt{Inc}\in M\), this transition moves us into the state \((n+1, M)\).
  - Note that this transition is not possible in the initial state! It is only possible in a state where Inc has been sent.
  - Also note that we don't remove the Inc message from \(M\). This allows us to model duplication "for free".
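
The transition system just described can be sketched in a few lines of Python. The representation is an assumption made for illustration: a state is a pair `(n, msgs)` with `msgs` a `frozenset` of strings, and the function name `successors` is made up here.

```python
from typing import FrozenSet, Iterator, Tuple

# A global state is (server count n, set of messages M).
State = Tuple[int, FrozenSet[str]]

INITIAL: State = (0, frozenset())


def successors(state: State) -> Iterator[State]:
    """Yield every state that one transition can lead to from `state`."""
    n, msgs = state
    # Client sends Inc: the count is unchanged, Inc joins the message set.
    yield (n, msgs | {"Inc"})
    # Server receives Inc: only enabled once Inc has been sent.
    # Inc is not removed, so it can be delivered again (duplication).
    if "Inc" in msgs:
        yield (n + 1, msgs)


print(list(successors(INITIAL)))
print(list(successors((0, frozenset({"Inc"})))))
```

From the initial state only the send transition is enabled; once Inc is in the set, both transitions are available.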

What are the reachable states of this system?

Remember: A reachable state is one that can appear on some execution of the transition system starting from the initial state.
Are all states of this system reachable?
- Remember, the set of states is the set of pairs \((n, M)\) of an integer (the server's count) and a set of messages.
- There are only two "network states": since there is only one message, either it has been sent or it has not.
- But there are infinitely many global states, because the local state of the server is an integer, and there are infinitely many integers.
No, not all states are reachable. Examples of unreachable states:
- \((1, \{\})\) is unreachable since if the server incremented its count, the client must have sent an Inc message.
- More generally, any state \((n, \{\})\) where \(n > 0\) is unreachable for the same reason.
- \((-1, \{\})\) is unreachable because the server only increments its counter, never decrements it. Since it starts at 0, it can never become negative via incrementing.
- More generally, any state \((n, M)\) where \(n < 0\) is unreachable for the same reason.
There are also plenty of reachable states:
- The initial state \((0, \{\})\) is reachable. (Initial states are always reachable by definition—just take no steps!)
- The state \((0, \{\mathtt{Inc}\})\) is reachable because from the initial state, the client can send an Inc message to reach this state.
  - It might be obvious, but worth mentioning that in this state the client has sent a message that has not yet been received by the server! That's totally normal and happens all the time in distributed systems. Messages take time to deliver. So there are states where a message has been sent but not yet received.
- Every state of the form \((n, \{\mathtt{Inc}\})\) where \(n \ge 0\) is reachable.
  - From the initial state \((0, \{\})\), first take a step where the client sends an Inc message to reach the state \((0, \{\mathtt{Inc}\})\).
  - Then take \(n\) steps, each of which delivers a copy of the Inc message to the server.
  - The resulting state is \((n, \{\mathtt{Inc}\})\).
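
Since the state space is infinite, we cannot enumerate every reachable state, but we can enumerate the ones reachable within a fixed number of steps. Below is a minimal sketch using breadth-first search; the names `successors` and `reachable_within` and the tuple/`frozenset` encoding of states are assumptions made for illustration.

```python
from collections import deque


def successors(state):
    """One-step successors in the incrementing counter system."""
    n, msgs = state
    yield (n, msgs | {"Inc"})   # client sends Inc
    if "Inc" in msgs:
        yield (n + 1, msgs)     # server receives Inc


def reachable_within(initial, max_steps):
    """Breadth-first search: states reachable in at most max_steps transitions."""
    seen = {initial}
    frontier = deque([(initial, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if depth == max_steps:
            continue  # do not explore beyond the step bound
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


states = reachable_within((0, frozenset()), 4)
for n, msgs in sorted(states, key=lambda s: (s[0], len(s[1]))):
    print(n, set(msgs))
```

With a bound of 4 steps, the search finds the initial state plus the states \((n, \{\mathtt{Inc}\})\) for \(n\) from 0 to 3, and never produces a state with an empty message set and a positive count.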

Invariants

Unreachable states are closely related to the idea of an invariant.
Remember that an invariant is a property that is true in all reachable states.
So to convince yourself that a state is unreachable, one way is to find an invariant of the system that is false in that state.
In our incrementing counter system above, we said that states with \(n < 0\) are unreachable. In other words, \(n\ge 0\) is an invariant of this system.
In our incrementing counter system above, we also said that states with \(n > 0\) and \(M = \{\}\) are unreachable. In other words, the claim \[ \text{If } M = \{\}\text{ then } n = 0 \] is an invariant of this system.
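
A bounded search can give some evidence for these two invariants by checking them in every state reachable within a few steps. This is a sanity check under stated assumptions, not a proof: it only covers a finite prefix of the executions, and all function names below are made up for illustration.

```python
def successors(state):
    """One-step successors in the incrementing counter system."""
    n, msgs = state
    yield (n, msgs | {"Inc"})   # client sends Inc
    if "Inc" in msgs:
        yield (n + 1, msgs)     # server receives Inc


def bounded_reachable(initial, max_steps):
    """All states reachable from `initial` in at most max_steps transitions."""
    seen = {initial}
    frontier = {initial}
    for _ in range(max_steps):
        frontier = {t for s in frontier for t in successors(s)} - seen
        seen |= frontier
    return seen


def count_nonnegative(state):
    n, _ = state
    return n >= 0


def empty_network_implies_zero(state):
    n, msgs = state
    # "If M = {} then n = 0", written as an implication.
    return n == 0 if not msgs else True


states = bounded_reachable((0, frozenset()), 6)
print(all(count_nonnegative(s) for s in states))
print(all(empty_network_implies_zero(s) for s in states))
# A property that is NOT an invariant fails on some reachable state:
print(all(s[0] >= 1 for s in states))
```

The first two checks succeed on every explored state, while the last one fails because the initial state has \(n = 0\).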

Hopefully obvious but worth mentioning: whether something is an invariant or not is a claim about a specific distributed system!
- If we change the distributed system, what used to be an invariant might not be any more.
- For example, if we add a decrement command to our system above, then \(n\) can definitely become negative, breaking our invariant that \(n \ge 0\).

Stable properties

In the incrementing counter system, consider the claim that \(n \ge 1\).
This is not an invariant, since it's not true in the initial state.
But this property is very interesting because once it becomes true, it never becomes false again.
We call such properties stable.
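
One way to make this precise, writing \(s \to s'\) for a single transition and \(P\) for the property (both notations are assumptions introduced here, not from last lecture): \[ P \text{ is stable} \iff \text{for every reachable state } s \text{ and every transition } s \to s',\quad P(s) \Rightarrow P(s'). \] For \(P = (n \ge 1)\) in the incrementing counter system, every transition takes \((n, M)\) to a state whose count is either \(n\) or \(n + 1\), so if \(n \ge 1\) holds before the step, the new count is also at least 1.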

Communicating properties to other nodes

Stable properties are essential in distributed system design because they are the only kind of property that you can tell other nodes about and know that it will still be true when they receive your message.
If you try to tell another node "P is true" but P is not stable, then it might be false by the time your message gets to that node.
But if you tell them "P is true" and P is stable, then since P was true when you sent the message, it will still be true when your message arrives.
In a distributed system, nodes have no way to learn about each other's state except by sending messages.
- Stable properties allow nodes to make "promises" to other nodes about all their future states.

Every invariant is stable

Every invariant is a stable property: an invariant is true in every reachable state, so once it is true (which it already is in the initial state) it can never become false.
Another way to think about this is that an invariant is a stable property that is also true in the initial state. Therefore, since it is stable, it will always be true.
Not every stable property is an invariant!
- Any stable property that is false in the initial state is an example.
- The property "\(n \ge 1\)" in the Incrementing Counter System is an example.
So, given a property, it can be unstable, stable but not an invariant, or an invariant.
- It's impossible to be an invariant but not stable.
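
The three-way classification can be sketched as a bounded check on the incrementing counter system. It uses the view that an invariant is a stable property that also holds initially. Everything here is an illustrative assumption (function names, state encoding, the step bound), and a bounded check can wrongly report "stable" if the first violation lies beyond the bound.

```python
def successors(state):
    """One-step successors in the incrementing counter system."""
    n, msgs = state
    yield (n, msgs | {"Inc"})   # client sends Inc
    if "Inc" in msgs:
        yield (n + 1, msgs)     # server receives Inc


def bounded_reachable(initial, max_steps):
    """All states reachable from `initial` in at most max_steps transitions."""
    seen = {initial}
    frontier = {initial}
    for _ in range(max_steps):
        frontier = {t for s in frontier for t in successors(s)} - seen
        seen |= frontier
    return seen


def classify(prop, initial, max_steps=5):
    """Classify `prop` using only the states found within the step bound."""
    states = bounded_reachable(initial, max_steps)
    # Stable: whenever prop holds in s, it holds in every successor of s.
    stable = all(prop(t) for s in states if prop(s) for t in successors(s))
    if not stable:
        return "unstable"
    return "invariant" if prop(initial) else "stable but not an invariant"


initial = (0, frozenset())
print(classify(lambda s: s[0] >= 0, initial))
print(classify(lambda s: s[0] >= 1, initial))
print(classify(lambda s: s[0] == 0, initial))
```

The three calls exercise one property of each kind: \(n \ge 0\), \(n \ge 1\), and \(n = 0\). The last is unstable because delivering Inc from \((0, \{\mathtt{Inc}\})\) makes it false.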

Example 2: The server responds

Our incrementing counter example above was ridiculously simple in many ways, perhaps most egregiously that the server sends no messages back to the client.
Let's fix that by adding response messages.
Remember one of the challenges in RPC was that the client had trouble telling which request the server was responding to, so we needed to add sequence numbers.
We will get back to sequence numbers eventually, but I want to start simpler. Let's start by having the server just send back the current value of its counter.
Add a new kind of message Resp(j) where j is an integer. When the server receives an Inc message, it will execute it, and send back Resp(j) where j is the new value of the server's counter.
What can the client conclude when it receives Resp(j)?
- Wrong answer: the server's current value of the counter is j.
  - Wrong because that's not stable: the server might have incremented some more since it sent this response.
- Right answer: the client can conclude that the server's current value is at least j.
  - Ok because the property of the server's counter being at least j is stable in this system.
  - Demonstrates the power of stable properties!

Example 2 as a transition system

Here is an updated transition system with response messages.

Call this system Incrementing Counter System with Response Messages.

State space: \((n, M)\) where \(n\in\mathbb{Z}\) is the server's count (an integer) and \(M\in \mathtt{Set\langle Msg\rangle}\) is the set of messages
- A \(\mathtt{Msg}\) is either the message Inc or a message of the form \(\mathtt{Resp}(j)\) where \(j\) is an integer.
Initial state: \((0, \{\})\).
Transitions:
- The client can send an Inc message.
  - From any state \((n, M)\), this transition moves us into the state \((n, M\cup\{\mathtt{Inc}\})\).
  - In other words, the server's count doesn't change and we add the Inc message to the set of sent messages.
- The server can receive an Inc message.
  - From any state \((n, M)\) where \(\mathtt{Inc}\in M\), this transition moves us into the state \((n+1, M\cup\{\mathtt{Resp}(n+1)\})\).
  - In other words, we increment the server's count and send a response message back to the client with the new count.
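
Here is a minimal sketch of these two transitions as Python functions, followed by a short execution. Encoding messages as tuples such as `("Inc",)` and `("Resp", j)` is an assumption made for illustration, as are the function names.

```python
INC = ("Inc",)


def resp(j):
    """Build the message Resp(j)."""
    return ("Resp", j)


def client_sends_inc(state):
    n, msgs = state
    # The count is unchanged; Inc is added to the set of sent messages.
    return (n, msgs | {INC})


def server_receives_inc(state):
    n, msgs = state
    if INC not in msgs:
        raise ValueError("Inc has not been sent, so it cannot be delivered")
    # Increment the count and send back a response carrying the new count.
    return (n + 1, msgs | {resp(n + 1)})


s0 = (0, frozenset())
s1 = client_sends_inc(s0)
s2 = server_receives_inc(s1)
s3 = server_receives_inc(s2)   # the same Inc is delivered a second time
print(s3[0], sorted(s3[1]))
```

After one send and two deliveries, the count is 2 and the network holds Inc, Resp(1), and Resp(2). Resp(1) is still in the network even though the count has moved on.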

Converting a stable property into a (network) invariant

For any fixed integer \(j\), the property \(n \ge j\) is still stable in this new system.
- It is also still not an invariant (unless \(j \le 0\), in which case it follows from the invariant \(n \ge 0\)).
If we want to tell another node that this property is true via a message, then we encounter a network invariant that depends on the stable property:
- The invariant is: "If the message \(\mathtt{Resp}(j)\) is in the network, then the server's counter is at least \(j\)."
- In other words, the network invariant gives meaning to the message by saying what the client can conclude when it receives it.
- If the underlying property of \(n \ge j\) were not stable, then the client could not conclude this, and it wouldn't be an invariant.
- Notice that the invariant is true initially because the hypothesis "the message \(\mathtt{Resp}(j)\) is in the network" is false, so the implication is vacuously true.
- Once the message is sent, the hypothesis becomes true, and the conclusion is also true: at the moment of sending, \(n = j\), and because \(n \ge j\) is stable it stays true from then on.
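
To see the difference between the right and wrong conclusions concretely, we can check both candidate network invariants on all states reachable within a few steps. This is a bounded sanity check rather than a proof, and the function names and tuple encoding of messages are assumptions made for illustration.

```python
INC = ("Inc",)


def successors(state):
    """One-step successors in the system with response messages."""
    n, msgs = state
    yield (n, msgs | {INC})                        # client sends Inc
    if INC in msgs:
        yield (n + 1, msgs | {("Resp", n + 1)})    # server receives Inc


def bounded_reachable(initial, max_steps):
    """All states reachable from `initial` in at most max_steps transitions."""
    seen = {initial}
    frontier = {initial}
    for _ in range(max_steps):
        frontier = {t for s in frontier for t in successors(s)} - seen
        seen |= frontier
    return seen


def resp_values(msgs):
    """All j such that Resp(j) is in the network."""
    return [m[1] for m in msgs if m[0] == "Resp"]


def resp_means_at_least(state):
    n, msgs = state
    return all(n >= j for j in resp_values(msgs))


def resp_means_exactly(state):
    n, msgs = state
    return all(n == j for j in resp_values(msgs))


states = bounded_reachable((0, frozenset()), 5)
print(all(resp_means_at_least(s) for s in states))  # True
print(all(resp_means_exactly(s) for s in states))   # False
```

The "exactly" version fails as soon as the server has processed two deliveries: Resp(1) is still in the network while the count is already 2.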

Rule of thumb: Every message should have a meaning

Network invariants give meaning to messages.
- If the destination node cannot conclude anything new when it receives the message, then there was no point in sending the message at all.
Network invariants depend on stable properties.
- Otherwise, the invariant would become false when the server's state changes out from under it, for example.
So, as a rule of thumb, for every message in your protocol, you should describe what stable property the destination can conclude when it receives the message.
- And convince yourself that the property really is stable!

Example 3: Decrementing

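Suppose we add a decrement command, Dec, to the system: when the server receives it, it subtracts 1 from its counter and responds with the new value.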

Example 3 as a transition system

Here is an updated transition system with decrement commands.

Call this system Incrementing/Decrementing Counter System with Response Messages.

State space: \((n, M)\) where \(n\in\mathbb{Z}\) is the server's count (an integer) and \(M\in \mathtt{Set\langle Msg\rangle}\) is the set of messages
- A \(\mathtt{Msg}\) is either the message Inc or the message Dec, or a message of the form \(\mathtt{Resp}(j)\) where \(j\) is an integer.
Initial state: \((0, \{\})\).
Transitions:
- The client can send an Inc message.
  - From any state \((n, M)\), this transition moves us into the state \((n, M\cup\{\mathtt{Inc}\})\).
- The client can send a Dec message.
  - From any state \((n, M)\), this transition moves us into the state \((n, M\cup\{\mathtt{Dec}\})\).
- The server can receive an Inc message.
  - From any state \((n, M)\) where \(\mathtt{Inc}\in M\), this transition moves us into the state \((n+1, M\cup\{\mathtt{Resp}(n+1)\})\).
- The server can receive a Dec message.
  - From any state \((n, M)\) where \(\mathtt{Dec}\in M\), this transition moves us into the state \((n-1, M\cup\{\mathtt{Resp}(n-1)\})\).
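
The four transitions can be sketched as a function that returns every enabled transition, keyed by a descriptive label. The labels, the function name `transitions`, and the tuple encoding of messages are assumptions made for illustration.

```python
INC, DEC = ("Inc",), ("Dec",)


def transitions(state):
    """Map each enabled transition's label to the state it leads to."""
    n, msgs = state
    enabled = {
        "client sends Inc": (n, msgs | {INC}),
        "client sends Dec": (n, msgs | {DEC}),
    }
    # Deliveries are only enabled for messages that have been sent.
    if INC in msgs:
        enabled["server receives Inc"] = (n + 1, msgs | {("Resp", n + 1)})
    if DEC in msgs:
        enabled["server receives Dec"] = (n - 1, msgs | {("Resp", n - 1)})
    return enabled


state = (0, frozenset())
for label in ["client sends Dec", "server receives Dec"]:
    state = transitions(state)[label]
    print(label, "->", state[0], sorted(state[1]))
```

This two-step execution ends with the count at -1, so \(n \ge 0\) is no longer an invariant of this system.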

What stable properties are still true?

The decrement command destroys all the stable properties we had before about the server's counter: a property like \(n \ge j\) can now become false again after a Dec is delivered.
Is there anything which is still stable?
- One example: the size of \(M\) is at least 7
  - Stable because we never remove messages from \(M\), so once it gets that big (or however big) it will never be smaller than that again.
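
A bounded search can illustrate both claims: it finds a transition that breaks \(n \ge 1\), and finds none that breaks a lower bound on the size of \(M\). This is a sketch under stated assumptions. The function names are made up, the threshold 2 is used instead of 7 only so that the property becomes true within the small step bound, and the absence of violations within the bound is evidence rather than proof.

```python
INC, DEC = ("Inc",), ("Dec",)


def successors(state):
    """One-step successors in the incrementing/decrementing system."""
    n, msgs = state
    yield (n, msgs | {INC})                        # client sends Inc
    yield (n, msgs | {DEC})                        # client sends Dec
    if INC in msgs:
        yield (n + 1, msgs | {("Resp", n + 1)})    # server receives Inc
    if DEC in msgs:
        yield (n - 1, msgs | {("Resp", n - 1)})    # server receives Dec


def bounded_reachable(initial, max_steps):
    """All states reachable from `initial` in at most max_steps transitions."""
    seen = {initial}
    frontier = {initial}
    for _ in range(max_steps):
        frontier = {t for s in frontier for t in successors(s)} - seen
        seen |= frontier
    return seen


def violations(prop, states):
    """Pairs (s, t) where prop holds in s but fails in a successor t."""
    return [(s, t) for s in states if prop(s)
            for t in successors(s) if not prop(t)]


def count_at_least_one(state):
    return state[0] >= 1


def network_size_at_least(k):
    return lambda state: len(state[1]) >= k


states = bounded_reachable((0, frozenset()), 4)
print(len(violations(count_at_least_one, states)) > 0)
print(len(violations(network_size_at_least(2), states)) == 0)
```

One violating step for \(n \ge 1\) goes from a state with count 1 and Dec in the network to a state with count 0, which is exactly the "becomes false again" behavior that stability rules out.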