Problem Set 3: Paxos

Due: Tuesday, February 6, 6:30pm via Gradescope

Each problem is worth 5 points.

Single-Decree Paxos

Setup

Here is a description of single-decree Paxos. (We describe each role as a separate node. In lab 3, you will combine all three roles on each node.)

A ballot is a pair of a ballot number and a proposer id. Ballots are ordered by first comparing ballot numbers and if those are equal, then comparing proposer ids.

Nodes: there are \(k\) proposers, \(n\) acceptors, and \(l\) learners
- Proposer: will refer to the current proposer's id as \(i\) (which will satisfy \(1 \le i \le k\)).
  - state:
    - \(\mathtt{current\_ballot\_num}\): current ballot number, an integer, initially 0
    - \(\mathtt{votes}\): set of (sender, PrepareResponse (1b) message) pairs received for current ballot number, initially empty
      - in this description of single-decree Paxos, the proposer uses the size of the \(\mathtt{votes}\) set to remember whether or not is has already proposed a value in this ballot number yet or not. if the size of \(\mathtt{votes}\) is greater than \(\lfloor n/2 \rfloor\), then the proposer has already proposed. if it is less than or equal to \(\lfloor n/2 \rfloor\), then the proposer has not proposed in this ballot number yet.
- Acceptor:
  - state:
    - \(\mathtt{promised\_ballot}\): the highest ballot this acceptor has ever sent a 1b message for, optional, initially None
    - \(\mathtt{last\_accepted}\): the highest-balloted AcceptResponse (2b) message ever sent by this acceptor, optional, initially None
- Learner:
  - state:
    - \(\mathtt{accepts}\): set of all (sender, AcceptRessponse (2b) message) pairs ever received, initially empty
Messages
- Prepare (also known as "1a")
  - contents:
    - a ballot (i.e., a pair of a ballot number and a proposer id)
  - sent from a proposer to an acceptor
  - when received:
    - let \(b\) be the ballot on the incoming Prepare message
    - the acceptor ignores the message if \(\mathtt{promised\_ballot}\) is not None and \(b\) is less than or equal to \(\mathtt{promised\_ballot}\)
    - otherwise, the acceptor sets \(\mathtt{promised\_ballot}\) to \(b\), and sends a PrepareResponse message back to the proposer containing the incoming Prepare message's ballot and \(\mathtt{last\_accepted}\).
- PrepareResponse (also known as "1b")
  - contents:
    - a ballot
    - an optional AcceptResponse (2b) message
  - sent from acceptor to proposer
  - when received:
    - let \(b\) be the ballot on the incoming PrepareResponse message
    - the proposer ignores the message if \(b\) is not equal to \((\mathtt{current\_ballot\_num}, i)\) where \(i\) is the proposer's id
    - let \(m\) be the old size of the proposer's \(\mathtt{votes}\) set
    - the proposer adds the pair (sender, incoming PrepareResponse message) to the \(\mathtt{votes}\) set
    - let \(m^\prime\) be the new size of the proposer's \(\mathtt{votes}\) set
    - if \(m \le \lfloor n/2 \rfloor\) and \(m^\prime > \lfloor n/2 \rfloor\), then:
      - if every element of \(\mathtt{votes}\) contains None as its 2b message, then let \(v\) be any value the proposer wants.
      - otherwise, if there is some element of \(\mathtt{votes}\) with a non-None 2b message, then let \(v\) be the value of the highest-balloted 2b message from any 1b message in \(\mathtt{votes}\).
      - send an Accept message containing \(((\mathtt{current\_ballot\_num}, i), v)\) to all acceptors, where \(i\) is the proposer's id.
- Accept (also known as "2a")
  - contents:
    - a ballot
    - a value
  - sent from proposer to acceptor
  - when received:
    - let \(b\) and \(v\) be the ballot and value on the incoming Accept message
    - the acceptor ignores the message if \(\mathtt{promised\_ballot}\) is not None and \(b\) is strictly less than to \(\mathtt{promised\_ballot}\)
    - the acceptor sends an AcceptResponse message containing \((b, v)\) to all learners
    - if \(\mathtt{last\_accepted}\) is None or if the ballot of \(\mathtt{last\_accepted}\) is less than \(b\), then the acceptor sets \(\mathtt{last\_accepted}\) to \((b, v)\).
- AcceptResponse (also known as "2b")
  - contents:
    - a ballot
    - a value
  - sent from acceptor to learner
  - when received
    - the learner adds the pair (sender, incoming AcceptResponse) message to \(\mathtt{accepts}\)
Spontaneous actions
- start new proposal
  - at any time, a proposer can decide to start a new proposal by incrementing its \(\mathtt{current\_ballot\_num}\), clearing its \(\mathtt{votes}\) set, and sending a Prepare (1a) message with ballot \((\mathtt{current\_ballot\_num}, i)\) (where \(i\) is the proposer's id) to all acceptors.

(A "spontaneous action" is just a high-level way of describing something that you would do with a timer in practice. It means the proposer can do it whenever it wants to.)

In our discussion below, we imagine that sending a message adds it to the set of messages in the network, but receiving a message does not remove it from the set. In other words, once a message is sent, it is "in the network" forever.

We define \(\mathit{Chosen}(v)\) to mean that there exists a ballot \(b\) and a set of acceptors \(A\) such that the size of \(A\) is greater than \(\lfloor n/2\rfloor\) and every acceptor in \(A\) has sent an AcceptResponse (2b) message with contents \((b, v)\).

To "describe an execution", list the events that happen. An event can be a "spontaneous action" or a message delivery. No need to explain the events, just list them.

Problems

Suppose \(k=1\), \(n=3\), and \(l=1\). Describe an execution of single-decree Paxos that reaches a state where \(D(Chosen(v))\) but not \(K_L(Chosen(v))\), where \(L\) is the one learner.
- Recall from the distributed knowledge paper that \(D(x)\) means "looking down from a bird's eye view of the system, one can see that \(x\) holds".
- \(K_N(x)\) means "node \(N\) knows that \(x\) holds"

For each state below, say whether the state is reachable or not. If yes, describe an execution that reaches it. On the other hand, if the state is not reachable, (1) describe an invariant that is false in this state, (2) explain in one sentence why the invariant is false in this state, and (3) explain in one sentence how the protocol ensures your invariant is an invariant.

We omit some pieces of the state (often the proposer and learner). In that case, you should say whether there is any state matching the parts we did not omit that is reachable, or whether all such states are unreachable. (Still describing an execution or an invariant (and its explanation) as above.)

Suppose \(k=2\), \(n=3\), and \(l=2\), and let \(P_1\) and \(P_2\) be the proposers, \(A_1\), \(A_2\), and \(A_3\) be the acceptors, and \(L_1\) and \(L_2\) be the learners. In the ordering of ballots, suppose \(P_1 < P_2\). Let \(v\) and \(w\) be values such that \(v \ne w\).

- \(A_1: (\mathtt{promised\_ballot} = (1, P_1), \mathtt{last\_accepted} = ((1, P_1), v))\)
- \(A_2: (\mathtt{promised\_ballot} = (1, P_1), \mathtt{last\_accepted} = ((1, P_1), w))\)
- \(A_3: (\mathtt{promised\_ballot} = \mathtt{None}, \mathtt{last\_accepted} = \mathtt{None})\)
- \(A_1: (\mathtt{promised\_ballot} = (1, P_2), \mathtt{last\_accepted} = ((1, P_1), v))\)
- \(A_2: (\mathtt{promised\_ballot} = (1, P_2), \mathtt{last\_accepted} = ((1, P_2), w))\)
- \(A_3: (\mathtt{promised\_ballot} = \mathtt{None}, \mathtt{last\_accepted} = \mathtt{None})\)
- \(A_1: (\mathtt{promised\_ballot} = (1, P_1), \mathtt{last\_accepted} = ((1, P_1), v))\)
- \(A_2: (\mathtt{promised\_ballot} = \mathtt{None}, \mathtt{last\_accepted} = ((1, P_1), v))\)
- \(A_3: (\mathtt{promised\_ballot} = (1, P_1), \mathtt{last\_accepted} = \mathtt{None})\)
- \(A_1: (\mathtt{promised\_ballot} = (1, P_1), \mathtt{last\_accepted} = ((1, P_1), v))\)
- \(A_2: (\mathtt{promised\_ballot} = \mathtt{None}, \mathtt{last\_accepted} = ((1, P_1), v))\)
- \(A_3: (\mathtt{promised\_ballot} = \mathtt{None}, \mathtt{last\_accepted} = \mathtt{None})\)
- \(A_1: (\mathtt{promised\_ballot} = (1, P_1), \mathtt{last\_accepted} = ((1, P_1), v))\)
- \(A_2: (\mathtt{promised\_ballot} = (1, P_2), \mathtt{last\_accepted} = ((1, P_2), w))\)
- \(A_3: (\mathtt{promised\_ballot} = \mathtt{None}, \mathtt{last\_accepted} = \mathtt{None})\)

MultiPaxos

The problems below are about the version of the MultiPaxos protocol presented in lecture.

Imagine we have an infinite number of copies of single-decree Paxos indexed by a slot number, which is a positive integer.
All nodes play all roles.
We combine ballot numbers across all slots.
A ballot is a combination of a per-node sequence number and the node's name.
We combine phase 1 across all slots. 1b messages contain of vote summaries for all nonempty slots.
We define a node to be the leader for ballot \(b\) if it has collected a majority of 1b message in \(b\).
Clients broadcast requests to the all servers. Leaders place client requests into the first available slot and try to get them chosen.
When a leader is elected, it proposes no-ops in all empty slots that have slot number less than any non-empty slot.
We assume 2a/2b messages are sent separately for each slot, so these messages have a slot number on them in addition to single-decree Paxos data. (You can batch these messages in lab 3 if you want, but we ignore that here.)
For all problems except problem 16, we ignore heartbeats and garbage collection.

This protocol is less well specified than Single-Decree Paxos, so you will necessarily need to keep your explanations high-level. Your answers should make sense to anyone who has studied the lectures from this class—you should not depend on details of your own lab 3 design.

We define \(\mathit{Chosen}(i, v)\) where \(i\) is a slot number and \(v\) is a value to mean that there exists a ballot \(b\) and a set of nodes \(S\) such that the size of \(S\) is greater than \(\lfloor n/2\rfloor\) and every node in \(S\) has sent an AcceptResponse (2b) message with contents \((i, b, v)\).

To "describe a execution", list the events that happen. An event could be a message delivery or "spontaneous action" or timer firing, or it could be a network failure (drop, duplicate, delay, reorder) or node failure. No need to explain the events, just list them. Also, if there are a lot of events, you can describe them at a high level instead of listing them one by one, but be sure that your reader can understand what specific execution you are talking about.

For all problems assume there are \(n=3\) nodes, \(N_1\), \(N_2\), and \(N_3\) and that \(N_1 < N_2 < N_3\) in the ballot ordering.

What goes weird but right

Describe a execution where two different nodes think they are currently the leader.

Hint: They will be leaders of different ballots.
In the scenario from the previous problem, explain why MultiPaxos does not violate linearizability even though there are multiple leaders.

Hint: The leaders are leaders of different ballots.
Describe a execution where a value is chosen for slot 1 but no node knows that it is chosen yet.
Describe a execution where a value has been chosen for slot 2 but no value is chosen (yet) in slot 1.
Suppose that leaders attempt to deduplicate client requests as follows. When a leader receives a request, it first looks in its log to see if that request is already in the log, and if so, it ignores the request. Otherwise, it puts the request into the first available slot.

Describe a execution where one client's requested value is chosen in two different slots (i.e., the same value is chosen in two different slots).

What would go wrong

Notes to future James:

add questions along these lines
- what goes wrong if you don't treat no-ops as commands but instead immediately choose them.
- describe an execution where two different commands are accepted in a single slot
fix problem 14 so that you can't reuse the solution to 13

Here are the old problems:

Using only finite message delays and reordering (and not node failures, message drops or duplicates) describe an infinite execution where no value is ever chosen even though clients are submitting requests.
Suppose, as in Problem 11, that leaders attempt to deduplicate client requests as follows. When a leader receives a request, it first looks in its log to see if that request is already in the log, and if so, it ignores the request. Otherwise, it puts the request into the first available slot.

Further suppose that the leader did not propose no-ops for empty slots when it got elected. Describe a sequence of events (possibly including failures) that leads to a state from which a client retransmits a request forever and no further failures occur, but the system never executes the client's request
Consider the following "optimization" to MultiPaxos. If the leader receives a client request that is read-only (e.g. Get in a key-value store), it immediately executes the request on its current copy of the state machine and sends the response to the client. Describe an execution of this "optimized" system that violates linearizability.

To show a linearizability violation, describe an execution where clients submit requests, MultiPaxos executes those requests (you should describe the events required for MultiPaxos to do this as well), and clients get responses, but those responses are not linearizabile. To show that a set of requests/responses is not linearizable, you must explain why there is no possible global order for the operations, as in problem set 2.

Hint: There might be multiple nodes that think they are the leader.
Suppose we try to "optimize" the protocol by eliminating phase 1. Any node can declare itself the leader of any ballot consisting of a sequence number and its own node name that is higher than any ballot that node has ever participated in. Describe an execution of this "optimized" system that violates linearizability.
For this problem, consider the protocol including garbage collection and heartbeats. Suppose we "optimize" garbage collection so that the leader declares a slot and value to be garbage as soon as it executes that value on its copy of the state machine, and then the leader then informs all nodes via a new "hey this is garbage" message that the nodes should delete the data in the slot and never consider that slot again. Describe an execution where one node crashes and the system reaches a state where the only remaining two nodes are missing slot data that they need to make further progress.