Lecture 10: Single-decree Paxos — Notes
These are notes from the lecture on April 20, 2026. See also the whiteboard descriptions and the whiteboard PDF.
These materials were drafted by AI based on the live whiteboard PDF and audio transcript from the corresponding lecture and then reviewed and edited by course staff. They may contain errors. Please let us know if you spot any.
Where We Are
Lab 2 builds a primary-backup replicated key-value store. It tolerates one failure at a time: the view server designates a primary and a backup; the primary forwards operations to the backup; if either fails, the view server promotes the survivor and brings in a new replica via state transfer. Of course, if the view server fails, then the system is stuck. If you try to have more than one view server, then you have the problem of getting them to agree on the current view.
Getting nodes to agree on stuff is the topic for this week and lab 3.
That problem is called "consensus". Lab 3 builds a replicated log built on MultiPaxos. Today is about the algorithmic core: single-decree Paxos, which solves consensus for one decision. MultiPaxos strings together many single-decree instances, one per slot in the log.
Log-based State Machine Replication
The shape of the system we're aiming for:
- Every replica keeps a log of operations; each element of the log is called a slot. Each slot will eventually hold a client command.
- Clients submit commands; the system decides on a command for each slot.
- Each replica applies the log in order to a local copy of the state machine. Because all replicas see the same sequence of commands, they end up in the same state.
So the per-slot problem is: get a group of replicas to agree on a single value (the command for that slot). That's consensus, and we solve it one slot at a time, using single-decree Paxos.
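To make the determinism point concrete, here is a minimal sketch of a replica applying a decided log to a key-value state machine. The function name and command encoding are illustrative, not the lab's API:

```python
# Sketch: applying a decided log, in slot order, to a key-value state
# machine. Names and command encoding are illustrative.

def apply_log(log):
    """Apply (op, key, value) commands in slot order to a fresh
    key-value store. Deterministic: replicas with identical logs
    end in identical states."""
    state = {}
    results = []
    for op, key, value in log:
        if op == "put":
            state[key] = value
            results.append(None)
        elif op == "get":
            results.append(state.get(key))
    return state, results

# Two replicas with the same log reach the same state.
log = [("put", "x", 1), ("put", "y", 2), ("get", "x", None), ("put", "x", 3)]
state_a, _ = apply_log(log)
state_b, _ = apply_log(log)
assert state_a == state_b == {"x": 3, "y": 2}
```

The only thing consensus has to provide is that every replica sees the same command in every slot; the state machine's determinism does the rest.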
The Consensus Problem
A group of nodes each proposes a value. The protocol must choose one of the proposed values. The three requirements:
- The chosen value was proposed. No making up values out of thin air.
- At most one value is chosen. Never two different winners.
- Nobody believes a value is chosen unless it actually was. Learners don't get misled.
Notice what's not required: we don't require that the protocol always terminates, or that a specific node's proposal wins. Those concessions are what make consensus solvable at all. Recall from last lecture that consensus is impossible in full generality. Paxos is safe under arbitrary message loss, reordering, and delay, and it makes progress whenever the network behaves well enough for long enough. (We'll come back to what "well enough" means.)
Roles
Paxos is described in terms of three roles. These are logical roles; in a real deployment one physical node typically plays all three.
- Proposer — proposes values. Drives the protocol.
- Acceptor — votes on proposals. The acceptors collectively hold the state that determines which value is chosen.
- Learner — learns the outcome. Reports back to the client, writes to the replicated log, etc.
An execution has a fixed number of each role. In particular, the set of acceptors is fixed, and majorities are taken over that fixed set.
The proposer and the acceptor work together to make sure the protocol achieves its goals. As we will see, the proposer collects information about past behavior of acceptors before proposing a new value, and acceptors make promises to proposers about their future behavior.
Ballots (Rounds)
Proposers attach a ballot number (also called a round number) to every attempt. Each proposer's ballot numbers increase over time, and in single-decree Paxos they must also be globally unique: different proposers never use the same ballot number. A simple way to arrange this with two proposers is to give one the even numbers and the other the odd numbers; more generally you can partition the integers among the proposers however you like, or attach the proposer's id as a tiebreaker.
The point of ballots is to give the protocol a total order on attempts. When two proposers race, the one with the higher ballot wins. The acceptors' rules will force the later proposer to respect anything the earlier one might already have gotten chosen.
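The id-as-tiebreaker scheme can be sketched in a few lines: a ballot is a (counter, proposer id) pair compared lexicographically, so counters may collide across proposers but ballots never do. The class and method names are illustrative:

```python
# Sketch: globally unique, increasing ballots as (counter, proposer_id)
# pairs compared lexicographically. Names are illustrative.

class Proposer:
    def __init__(self, proposer_id):
        self.proposer_id = proposer_id
        self.counter = 0

    def next_ballot(self):
        self.counter += 1
        return (self.counter, self.proposer_id)

p0, p1 = Proposer(0), Proposer(1)
b0, b1 = p0.next_ballot(), p1.next_ballot()
assert b0 != b1              # unique even though both counters are 1
assert p1.next_ballot() > b1 # each proposer's ballots increase over time
```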
The Ballot Protocol: Two Phases
A single-decree Paxos ballot has two phases. Phase 1 is "prepare", in which the proposer gathers information about past ballots. Phase 2 is "accept", in which the proposer picks a value and asks the acceptors to vote for it (accept it).
Messages:
- 1a(r) — prepare, from proposer to acceptors. "I am beginning round r."
- 1b(r, summary) — promise, from acceptor to proposer. "OK, and here's what I know about earlier rounds, and I promise not to participate in earlier rounds anymore."
- 2a(r, v) — accept request, from proposer to acceptors. "Please vote for value v in round r."
- 2b(r, v) — accept reply, from acceptor to learner. "I voted for v in round r."
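The four message types can be written down as plain records; the field names here are illustrative:

```python
# Sketch: the four Paxos message types as records. Field names are
# illustrative; a ballot is a (counter, proposer_id) pair as one
# possible encoding.
from dataclasses import dataclass
from typing import Optional, Tuple

Ballot = Tuple[int, int]

@dataclass(frozen=True)
class Prepare:        # 1a(r)
    round: Ballot

@dataclass(frozen=True)
class Promise:        # 1b(r, summary)
    round: Ballot
    summary: Optional[tuple]  # None if the acceptor has never voted

@dataclass(frozen=True)
class AcceptRequest:  # 2a(r, v)
    round: Ballot
    value: object

@dataclass(frozen=True)
class Accepted:       # 2b(r, v), sent to the learners
    round: Ballot
    value: object
```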
The overall shape of the protocol can be drawn as a message sequence chart; see the whiteboard PDF.
Phase 1
- A proposer picks a fresh round number r from among those numbers allocated to that proposer. The number should be larger than any it has previously used.
- It sends 1a(r) to all acceptors.
- An acceptor that receives 1a(r) responds with 1b(r, summary), where summary describes anything the acceptor has already voted for in an earlier round. For today we only fill in the case where the acceptor has never voted; in that case the summary is null. (The non-null case is for next lecture.)
- The proposer waits for 1b replies from a majority (more than half) of the acceptors.
Why a majority? Because any two majorities overlap in at least one acceptor. That single overlapping acceptor is the point of contact between competing rounds. Whatever the overlapping acceptor tells the later proposer will force that proposer to defer to the earlier one if a value was already on its way to being chosen.
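The overlap fact is easy to check exhaustively for a small acceptor set. This is a sanity check of the counting argument, not part of the protocol:

```python
# Sanity check: any two majorities of the same acceptor set intersect.
# Each majority has more than half the acceptors, so two of them
# together would exceed the set's size if they were disjoint.
from itertools import combinations

acceptors = {"a1", "a2", "a3", "a4", "a5"}
majority_size = len(acceptors) // 2 + 1  # 3 of 5

for m1 in combinations(acceptors, majority_size):
    for m2 in combinations(acceptors, majority_size):
        assert set(m1) & set(m2), "two majorities always share an acceptor"
```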
Phase 2
Once the proposer has 1b responses from a majority:
- Inspect the summaries.
- If all summaries are null (no acceptor in the majority has voted before), the proposer is free to propose any value. Typically in this case it will propose a value from a client that contacted it.
- If some summary is non-null, the proposer must pick a value constrained by the summaries. (We'll fill this in next lecture.)
- The proposer sends 2a(r, v) to all acceptors, where v is the proposed value.
- Each acceptor, if the rules allow, sends 2b(r, v) to all learners. (Notice that the 2b message is not sent back to the proposer but to the learners.)
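The proposer's phase-2 value choice, with only today's case filled in, might look like this sketch (the function name is illustrative):

```python
# Sketch: phase-2 value selection with only today's case filled in.
# If no acceptor in the responding majority has voted, the proposer
# may propose any value, typically one from a client.

def pick_value(summaries, client_value):
    if all(s is None for s in summaries):
        return client_value  # free choice: propose the client's value
    # Non-null summaries constrain the choice -- next lecture.
    raise NotImplementedError("non-null summaries: next lecture")

assert pick_value([None, None, None], "put x 1") == "put x 1"
```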
The Learner
The learner waits for 2b(r, v) messages from a majority of acceptors, all
in the same round r. Once it has them, it declares v chosen.
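A learner's tallying logic is small; here is a sketch with illustrative names, counting distinct acceptors per (round, value) pair:

```python
# Sketch: a learner tallies 2b(r, v) messages and declares v chosen
# once a majority of acceptors have voted for v in the same round r.
# Names are illustrative.

class Learner:
    def __init__(self, num_acceptors):
        self.majority = num_acceptors // 2 + 1
        self.votes = {}  # (round, value) -> set of acceptor ids

    def on_2b(self, acceptor_id, rnd, value):
        voters = self.votes.setdefault((rnd, value), set())
        voters.add(acceptor_id)
        if len(voters) >= self.majority:
            return value  # chosen
        return None

learner = Learner(num_acceptors=3)
assert learner.on_2b("a1", 1, "v") is None
assert learner.on_2b("a1", 1, "v") is None  # a duplicate doesn't count twice
assert learner.on_2b("a2", 1, "v") == "v"   # majority of 3 reached
```

Note the set: a lossy network may redeliver a 2b message, so the learner counts distinct acceptors, not messages.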
What "Chosen" Means
Paxos is unusual in that "chosen" is a property of the system state, not an action any single node takes. There is no moment where some node flips a switch. Instead:
- chosen(r, v) holds iff a majority of acceptors have sent 2b(r, v) messages. In other words, a majority voted for v in round r. (To "vote" for a value means to send a 2b message for it.)
- chosen(v) holds iff there exists some round r with chosen(r, v).
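One way to see that "chosen" is a property of system state: write it as a predicate over the global set of 2b messages that exist anywhere, delivered or not. No node evaluates this predicate directly; it is a statement about the world (names here are illustrative):

```python
# Sketch: chosen(r, v) as a predicate over the global set of 2b
# messages in existence, whether or not any has been delivered.
# No single node can evaluate this; it describes system state.

def chosen(sent_2bs, acceptors, rnd, value):
    voters = {a for (a, r, v) in sent_2bs if r == rnd and v == value}
    return len(voters) > len(acceptors) // 2

acceptors = {"a1", "a2", "a3"}
sent = {("a1", 1, "v"), ("a2", 1, "v")}  # in flight, delivered nowhere
assert chosen(sent, acceptors, 1, "v")   # already chosen
```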
Some consequences worth internalizing:
- The value may be chosen before any node knows it. It becomes chosen the moment the majority's 2b messages exist in the network, even if none have arrived anywhere yet.
- The proposer may crash immediately after sending 2a. If a majority of acceptors then send 2b, the value is chosen regardless, even though no living node is yet aware of the choice.
- The learner's job is to detect that a value has been chosen. A value being chosen and a learner learning it are distinct events.
That "state of the world" framing is what makes the safety argument go through even though the protocol is asynchronous and lossy.
Connection to Knowledge
Last lecture we talked about how protocols climb the knowledge hierarchy. Paxos is a great example:
- Before the protocol runs, the value-to-be-chosen is distributed knowledge at best. Proposers each know their own proposals, but no one knows what "the" chosen value is.
- Phase 2 spreads the proposed value from one proposer to many acceptors.
- Once a majority have voted, the fact "v is chosen" is distributed knowledge of the acceptors. No single acceptor knows it, but the group's combined state determines it.
- The learner gathers the 2b messages and converts that distributed knowledge into its own individual knowledge: K_L chosen(v).
- Further rounds of broadcast can push that up to E_G chosen(v) for the whole system.
We never reach common knowledge (that's the impossibility we proved last time) but we reach enough of the hierarchy to do useful work.
What's Next
Today's protocol has a hole in it: what do acceptors do when responding to a 1a message if they have participated in previous rounds? Similarly, what does a proposer do in phase 2 if the summaries from phase 1 are non-null? That case is the entire reason the protocol is correct in the face of multiple proposers. Next lecture we fill it in and then walk through why the three consensus requirements hold.
After that, we lift single-decree Paxos to MultiPaxos, one Paxos instance per log slot, with optimizations so that the prepare phase doesn't have to run on every decision. That's the foundation of Lab 3.