Lecture 4: More Primary-Backup — Whiteboard Descriptions

These are text descriptions of the whiteboard PDF from the lecture on April 6, 2026. See also the whiteboard PDF and the notes.

These materials were drafted by AI based on the live whiteboard PDF and audio transcript from the corresponding lecture and then reviewed and edited by course staff. They may contain errors. Please let us know if you spot any.

Don't forget to fill out the partner form!

Why not just use TCP?

TCP claims: ordered, reliable message delivery
But it gives no guarantee across a disconnection:
- network cuts out
- machine crashes
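The trouble with a disconnection is that the sender cannot tell how far its last request got. Below is a toy sketch (not real TCP; the names run_request, client_view, and ConnectionLost are hypothetical) of a server that crashes either just before or just after applying a request:

```python
class ConnectionLost(Exception):
    """Stands in for the broken connection the client sees."""


def run_request(state: dict, key: str, amount: int, crash_after_apply: bool) -> None:
    """Toy server: add `amount` to state[key], but crash before any reply is sent.

    crash_after_apply=False: the machine dies before touching state.
    crash_after_apply=True:  the update is applied, then the machine dies.
    """
    if crash_after_apply:
        state[key] = state.get(key, 0) + amount
    raise ConnectionLost()  # in both cases, no reply reaches the client


def client_view(crash_after_apply: bool) -> tuple:
    """Return (what the client observed, the server's final state)."""
    state: dict = {}
    try:
        run_request(state, "x", 10, crash_after_apply)
        observed = "reply"
    except ConnectionLost:
        observed = "connection lost"
    return observed, state


before = client_view(crash_after_apply=False)
after = client_view(crash_after_apply=True)
print(before)  # ('connection lost', {})
print(after)   # ('connection lost', {'x': 10})
```

Both cases look identical to the client, so resending risks applying the update twice, while not resending risks losing it.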

Primary Backup

Diagram: Two clients (C1, C2) send requests to the Primary (P). P forwards each request to the Backup (B), tagged with a sequence number (reqA + seqnum). B acknowledges each one (ack(seqnum)).
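A minimal in-process sketch of the forwarding step. It assumes a key-value store, direct method calls in place of network messages, and that P waits for ack(seqnum) before applying the request and replying to the client (a common choice, not something stated on the board). The Primary/Backup classes and their methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Backup:
    store: dict = field(default_factory=dict)
    next_seq: int = 1  # next sequence number B expects from P

    def replicate(self, seqnum: int, key: str, value: str) -> Optional[int]:
        """Apply one forwarded put in seqnum order; return ack(seqnum) or None."""
        if seqnum < self.next_seq:
            return seqnum  # duplicate (e.g. P retransmitted): re-ack, don't re-apply
        if seqnum > self.next_seq:
            return None  # gap: an earlier request is missing, so don't ack yet
        self.store[key] = value
        self.next_seq += 1
        return seqnum


@dataclass
class Primary:
    backup: Backup
    store: dict = field(default_factory=dict)
    seq: int = 0  # last sequence number P assigned

    def handle(self, key: str, value: str) -> str:
        """Client put: tag with a seqnum, forward to B, wait for the ack, then apply and reply."""
        self.seq += 1
        ack = self.backup.replicate(self.seq, key, value)
        if ack != self.seq:
            raise RuntimeError("no ack from backup; P must not reply yet")
        self.store[key] = value
        return "ok"


b = Backup()
p = Primary(backup=b)
p.handle("a", "1")  # request from C1
p.handle("b", "2")  # request from C2
print(p.store == b.store)  # True
print(b.replicate(1, "a", "stale"), b.store["a"])  # 1 1
```

Because a retransmitted request is re-acked without being re-applied, P can safely resend whenever it is unsure whether B got a message.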

Primary monitors backup: if B has failed, P continues as the only server
Backup monitors primary: if P has failed, B becomes the only server and tells the clients
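A sketch of these monitoring rules, assuming "has failed" means "no ping heard from the peer for TIMEOUT ticks" (the tick-based clock, TIMEOUT value, and Server class are all hypothetical). Note that silence is the only evidence available: a timeout cannot tell a crashed peer apart from a slow or unreachable one.

```python
from dataclasses import dataclass, field

TIMEOUT = 3  # hypothetical: ticks of silence before a peer is declared failed


@dataclass
class Server:
    name: str
    role: str  # "primary", "backup", or "single"
    last_heard: int = 0  # tick at which we last got a ping from our peer
    notified_clients: list = field(default_factory=list)

    def on_ping(self, now: int) -> None:
        self.last_heard = now

    def tick(self, now: int, clients: list) -> None:
        """Apply the board's rule: if the peer seems failed, run alone."""
        if self.role == "single" or now - self.last_heard < TIMEOUT:
            return
        if self.role == "backup":
            # B takes over and must tell clients where to send requests.
            self.notified_clients = list(clients)
        self.role = "single"


p = Server("P", "primary")
b = Server("B", "backup")
clients = ["C1", "C2"]
for now in range(1, 8):
    if now <= 2:  # P and B exchange pings until P crashes after tick 2
        b.on_ping(now)
        p.on_ping(now)
    b.tick(now, clients)  # P is down, so only B keeps running its checks
print(b.role, b.notified_clients)  # single ['C1', 'C2']
```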

Split Brain Problem

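Two subsystems operating independently, each "claiming" to be the whole system. Example: if the network between P and B is cut, each concludes the other has failed and becomes a single server.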
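A toy illustration of how the monitoring rules above lead to split brain, assuming the failure is a network partition between P and B rather than a crash. The Node class is hypothetical, and "peer_silent" stands in for a timeout firing.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    role: str
    store: dict = field(default_factory=dict)

    def peer_silent(self) -> None:
        # The monitoring rule: a silent peer is assumed dead, so run alone.
        self.role = "single"

    def put(self, key: str, value: str) -> None:
        self.store[key] = value


p = Node("P", "primary", {"x": "0"})
b = Node("B", "backup", {"x": "0"})

# Only the P<->B link is cut; both machines are up and reachable by some clients.
p.peer_silent()
b.peer_silent()

p.put("x", "from C1")  # C1 can still reach P
b.put("x", "from C2")  # C2 was told to use B

print(p.role, b.role)                   # single single
print(p.store["x"], "|", b.store["x"])  # from C1 | from C2
```

Each side is internally consistent, but the two copies of "the system" now disagree on x.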

Failure Detector and Spare Servers

Diagram: A Failure Detector (FD) node at the top communicates with both the Primary (P) and Backup (B). Clients (C1, C2) send requests to P. P replicates to B. A pool of spare servers (S, S, S, S) is available. When a node fails, a spare can be promoted.
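A sketch of the failure detector's bookkeeping, assuming the FD's decision about who is P and B is authoritative and spares are idle, interchangeable servers. The FailureDetector class and report_failure method are hypothetical. A promoted spare starts with an empty store, which this sketch does not address.

```python
from collections import deque
from typing import Optional


class FailureDetector:
    """Hypothetical central FD: decides who P and B are, refilling from a spare pool."""

    def __init__(self, primary: str, backup: str, spares: list[str]):
        self.primary: Optional[str] = primary
        self.backup: Optional[str] = backup
        self.spares = deque(spares)

    def report_failure(self, server: str) -> None:
        if server == self.primary:
            # Promote the backup (it already has the data), then refill B.
            self.primary, self.backup = self.backup, None
        elif server == self.backup:
            self.backup = None
        else:
            return  # not P or B; nothing to change
        if self.backup is None and self.spares:
            self.backup = self.spares.popleft()


fd = FailureDetector("S1", "S2", ["S3", "S4", "S5", "S6"])
fd.report_failure("S2")       # backup dies: spare S3 becomes backup
print(fd.primary, fd.backup)  # S1 S3
fd.report_failure("S1")       # primary dies: S3 promoted, S4 becomes backup
print(fd.primary, fd.backup)  # S3 S4
```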

Remaining problems:
- A newly promoted backup has no data
- Everyone must be told who P and B are

View Server

Diagram: A View Server node manages views. A view contains: who P is, who B is, and a view number that increases with every new view. Example: View 1 has primary=S1, backup=S2. After S2 fails, View 2 has primary=S1, backup=S3. The primary S1 then transfers its state to S3 so the new backup is up to date (S2 has failed, so it cannot be the source). S3 must acknowledge the state transfer to S1 so S1 knows it is complete.
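A sketch of the view sequence in this example, assuming the state transfer comes from the primary S1 (S2 has failed, so it cannot send anything) and that the backup's ack carries the view number. The View, ViewServer, and transfer_state names are hypothetical, and messages are modeled as direct calls.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class View:
    num: int  # view number, incremented on every change
    primary: Optional[str]
    backup: Optional[str]


class ViewServer:
    """Hypothetical view server: every change to P or B yields a new numbered view."""

    def __init__(self, primary: str, backup: str, spares: list[str]):
        self.views = [View(1, primary, backup)]
        self.spares = list(spares)

    @property
    def current(self) -> View:
        return self.views[-1]

    def backup_failed(self) -> View:
        """Keep the primary, replace the dead backup with a spare, bump the view number."""
        cur = self.current
        new_backup = self.spares.pop(0) if self.spares else None
        self.views.append(View(cur.num + 1, cur.primary, new_backup))
        return self.current


def transfer_state(primary_store: dict, new_backup_store: dict, view_num: int) -> int:
    """P sends its full state to the new backup; the backup's ack names the view."""
    new_backup_store.clear()
    new_backup_store.update(primary_store)  # new backup now matches P
    return view_num  # ack: "I have P's state as of this view"


vs = ViewServer("S1", "S2", ["S3"])
stores = {"S1": {"x": "1"}, "S2": {"x": "1"}, "S3": {}}

v2 = vs.backup_failed()  # S2 fails
print(v2)  # View(num=2, primary='S1', backup='S3')
ack = transfer_state(stores[v2.primary], stores[v2.backup], v2.num)
backup_ready = ack == v2.num  # until this ack, P can't count on S3 having its data
print(backup_ready, stores["S3"])  # True {'x': '1'}
```

Tagging the ack with the view number lets S1 ignore a stale ack from an earlier transfer if the view changes again mid-copy.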