Lecture 4: More Primary-Backup — Whiteboard Descriptions
These are text descriptions of the whiteboard PDF from this lecture.
These materials were drafted by AI based on the live whiteboard PDF and audio transcript from the corresponding lecture and then reviewed and edited by course staff. They may contain errors. Please let us know if you spot any.
Don't forget to fill out the partner form!
Why not just use TCP?
- TCP claims: ordered reliable message delivery
- Disconnection: TCP's guarantee holds only while the connection stays up
  - network cuts out
  - machine crashes
Primary Backup
Diagram: Two clients (C1, C2) send requests to the Primary (P). P forwards requests to the Backup (B) with sequence numbers (reqA + seqnum). B acknowledges (ack(seqnum)).
- Primary monitors backup: if B has failed, then P becomes the single server
- Backup monitors primary: if P has failed, then B becomes the single server and tells the clients
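The forwarding step in the diagram (P sends the request plus a sequence number, B replies ack(seqnum)) can be sketched as follows. This is a minimal single-process illustration, not the course's implementation: the class names `Primary` and `Backup`, the method names, and the choice to re-acknowledge duplicates and refuse gaps are all assumptions made for the example.

```python
class Backup:
    def __init__(self):
        self.log = []        # requests applied so far, in order
        self.next_seq = 0    # next sequence number B expects

    def replicate(self, seqnum, request):
        """Apply a forwarded request; return ack(seqnum), or None on a gap."""
        if seqnum < self.next_seq:
            return seqnum    # duplicate: already applied, so just re-ack
        if seqnum > self.next_seq:
            return None      # gap: an earlier request is missing
        self.log.append(request)
        self.next_seq += 1
        return seqnum


class Primary:
    def __init__(self, backup):
        self.backup = backup
        self.log = []
        self.next_seq = 0

    def handle(self, request):
        """Forward to B with a sequence number; apply locally only after the ack."""
        seqnum = self.next_seq
        ack = self.backup.replicate(seqnum, request)
        if ack != seqnum:
            raise RuntimeError("backup did not acknowledge seqnum %d" % seqnum)
        self.log.append(request)
        self.next_seq += 1
        return "ok:" + request
```

In this sketch the primary replies to the client only after the backup's ack, so any request a client sees as done is also in B's log.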
Split Brain Problem
- Two subsystems operating independently but "claiming" to be the whole system
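To see how the mutual-monitoring rule can lead to split brain, here is a small simulation of a network partition in which neither machine has crashed. The `Server` class, its `peer_unreachable` method, and `simulate_partition` are hypothetical names for this sketch; the rule they encode is simply "if the peer looks dead, become the single server".

```python
class Server:
    def __init__(self, name, role):
        self.name = name
        self.role = role     # "primary", "backup", or "single"
        self.store = {}

    def peer_unreachable(self):
        # Naive rule: if the other server seems to have failed, serve alone.
        self.role = "single"

    def put(self, key, value):
        if self.role not in ("primary", "single"):
            raise RuntimeError(self.name + " is not accepting client writes")
        self.store[key] = value


def simulate_partition():
    p = Server("P", "primary")
    b = Server("B", "backup")
    # The link between P and B is cut; both are alive but each times out
    # on the other and concludes that the other has failed.
    p.peer_unreachable()
    b.peer_unreachable()
    p.put("x", 1)   # a client that can still reach P
    b.put("x", 2)   # a client that can still reach B
    return p, b
```

After `simulate_partition()`, both servers believe they are the whole system and hold different values for the same key, which is exactly the situation the bullet above describes.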
Failure Detector and Spare Servers
Diagram: A Failure Detector (FD) node at the top communicates with both the Primary (P) and Backup (B). Clients (C1, C2) send requests to P. P replicates to B. A pool of spare servers (S, S, S, S) is available. When a node fails, a spare can be promoted.
- A newly promoted backup starts with no data
- Everyone (clients and servers) must be told who P and B currently are
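One common way to build a failure detector is with heartbeats and a timeout; the notes do not say which mechanism the lecture used, so treat the following as an assumed design. The class name `FailureDetector`, its methods, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all illustrative.

```python
class FailureDetector:
    def __init__(self, timeout, spares):
        self.timeout = timeout        # silence longer than this means "suspected"
        self.last_seen = {}           # node name -> time of last heartbeat
        self.spares = list(spares)    # pool of idle servers

    def heartbeat(self, node, now):
        """Record that `node` was heard from at time `now`."""
        self.last_seen[node] = now

    def suspected(self, now):
        """Return the nodes that have been silent for longer than the timeout."""
        return sorted(n for n, t in self.last_seen.items()
                      if now - t > self.timeout)

    def replace(self, node):
        """Stop tracking a suspected node and hand back a spare, if any is left."""
        self.last_seen.pop(node, None)
        return self.spares.pop(0) if self.spares else None
```

A timeout can only suspect a failure: a slow or partitioned node looks the same as a crashed one. The promoted spare also starts empty, which is the "no data" problem listed above.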
View Server
Diagram: A View Server node manages views. A view contains: who P is, who B is, and a version number (view number). Example: View 1 has primary=S1, backup=S2. View 2 has primary=S1, backup=S3 (after S2 fails). State transfer occurs from the primary S1 to the new backup S3 so that S3 gets up to date. S3 must acknowledge the state transfer to S1 so that S1 knows it is complete.
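The view change in the example (View 1 to View 2) and the acknowledged state transfer can be sketched as below. This is a simplified illustration under stated assumptions: the names `View`, `ViewServer`, `Replica`, `backup_failed`, `send_state`, and `install_state` are invented for the example, failure detection is left out, and the sender of the state is assumed to be the surviving primary (S1 in View 2).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class View:
    viewnum: int
    primary: str
    backup: Optional[str]


class ViewServer:
    def __init__(self, primary, backup, spares):
        self.view = View(1, primary, backup)
        self.spares = list(spares)

    def backup_failed(self):
        """Move to the next view, keeping the primary and promoting a spare."""
        new_backup = self.spares.pop(0) if self.spares else None
        self.view = View(self.view.viewnum + 1, self.view.primary, new_backup)
        return self.view


class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}

    def send_state(self, new_backup):
        """Primary side: ship a copy of the state and return the backup's reply."""
        return new_backup.install_state(dict(self.store))

    def install_state(self, snapshot):
        """New backup side: adopt the snapshot, then acknowledge."""
        self.store = snapshot
        return "ack"
```

For example, starting from `ViewServer("S1", "S2", ["S3"])`, a call to `backup_failed()` yields `View(2, "S1", "S3")`, and `s1.send_state(s3)` returns `"ack"` once S3 holds a copy of S1's data.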