Lecture 6: Stability; Virtual Clocks — Notes
These are notes from the lecture on April 10, 2026. See also the whiteboard descriptions and the whiteboard PDF.
These materials were drafted by AI based on the live whiteboard PDF and audio transcript from the corresponding lecture and then reviewed and edited by course staff. They may contain errors. Please let us know if you spot any.
Context
Primary-backup is now fully covered. Lab 2 (implementing primary-backup) is assigned, with the design doc due in one week. The next three or four lectures cover foundational concepts in distributed systems that are not directly relevant to the labs but are important to know, and appear on the problem sets.
Today's topic: time in distributed systems. Three ideas:
- When things are true (invariants, stable properties, unstable properties)
- The happens-before relation
- Lamport clocks (virtual clocks) — deferred to next lecture
The paper behind the second and third ideas is Lamport's "Time, Clocks, and the Ordering of Events in Distributed Systems" (1978), probably the most cited paper in distributed systems, with over 10,000 citations. It is posted as optional reading.
When Things Are True
In any distributed protocol, different things are true at different times. It's useful to have vocabulary for talking about when properties hold.
Invariant — always true
An invariant is a property that is true in every state of every execution of the protocol. Its truth does not depend on time.
Examples of invariants:
- If a computer is running instructions, the computer is powered on
- Each client has at most one outstanding request (a request that has been sent but whose reply has not been received, identified by client ID + sequence number)
- The sequence number saved by the server for each client is monotonically increasing (per client)
- All request messages in the network from a given client have sequence numbers at most one greater than the server's saved sequence number for that client
- In the highest view (at the view server), after state transfer has completed, the backup has at least as up-to-date state as the primary
That last invariant required several rounds of refinement:
- "The backup has all the state the primary has" — too vague. Which backup? In which view?
- "In every view" — no, because the view server can reuse servers across views, and later state transfers can overwrite earlier ones
- "After state transfer" — need to specify: after state transfer in that view
- The backup can actually be ahead of the primary (it executes first), so "has all the state" means "at least as up-to-date," not "exactly the same"
- Final version: restricted to the highest view, after state transfer in that view
This illustrates that thinking about invariants forces you to consider edge cases carefully.
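As a minimal sketch of what "true in every state of every execution" means, the following checks the request-sequence-number invariant over a hand-written execution. The state layout (`saved_seq`, `network`), the function names, and the convention that an unseen client has saved sequence number 0 are all assumptions made for illustration; they are not the lab's actual data structures. Checking a handful of executions can refute an invariant but cannot prove it.

```python
# Hypothetical state model, for illustration only: the server's saved
# sequence number per client, plus the requests currently in the network.

def requests_bounded(state):
    """Every in-flight request has seq <= (saved seq for its client) + 1.

    Assumes a client the server has never seen has saved seq 0.
    """
    saved = state["saved_seq"]
    return all(seq <= saved.get(client, 0) + 1
               for client, seq in state["network"])

def holds_in_every_state(prop, executions):
    """True if prop holds in every state of every given execution."""
    return all(prop(state) for execution in executions for state in execution)

# One execution, written as the sequence of states it passes through.
execution = [
    {"saved_seq": {}, "network": []},
    {"saved_seq": {}, "network": [("c1", 1)]},
    {"saved_seq": {"c1": 1}, "network": [("c1", 1)]},            # old copy still in flight
    {"saved_seq": {"c1": 1}, "network": [("c1", 1), ("c1", 2)]},
]

# A state that would violate the invariant: c1 skipped ahead by two.
bad_state = {"saved_seq": {"c1": 1}, "network": [("c1", 3)]}

ok = holds_in_every_state(requests_bounded, [execution])
violated = not holds_in_every_state(requests_bounded, [execution + [bad_state]])
```

Here `ok` and `violated` are both `True`: the property holds in every state of the sample execution, and a single bad state is enough to show it is not an invariant of the extended one.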
Stable — once true, true forever
A stable property is one that, once it becomes true during an execution, remains true for the rest of that execution. Every invariant is trivially a stable property (it is true from the start and never stops being true), but the interesting stable properties are ones that start false and become permanently true.
Examples of stable properties (that are not invariants):
- Client C is in the server's AMO application — once a client contacts the server and gets added to the AMO app's map, it's never removed. (Note: the property must be stated about a single state, like "client C is in the map," not as a temporal statement like "once a client connects, it's always there.")
- There is a primary in the view server's highest view — false at the very beginning (before any node pings), but once a primary is assigned, there is always one.
- Server S is dead — if by "dead" we mean actually crashed (not just suspected by the view server). Once a server crashes, it doesn't come back. Note: the view server's approximate failure detection gets this wrong all the time — it frequently declares servers dead and then changes its mind. But the lab 2 protocol is correct despite this.
- If the view server is in view n+1, then the primary of view n completed state transfer in view n: this one is actually an invariant (not just stable), because of how if-then (implication) works. Before the view server reaches view n+1, the antecedent is false, making the whole statement vacuously true; for the statement to be an invariant, the consequent must already hold by the time view n+1 is reached.
A general pattern: monotonic counters produce stable properties. "Client A is at sequence number ≥ 100" is stable — once it passes 100, it never goes back. Sequence numbers, view numbers, ballot numbers — distributed systems are full of these.
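The monotonic-counter pattern can be sketched on a single execution as follows. The helper name `is_stable_on` and the sample sequence numbers are made up for illustration; a real argument for stability has to cover all executions, not one trace.

```python
def is_stable_on(prop, execution):
    """True if, in this execution, prop never goes from true back to false."""
    seen_true = False
    for state in execution:
        if prop(state):
            seen_true = True
        elif seen_true:
            return False  # prop was true earlier and is false now
    return True

# Hypothetical trace of client A's sequence number; it only ever increases.
seq_numbers = [98, 99, 100, 101, 105]

def at_least_100(seq):
    return seq >= 100

stable = is_stable_on(at_least_100, seq_numbers)          # once >= 100, stays >= 100
invariant = all(at_least_100(s) for s in seq_numbers)     # false in the first two states

# Contrast: "seq is exactly 100" becomes true and then false again.
exactly_100_stable = is_stable_on(lambda seq: seq == 100, seq_numbers)
```

On this trace `stable` is `True`, `invariant` is `False`, and `exactly_100_stable` is `False`, which matches the distinction drawn above: a threshold on a monotonic counter is stable without being an invariant.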
Unstable — not stable
An unstable property is one that can become true and then later become false.
To prove that a property is unstable, you need a counterexample to stability: describe an execution that reaches a state S₀ where the property is true, then continues to a state S₁ where the property is false. Simply showing the property is false in some state is not enough — a property that is never true is actually stable (vacuously).
Examples of unstable properties:
- The current view has a backup (meaning the backup machine is alive) — the backup can crash at any time, making this false after it was true
- Whether "view V has a backup" is stable depends on the definition: if "has a backup" means the view server assigned a backup, that's stable; if it means the backup machine is alive, that's unstable
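The proof recipe for instability (find a state where the property is true, followed by a later state where it is false) can be sketched as a search over one execution. The state fields `backup_assigned` and `backup_alive` and the function name are hypothetical, chosen to mirror the two readings of "view V has a backup" for one fixed view V.

```python
def find_instability_witness(prop, execution):
    """Return (i, j) with i < j, prop true at state i and false at state j.

    Returns None if this execution contains no such pair.
    """
    first_true = None
    for index, state in enumerate(execution):
        if prop(state):
            if first_true is None:
                first_true = index
        elif first_true is not None:
            return (first_true, index)
    return None

# Hypothetical execution for one fixed view V.
execution = [
    {"backup_assigned": False, "backup_alive": False},  # view V not yet created
    {"backup_assigned": True,  "backup_alive": True},   # view server assigns a backup
    {"backup_assigned": True,  "backup_alive": True},
    {"backup_assigned": True,  "backup_alive": False},  # backup machine crashes
]

alive_witness = find_instability_witness(lambda s: s["backup_alive"], execution)
assigned_witness = find_instability_witness(lambda s: s["backup_assigned"], execution)
```

Here `alive_witness` is `(1, 3)`, a counterexample to stability for the "machine is alive" reading, while `assigned_witness` is `None`: this execution offers no counterexample for the "was assigned" reading. A `None` result on one execution is only the absence of a counterexample, not a proof of stability.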
"False" is stable
A surprising consequence of the definition: the property false is a stable property. Why?
Stability means "once true, always true." Since false is never true, there is no counterexample to stability: you can never find a state where it's true and a later state where it's false. But false is not an invariant, because an invariant must be true in every state, and false is true in no state.
This means any property that is never true in any execution is vacuously stable.
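One way to see the vacuity concretely is to write stability as a "for all" over the states where the property holds. This is only a sketch over a single made-up execution; the function names and the placeholder states are assumptions for illustration.

```python
def stable_on(prop, execution):
    """For every state where prop holds, prop holds in all later states."""
    return all(prop(later)
               for i, state in enumerate(execution) if prop(state)
               for later in execution[i + 1:])

def invariant_on(prop, execution):
    """prop holds in every state of the execution."""
    return all(prop(state) for state in execution)

def never_true(state):
    """The property 'false': it holds in no state."""
    return False

execution = ["s0", "s1", "s2"]  # placeholder states; their contents do not matter

false_is_stable = stable_on(never_true, execution)        # nothing to check, so True
false_is_invariant = invariant_on(never_true, execution)  # fails at the first state
```

Because `never_true` holds nowhere, the generator inside `stable_on` yields nothing, and Python's `all` of an empty sequence is `True`. So `false_is_stable` is `True` while `false_is_invariant` is `False`.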
The Happens-Before Relation
From Lamport's 1978 paper. The key insight: in a distributed system, there is no global clock and no inherent notion of simultaneity. The only way to establish ordering between events on different machines is through communication.
Definition
Event e₁ happens before event e₂ if:
- They occur on the same machine and e₁ comes first, or
- e₁ is the sending of a message and e₂ is the receipt of that message, or
- There is a chain of events connecting e₁ to e₂ through the first two cases (transitivity)
Happens-before is related to causality: you cannot cause another node to do something without sending it a message. If you haven't communicated with another node, your events and its events are unordered with respect to each other.
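The three-part definition can be sketched as reachability in a graph: one edge between consecutive events on the same machine, one edge from each send to its receipt, and transitivity via graph search. The event names, the `timelines` and `messages` layout, and the function names are hypothetical, and this is an outside observer's view with full knowledge of the execution, not an algorithm the nodes themselves could run.

```python
from collections import defaultdict

def build_edges(timelines, messages):
    """Direct happens-before edges: same-machine order plus send -> receive."""
    edges = defaultdict(set)
    for events in timelines.values():
        for earlier, later in zip(events, events[1:]):
            edges[earlier].add(later)      # case 1: same machine, earlier first
    for send, receive in messages:
        edges[send].add(receive)           # case 2: send before receipt
    return edges

def happens_before(edges, e1, e2):
    """Case 3: e2 is reachable from e1 by following one or more edges."""
    stack, seen = list(edges.get(e1, ())), set()
    while stack:
        event = stack.pop()
        if event == e2:
            return True
        if event not in seen:
            seen.add(event)
            stack.extend(edges.get(event, ()))
    return False

# Hypothetical execution: A sends a message at a1, B receives it at b2.
timelines = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"], "C": ["c1"]}
messages = [("a1", "b2")]
edges = build_edges(timelines, messages)

a1_before_b3 = happens_before(edges, "a1", "b3")  # a1 -> b2 -> b3
b1_before_a2 = happens_before(edges, "b1", "a2")  # no message from B to A
c1_before_b3 = happens_before(edges, "c1", "b3")  # C never communicated
```

In this example `a1_before_b3` is `True` through the chain a1, b2, b3, while `b1_before_a2` and `c1_before_b3` are `False` because no chain of local steps and messages connects those events.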
Concurrency
Two events e₁ and e₂ are concurrent (or simultaneous) if neither one happens before the other: e₁ did not happen before e₂, and e₂ did not happen before e₁.
Until two nodes communicate, they are essentially operating "out of time" with each other. There is no coherent notion of time across nodes without communication.
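As a small sketch of the concurrency definition, the following computes the full happens-before relation for a tiny made-up execution as a set of pairs and then tests whether two events are unordered. The event names and function names are assumptions for illustration, and the naive closure loop is only meant for small examples.

```python
from itertools import combinations

def happens_before_pairs(timelines, messages):
    """All pairs (e1, e2) such that e1 happens before e2."""
    hb = set(messages)                          # send before receipt
    for events in timelines.values():
        hb.update(combinations(events, 2))      # same-machine order
    changed = True
    while changed:                              # transitive closure
        new_pairs = {(a, d) for (a, b) in hb for (c, d) in hb if b == c} - hb
        hb |= new_pairs
        changed = bool(new_pairs)
    return hb

def concurrent(hb, e1, e2):
    """Neither event happens before the other."""
    return (e1, e2) not in hb and (e2, e1) not in hb

# Hypothetical execution: A sends a message at a1, B receives it at b2.
timelines = {"A": ["a1", "a2"], "B": ["b1", "b2", "b3"]}
messages = [("a1", "b2")]
hb = happens_before_pairs(timelines, messages)

a1_b1 = concurrent(hb, "a1", "b1")  # no communication before b1
a2_b3 = concurrent(hb, "a2", "b3")  # a2 comes after the send, so it is unordered with B's events
a1_b3 = concurrent(hb, "a1", "b3")  # ordered through the message
```

Here `a1_b1` and `a2_b3` are `True` and `a1_b3` is `False`. Note that a2 and b3 are concurrent even though both occur "after" the message in an informal sense: the message orders a1 before b2, but says nothing about a2.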
Next lecture will cover algorithms (Lamport clocks, vector clocks) for computing happens-before relationships from within the system.