Lecture 9: Knowledge — Notes
These are notes from the lecture on April 17, 2026. See also the whiteboard descriptions and the whiteboard PDF.
These materials were drafted by AI based on the live whiteboard PDF and audio transcript from the corresponding lecture and then reviewed and edited by course staff. They may contain errors. Please let us know if you spot any.
Context
Today's topic is knowledge. The optional reading is Halpern and Moses, Knowledge and Common Knowledge in a Distributed Environment. That paper connects modal logics of knowledge to distributed systems. The most basic kinds of logic (e.g., propositional logic from CSE 311) talk about whether facts are true or false; logics of knowledge let you talk about who knows which facts, from inside the logic.
This is useful for analyzing distributed systems because we think of nodes as agents that may or may not know things. We already speak informally this way: "the view server knows the primary has completed state transfer." That has a precise technical meaning, not a wishy-washy one, and it's closely tied to protocol design.
The Muddy Foreheads Puzzle
Setup:
- n school children sit in a circle. k of them (where k ≥ 1) have mud on their forehead. Each child can see everyone else's forehead but not their own. The children do not know k.
- The teacher (who can see everyone) announces: "Somebody has mud on their forehead."
- The teacher then plays a game: "Raise your hand if you know you have mud on your forehead."
- If at least one child raises a hand and is correct, the class wins.
- If any child raises a hand and is wrong, everyone loses.
- If nobody raises a hand, the teacher pauses and asks again.
- The children are perfectly rational and always follow instructions — meaning they won't raise a hand unless they are certain, and they can reason about what other rational children would do.
- The children cannot communicate during the game.
The question: is there a strategy that lets them win?
The k = 1 case
Suppose there is exactly one muddy child. That child looks around and sees zero muddy foreheads. The teacher announced that at least one child has mud, so the only possibility is "me." That child raises their hand on the first round.
The k = 2 case
Now suppose there are exactly two muddy children.
Each muddy child sees exactly one muddy forehead. From their perspective, k is either 1 or
2, and the missing case would be "me." They wait one round. If k were really 1, the sole
muddy child would have raised their hand in round 1 (by the reasoning above). They didn't —
so k must be 2, and the second muddy child is me. Both muddy children realize this
simultaneously and raise their hands in round 2.
The general case
By induction: if there are k muddy children, each of them sees exactly k-1 muddy
foreheads and reasons that k is either k-1 or k. The k-1 case would have been resolved
in round k-1; when it isn't, all k muddy children raise their hands on round k.
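The inductive argument boils down to a simple rule: a child who sees m muddy foreheads raises a hand in round m + 1. Here is a minimal simulation sketch in Python (the function name and encoding are ours, not from the lecture); it assumes k ≥ 1, as guaranteed by the teacher's announcement:

```python
def play(muddy):
    """muddy: list of bools, muddy[i] is True iff child i has mud.
    Returns the round (1-based) in which hands first go up, and who raises them.
    Assumes at least one child is muddy (the teacher's announcement)."""
    n = len(muddy)
    rounds = 0
    while True:
        rounds += 1
        raised = []
        for i in range(n):
            sees = sum(muddy[j] for j in range(n) if j != i)
            # Inductive rule: silence through round `sees` rules out k = sees,
            # so k = sees + 1 and "the extra muddy child is me."
            if rounds == sees + 1:
                raised.append(i)
        if raised:
            return rounds, raised

print(play([False, True, True, False]))  # k = 2: hands up in round 2
```

Running this with k muddy children always ends in round k with exactly the muddy children raising their hands, matching the induction.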
Why the teacher's announcement matters
A seeming paradox: if k ≥ 2, every child can already see at least one muddy
forehead, so everyone already knows "somebody has mud." Why does the teacher's
announcement add anything?
The announcement is the base case of the induction. Without it, the k = 1 case cannot be
resolved, so the k = 2 case has nothing to infer from a silent round 1, and so on. The
announcement turns a fact that everyone individually knows into something stronger:
something everyone knows that everyone knows, and iterated up to any depth. That stronger
form of knowledge is what the protocol needs. In particular, the announcement lets the
children reason about what other children will do (since they know that the other children know that someone has mud on their forehead).
Differences from distributed systems
The puzzle isn't a perfect model of distributed systems:
- Real servers can (and do) talk to each other.
- The puzzle is synchronous: rounds advance in lockstep, and every child can observe every other child's action at the same time. Distributed systems usually don't work this way.
But there is also a useful connection to distributed systems: the need to reason about which nodes know which facts.
Kinds of Knowledge
The paper introduces several operators for talking about knowledge. Let i range over nodes,
G range over sets of nodes, and φ be a proposition.
- K_i φ — node i knows φ. Example: K_VS (server 1 is alive) would mean the view server knows server 1 is alive. (As it happens, failure detection is impossible, so this fact isn't knowable — but that's how you'd write it if it were.)
- We will take it as an assumption that K_i φ → φ. In other words, if you know a fact, the fact must actually be true.
- S_G φ — someone in G knows φ.
- D_G φ — distributed knowledge of φ: if you combined everything every member of G knows, you'd be able to derive φ. No individual need actually know it. Muddy foreheads is a nice example: no single child knows the full state of foreheads, but the group's combined knowledge does.
- E_G φ — everyone in G knows φ.
- E_G^k φ — iterated: everyone knows that everyone knows that ... (k times) ... φ.
In a message-passing system (and in the muddy children puzzle), the levels are strictly
distinct: for every k, there are situations where E_G^{k-1} φ holds but E_G^k φ does
not. Muddy foreheads is a good example. With k = 2 muddy children Alice and Bob,
let m = "someone is muddy". Then, before the teacher speaks:
- E_G m holds — each sees a muddy forehead.
- E_G^2 m fails — Alice doesn't know whether she's muddy, so she considers it possible Bob is the only muddy child, in which case Bob sees no mud and doesn't know m.
With k muddy children, E_G^{k-1} m holds but E_G^k m fails by a nested version of the
same argument.
The teacher's public announcement jumps straight to C_G m: everyone hears it, everyone
sees everyone hear it, and so on. Had the teacher whispered to each child privately, E_G m
would hold but no higher level would — and the puzzle wouldn't work.
Common Knowledge
C_G φ — φ is common knowledge in G. Formally, the infinite conjunction: \(C_G\, \varphi \;\equiv\; \bigwedge_{k=1}^{\infty} E_G^k\, \varphi\)
Colloquially this matches the everyday sense of "common sense" or "common knowledge": not only does everyone know it, everyone knows everyone knows it, and so on at every depth.
Evolution of a Protocol
A useful lens on distributed protocols: they climb the knowledge hierarchy over time. The
operators, ordered by strength: D_G φ (weakest) → S_G φ → K_i φ → E_G φ → ...
→ C_G φ (strongest).
Two very common protocol moves:
- K_i φ ⤳ E_G φ: one node learns a fact and publishes it to a group. The view server deciding the next primary and then announcing that decision is a canonical example.
- D_G φ ⤳ S_G φ: starting from distributed knowledge (the information exists, spread across nodes, but nobody individually has it), the protocol gathers it so that some specific node learns φ. A brute-force way to do this is for every node to tell every other node everything it knows.
These transitions really do add information to the system. Protocol design is largely about climbing this hierarchy efficiently.
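The two moves can be sketched with sets of facts standing in for each node's knowledge (the node names and facts here are made up for illustration, not part of any real protocol):

```python
# Each node's knowledge is modeled as a set of facts it individually knows.
knowledge = {
    "A": {"x=1"},   # only A knows x=1
    "B": {"y=2"},   # only B knows y=2
    "C": set(),
}

# D_G -> S_G: "x=1 and y=2" starts as distributed knowledge only.
# Gather: A and B send everything they know to C.
for node in ("A", "B"):
    knowledge["C"] |= knowledge[node]   # now some node (C) knows both facts

# K_i -> E_G: C makes a local decision and publishes it to the group.
knowledge["C"].add("primary=A")
for node in ("A", "B"):
    knowledge[node].add("primary=A")    # now everyone knows primary=A

print(knowledge["C"] >= {"x=1", "y=2"})                      # S_G achieved
print(all("primary=A" in k for k in knowledge.values()))     # E_G achieved
```

Note that even after the broadcast, the group only has E_G(primary=A), not E_G^2: A doesn't know whether B received the announcement.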
Gaining Common Knowledge Is Impossible
A key theorem from the paper: in a distributed system with unreliable communication,
common knowledge cannot be gained during an execution. Equivalently: if C_G φ holds at
any point in an execution, then C_G φ held at the very beginning. You can climb the
hierarchy by a lot, but you can never climb all the way to common knowledge.
We won't prove the general theorem, but we'll prove a closely related special case.
Impossibility of Coordinated Attack
Two halves of an army are camped on two hilltops, with a valley between them containing a village. Either half, attacking alone, is outnumbered; attacking together, they win. The only way to communicate between hilltops is to send messengers across the valley, traversing enemy territory, and there is some possibility that a messenger will be captured and the message never delivered. In other words, the only channel between the two halves is an unreliable network.
The coordinated attack problem: design a protocol such that both halves of the army attack at the same time.
Claim: no such protocol exists.
Proof sketch
Simplify: each hilltop communicates internally without loss, so model the situation as two
nodes A and B exchanging messages over an unreliable channel.
Suppose some protocol P solves coordinated attack. Run it on some execution that eventually
decides to attack. That execution exchanges some finite sequence of messages.
Look at the last message sent before they decide. Without loss of
generality, say it goes from A to B.
Now consider what happens if that last message is dropped. What information available to the nodes changes between the execution where the last message is delivered and the one where the last message is dropped?
- B's information has changed, because in one execution it sees the last message, and in the other it does not.
- But A observes exactly the same information in both executions. Whether or not the message was delivered is not observable to A: since this is the last message, B never sends a follow-up back, so A gets no feedback about delivery. A's execution looks identical in both worlds.
Since A observes the same information in both executions, A still attacks
even if the last message was dropped. But since the protocol P is correct,
both sides must attack together. This means B must also attack, even without
receiving the last message. So the last message wasn't needed: the protocol also
works without it.
Now suppose we modify the protocol to not even send this last message. What observable
information does this change? This time, B's information (compared to the dropping execution)
has not changed, so it must still attack. Since not sending a message doesn't affect any
other nodes' behavior, the modified protocol is still correct as long as A attacks as well.
This means we have reduced the protocol to need one fewer message without changing correctness.
Repeat the argument on the new last message, and again, and again. By induction, the protocol works with no messages at all. Both sides attack at a pre-agreed time with no communication. But that's not a solution to coordinated attack; it's just a fixed schedule, and it doesn't let the decision to attack depend on any information gathered on the hilltops.
This is the canonical shape of a distributed impossibility proof: assume a correct protocol, examine an execution, and find an indistinguishable execution (here, one with messages dropped) where correctness must also hold, yielding a contradiction.
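The key indistinguishability step can be checked with a tiny model. In this sketch (our encoding; the message names are made up), an execution is a list of send events, and a node's view is what it sent plus what it actually received. A's view is identical whether or not the last message was delivered, while only B can tell the difference:

```python
def view(node, execution):
    """A node's observable information: messages it sent and messages it received.
    Each event is (sender, receiver, msg, delivered)."""
    sent     = tuple((s, r, m) for (s, r, m, ok) in execution if s == node)
    received = tuple((s, r, m) for (s, r, m, ok) in execution if r == node and ok)
    return (sent, received)

# A hypothetical execution ending with a message from A to B...
delivered = [("A", "B", "m1", True), ("B", "A", "m2", True), ("A", "B", "m3", True)]
# ...and the same execution with that last message dropped by the network.
dropped   = [("A", "B", "m1", True), ("B", "A", "m2", True), ("A", "B", "m3", False)]

print(view("A", delivered) == view("A", dropped))  # A cannot distinguish them
print(view("B", delivered) == view("B", dropped))  # B can
```

Since A's view is the same in both executions, A must make the same decision in both, which is exactly the step the induction repeats until no messages remain.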
Preview: Consensus and Paxos
Next week we turn to consensus: getting a group of nodes to agree on a single value. Consensus is what eventually lets us replace our single-node view server with a cluster that can tolerate individual failures, which is the subject of Lab 3. Learning Paxos well is one of the most important takeaways from this course.
Two points to preview:
- Paxos is based on majority voting.
- Unlike the everyday sense of picking a winning side of a debate, the nodes don't care which value is chosen; they only care that some value is chosen and they all agree on it. Majority voting is the mechanism that guarantees only one value is chosen.
- Consensus, like coordinated attack, is also impossible in full generality. The proof has the same flavor as the one above. The puzzle to ponder over the weekend: if consensus is impossible, how can Paxos solve it? The answer lies in what we mean by "solve," and in the specific fault model Paxos targets.