Lecture 9: Knowledge — Notes

These are notes from the lecture on April 17, 2026. See also the whiteboard descriptions and the whiteboard PDF.

These materials were drafted by AI based on the live whiteboard PDF and audio transcript from the corresponding lecture and then reviewed and edited by course staff. They may contain errors. Please let us know if you spot any.

Context

Today's topic is knowledge. The optional reading is Halpern and Moses, Knowledge and Common Knowledge in a Distributed Environment. That paper connects modal logics of knowledge to distributed systems. The most basic kinds of logic (e.g., propositional logic from CSE 311) talk about whether facts are true or false; logics of knowledge let you talk about who knows which facts, from inside the logic.

This is useful for analyzing distributed systems because we think of nodes as agents that may or may not know things. We already speak informally this way: "the view server knows the primary has completed state transfer." That has a precise technical meaning, not a wishy-washy one, and it's closely tied to protocol design.

The Muddy Foreheads Puzzle

Setup:

  • n school children sit in a circle.
  • k of them (where k ≥ 1) have mud on their forehead. Each child can see everyone else's forehead but not their own. The children do not know k.
  • The teacher (who can see everyone) announces: "Somebody has mud on their forehead."
  • The teacher then plays a game: "Raise your hand if you know you have mud on your forehead."
    • If at least one child raises a hand and is correct, the class wins.
    • If any child raises a hand and is wrong, everyone loses.
    • If nobody raises a hand, the teacher pauses and asks again.
  • The children are perfectly rational and always follow instructions — meaning they won't raise a hand unless they are certain, and they can reason about what other rational children would do.
  • The children cannot communicate during the game.

The question: is there a strategy that lets them win?

The k = 1 case

Suppose there is exactly one muddy child. That child looks around and sees zero muddy foreheads. The teacher announced that at least one child has mud, so the only possibility is "me." That child raises their hand on the first round.

The k = 2 case

Now suppose there are exactly two muddy children. Each muddy child sees exactly one muddy forehead. From their perspective, k is either 1 or 2, and the missing case would be "me." They wait one round. If k were really 1, the sole muddy child would have raised their hand in round 1 (by the reasoning above). They didn't — so k must be 2, and the second muddy child is me. Both muddy children realize this simultaneously and raise their hands in round 2.

The general case

By induction: if there are k muddy children, each of them sees exactly k-1 muddy foreheads and reasons that the true count is either k-1 or k. By the induction hypothesis, the k-1 case would have been resolved in round k-1; when that round passes in silence, all k muddy children conclude the count is k and raise their hands in round k.
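The inductive strategy above can be simulated directly. This is a sketch under the puzzle's assumptions (the function name and representation are invented for illustration): a child who has sat through r-1 silent rounds knows the count exceeds r-1, so a child who sees only r-1 muddy foreheads raises their hand in round r.

```python
def muddy_rounds(muddy):
    """Simulate the muddy-foreheads game.

    muddy: list of booleans, muddy[i] is True iff child i is muddy.
    Assumes the teacher has announced that at least one child is muddy.
    Returns (winning round, set of children who raise their hands).
    """
    n = len(muddy)
    assert any(muddy), "the teacher's announcement requires k >= 1"
    for r in range(1, n + 1):
        raisers = set()
        for i in range(n):
            seen = sum(muddy[j] for j in range(n) if j != i)
            # After r-1 silent rounds, child i knows the count is > r-1.
            # If child i sees only r-1 muddy foreheads, the extra one
            # must be their own.
            if seen == r - 1:
                raisers.add(i)
        if raisers:
            return r, raisers

print(muddy_rounds([True, True, False]))  # → (2, {0, 1})
```

Note that only muddy children ever raise hands: a clean child sees all k muddy foreheads and would not act until round k+1, but the game ends in round k.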

Why the teacher's announcement matters

A seeming paradox: if k ≥ 2, every child can already see at least one muddy forehead, so everyone already knows "somebody has mud." Why does the teacher's announcement add anything?

The announcement is the base case of the induction. Without it, the k = 1 case cannot be resolved, so the k = 2 case has nothing to infer from a silent round 1, and so on. The announcement turns a fact that everyone individually knows into something stronger: something everyone knows that everyone knows, iterated up to any depth. That stronger form of knowledge is what the protocol needs. In particular, the announcement lets the children reason about what other children will do (since they know that the other children know that someone has mud on their forehead).

Differences from distributed systems

The puzzle isn't a perfect model of distributed systems:

  • Real servers can (and do) talk to each other.
  • The puzzle is synchronous: rounds advance in lockstep, and every child can observe every other child's action at the same time. Distributed systems usually don't work this way.

But there is also a useful connection to distributed systems: in both settings, we need to reason about which nodes know which facts.

Kinds of Knowledge

The paper introduces several operators for talking about knowledge. Let i range over nodes, G range over sets of nodes, and φ be a proposition.

  • K_i φ — node i knows φ. Example: K_VS (server 1 is alive) would mean the view server knows server 1 is alive. (As it happens, failure detection is impossible, so this fact isn't knowable — but that's how you'd write it if it were.)
    • We will take it as an assumption that K_i φ → φ. In other words, if you know a fact, the fact must actually be true.
  • S_G φ — someone in G knows φ.
  • D_G φ — distributed knowledge of φ: if you combined everything every member of G knows, you'd be able to derive φ. No individual need actually know it. Muddy foreheads is a nice example: no single child knows the full state of foreheads, but the group's combined knowledge does.
  • E_G φ — everyone in G knows φ.
  • E_G^k φ — iterated: everyone knows that everyone knows that ... (k times) ... φ.

In a message-passing system (and in the muddy children puzzle), the levels are strictly distinct: for every k, there are situations where E_G^{k-1} φ holds but E_G^k φ does not. Muddy foreheads is a good example. With k = 2 muddy children Alice and Bob, let m = "someone is muddy". Then, before the teacher speaks:

  • E_G m holds — each sees a muddy forehead.
  • E_G^2 m fails — Alice doesn't know whether she's muddy, so she considers it possible Bob is the only muddy child, in which case Bob sees no mud and doesn't know m.

With k muddy children, E_G^{k-1} m holds but E_G^k m fails by a nested version of the same argument.
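The Alice-and-Bob argument can be checked mechanically in a possible-worlds model, the standard semantics for these operators (the model and names below are illustrative, not from the lecture): a world records each child's forehead, and a child cannot distinguish two worlds that differ only in their own forehead.

```python
from itertools import product

N = 2  # children 0 (Alice) and 1 (Bob)
WORLDS = list(product([False, True], repeat=N))  # world = muddy flag per child

def indistinguishable(i, w, v):
    # Child i sees every forehead but their own.
    return all(w[j] == v[j] for j in range(N) if j != i)

def K(i, phi):
    # K_i φ: worlds where i knows φ, i.e. φ holds in every world
    # that i considers possible.
    return {w for w in WORLDS
            if all(phi(v) for v in WORLDS if indistinguishable(i, w, v))}

def E(group, phi):
    # E_G φ: everyone in G knows φ.
    worlds = set(WORLDS)
    for i in group:
        worlds &= K(i, phi)
    return worlds

m = lambda w: any(w)       # m = "someone is muddy"
both_muddy = (True, True)  # the k = 2 world, before the teacher speaks

print(both_muddy in E([0, 1], m))                            # → True
print(both_muddy in E([0, 1], lambda w: w in E([0, 1], m)))  # → False
```

The second check fails for exactly the reason in the bullet above: Alice considers the world where only Bob is muddy possible, and in that world Bob does not know m.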

The teacher's public announcement jumps straight to C_G m: everyone hears it, everyone sees everyone hear it, and so on. Had the teacher whispered to each child privately, E_G m would hold but no higher level would — and the puzzle wouldn't work.

Common Knowledge

  • C_G φφ is common knowledge in G. Formally the infinite conjunction: \(C_G\, \varphi \;\equiv\; \bigwedge_{k=1}^{\infty} E_G^k\, \varphi\)

Colloquially this matches the everyday sense of "common sense" or "common knowledge": not only does everyone know it, everyone knows everyone knows it, and so on at every depth.

Evolution of a Protocol

A useful lens on distributed protocols: they climb the knowledge hierarchy over time. The operators, ordered by strength: D_G φ (weakest) < S_G φ < K_i φ < E_G φ < ... < C_G φ (strongest).

Two very common protocol moves:

  • K_i φ ⤳ E_G φ: one node learns a fact and publishes it to a group. The view server deciding the next primary and then announcing that decision is a canonical example.
  • D_G φ ⤳ S_G φ: starting from distributed knowledge (the information exists, spread across nodes, but nobody individually has it), the protocol gathers it so that some specific node learns φ. A brute-force way to do this is for every node to tell every other node everything it knows.

These transitions really do add information to the system. Protocol design is largely about climbing this hierarchy efficiently.
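The two moves above can be sketched over a toy in-memory "network" (the Node class and the fact strings are invented for illustration, and delivery is assumed reliable; real protocols move this information in messages):

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.facts = set()  # everything this node knows

def broadcast(sender, group, fact):
    # K_i φ ⤳ E_G φ: one node publishes a fact it knows to the group.
    assert fact in sender.facts
    for node in group:
        node.facts.add(fact)

def gather(group, collector):
    # D_G φ ⤳ S_G φ: the group's scattered knowledge is collected
    # at one node, which can then derive facts nobody knew alone.
    for node in group:
        collector.facts |= node.facts

vs, s1, s2 = Node("viewserver"), Node("server1"), Node("server2")

vs.facts.add("view 2: primary = server1")
broadcast(vs, [vs, s1, s2], "view 2: primary = server1")  # now E_G holds

s1.facts.add("op 17 applied")
s2.facts.add("op 18 applied")
gather([s1, s2], vs)  # vs now individually knows the combined state
```

The brute-force version of the second move (everyone tells everyone everything) is just `gather` run with every node as the collector.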

Gaining Common Knowledge Is Impossible

A key theorem from the paper: in a distributed system with unreliable communication, common knowledge cannot be gained during an execution. Equivalently: if C_G φ holds at any point in an execution, then C_G φ held at the very beginning. You can climb the hierarchy by a lot, but you can never climb all the way to common knowledge.

We won't prove the general theorem, but we'll prove a closely related special case.

Impossibility of Coordinated Attack

Two halves of an army are camped on two hilltops, with a valley between them containing a village. Either half, attacking alone, is outnumbered; attacking together, they win. The only way to communicate between hilltops is to send messengers across the valley, traversing enemy territory, and there is some possibility that a messenger will be captured and the message never delivered. In other words, the only channel between the two halves is an unreliable network.

The coordinated attack problem: design a protocol such that both halves of the army attack at the same time.

Claim: no such protocol exists.

Proof sketch

Simplify: each hilltop communicates internally without loss, so model the situation as two nodes A and B exchanging messages over an unreliable channel.

Suppose some protocol P solves coordinated attack. Run it on some execution that eventually decides to attack. That execution exchanges some finite sequence of messages.

Look at the last message sent before they decide. Without loss of generality, say it goes from A to B.

Now consider what happens if that last message is dropped. What information available to the nodes changes between the execution where the last message is delivered and the one where the last message is dropped?

  • B's information has changed, because in one execution it sees the last message, and in the other it does not.
  • But A observes exactly the same information in both executions. Whether or not the message was delivered is not observable to A: since this is the last message, B never sends a follow-up back, so A gets no feedback about delivery. A's execution looks identical in both worlds.

Since A observes the same information in both executions, A still attacks even if the last message was dropped. But since the protocol P is correct, both sides must attack together. This means B must also attack, even without receiving the last message. So the last message wasn't needed: the protocol also works without it.
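The indistinguishability step can be made concrete on a hypothetical message trace (the trace contents are invented; the proof itself applies to any protocol). A node's view consists of the messages it sent and the messages actually delivered to it, so dropping the final A → B message changes B's view but not A's:

```python
def view(node, trace):
    # What `node` observes: messages it sent, plus messages it received.
    # A message is received only if the network delivered it.
    return [(src, dst, msg) for (src, dst, msg, delivered) in trace
            if src == node or (dst == node and delivered)]

# An execution of some protocol whose last message goes A -> B:
delivered_run = [("A", "B", "attack at dawn?", True),
                 ("B", "A", "yes, confirm",    True),
                 ("A", "B", "confirmed",       True)]

# The same execution, except the network drops the last message:
dropped_run = delivered_run[:-1] + [("A", "B", "confirmed", False)]

print(view("A", delivered_run) == view("A", dropped_run))  # → True
print(view("B", delivered_run) == view("B", dropped_run))  # → False
```

Since A's view is identical in both executions and its behavior is a function of its view, A must act the same way in both.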

Now suppose we modify the protocol to not send this last message at all. What observable information does this change? This time, compared to the execution where the message was dropped, B's information has not changed, so it must still attack. Since not sending a message doesn't affect any other node's behavior, the modified protocol is still correct as long as A attacks as well. This means we have reduced the protocol to need one fewer message without changing correctness.

Repeat the argument on the new last message, and again, and again. By induction, the protocol works with no messages at all. Both sides attack at a pre-agreed time with no communication. But that's not a solution to coordinated attack; it's just a fixed schedule, and it doesn't let the decision to attack depend on any information gathered on the hilltops.

This is the canonical shape of a distributed impossibility proof: assume a correct protocol, examine an execution, and find an indistinguishable execution (here, one with messages dropped) where correctness must also hold, yielding a contradiction.

Preview: Consensus and Paxos

Next week we turn to consensus: getting a group of nodes to agree on a single value. Consensus is what eventually lets us replace our single-node view server with a cluster that can tolerate individual failures, which is the subject of Lab 3. Learning Paxos well is one of the most important takeaways from this course.

Two points to preview:

  • Paxos is based on majority voting.
    • Unlike the everyday sense of picking a winning side of a debate, the nodes don't care which value is chosen; they only care that some value is chosen and they all agree on it. Majority voting is the mechanism that guarantees only one value is chosen.
  • Consensus, like coordinated attack, is also impossible in full generality. The proof has the same flavor as the one above. The puzzle to ponder over the weekend: if consensus is impossible, how can Paxos solve it? The answer lies in what we mean by "solve," and in the specific fault model Paxos targets.