The Bysantine Generals Problem Lamport, Shostak, and Pease The basic purpose of this paper is to show how to build a system of nodes that cooperate to distribute a value to all non-faulty nodes, even in the presence of some number of byzantine faulty traitors. The fundamental question is always how much extra redundancy must be added to handle the faulty nodes: none of the asynchronous-network problems that make solutions fundamentally impossible are dealt with because of some strong assumptions about the network (see below). Since they start in the context of a collection of equal "generals," the first important reduction is to a "commander-lieutenant" model where the goal is to communicate a value from the commander to the lieutenants. Two conditions must hold: IC1: all loyal (non-faulty) lieutenants obey the same order (data) IC2: if the commander is lotal, then all loyal lieutenants obey the actual order issued Note that, despite the way they keep describing IC2 as implying IC1 if the commander is loyal, the easier (for me) way to think about it is partitioning: either the commander is loyal or not; if the commander is loyal then IC2 must hold, and if not then IC1 must hold (the loyal lieutenants all do the same thing, but the action they take may be arbitrary). There are two models they alternate between discussing: "oral messages" and "signed messages." This is not precisely the issue of integrity of the data channel, because that is assumed anyway. This is the application-level equivalent: with signed messages a lieutenant can pass on a message, and the integrity of the original checked later on. With only data-channel integrity all bets are off once a message has passed through the application. Signed messages greatly improve the efficiency of the schemes. They make three assumptions about the network: reliability (with a correctness notion that implies integrity), authentication, and that "the absenceof a message can be detected." The last point, I believe, implies bounded message delay. The basic flavor of all the algorithms is that the commander sends the order to all the lieutenants, and then the lieutenants communicate among themselves to make sure the commander isn't a traitor (which effectively means the commander sends different orders to different lieutenants: if the same order is sent to all then it might as well be correct). If the commander is loyal then the first step alone assures correct behavior; the real trick is making the second step both allow for the discovery of a faulty commander and prevent a faulty lieutenant from usurping the process. In many ways it is the latter that is most challenging. Given a loyal commander the problem is easy, unless a faulty lieutenant can scatter doubt later on, and cause either non-consensus or consensus on the wrong value. At least in the case of a faulty commander we have more flexibility. The OM(m) family of algorithms solves the oral message case for up to m faulty nodes, assuming at least 3m+1 total nodes, and an enormous number of messages. At each step of the "consensus" phase each lieutenant takes on the role of commander and resends his orders to the other lieutenants (cutting out the commander). Each step is conducted recursively, cutting out one additional node (the commander from the prior recursion), and the total number of recursive levels is m. Intuitively, the idea is to ensure that the recursive executions where the traitors are cut out will outvote those that the traitors can effect, and thus the fault-resistence increases as we work back up the recursion. The SM(m) family uses signed messages, and is based on the notion of passing orders between the the lieutenants, with each adding their own signature and passing it on to all the others. Orders are only valid if they were first signed by the commander, so faulty lieutenants can only effectively forward the orders they received. If the commander is loyal it's clear that this will work, and no consensus process can affect the actions of the loyal lieutenants. If the commander is faulty then the lieutenants have to agree on a set of orders to compute a majority function over and take that action. The only thing traitor lieutenants (yes, we can have both) can do is pass their version of the commander's orders on to some but not all of the others. This is handled by bouncing the messages around, so that if a traitor lieutenant gives his orders to any loyal lieutentant that loyal lieutenant will make sure it ends up in everyone's view. All the previous discussion assumed that all nodes could talk to all others. The paper goes on to discuss reduced graphs, showing a sufficient condition for the oral message case and a necessary condition for signed messages. The so called m-regular graphs are a sufficient condition to allow the obvious extension of the OM(m) algorithms to work using simple routing over the network. The connectivity of these graphs is enough to ensure non-traitorous routes between loyal nodes. For the signed message case they show that it is necessary and sufficient for the subgraph containing all loyal nodes to be connected. Finally, the paper talks about building reliable systems on top of the byzantine generals algorithms. As near as I can tell these are basically the ideas in the Schneider paper, and I'll defer discussion. Andy