The Bysantine Generals Problem
Lamport, Shostak, and Pease

The basic purpose of this paper is to show how to build a system of
nodes that cooperate to distribute a value to all non-faulty nodes,
even in the presence of some number of byzantine faulty traitors.  The
fundamental question is always how much extra redundancy must be added
to handle the faulty nodes: none of the asynchronous-network problems
that make solutions fundamentally impossible are dealt with because of
some strong assumptions about the network (see below).

Since they start in the context of a collection of equal "generals,"
the first important reduction is to a "commander-lieutenant" model
where the goal is to communicate a value from the commander to the
lieutenants.  Two conditions must hold:

IC1: all loyal (non-faulty) lieutenants obey the same order (data)
IC2: if the commander is lotal, then all loyal lieutenants obey the
     actual order issued

Note that, despite the way they keep describing IC2 as implying IC1 if
the commander is loyal, the easier (for me) way to think about it is
partitioning: either the commander is loyal or not; if the commander
is loyal then IC2 must hold, and if not then IC1 must hold (the loyal
lieutenants all do the same thing, but the action they take may be
arbitrary).

There are two models they alternate between discussing: "oral
messages" and "signed messages."  This is not precisely the issue of
integrity of the data channel, because that is assumed anyway.  This
is the application-level equivalent: with signed messages a lieutenant
can pass on a message, and the integrity of the original checked later
on.  With only data-channel integrity all bets are off once a message
has passed through the application.  Signed messages greatly improve
the efficiency of the schemes.

They make three assumptions about the network: reliability (with a
correctness notion that implies integrity), authentication, and that
"the absenceof a message can be detected."  The last point, I believe,
implies bounded message delay.

The basic flavor of all the algorithms is that the commander sends the
order to all the lieutenants, and then the lieutenants communicate
among themselves to make sure the commander isn't a traitor (which
effectively means the commander sends different orders to different
lieutenants: if the same order is sent to all then it might as well be
correct).  If the commander is loyal then the first step alone assures
correct behavior; the real trick is making the second step both allow
for the discovery of a faulty commander and prevent a faulty
lieutenant from usurping the process.  In many ways it is the latter
that is most challenging.  Given a loyal commander the problem is
easy, unless a faulty lieutenant can scatter doubt later on, and cause
either non-consensus or consensus on the wrong value.  At least in the
case of a faulty commander we have more flexibility.

The OM(m) family of algorithms solves the oral message case for up to
m faulty nodes, assuming at least 3m+1 total nodes, and an enormous
number of messages.  At each step of the "consensus" phase each
lieutenant takes on the role of commander and resends his orders to
the other lieutenants (cutting out the commander).  Each step is
conducted recursively, cutting out one additional node (the commander
from the prior recursion), and the total number of recursive levels is
m.  Intuitively, the idea is to ensure that the recursive executions
where the traitors are cut out will outvote those that the traitors
can effect, and thus the fault-resistence increases as we work back up
the recursion.  

The SM(m) family uses signed messages, and is based on the notion of
passing orders between the the lieutenants, with each adding their own
signature and passing it on to all the others.  Orders are only valid
if they were first signed by the commander, so faulty lieutenants can
only effectively forward the orders they received.  If the commander
is loyal it's clear that this will work, and no consensus process can
affect the actions of the loyal lieutenants.  If the commander is
faulty then the lieutenants have to agree on a set of orders to
compute a majority function over and take that action.  The only thing
traitor lieutenants (yes, we can have both) can do is pass their
version of the commander's orders on to some but not all of the
others.  This is handled by bouncing the messages around, so that if a
traitor lieutenant gives his orders to any loyal lieutentant that
loyal lieutenant will make sure it ends up in everyone's view.

All the previous discussion assumed that all nodes could talk to all
others.  The paper goes on to discuss reduced graphs, showing a
sufficient condition for the oral message case and a necessary
condition for signed messages.  The so called m-regular graphs are a
sufficient condition to allow the obvious extension of the OM(m)
algorithms to work using simple routing over the network.  The
connectivity of these graphs is enough to ensure non-traitorous routes
between loyal nodes.  For the signed message case they show that it is
necessary and sufficient for the subgraph containing all loyal nodes
to be connected.

Finally, the paper talks about building reliable systems on top of the
byzantine generals algorithms.  As near as I can tell these are
basically the ideas in the Schneider paper, and I'll defer discussion.


Andy