Lamport Time, Clocks, and the Ordering of Events in a Distributed System As I see it, this paper hits on three important topics, with varying success. It starts by defining the "happens before" function and the logical clock scheme, both of which are somewhat abstract, but simple. Then there is a swing towards the concrete with the logical clock based locking system and finally a swing back (way too far back for my taste) towards strong clocks. The "happens before" relation and the logical clock ordering scheme are closely related. Perhaps the most important point to get from these sections is also the simplist: that it is important to understand the impact of concurrent events, which crops up in the notion that the "happens before" relation is partial, and for some pairs of events neither happens before the other. This is not a weakness or an inability to figure out which event happened first, but an acknowledgment that, with no causal relation between them, it not only doesn't matter which happened first, but that it makes no sense to define an ordering at all. "Happens before" allows us to apply an ordering to all events on all nodes, but only after the computation completes. The logical clock algorithm shows how to maintain the ordering as the computation is running, and be able to know that we will never find a situation that would require us to run time backwards (a decided no-no). One of the nice features of the logical clock algorithm (not really mentioned here) is that both the cost and the accuracy of the clock at each node is proportional to the rate at which that node is communicating with the others. Nodes that are communicating rapidly are able to do so without involving those nodes not actively participating. One important trick that allows is is to define sending and receiving a message as separate events occuring different logical times (also necessary to have a conflict-free ordering, I believe). This means that when an idle node begins to transmit it is able to record the sending at it's current logical time even though this time is far in the past from the point of view of the other nodes. When the message arrives the receive event will be given an appropriate timestamp for the destination node, so the fact that it appears to be a message from the past is unimportant. The great thing about the locking example is that it is so concrete. The end result is typical of heavily synchronized distributed algorithms: all nodes must in effect agree to the granting of the lock by sending out events timestamped later than the request: because the commnication is lossless and ordered we know that every node saw our request, and because they responded with something other than their own request with a lower timestamp, we know that they have accepted our claim to the resource. The scheme works because nobody is allowed to run time backwards. The only thing missing is a higher-level discussion of alternatives, either in terms of more efficient algorithms to solve the same problem (highly unlikely, and I suspect later on we will prove that you can't do much better) or the relation between weakening the requirements and system overhead. One important point for all of this paper and the next one is the assumption that communication channels are lossless and ordered. We know how to do both of these things (TCP), but the cost of pushing this into the infrastructure can be large, and I wonder how outrageously harder it would be to repeat some of this stuff relaxing one or both of these. Brief reflection suggests that for maintaining logical clocks you probably need both and are better off pushing them down, but I wonder if you can assume some figures for uncertainty on loss and reordering and then push through the clocking scheme and get a probability on ordering errors at the top. Finally, the paper delves into the question of extending the basic "happened before" relation into a true causal ordering, even in the face of messages between the parties external to the system (but still within the constraints of physics). All I really got out of this is that external messages can hose you, that you probably ought to define the problem away, or else you'll have to synchronize the clocks to within the speed-of-light delay between nodes. Andy