Understanding the Limitations of Causally and Totally Ordered
Communication
Cheriton and Skeen

This paper basically argues that there is no point in building
causally ordered multicast process-group systems such as ISIS.  (The
authors call these CATOCS, for reasons that aren't totally clear,
since they don't seem to talk about totally ordered systems at all.)
The argument rests on two points:

1) There are lots of things we want for applications that CATOCS can't
   directly support, either because it loses ordering we cared about
   or it's just too inefficient, and

2) For each application they look at, they can "easily" see how to do
   it better at the application level without support from a
   communication infrastructure (which is what the CATOCS systems are,
   a layer between the network and the application).

I'm being a little harsh on the second point.  At the end they do
talk about how to build a more useful infrastructure as an
application-level library that supports versioning and dependencies
in messages, in contrast to the layered approach.  But through most
of the paper I felt they were giving short shrift to the advantages
(if we can do it) of having a library or layer that can be
implemented once and that nicely bundles up a large chunk of
complexity.

The meat of the paper is an analysis of several applications, arguing
why CATOCS fails for both reasons above.  I found myself wanting to
generalize to get to the higher level issues, but this isn't to say
that I didn't want to see the examples, only that many of the gory
details aren't so interesting taken in isolation.

The most important thing the paper points out is that causal ordering
does not capture everything about correct behavior.  They assert this
directly, but it seems to me the issue must be deeper, particularly
because causality seems like it should be the final arbiter.  In the
case of the option pricing/theoretical analysis example (figure 4), I
think the problem isn't that we call any events causally concurrent
when there turns out to be a dependence between them, but that our
definition of potential causality is too strong with respect to
received messages.  

The problem is that the user monitor sees the theoretical price of
26.25, which is based on the option price of 26.00, only after it sees
the subsequent option price of 26.50.  This causes the monitor to
think that the theoretical price is less than the option price, which
it in fact never is.  The paper suggests that the failure was in
deciding that the two sends (from the option pricing and the
theoretical pricing) were causally concurrent, thereby allowing the
messages to be received in any order.  But as I see it, the sends
*are* causally unrelated, and the problem is that, at the receiving
process, the receives are forced into a happens-before relationship
even though they should still be thought of as concurrent events and
handled independently.  Although one reception must happen before the
other, roughly speaking a receive event cannot cause another receive
event without some intervening send.
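
To make the claim that the two sends are causally unrelated concrete,
here is a minimal sketch using vector clocks.  The paper does not give
this code; the two-process layout, the helper names, and the choice to
count a receive and a send as separate events are my assumptions for
illustration.

```python
def merge(a, b):
    """Component-wise maximum of two vector clocks."""
    return [max(x, y) for x, y in zip(a, b)]

def happens_before(a, b):
    """True if the event stamped a causally precedes the event stamped b."""
    return a != b and all(x <= y for x, y in zip(a, b))

def concurrent(a, b):
    """True if neither event causally precedes the other."""
    return a != b and not happens_before(a, b) and not happens_before(b, a)

# Assumed layout: index 0 is the option pricer, index 1 is the
# theoretical pricer.
OPTION, THEO = 0, 1
option_clock = [0, 0]
theo_clock = [0, 0]

# The option pricer sends the 26.00 update.
option_clock[OPTION] += 1
send_26_00 = list(option_clock)

# The theoretical pricer receives 26.00, then sends 26.25 based on it.
theo_clock = merge(theo_clock, send_26_00)
theo_clock[THEO] += 1   # the receive event
theo_clock[THEO] += 1   # the send event
send_26_25 = list(theo_clock)

# Meanwhile the option pricer sends the 26.50 update.
option_clock[OPTION] += 1
send_26_50 = list(option_clock)
```

Both later sends depend on the 26.00 update, but neither depends on
the other, so a causally ordered system is free to deliver 26.25 and
26.50 to the monitor in either order.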

I suppose, to sum up this example, the real problem is that, if we
require applications to manage this notion of causal concurrency
explicitly, we will have forced too complicated an interface on them
in terms of a communications infrastructure.  I would point out,
however, that the versioning system they propose basically does the
same thing, by making sure the application realizes that the 26.25
theoretical price is based on old data, so that it doesn't apply it
inappropriately.
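
A rough sketch of how such versioning might look at the monitor,
assuming each option price carries a version number and each
theoretical price names the version it was computed from.  The class
and field names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OptionPrice:
    version: int
    price: float

@dataclass
class TheoreticalPrice:
    based_on_version: int   # version of the option price used as input
    price: float

class Monitor:
    def __init__(self) -> None:
        self.option: Optional[OptionPrice] = None

    def on_option(self, msg: OptionPrice) -> None:
        # Keep only the newest option price seen so far.
        if self.option is None or msg.version > self.option.version:
            self.option = msg

    def on_theoretical(self, msg: TheoreticalPrice) -> Optional[float]:
        """Return theoretical minus option price, or None when the two
        values do not refer to the same option price version."""
        if self.option is None or msg.based_on_version != self.option.version:
            return None
        return msg.price - self.option.price

# The delivery order from the example: 26.00, then 26.50, then 26.25.
monitor = Monitor()
monitor.on_option(OptionPrice(version=1, price=26.00))
monitor.on_option(OptionPrice(version=2, price=26.50))
stale = monitor.on_theoretical(TheoreticalPrice(based_on_version=1, price=26.25))

# The order in which the comparison is meaningful.
other = Monitor()
other.on_option(OptionPrice(version=1, price=26.00))
fresh = other.on_theoretical(TheoreticalPrice(based_on_version=1, price=26.25))
```

In the first run the monitor declines to compare 26.25 against 26.50,
which is exactly the application-level handling of concurrency that I
argue the versioning approach amounts to.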

One thing I objected to in the analysis of real-time systems is that
it seems to me that the timestamp-based approach (which they propose
as better than the CATOCS systems) misses some of what the causal
ordering would capture.  In the case of the logging engine, while I
agree that after the fact you could sort all the messages and find the
proper total order, if you were showing things in real time you might
not be able to tell that there were messages "from the past" still
outstanding.  If you catch all the hidden channels, a causally ordered
communication system should be able to tell you this, because the
"later" message will contain references to the "from the past"
messages, so you'll know that there is stuff out there that you
haven't seen yet.
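
As a sketch of that last point, assume ISIS-style vector timestamps in
which entry k counts the messages from process k that causally precede
(or are) the stamped message.  The function name and the three-process
example are hypothetical; this only illustrates how a receiver could
notice outstanding messages, not how any particular system does it.

```python
def missing_predecessors(delivered, sender, stamp):
    """Return {process: count} of messages that the stamped message
    depends on but that the receiver has not yet delivered.

    delivered[k] is how many messages from process k the receiver has
    delivered so far; stamp is the vector timestamp on a message that
    has just arrived from `sender`.
    """
    missing = {}
    for proc, count in enumerate(stamp):
        # From the sender we need everything before this message;
        # from everyone else we need everything the stamp mentions.
        needed = count - 1 if proc == sender else count
        if needed > delivered[proc]:
            missing[proc] = needed - delivered[proc]
    return missing

# The logger has delivered nothing, and a message arrives from
# process 1 that was sent after process 1 saw a message from process 0.
gap = missing_predecessors([0, 0, 0], sender=1, stamp=[1, 1, 0])

# Once the message from process 0 has been delivered, nothing is missing.
no_gap = missing_predecessors([1, 0, 0], sender=1, stamp=[1, 1, 0])
```

A plain real-time timestamp on the later message carries no such
hint: the logger cannot tell from it that an earlier message is still
in flight.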