CSE 461: Introduction to Computer-Communication Networks, Autumn 2012

The basic operation of SNet requires that a node be able to determine which of two update messages generated by some other member is the most recent. Thus, generation numbers must be ordered. In our case, they are strictly increasing.

A practical issue that immediately arises as a side effect is that transient bugs can have long term consequences:

If a node, say elmo, loses its own generation (due, say, to a disk crash), and then restarts its generation numbering at 0, it may take it a long time to reach its now lost generation number. During that time all of elmo's updates are ignored by other nodes because they have generation numbers lower than maximum generation number already seen by those nodes for elmo updates.
If any node anywhere in the system mistakenly injects a record with a very high sequence number for, say, elmo, then none of elmo's legitimate updates will be accepted by any other node for a potentially very long time.

There are a number of ways to deal with these problems, but the approach taken in SNet is to base generation numbers on a real-time clock. In particular, SNet uses the clock implemented as NetBase.theNetBase().now(), which is the local system's best guess at the number of seconds that have passed since midnight, January 1, 1970 UTC. So long as nodes don't somehow manage to have their time estimate go backwards (even across various kinds of failures), this solves the first problem noted above. The second problem is solved by having all nodes refuse updates with generation numbers that are too far in the future, relative to their own clocks.

In SNet, a node refuses an update if its generation number is more than 10 seconds in the future, presuming that such an update was caused by a bug.

This scheme isn't perfect. For one thing, unless the time estimates on all the nodes are all reasonbly accurate, it will be impossible to communicate. Because systems typically engage in what is essentially a global, distributed, clock synchronization protocol, it's not terribly optimistic to hope for synchronization within 10 seconds these days.

The other problem is that the clock value changes only once per second. That may, or may not, be fast enough to assign distinct generation numbers to distinct updates, even on a single node. The fix for this is simple:

the generation number assigned to an update is the maximum of the current clock value and the old generation number plus one.

Computer Science & Engineering
University of Washington
Box 352350
Seattle, WA 98195-2350
(206) 543-1695 voice, (206) 543-2969 FAX
[comments to zahorjan at cs.washington.edu]