A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing
S. Floyd, V. Jacobson, C. Liu, S. McCanne, and L. Zhang

This paper describes the Scalable Reliable Multicast (SRM) system, which is based on Clark and Tennenhouse's Application Level Framing (ALF) principle. The paper proposes a reliable multicast system that intentionally leaves ordered delivery as an add-on feature. Another key point about SRM is that it heavily leverages the existing IP multicast protocol, in which receivers announce interest in joining a multicast group without any knowledge of the group's membership. The paper also does a good job of explaining why many unicast concepts do not apply well in a multicast world, such as loss detection/recovery, scalability, and shared communication state.

The authors also describe "wb", a distributed whiteboard tool built on the SRM framework. I found this really helpful as a concrete example of a multicast application that has limited use for ordering. One particularly interesting anecdote is that drawing operations are idempotent and therefore need no ordering information, with the exception of "delete", which can be "patched after the fact". The assumptions behind wb's design also explain much of the SRM design, such as unique and persistent names for data and sources, and the lack of any distinction between senders and receivers within a multicast group.

Loss detection and repair are the main focus of this paper. Each individual receiver is responsible for detecting loss and then requesting retransmission. The request itself is multicast, so other nodes that suffered the same loss are spared from issuing duplicate requests. What is interesting about this model is how to prevent too many duplicate requests when multiple nodes miss the same packet, and how to keep useless repair messages from consuming bandwidth in parts of the network that did not experience the loss. The first issue is solved with timing or randomness, depending on the network topology; the second is addressed using the concept of local neighborhoods.

Chains are topologies that can easily take advantage of "deterministic suppression", using timers to control when losses are reported so that nodes closer to the point of failure respond first. However, when the authors state that "we assume that packets take one unit of time to travel each link", I did wonder what this implied about the use of synchronized clocks. Stars use "probabilistic suppression", which relies on randomness since time/distance is indistinguishable across any pair of nodes. Finally, bounded-degree trees use a combination of the two. (A rough sketch of how the request timers combine both forms of suppression appears at the end of this review.) The paper describes experiments to validate the approaches, but notes that no single set of timing values fits all scenarios. The authors then describe an adaptive algorithm that adjusts some of the timing values based on factors such as message transmission frequency; the second sketch below illustrates the flavor of such an adjustment rule.

The local recovery section was interesting, but I felt it wasn't fully fleshed out. How do you know when to send to your local neighborhood instead of the global network? Wouldn't you really want to send to one node beyond the local neighborhood to find out how limited the problem is? I did notice that the future work section mentions spending more time on this.

Overall, I thought this was a much more constructive look at the problems with CATOCS that Cheriton and Skeen had raised.
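To make the suppression mechanism concrete, here is a minimal Python sketch of how a receiver might schedule a repair request. The constants C1 and C2 echo the paper's request-timer parameters, but the Receiver class, its event hooks, and the doubling backoff factor are my own illustrative assumptions, not the paper's code.

```python
import random

# Request-timer constants in the paper's notation; the values here
# are placeholders, since the paper shows no single setting fits all.
C1, C2 = 2.0, 2.0

def request_delay(dist_to_source):
    """Backoff before multicasting a repair request.

    dist_to_source is this receiver's estimated one-way delay to the
    original sender. The C1 * dist term is the deterministic part
    (nodes nearer the loss fire first, as on a chain); the random C2
    term is the probabilistic part (it desynchronizes nodes at equal
    distance, as in a star). Trees benefit from both effects at once.
    """
    return random.uniform(C1 * dist_to_source,
                          (C1 + C2) * dist_to_source)

class Receiver:
    """Hypothetical receiver-side bookkeeping; not the paper's code."""

    def __init__(self, dist_to_source):
        self.dist = dist_to_source
        self.pending = {}  # seqno -> time its request timer fires

    def on_loss_detected(self, seqno, now):
        # Don't request immediately; arm a suppression timer instead.
        self.pending[seqno] = now + request_delay(self.dist)

    def on_request_heard(self, seqno, now):
        # Another node asked first: suppress our own request and
        # back off (the paper uses an exponential backoff here).
        if seqno in self.pending:
            self.pending[seqno] = now + 2 * request_delay(self.dist)

    def due_requests(self, now):
        # Requests whose timers expired without being suppressed
        # are multicast to the whole group.
        return [s for s, t in self.pending.items() if t <= now]
```

Repair timers on the responding side work symmetrically, except that they are keyed on the distance to the requester rather than to the original source.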
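And here is the flavor of the adaptive idea, sketched as a post-recovery adjustment of the same constants. The trigger conditions loosely mirror the paper's description, but the thresholds and step sizes are invented for illustration.

```python
def adjust_request_constants(c1, c2, dup_requests, avg_req_delay):
    """Tune (C1, C2) after each loss-recovery round.

    dup_requests: duplicate requests observed for the last loss.
    avg_req_delay: average delay (in units of distance-to-source)
    before a request actually went out.
    """
    if dup_requests > 1:
        # Too many nodes fired at once: widen the random window
        # so suppression has time to work.
        c2 += 0.1
    elif avg_req_delay > 2.0:
        # No duplicate storm, but recovery is slow: tighten the
        # timers so requests go out sooner.
        c1 = max(c1 - 0.1, 0.1)
        c2 = max(c2 - 0.1, 0.1)
    return c1, c2
```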