CSE 590YB/580: Distributed Systems

Instructor: Tom Anderson
TA: E Christopher Lewis
Class meets Wednesdays 6:30 - 9:30, in EE1 003
Office Hours: TBA
Class list: cse590yb@cs.washington.edu (hypermail archive)
(to subscribe: send mail to majordomo@cs.washington.edu with body "subscribe cse590yb".)

Readings

Collection of research papers
Coulouris et al., Distributed Systems: Concepts and Design.
The textbook is for background only, in case it helps you understand the papers.

Overview

The ongoing fusion of the computing, telecommunciations, and consumer electronics industries is leading to widespread deployment and use of distributed systems technologies. Distributed systems are appearing at all granularities, from web sites implemented as massive server farms, to wide area information management systems such as Walmart and electronic markets, to pervasive computing integrating thousands of small devices in a room into an invisible human-centered computing and communications infrastructure. In each case, there is a need for scalability, robustness, availability, ease of use, security, simple programming models, etc. Yet our conceptual understanding of how to build distributed systems with these qualities is still very limited.

This course is designed to bring students up to the state of the art in distributed systems research and to provide the tools necessary to allow students to evaluate new technologies after the course ends. A major focus of the course will be class discussion based on readings from the book and research papers. Clearly, we could all read the book and the papers on our own, so the class meetings are designed to add significantly to that. As partial preparation for each class, students will be asked to write a 1/2 page evaluation of one paper per week, due by email at 4pm on the Wednesday the paper is to be discussed in class. This will give me a chance to read them before class. The evaluation should focus on critical reading -- independent of what "spin" the authors put on the paper, what are the most important lessons that the paper shows, and what are the most important drawbacks or limitations to those lessons.

In addition, to make the discussion more concrete, there will be a series of small programming assignments and problem sets. The course will also have a final exam.

Grading

Programming assignments: 30%
Problem sets: 10%
Paper reviews: 10%
Class discussion: 25%
Final: 25%

One written paper review is due each week prior to class; you may choose which of the two papers you wish to review. In addition, you get one "slip" week in case you fall behind; for that week, you do not have to submit a review. However, you are responsible for coming to class prepared -- having read and understood both papers, not just the one you wrote up.

Listed below there is a programming assignment per week; in addition, we will hand out a problem set per week. You are responsible for doing either the programming assignment or the problem set each week; you can choose which but you must do a total of four of the programming assignments and four problem sets. As with the paper reviews, you get one slip week where you can skip doing anything, and there will be no programming assignment or problem set the last week. Warning: the programming assignments get harder as the course progresses, so don't leave them all until the end of the quarter. Programming assignments can be done in teams of 2-3; problem sets should be done individually.

Click here to enter or read reviews.

Syllabus

Week 1: Motivating Examples

Eric Brewer, Lessons from Giant Scale Servers. (postscript)
M. Esler, J. Hightower, T. Anderson, and G. Borriello. Data-Centric Networking for Invisible Computing. Proc. of Mobicom 99, August 1999. (postscript)
Programming assignment: build an ad hoc routing system for mobile devices or a build a web server front end. (sample problem set solution)

Week 2: Communication

Coulouris, Chap. 5 (background, read if needed)
Nelson et al., Network Objects, Proceedings of the Sixteenth Symposium on Operating Systems Principles, October 1997.
Anthony D. Joseph, Alan F. deLespinasse, Joshua A. Tauber, David K. Gifford, and M. Frans Kaashoek. Rover: A Toolkit for Mobile Information Access. Proceedings of the Fifteenth Symposium on Operating Systems Principles, December 1995. (postscript)

Week 3: Distributed Synchronization

Coulouris, Chap. 10 (background, read if needed)
Leslie Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, Vol. 21, No. 7 (July 1978), pp. 558-565.
Chandy and Lamport, Distributed Snapshots: Determining the Global States of a Distributed System, ACM TOCS, pp. 63-75, Feb. 1985. (postscript)
Programming assignment: build a clock synchronization algorithm.

Week 4: Distributed Agreement

Leslie Lamport, Part Time Parliament, TOCS vol. 16, no. 2, 133-169. (postscript)
Castro and Liskov, Practical Byzantine Fault Tolerance, OSDI 98. (html/ps)
Programming assignment: implement the PTP consensus protocol. (homework4.rtf)

Week 5: Process Groups

Coulouris, Chap. 11 (background, if needed)
Kenneth P. Birman. The Process Group Approach to Reliable Distributed Computing. Communications of the ACM 36(12), December 1993, pp. 37-53. (pdf)
Cheriton and Skeen. Understanding the Limitations of Causally and Totally Ordered Communication. Proc. of SOSP, December 1993. (pdf)
Programming assignment: implement causally ordered message passing. (homework5.rtf)

Week 6: Replication

Coulouris, Chap. 14 (background, if needed)
Bruce Lindsay, Laura Haas, C. Mohan, Paul Wilms and Robert Yost. Computation and Communication in R*: A Distributed Database Manager. ACM Trans. on Computer Systems 2(1), February 1984, pp. 24-38.
Ladin, B. Liskov, L. Shrira and S. Ghemawat. Providing High Availability Using Lazy Replication. ACM Transactions on Computer Systems, vol. 10 (4), pp. 360--391, November 18 1992. (pdf)
Programming assignment: implement two-phase commit. (homework6.rtf)

Week 7: Disconnected Operation

James Kistler and M. Satyanarayanan. Disconnected Operation in the Coda File System. ACM Trans. on Computer Systems 10(1), February 1992, pp. 3-25. (pdf)
Douglas B. Terry, Marvin M. Theimer, Karin Peterson, Alan J. Demers, Mike J. Spreitzer, and Carl H. Hauser. Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System. Proceedings of the 15th ACM Symposium on Operating Systems Principles, December, 1995, p. 172-183. (ps/pdf/txt)
Programming assignment: implement two-phase commit. (homework7.rtf)

Week 8: Scalability

postscript

Yasushi Saito, Brian Bershad, and Hank Levy. Manageability, availability and performance in Porcupine: a highly scalable, cluster-based mail service. Proc. 17th Symposium on Operating System Principles, December 1999. (postscript)
Programming assignment: implement a sequentially consistent distributed cache coherence protocol.

Week 9: Security and Robustness

Coulouris, Chap. 16 (background, if needed)
Abadi and Needham, Prudent Engineering Practices for Cryptographic Protocols, SRC report 125 (postscript)
Savage et al., Robust Protocol Design in Uncooperative Environments
Programming assignment: hack TCP to allow a browser to surf at full link speed. See also: Stefan Savage, Neal Cardwell, David Wetherall and Tom Anderson. ``TCP Congestion Control with a Misbehaving Receiver." ACM Computer Communications Review, v 29, no 5, October, 1999.

Week 10: Experience

Michael D. Schroeder, Andrew D. Birrell and Roger M. Needham. Experience with Grapevine: The Growth of a Distributed System. ACM Trans. on Computer Systems 2(1), February 1984, pp.3-23.
J. Ousterhout. The Role of Distributed State. CMU Computer Science: A 25th Anniversary Commemorative. ACM Press Anthology Series, R. Rashid (Ed.), July 1991.

E. Christopher Lewis

Last modified: Thu May 11 01:19:25 PDT 2000