CSE 590YB/580: Distributed Systems
Instructor: Tom Anderson
TA: E Christopher Lewis
Class meets Wednesdays 6:30 - 9:30, in EE1 003
Office Hours: TBA
Class list: cse590yb@cs.washington.edu (hypermail archive)
(to subscribe: send mail to majordomo@cs.washington.edu with body
"subscribe cse590yb".)
Readings
- Collection of research papers
- Coulouris et al., Distributed Systems: Concepts and Design.
The textbook is for background only, in case it helps you understand
the papers.
Overview
The ongoing fusion of the computing, telecommunciations, and
consumer electronics industries is leading to widespread deployment
and use of distributed systems technologies. Distributed systems
are appearing at all granularities, from web sites implemented as
massive server farms, to wide area information management systems
such as Walmart and electronic markets, to pervasive computing
integrating thousands of small devices in a room into an invisible
human-centered computing and communications infrastructure.
In each case, there is a need for scalability, robustness, availability,
ease of use, security, simple programming models, etc.
Yet our conceptual understanding of how to build distributed
systems with these qualities is still very limited.
This course is designed to bring students up to the state
of the art in distributed systems research and to provide the tools
necessary to allow students to evaluate new technologies after
the course ends. A major focus of the course will be class discussion
based on readings from the book and research papers. Clearly, we could
all read the book and the papers on our own, so the class meetings are
designed to add significantly to that. As partial preparation for each
class, students will be asked to write a 1/2 page evaluation of one paper
per week, due by email at 4pm on the Wednesday the paper is to be
discussed in class. This will give me a chance to read them before class.
The evaluation should focus on critical reading -- independent of what
"spin" the authors put on the paper, what are the most important lessons
that the paper shows, and what are the most important drawbacks or
limitations to those lessons.
In addition, to make the discussion more concrete, there will be a series
of small programming assignments and problem sets. The course will
also have a final exam.
Grading
Programming assignments: 30%
Problem sets: 10%
Paper reviews: 10%
Class discussion: 25%
Final: 25%
One written paper review is due each week prior to class; you may choose
which of the two papers you wish to review. In addition, you
get one "slip" week in case you fall behind; for that week, you do not
have to submit a review. However, you are responsible for coming to class
prepared -- having read and understood both papers, not just the one you
wrote up.
Listed below there is a programming assignment per week; in addition,
we will hand out a problem set per week.
You are responsible for doing either the programming assignment or
the problem set each week; you can choose which but you must do a total
of four of the programming assignments and four problem sets.
As with the paper reviews, you get one slip week where you can skip
doing anything, and there will be no programming assignment or problem
set the last week. Warning: the programming assignments get
harder as the course progresses, so don't leave them all until the
end of the quarter. Programming assignments can be done in teams
of 2-3; problem sets should be done individually.
Click here to enter or read reviews.
Syllabus
Week 1: Motivating Examples
- Eric Brewer, Lessons from Giant Scale Servers.
(postscript)
- M. Esler, J. Hightower, T. Anderson, and G. Borriello.
Data-Centric Networking for Invisible Computing.
Proc. of Mobicom 99, August 1999.
(postscript)
- Programming assignment: build an ad hoc routing system for mobile devices
or a build a web server front end. (sample problem set solution)
Week 2: Communication
- Coulouris, Chap. 5 (background, read if needed)
- Nelson et al., Network Objects,
Proceedings of the Sixteenth Symposium on Operating Systems Principles,
October 1997.
- Anthony D. Joseph, Alan F. deLespinasse, Joshua A. Tauber,
David K. Gifford, and M. Frans Kaashoek.
Rover: A Toolkit for Mobile Information Access.
Proceedings of the Fifteenth Symposium on Operating Systems Principles,
December 1995. (postscript)
Week 3: Distributed Synchronization
- Coulouris, Chap. 10 (background, read if needed)
- Leslie Lamport.
Time, Clocks, and the Ordering of Events in a Distributed System.
Communications of the ACM, Vol. 21, No. 7 (July 1978), pp. 558-565.
- Chandy and Lamport, Distributed Snapshots: Determining the Global
States of a Distributed System, ACM TOCS, pp. 63-75, Feb. 1985.
(postscript)
- Programming assignment: build a clock synchronization algorithm.
Week 4: Distributed Agreement
- Leslie Lamport, Part Time Parliament, TOCS vol. 16, no. 2, 133-169.
(postscript)
- Castro and Liskov, Practical Byzantine Fault Tolerance, OSDI 98.
(html/ps)
- Programming assignment: implement the PTP consensus protocol.
(homework4.rtf)
Week 5: Process Groups
- Coulouris, Chap. 11 (background, if needed)
- Kenneth P. Birman.
The Process Group Approach to Reliable Distributed Computing.
Communications of the ACM 36(12), December 1993, pp. 37-53.
(pdf)
- Cheriton and Skeen.
Understanding the Limitations of Causally and Totally Ordered
Communication. Proc. of SOSP, December 1993.
(pdf)
- Programming assignment: implement causally ordered message
passing. (homework5.rtf)
Week 6: Replication
- Coulouris, Chap. 14 (background, if needed)
- Bruce Lindsay, Laura Haas, C. Mohan, Paul Wilms and Robert Yost.
Computation and Communication in R*: A Distributed Database Manager.
ACM Trans. on Computer Systems 2(1), February 1984, pp. 24-38.
- Ladin, B. Liskov, L. Shrira and S. Ghemawat.
Providing High Availability Using Lazy Replication.
ACM Transactions on Computer Systems, vol. 10 (4), pp. 360--391,
November 18 1992.
(pdf)
- Programming assignment: implement two-phase commit.
(homework6.rtf)
Week 7: Disconnected Operation
- James Kistler and M. Satyanarayanan.
Disconnected Operation in the Coda File System.
ACM Trans. on Computer Systems 10(1), February 1992, pp. 3-25.
(pdf)
-
Douglas B. Terry, Marvin M. Theimer, Karin Peterson, Alan J. Demers,
Mike J. Spreitzer, and Carl H. Hauser.
Managing Update Conflicts in Bayou, a Weakly Connected Replicated
Storage System.
Proceedings of the 15th ACM Symposium on Operating Systems Principles,
December, 1995, p. 172-183.
(ps/pdf/txt)
- Programming assignment: implement two-phase commit.
(homework7.rtf)
Week 8: Scalability
T. Anderson, M. Dahlin, J. Neefe, D. Patterson, D. Roselli,
and R. Wang. Serverless Network File Systems.
ACM Trans. on Computer Systems 14(1), Feb. 1996, pp. 41-79.
(postscript)
- Yasushi Saito, Brian Bershad, and Hank Levy.
Manageability, availability and performance in Porcupine: a highly
scalable, cluster-based mail service.
Proc. 17th Symposium on Operating System Principles, December 1999.
(postscript)
- Programming assignment: implement a sequentially consistent distributed
cache coherence protocol.
Week 9: Security and Robustness
- Coulouris, Chap. 16 (background, if needed)
- Abadi and Needham, Prudent Engineering Practices for Cryptographic Protocols,
SRC report 125
(postscript)
- Savage et al., Robust Protocol Design in Uncooperative Environments
- Programming assignment: hack TCP to allow a browser to surf at
full link speed. See also: Stefan Savage, Neal Cardwell, David Wetherall
and Tom Anderson. ``TCP Congestion Control with a Misbehaving Receiver."
ACM Computer Communications Review, v 29, no 5, October, 1999.
Week 10: Experience
- Michael D. Schroeder, Andrew D. Birrell and Roger M. Needham.
Experience with Grapevine: The Growth of a Distributed System.
ACM Trans. on Computer Systems 2(1), February 1984, pp.3-23.
- J. Ousterhout. The Role of Distributed State.
CMU Computer Science: A 25th Anniversary Commemorative.
ACM Press Anthology Series, R. Rashid (Ed.), July 1991.
E. Christopher Lewis
Last modified: Thu May 11 01:19:25 PDT 2000