This course will focus on correctness. Distributed systems are particularly difficult to make correct. Concurrent updates to distributed state introduce subtle race conditions. Component failures are inevitable, and "distributed" means the system is expected to proceed despite them. What invariants and progress conditions ensure that systems operate correctly despite concurrency and failures?
This iteration of the class will NOT cover parallel computing.
Prerequisites: You should have taken CSE 550 or CSE 551 or CSE 452. The course listing has a prerequisite of CSE 551; this will not be enforced, and it won't matter if you haven't had it. However, it's essential that you understand how to program with threads, and have experience doing so, before taking this class; at UW, that can be accomplished with, e.g., CSE 451.
Further, we'll assume basic knowledge of distributed-systems topics, including: remote procedure calls (RPC), two-phase commit, distributed time, serializability, and MapReduce. This can be obtained by taking CSE 550 before CSE 552 (which is the normal order). A motivated student will be able to pick up these topics on their own. See the background section below for more information. Students who have taken an undergraduate distributed-systems class will find that about 25% of the material overlaps; in particular, it's permitted to take both CSE 452 and CSE 552.
Research Project: A major part of the course will be an independent research project on a topic of your choosing related to distributed systems, with team members of your choosing. A strict requirement is that every project must have a quantitative result — in other words, purely paper designs are not sufficient.
Course Reading and Discussion: Another major part of the course will be a group discussion of various assigned papers both before and during class. The goal will be to develop your ability to uncover the broader implications of research papers. Most systems research starts with this process: what can one conclude from a research result, beyond what's written by the authors?
For each class except the first, we'll have a forum thread that any student may post to and read. Each student must, before class begins, post a paragraph that makes an interesting point about one of that class's papers.
The class discussion will be divided into two parts. First, we'll discuss the plain content of the paper: What did the authors think the paper was about? Second, we'll examine the subtext and context of each paper: What do we (the lecturers and the students) think is really interesting about the paper? For instance, what limits and opportunities does the paper miss?
For the first part of the discussion, we'll lecture on mechanical elements of the papers that are hard to discuss without prepared slides. We'll then kick off an interactive discussion. The class-participation portion of your grade will depend on the extent to which you participate meaningfully in each such discussion.
Prior to each class, we'll post a short list of questions to get the discussion started, but we'll also cover topics from the forum thread and topics that arise naturally from the class discussion.
At the end of each class, we'll spend 5–10 minutes preparing you to read the papers for the next class. We'll provide background material that gives context for them, and we may suggest questions to think about while reading them.
https://canvas.uw.edu/courses/1272963
The discussion board can be used for two purposes:
Your posts will be graded using the following simple 2-bit scale. If you don't post before the class, you get a 0. If you post something but there are significant problems with it, e.g., we're not convinced you read and thought about the paper, you get a 1. If you post something interesting and thoughtful, you get a 2; we expect most of your posts to get this grade. Once in a while, if you post something that is surprisingly insightful or otherwise exceptional, you'll earn a 3; these grades will be rare.
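As a rough illustration only, the scale above could be written out as a small Python sketch. The names `PostGrade` and `grade_post`, and the three boolean flags, are hypothetical and invented for this example; they are not part of any course tooling, and the actual judgment of what counts as "significant problems" or "exceptional" rests with the graders.

```python
from enum import IntEnum


class PostGrade(IntEnum):
    """Grades on the 2-bit scale: four values, 0 through 3."""
    MISSING = 0       # nothing posted before class
    PROBLEMATIC = 1   # posted, but with significant problems
    THOUGHTFUL = 2    # interesting and thoughtful (the expected grade)
    EXCEPTIONAL = 3   # surprisingly insightful or otherwise exceptional (rare)


def grade_post(posted_before_class: bool,
               has_significant_problems: bool = False,
               is_exceptional: bool = False) -> PostGrade:
    """Map a grader's judgments about one post to a grade.

    The boolean flags are assumptions made for this sketch; the
    syllabus describes the grades only in prose.
    """
    if not posted_before_class:
        return PostGrade.MISSING
    if has_significant_problems:
        return PostGrade.PROBLEMATIC
    if is_exceptional:
        return PostGrade.EXCEPTIONAL
    return PostGrade.THOUGHTFUL
```

For example, `grade_post(True)` yields `PostGrade.THOUGHTFUL`, the grade most posts are expected to receive, and every value fits in two bits.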
|Friday, April 12||5:00 pm||Initial one-page research project proposal due|
|Friday, April 19||5:00 pm||Problem set 1 due|
|Friday, May 3||5:00 pm||Three-page outline of research project, including a complete introduction, due|
|Sunday, May 5||5:00 pm||Problem set 2 due|
|Tuesday, May 21||5:00 pm||Problem set 3 due|
|Friday, May 24||5:00 pm||Five-page version of research project paper, including a complete discussion of related work, due|
|Monday, May 27||11:30 am – 12:50 pm||No class (Memorial Day)|
|Monday–Friday, June 10–14||Various||Research project presentations, scheduled individually|
|Tuesday, June 11||5:00 pm||Final research project paper due|
|Wednesday, June 12||2:00 pm – 4:20 pm||Final exam|
Here's the schedule of papers to be discussed during class. For each class, there's one primary paper you must read before class, and one or more optional papers we encourage you to also read before class. Your post to the forum thread before class can be on either the primary paper or an optional paper.
|Monday, April 1 and Wednesday, April 3||Paxos and Raft||Class Intro slides|
|Monday, April 8||Performance enhancements to Paxos||
|Wednesday, April 10||Distributed system verification||
|Monday, April 15||Chain replication||
|Wednesday, April 17||Byzantine fault tolerance||
|Monday, April 22||Distributed logging||
|Wednesday, April 24||Byzantine-fault-tolerant distributed logging|
|Monday, April 29||Blockchains|
|Wednesday, May 1||Peer-to-peer systems|
|Distributed file systems|
|Byzantine-fault-tolerant distributed file systems|
|Remote memory direct access (RDMA)|
We've provided a few suggestions for project ideas. But you're welcome (indeed, encouraged!) to devise your own project ideas not on that list.
The project will be due in five steps:
Everybody registered for the course should already have had an instructional UNIX account created for them by the department support staff, and have been notified of it. Using this account, you can remotely log into (via ssh) the attu.cs.washington.edu compute cluster. You can find more information about instructional resources here.
If the compute cluster doesn't meet your needs, you may want to consider one of the following experimental platforms, which we can help you get access to: