TA: Niel Lebeck (nl35 AT cs)
Office hours: by appointment only
Classes: MW 11:30am–12:50pm in CSE2 G04
This course will focus on correctness. Distributed systems are particularly difficult to make correct. Concurrent updates to distributed state introduce subtle race conditions. Component failures are inevitable, and "distributed" means the system is expected to proceed despite them. What invariants and progress conditions ensure that systems operate correctly despite concurrency and failures?
This iteration of the class will NOT cover parallel computing.
Prerequisites: You should have taken CSE 550 or CSE 551 or CSE 452. The course listing has a prerequisite of CSE 551; this will not be enforced, and it won't matter if you haven't had it. However, it's essential that you understand and have experience with how to program with threads before taking this class; at UW, that can be accomplished with, e.g., CSE 451.
Further, we'll assume basic knowledge of distributed-systems topics, including: remote procedure calls (RPC), two-phase commit, distributed time, serializability, and MapReduce. This can be obtained by taking CSE 550 before CSE 552 (which is the normal order). A motivated student will be able to pick up these topics on their own. See the background section below for more information. Students who have taken an undergraduate distributed-systems class will find that about 25% of the material overlaps; in particular, it's permitted to take both CSE 452 and CSE 552.
Research Project: A major part of the course will be an independent research project on a topic of your choosing related to distributed systems, with team members of your choosing. A strict requirement is that every project must have a quantitative result — in other words, purely paper designs are not sufficient.
Course Reading and Discussion: Another major part of the course will be a group discussion of various assigned papers both before and during class. The goal will be to develop your ability to uncover the broader implications of research papers. Most systems research starts with this process: what can one conclude from a research result, beyond what's written by the authors?
For each class except the first, we'll have a forum thread that any student may post to and read. Each student must, before class begins, post a paragraph that makes an interesting point about one of that class's papers.
The class discussion will be divided into two parts. First, we'll discuss the plain content of the paper: What did the authors think the paper was about? Second, we'll examine the subtext and context of each paper: What do we (the lecturers and the students) think is really interesting about the paper? For instance, what limits and opportunities does the paper miss?
For the first part of the discussion, we'll lecture on mechanical elements of the papers that are hard to discuss without prepared slides. We'll then kick off an interactive discussion. The class-participation portion of your grade will depend on the extent to which you participate meaningfully in each such discussion.
Prior to each class, we'll post a short list of questions to get the discussion started, but we'll also cover topics from the forum thread and topics that arise naturally from the class discussion.
At the end of each class, we'll spend 5–10 minutes preparing you to read the papers for the next class. We'll present background material that gives context for them, and we may suggest questions to think about while reading them.
https://canvas.uw.edu/courses/1272963
The discussion board can be used for two purposes:
Your posts will be graded on the following simple 2-bit scale. If you don't post before class, you get a 0. If you post something that has significant problems, e.g., we're not convinced you read and thought about the paper, you get a 1. If you post something interesting and thoughtful, you get a 2; we expect most of your posts to get this grade. Once in a while, if you post something that is surprisingly insightful or otherwise exceptional, you'll earn a 3; these grades will be rare.
Date | Time | Event |
Friday, April 12 | 5:00 pm | Initial one-page research project proposal due |
Friday, April 19 | 5:00 pm | Problem set 1 due |
Friday, May 3 | 5:00 pm | Three-page outline of research project, including a complete introduction, due |
Friday, May 10 | 5:00 pm | Problem set 2 due |
Friday, May 24 | 5:00 pm | Five-page version of research project paper, including a complete discussion of related work, due |
Monday, May 27 | 11:30 am – 12:50 pm | No class (Memorial Day) |
Friday, May 31 | 5:00 pm | Problem set 3 due |
Tuesday, June 11 | 5:00 pm | Final research project paper due |
Wednesday, June 12 | Various | Research project presentations, scheduled individually |
Wednesday, June 12 | 9:00 am | Final exam released online |
Thursday, June 13 | 5:00 pm | Final exam due |
Here's the schedule of papers to be discussed during class. For each class, there's one primary paper you must read before class, and one or more optional papers we encourage you to also read before class. Your post to the forum thread before class can be on either the primary paper or an optional paper.
Date | Reading | Slides |
Monday, April 1 and Wednesday, April 3 | Paxos and Raft | Class Intro slides; Lecture slides |
Monday, April 8 | Performance enhancements to Paxos | Background slides; Lecture slides |
Wednesday, April 10 | Distributed system verification | Background slides; Lecture slides |
Monday, April 15 | Chain replication | Background slides; Lecture slides |
Wednesday, April 17 | Byzantine fault tolerance | Background slides; Lecture slides |
Monday, April 22 | Distributed logging | Background slides; Lecture slides |
Wednesday, April 24 | Byzantine-fault-tolerant distributed logging | Background slides; Lecture slides |
Monday, April 29 | Blockchains | Background slides; Lecture slides |
Wednesday, May 1 | Peer-to-peer systems | Background slides; Lecture slides |
Monday, May 6 | Distributed file systems | Background slides; Lecture slides |
Wednesday, May 8 | Failure detection | Background slides; Lecture slides |
Monday, May 13 | Byzantine-fault-tolerant distributed file systems | Background slides; Lecture slides |
Wednesday, May 15 | Consistency models | Background slides; Lecture slides |
Monday, May 20 | Storage semantics | Background slides; Lecture slides |
Wednesday, May 22 | Relational storage | Background slides; Lecture slides |
Wednesday, May 29 | In-network computation | Background slides; Lecture slides |
Monday, June 3 | Remote direct memory access (RDMA) | Background slides; Lecture slides |
Wednesday, June 5 | In-network caching | Background slides; Lecture slides |
We've provided a few suggestions for project ideas. But you're welcome (indeed, encouraged!) to devise your own project ideas not on that list.
The project will be due in five steps:
Everybody registered for the course should already have had an instructional UNIX account created for them by the department support staff, and have been notified of it. Using this account, you can remotely log into (via ssh) the attu.cs.washington.edu compute cluster. You can find more information about instructional resources here.
If the compute cluster doesn't meet your needs, you may want to consider one of the following experimental platforms, which we can help you get access to:
The final will constitute 20% of your grade. Its questions will be drawn from the required portions of the reading list.