CSE550 Problem Set #2

out: friday oct 10th, 2014
due: friday oct 24th, 2014 by 5:00pm.

[ Summary | Paxos | Lock Service | Submission |

Summary

In order to build an highly available service such as Google's MegaStore/Spanner storage systems or Facebook's social networking site, one of the most important building blocks is to provide a way in which the underlying data is replicated and can be accessed even in the presence of failures. To this end, this assignment will require you to build a replicated state machine based on Paxos. The focus of this assignment will be on the backend Paxos algorithms and in designing and building a highly reliable distributed storage system. Beyond the base requirements outlined in this document, this assignment is open-ended and we would like to give you substantial flexibility in design and implementation choices, as well as the ability to explore the features of Paxos-based systems that interest you. Specifically, you can choose your implementation language. Because of the open-ended nature of this assignment, deliverables include (a) a working implementation, (b) documentation of your system and design decisions in a writeup, and (c) a presentation of your system in a live demo to a course TA.

Paxos Implementation

The core of the assignment is to develop a Paxos replicated state machine. The basic idea of the Paxos RSM is described in Section 3 of the "Paxos made simple" paper by Lamport. A quick summary of the implementation strategy is as follows:

You can organize your code in whatever mechanism that you think is appropriate. For example, you could have a single entity that serves all three roles of proposer, acceptor, and listener. Include a discussion of design issues that came up in your writeup, as well as a rough overview of what your code does.

The implementation of Paxos should be robust to message loss and node failures and should be able to make progress as long as a majority of nodes are online. However, given the short timeline for this assignment, we don't require the following features in your Paxos implementation:

Lock Service

Demonstrate the correctness of your Paxos implementation by developing a simple lock server as the replicated state machine. The lock server should have the following functionality:

We would like you to focus primarily on the Paxos implementation, so you can make any simplifying assumptions that you feel appropriate with respect to the actual lock service exported by the replicated RSM. For instance, you can assume the following:
Submission

When you're ready to turn in your assignment, do the following:

  1. Create a directory called "problemset2". In it, you should have the code for the assignment, a README.txt file, and a writeup.

  2. The README.txt file should contain the name, student number, and UW email address for all team members, as well as instructions on how to launch your server.

  3. A major component of your grade will be based on a project write-up, in which your group should explain the protocol and any relevant implementation details. You should include, at a minimum, the following:
    • a detailed description of your group's implementation
    • any assumptions you made
    • how to use your implementation
    • any outstanding issues
    • anything else that you feel is important to discuss (e.g. quirks, interesting behavior, performance characteristics, etc)

  4. Use the course dropbox (there is a link on the course homepage) to submit that tarball.

  5. Schedule a live demo to present your system.

Your grade will be based on all deliverables: implementation, writeup, and live demo.