In order to build an highly available service such as Google's MegaStore/Spanner storage systems or
Facebook's social networking site, one of the most important building blocks is to provide a way in which the
underlying data is replicated and can be accessed even in the presence of failures. To this end, this
assignment will require you to build a replicated state machine based on Paxos. The focus of this assignment
will be on the backend Paxos algorithms and in designing and building a highly reliable distributed storage
system. Beyond the base requirements outlined in this document, this assignment is open-ended and we would
like to give you substantial flexibility in design and implementation choices, as well as the ability to
explore the features of Paxos-based systems that interest you. Specifically, you can choose your
implementation language, as long as your code compiles and runs on the UW CSE home VMs
(attu/(re|bi|tri)cycle
. (In other words, C/C++/Go/Java/Python
are fine, but not
something like C#
.) Because of the open-ended nature of this assignment, deliverables include (a)
a working implementation, (b) documentation of your system and design decisions in a writeup, and (c) a
presentation of your system in a live demo to a course TA.
The core of the assignment is to develop a Paxos replicated state machine. The basic idea of the Paxos RSM is described in Section 3 of the "Paxos made simple" paper by Lamport. A quick summary of the implementation strategy is as follows:
You can organize your code in whatever mechanism that you think is appropriate. For example, you could have a single entity that serves all three roles of proposer, acceptor, and listener. The only requirement is that you must have the ability to terminate any of your Paxos nodes at will (e.g., You can kill the process on a machine to terminate the node. You could run multiple paxos nodes on the same machine as different processes and use atleast two physical machines). Include a discussion of design issues that came up in your writeup, as well as a rough overview of what your code does.
The implementation of Paxos should be robust to message loss and node failures and should be able to make progress as long as a majority of nodes are online. Bonus points will be provided for:
Demonstrate the correctness of your Paxos implementation by developing a simple lock server as the replicated state machine. The lock server should have the following functionality:
We would like you to focus primarily on the Paxos implementation, so you can make any simplifying assumptions that you feel appropriate with respect to the actual lock service exported by the replicated RSM. For instance, you can assume the following:
atmost-once
or atleast-once
semantics for the
delivery of client requests or responses.Client code that you submit with the assignment should however satisfy the following two requirements that will aid in testing:
i = f(message) % N
and using replica-i
where f
is a deterministic function on the message data eg. Integer sum of all bytes or a cryptographic hash
function.When you're ready to turn in your assignment, do the following:
paxos-{{UWNETID}}
with the UWNETID of the team member submitting
the project. Include a writeup of the project and a code instruction or documentation in the
README.txt
file.README.txt
file should contain the name, email address for all team members, as well as
instructions on how to launch your server with the necessary configuration instructions if you have a
configuration file or IP addresses which need to be changed.minor
code changes and write up feedback.)