Summary

In order to build an highly available service such as Google's MegaStore/Spanner storage systems or Facebook's social networking site, one of the most important building blocks is to provide a way in which the underlying data is replicated and can be accessed even in the presence of failures. To this end, this assignment will require you to build a replicated state machine based on Paxos. The focus of this assignment will be on the backend Paxos algorithms and in designing and building a highly reliable distributed storage system. Beyond the base requirements outlined in this document, this assignment is open-ended and we would like to give you substantial flexibility in design and implementation choices, as well as the ability to explore the features of Paxos-based systems that interest you. Specifically, you can choose your implementation language, as long as your code compiles and runs on the UW CSE home VMs (attu/(re|bi|tri)cycle. (In other words, C/C++/Go/Java/Python are fine, but not something like C#.) Because of the open-ended nature of this assignment, deliverables include (a) a working implementation, (b) documentation of your system and design decisions in a writeup, and (c) a presentation of your system in a live demo to a course TA.

Paxos Implementation

The core of the assignment is to develop a Paxos replicated state machine. The basic idea of the Paxos RSM is described in Section 3 of the "Paxos made simple" paper by Lamport. A quick summary of the implementation strategy is as follows:

  • Model the service that you are implementing as a deterministic state machine whose behavior or output is simply a function of its previous state and any inputs received from the user or environment in the current step.
  • Implement a multi-instance Paxos algorithm (Multiple-machine setup) to come to a consensus on the sequence of input values that the service receives in a consistent and reliable way. That is, the ith instance of Paxos determines the ith input or command received by the service.
  • From a client perspective, it can send the service request to any of the nodes that comprise the Paxos group. The node that receives the request tries to pass a Paxos instance using the received value. If it fails to pass it with the current instance, it tries to pass as the consensus value for a subsequent instance.
  • Run a copy of the underlying state machine on each of the nodes that belongs to the Paxos group. Each state machine executes the ith command once it is stable and after it has executed all previous commands. A command is stable if it is the chosen value for that instance of Paxos. Once the command is stable and the state machine has processed it, a response is sent back to the client.

You can organize your code in whatever mechanism that you think is appropriate. For example, you could have a single entity that serves all three roles of proposer, acceptor, and listener. The only requirement is that you must have the ability to terminate any of your Paxos nodes at will (e.g., You can kill the process on a machine to terminate the node. You could run multiple paxos nodes on the same machine as different processes and use atleast two physical machines). Include a discussion of design issues that came up in your writeup, as well as a rough overview of what your code does.

The implementation of Paxos should be robust to message loss and node failures and should be able to make progress as long as a majority of nodes are online. Bonus points will be provided for:

  • Node Recovery. (Network loss / Physical Node crash) Assume that once a node fails then it will never recover. Use any recovery mechanism of your choice (logging)
  • Resending messages. Although your base implementation should still be able to make progress given a reasonable threshold of lost messages, bonus credit will be provided for implementations which resend messages that may have been lost.

Lock Service

Demonstrate the correctness of your Paxos implementation by developing a simple lock server as the replicated state machine. The lock server should have the following functionality:

  • It manages the lock status for a number of locks.
  • It supports a lock(x) operation that:
    • If the lock is available, it obtains the lock and returns success.
    • If the lock has been assigned to some other client, it simply returns failure. It is up to the client to retry the operation using some form of exponential back-off heuristic
  • It supports a unlock(x) operation that frees up the lock.

We would like you to focus primarily on the Paxos implementation, so you can make any simplifying assumptions that you feel appropriate with respect to the actual lock service exported by the replicated RSM. For instance, you can assume the following:

  • Simplified Discovery: the identity of the Paxos group members can be hardcoded at the clients in the code or through a configuration file.
  • clients will be well-behaved, so that your system does not need to handle deadlock, clients that fail after acquiring a lock, etc. Additional Bonus for being able to effectively handle deadlock
  • the client to server communications are reliable as long as the servers are reachable, so you do not have to worry about issues such as atmost-once or atleast-once semantics for the delivery of client requests or responses.

Client code that you submit with the assignment should however satisfy the following two requirements that will aid in testing:

  • Clients should be able to issue requests to different replicas of the lock service. If all of the clients issue requests to the same replica, it’s hard to know whether the underlying Paxos implementation works. One option is to submit the client request to a randomly chosen replica. You could also make it determinstic for testing by computing a i = f(message) % N and using replica-i where f is a deterministic function on the message data eg. Integer sum of all bytes or a cryptographic hash function.
  • Clients should be able to submit multiple client requests to the lock service concurrently.

Submission and Grading

When you're ready to turn in your assignment, do the following:

  1. Create a directory called paxos-{{UWNETID}} with the UWNETID of the team member submitting the project. Include a writeup of the project and a code instruction or documentation in the README.txt file.
  2. The README.txt file should contain the name, email address for all team members, as well as instructions on how to launch your server with the necessary configuration instructions if you have a configuration file or IP addresses which need to be changed.
  3. A major component of your grade will be based on a project write-up, in which your group should explain the protocol and any relevant implementation details. You should include, at a minimum, the following:
    • A detailed description of your group's implementation
    • Any assumptions you made
    • How to compile, deploy and run your implementation
    • Any issues or scenarios explicitly not considered
    • Any other behavior of the system you feel important to discuss (e.g. quirks, interesting behavior, performance characteristics, etc)
  4. Use the course canvas project submission to submit the tarball
  5. Schedule a demo to present your system to the TA (This need not be done in the same order, you could schedule, get feedback and make minor code changes and write up feedback.)