CSEP551 -- Programming Assignment #3

Implementing parts of Paxos

Out: Tuesday November 24th, 2009
Due: Monday December 14th, 2009, before noon (PST)

[ overview | step 1 | step 2 (bonus) | partners | what to turn in ]

Overview

In this assignment, you will implement a simple version of the Paxos algorithm. Paxos allows you to decide upon the value of named variables in spite of fail-stop failures of participants.

At a high level, you should implement this assignment in two steps; only the first step is required, and the second is an optional bonus for extra credit:

  1. (required) implement the Paxos synod algorithm (i.e., consensus over a single variable).

  2. (optional, bonus) implement the Paxos state machine algorithm (i.e., consensus over an ordered series of variables) using the Paxos synod algorithm as a building block.

You may implement this assignment in any language you like, on any operating system you like. Feel free to take advantage of libraries that help you read from and write to stable storage, or communicate over the network, but you cannot use an existing Paxos implementation: you must build your own.


Step 1 (required).

Implement the Paxos synod algorithm (consensus over a single variable).

At its heart, the Paxos synod algorithm is extremely simple; its goal is to come to consensus on the value of a single variable. The algorithm is succinctly described at the bottom of page 5 and the top of page 6 of "Paxos Made Simple," which we read in class. For this assignment, each server replica will contain all three of the agents described in the paper (proposer, acceptor, and learner). We will not bother with distinguished learners or proposers, and we'll leave it up to clients to decide which replica to connect to.

The acceptor agent in each replica must durably store and atomically update two pieces of state:

  1. the highest proposal number for which it has responded to a prepare request (i.e., its current promise);

  2. the highest-numbered proposal it has accepted so far, if any (both the proposal number and its value).
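As a concrete (and entirely optional) sketch, here is one way to persist the acceptor's two pieces of state in Python, assuming they are the acceptor's highest promise and its highest accepted proposal; the class name, field names, and file format are illustrative, not prescribed:

```python
import json
import os

class AcceptorState:
    """Durable acceptor state for one synod instance (illustrative sketch)."""

    def __init__(self, path):
        self.path = path
        self.promised_n = -1        # highest prepare number promised so far
        self.accepted_n = -1        # number of the highest accepted proposal
        self.accepted_value = None  # value of that proposal, if any
        if os.path.exists(path):    # recover after a crash
            with open(path) as f:
                d = json.load(f)
            self.promised_n = d["promised_n"]
            self.accepted_n = d["accepted_n"]
            self.accepted_value = d["accepted_value"]

    def save(self):
        # Write to a temp file, fsync, then rename, so the update of both
        # fields appears atomic even if the replica crashes mid-write.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"promised_n": self.promised_n,
                       "accepted_n": self.accepted_n,
                       "accepted_value": self.accepted_value}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
```

The temp-file-plus-rename trick gives you the atomic update "for free" from the file system, sidestepping the need for a write-ahead log in this simple setting.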

A proposer agent on a replica interacts with the set of acceptor agents on all replicas using the two-phase algorithm described in the paper.

A learner agent on a replica must scavenge the state of acceptor agents on all replicas in order to decide whether a value has been chosen. A simple strategy we suggest is to have acceptor agents send a message to all learner agents whenever they accept a proposal; this way, all learners will learn about chosen values soon after they are chosen. However, since replicas might crash, you also need to implement functionality in which a learner queries acceptors on demand to discover whether or not a value has been chosen.
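The on-demand query boils down to counting accepted proposals: a value is chosen once a majority of acceptors have accepted the same proposal. A minimal sketch, where query_accepted is an assumed RPC stub returning an acceptor's highest accepted (number, value) pair or None:

```python
from collections import Counter

def learn(acceptors, query_accepted):
    """Ask every acceptor what it has accepted; return the chosen value,
    or None if no value is known to be chosen yet (illustrative sketch)."""
    majority = len(acceptors) // 2 + 1
    counts = Counter()
    for a in acceptors:
        accepted = query_accepted(a)   # (n, value) or None
        if accepted is not None:
            counts[accepted] += 1
    for (n, value), c in counts.items():
        if c >= majority:
            return value  # a majority accepted proposal (n, value)
    return None  # not chosen, as far as this learner can tell
```

Note that the count is keyed on the (number, value) pair: a value is only chosen when a majority accepted the *same* proposal, not merely the same value under different numbers.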

Each replica must export an RPC interface to the external world containing two methods:

Any client can connect to any replica to issue either of these methods. It's up to clients to decide which replica to connect to, but you should build your implementation of the synod algorithm to handle these method calls correctly on any replica.

To implement these methods, you will need to build some internal RPC messages between the replicas to implement the synod algorithm itself. You will also need to implement some form of durable storage in the acceptor agent to hold the two pieces of state mentioned above. (Strictly speaking, you need to update these two pieces of state atomically, which requires something like write-ahead logging. For the purposes of this assignment, however, you may choose to update the state non-atomically using file system updates and ignore the fact that a crash at the wrong time could break the system.) Finally, when restarting a replica after a crash, you'll need to read this durable state in order to recover the in-memory state of your acceptor.

You should implement the system assuming three replicas; the set of replicas is decided upon at the beginning of time, and is known to all replicas in the system. We won't attempt to implement any form of group membership change -- i.e., the set of replicas is static. In practice, we recommend that you pass the list of replicas as a command-line argument when starting up a replica.

Feel free to make your replica implementation effectively single-threaded. In other words, even though many clients might connect to a replica and issue external RPCs concurrently, you can use a single lock so that only one externally exposed method is processed at a time. This will simplify your logic, but will obviously slow down the implementation. Note, of course, that different clients can still connect to different replicas.
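The coarse-grained locking suggested above amounts to a few lines. A minimal sketch, with illustrative names:

```python
import threading

class Replica:
    """Serializes all externally exposed RPC handlers with one lock."""

    def __init__(self):
        self._lock = threading.Lock()

    def external_call(self, handler, *args):
        # Even if many clients call concurrently, only one external
        # method runs at a time on this replica.
        with self._lock:
            return handler(*args)
```

Each RPC handler is wrapped by external_call, so correctness never depends on two external methods interleaving on the same replica.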

Deliverables for part 1:


Step 2 (bonus).

Implement the Paxos state machine algorithm (consensus over an ordered sequence of variables).

Now that you have the synod algorithm working, you can use it as a building block to implement the Paxos state machine algorithm. The state machine algorithm uses the synod algorithm to build the abstraction of an ordered sequence of commands chosen by the replicas; a command is just a value in the synod sense, and each command (command number 0, command number 1, command number 2, ...) is chosen by executing an instance of the synod protocol for that command number. The abstraction the state machine algorithm exposes to clients is a method that allows a client to add a command to the sequence of chosen commands and to receive back the sequence number of the command once it has been chosen.

To do this, you need to modify your replica so that it can run many concurrent instances of the synod algorithm, with each instance named by its instance number (number 0, number 1, number 2, ...). This, in turn, implies you'll need to modify your RPCs and the way you store state so that the instance number is passed along with the RPCs and stored along with the state.

You should add two more external RPCs exposed by replicas to clients:

Section 3 of "Paxos Made Simple" describes how to implement these methods. In a nutshell, when a replica receives an add_command() RPC from a client, it must (a) learn which commands have already been chosen, and close any gaps in the sequence of commands chosen so far, and (b) propose, and get accepted, a new command with a higher sequence number than all commands that have already been chosen. The only tricky part of this algorithm is figuring out how to efficiently implement the "execute phase 1 for infinitely many instances of the consensus algorithm" logic. It's not hard, but it will require an acceptor to return a set of responses in some cases, instead of a single response.
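One way to picture the "infinitely many instances" trick: a single prepare(n) message names the first instance it covers, and the acceptor replies with one response per instance it actually knows about at or above that point. A sketch, where states maps instance number to per-instance acceptor state and all names are illustrative:

```python
def prepare_all(states, n, first_instance):
    """Answer prepare(n) for every synod instance >= first_instance.

    states: dict mapping instance number -> {"promised_n", "accepted_n",
    "accepted_value"}. Returns a dict of responses, one per known
    instance, instead of a single response (illustrative sketch).
    """
    responses = {}
    for inst, st in sorted(states.items()):
        if inst < first_instance:
            continue
        if n > st["promised_n"]:
            st["promised_n"] = n
            responses[inst] = (st["accepted_n"], st["accepted_value"])
    # Instances the acceptor has never seen are implicitly covered by the
    # same promise; a real implementation records that blanket promise
    # durably so later prepares with lower numbers for fresh instances
    # are rejected too.
    return responses
```

The response is finite even though the promise covers infinitely many instances, because only instances with recorded state need an explicit entry.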

Deliverables for part 2:


Partnering up

You have the choice of doing this assignment solo, or in teams of two. You're responsible for self-forming teams of two if you'd like to do this with a partner. I'd recommend re-using your partner from assignment #2.


What to turn in

You should submit your assignment using the following "dropbox" URL:
https://catalysttools.washington.edu/collectit/dropbox/gribble/7604
Your submission should be a single .tar.gz or .zip file, containing the deliverables for part 1, and if you decided to attempt it, part 2.