CSEP552 Spring 2013 -- assignment #2

out: Monday, April 15th, 2013
due: Monday, May 13th, 2013, by 5pm

[ summary | part a | part b | part c | turnin ]

Summary

For assignment #2, you will implement two-phase commit over a replicated, durable key-value store, using RPC to coordinate the replica processes. There are three major pieces you will need to implement. First, you will build a durable key/value store. Second, you will implement a two-phase commit protocol to coordinate writing values atomically to replicas of the key/value store. Third, you will expose an RPC interface to the two-phase commit coordinator for clients to communicate with.

Part A: Key/Value Store

In this part of the assignment, your job is to implement a durable key/value store library. Your library should expose three methods:

The major requirement for your store is that it is durable; if your process exits (or is killed) and restarts, any values that were previously stored using a "put" must still be retrievable using a "get", and similarly, any value removed using a "del" must stay removed. It's up to you to decide on appropriate types for key and value, and to decide whether get/put/del can return failure in some cases.

I strongly recommend that you find existing code for this; you might, for example, consider using the sqlite database library. If you decide to roll your own library, this will take a lot of effort, particularly to get post-failure recovery correct.

Part B: Two-Phase Commit

Part B contains the bulk of this assignment. Your job is to implement a replicated key/value store, building on Part A. The architecture of your replicated key-value store is fairly simple; it consists of a single "master" process and multiple "replica" processes:

To implement two-phase commit, both the master and the replicas will need to do some form of logging to keep durable state associated with the two-phase commit. You will need to figure out how to implement logging, what to log, and how to replay your log on recovery.

Your implementation needs to be fault-tolerant. If either the master or one or more replicas fail, your two-phase commit implementation needs to do the right thing. Similarly, after a failed component recovers, your two-phase commit implementation needs to recover and proceed as appropriate.

It's fine for you to hard-code knowledge in the master and replicas about the configuration of the system, i.e., which processes exist and where they are running.

Part C: Clients

In part C, you should implement a client program that connects to the master and issues a stream of get, put, and del requests. You should be able to launch multiple clients in order to drive the master concurrently -- i.e., each separate client should be issuing its own stream of get/put/del operations.

Use your client programs to test your implementation of two-phase commit. Be sure to find a way to test as many corner cases as you can, such as the master failing during the period of uncertainty.

What to turn in

You should turn in the following:

Use the course dropbox to turn in your source code and your writeup.