CSE 550 problem set #2

CSE550 Problem Set #2

out: Monday October 17th, 2011
due: Friday November 4th, 2011 by 5:00pm.

[ summary | part a | part b | how to submit | grading ]

For this problem set, you will be implemented a simple write-ahead log to add basic transaction support to a key-value store, and you will be solving a small design exercise. For the write-ahead log, you can use any language you like.

Part A -- implement simple transactions with a write-ahead log

Your job is to implement augment an existing key-value store to give the programmer simple support for transactional commit of multiple key/value updates. Of the ACID properties, the only ones we care about for this assignment are A and D: you have to make sure once a transaction commits, it is durable on disk, even after failure, and you have to make sure that either the entire transaction commits or none of it does. However, you do not need to worry about concurrency control; you can assume only a single transaction at a time occurs.

To do this, you will need to implement a write-ahead logging module. Your module will implement a NO-STEAL / NO-FORCE buffer management policy:

NO-FORCE: you will not overwrite values in the key-value store until sometime after the transaction has committed. This means that transactional updates need to be in the log in the form of REDO records -- i.e., you'll build a write-ahead log! Once the commit record hits the log, you will then be able to issue updates to the key-value store itself, though you'll need to decide when to actually do this work. Upon recovery after a crash, you will need to replay some part of the write-ahead log against the key-value store to bring it up to date. (You have to figure out how to decide which part of the log to REDO.)
NO-STEAL: you will not overwrite the key-value store with uncommitted transactional updates; more specifically, you can only modify the key-value store with updates from a committed transaction. This means you don't need UNDO records in the log.

You need to find an existing key-value storage library implemented in the language of your choice. You can assume a few things about the key-value library to make your life easier -- (a) that it supports atomic updates of single keys (i.e., if a crash happens in the middle of a write to a key, you can assume that the entire write happened, or none of the write happened -- you don't need to worry about inconsistent state from a partial key write), and (b) that it has its own mechanism for durability / recovery after a crash. All you need to add is transactional support (atomicity for multiple actions), and post-crash recovery. You can also assume that mutations are idempotent -- it's OK to replay log entries multiple times, as long as they are replayed in the right order.

One issue is that you will need to figure out which subset of actions in your log to replay after a crash; this could mean you need to keep track of which committed transactions have been written into the key/value store, and which haven't. (Checkpoint records help with this.)

Don't worry about pruning the log, or hyper-optimizing performance. Assume your language supports a "sync" operation that forces any buffered file system writes to disk; use this operation if your language actually does support it, though!

Include a brief discussion of design issues that came up in a README.TXT file, as well as a rough overview (just a few sentences) of what your code does.

Part B -- answer a few questions

Write one or two paragraphs to answer each of the following:

What problem does two-phase commit solve?
What problem does two-phase locking solve?
LFS carefully orders writes into a log. Is it implementing a form of write-ahead logging, and if so, what set actions is it atomically committing using these ordered writes? If not, why doesn't LFS need atomic updates?

What to turn in

When you're ready to turn in your assignment, do the following:

Create a directory called "problemset2". In it, you should have three things: a subdirectory called "parta", a file called "partb.txt", and a README.TXT file.
The README.TXT file should contain your name, student number, and UW email address, as well as instructions on how to launch your server.
"parta" should contain your part a code, as well as a Makefile for compiling it. Running "make" should produce an executable called "waltest." (If you implement in a scripted language, waltest can be your main script file we run.) When we run waltest, it should issue (sequentially) a set of transactions against the key-value store. If there is anything we need to know about setting things up, include that in your README.TXT file. Your code should be clean, well commented, etc.
"partb.txt" should contain your answers to the three questions in part b. Remember, just one or two paragraphs suffices for each question.
Create a submission tarball by running the following command, but replacing "UWEMAIL" with your email account name:
tar -cvzf problemset2_submission_UWEMAIL.tar.gz problemset2
For example, since my email account is "gribble", I would run the command:
tar -cvzf problemset2_submission_gribble.tar.gz problemset2
Use the course dropbox (there is a link on the course homepage) to submit that tarball.

Grading

We will be basing your grade on several elements:

Whether your code works! It should be correct, compile without warnings, and not leak memory, processes, descriptors, etc.
How well structured your code is: you should have clean module interfaces, a nice decomposition, good comments, and so on.
The correctness of your answers to part B.