CSE550 Problem Set #2
out: Monday October 17th, 2011
due: Friday November 4th, 2011 by 5:00pm.
[
summary |
part a |
part b |
how to submit |
grading ]
For this problem set, you will be implemented a simple write-ahead
log to add basic transaction support to a key-value store, and you
will be solving a small design exercise. For the write-ahead log,
you can use any language you like.
Your job is to implement augment an existing key-value store to give
the programmer simple support for transactional commit of multiple
key/value updates. Of the ACID properties, the only ones we care
about for this assignment are A and D: you have to make sure once a
transaction commits, it is durable on disk, even after failure, and
you have to make sure that either the entire transaction commits or
none of it does. However, you do not need to worry about
concurrency control; you can assume only a single transaction at a
time occurs.
To do this, you will need to implement a write-ahead logging
module. Your module will implement a NO-STEAL / NO-FORCE buffer
management policy:
- NO-FORCE: you will not overwrite values in the key-value
store until sometime after the transaction has committed. This
means that transactional updates need to be in the log in the form
of REDO records -- i.e., you'll build a write-ahead log! Once the
commit record hits the log, you will then be able to issue updates
to the key-value store itself, though you'll need to decide when
to actually do this work. Upon recovery after a crash, you will
need to replay some part of the write-ahead log against the
key-value store to bring it up to date. (You have to figure out
how to decide which part of the log to REDO.)
- NO-STEAL: you will not overwrite the key-value store with
uncommitted transactional updates; more specifically, you can only
modify the key-value store with updates from a committed
transaction. This means you don't need UNDO records in the log.
You need to find an existing key-value storage library implemented
in the language of your choice. You can assume a few things about
the key-value library to make your life easier -- (a) that it
supports atomic updates of single keys (i.e., if a crash happens in
the middle of a write to a key, you can assume that the entire write
happened, or none of the write happened -- you don't need to worry
about inconsistent state from a partial key write), and (b) that it
has its own mechanism for durability / recovery after a crash. All
you need to add is transactional support (atomicity for multiple
actions), and post-crash recovery. You can also assume that
mutations are idempotent -- it's OK to replay log entries multiple
times, as long as they are replayed in the right order.
One issue is that you will need to figure out which subset of
actions in your log to replay after a crash; this could mean you
need to keep track of which committed transactions have been written
into the key/value store, and which haven't. (Checkpoint records
help with this.)
Don't worry about pruning the log, or hyper-optimizing performance.
Assume your language supports a "sync" operation that forces any
buffered file system writes to disk; use this operation if your
language actually does support it, though!
Include a brief discussion of design issues that came up in a
README.TXT file, as well as a rough overview (just a few sentences)
of what your code does.
Write one or two paragraphs to answer each of the following:
- What problem does two-phase commit solve?
- What problem does two-phase locking solve?
- LFS carefully orders writes into a log. Is it implementing a
form of write-ahead logging, and if so, what set actions is it
atomically committing using these ordered writes? If not, why
doesn't LFS need atomic updates?
When you're ready to turn in your assignment, do the following:
- Create a directory called "problemset2". In it, you should
have three things: a subdirectory called "parta", a file called
"partb.txt", and a README.TXT file.
- The README.TXT file should contain your name,
student number, and UW email address, as well as instructions on
how to launch your server.
- "parta" should contain your part a code, as well as a
Makefile for compiling it. Running "make" should produce an
executable called "waltest." (If you implement in a scripted
language, waltest can be your main script file we run.) When we
run waltest, it should issue (sequentially) a set of transactions
against the key-value store. If there is anything we need to know
about setting things up, include that in your README.TXT file.
Your code should be clean, well commented, etc.
- "partb.txt" should contain your answers to the three
questions in part b. Remember, just one or two paragraphs
suffices for each question.
- Create a submission tarball by running the following command,
but replacing "UWEMAIL" with your email account name:
tar -cvzf problemset2_submission_UWEMAIL.tar.gz problemset2
For example, since my email account is "gribble", I would run the command:
tar -cvzf problemset2_submission_gribble.tar.gz problemset2
- Use the course dropbox (there is a link on the course
homepage) to submit that tarball.
We will be basing your grade on several elements:
- Whether your code works! It should be correct, compile
without warnings, and not leak memory, processes, descriptors, etc.
- How well structured your code is: you should have clean
module interfaces, a nice decomposition, good comments, and so on.
- The correctness of your answers to part B.