CSEP551 -- Programming Assignment #2

Out: Thursday February 21st, 2008
Due: Thursday March 6th, 2008, before class

In this assignment, you will build a simple Web service that provides a RESTful API for storing and retrieving blocks of data. Your goal is to learn about some modern technological building blocks, including:


Overview

In the language of your choice, you should implement a Web service that listens on a port of your choosing, and exports a REST-like API that lets programs (or people) use HTTP GET, POST, and DELETE operations to read, write, and delete blocks of data stored by your service.

First, some terminology:

If you are having trouble verifying that you are correctly performing the SHA1 hashing, the url-safe base64 encoding, or the XML document generation, here is a tar.gz archive containing example files that should help. In particular, it contains the following files:


Web server interface

Following the REST philosophy, each block of data stored within your Web server should have its own URL. This implies that your Web server must be able to handle URLs of the following form:
  /csep551/fx8mgRaLcV8U9vxUuAibm9QOxg0=
i.e., each URL starts with "/csep551/", followed by the url-safe base64 encoding of a blockid.

For each of these kinds of URL, your server must handle the following HTTP methods:

Your server should also handle uploading blocks. To upload blocks, a client should use the HTTP POST method, passing a blocklist document as the request body, to the following URL:
    /csep551/post
The blocklist document should be uploaded verbatim, and clients are encouraged to use a Content-Type HTTP header of text/xml in their POST request.
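One way to dispatch the three kinds of URL is sketched below in Python. It assumes block URLs accept GET and DELETE, "/csep551/post" accepts POST, and "/" accepts GET; the router shape and the choice of 404 for unknown or malformed paths and 405 for unsupported methods are assumptions, not part of this specification. It also rejects non-canonical base64 so that each block has exactly one URL.

```python
import base64
import re
from urllib.parse import urlsplit

PREFIX = "/csep551/"
# A 20-byte SHA1 digest encodes to 27 url-safe base64 characters plus one '='.
_BLOCKID_RE = re.compile(r"[A-Za-z0-9_-]{27}=")


def parse_blockid(segment):
    """Return the decoded 20-byte blockid, or None if the segment is malformed."""
    if not _BLOCKID_RE.fullmatch(segment):
        return None
    raw = base64.urlsafe_b64decode(segment)
    # Reject encodings with stray low bits, which decode but re-encode differently.
    if base64.urlsafe_b64encode(raw).decode("ascii") != segment:
        return None
    return raw


def route(method, target):
    """Classify a request as ("root",), ("post",), ("get"/"delete", blockid) or ("error", status)."""
    path = urlsplit(target).path  # ignore any query string
    if path == "/":
        return ("root",) if method == "GET" else ("error", 405)
    if path == PREFIX + "post":
        return ("post",) if method == "POST" else ("error", 405)
    if path.startswith(PREFIX):
        blockid = parse_blockid(path[len(PREFIX):])
        if blockid is None:
            return ("error", 404)  # assumed status for a malformed blockid
        if method in ("GET", "DELETE"):
            return (method.lower(), blockid)
        return ("error", 405)
    return ("error", 404)
```

Because a valid encoded blockid is always 28 characters long, the literal path "/csep551/post" can never collide with a block URL.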

For each block in the blocklist, the server should verify that the blockid matches the hash of the blockdata. If all hashes match, the server should store the blocks and return a status code of 200. If any hash doesn't match, the server should store no blocks and return a status code of 400. If your server doesn't have the space to store all blocks, no blocks should be stored, and the server should return a status code of 503.
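The all-or-nothing rule can be sketched as "validate everything, check capacity, then commit." The Python below assumes the blocklist XML has already been parsed into (blockid, blockdata) pairs of raw bytes and uses an in-memory dict as a stand-in store; `apply_post` is a hypothetical name, and reading the optional 100MB cap from the "State kept durably" section as 100 * 2**20 bytes is an assumption.

```python
import hashlib

# Assumed interpretation of the optional 100MB storage cap.
DEFAULT_LIMIT = 100 * 2**20


def apply_post(store: dict, blocks, limit: int = DEFAULT_LIMIT) -> int:
    """Store every block or none of them; return the HTTP status code to send."""
    new = {}
    for blockid, blockdata in blocks:
        # Each blockid must equal the SHA1 hash of its data; otherwise store nothing.
        if hashlib.sha1(blockdata).digest() != blockid:
            return 400
        # Blocks that are already stored consume no additional space.
        if blockid not in store:
            new[blockid] = blockdata
    used = sum(len(d) for d in store.values())
    if used + sum(len(d) for d in new.values()) > limit:
        return 503
    store.update(new)
    return 200
```

Nothing touches `store` until every check has passed, so a rejected POST leaves it unchanged.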

Finally, when the URL "/" is requested, your server should return a simple Web page that contains only the following string:

gribble(at)cs.washington.edu - Steven Gribble
i.e., your email address with the @ symbol replaced with the string "(at)", followed by a space, followed by a dash, followed by a space, followed by your name. This single string is the full HTML document; you should not have any HTML tags (i.e., no <html>, <body>, etc.). Also, this response should come with status code 200.
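A minimal sketch of the "/" response, using Python's standard `http.server` module: `root_page_body` builds the string, and the hypothetical `RootHandler` returns it with status 200. The Content-Type value, the UTF-8 encoding, and the 404 for every other path are assumptions; a real server would route the /csep551/ URLs as well.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def root_page_body(email: str, name: str) -> bytes:
    """Build "user(at)host - Name" with no HTML tags around it."""
    text = email.replace("@", "(at)") + " - " + name
    return text.encode("utf-8")  # assumption: UTF-8 so non-ASCII names survive


class RootHandler(BaseHTTPRequestHandler):
    # Replace with your own email address and name.
    BODY = root_page_body("gribble@cs.washington.edu", "Steven Gribble")

    def do_GET(self):
        if self.path != "/":
            self.send_error(404)  # block URLs are not handled in this sketch
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()
        self.wfile.write(self.BODY)


if __name__ == "__main__":
    # Port 8000 is an arbitrary example.
    ThreadingHTTPServer(("", 8000), RootHandler).serve_forever()
```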

Just to summarize:


Additional requirements

Registering your server

While your web server is running, it should periodically (every 5 minutes) contact a server that I am running at futureproof.cs.washington.edu on port 8080, and "register" itself by simply fetching the following URL:
    /register/hostname/portnum
where "hostname" is the hostname of your web server, and "portnum" is the port number that it is running at. So, for example, if you run your web server at foo.com at port 1004, your server should fetch the following URL every 5 minutes:
    http://futureproof.cs.washington.edu:8080/register/foo.com/1004
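The registration requirement amounts to a background loop, sketched below in Python. The function names, the 30-second fetch timeout, and the choice to log and retry on any failure are assumptions. Running the loop in a daemon thread keeps it from blocking request handling or holding the process open at shutdown.

```python
import threading
import time
import urllib.request

REGISTRY = "http://futureproof.cs.washington.edu:8080"
INTERVAL_SECONDS = 5 * 60


def registration_url(hostname: str, portnum: int) -> str:
    return "%s/register/%s/%d" % (REGISTRY, hostname, portnum)


def register_forever(hostname: str, portnum: int, interval: float = INTERVAL_SECONDS):
    """Fetch the registration URL every `interval` seconds, forever."""
    url = registration_url(hostname, portnum)
    while True:
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                resp.read()
        except Exception as exc:  # any failure: log it and retry next interval
            print("registration failed:", exc)
        time.sleep(interval)


def start_registration(hostname: str, portnum: int) -> threading.Thread:
    t = threading.Thread(target=register_forever, args=(hostname, portnum), daemon=True)
    t.start()
    return t
```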
When our server sees a registration attempt, it will:

State kept durably

Your server must store data durably -- even if your server crashes, if it previously accepted a POST of some blocks, those blocks must be GET/DELETE-able when your server resumes from the crash.
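One common way to meet this requirement is one file per block, written to a temporary file, fsync'ed, and then atomically renamed into place. This way a crash never leaves a half-written block under its final name. The Python below is a sketch assuming a POSIX filesystem (the directory fsync does not work on Windows). `DurableBlockStore` is a hypothetical name, and naming files by the hex blockid is a choice that avoids collisions on case-insensitive filesystems, where base64 names could clash.

```python
import os
import tempfile


class DurableBlockStore:
    """Hypothetical file-per-block store; assumes POSIX rename and fsync semantics."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, blockid: bytes) -> str:
        return os.path.join(self.root, blockid.hex())

    def _sync_dir(self):
        # fsync the directory so the rename or unlink itself survives a crash.
        fd = os.open(self.root, os.O_RDONLY)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)

    def put(self, blockid: bytes, data: bytes):
        fd, tmp = tempfile.mkstemp(dir=self.root, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # data is on disk before the rename
            os.replace(tmp, self._path(blockid))  # atomic on POSIX
        except BaseException:
            if os.path.exists(tmp):
                os.unlink(tmp)
            raise
        self._sync_dir()

    def get(self, blockid: bytes):
        try:
            with open(self._path(blockid), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def delete(self, blockid: bytes) -> bool:
        try:
            os.unlink(self._path(blockid))
        except FileNotFoundError:
            return False
        self._sync_dir()
        return True
```

Leftover ".tmp-" files from a crash can never be mistaken for blocks, because hex names never start with "."; a restart could simply delete them.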

You can limit the amount of data you choose to store to 100MB if you like; if somebody attempts to POST blocks that would cause your server to exceed that limit, you should return a 503 status code.

Dealing with errors

Your server should be robust -- try to handle every error corner case you can think of, including errors caused by poorly behaving or buggy clients.
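A frequent source of trouble is the request body itself: clients may omit or garble Content-Length, send far more data than expected, or hang up early. The hypothetical `read_body` helper below, written in Python, checks all three before the POST handler parses anything. The status codes 411, 400, and 413 are standard HTTP codes but are an assumption here, since the assignment does not name them. The 256MB sanity cap is also an assumption; it is deliberately set above the storage limit so that legitimate over-quota POSTs still reach the 503 check. Chunked transfer encoding is not handled.

```python
# Assumed sanity cap on request bodies, kept above the 100MB storage limit.
MAX_BODY = 256 * 2**20


def read_body(headers, rfile, max_bytes: int = MAX_BODY):
    """Return (None, body) on success, or (status, None) with an HTTP error code."""
    raw = headers.get("Content-Length")
    if raw is None:
        return 411, None  # Length Required
    try:
        length = int(raw)
    except ValueError:
        return 400, None
    if length < 0:
        return 400, None
    if length > max_bytes:
        return 413, None  # refuse before reading any of the body
    body = rfile.read(length)
    if len(body) != length:
        return 400, None  # client closed the connection early
    return None, body
```

In an `http.server` handler, this would be called as `read_body(self.headers, self.rfile)`; `self.headers.get` is case-insensitive there.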

NATs and port forwarding

Your server must be accessible from the wide-area internet. This means that if you choose to run it at home, and you have a home router that does NAT, you'll need to do port forwarding through your NAT so that my code can talk to your server, even though my code will be running at UW.

You also have access to the attu instructional cluster, if you'd prefer to run your code there. If you choose to do this, please be nice, and don't whammy the machines.

Test code

You will want to write a small test suite to make sure that your server functions correctly. Your test suite should be designed to exercise all parts of the server behavior specified in this assignment. (Feel free to share test suite code with each other!)
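A test suite needs to see error statuses as ordinary values, but Python's `urllib.request.urlopen` raises `HTTPError` for 4xx and 5xx responses. The sketch below wraps it in a hypothetical `fetch` helper that returns (status, body) for any method, plus one example check against the "/" requirement. The helper names and the 30-second timeout are assumptions.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch(url: str, method: str = "GET", body: Optional[bytes] = None,
          content_type: Optional[str] = None):
    """Return (status, body_bytes) without raising on 4xx/5xx responses."""
    req = urllib.request.Request(url, data=body, method=method)
    if content_type:
        req.add_header("Content-Type", content_type)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urlopen raises for error statuses; tests need the code and body instead.
        return err.code, err.read()


def check_root(base_url: str, expected: str) -> bool:
    """"/" must return status 200 and exactly the expected string."""
    status, body = fetch(base_url + "/")
    return status == 200 and body.decode("utf-8") == expected
```

The same helper covers the other cases, e.g. `fetch(base + "/csep551/post", "POST", doc, "text/xml")` for an upload, where `doc` is a blocklist document your test builds, followed by GET and DELETE on the resulting block URLs.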

Very soon, the instructor will have a Web server running that adheres to this specification, which you can use to check your test suite.

Keep your server running

You must attempt to keep your server running until the last lecture of the quarter. Because your server registers itself, we know where (and when) your servers are running. Your instructor and TA will be accessing your server periodically to try it out, to test it for conformance, and to measure its availability. :)

Using my test harness

If you have your server up and running, and want to see a test report on how conformant it is, just launch your browser and visit the following URL:
  http://futureproof.cs.washington.edu:8080/testharness/hostname/portnum
where hostname is the DNS name of your server, and portnum is the port number it is running on. If all goes well, your server will experience some mild load, and then you'll see a report displayed. Search for "--" in the report; those are tests that your server failed. Anything that has "++" is a success case. The report is a little terse and cryptic, but hopefully it's helpful.
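If you prefer to script this instead of using a browser, the Python below builds the harness URL and pulls out the report lines that mention "--". It assumes each test result appears on its own line, which is not stated above. The helper names and the generous 600-second timeout are assumptions.

```python
import sys
import urllib.request

HARNESS = "http://futureproof.cs.washington.edu:8080/testharness"


def harness_url(hostname: str, portnum: int) -> str:
    return "%s/%s/%d" % (HARNESS, hostname, portnum)


def failed_lines(report: str):
    """Report lines containing "--", which mark failed tests."""
    return [line for line in report.splitlines() if "--" in line]


if __name__ == "__main__":
    host, port = sys.argv[1], int(sys.argv[2])
    with urllib.request.urlopen(harness_url(host, port), timeout=600) as resp:
        report = resp.read().decode("utf-8", errors="replace")
    for line in failed_lines(report):
        print(line)
```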

What to turn in

Your submission should be a single .tar.gz or .zip file, containing the following elements: