University of Washington Computer Science & Engineering
 CSE333 12sp -- Systems Programming

Homework #3

out: Monday May 13th, 2012
due: Monday June 4th, 2012 by 9:00pm.


Summary.

In homework #3, you will finish an implementation of a multithreaded Web server. There are three required parts to homework #3, plus an optional bonus part D. In part A, you will read through and finish our implementation of several low-level utilities that the Web server relies on. In part B, you will implement the Web server itself, including the ability to serve files from the local file system. In part C, you will integrate your Twitter searching and Twitter word cloud routines from homework #2, providing clients with a Web form for searching Twitter.

Download the provided starter code, which you'll modify for this assignment. As before, we've provided you with our hw1 and hw2 libraries, in case yours have bugs in them. Feel free to replace those libraries with your own libhw1.a and libhw2.a if you would rather build on top of your solutions than ours.

Part A -- HTTP plumbing

A Web server is fairly complex and depends upon a number of lower-level abstractions. In this part of the homework, you will read through the code of some of the abstractions that we built for you, and you will build several of your own:

  1. Take a look at ThreadPool.h and ThreadPool.cc. This class manages a pool of threads and a queue of tasks. When there is an available thread in the pool and at least one task on the queue, the next available thread picks up a task and performs it. If all threads are busy performing tasks, then newly added tasks queue up until a thread finishes. We'll use this thread pool implementation to dispatch incoming connections, so that our server can process multiple requests concurrently.

  2. Read through HttpUtils.h. This file defines a collection of helper routines, many of which you need to implement. Next, read through HttpUtils.cc and notice which of the routines are missing (search for STEP XXX, as usual). Implement the missing routines, and make sure that you pass the associated set of unit tests. If the documentation in HttpUtils.h isn't enough for you to determine how to implement something, the unit tests also serve as useful documentation!

  3. Read through ServerSocket.h. This file defines convenience routines for opening up a listening socket and accepting an incoming connection from a client. These routines should be pretty familiar to you from the lecture coding and exercises. Next, read through ServerSocket.cc and implement the missing code; you should feel free to cut and paste liberally from our coding examples in lecture. Make sure you pass the associated unit test.

  4. Read through FileReader.h. This file defines a convenience routine for reading the contents of a file into memory. Next, read through FileReader.cc and implement the missing code. Make sure you pass the associated unit test.

Part B -- finish our Web server implementation

We have provided you with a fairly complete framework for a Web server. Your job in this part of the assignment is to finish that implementation, and to get your Web server to the point where it is able to parse requests for files, read those files into memory, and respond with the file contents.
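
The first stage mentioned above, recognizing and parsing a request, can be sketched as follows. This is an illustration under stated assumptions, not the provided code: ParsedRequest, ParseStatus, and TryParseRequest are hypothetical names standing in for the HttpRequest and HttpConnection classes described in the steps below, and the sketch assumes the request header ends with a blank line ("\r\n\r\n") and carries no body.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a fully parsed request.
struct ParsedRequest {
  std::string uri;
  std::map<std::string, std::string> headers;  // lower-cased name -> value
};

enum class ParseStatus { kNeedMoreData, kOk, kMalformed };

// If `buffer` holds a complete request header, remove it from the buffer,
// fill in *out, and return kOk. Bytes after the header stay in the buffer.
ParseStatus TryParseRequest(std::string *buffer, ParsedRequest *out) {
  const std::string kEnd = "\r\n\r\n";
  const size_t npos = std::string::npos;
  size_t end = buffer->find(kEnd);
  if (end == npos) return ParseStatus::kNeedMoreData;  // keep reading

  std::string head = buffer->substr(0, end);
  buffer->erase(0, end + kEnd.size());

  // Request line: "<method> <uri> <version>".
  size_t line_end = head.find("\r\n");
  std::string first = head.substr(0, line_end);
  size_t sp1 = first.find(' ');
  size_t sp2 = (sp1 == npos) ? npos : first.find(' ', sp1 + 1);
  if (sp2 == npos) return ParseStatus::kMalformed;
  out->uri = first.substr(sp1 + 1, sp2 - sp1 - 1);

  // Remaining lines: "Name: value" headers.
  out->headers.clear();
  size_t pos = (line_end == npos) ? head.size() : line_end + 2;
  while (pos < head.size()) {
    size_t eol = head.find("\r\n", pos);
    if (eol == npos) eol = head.size();
    std::string line = head.substr(pos, eol - pos);
    pos = eol + 2;
    size_t colon = line.find(':');
    if (colon == npos) continue;  // skip lines that are not headers
    std::string name = line.substr(0, colon);
    for (char &c : name) {
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    size_t vstart = line.find_first_not_of(" \t", colon + 1);
    out->headers[name] = (vstart == npos) ? "" : line.substr(vstart);
  }
  return ParseStatus::kOk;
}
```

The caller would keep appending bytes read from the socket to the buffer and retry until the result is something other than kNeedMoreData.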

  1. Start by reading through and understanding the code in HttpRequest.h. This file (which does not have an associated .cc file!) defines a class that represents an HTTP request. As you have already learned, a basic HTTP request is fairly simple: it contains a first line that specifies the URL the client is requesting, and then it contains a sequence of lines that contain "header" information provided by the browser. HttpRequest.h doesn't contain code for parsing requests, but rather just represents a fully parsed request.

  2. Next, read through and understand the code in HttpResponse.h. This file defines a class that represents an HTTP response, and also contains a method called GenerateResponseString() that generates the text of an HTTP response based on the other fields in the class. Callers fill in the fields of the HttpResponse object, then invoke GenerateResponseString() to generate a formatted HTTP response ready for writing to the client.

  3. Now, read through HttpConnection.h. Given a file descriptor representing an active connection to a client, this class has two methods. The first is responsible for reading data from the socket, buffering that data in the "buffer_" instance variable, detecting when a full HTTP request has been received, parsing that request into an HttpRequest structure, and returning that through an output parameter. The second is responsible for writing an HttpResponse back to the client.

    Read through HttpConnection.cc. This class is largely unimplemented: implementing it is the largest piece of work you'll do in this assignment. You should probably design some helper private methods to get the job done (which will mean editing HttpConnection.h as well). We recommend you follow the steps in the comments. Once you're done, make sure you pass the associated unit test.

  4. Read through HttpServer.h and HttpServer.cc. This class implements the Web server itself, making use of all of the building blocks you've implemented so far. The tricky part of this class is how it uses the threadpool -- we've done that part for you. Implement the missing piece inside HttpServer.cc, namely the routine that tests a URL to see if it starts with the substring "/static", and if so, extracts a filename from the remaining part of the URL, reads that file into memory, and builds an HTTP response from it.

  5. Test your web server by launching it and interacting with it via your browser. For example, if you are on attu4.cs, pick a port number to start your server on, such as 5488. (Don't pick that exact number, or you'll collide with everybody else.) Then, run this command to launch the server, reading files from the hw3_htmldir/ directory:
           ./http333d 5488 ./hw3_htmldir
        
    Next, launch your browser. You should be able to connect to the following URL, replacing the port number with whichever one you picked, and replacing attu4.cs.washington.edu with the hostname of the machine on which you launched your server:
           http://attu4.cs.washington.edu:5488/static/bikeapalooza_2011/index.html
        
    Click around and make sure the gallery works as expected. If not, figure out why not and fix it!
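
For reference, the "/static" handling from step 4 and the response formatting from step 2 might be sketched as below. Both function names are hypothetical; the real code lives in HttpServer.cc and HttpResponse.h. The sketch assumes the prefix is "/static/" with a trailing slash, assumes the filename is everything after that prefix up to any "?", and adds a ".." check as a precaution that the assignment text does not mention.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: if `uri` starts with "/static/", store the rest
// (minus any "?query" suffix) in *relpath and return true.
bool StaticPathFromUri(const std::string &uri, std::string *relpath) {
  const std::string kPrefix = "/static/";
  if (uri.compare(0, kPrefix.size(), kPrefix) != 0) return false;
  std::string rest = uri.substr(kPrefix.size());
  size_t q = rest.find('?');
  if (q != std::string::npos) rest.erase(q);
  // Assumption: refuse paths that try to climb out of the served directory.
  if (rest.empty() || rest.find("..") != std::string::npos) return false;
  *relpath = rest;
  return true;
}

// Hypothetical helper: format a status line, two headers, a blank line,
// and the body into the text of an HTTP response.
std::string BuildResponseString(int code, const std::string &reason,
                                const std::string &content_type,
                                const std::string &body) {
  std::string out = "HTTP/1.1 " + std::to_string(code) + " " + reason + "\r\n";
  out += "Content-type: " + content_type + "\r\n";
  out += "Content-length: " + std::to_string(body.size()) + "\r\n";
  out += "\r\n";  // blank line separates headers from the body
  out += body;
  return out;
}
```

With the launch command above, a request for /static/bikeapalooza_2011/index.html would map to bikeapalooza_2011/index.html, which the server would then look up under ./hw3_htmldir.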

Part C -- add support for Twitter searching

Now that you have the basic web server running, it's time to have a little fun and add in support for Twitter searching.

  1. Read through the bottom part of the ProcessRequest routine in HttpServer.cc, and notice how it is testing URLs to see if they start with "/post", and if so, invokes a class called FormReader to handle that request.

  2. Read through FormReader.h and FormReader.cc. Note how it is parsing URLs using your URLParser class, and then invoking either ProcessTwitterSearch() in TwitterSearch.h/.cc or ProcessTwitterCloudQuery() in TwitterCloud.h/.cc, depending on the content of the URL.

  3. Visit the URL "/static/index.html" on your web server and notice the form that it presents. Try typing in a query and submitting it; notice that the server is not yet processing your query. To satisfy your curiosity, read through the file "hw3_files/index.html" to see how the form is actually implemented in HTML.

  4. Read through TwitterSearch.h and TwitterSearch.cc and complete the implementation, making use of your TwitterSearch class from HW2. Try using our solution binary http333d to see how we implemented support for Twitter searching, and see if you can mimic it. Have fun with this; try creative ways of formatting your search results.

  5. Read through TwitterCloud.h and TwitterCloud.cc and complete the implementation, lifting code from your TwitterShell implementation from HW2. Try using our solution binary http333d to see how we implemented support for Twitter word clouds, and see if you can mimic it.
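
When a browser submits a form, the field values typically arrive as URL-encoded name=value pairs after a "?" in the URL. The sketch below shows one way such a URL could be split and decoded; it is not the URLParser class that FormReader uses, and the field names in the usage note are made up for illustration (check hw3_files/index.html for the real ones).

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

// Decode '+' (space) and "%XX" escapes in a form-encoded component.
std::string UrlDecode(const std::string &in) {
  std::string out;
  for (size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '+') {
      out += ' ';
    } else if (in[i] == '%' && i + 2 < in.size() &&
               std::isxdigit(static_cast<unsigned char>(in[i + 1])) &&
               std::isxdigit(static_cast<unsigned char>(in[i + 2]))) {
      out += static_cast<char>(
          std::strtol(in.substr(i + 1, 2).c_str(), nullptr, 16));
      i += 2;  // skip the two hex digits just consumed
    } else {
      out += in[i];
    }
  }
  return out;
}

// Split the part of `uri` after '?' into decoded name -> value pairs.
std::map<std::string, std::string> ParseQueryArgs(const std::string &uri) {
  std::map<std::string, std::string> args;
  size_t q = uri.find('?');
  if (q == std::string::npos) return args;
  size_t pos = q + 1;
  while (pos < uri.size()) {
    size_t amp = uri.find('&', pos);
    if (amp == std::string::npos) amp = uri.size();
    std::string pair = uri.substr(pos, amp - pos);
    pos = amp + 1;
    if (pair.empty()) continue;
    size_t eq = pair.find('=');
    std::string key = UrlDecode(pair.substr(0, eq));
    std::string val =
        (eq == std::string::npos) ? "" : UrlDecode(pair.substr(eq + 1));
    args[key] = val;
  }
  return args;
}
```

For example, under these assumptions ParseQueryArgs("/post?formname=search&query=hello+world%21") yields formname mapped to "search" and query mapped to "hello world!". Remember to escape any user-supplied text before embedding it in the HTML you send back.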

Part D -- extra credit

Implement some other interesting feature in your web server. Some examples could be:

  • (very hard) the ability to search sites other than Twitter; for example, learn the Facebook API, implement some support for it, and allow people to search Facebook through your server.

  • (medium) find some code that implements the Eliza chatbot, and integrate support for it into your server.

  • (medium) add code to the server that maintains statistics about which URLs have been requested, which client IP addresses have interacted with the server, or other interesting diagnostics. Implement a web page that displays all of these diagnostics.

  • (hard) integrate your image histogram code from HW1 into the server: allow a user to upload a picture, then use your image histogram code to generate a histogram of the picture. Display the histogram in a result page.
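
If you try the statistics idea, keep in mind that every worker thread will update the counters, so they need a lock. A minimal sketch, assuming a single shared object that the request handler calls into (the class name UrlStats and its methods are hypothetical, not part of the provided code):

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical illustration: a thread-safe count of requested URLs.
class UrlStats {
 public:
  // Called once per request, possibly from many threads at once.
  void Record(const std::string &uri) {
    std::lock_guard<std::mutex> lock(mu_);
    ++counts_[uri];
  }

  // Render the counts as a small HTML page.
  std::string RenderHtml() const {
    std::lock_guard<std::mutex> lock(mu_);
    std::string html = "<html><body><ul>\n";
    for (const auto &entry : counts_) {
      html += "<li>" + EscapeHtml(entry.first) + ": " +
              std::to_string(entry.second) + "</li>\n";
    }
    html += "</ul></body></html>\n";
    return html;
  }

 private:
  // Requested URLs are client-controlled, so escape them before display.
  static std::string EscapeHtml(const std::string &s) {
    std::string out;
    for (char c : s) {
      switch (c) {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;"; break;
        case '>': out += "&gt;"; break;
        case '"': out += "&quot;"; break;
        default: out += c;
      }
    }
    return out;
  }

  mutable std::mutex mu_;
  std::map<std::string, int> counts_;
};
```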

What to turn in

When you're ready to turn in your assignment, do the following:

  1. In the hw3 directory, run "make clean" to clean out any object files and emacs detritus; what should be left are your source files.

  2. Create a TURNIN.TXT file in hw3 that contains your name, student number, and UW email address.

  3. cd up a directory so that hw3 is a subdirectory of your working directory, and run the following command to create your submission tarball, replacing "UWEMAIL" with your uw.edu email account name.
    tar -cvzf hw3_submission_UWEMAIL.tar.gz hw3
    For example, since my uw.edu email account is "gribble", I would run the command:
    tar -cvzf hw3_submission_gribble.tar.gz hw3

  4. Use the course dropbox (there is a link on the course homepage) to submit that tarball.

  5. Fill out the following survey to give us feedback on the assignment:
    https://catalyst.uw.edu/webq/survey/gribble/168394

Grading

We will be basing your grade on several elements:

  • The degree to which your code passes the unit tests. If your code fails a test, we won't attempt to understand why: we plan to simply record the number of points that the test drivers print out.

  • We have some additional unit tests that test a few additional cases that aren't in the supplied test driver. We'll be checking to see if your code passes these as well.

  • Whether you were able to successfully get the Web server working, and whether you were able to get the Twitter search features working.

  • The quality of your code. We'll be judging this on several qualitative aspects, including whether you've sufficiently factored your code and whether there is any redundancy in your code that could be eliminated.

  • The readability of your code. For this assignment, we don't have formal coding style guidelines that you must follow; instead, attempt to mimic the style of code that we've provided you. Aspects you should mimic are conventions you see for capitalization and naming of variables, functions, and arguments, the use of comments to document aspects of the code, and how code is indented.

