Summary
For homework #4, you will build on your homework #3 implementation to implement a multithreaded Web server front-end to your query processor. In Part A, you will read through some of our code to learn about the infrastructure we have built for you. In Part B, you will complete some of our classes and routines to finish the implementation of a simple Web server. In Part C, you will add several components to our Web server.
As before, please read through this entire document before beginning the assignment, and please start early!
In HW4, as with HWs 2 and 3, you don't need to worry about propagating errors back to callers in all situations. You will use Assert333() to spot some kinds of errors and cause your program to crash out. However, your web server must gracefully handle anything a client does; only internal issues (such as running out of memory) should cause your web server to crash out.
To help you schedule your time, here's a suggested schedule for this assignment. We're not going to enforce the schedule; it's up to you to manage your time.
- Read over the project specifications and understand which code is responsible for what.
- Finish ServerSocket.cc. Make sure to cover all functionality, not just what is in the unit tests.
- Implement FileReader.cc, which should be very easy, and GetNextRequest in HttpConnection.cc.
- Complete ParseRequest in HttpConnection.cc. This can be tricky as it involves both Boost and regular expressions.
- Finish the code for http333d.cc. Implement HttpServer_ThrFn in HttpServer.cc.
- Complete ProcessFileRequest and ProcessQueryRequest in HttpServer.cc. At this point, you should be able to search the "333gle" site and view the webpages available under /static/, e.g. http://localhost:5555/static/bikeapalooza_2011/index.html.
- Fix the security issues in your web server, if it has any. Implement your interesting feature.
- Make sure everything works as it is supposed to.
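The schedule item for ParseRequest mentions Boost and regular expressions. As a rough illustration of the parsing logic only, here is a hypothetical sketch that uses the standard C++17 <regex> library instead of Boost: it matches the request line ("METHOD URI HTTP/x.y"), then matches each "Name: value" header line and lowercases the header name. SketchRequest and ParseRequestSketch are illustrative names; the real HttpRequest.h type and the expected ParseRequest behavior may differ.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical, simplified request type (not the course's HttpRequest).
struct SketchRequest {
  std::string method;
  std::string uri;
  std::map<std::string, std::string> headers;  // header names are lowercased
};

// Parses the header block of an HTTP request; returns nullopt if the
// request line is malformed.
std::optional<SketchRequest> ParseRequestSketch(const std::string& raw) {
  std::istringstream in(raw);
  std::string line;
  if (!std::getline(in, line)) return std::nullopt;
  if (!line.empty() && line.back() == '\r') line.pop_back();  // strip CR of CRLF

  // Request line, e.g. "GET /query?terms=hello HTTP/1.1".
  static const std::regex kRequestLine(R"(^(\S+)\s+(\S+)\s+HTTP/\d\.\d$)");
  std::smatch m;
  if (!std::regex_match(line, m, kRequestLine)) return std::nullopt;
  SketchRequest req;
  req.method = m[1].str();
  req.uri = m[2].str();

  // Header lines, e.g. "Host: localhost:5555", until the blank line.
  static const std::regex kHeader(R"(^([^:\s]+):\s*(.*?)\s*$)");
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();
    if (line.empty()) break;                                // end of headers
    if (!std::regex_match(line, m, kHeader)) continue;      // skip malformed lines
    std::string name = m[1].str();
    std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
      return static_cast<char>(std::tolower(c));
    });
    req.headers[name] = m[2].str();
  }
  return req;
}
```

Lowercasing header names is one reasonable choice because HTTP header names are case-insensitive; it lets later code look up "host" without worrying about how the client capitalized it.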
Part A -- read through our code.
Context.
Our web server is a fairly straightforward multithreaded application. Every time a client connects to the server, the server dispatches a thread to handle all interactions with that client. Threads do not interact with each other at all, which greatly simplifies the design of the server.
The figure to the right shows the high-level architecture of the server. There is a main class called "HttpServer" that uses a "ServerSocket" class to create a listening socket, and then sits in a loop waiting to accept new connections from clients. For each new connection that the HttpServer receives, it dispatches a thread from a ThreadPool class to handle the connection. The dispatched thread springs to life in a function called "HttpServer_ThrFn" within the HttpServer.cc file.
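To make the dispatch step concrete, here is a hypothetical, simplified stand-in for a thread pool, written with C++11 threads; it is not the course's ThreadPool class, and SketchThreadPool and Dispatch are illustrative names. A fixed set of worker threads waits on a condition variable and pulls tasks from a shared queue, so the accept loop only has to enqueue "handle this connection" and go back to waiting.

```cpp
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the course's ThreadPool class.
class SketchThreadPool {
 public:
  explicit SketchThreadPool(std::size_t num_threads) {
    for (std::size_t i = 0; i < num_threads; i++) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Signals shutdown, lets workers finish every queued task, then joins them.
  ~SketchThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_all();
    for (std::thread& t : workers_) t.join();
  }

  // Queues a task (e.g., "handle this client connection") for an idle worker.
  void Dispatch(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (done_ && tasks_.empty()) return;  // shutting down, nothing left
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock so workers proceed in parallel
    }
  }

  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
};
```

In a server loop, each accepted client file descriptor would be captured by value in the lambda passed to Dispatch, and the lambda body would play the role of HttpServer_ThrFn.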
To read a request, HttpServer_ThrFn calls the GetNextRequest method on an HttpConnection object; GetNextRequest invokes WrappedRead() some number of times until it spots the end of the request. To parse the request, GetNextRequest invokes the ParseRequest method (also within HttpConnection). At this point, HttpServer_ThrFn has a fully parsed HttpRequest object (defined in HttpRequest.h).
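The core of the read loop is a per-connection buffer: append whatever each read returns, and stop once the buffer contains the "\r\n\r\n" that ends an HTTP request header. Bytes after that marker belong to the next request on the same connection, so they stay in the buffer. In this hypothetical sketch the read is abstracted as a callable; ReadFn, SketchConnection, and GetNextRequestSketch are illustrative names, and the real WrappedRead() signature may differ.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for WrappedRead(): fills buf with up to len bytes and
// returns the number read, 0 on EOF, or -1 on error.
using ReadFn = std::function<int(char* buf, int len)>;

class SketchConnection {
 public:
  explicit SketchConnection(ReadFn read) : read_(std::move(read)) {}

  // Returns the next request's header block (through "\r\n\r\n"), or nullopt
  // if the connection hits EOF or an error before a full request arrives.
  std::optional<std::string> GetNextRequestSketch() {
    static const std::string kEnd = "\r\n\r\n";
    for (;;) {
      std::size_t pos = buffer_.find(kEnd);
      if (pos != std::string::npos) {
        std::string request = buffer_.substr(0, pos + kEnd.size());
        buffer_.erase(0, pos + kEnd.size());  // keep bytes of the next request
        return request;
      }
      char chunk[1024];
      int n = read_(chunk, static_cast<int>(sizeof(chunk)));
      if (n <= 0) return std::nullopt;  // EOF or error
      buffer_.append(chunk, static_cast<std::size_t>(n));
    }
  }

 private:
  ReadFn read_;
  std::string buffer_;  // unconsumed bytes carried across calls
};
```

Keeping the leftover bytes is what lets a client send several requests back to back on one connection without the server losing the start of the second one.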
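HttpServer_ThrFn then hands the request to ProcessFileRequest or ProcessQueryRequest (both in HttpServer.cc), depending on whether the client asked for a file or a query. Once one of those functions returns an HttpResponse, HttpServer_ThrFn invokes the WriteResponse method on the HttpConnection object to write the response back to the client.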
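On the wire, a response is a status line, header lines, a blank line, and then the body, with each line terminated by "\r\n". The hypothetical sketch below assumes a simplified response type (SketchResponse and SerializeResponseSketch are illustrative; the real HttpResponse class may store its fields differently) and computes Content-length from the body so the two can never disagree.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified response type (not the course's HttpResponse).
struct SketchResponse {
  std::string protocol = "HTTP/1.1";
  int response_code = 200;
  std::string message = "OK";
  std::map<std::string, std::string> headers;  // should not include Content-length
  std::string body;
};

// Serializes the response into the bytes a WriteResponse-style method would send.
std::string SerializeResponseSketch(const SketchResponse& r) {
  std::string out = r.protocol + " " + std::to_string(r.response_code) + " " +
                    r.message + "\r\n";
  for (const auto& [name, value] : r.headers) {
    out += name + ": " + value + "\r\n";
  }
  out += "Content-length: " + std::to_string(r.body.size()) + "\r\n";
  out += "\r\n";  // blank line separates headers from body
  out += r.body;
  return out;
}
```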
What to do.
./http333d 5555 ../projdocs ../hw3/unit_test_indices/*
(You might need to pick a different port than 5555 if another student is using that port on the same machine as you.)
Try running our solution_binaries server using a similar command line:
./solution_binaries/http333d 5555 ../projdocs ../hw3/unit_test_indices/*
Next, launch Firefox or Chrome on that machine, visit http://localhost:5555/, and try issuing some searches. Also visit http://localhost:5555/static/bikeapalooza_2011/Bikeapalooza.html and click around. This is what your finished web server will be capable of.
Part B -- get the basic web server working.
Context.
You are now going to finish a basic implementation of the http333d web server. We'll have you implement some of the event-handling routines at different layers of abstraction in the web server, culminating with generating HTTP and HTML to send to the client.
What to do.
In ServerSocket.cc, you'll need to make the code handle both IPv4 and IPv6 addresses. Run the test_suite to see whether you make it past the ServerSocket unit tests.
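Handling both address families mostly comes down to never assuming that a struct sockaddr is really a sockaddr_in. For example, when accept() fills in a struct sockaddr_storage, code that wants the client's printable address has to branch on sa_family. The helper below is hypothetical (AddressToStringSketch is not part of the course code); it relies only on the standard POSIX inet_ntop().

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <string>

// Returns the printable IP address stored in *addr for either IPv4 or IPv6,
// or "" if the family is unsupported or conversion fails.
std::string AddressToStringSketch(const struct sockaddr* addr) {
  char buf[INET6_ADDRSTRLEN];  // large enough for either family
  if (addr->sa_family == AF_INET) {
    const auto* v4 = reinterpret_cast<const struct sockaddr_in*>(addr);
    if (inet_ntop(AF_INET, &v4->sin_addr, buf, sizeof(buf)) == nullptr) return "";
    return buf;
  }
  if (addr->sa_family == AF_INET6) {
    const auto* v6 = reinterpret_cast<const struct sockaddr_in6*>(addr);
    if (inet_ntop(AF_INET6, &v6->sin6_addr, buf, sizeof(buf)) == nullptr) return "";
    return buf;
  }
  return "";
}
```

The same branch-on-family pattern applies anywhere ServerSocket touches a raw address, such as reading the port or passing the structure's length to other socket calls.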
At this point, your web server should run correctly, and everything should compile with no warnings. Try running your web server and connecting to it from a browser. Also try running the test_suite under valgrind to make sure there are no memory issues. Finally, launch the web server under valgrind to make sure there are no issues or leaks; after the web server has launched, exercise it by issuing a few queries, then kill the web server. (You'll leak a few bytes by not shutting down the server cleanly, of course.)
Part C -- add to the Web server.
Context.
Now that the basic web server works, you are going to add some new functionality to it. There are two things you will do:
This part of the assignment is deliberately open-ended, with much less structure than earlier assignments. This is the culmination of the course; you're ready to handle this on your own!
What to do.
Cross-site scripting (http://en.wikipedia.org/wiki/Cross-site_scripting). Try typing the following query into our example web server and into your web server, and compare the two. (Note: do this with Firefox or Safari; Chrome attempts to help out web servers by blocking this attack on the client side!)
hello <script>alert("Boo!");</script>
To fix this flaw, you need to "escape" untrusted input from the client before you relay it to output. We've provided you with an escape function in HttpUtils.
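Escaping replaces the characters that HTML treats as markup with character entities, so the browser displays the text instead of interpreting it. You should call the escape function provided in HttpUtils; the following is only a hypothetical illustration of the idea (EscapeHtmlSketch is an illustrative name), covering the five characters most commonly escaped.

```cpp
#include <string>

// Returns a copy of in with HTML-significant characters replaced by entities.
std::string EscapeHtmlSketch(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (char c : in) {
    switch (c) {
      case '&':  out += "&amp;";  break;  // must be escaped so entities stay literal
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;  // matters inside attribute values
      case '\'': out += "&#39;";  break;
      default:   out += c;
    }
  }
  return out;
}
```

Applied to the query above, the script tags come out as literal text such as "&lt;script&gt;", which the browser renders rather than executes.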
/static/../hw4/http333d.cc
This is called a directory traversal attack. Instead of trusting the file pathname provided by a client, you need to normalize the path and verify that it names a file within your test_tree/ subdirectory; if the path names something outside of that subdirectory, you should return an error message instead of the file contents. We've provided you with a function in HttpUtils to help you test whether a path is safe.
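One way to reason about the traversal check is purely lexical: split the client's path on '/', drop empty and "." components, and let each ".." pop the previous component; if a ".." ever has nothing left to pop, the path escapes the root. This hypothetical sketch (NormalizeUnderRootSketch is not the HttpUtils function) assumes the "/static/" prefix has already been stripped and the path has already been URI-decoded, so that an encoded "%2e%2e" has become "..". It does not resolve symbolic links, so the provided HttpUtils helper (or realpath(3)) remains the safer choice.

```cpp
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Lexically normalizes a client-supplied path interpreted relative to the
// document root. Returns nullopt if any ".." would climb above the root.
std::optional<std::string> NormalizeUnderRootSketch(const std::string& path) {
  std::vector<std::string> parts;
  std::istringstream in(path);
  std::string part;
  while (std::getline(in, part, '/')) {
    if (part.empty() || part == ".") continue;  // "//" and "./" add nothing
    if (part == "..") {
      if (parts.empty()) return std::nullopt;   // would escape the root
      parts.pop_back();
      continue;
    }
    parts.push_back(part);
  }
  std::string out;
  for (std::size_t i = 0; i < parts.size(); i++) {
    if (i > 0) out += '/';
    out += parts[i];
  }
  return out;
}
```

With this approach the request above, reduced to "../hw4/http333d.cc", is rejected, while "bikeapalooza_2011/./index.html" normalizes to "bikeapalooza_2011/index.html".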
Fix these two security flaws, assuming they do in fact exist in your server. As a point of reference, in solution_binaries/, we've provided a version of our web server that has both of these flaws in place (http333d_withflaws). Feel free to try it out, but DO NOT leave this server running, as it will potentially expose all of your files to anybody that connects to it.
x words + <bold>hit word</bold> + y words
for one or more of the query words that hit.
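The line above appears to describe a result snippet: a few words of context on either side of a query hit, with the hit in bold. If that is the feature you choose, a helper might look like this hypothetical sketch. SnippetSketch and its parameters are illustrative; it uses the real HTML <b> tag, assumes the query words are already lowercase, and does not HTML-escape the document words, which a real server must do to avoid reintroducing the cross-site scripting flaw.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Builds "x words + <b>hit</b> + y words" around the first word of text that
// matches a query word (case-insensitively). Returns "" if nothing matches.
// NOTE: a real server should HTML-escape each word before emitting it.
std::string SnippetSketch(const std::string& text,
                          const std::set<std::string>& query_words,
                          std::size_t x, std::size_t y) {
  std::vector<std::string> words;
  std::istringstream in(text);
  for (std::string w; in >> w;) words.push_back(w);

  for (std::size_t i = 0; i < words.size(); i++) {
    std::string lower;
    for (unsigned char c : words[i]) lower += static_cast<char>(std::tolower(c));
    if (query_words.count(lower) == 0) continue;  // not a hit

    std::size_t start = (i >= x) ? i - x : 0;
    std::size_t end = std::min(words.size(), i + y + 1);
    std::string out;
    for (std::size_t j = start; j < end; j++) {
      if (j > start) out += ' ';
      out += (j == i) ? "<b>" + words[j] + "</b>" : words[j];
    }
    return out;
  }
  return "";
}
```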
Congrats, you're done with the HW4 project sequence!!
Bonus
There is one bonus task for this assignment. As before, you can do it, or not; if you don't, there will be no negative impact on your grade. Your task is to perform a performance analysis of your web server implementation, determining what throughput your server can handle (measured both in requests per second and bytes per second), what latency clients experience (measured in seconds per request), and what the performance bottleneck is. You might want to look at the "httperf" tool for Linux to generate synthetic load.
You should conduct this performance analysis for a few different usage scenarios; e.g., you could vary the size of the web page you request, and see its impact on the number of pages per second your server can deliver. If you choose to do the bonus, please commit / push a PDF file containing relevant performance graphs and analysis.
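Once a load test has run, the headline numbers are simple ratios: requests per second is requests completed divided by the run's wall-clock duration, bytes per second is response bytes divided by that same duration, and mean latency is the average of the per-request latencies. The hypothetical helper below (LoadStatsSketch and SummarizeRunSketch are illustrative names) only computes those three figures from measurements you have already collected; how you collect them, with httperf or otherwise, is up to you.

```cpp
#include <numeric>
#include <vector>

struct LoadStatsSketch {
  double requests_per_sec;
  double bytes_per_sec;
  double mean_latency_sec;
};

// duration_sec: wall-clock length of the run (must be > 0).
// latencies_sec: one entry per completed request.
// total_bytes: total response bytes received during the run.
LoadStatsSketch SummarizeRunSketch(double duration_sec,
                                   const std::vector<double>& latencies_sec,
                                   double total_bytes) {
  const double n = static_cast<double>(latencies_sec.size());
  const double sum =
      std::accumulate(latencies_sec.begin(), latencies_sec.end(), 0.0);
  return {n / duration_sec, total_bytes / duration_sec, n > 0 ? sum / n : 0.0};
}
```

Repeating this for each scenario (for example, each requested page size) gives the data points for the graphs in your PDF.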
What to turn in
When you're ready to turn in your assignment, do the following:
bash$ make clean
bash$ cd ..
bash$ tar czf hw4_<username>.tar.gz hw4
bash$ # make sure the tar file has no compiler output files in it, but
bash$ # does have all your source and other files you intend to submit
bash$ tar tzf hw4_<username>.tar.gz
Grading
We will be basing your grade on several elements: