Out: Monday, August 12, 2019
Due: Wednesday, August 21, 2019 by 11:59 pm
You will build on top of HW3 to implement a multithreaded Web server front-end to your query processor. In Part A, you will read through some of our code to learn about the infrastructure we have built for you. In Part B, you will complete some of our classes and routines to finish the implementation of a simple Web server. In Part C, you will fix some security problems in our Web server.
As before, please read through this entire document before beginning the assignment, and please start early!
In HW4, as with HWs 2 and 3, you don't need to worry about propagating errors back to callers in all situations. You will use Verify333() to spot some kinds of errors and cause your program to crash out. However, your web server must cope with anything a client does; only internal issues (such as running out of memory) should cause your web server to crash out.
To help you schedule your time, here's a suggested order for the parts of this assignment. We're not going to enforce a schedule; it's up to you to manage your time.
Makefile distributed with the project. In particular, there are reasonable ways to do the necessary string handling without using the Boost Regex library.
Our web server is a fairly straightforward multithreaded application. Every time a client connects to the server, the server dispatches a thread to handle all interactions with that client. Threads do not interact with each other at all, which greatly simplifies the design of the server.
The figure to the right shows the high-level architecture of the server. There is a main class called "HttpServer" that uses a "ServerSocket" class to create a listening socket, and then sits in a loop waiting to accept new connections from clients. For each new connection that the HttpServer receives, it dispatches a thread from a ThreadPool class to handle the connection. The dispatched thread springs to life in a function called "HttpServer_ThrFn" within the HttpServer.cc file.
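To make the dispatch pattern concrete, here is a minimal C++11 sketch of a thread pool: an accept loop would call Dispatch() once per accepted connection, and an idle worker thread picks the task up and runs it. The class name SketchThreadPool and its methods are hypothetical; the ThreadPool class we provide has its own interface, so read our code for the real one rather than copying this.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool illustrating "accept a connection, then
// hand it to a worker". Not the ThreadPool API from the starter code.
class SketchThreadPool {
 public:
  explicit SketchThreadPool(int num_threads) {
    for (int i = 0; i < num_threads; i++) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Lets workers finish any queued tasks, then joins them.
  ~SketchThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      shutting_down_ = true;
    }
    cv_.notify_all();
    for (std::thread& t : workers_) t.join();
  }

  // Queue a task, e.g. "serve the client connected on this socket".
  void Dispatch(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void WorkerLoop() {
    while (true) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return shutting_down_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // shutting down and nothing left to do
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock so workers proceed in parallel
    }
  }

  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> tasks_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool shutting_down_ = false;
};
```

Running each task outside the lock is the important design choice here: a worker that held the mutex while serving a slow client would stall every other worker.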
To read a request, the GetNextRequest method invokes WrappedRead() some number of times until it spots the end of the request. To parse a request, it then invokes the ParseRequest method (also within HttpConnection). At this point, HttpServer_ThrFn has a fully parsed HttpRequest object (defined in HttpRequest.h).
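The read loop described above can be sketched as follows. HTTP ends a request's header block with a blank line (the four bytes \r\n\r\n), so the loop keeps appending chunks to a buffer until that sequence appears. The function name and the read_chunk callback, which stands in for WrappedRead() on the client socket, are hypothetical. Any bytes after the terminator stay in the buffer in case the client sends a second request on the same connection.

```cpp
#include <functional>
#include <string>

// Hypothetical sketch. read_chunk(buf, len) behaves like a WrappedRead-style
// call: it returns the number of bytes read, 0 on EOF, or -1 on error.
// On success, *request holds one request (headers plus "\r\n\r\n") and
// *buffer keeps any leftover bytes for the next call.
bool ReadUntilEndOfHeaders(const std::function<int(char*, int)>& read_chunk,
                           std::string* buffer, std::string* request) {
  const std::string kEnd = "\r\n\r\n";
  while (true) {
    size_t pos = buffer->find(kEnd);
    if (pos != std::string::npos) {
      *request = buffer->substr(0, pos + kEnd.size());
      buffer->erase(0, pos + kEnd.size());  // keep pipelined bytes
      return true;
    }
    char chunk[1024];
    int n = read_chunk(chunk, static_cast<int>(sizeof(chunk)));
    if (n <= 0) return false;  // EOF or error before a complete request
    buffer->append(chunk, n);
  }
}
```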
Once those functions return an HttpResponse, the HttpServer_ThrFn invokes the WriteResponse method on the HttpConnection object to write the response back to the client.
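On the wire, an HTTP response is a status line, a set of headers, a blank line, and then the body. The hypothetical helper below shows that layout. The real WriteResponse works from an HttpResponse object, so check the starter code for how its fields are actually turned into bytes.

```cpp
#include <string>

// Hypothetical sketch: serialize a response with a status line,
// two common headers, the blank separator line, and the body.
std::string FormatHttpResponse(int status_code, const std::string& reason,
                               const std::string& content_type,
                               const std::string& body) {
  std::string out = "HTTP/1.1 " + std::to_string(status_code) + " " +
                    reason + "\r\n";
  out += "Content-Type: " + content_type + "\r\n";
  out += "Content-Length: " + std::to_string(body.size()) + "\r\n";
  out += "\r\n";  // blank line separates headers from body
  out += body;
  return out;
}
```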
Our web server isn't too complicated, but there is a fair amount of plumbing to get set up. In this part of the assignment, we want you to read through a bunch of lower-level code that we've provided for you. You need to understand how this code works to finish our web server implementation, but we won't have you modify this plumbing.
Run git pull to retrieve the new hw4 folder with the starter code for this assignment. You will need the hw1, hw2, and hw3 directories in the same folder as your new hw4 folder, since hw4 links to files in those previous directories. Also, as with previous parts of the project, you can use the solution binary versions of the previous parts of the project if you wish.
./http333d 5555 ../projdocs ../hw3/unit_test_indices/*
(You might need to pick a different port than 5555 if someone else is using that port on the same machine as you.)
Try using our solution_binaries server, and running it using a similar command line:
./solution_binaries/http333d 5555 ../projdocs ../hw3/unit_test_indices/*
Next, use a web browser to explore what the server should look like when finished:
If you are running your code on attu over an SSH connection, follow the same steps as above, but navigate to the address of the attu instance your code is running on. For example, if you are running your code on attu4, you would visit the following addresses:
http://attu4.cs.washington.edu:5555/
http://attu4.cs.washington.edu:5555/static/bikeapalooza_2011/Bikeapalooza.html
When you are done with the http333d server, you may find that Control + C doesn't work to shut it down. In that case, open another terminal window on the same machine and run the command
kill pid
where pid is the server's process ID. Use the ps -u command to find that number.
You are now going to finish a basic implementation of the http333d web server. We'll have you implement some of the event handling routines at different layers of abstraction in the web server, culminating with generating HTTP and HTML to send to the client.
At this point, your web server should run correctly, and everything should compile with no warnings. Try running your web server and connecting to it from a browser as described above. Also try running the test_suite under valgrind to make sure there are no memory issues. Finally, launch the web server under valgrind to make sure there are no issues or leaks; after the web server has launched, exercise it by issuing a few queries, then kill the web server. (The supplied code does have some leaks, but your code should not make things significantly worse.)
Now that the basic web server works, you will discover that your web server (probably) has two security vulnerabilities. We are going to point these out to you, and you will repair them.
(Note, however, that depending on how you implemented things above, you may have already dealt with these flaws.)
hello <script>alert("Boo!");</script>
To fix this flaw, you need to "escape" untrusted input from the client before you relay it to output. We've provided you with an escape function in HttpUtils.
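To see what escaping does, here is a minimal sketch of an HTML escaper. It is not the HttpUtils function, and its name is hypothetical; it simply replaces the five characters that are significant in HTML with entity references, so the injected query above is displayed as text instead of being run as a script.

```cpp
#include <string>

// Hypothetical stand-in for the HttpUtils escape routine: replaces the
// characters that are significant in HTML with entity references.
std::string EscapeHtmlSketch(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (char c : in) {
    switch (c) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&#39;";  break;
      default:   out += c;        break;
    }
  }
  return out;
}
```

Apply escaping to every piece of client-supplied text (such as a query echoed back on the results page) right before it is written into the HTML.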
/static/../hw4/http333d.cc
This is called a directory traversal attack. Instead of trusting the file pathname provided by a client, you need to normalize the path and verify that it names a file within your document subdirectory tree (which would be projdocs/ if the example command shown in Part A was used to start the server). If the path names something outside of that subdirectory, you should return an error message instead of the file contents. We've provided you with a function in HttpUtils to help you test whether a path is safe.
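One way to implement such a check is sketched below, assuming a compiler with C++17 std::filesystem support. It makes the document root absolute, normalizes the joined path lexically (collapsing . and .. components), and then compares the two component by component, so a sibling directory such as projdocs2/ does not slip past a naive string-prefix test. The function name is hypothetical, and a purely lexical check does not resolve symbolic links; a helper built on POSIX realpath() would also catch those.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not the HttpUtils function): returns true if
// root_dir/requested, after lexical normalization, still lies inside root_dir.
bool IsPathInsideRoot(const std::string& root_dir,
                      const std::string& requested) {
  // Make the root absolute (nothing needs to exist) and collapse "." / "..".
  fs::path root = fs::absolute(root_dir).lexically_normal();
  // A trailing separator leaves an empty final element ("projdocs/");
  // drop it so the component-by-component comparison lines up.
  if (!root.has_filename()) root = root.parent_path();

  // If 'requested' is itself absolute, operator/ discards root entirely,
  // and the comparison below then rejects it.
  fs::path full = (root / requested).lexically_normal();

  // Every component of root must match the corresponding one in full.
  auto f = full.begin();
  for (auto r = root.begin(); r != root.end(); ++r, ++f) {
    if (f == full.end() || *r != *f) return false;
  }
  return true;
}
```

For example, with the server started on ../projdocs, the traversal request above maps to ../hw4/http333d.cc relative to the root, which normalizes to a path outside projdocs, so the check fails and the server should send an error page.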
Fix these two security flaws, assuming they do in fact exist in your server. As a point of reference, in solution_binaries/, we've provided a version of our web server that has both of these flaws in place (http333d_withflaws). Feel free to try it out, but DO NOT leave this server running, as it will potentially expose all of your files to anybody that connects to it.
Congrats, you're done with the HW4 project sequence!!
There are two bonus tasks for this assignment. As before, you can do them, or not; if you don't, there will be no negative impact on your grade. You should not attempt either bonus task unless and until the basic assignment is working properly. We will not award any bonus credit if the basic assignment is not substantially correct.
If you want to do any of the bonus parts, first create a hw4-final tag in your repository to mark the version of the assignment with the required parts of the project. That will allow us to more easily evaluate how well you did on the basic requirements of the assignment.
Then, when you are done adding bonus parts, create a new tag hw4-bonus after committing and pushing your additions, and push the new tag to your GitLab repository. If we find a hw4-bonus tag in your repository, we'll grade the extra credit parts; otherwise we'll assume that you just did the required parts.
If you do any of the bonus parts, you should add a file named readme_bonus.txt in your top-level hw4 directory giving a brief summary of the additions in your project.
You should conduct this performance analysis for a few different usage scenarios; e.g., you could vary the size of the web page you request, and see its impact on the number of pages per second your server can deliver. If you choose to do the bonus, please include a PDF file in your submission containing relevant performance graphs and analysis.
x words + <bold>hit word</bold> + y words
for one or more of the query words that hit. If you do this part of the assignment, please add a readme_bonus.txt file to your hw4 directory describing what you've added and how to use it.
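A snippet of that form could be built from a document's words roughly as follows, using the HTML <b> element for bold. Everything here is hypothetical (the function name, and the assumption that the document's words are available in a vector); in a real server, each word should also be HTML-escaped before being wrapped in tags, for the reasons discussed in Part C.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical sketch: returns up to 'before' words, the first occurrence
// of 'hit' wrapped in <b>...</b>, and up to 'after' words, separated by
// spaces. Returns "" if 'hit' does not occur in 'words'.
std::string MakeSnippet(const std::vector<std::string>& words,
                        const std::string& hit, size_t before, size_t after) {
  for (size_t i = 0; i < words.size(); i++) {
    if (words[i] != hit) continue;
    size_t start = i >= before ? i - before : 0;
    size_t end = std::min(words.size(), i + after + 1);
    std::string out;
    for (size_t j = start; j < end; j++) {
      if (j > start) out += ' ';
      out += (j == i) ? "<b>" + words[j] + "</b>" : words[j];
    }
    return out;
  }
  return "";
}
```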
This part of the assignment is deliberately open-ended, with much less structure than earlier parts. The (small) amount of extra credit granted will depend on how interesting your extension is and how well it is implemented.
When you are ready to turn in your assignment, you should follow the same procedures you used for previous assignments, except this time tag the repository with hw4-final. Remember to clean up, commit, and push all necessary files to your repository before you add the tag and push it. After you have created and pushed the tag, remember to test everything by creating a new clone of the repository in a separate, empty directory, checking out the hw4-final tag, and verifying that everything works as expected. Refer to the hw0 turnin instructions for details, and follow those steps carefully.
If you do any of the bonus parts, create an additional hw4-bonus tag and push it after adding and pushing the bonus code to the repository. Be sure to clone the repo, check out that tag, and verify that everything works as expected. Also verify that the hw4-final tag is still present and that it includes (only) the required parts of the project.
As with previous parts of the project, when you clone your repository to check it, it will normally not include previous solution files like hw1/libhw1.a. You should either run make in the hw1 through hw3 directories to recreate those archives, or else copy the versions from the solution_binaries folders into the right places. These are needed in order to build hw4 and test it.
We will be basing your grade on several elements: