Out: Monday, August 8, 2022
Due: Wednesday, August 17, 2022 by 11:59 pm
In this assignment, you will build on top of your Homework 3 implementation to complete a multithreaded web server front-end to your query processor.
As before, please read through this entire document before beginning the assignment, and please start early!
Makefile distributed with the project. In particular, there are reasonable ways to do the necessary string handling without using the Boost Regex library.
To help you schedule your time, here's a suggested order for the parts of this assignment. We're not going to enforce a schedule; it's up to you to manage your time.
ServerSocket.cc. Make sure to cover all functionality, not just what is in the unit tests.
FileReader.cc, which should be very easy, and
HttpConnection.cc. This can be tricky, as it involves both Boost and string parsing.
HttpServer.cc. At this point, you should be able to search the "333gle" site and view the webpages available under
Our web server is a fairly straightforward multithreaded application. Every time a client connects to the server, the server dispatches a thread to handle all interactions with that client. Threads do not interact with each other at all, which greatly simplifies the design of the server.
The figure to the right shows the high-level architecture of the
There is a main class called HttpServer that uses a ServerSocket class to create a listening socket, and then sits in a loop waiting to accept new connections from clients. For each new connection, it dispatches a thread from a ThreadPool class to handle the connection.
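To make the dispatch pattern concrete, here is a minimal sketch of a worker pool that runs one task per "connection". It is not the provided ThreadPool class: the names TinyThreadPool, Dispatch, and HandleFakeClients are hypothetical, and each fake client is just a counter increment rather than a real socket.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread pool; NOT the course's ThreadPool class.
class TinyThreadPool {
 public:
  explicit TinyThreadPool(std::size_t num_threads) {
    for (std::size_t i = 0; i < num_threads; ++i) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Finishes every queued task, then joins all workers.
  ~TinyThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      done_ = true;
    }
    cv_.notify_all();
    for (std::thread& t : workers_) t.join();
  }

  // Queue one task; some idle worker will pick it up.
  void Dispatch(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // shutting down and nothing left to do
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock so workers do not block each other
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool done_ = false;
  std::vector<std::thread> workers_;  // declared last: starts after the rest
};

// Simulates an accept loop: one dispatched task per (fake) client connection.
int HandleFakeClients(int num_clients, std::size_t num_threads) {
  std::atomic<int> handled{0};
  {
    TinyThreadPool pool(num_threads);
    for (int i = 0; i < num_clients; ++i) {
      pool.Dispatch([&handled] { ++handled; });
    }
  }  // pool destructor drains the queue and joins the workers
  return handled.load();
}
```

Because each task touches only its own client (here, a shared atomic counter stands in for that work), the tasks never need to coordinate with each other, which is the property that keeps the server design simple.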
The dispatched thread springs to life in a function called HttpServer_ThrFn(). The HttpServer_ThrFn() function handles reading requests from one client. For each request that the client sends, it invokes GetNextRequest() on the HttpConnection object to read in the next request and parse it. To read a request, the GetNextRequest() method invokes WrappedRead() some number of times until it spots the end of the request. To parse a request, the method invokes the ParseRequest() method (also within HttpConnection).
At this point, HttpServer_ThrFn() has a fully parsed HttpRequest object. The next job of HttpServer_ThrFn() is to process the request. To do this, it invokes a function that looks at the request URI to determine if this is a request for a static file, or if it is a request associated with the search functionality, and, depending on what it discovers, invokes the matching handler. Once the handler returns an HttpResponse, HttpServer_ThrFn() invokes the WriteResponse() method on the HttpConnection object to write the response back to the client.
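The read-then-parse step can be sketched as follows. This is only an illustration under stated assumptions, not the provided HttpConnection code: the names ExtractRequest, ParsedRequest, and ParseRequestText are hypothetical, the "end of the request" is taken to be the blank line ("\r\n\r\n") that ends an HTTP header block, and the bytes are assumed to already be in a string buffer rather than arriving through WrappedRead().

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// An HTTP header block ends with an empty line.
const std::string kEndOfHeaders = "\r\n\r\n";

// If `buffer` holds a complete request, move it into `request_text`, leave
// any bytes that follow it in `buffer`, and return true. Otherwise return
// false, meaning the caller should read more bytes and try again.
bool ExtractRequest(std::string* buffer, std::string* request_text) {
  std::size_t end = buffer->find(kEndOfHeaders);
  if (end == std::string::npos) return false;
  std::size_t length = end + kEndOfHeaders.size();
  *request_text = buffer->substr(0, length);
  buffer->erase(0, length);
  return true;
}

// Hypothetical parsed form: the URI plus lowercased header names.
struct ParsedRequest {
  std::string uri;
  std::map<std::string, std::string> headers;
};

ParsedRequest ParseRequestText(const std::string& text) {
  ParsedRequest req;
  std::istringstream lines(text);
  std::string line;
  bool is_first_line = true;
  while (std::getline(lines, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();
    if (line.empty()) break;  // blank line: end of the headers
    if (is_first_line) {
      // Request line: "<method> <uri> <version>"
      std::istringstream parts(line);
      std::string method, version;
      parts >> method >> req.uri >> version;
      is_first_line = false;
      continue;
    }
    std::size_t colon = line.find(':');
    if (colon == std::string::npos) continue;  // skip malformed header lines
    std::string name = line.substr(0, colon);
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) {
                     return static_cast<char>(std::tolower(c));
                   });
    std::size_t value_start = line.find_first_not_of(" \t", colon + 1);
    req.headers[name] =
        (value_start == std::string::npos) ? "" : line.substr(value_start);
  }
  return req;
}
```

Keeping the leftover bytes in the buffer matters because a client may send the start of its next request in the same read that completes the current one.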
Our web server isn't too complicated, but there is a fair amount of plumbing to get set up. In this part of the assignment, we want you to read through a bunch of lower-level code that we've provided for you. You need to understand how this code works to finish our web server implementation, but we won't have you modify this plumbing.
git pull to retrieve the new
hw4/ folder for this assignment. You will need the
projdocs/ directories in the same folder as your new
hw4/ folder since hw4 links to files in those previous directories.
hw4/ to familiarize yourself with the structure. Note that there are
libhw3/ directories that contain symlinks to your
libhw3.a, respectively. You can replace your libraries with ours (from the appropriate
solution_binaries directories) if you prefer.
make to compile the two HW4 binaries, one of which is the usual unit test binary. Run it, and you'll see the unit tests fail and crash out, so you won't yet earn the automated grading points tallied by the test suite.
http333d. Its usage message will reveal its command-line arguments; an example call looks like:
bash$ ./http333d 5555 ../projdocs unit_test_indices/*
In the meantime, start up a working web server using the provided solution binary:
bash$ ./solution_binaries/http333d 5555 ../projdocs unit_test_indices/*
attu, note which specific machine you are running the web server on (e.g.,
attu4) and open http://attu4.cs.washington.edu:5555/ and http://attu4.cs.washington.edu:5555/static/bikeapalooza_2011/Bikeapalooza.html in different tabs, changing the attu number and port number as needed.
Enter a few search queries in the first tab and then click around the Bikeapalooza gallery in the second tab; this is what your finished web server will be capable of!
http333d server, the most graceful way to shut it down is to open another terminal window on the same machine, run the command:
bash$ ps -u
to find the process id (pid) of the web server, and then run:
bash$ kill pid
ThreadPool.cc. You don't need to implement anything in either, but several pieces of the project rely on this code. The header file is well-documented, so it ought to be clear how it's used. There's also a unit test file that you can peek at.
HttpUtils.cc. This class defines a number of utility functions that the rest of HW4 uses. You will have to implement some of these utilities while completing
test_suite. Make sure that you understand what each of the utilities does, and why we may want them.
HttpResponse.h. These files define the
HttpResponse classes, which represent a parsed HTTP request and response, respectively.
You are now going to finish a basic implementation of the
http333d web server.
You will need to implement some of the event handling routines at
different layers of abstraction in the web server, culminating
with generating HTTP and HTML to send to the client.
ServerSocket.h. This file contains a helpful class for creating a server-side listening socket, and accepting a new connection from a client. We've provided you with the class declaration in
ServerSocket.h but no implementation in
ServerSocket.cc; your next job is to build it. You'll need to make the code handle both IPv4 and IPv6 clients. Run the test suite to see if you make it past the
FileReader.cc. Note that the implementation of
FileReader.cc is missing; go ahead and implement it. See if you make it past the
HttpConnection.cc. The two major functions in
HttpConnection.cc have their implementations missing, but have generous comments for you to follow. Implement the missing functions, and see if you make it past the
HttpUtils.cc. There are two functions in
HttpUtils.cc that have their implementations missing, but have generous comments to help you figure out their implementation. Implement the missing functions, and see if you make it past the
http333d.cc. Note that some parts of
http333d.cc are missing. Go ahead and implement those missing functions. The only requirement here is that your web server mimics the behavior of the solution binaries (i.e., has a search bar, processes files and queries correctly, and shows their results similarly); although entirely optional, you are free to modify the look of your 333gle site.
Once you have the functions implemented, test your
http333d binary to see if it works by running the
web server and connecting to it from a browser (as described in
Part A Step 5 above), exercising both the web search and static
file serving functionalities.
test_suite under valgrind to make sure there are no memory issues. Finally, launch the web server under valgrind to make sure there are no issues or leaks: after the web server has launched, exercise it by issuing a few queries, then kill the web server.
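Returning to the ServerSocket step above, one common way to create a listening socket is to ask getaddrinfo() for wildcard addresses and bind to the first one that works. The sketch below is a rough outline under assumptions, not the required ServerSocket implementation: the function name ListenOnPort and its parameters are hypothetical, and error reporting is reduced to returning -1.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cassert>
#include <cstring>

// Hypothetical helper: returns a listening socket fd bound to `port`, or -1.
// `ai_family` is AF_INET, AF_INET6, or AF_UNSPEC (let getaddrinfo choose).
int ListenOnPort(const char* port, int ai_family) {
  struct addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = ai_family;
  hints.ai_socktype = SOCK_STREAM;  // TCP
  hints.ai_flags = AI_PASSIVE;      // wildcard address, suitable for bind()

  struct addrinfo* results = nullptr;
  if (getaddrinfo(nullptr, port, &hints, &results) != 0) return -1;

  int listen_fd = -1;
  for (struct addrinfo* rp = results; rp != nullptr; rp = rp->ai_next) {
    int fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
    if (fd == -1) continue;  // this address family is unavailable; try next

    // Allow quick restarts of the server on the same port.
    int on = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    if (bind(fd, rp->ai_addr, rp->ai_addrlen) == 0 &&
        listen(fd, SOMAXCONN) == 0) {
      listen_fd = fd;
      break;
    }
    close(fd);
  }
  freeaddrinfo(results);
  return listen_fd;
}
```

On Linux, a socket created with AF_INET6 and bound to the wildcard address typically also accepts IPv4 clients (as IPv4-mapped addresses) unless IPV6_V6ONLY is set, which is one way to serve both kinds of client from a single listening socket.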
Now that the basic web server works, you will discover that your web server (probably) has two security vulnerabilities. We are going to point these out to you, and you will repair them. Of course, it IS possible that the way you implemented things above means you have already dealt with these flaws.
HttpUtils.cc will be very helpful in fixing these security flaws in your web server.
Fix the following two security flaws, if currently found in your
As a point of reference, we've provided a version of our web server
that has both of these flaws in place
Feel free to try it out, but DO NOT leave this server
running, as it will potentially expose all of your files to anybody
that connects to it.
hello <script>alert("Boo!");</script>
To fix this flaw, you need to "escape" untrusted input from the client before you relay it to output.
/static/../hw4/http333d.cc
This is called a directory traversal attack. Instead of trusting the file pathname provided by a client, you need to normalize the path and verify that it names a file within your document subdirectory tree (which would be ../projdocs/ if the example command shown in Part A was used to start the server). If the provided path names something outside of that subdirectory, you should return an error message instead of the file contents.
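Both fixes can be sketched in a few lines. The helpers below are illustrations under assumptions, not the HttpUtils.cc functions: the names EscapeForHtml and StaysInsideRoot are hypothetical, the path check assumes the request path has already had its URL prefix stripped so it is relative to the document root, and it is purely lexical (it does not resolve symbolic links, which a real implementation may also need to handle).

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Replace characters that are special in HTML so that untrusted text is
// displayed literally instead of being interpreted as markup.
std::string EscapeForHtml(const std::string& untrusted) {
  std::string out;
  out.reserve(untrusted.size());
  for (char c : untrusted) {
    switch (c) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&#39;";  break;
      default:   out += c;
    }
  }
  return out;
}

// Returns true if `relative_request`, interpreted relative to `root_dir`,
// still names something under `root_dir` after "." and ".." are collapsed.
bool StaysInsideRoot(const std::string& root_dir,
                     const std::string& relative_request) {
  fs::path root = fs::path(root_dir).lexically_normal();
  if (!root.has_filename()) root = root.parent_path();  // drop trailing '/'
  fs::path full = (root / relative_request).lexically_normal();
  // The request is safe only if every component of root prefixes full.
  return std::mismatch(root.begin(), root.end(), full.begin(), full.end())
             .first == root.end();
}
```

With the document root from Part A, a request for `bikeapalooza_2011/Bikeapalooza.html` passes the check, while `../hw4/http333d.cc` normalizes to a path outside `../projdocs` and is rejected.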
hw4-final)! This way, your bonus tasks won't affect the HW4 grading.
For HW4 bonus grading, create a file
in your top-level
hw4 directory for summarizing the
When you are done adding additional bonus parts and have committed
and pushed them to your GitLab repository, tag that commit
If we find a
hw4-bonus tag in your repository, we'll
grade the bonus parts; otherwise we'll assume that you just did the
There are two bonus tasks for this assignment.
httperf tool for Linux can generate synthetic
You should conduct this performance analysis for a few
different usage scenarios; e.g., you could vary the
size of the web page you request, and see its impact on the
number of pages per second your server can deliver.
If you choose to do this bonus task, please include a PDF file
in your submission containing relevant performance graphs and
x words + <bold>hit word</bold> + y words
for one or more of the query words that hit.
If you choose to do this bonus task, describe your added
feature(s) and how to use them in
This part of the assignment is deliberately open-ended, with
much less structure than earlier parts.
The (small) amount of extra credit granted will depend on how
interesting your extension is and how well it is implemented.
As with the previous HWs, you can compile your implementation
by using the
This will result in several output files, including an executable
After compiling your solution with
make, you can run
all of the tests for the homework by running:
You can also run only specific tests by passing command-line
This is extremely helpful for debugging specific parts of the
assignment, especially since
test_suite can be run
with these settings through
bash$ ./test_suite --gtest_filter=Test_HttpConnection.*
bash$ tests/test_suite --gtest_filter=-Test_ServerSocket.*
You can specify which tests are run for any of the tests in the assignment — you just need to know the names of the tests! You can list them all out by running:
bash$ tests/test_suite --gtest_list_tests