Out: Friday, August 4, 2023
Due: Wednesday, August 16, 2023 by 11:59 pm PDT
In this assignment, you will build on top of your Homework 3 implementation to complete a multithreaded web server front-end to your query processor.
bug_journal.md.
As before, please read through this entire document before beginning the assignment, and please start early!
Makefile distributed with the project.
In particular, there are reasonable ways to do the necessary string handling without using the Boost Regex library.
To help you schedule your time, here's a suggested order for the parts of this assignment. We're not going to enforce a schedule; it's up to you to manage your time.
ServerSocket.cc. Make sure to cover all functionality, not just what is in the unit tests.
FileReader.cc, which should be very easy, and GetNextRequest() in HttpConnection.cc.
ParseRequest() in HttpConnection.cc. This can be tricky, as it involves both Boost and string parsing.
http333d.cc. Implement HttpServer_ThrFn() in HttpServer.cc.
ProcessFileRequest() and ProcessQueryRequest() in HttpServer.cc. At this point, you should be able to search the "333gle" site and view the webpages available under /static/, e.g. http://localhost:5555/static/bikeapalooza_2011/index.html.
Our web server is a fairly straightforward multithreaded application. Every time a client connects to the server, the server dispatches a thread to handle all interactions with that client. Threads do not interact with each other at all, which greatly simplifies the design of the server.
The figure to the right shows the high-level architecture of the server. There is a main class called HttpServer that uses a ServerSocket class to create a listening socket, and then sits in a loop waiting to accept new connections from clients. For each new connection that the HttpServer receives, it dispatches a thread from a ThreadPool class to handle the connection. The dispatched thread springs to life in a function called HttpServer_ThrFn() within the HttpServer.cc file.
The HttpServer_ThrFn() function handles reading requests from one client. For each request that the client sends, HttpServer_ThrFn() invokes GetNextRequest() on the HttpConnection object to read in the next request and parse it. To read a request, the GetNextRequest() method invokes WrappedRead() some number of times until it spots the end of the request.
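To make "spotting the end of the request" concrete, here is a minimal sketch. It assumes the end of a request is marked by a blank line ("\r\n\r\n"), which holds for the bodyless GET requests a browser sends. The helper name ExtractNextRequest and its buffer parameter are hypothetical and are not part of the provided HttpConnection class; in the real method, the buffer would be filled by repeated WrappedRead() calls.

```cpp
#include <string>

// Hypothetical helper (not part of the provided HttpConnection class).
// If `buffer` holds at least one complete request (terminated by
// "\r\n\r\n"), moves that request into `request`, leaves any extra bytes
// in `buffer` for the next request, and returns true. Returns false if
// more bytes still need to be read from the socket.
bool ExtractNextRequest(std::string* buffer, std::string* request) {
  static const std::string kEndOfRequest = "\r\n\r\n";
  const std::size_t pos = buffer->find(kEndOfRequest);
  if (pos == std::string::npos) {
    return false;  // end of request not seen yet; caller should read more
  }
  const std::size_t request_len = pos + kEndOfRequest.size();
  *request = buffer->substr(0, request_len);
  buffer->erase(0, request_len);  // keep leftover bytes for the next call
  return true;
}
```

Keeping the leftover bytes matters because a single read can return the tail of one request together with the start of the next.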
To parse a request, the method invokes the ParseRequest() method (also within HttpConnection). At this point, HttpServer_ThrFn() has a fully parsed HttpRequest object (defined in HttpRequest.h).
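As a rough picture of what parsing a request involves, the sketch below splits a request into its URI and headers using only the standard library. The ParsedRequest struct and ParseRequestSketch function are hypothetical stand-ins, not the HttpRequest class or the ParseRequest() method from the starter code, and storing header names in lowercase is an assumption made here for illustration; follow the comments in HttpConnection.cc for the real requirements.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for the course's HttpRequest class.
struct ParsedRequest {
  std::string uri;
  std::map<std::string, std::string> headers;  // names stored in lowercase
};

// Removes leading and trailing whitespace, including a trailing '\r'.
static std::string Trim(const std::string& s) {
  const char* kSpace = " \t\r\n";
  const std::size_t first = s.find_first_not_of(kSpace);
  if (first == std::string::npos) return "";
  const std::size_t last = s.find_last_not_of(kSpace);
  return s.substr(first, last - first + 1);
}

// Parses "GET <uri> HTTP/1.1" followed by "Name: value" header lines.
// Returns false if the first line is not a well-formed GET request line.
bool ParseRequestSketch(const std::string& text, ParsedRequest* out) {
  std::istringstream lines(text);
  std::string line;
  if (!std::getline(lines, line)) return false;

  std::istringstream first(line);
  std::string method, uri, version;
  if (!(first >> method >> uri >> version) || method != "GET") return false;
  out->uri = uri;

  while (std::getline(lines, line)) {
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) continue;  // blank or malformed line
    std::string name = Trim(line.substr(0, colon));
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) {
                     return static_cast<char>(std::tolower(c));
                   });
    out->headers[name] = Trim(line.substr(colon + 1));
  }
  return true;
}
```

Splitting each header at the first colon only is deliberate: values such as "attu4:5555" contain colons of their own.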
The next job of HttpServer_ThrFn() is to process the request. To do this, it invokes the ProcessRequest() function, which looks at the request URI to determine whether this is a request for a static file or a request associated with the search functionality. Depending on what it discovers, it invokes either ProcessFileRequest() or ProcessQueryRequest().
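The routing decision described above can be sketched as a simple prefix check. This assumes static files are exactly the URIs that begin with "/static/" (the prefix used in the example URLs in this handout) and that everything else goes to the search handler; the RequestKind enum and ClassifyUri function are hypothetical names used only for illustration.

```cpp
#include <string>

// Hypothetical names, for illustration only.
enum class RequestKind { kStaticFile, kQuery };

// Classifies a request URI: anything under "/static/" is treated as a
// file request, and everything else as a search (query) request.
RequestKind ClassifyUri(const std::string& uri) {
  static const std::string kStaticPrefix = "/static/";
  if (uri.compare(0, kStaticPrefix.size(), kStaticPrefix) == 0) {
    return RequestKind::kStaticFile;
  }
  return RequestKind::kQuery;
}
```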
Once those functions return an HttpResponse, HttpServer_ThrFn() invokes the WriteResponse() method on the HttpConnection object to write the response back to the client.
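For a sense of what ends up on the wire, the sketch below serializes a response as a status line, two headers, a blank line, and the body. BuildResponse is a hypothetical helper and not the HttpResponse API from the starter code; the exact set of headers your server sends is up to the provided classes.

```cpp
#include <string>

// Hypothetical helper (not the provided HttpResponse class): formats an
// HTTP/1.1 response with a status line, Content-type and Content-length
// headers, a blank line, and then the body.
std::string BuildResponse(int code, const std::string& reason,
                          const std::string& content_type,
                          const std::string& body) {
  std::string out =
      "HTTP/1.1 " + std::to_string(code) + " " + reason + "\r\n";
  out += "Content-type: " + content_type + "\r\n";
  out += "Content-length: " + std::to_string(body.size()) + "\r\n";
  out += "\r\n";  // blank line separates the headers from the body
  out += body;
  return out;
}
```

Sending an accurate Content-length lets the client know where the body ends, so the same connection can carry the next request.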
Our web server isn't too complicated, but there is a fair amount of plumbing to get set up. In this part of the assignment, we want you to read through a bunch of lower-level code that we've provided for you. You need to understand how this code works to finish our web server implementation, but we won't have you modify this plumbing.
Run git pull to retrieve the new hw4/ folder for this assignment. You will need the hw1/, hw2/, hw3/, and projdocs/ directories in the same folder as your new hw4/ folder since hw4 links to files in those previous directories.
Look around inside hw4/ to familiarize yourself with the structure. Note that there are libhw1/, libhw2/, and libhw3/ directories that contain symlinks to your libhw1.a, libhw2.a, and libhw3.a, respectively. You can replace your libraries with ours (from the appropriate solution_binaries directories) if you prefer.
Run make to compile the two HW4 binaries. One of them is the usual unit test binary. Run it, and you'll see the unit tests fail or crash out; you won't yet earn the automated grading points tallied by the test suite.
The second binary is http333d. Its usage message will reveal its command-line arguments; an example call looks like:
bash$ ./http333d 5555 ../projdocs unit_test_indices/*
In the meantime, start up a working web server using the provided solution binary:
bash$ ./solution_binaries/http333d 5555 ../projdocs unit_test_indices/*
http333d.
If you are running on attu, note which specific machine you are running the web server on (e.g., attu4) and open http://attu4.cs.washington.edu:5555/ and http://attu4.cs.washington.edu:5555/static/bikeapalooza_2011/Bikeapalooza.html in different tabs, changing the attu number and port number as needed.
Enter a few search queries in the first tab and then click around the Bikeapalooza gallery in the second tab; this is what your finished web server will be capable of!
When you are done with the http333d server, the most graceful way to shut it down is to open another terminal window on the same machine, run the command:
bash$ ps -u
to find the process id (pid) of the web server, and then run:
bash$ kill pid
Read through ThreadPool.h and ThreadPool.cc. You don't need to implement anything in either, but several pieces of the project rely on this code. The header file is well-documented, so it ought to be clear how it's used. There's also a unit test file that you can peek at.
Read through HttpUtils.h and HttpUtils.cc. This class defines a number of utility functions that the rest of HW4 uses. You will have to implement some of these utilities while completing test_suite. Make sure that you understand what each of the utilities does, and why we may want them.
Read through HttpRequest.h and HttpResponse.h. These files define the HttpRequest and HttpResponse classes, which represent a parsed HTTP request and response, respectively.
You are now going to finish a basic implementation of the http333d web server. You will need to implement some of the event handling routines at different layers of abstraction in the web server, culminating with generating HTTP and HTML to send to the client.
Read through ServerSocket.h. This file contains a helpful class for creating a server-side listening socket, and accepting a new connection from a client. We've provided you with the class declaration in ServerSocket.h but no implementation in ServerSocket.cc; your next job is to build it. You'll need to make the code handle both IPv4 and IPv6 clients.
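One place where the IPv4/IPv6 distinction shows up is when turning a client's address into a printable string. The sketch below checks the address family and hands the matching field to inet_ntop(). AddressToString is a hypothetical helper, not a function declared in ServerSocket.h; in a real server, the sockaddr would typically come from a sockaddr_storage filled in by accept().

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <string>

// Hypothetical helper (not declared in ServerSocket.h): returns the
// printable form of an IPv4 or IPv6 address, or "" for any other family.
std::string AddressToString(const struct sockaddr* addr) {
  char buf[INET6_ADDRSTRLEN];  // large enough for either family
  const void* raw = nullptr;
  if (addr->sa_family == AF_INET) {
    raw = &reinterpret_cast<const struct sockaddr_in*>(addr)->sin_addr;
  } else if (addr->sa_family == AF_INET6) {
    raw = &reinterpret_cast<const struct sockaddr_in6*>(addr)->sin6_addr;
  } else {
    return "";
  }
  if (inet_ntop(addr->sa_family, raw, buf, sizeof(buf)) == nullptr) {
    return "";
  }
  return std::string(buf);
}
```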
Run the test suite to see if you make it past the ServerSocket unittests.
Read through FileReader.h and FileReader.cc. Note that the implementation of FileReader.cc is missing; go ahead and implement it. See if you make it past the FileReader unittests.
Read through HttpConnection.h and HttpConnection.cc. The two major functions in HttpConnection.cc have their implementations missing, but have generous comments for you to follow. Implement the missing functions, and see if you make it past the HttpConnection unittests.
Read through HttpUtils.h and HttpUtils.cc. There are two functions in HttpUtils.cc that have their implementations missing, but have generous comments to help you figure out their implementation. Implement the missing functions, and see if you make it past the HttpUtils unittests.
Read through HttpServer.cc, HttpServer.h, and http333d.cc. Note that some parts of HttpServer.cc and http333d.cc are missing. Go ahead and implement those missing functions. The only requirement here is that your web server mimics the behavior of the solution binaries (i.e., it has a search bar, processes files and queries correctly, and shows their results similarly); although entirely optional, you are free to modify the look of your 333gle site. Once you have the functions implemented, test your http333d binary to see if it works by running the web server and connecting to it from a browser (as described in Part A Step 5 above), exercising both the web search and static file serving functionalities.
Run test_suite under valgrind to make sure there are no memory issues. Finally, launch the web server under valgrind to make sure there are no issues or leaks: after the web server has launched, exercise it by issuing a few queries, then kill the web server.
Now that the basic web server works, you will discover that your web server (probably) has two security vulnerabilities. We are going to point these out to you, and you will repair them. Of course, it IS possible that the way you implemented things above means you have already dealt with these flaws.
HttpUtils.cc will be very helpful in fixing these security flaws in your web server. Fix the following two security flaws, if currently found in your server. As a point of reference, we've provided a version of our web server that has both of these flaws in place (solution_binaries/http333d_withflaws). Feel free to try it out, but DO NOT leave this server running, as it will potentially expose all of your files to anybody that connects to it.
hello <script>alert("Boo!");</script>
To fix this flaw, you need to "escape" untrusted input from the client before you relay it to output.
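As an illustration of what escaping means here, the sketch below replaces the characters that have special meaning in HTML with their entity equivalents, so the browser displays the text instead of interpreting it as markup. EscapeHtmlSketch is a hypothetical name; check HttpUtils.h for the utility the starter code actually provides before writing your own.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `input` with HTML-significant
// characters replaced by entities, so it is safe to embed in a page.
std::string EscapeHtmlSketch(const std::string& input) {
  std::string out;
  out.reserve(input.size());
  for (char c : input) {
    switch (c) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&#39;";  break;
      default:   out += c;
    }
  }
  return out;
}
```

With this applied, the query above is echoed back as visible text and the script never runs.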
/static/../hw4/http333d.cc
This is called a directory traversal attack. Instead of trusting the file pathname provided by a client, you need to normalize the path and verify that it names a file within your document subdirectory tree (which would be ../projdocs/ if the example command shown in Part A was used to start the server). If the provided path names something outside of that subdirectory, you should return an error message instead of the file contents.
hw4-final)! This way, your bonus tasks won't affect the HW4 grading.
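Returning to the directory traversal flaw described above, one way to picture the check is to walk the requested path one component at a time and reject it as soon as a ".." would climb above the document root. The sketch below does this purely lexically, on the part of the URI that follows "/static/". StaysInsideRoot is a hypothetical name, and this approach does not resolve symbolic links, so treat it as an illustration of the idea and not a replacement for the path-checking utility in HttpUtils.cc.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if `relative_path` (interpreted
// relative to the document root) never climbs above that root.
// Purely lexical: it does not touch the file system or follow symlinks.
bool StaysInsideRoot(const std::string& relative_path) {
  int depth = 0;  // number of directories below the root so far
  std::istringstream parts(relative_path);
  std::string part;
  while (std::getline(parts, part, '/')) {
    if (part.empty() || part == ".") continue;  // "//" or "./": no change
    if (part == "..") {
      if (--depth < 0) return false;  // climbed above the root
    } else {
      ++depth;
    }
  }
  return true;
}
```

For the example request above, the part after "/static/" is "../hw4/http333d.cc", which is rejected at its first component.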
For HW4 bonus grading, create a file readme_bonus.txt in your top-level hw4 directory for summarizing the additions. When you are done adding additional bonus parts and have committed and pushed them to your GitLab repository, tag that commit hw4-bonus. If we find a hw4-bonus tag in your repository, we'll grade the bonus parts; otherwise we'll assume that you just did the required parts.
There are two bonus tasks for this assignment.
The httperf tool for Linux can generate synthetic load. You should conduct this performance analysis for a few different usage scenarios; e.g., you could vary the size of the web page you request, and see its impact on the number of pages per second your server can deliver. If you choose to do this bonus task, please include a PDF file in your submission containing relevant performance graphs and analysis.
x words + <bold>hit word</bold> + y words
for one or more of the query words that hit. If you choose to do this bonus task, describe your added feature(s) and how to use them in readme_bonus.txt. This part of the assignment is deliberately open-ended, with much less structure than earlier parts. The (small) amount of extra credit granted will depend on how interesting your extension is and how well it is implemented.
As with the previous HWs, you can compile your implementation by using the make command. This will result in several output files, including an executable called test_suite. After compiling your solution with make, you can run all of the tests for the homework by running:
bash$ ./test_suite
You can also run only specific tests by passing command-line arguments into test_suite. This is extremely helpful for debugging specific parts of the assignment, especially since test_suite can be run with these settings through valgrind and gdb!
Some examples:
To run only the HttpConnection tests, enter:
bash$ ./test_suite --gtest_filter=Test_HttpConnection.*
To run all tests except the ServerSocket tests, enter:
bash$ ./test_suite --gtest_filter=-Test_ServerSocket.*
You can specify which tests are run for any of the tests in the assignment — you just need to know the names of the tests! You can list them all out by running:
bash$ ./test_suite --gtest_list_tests