Due: Tuesday, Dec 3rd, 2024 by 10:00 pm
In this assignment you will build on your HW3 implementation to implement a multithreaded Web server front-end to your query processor. In Part A, you will read through some of our code to learn about the infrastructure we have built for you. In Part B, you will complete some of our classes and routines to finish the implementation of a simple Web server. In Part C, you have the option to fix some security problems in our web server.
As before, please read through this entire document before beginning the assignment, and please start early!
Verify333()'s to spot some kinds of errors and cause your program to crash. However, no matter what a client does, or what input the web server reads, your web server must handle that; only internal issues (such as out of memory) should cause your web server to crash.
Makefile distributed with the project. In particular, there are reasonable ways to do the necessary string handling without using the Boost Regex library.
To help you schedule your time, here's a suggested order for the parts of this assignment. We're not going to enforce a schedule; it's up to you to manage your time.
ServerSocket.cc. Make sure to cover all functionality, not just what is in the unit tests.
FileReader.cc, which should be very easy, and GetNextRequest() in HttpConnection.cc.
ParseRequest() in HttpConnection.cc. This can be tricky, as it involves both Boost and string parsing.
http333d.cc.
HttpServer_ThrFn() in HttpServer.cc.
ProcessFileRequest() and ProcessQueryRequest() in HttpServer.cc.
At this point, you should be able to search the "333gle" site and view the webpages available under /static/, e.g. http://localhost:5555/static/bikeapalooza_2011/index.html
Our web server is a fairly straightforward multithreaded
application. Every time a client connects to the server,
the server dispatches a thread to handle all interactions
with that client. Threads do not interact with each other
at all, which greatly simplifies the design of the server.
The figure to the right shows the high-level architecture of the server. There is a main class called HttpServer that uses a ServerSocket class to create a listening socket, and then sits in a loop waiting to accept new connections from clients. For each new connection that the HttpServer receives, it dispatches a thread from a ThreadPool class to handle the connection. The dispatched thread springs to life in a function called HttpServer_ThrFn() within the HttpServer.cc file.
The HttpServer_ThrFn() function handles reading requests from one client. For each request that the client sends, HttpServer_ThrFn() invokes GetNextRequest() on the HttpConnection object to read in the next request and parse it. To read a request, the GetNextRequest() method invokes WrappedRead() some number of times until it spots the end of the request. To parse a request, the method invokes the ParseRequest() method (also within HttpConnection). At this point, HttpServer_ThrFn() has a fully parsed HttpRequest object (defined in HttpRequest.h).
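The read-then-parse flow described above can be sketched with plain std::string operations. This is only an illustration, not the starter code's API: the names FindRequestEnd, RequestLine, and ParseRequestLine are hypothetical, and the sketch assumes that a request has no body and ends at the first blank line ("\r\n\r\n"). It uses the standard library instead of Boost.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the starter code): returns the number of
// bytes in `buffer` that make up the first complete request, including the
// terminating "\r\n\r\n", or 0 if no complete request has arrived yet.
std::size_t FindRequestEnd(const std::string& buffer) {
  static const std::string kHeaderEnd = "\r\n\r\n";
  std::size_t pos = buffer.find(kHeaderEnd);
  if (pos == std::string::npos) {
    return 0;  // Keep reading; the end of the request has not been seen.
  }
  return pos + kHeaderEnd.size();
}

// Hypothetical struct holding the three tokens of an HTTP request line.
struct RequestLine {
  std::string method;
  std::string uri;
  std::string version;
};

// Splits the first line of `request` (e.g. "GET /foo HTTP/1.1") on
// whitespace. Returns false unless the line has exactly three tokens.
bool ParseRequestLine(const std::string& request, RequestLine* out) {
  std::string first_line = request.substr(0, request.find("\r\n"));
  std::istringstream tokens(first_line);
  if (!(tokens >> out->method >> out->uri >> out->version)) {
    return false;  // Fewer than three tokens.
  }
  std::string extra;
  return !(tokens >> extra);  // More than three tokens is also malformed.
}
```

Any bytes after the offset returned by FindRequestEnd belong to the client's next request, so a caller would need to keep them in its buffer rather than discard them.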
The next job of HttpServer_ThrFn() is to process the request. To do this, it invokes the ProcessRequest() function, which looks at the request URI to determine if this is a request for a static file, or if it is a request associated with the search functionality. Depending on what it discovers, it either invokes ProcessFileRequest() or ProcessQueryRequest(). Once those functions return an HttpResponse, HttpServer_ThrFn() invokes the WriteResponse() method on the HttpConnection object to write the response back to the client.
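The URI-based dispatch described above might look roughly like the following sketch. RequestKind and ClassifyUri are hypothetical names, and the rule "anything under /static/ is a file request, everything else is a search request" is an assumption based on the example URLs in this document rather than a description of the provided ProcessRequest().

```cpp
#include <string>

// Hypothetical classification of an incoming request.
enum class RequestKind { kStaticFile, kQuery };

// Assumption: static files are served under the "/static/" prefix and every
// other URI is handled by the search code.
RequestKind ClassifyUri(const std::string& uri) {
  static const std::string kStaticPrefix = "/static/";
  // compare() examines at most kStaticPrefix.size() characters of `uri`, so
  // a URI shorter than the prefix simply fails to match.
  if (uri.compare(0, kStaticPrefix.size(), kStaticPrefix) == 0) {
    return RequestKind::kStaticFile;
  }
  return RequestKind::kQuery;
}
```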
Our web server isn't too complicated, but there is a fair amount of plumbing to get set up. In this part of the assignment, we want you to read through a bunch of lower-level code that we've provided for you. You need to understand how this code works to finish our web server implementation, but we won't have you modify this plumbing.
hw1/, hw2/, hw3/, and projdocs/ directories in it. Use git pull to retrieve the new hw4/ folder with the starter code for this assignment. As with previous parts of the project, you can use the solution_binaries/ versions of the previous parts of the project if you wish.
make to compile the HW4 binaries. One of them is the usual unit test binary called test_suite. Run it to discover failing unit tests that you'll need to fix. The second binary is the web server itself: http333d; try running it to see its command line arguments. When you're ready to run it for real, you can use a command like:
./http333d 5555 ../projdocs ../projdocs/unit_test_indices/*
We STRONGLY suggest using a different port than 5555, since it's likely that multiple students will be testing their http333d on the same machine as you! You should also try our solution_binaries server, which has fully implemented all the required functionality. It can be run using a similar command line:
./solution_binaries/http333d 5555 ../projdocs ../projdocs/unit_test_indices/*
Next, use a web browser to explore the server's functionality:
attu over an SSH connection: Follow the same steps as above, but navigate to the address for the instance of attu your code is running on. For example, if you are running your code on attu4, you would visit the following addresses:
http://attu4.cs.washington.edu:5555/ and
http://attu4.cs.washington.edu:5555/static/bikeapalooza_2011/Bikeapalooza.html
When you are done with the http333d server, the most graceful way to shut it down is to open another terminal window on the same machine and run the command
kill pid
where pid is the server process id. Use the ps -u command on the same machine (attu or local VM) to find that process id. You can probably also shut down the server by typing control-C in the window where it is running, but this isn't as graceful and doesn't always work as reliably as a kill command.
ThreadPool.h and ThreadPool.cc. You don't need to implement anything in either, but several pieces of the project rely on this code. The header file is well-documented, so it ought to be clear how it's used. (There's also a unit test file that you can peek at.)
HttpUtils.h and HttpUtils.cc. This class defines a number of utility functions that the rest of HW4 uses. You do not have to implement this file (the default implementations are sufficient if you don't plan on doing Part C), but make sure that you understand what each of them does, and why.
HttpRequest.h and HttpResponse.h. These files define the HttpRequest and HttpResponse classes, which represent a parsed HTTP request and response, respectively.
It's time to start coding in Part B.
You are now going to finish a basic implementation of the http333d web server. You will need to implement some of the event handling routines at different layers of abstraction in the web server, culminating with generating HTTP and HTML to send to the client.
ServerSocket.h. This file contains a helpful class for creating a server-side listening socket, and accepting a new connection from a client. We've provided you with the class declaration in ServerSocket.h but no implementation in ServerSocket.cc; your next job is to build it. You'll need to make the code handle both IPv4 and IPv6 clients. Run the test_suite to see if you make it past the ServerSocket unit tests.
FileReader.h and FileReader.cc. Note that the implementation of FileReader.cc is missing; go ahead and implement it. See if you make it past the FileReader unit tests.
HttpConnection.h and HttpConnection.cc. The two major functions in HttpConnection.cc have their implementations missing, but have generous comments for you to follow. Implement the missing functions, and see if you make it past the HttpConnection unit tests.
HttpUtils.h and HttpUtils.cc. There are two functions in HttpUtils.cc that have their implementations missing, but have generous comments to help you figure out their implementation if you choose to do Part C. If you choose to skip that part, you can ignore the HttpUtils unit tests.
HttpServer.cc, HttpServer.h, and http333d.cc. Note that some parts of HttpServer.cc and http333d.cc are missing; go ahead and implement those missing functions. Once you think you have them working, test your http333d; be sure to test both the web search functionality as well as static file serving (e.g., bikeapalooza and Project Gutenberg books). Hint: you'll want to look at our solution binary's generated HTML; focus on the links and their link text.
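For the ServerSocket requirement to handle both IPv4 and IPv6 clients, one recurring task is inspecting an address whose family is only known at run time. The sketch below shows one possible way to turn such an address into a printable string with the POSIX inet_ntop() call. AddrToString is a hypothetical helper name, not something declared in ServerSocket.h, and the sketch does not create or accept on any socket.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <string>

// Hypothetical helper: converts a socket address of either family into a
// printable string. Returns an empty string for an unsupported family or a
// conversion failure.
std::string AddrToString(const struct sockaddr* addr) {
  char buf[INET6_ADDRSTRLEN] = {0};  // Large enough for IPv4 and IPv6 text.
  const char* result = nullptr;
  if (addr->sa_family == AF_INET) {
    const struct sockaddr_in* v4 =
        reinterpret_cast<const struct sockaddr_in*>(addr);
    result = inet_ntop(AF_INET, &v4->sin_addr, buf, sizeof(buf));
  } else if (addr->sa_family == AF_INET6) {
    const struct sockaddr_in6* v6 =
        reinterpret_cast<const struct sockaddr_in6*>(addr);
    result = inet_ntop(AF_INET6, &v6->sin6_addr, buf, sizeof(buf));
  }
  return result == nullptr ? std::string() : std::string(buf);
}
```

A caller would typically store the peer address in a struct sockaddr_storage, which is large enough for either family, and pass a pointer to it cast to struct sockaddr*.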
At this point, your web server should run correctly, and everything should compile with no warnings. If you wish, you can change the appearance of the front page ("dark mode", different graphics, etc.) but please refrain from changing or adding to the functionality of the server.
As usual, run the test_suite under valgrind to ensure there are no memory issues. You should also launch the web server under valgrind to make sure there are no memory issues there; after the web server has launched, exercise it by issuing a few queries, then kill the web server. (The supplied code does have some leaks, but your code should not make things significantly worse.)
Now that the basic web server works, you will discover that your web server (probably) has two security vulnerabilities. We are going to point these out to you, and you can fix them if you'd like! Note that this section is purely for fun; you are not required to implement it, and you will not earn extra credit if you do.
At this point, it's likely that your implementation has two security flaws (however, it is possible that the way you implemented your server has already dealt with them).
hello <script>alert("Boo!");</script>
Your browser will pop up a dialog box saying "Boo!" when you use your flawed server. To fix this flaw, you will need to "escape" (i.e., replace) certain types of input from the client before you relay it to output. We've declared a function in HttpUtils that is meant to detect input which requires escaping and perform any necessary replacement; you should implement it.
nc to connect to your flawed web server and to our solution binary. Manually send a request for the following URL (note: browsers are smart enough to protect web servers from this attack, so you can't just type it into the URL bar. But nothing prevents attackers from directly connecting to your server with a program of their own!)
/static/../hw4/http333d.cc
This second flaw is called a directory traversal attack. Instead of trusting the file pathname provided by a client, you need to normalize the path and verify that it names a file within your document subdirectory tree (in our example command, that subdirectory is ../projdocs/). If the file path names something outside of that subdirectory, you should return an error message instead of the file contents. We've provided you with a second function in HttpUtils.h to determine whether a path is safe or not.
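The two fixes above can be illustrated with a pair of small string functions. This is a rough sketch under stated assumptions, not the HttpUtils implementation: EscapeHtmlSketch and StaysInsideRoot are hypothetical names, the set of escaped characters is one common choice, and the path check is purely lexical (it assumes the "/static/" prefix has already been stripped and it does not follow symbolic links, which a real check would probably handle with something like realpath()).

```cpp
#include <sstream>
#include <string>

// Hypothetical sketch: replaces characters that are special in HTML with
// entity references so client input is displayed rather than interpreted.
std::string EscapeHtmlSketch(const std::string& from) {
  std::string out;
  out.reserve(from.size());
  for (char c : from) {
    switch (c) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&#39;";  break;
      default:   out += c;        break;
    }
  }
  return out;
}

// Hypothetical sketch: returns true if `rel_path`, interpreted relative to
// the document root, stays inside that root once "." and ".." components
// are resolved. Lexical only; symbolic links are not considered.
bool StaysInsideRoot(const std::string& rel_path) {
  int depth = 0;  // Number of directories below the root so far.
  std::istringstream parts(rel_path);
  std::string part;
  while (std::getline(parts, part, '/')) {
    if (part.empty() || part == ".") {
      continue;  // Repeated slashes and "." do not change the depth.
    }
    if (part == "..") {
      if (depth == 0) {
        return false;  // Would climb above the document root.
      }
      --depth;
    } else {
      ++depth;
    }
  }
  return true;
}
```

With these definitions, the script-injection query from the first flaw is rendered as inert text, and the path left over from the traversal URL (../hw4/http333d.cc) is rejected because its first component already climbs above the root.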
Fix these two security flaws, assuming they do in fact exist in your server. As a point of reference, in solution_binaries/, we've provided a version of our web server that has both of these flaws in place (http333d_withflaws). Feel free to try it out, but DO NOT leave this server running, as it will potentially expose all of your files to anybody that connects to it.
Congrats, you're done with the CSE 333 project sequence!!
As with previous homeworks, you compile your implementation using the make command. This will result in several output files, including an executable called test_suite. You can run all of the tests in that suite with the usual command:
bash$ ./test_suite
You can also run only specific tests by passing command line arguments into test_suite. For example, to only run the HttpConnection tests, you can type:
bash$ ./test_suite --gtest_filter=Test_HttpConnection.*
In general, you can specify which tests are run for any of the tests in the assignment; you just need to know the names of the tests, which can be obtained by running:
bash$ ./test_suite --gtest_list_tests
You can also run test_suite and specify particular tests that should NOT be run. For instance, the ServerSocket tests can take a while to run; to run all tests except for those, enter
bash$ ./test_suite --gtest_filter=-Test_ServerSocket.*
These settings can be helpful for debugging specific parts of the assignment, especially since test_suite can be run with these settings through valgrind and gdb! However, you should not debug your code using only the supplied tests! The test setup and code are complex enough that it can be hard to isolate problems effectively without spending excessive amounts of time trying to reverse-engineer the details of the test_suite code.
Be sure to also run your code on small sample files and directories where you can predict in advance exactly what data structures should be created and what their contents should be, and then use gdb or other tools to verify that things are working exactly as expected.
In addition to passing tests, your code should be high quality and readable. This includes several aspects:
static) helper functions, be sure to provide good comments that explain the function inputs, outputs, and behavior. These comments can often be relatively brief as long as they convey to the reader the information needed to understand how to use the function and what it does when executed.
cpplint.py and Valgrind) to look for common coding bugs and fix reported issues before submitting your code. Exception: if cpplint reports style problems in the supplied starter code, you should leave that code as-is.
When you are ready to turn in your assignment, you should follow the same procedures you used for previous assignments, except this time tag the repository with hw4-final. Remember to clean up, commit, and push all necessary files to your repository before you add the tag. After you have created and pushed the tag, remember to test everything in the CSE Linux environment by creating a new clone of the repository in a separate, empty directory, checking out the hw4-final tag, and verifying that everything works as expected. Refer to the hw0 turnin instructions for details, and follow those steps carefully.
It is YOUR responsibility to check your work. If your project doesn't build properly when the course staff does these exact steps to grade it, you may lose a huge amount of the possible credit ... even if almost absolutely everything is correct.
We will be basing your grade on several elements:
test_suite.cc. If your code fails a test, we won't attempt to understand why: we're planning on just including the number of points that the test drivers print out.
Remember: Both code correctness and code quality matter. Both are weighed significantly in the evaluation of your project.