Due: Wednesday, August 14th, 2024 by 11:00 pm
In this assignment you will build on your HW3 implementation to implement a multithreaded Web server front-end to your query processor. In Part A, you will read through some of our code to learn about the infrastructure we have built for you. In Part B, you will complete some of our classes and routines to finish the implementation of a simple Web server. In Part C, you will fix some security problems in our Web server.
As before, please read through this entire document before beginning the assignment, and please start early!
Use Verify333() calls to spot some kinds of errors and cause your program to crash. However, no matter what a client does, or what input the web server reads, your web server must handle it; only internal issues (such as running out of memory) should cause your web server to crash.
You may not modify the Makefile distributed with the project. In particular, there are reasonable ways to do the necessary string handling without using the Boost Regex library.
To help you schedule your time, here's a suggested order for the parts of this assignment. We're not going to enforce a schedule; it's up to you to manage your time.
Implement ServerSocket.cc. Make sure to cover all functionality, not just what is in the unit tests.
Implement FileReader.cc, which should be very easy, and GetNextRequest() in HttpConnection.cc.
Implement ParseRequest() in HttpConnection.cc. This can be tricky, as it involves both Boost and string parsing.
Implement http333d.cc. Implement HttpServer_ThrFn() in HttpServer.cc.
Implement ProcessFileRequest() and ProcessQueryRequest() in HttpServer.cc. At this point, you should be able to search the "333gle" site and view the webpages available under /static/ (for example, http://localhost:5555/static/bikeapalooza_2011/index.html).
Our web server is a fairly straightforward multithreaded application. Every time a client connects to the server, the server dispatches a thread to handle all interactions with that client. Threads do not interact with each other at all, which greatly simplifies the design of the server.
The figure to the right shows the high-level architecture of the server. There is a main class called HttpServer that uses a ServerSocket class to create a listening socket, and then sits in a loop waiting to accept new connections from clients. For each new connection that the HttpServer receives, it dispatches a thread from a ThreadPool class to handle the connection. The dispatched thread springs to life in a function called HttpServer_ThrFn() within the HttpServer.cc file.
The HttpServer_ThrFn() function handles reading requests from one client. For each request that the client sends, HttpServer_ThrFn() invokes GetNextRequest() on the HttpConnection object to read in the next request and parse it.
To read a request, the GetNextRequest() method invokes WrappedRead() some number of times until it spots the end of the request. To parse a request, the method invokes the ParseRequest() method (also within HttpConnection). At this point, HttpServer_ThrFn() has a fully parsed HttpRequest object (defined in HttpRequest.h).
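The "read until the end of the request" step can be sketched as follows. This is only an illustration under stated assumptions: ExtractNextRequest() is a hypothetical helper, not part of the starter code, and it assumes requests carry no body (as with GET), so that a request ends at the blank line that terminates the HTTP headers ("\r\n\r\n"). Bytes read past that point are kept, since they may belong to the client's next request.

```cpp
#include <string>

// Hypothetical helper: if `buffer` holds at least one complete request
// header block (terminated by "\r\n\r\n"), moves that request, terminator
// included, into `*request`, leaves any remaining bytes in `*buffer`,
// and returns true. Otherwise returns false and changes nothing.
bool ExtractNextRequest(std::string* buffer, std::string* request) {
  static const std::string kEndOfRequest = "\r\n\r\n";
  const std::size_t pos = buffer->find(kEndOfRequest);
  if (pos == std::string::npos) {
    return false;  // Need to read more bytes from the client first.
  }
  const std::size_t end = pos + kEndOfRequest.size();
  *request = buffer->substr(0, end);
  buffer->erase(0, end);  // Keep leftover bytes for the next request.
  return true;
}
```

A caller would append each chunk returned by a read to the buffer and call the helper again until it returns true.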
The next job of HttpServer_ThrFn() is to process the request. To do this, it invokes the ProcessRequest() function, which looks at the request URI to determine if this is a request for a static file, or if it is a request associated with the search functionality. Depending on what it discovers, it either invokes ProcessFileRequest() or ProcessQueryRequest().
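The URI-based dispatch described above might look roughly like the sketch below. The names RequestKind and ClassifyUri() are hypothetical and exist only for illustration. The rule that every URI not starting with "/static/" goes to the search handler is an assumption, based on the static pages being served under /static/.

```cpp
#include <string>

// Hypothetical classification of an incoming request.
enum class RequestKind { kStaticFile, kSearch };

// Hypothetical classifier: URIs beginning with "/static/" are treated
// as file requests; everything else is assumed to be a search request.
RequestKind ClassifyUri(const std::string& uri) {
  static const std::string kStaticPrefix = "/static/";
  if (uri.compare(0, kStaticPrefix.size(), kStaticPrefix) == 0) {
    return RequestKind::kStaticFile;
  }
  return RequestKind::kSearch;
}
```

The dispatching function would then call the file handler or the search handler depending on the returned kind.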
Once those functions return an HttpResponse, HttpServer_ThrFn() invokes the WriteResponse() method on the HttpConnection object to write the response back to the client.
Our web server isn't too complicated, but there is a fair amount of plumbing to get set up. In this part of the assignment, we want you to read through a bunch of lower-level code that we've provided for you. You need to understand how this code works to finish our web server implementation, but we won't have you modify this plumbing.
Navigate to the directory of your repository that has the hw1/, hw2/, hw3/, and projdocs/ directories in it. Use git pull to retrieve the new hw4/ folder with the starter code for this assignment. You will need the hw1/, hw2/, and hw3/ directories in the same folder as your new hw4/ folder since hw4 links to files in those previous directories. Also, as with previous parts of the project, you can use the solution_binaries/ versions of the previous parts of the project if you wish.
Run make to compile the HW4 binaries. One of them is the usual unit test binary called test_suite. Run it, and you'll see the unit tests fail and crash out, so you won't yet earn the automated grading points tallied by the test suite. The second binary is the web server itself: http333d. Try running it to see its command line arguments. When you're ready to run it for real, you can use a command like:
./http333d 5555 ../projdocs unit_test_indices/*
(You might need to pick a different port than 5555 if someone else is using that port on the same machine as you.)
Try our solution_binaries server by running it with a similar command line:
./solution_binaries/http333d 5555 ../projdocs unit_test_indices/*
Next, use a web browser to explore what the server should look like when it's finished:
If the server is running on attu over an SSH connection: open your browser and navigate to the address for the instance of attu your code is running on. For example, if you are running your code on attu4, you would visit the following addresses:
http://attu4.cs.washington.edu:5555/ and
http://attu4.cs.washington.edu:5555/static/bikeapalooza_2011/Bikeapalooza.html
When you are done with the http333d server, the most graceful way to shut it down is to open another terminal window on the same machine and run the command
kill pid
where pid is the server process id. Use the ps -u command on the same machine (attu or local VM) to find that process id. You can probably also shut down the server by typing control-C in the window where it is running, but this isn't as graceful and doesn't always work as reliably as a kill command.
Read through ThreadPool.h and ThreadPool.cc. You don't need to implement anything in either, but several pieces of the project rely on this code. The header file is well-documented, so it ought to be clear how it's used. (There's also a unit test file that you can peek at.)
Read through HttpUtils.h and HttpUtils.cc. These files define a number of utility functions that the rest of HW4 uses. You will have to implement some of these utilities. Make sure that you understand what each of them does, and why.
Read through HttpRequest.h and HttpResponse.h. These files define the HttpRequest and HttpResponse classes, which represent a parsed HTTP request and response, respectively.
It's time to start coding in Part B.
You are now going to finish a basic implementation of the http333d web server. You will need to implement some of the event handling routines at different layers of abstraction in the web server, culminating with generating HTTP and HTML to send to the client.
Read through ServerSocket.h. This file contains a helpful class for creating a server-side listening socket, and accepting a new connection from a client. We've provided you with the class declaration in ServerSocket.h but no implementation in ServerSocket.cc; your next job is to build it. You'll need to make the code handle both IPv4 and IPv6 clients. Run the test_suite to see if you make it past the ServerSocket unit tests.
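As a rough illustration of the usual POSIX pattern for creating a listening socket, here is a minimal sketch. ListenOnPort() is a hypothetical free function, not the interface declared in ServerSocket.h, and it omits details a full implementation would need, such as reporting which address family was chosen. Passing AF_INET6 is one common way to serve both kinds of clients: on typical Linux configurations, an IPv6 listening socket also accepts IPv4 clients as IPv4-mapped addresses, though that depends on system settings.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstring>

// Hypothetical helper: returns a listening socket fd bound to `port`
// (a decimal string such as "5555"), or -1 on failure.
int ListenOnPort(const char* port, int ai_family) {
  struct addrinfo hints;
  std::memset(&hints, 0, sizeof(hints));
  hints.ai_family = ai_family;      // AF_INET6, AF_INET, or AF_UNSPEC
  hints.ai_socktype = SOCK_STREAM;  // TCP
  hints.ai_flags = AI_PASSIVE;      // wildcard address, suitable for bind()

  struct addrinfo* results = nullptr;
  if (getaddrinfo(nullptr, port, &hints, &results) != 0) {
    return -1;
  }

  int listen_fd = -1;
  for (struct addrinfo* rp = results; rp != nullptr; rp = rp->ai_next) {
    int fd = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
    if (fd == -1) continue;  // Try the next candidate address.
    int on = 1;
    // Allow quick restarts of the server on the same port.
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
    if (bind(fd, rp->ai_addr, rp->ai_addrlen) == 0 &&
        listen(fd, SOMAXCONN) == 0) {
      listen_fd = fd;
      break;
    }
    close(fd);  // bind() or listen() failed; clean up and keep looking.
  }
  freeaddrinfo(results);
  return listen_fd;
}
```

The returned descriptor would then be passed to accept() in a loop, and closed when the server shuts down.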
Read through FileReader.h and FileReader.cc. Note that the implementation of FileReader.cc is missing; go ahead and implement it. See if you make it past the FileReader unit tests.
Read through HttpConnection.h and HttpConnection.cc. The two major functions in HttpConnection.cc have their implementations missing, but have generous comments for you to follow. Implement the missing functions, and see if you make it past the HttpConnection unit tests.
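To give a feel for the string handling involved in parsing a request, here is a small sketch. It is not the assignment's ParseRequest(): it uses only the C++ standard library rather than Boost, and ParsedRequestSketch, Trim(), and ParseRequestSketch() are hypothetical names introduced for illustration. It assumes the input is one complete request whose first line has the form "METHOD URI VERSION", followed by "Name: value" header lines. Lower-casing header names is a design choice in this sketch, made because HTTP header names are case-insensitive.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <sstream>
#include <string>

// Hypothetical result type, not the assignment's HttpRequest class.
struct ParsedRequestSketch {
  std::string method;
  std::string uri;
  std::map<std::string, std::string> headers;  // names are lower-cased
};

// Removes leading and trailing spaces, tabs, and carriage returns.
static std::string Trim(const std::string& s) {
  const char* kWhitespace = " \t\r";
  const std::size_t begin = s.find_first_not_of(kWhitespace);
  if (begin == std::string::npos) return "";
  const std::size_t end = s.find_last_not_of(kWhitespace);
  return s.substr(begin, end - begin + 1);
}

// Parses the request line and the header lines of `text` into `*out`.
// Returns false if the request line is malformed.
bool ParseRequestSketch(const std::string& text, ParsedRequestSketch* out) {
  std::istringstream lines(text);
  std::string line;
  if (!std::getline(lines, line)) return false;
  std::istringstream request_line(line);
  std::string version;
  if (!(request_line >> out->method >> out->uri >> version)) return false;

  while (std::getline(lines, line)) {
    const std::size_t colon = line.find(':');
    if (colon == std::string::npos) continue;  // Blank or malformed line.
    std::string name = Trim(line.substr(0, colon));
    std::transform(name.begin(), name.end(), name.begin(),
                   [](unsigned char c) { return std::tolower(c); });
    out->headers[name] = Trim(line.substr(colon + 1));
  }
  return true;
}
```

Splitting on the first colon only matters for headers such as Host, whose value can itself contain a colon (for example, a port number).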
Read through HttpUtils.h and HttpUtils.cc. There are two functions in HttpUtils.cc that have their implementations missing, but have generous comments to help you figure out their implementation. Implement the missing functions, and see if you make it past the HttpUtils unit tests.
Read through HttpServer.cc, HttpServer.h, and http333d.cc. Note that some parts of HttpServer.cc and http333d.cc are missing. Go ahead and implement those missing functions. Once you have them working, test your http333d binary to see if it works. Make sure you exercise both the web search functionality and the static file serving functionality. You'll probably need to look at the source of pages that our solution binary serves and emulate that HTML to get the same "look and feel" in your server as in ours. If you wish, you can change the appearance of the front page ("dark mode", different graphics, etc.), but for now you should not change or add to the functionality of the server beyond its appearance. If you want to do more, see the Bonus section, below.
At this point, your web server should run correctly, and everything should compile with no warnings. Try running your web server and connecting to it from a browser as described above. Also try running the test_suite under valgrind to make sure there are no memory issues. Finally, launch the web server under valgrind to make sure there are no issues or leaks; after the web server has launched, exercise it by issuing a few queries, then kill the web server. (The supplied code does have some leaks, but your code should not make things significantly worse.)
Now that the basic web server works, you will discover that your web server (probably) has two security vulnerabilities. We are going to point these out to you, and you will repair them.
It's likely at this point that your implementation has two security flaws. (However, please note: it is possible that the way you implemented things above means you have already dealt with these flaws.) You may find that some of the functions defined in HttpUtils.cc will be very helpful in fixing these security flaws in your web server.
hello <script>alert("Boo!");</script>
To fix this flaw, you need to "escape" untrusted input from the client before you relay it to output. We've provided you with an escape function in HttpUtils.
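To illustrate what escaping means here, the sketch below replaces the characters that are special in HTML with their entity equivalents, so that client-supplied text is displayed rather than interpreted as markup. EscapeHtmlSketch() is a hypothetical stand-in written for this example; it is not the function provided in HttpUtils, whose exact name and behavior you should check in the header.

```cpp
#include <string>

// Hypothetical stand-in for an HTML-escaping utility: returns a copy of
// `input` with HTML-special characters replaced by character entities.
std::string EscapeHtmlSketch(const std::string& input) {
  std::string out;
  out.reserve(input.size());
  for (char c : input) {
    switch (c) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&#39;";  break;
      default:   out += c;        break;
    }
  }
  return out;
}
```

With this in place, the query shown above would appear in the results page as literal text instead of running as a script.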
Use telnet or nc to connect to your web server, and manually send a request for the following URL. (Browsers are smart enough to help defend against this attack, so you can't just type it into the URL bar, but nothing prevents attackers from directly connecting to your server with a program of their own!)
/static/../hw4/http333d.cc
This is called a directory traversal attack. Instead of trusting the file pathname provided by a client, you need to normalize the path and verify that it names a file within your document subdirectory tree (which would be ../projdocs/ if the example command shown in Part A was used to start the server). If the file path names something outside of that subdirectory, you should return an error message instead of the file contents. We've provided you with a function in HttpUtils.h to help you test whether a path is safe or not.
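The idea of normalizing a path and checking that it stays inside the document tree can be sketched as follows. StaysInsideRoot() is a hypothetical function for illustration, not the helper provided in HttpUtils.h. It works purely on the text of the path (the part of the URI after /static/), so it assumes there are no symbolic links that lead outside the document directory; a check that consults the filesystem would be needed to cover that case.

```cpp
#include <sstream>
#include <string>

// Hypothetical check: returns true if `relative_path`, resolved lexically
// against a document root, never climbs above that root. It does not
// touch the filesystem, so it does not account for symbolic links.
bool StaysInsideRoot(const std::string& relative_path) {
  std::istringstream stream(relative_path);
  std::string component;
  int depth = 0;  // How many directories below the root we currently are.
  while (std::getline(stream, component, '/')) {
    if (component.empty() || component == ".") {
      continue;  // "//" and "/./" do not change the directory.
    }
    if (component == "..") {
      if (--depth < 0) return false;  // Climbed above the root.
    } else {
      ++depth;
    }
  }
  return true;
}
```

For the attack URL above, the path after /static/ is ../hw4/http333d.cc, which this check rejects at its first component.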
Fix these two security flaws, assuming they do in fact exist in your server. As a point of reference, in solution_binaries/, we've provided a version of our web server that has both of these flaws in place (http333d_withflaws). Feel free to try it out, but DO NOT leave this server running, as it will potentially expose all of your files to anybody that connects to it.
Congrats, you're done with the CSE 333 project sequence!!
There are two bonus tasks for this assignment. As before, you can do them, or not; if you don't, there will be no negative impact on your grade. You should not attempt either bonus task unless and until the basic assignment is working properly. We will not award any bonus credit if the basic assignment is not substantially correct.
If you want to do any of the bonus parts, first create a hw4-final tag in your repository to mark the version of the assignment with the required parts of the project. That will allow us to more easily evaluate how well you did on the basic requirements of the assignment. Then, when you are done adding additional bonus parts, create a new tag hw4-bonus after committing and pushing your additions, and push the new tag to your GitLab repository. If we find a hw4-bonus tag in your repository we'll grade the extra credit parts; otherwise we'll assume that you just did the required parts.
If you do any of the bonus parts, you should add a file named readme_bonus.txt in your top-level hw4 directory giving a brief summary of the additions in your project and how to use them.
Measure the performance of your web server, using the httperf tool for Linux to generate synthetic load.
You should conduct this performance analysis for a few different usage scenarios; e.g., you could vary the size of the web page you request, and see its impact on the number of pages per second your server can deliver. If you choose to do this bonus task, please include a PDF file in your submission containing relevant performance graphs and analysis.
x words + <bold>hit word</bold> + y words
for one or more of the query words that hit.
This part of the assignment is deliberately open-ended, with much less structure than earlier parts. The (small) amount of extra credit granted will depend on how interesting your extension is and how well it is implemented.
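For the "x words + hit word + y words" format, one possible shape for the snippet-building step is sketched below. MakeSnippet() and its parameters are hypothetical names for illustration. The sketch assumes the document has already been split into words and that the position of the hit is known; how you obtain those is up to your design.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns up to `before` words, the hit word wrapped
// in <bold> tags, and up to `after` words, separated by single spaces.
// Precondition: hit < words.size().
std::string MakeSnippet(const std::vector<std::string>& words,
                        std::size_t hit, std::size_t before,
                        std::size_t after) {
  // Clamp the window to the bounds of the document.
  const std::size_t first = (hit >= before) ? hit - before : 0;
  const std::size_t last = std::min(words.size() - 1, hit + after);
  std::string snippet;
  for (std::size_t i = first; i <= last; ++i) {
    if (i != first) snippet += ' ';
    if (i == hit) {
      snippet += "<bold>" + words[i] + "</bold>";
    } else {
      snippet += words[i];
    }
  }
  return snippet;
}
```

Since the words come from document files, they should be escaped before being placed in the response, for the same reason as untrusted query text.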
As with previous homeworks, you compile your implementation using the make command. This will result in several output files, including an executable called test_suite. You can run all of the tests in that suite with the usual command:
bash$ ./test_suite
You can also run only specific tests by passing command line arguments into test_suite. For example, to only run the HttpConnection tests, you can type:
bash$ ./test_suite --gtest_filter=Test_HttpConnection.*
In general, you can specify which tests are run for any of the tests in the assignment; you just need to know the names of the tests, which can be obtained by running:
bash$ ./test_suite --gtest_list_tests
You can also run test_suite and specify particular tests that should NOT be run. For instance, the ServerSocket tests can take a while to run; to run all tests except for those, enter
bash$ ./test_suite --gtest_filter=-Test_ServerSocket.*
These settings can be helpful for debugging specific parts of the assignment, especially since test_suite can be run with these settings through valgrind and gdb!
In addition to passing tests, your code should be high quality and readable. This includes several aspects:
If you write any (static) helper functions, be sure to provide good comments that explain the function inputs, outputs, and behavior. These comments can often be relatively brief as long as they convey to the reader the information needed to understand how to use the function and what it does when executed.
Use the cpplint.py --clint tool to check for style issues and common coding bugs. Be sure to fix issues reported before submitting your code. Exception: if cpplint reports style problems in the supplied starter code, you should leave that code as-is.
When you are ready to turn in your assignment, you should follow the same procedures you used for previous assignments, except this time tag the repository with hw4-final. Remember to clean up, commit, and push all necessary files to your repository before you add the tag and push it. After you have created and pushed the tag, remember to test everything by creating a new clone of the repository in a separate, empty directory, checking out the hw4-final tag, and verifying that everything works as expected on the CSE Linux environment. Refer to the hw0 turnin instructions for details, and follow those steps carefully.
If you fail to check your work and your project doesn't build properly when the course staff follow the same steps to grade it, you may lose a large amount of the possible credit for the assignment, even if almost everything is actually correct.
If you do any of the bonus parts, create an additional hw4-bonus tag and push that after adding and pushing the bonus code to the repository. Be sure to clone the repo, check out that tag, and verify that everything works as expected. Also verify that the hw4-final tag is still present and that it includes (only) the required parts of the project.
As with previous parts of the project, when you clone your repository to check it, it will normally not include previous solution files like hw1/libhw1.a. You should either run make in the hw1 through hw3 directories to recreate those archives, or else copy the versions from the solution_binaries folders into the right places. These are needed for you to build and test hw4.
We will be basing your grade on several elements: