Out: Fri, Nov 14
Due: Thu, Dec 5 by 8:59 pm
You will build on top of HW3 to implement a multithreaded Web server front-end to your query processor. In Part A, you will read through some of our code to learn about the infrastructure we have built for you. In Part B, you will complete some of our classes and routines to finish the implementation of a simple Web server. In Part C, you will fix some security problems in our Web server.
As before, please read through this entire document before beginning the assignment, and please start early!
In HW4, as with HWs 2 and 3, you don't need to worry about propagating errors back to callers in all situations. You will use calls to Verify333() to spot some kinds of errors and cause your program to crash. However, your web server must handle anything a client does without crashing; only internal issues (such as running out of memory) should cause your web server to crash out.
To help you schedule your time, here's a suggested order for the parts of this assignment. We're not going to enforce a schedule; it's up to you to manage your time.
1. Implement ServerSocket.cc. Make sure to cover all functionality, not just what is in the unit tests.
2. Implement FileReader.cc, which should be very easy, and GetNextRequest() in HttpConnection.cc.
3. Implement ParseRequest() in HttpConnection.cc. This can be tricky, as it involves both Boost and string parsing.
4. Finish the missing parts of http333d.cc. Implement HttpServer_ThrFn() in HttpServer.cc.
5. Implement ProcessFileRequest() and ProcessQueryRequest() in HttpServer.cc.

At this point, you should be able to search the "333gle" site and view the webpages available under /static/, for example, http://localhost:5555/static/bikeapalooza_2011/index.html.
Makefile distributed with the project. In particular, there are reasonable ways to do the necessary string handling without using the Boost Regex library.
Our web server is a fairly straightforward multithreaded application. Every time a client connects to the server, the server dispatches a thread to handle all interactions with that client. Threads do not interact with each other at all, which greatly simplifies the design of the server.
The figure to the right shows the high-level architecture of the server. There is a main class called HttpServer that uses a ServerSocket class to create a listening socket, and then sits in a loop waiting to accept new connections from clients. For each new connection that the HttpServer receives, it dispatches a thread from a ThreadPool class to handle the connection. The dispatched thread springs to life in a function called HttpServer_ThrFn() within the HttpServer.cc file.
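To make the dispatch pattern concrete, here is a minimal sketch of a fixed-size worker pool that an accept loop could hand each connection to. It is not our ThreadPool class: it uses std::thread, a std::queue, and a condition variable, and the class name MiniPool and its Dispatch() method are hypothetical. In a real server each task would be a closure that owns one accepted client socket.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size worker pool: Dispatch() enqueues a task and one of
// the worker threads runs it. Illustration only, not our ThreadPool API.
class MiniPool {
 public:
  explicit MiniPool(int num_threads) {
    for (int i = 0; i < num_threads; i++) {
      workers_.emplace_back([this] { WorkerLoop(); });
    }
  }

  // Lets the workers drain any queued tasks, then joins them.
  ~MiniPool() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      shutting_down_ = true;
    }
    cv_.notify_all();
    for (auto& t : workers_) t.join();
  }

  void Dispatch(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
  }

 private:
  void WorkerLoop() {
    while (true) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return shutting_down_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // shutting down and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run the task without holding the lock
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool shutting_down_ = false;
  std::vector<std::thread> workers_;  // declared last: started after the rest
};
```

Because each task touches only its own connection, the workers never need to coordinate with each other beyond taking work off the shared queue, which matches the "threads do not interact" design described above.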
The HttpServer_ThrFn() function handles reading requests from one client. For each request that the client sends, HttpServer_ThrFn() invokes GetNextRequest() on the HttpConnection object to read in the next request and parse it.
To read a request, the GetNextRequest() method invokes WrappedRead() some number of times until it spots the end of the request. To parse a request, the method invokes the ParseRequest() method (also within HttpConnection). At this point, HttpServer_ThrFn() has a fully parsed HttpRequest object (defined in HttpRequest.h).
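The read loop can be sketched as follows, assuming a request's headers end at the blank line "\r\n\r\n" and that any bytes read past that point belong to the next request and must be kept for the next call. The read function is passed in as a parameter (a hypothetical stand-in for WrappedRead() on a socket) so the sketch runs without a network; ReadUntilHeaderEnd is likewise a hypothetical name.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-in for WrappedRead(): fills buf with up to len bytes and
// returns the count, 0 on EOF, or -1 on error.
using ReadFn = std::function<int(char* buf, int len)>;

// Appends data to *buffer until it contains "\r\n\r\n". On success, stores the
// request (including the terminator) in *request, leaves any extra bytes in
// *buffer for the next call, and returns true. Returns false on EOF or error
// before a complete request has arrived.
bool ReadUntilHeaderEnd(const ReadFn& read_fn, std::string* buffer,
                        std::string* request) {
  const std::string kEnd = "\r\n\r\n";
  size_t pos;
  while ((pos = buffer->find(kEnd)) == std::string::npos) {
    char chunk[1024];
    int n = read_fn(chunk, static_cast<int>(sizeof(chunk)));
    if (n <= 0) return false;
    buffer->append(chunk, n);
  }
  *request = buffer->substr(0, pos + kEnd.size());
  buffer->erase(0, pos + kEnd.size());
  return true;
}
```

Keeping the leftover bytes in a per-connection buffer matters because a client may send two requests back to back, and one read can return the end of the first and the start of the second.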
The next job of HttpServer_ThrFn() is to process the request. To do this, it invokes the ProcessRequest() function, which looks at the request URI to determine whether this is a request for a static file or a request associated with the search functionality. Depending on what it discovers, it invokes either ProcessFileRequest() or ProcessQueryRequest().
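The file-versus-search decision inside ProcessRequest() can be sketched as a simple prefix test, assuming (as in the example URLs in this document) that static files are requested under "/static/" and that everything else is treated as a search request. The RequestKind enum and ClassifyUri function are hypothetical.

```cpp
#include <cassert>
#include <string>

enum class RequestKind { kStaticFile, kQuery };

// Hypothetical router: URIs beginning with "/static/" name files to serve;
// anything else is handed to the search handler.
RequestKind ClassifyUri(const std::string& uri) {
  const std::string kStaticPrefix = "/static/";
  if (uri.compare(0, kStaticPrefix.size(), kStaticPrefix) == 0) {
    return RequestKind::kStaticFile;
  }
  return RequestKind::kQuery;
}
```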
Once those functions return an HttpResponse, HttpServer_ThrFn() invokes the WriteResponse() method on the HttpConnection object to write the response back to the client.
Our web server isn't too complicated, but there is a fair amount of plumbing to get set up. In this part of the assignment, we want you to read through a bunch of lower-level code that we've provided for you. You need to understand how this code works to finish our web server implementation, but we won't have you modify this plumbing.
hw1/, hw2/, hw3/, and projdocs/ directories in it. Use git pull to retrieve the new hw4/ folder with the starter code for this assignment. You will need the hw1/, hw2/, and hw3/ directories in the same folder as your new hw4/ folder since hw4 links to files in those previous directories. Also, as with previous parts of the project, you can use the solution_binaries/ versions of the previous parts of the project if you wish.
Run make to compile the HW4 binaries. One of them is the usual unit test binary called test_suite. Run it, and you'll see the unit tests fail and crash out, so you won't yet earn the automated grading points tallied by the test suite. The second binary is the web server itself: http333d. Try running it to see its command line arguments. When you're ready to run it for real, you can use a command like:
./http333d 5555 ../projdocs unit_test_indices/*

(You might need to pick a different port than 5555 if someone else is using that port on the same machine as you.)
Try running our solution_binaries server using a similar command line:

./solution_binaries/http333d 5555 ../projdocs unit_test_indices/*
Next, use a web browser to explore what the server should look like when it's finished:
If you are running the server on attu over an SSH connection, follow the same steps as above, but navigate to the address for the instance of attu your code is running on. For example, if you are running your code on attu4, you would visit the following addresses:

http://attu4.cs.washington.edu:5555/ and
http://attu4.cs.washington.edu:5555/static/bikeapalooza_2011/Bikeapalooza.html
When you are done with the http333d server, you may find that Control-C doesn't work to shut it down. In that case, use another terminal window on the same machine and run the command

kill pid

where pid is the server process id. Use the ps -u command on the same attu to find that process id.
ThreadPool.h and ThreadPool.cc. You don't need to implement anything in either, but several pieces of the project rely on this code. The header file is well-documented, so it ought to be clear how it's used. (There's also a unit test file that you can peek at.)
HttpUtils.h and HttpUtils.cc. These files define a number of utility functions that the rest of HW4 uses. Make sure that you understand what each of them does, and why.
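As an illustration of the kind of helper that lives in a utilities file like this, here is a minimal sketch of URI percent-decoding, assuming the usual form-encoding conventions ("+" means a space and "%XX" is a hex-encoded byte). DecodeUri and HexValue are hypothetical names; check HttpUtils.h for the functions we actually provide and their exact behavior.

```cpp
#include <cassert>
#include <string>

// Returns the value of a hex digit, or -1 if c is not one.
static int HexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// Hypothetical decoder: turns "+" into a space and "%XX" into the byte 0xXX.
// Malformed escapes (e.g. "%G1" or a trailing "%") are copied through as-is.
std::string DecodeUri(const std::string& in) {
  std::string out;
  for (size_t i = 0; i < in.size(); i++) {
    if (in[i] == '+') {
      out += ' ';
    } else if (in[i] == '%' && i + 2 < in.size() &&
               HexValue(in[i + 1]) >= 0 && HexValue(in[i + 2]) >= 0) {
      out += static_cast<char>(HexValue(in[i + 1]) * 16 + HexValue(in[i + 2]));
      i += 2;  // skip the two hex digits just consumed
    } else {
      out += in[i];
    }
  }
  return out;
}
```

Note that decoding can produce characters such as "<" and ">", which is one reason untrusted input must be escaped again before it is written into an HTML page.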
HttpRequest.h and HttpResponse.h. These files define the HttpRequest and HttpResponse classes, which represent a parsed HTTP request and response, respectively.
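To see how a response turns into bytes on the wire, here is a sketch of serializing an HTTP/1.1 status line, headers, and body. The SimpleResponse struct and ToWireFormat function are hypothetical and only loosely mirror what HttpResponse.h holds; in particular, always deriving a Content-length header from the body is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical response type; see HttpResponse.h for the real class.
struct SimpleResponse {
  std::string protocol = "HTTP/1.1";
  int code = 200;
  std::string message = "OK";
  std::map<std::string, std::string> headers;
  std::string body;
};

// Produces "<protocol> <code> <message>\r\n", one "Name: value\r\n" line per
// header, a Content-length line matching the body, a blank line, then the body.
std::string ToWireFormat(const SimpleResponse& r) {
  std::string out =
      r.protocol + " " + std::to_string(r.code) + " " + r.message + "\r\n";
  for (const auto& kv : r.headers) {
    out += kv.first + ": " + kv.second + "\r\n";
  }
  out += "Content-length: " + std::to_string(r.body.size()) + "\r\n";
  out += "\r\n";
  out += r.body;
  return out;
}
```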
It's time to start coding in Part B.
You are now going to finish a basic implementation of the http333d web server. We'll have you implement some of the event-handling routines at different layers of abstraction in the web server, culminating with generating HTTP and HTML to send to the client.
ServerSocket.h. This file contains a helpful class for creating a server-side listening socket and accepting a new connection from a client. We've provided you with the class declaration in ServerSocket.h but no implementation in ServerSocket.cc; your next job is to build it. You'll need to make the code handle either IPv4 or IPv6 clients. Run the test_suite to see if you make it past the ServerSocket unit tests.
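One part of handling both IPv4 and IPv6 is that the client address filled in by accept() is stored in a struct sockaddr_storage, and its ss_family field says how to interpret it. Here is a minimal sketch of turning such an address into a printable string and a host-order port with inet_ntop(); the helper name AddressToString is hypothetical, and how this fits into the ServerSocket interface is up to you.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: converts an IPv4 or IPv6 socket address into a
// printable address string and host-order port. Returns false for any other
// address family or if inet_ntop() fails.
bool AddressToString(const struct sockaddr_storage& addr, std::string* ip,
                     uint16_t* port) {
  char buf[INET6_ADDRSTRLEN];  // large enough for either family
  if (addr.ss_family == AF_INET) {
    const auto* v4 = reinterpret_cast<const struct sockaddr_in*>(&addr);
    if (inet_ntop(AF_INET, &v4->sin_addr, buf, sizeof(buf)) == nullptr) {
      return false;
    }
    *port = ntohs(v4->sin_port);
  } else if (addr.ss_family == AF_INET6) {
    const auto* v6 = reinterpret_cast<const struct sockaddr_in6*>(&addr);
    if (inet_ntop(AF_INET6, &v6->sin6_addr, buf, sizeof(buf)) == nullptr) {
      return false;
    }
    *port = ntohs(v6->sin6_port);
  } else {
    return false;
  }
  *ip = buf;
  return true;
}
```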
FileReader.h and FileReader.cc. Note that the implementation of FileReader.cc is missing; go ahead and implement it. See if you make it past the FileReader unit tests.
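If it helps to see the shape of the task, here is a sketch of reading an entire file into a std::string with the C++ standard library. It uses std::ifstream, which may or may not be the approach you choose, and the free function ReadWholeFile is hypothetical rather than the FileReader interface. Opening the file in binary mode is the important detail, since the server also serves images and other non-text files.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the entire file at path into *contents, in binary
// mode so that non-text files come through byte-for-byte. Returns false if the
// file cannot be opened or a read error occurs.
bool ReadWholeFile(const std::string& path, std::string* contents) {
  std::ifstream in(path, std::ios::in | std::ios::binary);
  if (!in) return false;
  std::ostringstream ss;
  ss << in.rdbuf();  // copy the whole stream buffer
  if (in.bad()) return false;
  *contents = ss.str();
  return true;
}
```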
HttpConnection.h and HttpConnection.cc. The two major functions in HttpConnection.cc have their implementations missing, but have generous comments for you to follow. Implement the missing functions, and see if you make it past the HttpConnection unit tests.
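A sketch of the parsing step, assuming the request text has already been split off at "\r\n\r\n": the first line has the form "METHOD URI PROTOCOL", and each following line has the form "Name: value". It uses only std::string operations (no Boost), lowercases header names so later lookups are case-insensitive, and fills a hypothetical ParsedRequest struct rather than our HttpRequest class. Follow the comments in HttpConnection.cc for the exact rules on normalization and malformed lines.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical result type; see HttpRequest.h for the real class.
struct ParsedRequest {
  std::string uri;
  std::map<std::string, std::string> headers;
};

// Trims leading and trailing spaces and tabs.
static std::string Trim(const std::string& s) {
  size_t b = s.find_first_not_of(" \t");
  if (b == std::string::npos) return "";
  size_t e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

// Hypothetical parser. Returns false if the request line is malformed.
bool ParseRequestText(const std::string& text, ParsedRequest* out) {
  size_t line_end = text.find("\r\n");
  std::string first = text.substr(0, line_end);
  // Request line, e.g. "GET /static/index.html HTTP/1.1".
  size_t sp1 = first.find(' ');
  if (sp1 == std::string::npos) return false;
  size_t sp2 = first.find(' ', sp1 + 1);
  if (sp2 == std::string::npos) return false;
  out->uri = first.substr(sp1 + 1, sp2 - sp1 - 1);

  // Header lines; lines without a colon (including the final blank line)
  // are skipped.
  size_t pos = (line_end == std::string::npos) ? text.size() : line_end + 2;
  while (pos < text.size()) {
    size_t end = text.find("\r\n", pos);
    if (end == std::string::npos) end = text.size();
    std::string line = text.substr(pos, end - pos);
    pos = end + 2;
    size_t colon = line.find(':');
    if (colon == std::string::npos) continue;
    std::string name = Trim(line.substr(0, colon));
    std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
      return static_cast<char>(std::tolower(c));
    });
    out->headers[name] = Trim(line.substr(colon + 1));
  }
  return true;
}
```

Splitting only at the first colon keeps values such as "localhost:5555" intact.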
HttpServer.cc, HttpServer.h, and http333d.cc. Note that some parts of HttpServer.cc and http333d.cc are missing. Go ahead and implement those missing functions. Once you have them working, test your http333d binary to see if it works. Make sure you exercise both the web search functionality and the static file serving functionality. You'll probably need to look at the source of pages that our solution binary serves and emulate that HTML to get the same "look and feel" to your server as ours.
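On the search side of ProcessQueryRequest(), the query words have to be pulled out of the request and normalized before they are looked up. Here is a sketch of the last part of that step, assuming the search string has already been extracted from the URI and URI-decoded, and that words are matched case-insensitively by lowercasing them. ExtractQueryWords is a hypothetical name; check our solution binary's HTML form to see how the query is actually encoded in the URI.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits an already-decoded search string on whitespace
// and lowercases each word, e.g. "Bike  Race" -> {"bike", "race"}.
std::vector<std::string> ExtractQueryWords(const std::string& decoded_terms) {
  std::vector<std::string> words;
  std::istringstream in(decoded_terms);
  std::string word;
  while (in >> word) {  // operator>> skips any run of whitespace
    for (char& c : word) {
      c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    words.push_back(word);
  }
  return words;
}
```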
At this point, your web server should run correctly, and everything should compile with no warnings. Try running your web server and connecting to it from a browser as described above. Also try running the test_suite under valgrind to make sure there are no memory issues. Finally, launch the web server under valgrind to make sure there are no issues or leaks; after the web server has launched, exercise it by issuing a few queries, then kill the web server. (The supplied code does have some leaks, but your code should not make things significantly worse.)
Now that the basic web server works, you will discover that your web server (probably) has two security vulnerabilities. We are going to point these out to you, and you will repair them.
It's likely at this point that your implementation has two security flaws. (However, please note: it is possible that the way you implemented things above means you have already dealt with these flaws).
hello <script>alert("Boo!");</script>

To fix this flaw, you need to "escape" untrusted input from the client before you relay it to output. We've provided you with an escape function in HttpUtils.
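The idea behind escaping is to replace every character that HTML treats specially with its character entity, so the browser displays the text instead of interpreting it as markup. Here is a minimal sketch; EscapeForHtml is a hypothetical name, and in your server you should use the function we provide in HttpUtils rather than this one.

```cpp
#include <cassert>
#include <string>

// Hypothetical escaper: replaces the five HTML-special characters with
// character entities so untrusted text cannot inject tags or attributes.
std::string EscapeForHtml(const std::string& in) {
  std::string out;
  out.reserve(in.size());
  for (char c : in) {
    switch (c) {
      case '&':  out += "&amp;";  break;
      case '<':  out += "&lt;";   break;
      case '>':  out += "&gt;";   break;
      case '"':  out += "&quot;"; break;
      case '\'': out += "&#39;";  break;
      default:   out += c;        break;
    }
  }
  return out;
}
```

With this applied, the malicious query above is shown on the results page as literal text, and the browser never runs the script.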
/static/../hw4/http333d.cc

This is called a directory traversal attack. Instead of trusting the file pathname provided by a client, you need to normalize the path and verify that it names a file within your document subdirectory tree (which would be ../projdocs/ if the example command shown in Part A was used to start the server). If the path names something outside of that subdirectory, you should return an error message instead of the file contents. We've provided you with a function in HttpUtils.h to help you test whether a path is safe.
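The core of the traversal check is to normalize both the document root and the requested path, then test whether the requested path still lies inside the root. Here is a sketch using C++17 std::filesystem; it is not the helper we provide in HttpUtils.h, and IsUnderRoot is a hypothetical name. Note that lexically_normal() only collapses "." and ".." textually and does not resolve symbolic links; a stricter check would resolve paths against the file system (for example with realpath()).

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical check: returns true if `requested`, after collapsing "." and
// ".." components, names something inside `root`. Both paths are made
// absolute first so relative roots such as "../projdocs" compare correctly.
bool IsUnderRoot(const std::string& root, const std::string& requested) {
  fs::path r = fs::absolute(root).lexically_normal();
  fs::path p = fs::absolute(requested).lexically_normal();
  // Compare component by component, so "/docs2" is not mistaken for being
  // inside "/docs" the way a plain string-prefix test would.
  auto r_it = r.begin();
  auto p_it = p.begin();
  for (; r_it != r.end(); ++r_it, ++p_it) {
    if (r_it->empty()) break;  // trailing "/" on the root: an empty element
    if (p_it == p.end() || *p_it != *r_it) return false;
  }
  return true;
}
```

The attack URI above maps to a path like ../projdocs/../hw4/http333d.cc, which normalizes to ../hw4/http333d.cc and is therefore rejected.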
Fix these two security flaws, assuming they do in fact exist in your server. As a point of reference, in solution_binaries/, we've provided a version of our web server that has both of these flaws in place (http333d_withflaws). Feel free to try it out, but DO NOT leave this server running, as it will potentially expose all of your files to anybody who connects to it.
Congrats, you're done with the HW4 project sequence!!
There are two bonus tasks for this assignment. As before, you can do them, or not; if you don't, there will be no negative impact on your grade. You should not attempt either bonus task unless and until the basic assignment is working properly. We will not award any bonus credit if the basic assignment is not substantially correct.
If you want to do any of the bonus parts, first create a hw4-final tag in your repository to mark the version of the assignment with the required parts of the project. That will allow us to more easily evaluate how well you did on the basic requirements of the assignment. Then, when you are done adding additional bonus parts, create a new tag hw4-bonus after committing and pushing your additions, and push the new tag to your GitLab repository. If we find a hw4-bonus tag in your repository, we'll grade the extra credit parts; otherwise we'll assume that you just did the required parts.
If you do any of the bonus parts, you should add a file named readme_bonus.txt in your top-level hw4 directory giving a brief summary of the additions in your project.
httperf tool for Linux to generate synthetic load.
You should conduct this performance analysis for a few different usage scenarios; e.g., you could vary the size of the web page you request, and see its impact on the number of pages per second your server can deliver. If you choose to do the bonus, please include a PDF file in your submission containing relevant performance graphs and analysis.
x words + <bold>hit word</bold> + y words

for one or more of the query words that hit. If you do this part of the assignment, please add a readme_bonus.txt file to your hw4 directory describing what you've added and how to use it.
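One way to build such a snippet is to keep a window of x words before and y words after a hit. This sketch assumes the page text has already been split into words and that you already know the index of the hit; it uses <b> for the highlight, and MakeSnippet and its parameters are hypothetical. Choosing which hit to show is up to you, and the words themselves should still be HTML-escaped before the tags are added.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical snippet builder: returns up to `before` words preceding the
// word at index `hit`, the hit word wrapped in <b>...</b>, and up to `after`
// words following it, separated by single spaces. Returns "" if `hit` is out
// of range.
std::string MakeSnippet(const std::vector<std::string>& words, size_t hit,
                        size_t before, size_t after) {
  if (hit >= words.size()) return "";
  size_t start = hit >= before ? hit - before : 0;
  size_t end = std::min(words.size(), hit + after + 1);  // one past last word
  std::string out;
  for (size_t i = start; i < end; i++) {
    if (i > start) out += ' ';
    if (i == hit) {
      out += "<b>" + words[i] + "</b>";
    } else {
      out += words[i];
    }
  }
  return out;
}
```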
This part of the assignment is deliberately open-ended, with much less structure than earlier parts. The (small) amount of extra credit granted will depend on how interesting your extension is and how well it is implemented.
When you are ready to turn in your assignment, you should follow the same procedures you used for previous assignments, except this time tag the repository with hw4-final. Remember to clean up, commit, and push all necessary files to your repository before you add the tag and push it. After you have created and pushed the tag, remember to test everything by creating a new clone of the repository in a separate, empty directory, checking out the hw4-final tag, and verifying that everything works as expected. Refer to the hw0 turnin instructions for details, and follow those steps carefully.
If you do any of the bonus parts, create an additional hw4-bonus tag and push that after adding and pushing the bonus code to the repository. Be sure to clone the repo, check out that tag, and verify that everything works as expected. Also verify that the hw4-final tag is still present and that it includes (only) the required parts of the project.
As with previous parts of the project, when you clone your repository to check it, it will normally not include previous solution files like hw1/libhw1.a. You should either run make in the hw1 through hw3 directories to recreate those archives, or else copy the versions from the solution_binaries folders into the right places. These are needed for you to build and test hw4.
We will be basing your grade on several elements: