CSE 550 Problem Set 1

Contents

Overview
- Common Specification For Both A and B
- Submission specification
Part A: Piping Shell
Part B: AMTED Server

Overview

For this problem set, you will be implementing two (hopefully simple!) pieces of systems software: (A) a basic UNIX shell, and (B) a basic AMTED server.

The requirements for your software are specified using the following conventions:

MUST: The minimum requirements a solution should meet to receive a passing grade.
SHOULD: Describes a well-executed solution. The extent to which a solution follows the SHOULD guidelines determines the points received beyond a passing grade.
CAN: Optional.

Common Specification For Both A and B

MUST

Be implemented in C or C++, but C might be easier in some ways.
Compile and run on the UW CSE Linux distribution. If you like, you can use one of the UW CSE home VMs to develop on; note that VMware is free for UW students.
Document borrowed code.

SHOULD

Compile without warnings.
Avoid memory errors. Use a tool such as valgrind for detection and debugging. (Look for "definitely lost" messages from valgrind.)
Not leak memory, processes, descriptors, etc.
Be well-structured code: clean module interfaces, a nice decomposition, good comments.
Check for error conditions and take reasonable action (exiting the program is generally reasonable).

CAN

Use variants of the system calls we suggest, as long as MUST-2 is met.

Submission specification

MUST

Your submission must be under a single directory called problemset1. This directory must contain two directories, parta and partb, and a README.txt file.
The README.txt file must include the name, student number, and UW email address for all team members. README.txt should also document anything of interest about your solution such as known issues or extra features available.
parta must contain the source code for Part A, and partb must contain the source code for Part B.
Both parta and partb must have their own Makefile. Running make in parta must produce an executable called 550shell. Running make in partb must produce an executable called 550server.
One submission per team.
Submission format is a tarball named problemset1_submission_${UWNETID}.tar.gz. For teams a single uwnetid in the tarball name is sufficient. You can create a submission tarball by running the following command from the parent directory of problemset1 (replacing ${UWNETID} with your email account name):
```
tar -cvzf problemset1_submission_${UWNETID}.tar.gz problemset1
```
For example, since my email account is "arvind", I would run the command:
```
tar -cvzf problemset1_submission_arvind.tar.gz problemset1
```
Use the course dropbox to submit the tarball. There is a link on the course website.

Part A: Piping Shell

Your job is to implement a very simple UNIX shell capable of executing commands through a pipe. Our goal for this assignment is two-fold: to exercise your C/C++/Linux skills, and to get you to think about a clean design that handles the concurrency aspect of the shell. Because of this, you shouldn't worry about any of the more complex aspects of shells, like supporting command line arguments, environment variables, or other shell features (&, >, etc.).

All you need to do is handle the execution of programs and being able to pipe stdout from one program to the stdin of another. More specifically, if the user enters the line ls | wc, your shell program should fork off the two programs, which together calculate the number of files in the current directory. Similarly, if the user enters ls | head | wc, you'll need to arrange to fork three processes.

Part A: Specification

MUST

Read commands to execute from stdin. A command is terminated by a newline character ('\n') and consists of one or more sequence of programs separated by the string " | ".
Continue reading and executing commands until stdin returns EOF.
Wait for the current command to terminate before starting the next command.
The child programs of a command must execute in parallel.
Programs can be named by either an absolute path or just by the program name (exec*p should handle this for you).

CAN

Print an easy-to-parse prompt to stdout.
Use reasonable, hard-coded maximum sizes to simplify your code.

Part A: Hints

To solve this programming exercise, you'll need to use some (but perhaps not all) of the following Linux system calls: fork, exec*p, close, pipe, dup2, and wait. The semantics of these system calls are very particular, and fork/wait have different semantics than in the Mesa paper, so read their man pages carefully. As a hint, you should use pipe to create a pipe, fork to execute the programs, and dup2 to replace stdin or stdout, and wait to wait for the processes to terminate, probably in that order.

You might also want to look at the readline() system call (man 3 readline) available on Linux, as well as strtok() in libC. Together, those calls will simplify the job of parsing user input in your shell.

Part B: AMTED Server

Part B: Overview

Your job is to implement in C/C++/Linux a simple asymmetric multi-threaded event driven file server (AMTED), similar to the AMPED server architecture described in the Flash paper. Your server will read an absolute file path from a client, return the file contents over TCP, then close the connection.

In a nutshell, your implementation should have three pieces:

The main thread of your server handles all network I/O. It is responsible for accepting incoming TCP network connections, reading the name of a file (absolute file path) from a network connection, arranging to read the content of the file into memory, writing the file content over the network connection, then closing the network connection.
A threadpool implementation; this module should permit customers to dispatch a thread from the threadpool to a worker function. The threadpool is made up of a fixed number of long-lived threads. You should use the pthreads interface on Linux for managing threads and synchronizing access to shared data.
A worker function that receives the name of a file, reads the file content into the address space of the process, and arranges to return (a pointer to) the file content to whoever dispatched the thread to the worker function.

Part B: Specification

MUST

Your executable, 550server, must take two command line arguments that specify, respectively, the network address (IPv4) and port that the server opens a listening TCP socket on. For example, the command line:
```
$ ./550server 127.0.0.1 9000
```
should run your server so that it listens to address 127.0.0.1 and port 9000.
The 550server protocol is as follows. After connecting, the client will send an absolute path to a file followed by a newline character ('\n'). Each character in the path is a single byte. The server will send the file contents in response then close the connection.
Serve multiple clients concurrently.
The main thread must be event-driven and use non-blocking network I/O. Disk I/O must happen in a worker thread and should be blocking.
Avoid concurrency bugs. Use synchronization appropriately to protect data shared between threads. If you decide to let multiple threads read and write the same location in memory without any synchronization, then provide evidence with references in your README.txt for why it is safe and correct.

SHOULD

All threads (including the main thread), should wait in a blocking call when there is nothing for that thread to do instead of spinning.
A thread should not continue waiting if there is an event for that thread (i.e. there is work for that thread to do).
Should be robust to invalid client input (say, a path that doesn't exist) by closing the client connection instead of crashing.
Minimize reading and copying of buffers, i.e. client messages and file data.

CAN

Use reasonable, hard-coded maximum sizes to simplify your code. Here are some example maximums that won't interfere with our tests:
- 16 concurrent clients
- 254 characters in a path
- 2 MB file size
Use an open-source networking library as long as your submission contains everything needed to compile and execute on the UW CSE Linux distribution.

Part B: Hints

Read the man pages of system calls carefully to make sure you are using the correct flags and understand the semantics.

If you are unfamiliar with C network programming, we suggest starting with Beej's Guide to Network Programming.

We suggest using the program nc to test your server.

This assignment doesn't require a massive amount of code, but if you've never implemented an event-driven server before, it can be a little mind-bending. You need to use asynchronous I/O to handle all network interactions, and you'll need to keep some kind of finite state machine representation of each connection to know whether you are waiting for network input, waiting for file input from the thread pool, waiting to be able to write data back over the network, etc.

Use Google and stackoverflow.com to get hints on all the things you'll need to do. Some hints:

We suggest using epoll() or poll() to manage a set of file descriptors and receive event notifications associated with them.
You can use sigaction() to tell Linux not to send your process SIGPIPE signals whenever a client closes its end of a TCP connection.
You can use fcntl() twice to set a socket to non-blocking mode, so you can do asynchronous I/O. (The first call to get some flags; then you modify the flags, and call it again to set them in the OS.) Note that on Linux a socket created by accept() does not inherit it's parent socket's flags.
You'll need to read up on how to set up a listening socket, accept an incoming connection, read/write over a socket, and close the connection. There are many, many web pages that describe how to do this in C, and you might be able to find some open source libraries that simplify this task. (You'll need to make sure they support non-blocking I/O, or you'll need to enhance them to use support it.)
A worker thread will need to figure out how to notify the main event thread that it has finished reading a file. One way to solve this is by using one or more pipes to communicate between threads, and adding these pipes to the set of polled file descriptors.