Project 0 asks you to implement what might be described as a client-server data transfer application. We don't actually need dozens of implementations of data transfer, though, so the real point is the experience of implementing communication protocols. While you won't directly re-use your Project 0 code in later projects, you will have to implement very similar functionality in all of them. This is a chance to explore implementation approaches, because your choice could have a big effect on how much time those later projects will take.
The operation of communication protocols always involves asynchrony, which can be an implementation challenge. Project 0 gives you experience implementing protocols using two approaches:
To achieve the goals of this project, you must work in pairs. That is, there are two different developers involved. Project 0 has distinct client and server sides, so there are two distinct pieces of code involved in a full solution. We take advantage of that by dividing the work in this way:
This schedule allows you to (a) experience both implementation approaches, (b) make progress independently of your partner (since your own client and your server should interoperate, even if implemented in different languages), and (c) experience having your code interoperate with code written by someone else. When you're done, all four combinations of client and server choices should work together.
P0P, the Project 0 Protocol, defines what messages are sent between the client and server, and how they are encoded. (There is a mail protocol, POP, with a similar-looking name, but the two have nothing to do with each other.) P0P is designed to support the Project 0 application. The Project 0 application transfers lines of input from the client's stdin to the server, which then prints them on its stdout.
Protocol Headers
P0P is much more realistic than the protocols
shown in sections and class, in large part because it defines a
message as a header plus data (rather than just data). Without a
header, you can have only one kind of message, and so can't do simple
things you almost certainly need to do, even if you don't realize it yet.
(One example is returning an error indication, even
though we don't do that in this project.)
Protocols you design should always include headers in message encodings.
Protocol Sessions
P0P supports the notion of a session.
A session is a related sequence of messages coming from
a single client.
Sessions allow the server to maintain state about each individual client.
For instance, the server could, in theory, print out how many
messages it has received in each session, or
it could maintain a shopping cart for each session.
(We don't actually implement either of those.)
Unlike TCP (which has "connections"), UDP doesn't have any notion related to sessions, so we build them as part of our protocol.
Protocol Messages and Format
P0P defines four message types: HELLO,
DATA, ALIVE, and GOODBYE.
All message encodings include a header. The header is filled
with binary values. The header bytes are the initial
bytes of the message. They look like this, with fields sent in order
from left to right:
In DATA messages, the header is followed by arbitrary data; the other messages do not have any data. The receiver can determine the amount of data that was sent by subtracting the known header length from the length of the UDP packet. The language/package you use will provide some way of obtaining the packet length.
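As a minimal sketch of that subtraction, the function below splits one received datagram into header and payload. The function name and the `header_len` parameter are hypothetical; the actual header length comes from the P0P header layout and is deliberately passed in rather than assumed here.

```python
from typing import Tuple


def split_p0p_message(packet: bytes, header_len: int) -> Tuple[bytes, bytes]:
    """Split one received UDP datagram into (header, payload).

    header_len is an assumption supplied by the caller: the fixed size of
    the P0P header in bytes. The payload is whatever follows the header,
    so its length is len(packet) - header_len.
    """
    if len(packet) < header_len:
        # Too short to hold a full header; the caller decides what to do.
        raise ValueError("datagram shorter than the P0P header")
    return packet[:header_len], packet[header_len:]
```

In Python, `socket.recvfrom` returns the whole datagram as a `bytes` object, so `len(packet)` is the UDP payload length. For messages other than DATA, the payload part should simply be empty.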
Only one P0P message may be sent in a single UDP packet, and all P0P messages must fit in a single UDP packet. P0P itself does not define either maximum or minimum DATA payload sizes. It expects that all reasonable implementations will accept data payloads that are considerably larger than a typical single line of typed input.
Server
The server sits waiting for input from the network or from stdin. Execution of the server ends when either end of file is detected on stdin or the input line is "q". When the server quits, it sends GOODBYE messages to all sessions (described shortly) thought to be currently active.
When a new packet arrives, the magic number and version are checked. The packet is silently discarded unless those fields match the expected values.
Next, the server examines the session id field. If a session with that id has already been established, it hands the packet to that session. Otherwise, it creates a new session and hands the packet to it.
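One way to picture this dispatch step is a table keyed by session id. The sketch below is only illustrative: the `Session` and `SessionTable` classes and their method names are hypothetical, and a real session would run the session FSA rather than just record packets.

```python
from typing import Dict, List


class Session:
    """Hypothetical per-client state holder."""

    def __init__(self, session_id: int) -> None:
        self.session_id = session_id
        self.packets: List[bytes] = []

    def handle(self, packet: bytes) -> None:
        # Placeholder: a real session would drive its FSA here.
        self.packets.append(packet)


class SessionTable:
    """Maps session ids to sessions, creating sessions on first sight."""

    def __init__(self) -> None:
        self._sessions: Dict[int, Session] = {}

    def dispatch(self, session_id: int, packet: bytes) -> Session:
        session = self._sessions.get(session_id)
        if session is None:
            # Unknown id: create a new session, as described above.
            session = Session(session_id)
            self._sessions[session_id] = session
        session.handle(packet)
        return session

    def remove(self, session_id: int) -> None:
        # Called when a session terminates.
        self._sessions.pop(session_id, None)

    def __len__(self) -> int:
        return len(self._sessions)
```

Keeping the table separate from the sessions makes it easy for the server to send GOODBYE to every active session when it quits.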
Server Session
Server sessions operate according to the following FSA diagram:
Here transition labels are of the form event / action(s), meaning "when an event of the specified type occurs while in the source state, take the actions specified and transition to the destination state." For instance, the transition HELLO / HELLO; Set timer means that when a HELLO message arrives, reply with a HELLO message and also set a timer that will raise an event at some later time (unless it is canceled), and then transition to state Receive.
Sessions are created (by the server) after receiving a message with a new session id. The newly created session checks that this initial message is a HELLO, and terminates if not. Because the client may never send a GOODBYE, sessions garbage collect themselves by setting an inactivity timer. If no message is received from the client for too long, a GOODBYE is sent and the session terminates.
When a DATA message is received from the client, its data payload is printed to stdout. To give the client a way to determine that the session is still up, the session sends an ALIVE message in response to each DATA message it receives.
The server session should keep track of the client sequence number it expects next. That is, if it has just processed a packet with sequence number 10, it should remember that the next sequence number expected is 11. If the next packet received has a sequence number greater than the expected number, a "lost packet" message should be printed for each missing sequence number. If the next packet's sequence number repeats the last received packet number, a "duplicate packet" message is printed and the packet is discarded. If the next packet's sequence number is less than the last packet's sequence number, we assume it is caused by a protocol error and the session closes: it sends a GOODBYE and transitions to DONE. (Sequence numbers "from the past" can be caused by the Internet delivering packets "out of order." That can occur, and more realistic protocols would want to deal with it more gracefully.)
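The four cases above can be sketched as a small pure function. The function name, the verdict strings, and the return shape are assumptions made for illustration; only the comparison rules come from the text.

```python
from typing import List, Tuple


def check_sequence(expected: int, received: int) -> Tuple[str, List[int], int]:
    """Classify a received sequence number.

    Returns (verdict, lost, next_expected), where lost lists the sequence
    numbers that should each produce a "lost packet" message.
    """
    if received == expected:
        return "ok", [], expected + 1
    if received > expected:
        # Every number from expected up to received - 1 went missing.
        return "ok", list(range(expected, received)), received + 1
    if received == expected - 1:
        # Same number as the last packet processed: discard it.
        return "duplicate", [], expected
    # Older than the last packet processed: treated as a protocol error.
    return "protocol_error", [], expected
```

For example, if the session has just processed packet 10 (so it expects 11) and packet 14 arrives, the result reports 11, 12, and 13 as lost and sets the next expected number to 15.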
If the server session receives a message for which there is no transition in its current state, it is a protocol error. In that case, it closes. For instance, receiving a HELLO while in Receive is an indication something is seriously wrong, and the only option is to close.
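The transitions described in the prose of this section can be sketched as a transition function. This is reconstructed from the text only and is not a substitute for the FSA diagram. The initial state name `Start`, the `TIMEOUT` event name, the action strings, resetting the timer on each DATA message, and terminating silently when the first message is not a HELLO are all assumptions.

```python
from typing import List, Tuple

# State names: "Receive" and "Done" appear in the text; "Start" is assumed.
START, RECEIVE, DONE = "Start", "Receive", "Done"


def step(state: str, event: str) -> Tuple[str, List[str]]:
    """Return (next_state, actions) for one event in the given state."""
    if state == START:
        if event == "HELLO":
            return RECEIVE, ["send HELLO", "set timer"]
        # First message was not a HELLO: terminate (assumed: silently).
        return DONE, []
    if state == RECEIVE:
        if event == "DATA":
            # Assumption: each DATA message restarts the inactivity timer.
            return RECEIVE, ["print payload", "send ALIVE", "set timer"]
        if event == "GOODBYE":
            return DONE, ["send GOODBYE"]
        if event == "TIMEOUT":
            return DONE, ["send GOODBYE"]
        # No transition for this event: protocol error, so close.
        return DONE, ["send GOODBYE"]
    # Already done: ignore everything.
    return DONE, []
```

Writing the FSA as a function from (state, event) to (state, actions) keeps the protocol logic separate from sockets and timers, which makes it easy to test without a network.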
Client Behavior
The client operates according to the following FSA:
Basically, the client sends lines of input to the server. If no packets were ever lost, the client would receive a packet back for every one it sends: a HELLO in response to a sent HELLO, an ALIVE in response to a DATA, and a GOODBYE in response to a GOODBYE. If no response is received within a timeout, the client takes that as an indication the server is not running, and so the client terminates. (Note that this is not a very good assumption as the problem could just be a single dropped packet. We'll look at how to do a better job of guessing whether or not the server is really there later in the course.)
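A blocking wait with a timeout is one simple way to implement the "no response within a timeout" check. The sketch below uses Python's standard `socket` module; the function name, the buffer size, and the choice of a blocking wait (rather than an event loop) are assumptions, and the timeout value is left to the caller.

```python
import socket
from typing import Optional


def wait_for_reply(sock: socket.socket, timeout_s: float,
                   bufsize: int = 65535) -> Optional[bytes]:
    """Wait up to timeout_s seconds for one datagram.

    Returns the datagram, or None if nothing arrived in time. The caller
    would treat None as "server not running" and terminate.
    """
    sock.settimeout(timeout_s)
    try:
        data, _addr = sock.recvfrom(bufsize)
    except socket.timeout:
        return None
    return data
```

A client built this way would call `wait_for_reply` after each message it sends. Note that a blocking wait like this cannot also watch stdin at the same time, which is one reason the choice of implementation approach matters.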
Session termination normally starts with the client, and involves a GOODBYE message in each direction. However, if the client receives a GOODBYE, it believes that the server is gone, no matter what the client's current state, and so it transitions immediately to the Closed state.
The client shuts down when it detects end of file on stdin and the input is coming from a tty, or when the input line is "q" and input is from a tty. If stdin is connected to a file and end of file is reached, the client should try to shut down after all outgoing messages have been put on the network. (Whether or not you can do that might depend on the implementation language/package you're using.)
Suppose you run two client instances, back to back, like this:
$ ./client localhost 1234
one
two
three
eof
$ ./client localhost 1234
foo
bar
eof
Here eof is printed by the client program to indicate that end of file has occurred on stdin. (On Linux, type ctrl-d.) The other lines are what the user typed.
The server's output should look like this:
$ ./server 1234
Waiting on port 1234...
0x736f0b1f [0] Session created
0x736f0b1f [1] one
0x736f0b1f [2] two
0x736f0b1f [3] three
0x736f0b1f [4] GOODBYE from client.
0x736f0b1f Session closed
0x545537a9 [0] Session created
0x545537a9 [1] foo
0x545537a9 [2] bar
0x545537a9 [3] GOODBYE from client.
0x545537a9 Session closed
The first value on each interesting line is the session id. The number in square brackets is the sequence number of the packet that caused this output line.
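A small helper can produce lines in this shape. The function name is hypothetical, and zero-padding the session id to eight hex digits is an assumption based on the sample output, where every id happens to show eight digits.

```python
from typing import Optional


def format_line(session_id: int, text: str, seq: Optional[int] = None) -> str:
    """Format one server output line.

    With a sequence number:    0x736f0b1f [1] one
    Without a sequence number: 0x736f0b1f Session closed
    """
    prefix = f"0x{session_id:08x}"  # assumed: 8 lowercase hex digits
    if seq is None:
        return f"{prefix} {text}"
    return f"{prefix} [{seq}] {text}"
```

For instance, `format_line(0x736f0b1f, "one", 1)` gives the second session line of the sample above, and omitting `seq` gives the "Session closed" form.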
Redirecting stdin to read from a file makes it possible to offer so much input so fast that packets are lost. For example:
$ ./client localhost 1234 <Dostoyevsky.txt
eof
produced this server output:
Waiting on port 1234...
0x149780c3 [0] Session created
0x149780c3 [1] The Project Gutenberg EBook of The Brothers Karamazov by Fyodor
0x149780c3 [2] Dostoyevsky
0x149780c3 [3]
0x149780c3 [4]
0x149780c3 [5]
0x149780c3 [6] This eBook is for the use of anyone anywhere at no cost and with almost no
0x149780c3 [7] restrictions whatsoever. You may copy it, give it away or re-use it under
0x149780c3 [8] the terms of the Project Gutenberg License included with this eBook or
0x149780c3 [9] online at http://www.gutenberg.org/license
0x149780c3 [10]
0x149780c3 [11]
0x149780c3 [12]
0x149780c3 [13] Title: The Brothers Karamazov
0x149780c3 [14]
0x149780c3 [15] Author: Fyodor Dostoyevsky
0x149780c3 [16]
...eliding many lines...
0x149780c3 [456] unhappy young woman, kept in terror from her childhood, fell into that
0x149780c3 [457] kind of nervous disease which is most frequently found in peasant women
0x149780c3 [458] who are said to be “possessed by devils.” At times after terrible fits of
0x149780c3 [459] hysterics she even lost her reason. Yet she bore Fyodor Pavlovitch two
0x149780c3 [460] sons, Ivan and Alexey, the eldest in the first year of marriage and the
0x149780c3 [461] Lost packet!
0x149780c3 [462] Lost packet!
0x149780c3 [463] Lost packet!
0x149780c3 [464] Lost packet!
0x149780c3 [465] Lost packet!
0x149780c3 [466] looked after by the same Grigory and lived in his cottage, where they were
0x149780c3 [467] Lost packet!
0x149780c3 [468] Lost packet!
0x149780c3 [469] Lost packet!
...eliding many lines...
Exactly which packets are dropped is non-deterministic. Your output should match the above, except for differences caused by the non-determinism.
You can cause two clients to be concurrently active with a single server using a shell script like this one:
#!/bin/bash
./client localhost 1234 <Dostoyevsky.txt >dual-c1.out 2>&1 &
./client localhost 1234 <Dostoyevsky.txt >dual-c2.out 2>&1 &
Start the server, redirecting its output to some file, and then execute the script.
Doing that should produce server output that shows the two clients are running concurrently, as in this example output file. (This output is also non-deterministic.)
For the testing component of grading, we will run on attu. You should verify that your program can be invoked there in the fashion described above, and that it runs successfully. You should verify that your submission works across the Internet by running client and server on different machines (e.g., attu1 and attu2).
Note that it is very hard to verify that your code will run for us on attu. It may be that it works for you, because you have some environment variable setting your code relies on, but fails for us, because we don't. We suggest testing by launching a shell that has a minimal environment and running there:
$ env -i bash     # launch a shell with a minimal environment
$ ./server 1234   # now use that shell..
...               # ctrl-d to terminate the minimal environment shell
[This is perhaps outdated. It's likely we will switch to git turn in. However, we'll want the directory structure shown here even if we do.]
There should be only one submission per team. You should turn in a .tar.gz file that, when unpacked, creates the directory structure shown here:
A: Nat Guy
B: John Zahorjan
Those lines may be followed by any number of lines giving additional comments or information.
We'll do things like this:
$ tar xf guy-zahorjan-proj0.tar.gz
$ cd guy-zahorjan-proj0
$ cat readme.txt
$ cd A
$ ./server 5432
All those commands should succeed. (They won't if you don't follow the details of this section.)