CSE 451 Autumn 2000
|
Out: 9 October, 2000
Due: 19 October, 2000
- Highlight the special problems in writing multi-threaded programs (as contrasted with the single-threaded programs you're accustomed to).
- Employ some common synchronization mechanisms used to solve these problems
- In particular, experience race conditions and the potential for deadlock, and use locks, condition variables, and/or semaphores.
- Use multi-threading to simplify programming a responsive (asynchronous) user interface.
- Have some tangential exposure to sockets and client/server programming.
You will start from a working ("far as I can tell") version of a "chat monitor." The program connects to an ongoing, multi-person conversation. Each chatter in the conversation is typing messages for everyone to see. The chat monitor displays these messages as they are being typed. (It's a "chat monitor" because, unlike a chat program, you are not yourself a chatter - nothing you type will be displayed by anyone, nor will your program display anything typed by the person sitting next to you.)
The initial version of the chat monitor is single threaded. Basically, it does this:
- pick a chatter to listen to
- read the characters the chatter is typing, and display them on the screen
- when the chatter finishes with his/her current message, switch to a different chatter and repeat
Additionally, the program allows the viewer to scroll the output up the screen using the "down arrow" key. Because the program is single-threaded, there is only a certain point in its execution during which it attempts to read any typed keys. In the current implementation, this means that scrolling works only while the chat monitor is active reading and displaying characters typed by some chatter.
The assignment is to make the chat monitor multi-threaded in a way that allows the chat monitor's behavior to be changed so that:
- Characters are added to the displayed messages of all users as they are typed, not just some single user who has been picked as the next one to be displayed.
- Because chatters may type messages that are wider than the screen width, and because multiple, partial messages will be displayed on successive lines of the screen as they are updated, this means that you have to change the program so that lines can be inserted between existing lines on the screen. (Don't worry, there is a well-designed library to manage the screen, so this isn't all that hard, except for the race conditions and deadlock problems.)
- Scroll up commands will always be enabled (and so have immediate effect).
It wouldn't be useful to have a chat monitor program if no one was chatting. Worse, debugging would be quite difficult.
To address these problems, we built a "server" that produces synthetic chatters. Your "client," the chat monitor program, connects to the server when it starts. The server then sends it messages telling it to connect to individual (synthetic) chatters. The chat monitor connects to each of these chatters, reads and displays what they're typing, and disconnects when they're done.
We are running servers on both greer and baugh. There is code in the chat monitor program (hereafter called "the client") that will automatically locate and connect to one of these servers, so long as (a) you have a working net connection when you run the client, and (b) the servers haven't crashed. We will also supply the server code so that you can run a server on the same machine as your client, if you wish (e.g., you have no net connection, or all the servers have crashed). The client program already has code that will cause it to connect to a server on the same machine that it's running on, if there is one, in preference to making a connection to a remote server.
Here's what things look like. "Client A" and "Client B" are two of you, each running the client (chat monitor). "Server" is a server running on greer or baugh. "Bill," "Hillary," et al. are various synthetic chatters.
When your client program starts, it finds the server and tells it that it (the client program) exists. The server now sends messages telling the client to connect to chatters, as chatters arrive. The client does this, and the chatters start sending typed data (the thicker arrows), continuing to do that until they're done typing. In the multi-threaded version of your program it listens to all all chatters at once, and updates the screen for all of them each time it receives new data.
Communication between your client and the server, for announcement by the server of new chatters, takes place on one socket. Communication of data from a chatter to the client takes place on a second socket. (See the next section for more on sockets.)
There are things you must know, and things you don't need to know. The client code has been written in a way that attempts to minimize what you must know.
What you must know is that the client and server communicate using "sockets." Once a socket has been set up by each side, and the two sockets connected (something that happens in clientNetworking.c, and something you don't have to know how to do), the important thing is that you get data out of the connection using read() and/or can send it to the other side using write():
retcode = read (socketNumber, buf, bufLen);
retcode = write (socketNumber, buf, bufLen);socketNumber is technically an integer, but it doesn't really matter - the code to set up sockets, and a read() that uses it are already in the program you'll start from. buf is a void* pointer to the data you want to write, or the place you want the read data to be put. bufLen is the number of bytes to send, or the size of the buf for read (so the maximum number of bytes that should be read).
Both functions return the number of bytes moved (read or written).
For your client code, you are nearly exclusively concerned with read, as the clients don't say anything to the server. (The "Hi" in the figure above is implicit: the server knows the client is there because it knows when a client connects to it. No data is actually sent (as far as we're concerned).) [Note: This is a slight lie, as hinted at later. The lie is a detail, and irrelevant to what you have to do, though.]
You do not need to:
Write any new code to create sockets. (But, you will have to have more socket variables in your program, because you'll be talking to more than just one synthetic chatter at a time.)
Understand socket creation or connection.
Write any "write()" calls.
Write any "read()" calls, other than the ones already in the code.
Worry about communication errors, except to preserve the existing check for a return value of 0 from read, which means that the other end of the connection has shut down (the synthetic chatter is done for that socket, or the server has crashed for the other socket, and in either case all conversation over that socket is over).
Ever install the server code (unless the server is constantly crashing, in which case you might want to run your own copy).
Ever read the server code.
Ever fetch the server code.
I'm not saying it is perforce wrong to do those things, just that I don't see any reason right now why you will be required to be able to do them to complete this assignment.
You will need to:
Understand how to create threads and how to create and use synchronization variables. "man -k pthread | more" will list all new routines in the pthreads library, which provides both threads and synchronization variables. (There are more routines than you need - don't worry.)
Carefully analyze your program for race conditions and deadlock potential as you change it to multi-threading.
Realize that your program can be totally incorrect but still operate just fine for hours or days. Race conditions and deadlocks are both timing dependent; you can be lucky for hours or days, and never observe the problem in repeated runs. More than with the single threaded programs you've worked with in the past, convincing yourself that your program really works requires analyzing the program, not just running it.
Understand the communication protocol shown in the figure above.
Do a little reading about ncurses (via man ncurses), which is the library that let's you manipulate writing text on the screen.
Multi-threading and synchronization are intimately related. Your goal is to build a program that has enough synchronization, but not too much. Enough synchronization is what is required for the multi-threaded program to be correct; all race conditions must be addressed. Too much synchronization is synchronization that needless restricts the amount of concurrency that can be realized. This is especially important to performance on multiprocessors, but is relevant to uniprocessors as well.
An example of too much synchronization would be a single lock that all threads had to hold to do anything. (This is roughly how Linux was originally ported to run on multiprocessors. The latest version fixes this problem, though.) Handling race conditions is trivial, because only one thread is really running at a time.
Because this assignment deals with displaying characters typed at human speeds, it will be hard to notice performance impacts of synchronization decisions. But, the structure of the application is just like that of many, many other applications where these decisions could have a major effect, so pretend like this is one of those. Try, for the purposes of the assignment, to have the "finest-grained" synchronization possible for a correct program; i.e., build the program towards the end of the spectrum that no thread is precluded from making progress unless absolutely necessary. (Following this to the extreme might be hard. If you're uncertain about choosing between two or more alternatives, write them up and send them to one of the staff. (Note the "two or more" - don't just send "Is this okay?" because all we'll be able to say is "What else could you do?".)
One last thing: Your application should not saturate the CPU of the machine you run on. It's easy to write a thread that simply goes into a loop waiting for something ("busy waiting"). Such programs will run correctly, except perhaps for performance issues, because the scheduler will make sure that all threads get some allocation of CPU time, not just the thread in a loop. Just like "too much synchronization," though, you should think of this as an incorrect solution. This is particularly relevant to you in this assignment because: (a) you're starting with a single threaded version, and (b) in the single threaded version the code that checks for user keyboard input cannot block (because the entire program would hang if the user hadn't typed anything). So, if you do the simplest possible thing when converting to multi-threading, you'll end up with a busy waiting thread.
Unfortunately, figuring out what the race conditions are, and how to fix them, is the hard part of the assignment, and it won't operate correctly as a multi-threaded program until you do.
Despite that, my advice is to start by creating a thread to handle user input, and get that "working" without worrying about correctness - just see what it's like to create a thread, and work through those mechanics. The screen output is likely to be a mess, at least under some conditions, but fixing that (as well as creating more threads) is the second step.
First of all, this assignment is a user-level program precisely because it makes debugging easier (than if we were to step immediately into dealing with multi-threading in the kernel). So, use the available debugger, gdb.
Next, there are two special debugging aids built in to the software, one for displaying messages about what the client is doing, and one for the server.
Client Side Trace Messages
The client code is instrumented already with DEBUGPRINT(DEBUGFILE,...) and DEBUGFLUSH() calls. The former has syntax exactly like fprintf(), and prints a message. The latter is a "flush." Normally, output is buffered inside the system, and not necessary immediately written to its final destination. If you're trying to debug something that is crashing, this can be a problem, because the last thing you tried to write with DEBUGPRINT(DEBUGFILE,...) may or may not have been displayed before crashing, even though the DEBUGPRINT(DEBUGFILE,...) was executed. DEBUGFLUSH() executes an fflush(), which pauses execution until the output has been written. (So, is it clear why this is useful?)
The client side debugging info is not displayed on the screen, though, because the window you're using is already in use to display the chatters' messages. Instead, it's written to a log file named DEBUGLOG.
You can view the trace as the file is being written using the Unix command "tail -f DEBUGLOG".
You can, of course, add new DEBUGPRINT() and DEBUGFLUSH() statements to the code as you modify it.
Server Side Trace Messages
The server also produces a trace of status messages. But, it's running on another machine, so it can't just flash them up on your screen or write them to a file on your machine.
To see them, do the following:
Find out what server your client has connected to by examining DEBUGLOG.
Now execute this command, telnet <servername> 2452, where <servername> is the server your client is connecting to.
You will be prompted for a username. Type it in and hit <cr>. The username you should give is the username you're logged in as on the machine on which you're running the client.
Server trace messages appear in the telnet window.
Type ctrl-] and then "quit" to close the telnet window.
A couple more points:
The debug messages don't format correctly in a Windows telnet window. Telnet from a Unix machine.
The procedure above establishes the debugging window connection after your client has started. This means you'll miss the first few messages printed. If this becomes important, you can in fact do the telnet BEFORE you start your client - you just need to know to which server it will connect when it starts. Debug messages will start once your client connects.
That last idea sounds good, but... I purposely randomize the order in which the client looks for servers (for load balancing reasons). So, just because your client connected to greer.cs last time doesn't mean it will connect there next time you run it. To get around that, you can force the client to look for a server on exactly one machine by naming the machine as the (only) parameter on the command line starting the client (e.g., client greer.cs.washington.edu). Don't do this unless you're trying the "pre-connected server debug window," though, for load balancing reasons.
If more than one person runs a client from an account with the same username, behavior is "unpredictable." This means that if you all run as root on the SPL machines, no one will be happy trying to use the server trace window. (You have absolutely no reason to run as root for this assignment, though.)
N.B. In step 2 above, "greer" is not enough of a server name to avoid problems. greer (and similarly baughm) has two names: To the machines in the SPL it is known as greer.cse451.washington.edu. To the rest of the world, it is known as greer.cs.washington.edu. If you're running your client on greer, it's unknown to me how to predict which side of the fence that runs through the middle of greer your client will run on before it starts. (But, there is a server running on both sides, so your client will find one or the other.)
N.B. Only one instance of the server can be running on any one machine (where by "one machine" I mean one machine name, which is how greer/baughm can run two copies). If you try to start a server where one is already running, like on greer, it will (a) never come up, but (b) also never come down because the "automatic crash recovery" mechanism I built into the server will cause your copy to sit there trying to restart forever. ("What doesn't go up doesn't necessarily come down.")
The storage required for this assignment is small, so keep your files in your instructional space (i.e., NOT in the un-backed up space in /scratch/yourname on greer/baugh.)
You might want to work (compile and debug) on one of the SPL machines, though, especially if greer/baughm get busy. This is totally plausible - we're not hacking the kernel right now, so just boot the stable version of Linux on those machines. Copy your files from your instructional space to the local machine when you begin, and copy them back when you're done. (scp -r will recursively copy directories and their contents.)
Finally, a makefile is included with both the client code (which you must fetch) and the server code (which you needn't ever go near). Just say make. (Okay, it's not much of a makefile, as it recompiles everything, but I included it because to link successfully you need to include a couple of libraries, and there's no reason for you to spend hours figuring that out.)
The software is packaged in two tar files:
/cse/courses/cse451/00au/chatMonitorClientSide.tar
/cse/courses/cse451/00au/chatMonitorServerSide.tar
As stated over and over above, you don't need the server side code.
When you tar xvf chatMonitorClientSide.tar, it will expand into assign2/clientFiles/<a bunch of files>. Similarly for the server side tar file.
You should hand in a write-up of what you did, plus a printout of all files that you modified (significantly - use your judgement).
Tell us in your write-up:
What the high-level structure of your multi-threading is:
How many threads are there?
What does each do?
When are they created/destroyed?
What are (important) child-parent relationships, if any?
What are the critical sections in your program (and why are they critical sections)?
What synchronization you implemented? (What kind of synchronization mechanism you used? For semaphores and condition variables, under what conditions do threads block, and which thread is responsible for "waking up" any blocked threads? )
Argue that your program is correct. (This can be done informally, but try to be thorough.)
Describe any "incidental changes" you made in places we might not expect.
This is all due in section on 10/19/00.