CSE451 Spring 2008 Project #3

Out: May 1, 2008

Due: May 15, 2008 

Objectives 

In the first two projects you learned about some of the internals of the Windows operating system.  Your third project, done in groups of two, gives you practice using threads and synchronization by writing a multi-threaded, multi-buffered file copy program.  The principal objectives of this project are: 

  1. To practice thinking about and writing multi-threaded applications
  2. To design and implement a multi-buffering scheme
  3. To work on a group project
  4. To look at performance aspects
 

Getting Started 

Unlike the previous projects, this project involves writing a single user-mode Windows program in C.  You may use Visual Studio to do the development.  Located in O:\cse\courses\cse451\08sp\Project3 is a very simple skeleton code base to help you get started.

Things you will need to know how to do are: 

  1. Manipulate files with open, create, read, write, and close
  2. Start and synchronize threads
  3. Time your operation
 

Your assignment: 

You are writing an enhanced command-line program to copy one or more files to another location.  It should behave similarly to the command-line COPY program in Windows, except with a different set of switches, slightly different syntax, and other limitations as noted below.  The syntax for your program is: 

MTCOPY  [/T:number] [/B:number] [/V] <source>+ destination 

/T:number - Specifies the number of threads to use in the copy function.  The default value is 1. 

/B:number - Specifies the number of buffers to use in the copy function.  The default value is 1. 

/V - Verbose switch used to report the total time needed to copy the file(s).  It also prints the throughput in MB per second. 

  source - Specifies one or more source files to be copied.  Unlike the regular COPY program, you can specify multiple source files, and the last file name is treated as either the destination directory or the destination file.  Your code does not need to handle wildcard name expansion, because there is already a procedure to handle wildcard expansion in main’s argc and argv arguments (details forthcoming).  For example, “MTCOPY *.c” can be made to expand to “MTCOPY first.c second.c third.c ...” before main is called. 

  destination - Specifies the directory and/or filename for the new file(s). 
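
As an illustration of the syntax above, here is a minimal sketch of switch parsing in standard C.  It is not part of the skeleton code: the MtcopyOptions structure and the parse_mtcopy_args and parse_count names are hypothetical, and the sketch assumes that all switches come before the file names and that switch letters may be given in either case.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical container for the parsed command line (not in the skeleton). */
typedef struct {
    int threads;             /* /T:number, defaults to 1 */
    int buffers;             /* /B:number, defaults to 1 */
    int verbose;             /* nonzero if /V was given */
    int first_source;        /* index in argv of the first source file */
    int source_count;        /* number of source files */
    const char *destination; /* last argument on the command line */
} MtcopyOptions;

/* Parse a positive decimal count.  Returns 1 on success, 0 on failure. */
static int parse_count(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || errno == ERANGE)
        return 0;
    if (value < 1 || value > INT_MAX)
        return 0;
    *out = (int)value;
    return 1;
}

/* Returns 1 on success, 0 on a usage error.
 * Assumption: switches precede the file names, so the first argument that
 * does not start with '/' begins the source list. */
int parse_mtcopy_args(int argc, char **argv, MtcopyOptions *opt)
{
    int i = 1;

    assert(opt != NULL);
    opt->threads = 1;
    opt->buffers = 1;
    opt->verbose = 0;

    while (i < argc && argv[i][0] == '/') {
        const char *a = argv[i];
        if ((a[1] == 'T' || a[1] == 't') && a[2] == ':') {
            if (!parse_count(a + 3, &opt->threads)) return 0;
        } else if ((a[1] == 'B' || a[1] == 'b') && a[2] == ':') {
            if (!parse_count(a + 3, &opt->buffers)) return 0;
        } else if ((a[1] == 'V' || a[1] == 'v') && a[2] == '\0') {
            opt->verbose = 1;
        } else {
            return 0; /* unknown or malformed switch */
        }
        i++;
    }

    if (argc - i < 2)
        return 0; /* need at least one source plus a destination */

    opt->first_source = i;
    opt->source_count = argc - i - 1;
    opt->destination = argv[argc - 1];
    return 1;
}
```

A caller would typically print a usage message and exit when parse_mtcopy_args returns 0.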

Your program essentially does a set of reads and writes of data between the source and destination.  Internally you should use 64KB buffers to copy the data.  Obviously, if the file is smaller than 64KB the amount transferred will be less. 
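
The basic read/write loop described above can be sketched as follows.  This is a single-threaded, single-buffer baseline only, written with the portable C stdio calls (fopen, fread, fwrite) so that it compiles anywhere; your actual program will most likely use the Windows file functions instead.  The function name copy_file_single_buffer and the COPY_BUFFER_SIZE macro are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define COPY_BUFFER_SIZE (64 * 1024) /* 64KB, as required by the assignment */

/* Copy src_path to dst_path through one 64KB buffer.
 * Returns the number of bytes copied, or -1 on any error. */
long long copy_file_single_buffer(const char *src_path, const char *dst_path)
{
    FILE *src;
    FILE *dst;
    char *buffer;
    long long total;
    size_t got;

    assert(src_path != NULL && dst_path != NULL);

    src = fopen(src_path, "rb");
    if (src == NULL)
        return -1;
    dst = fopen(dst_path, "wb");
    if (dst == NULL) {
        fclose(src);
        return -1;
    }

    buffer = malloc(COPY_BUFFER_SIZE);
    total = (buffer != NULL) ? 0 : -1;

    /* The last read of a file is usually short; write only what was read. */
    while (total >= 0 &&
           (got = fread(buffer, 1, COPY_BUFFER_SIZE, src)) > 0) {
        if (fwrite(buffer, 1, got, dst) != got)
            total = -1;
        else
            total += (long long)got;
    }
    if (total >= 0 && ferror(src))
        total = -1;

    free(buffer);
    fclose(src);
    if (fclose(dst) != 0) /* a failed close can mean lost buffered data */
        total = -1;
    return total;
}
```

A file smaller than 64KB goes through the loop once with a short read, which is the "amount transferred will be less" case.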

MTCOPY is to utilize up to the specified number of threads and buffers to complete the task with as much parallelism and efficiency as possible.  The program should be smart enough to not use 10 threads and 20 buffers to copy a single two-byte file.  But it might use all the threads and all the buffers to copy a single 1GB file or to copy 100 small files. 
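
One possible way to be "smart enough" is to limit the thread and buffer counts to the number of 64KB chunks of work actually available.  The sketch below shows that idea only; the names chunk_count and effective_count are hypothetical, and counting an empty file as one unit of work is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE (64u * 1024u) /* one 64KB buffer's worth of data */

/* Number of 64KB chunks needed to copy a file of the given size.
 * Assumption: an empty file still counts as one unit of work, because the
 * destination file must be created. */
uint64_t chunk_count(uint64_t file_size)
{
    if (file_size == 0)
        return 1;
    return file_size / CHUNK_SIZE + (file_size % CHUNK_SIZE != 0 ? 1 : 0);
}

/* Clamp a requested thread or buffer count to the work available.
 * total_chunks is the sum of chunk_count() over all source files. */
unsigned effective_count(unsigned requested, uint64_t total_chunks)
{
    assert(total_chunks > 0);
    if (requested < 1)
        requested = 1;
    if (total_chunks < requested)
        return (unsigned)total_chunks;
    return requested;
}
```

With this rule a two-byte file is one chunk, so a request for 10 threads and 20 buffers is reduced to 1 of each, while a 1GB file (16384 chunks) or 100 small files (at least 100 chunks) keeps the full requested counts.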

The /V switch is used by MTCOPY to report its total throughput rate in MB per second, along with its read and write throughput rates. 
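
The throughput figure itself is simple arithmetic once you have a byte count and an elapsed time.  A minimal sketch follows, assuming that 1 MB means 1024*1024 bytes (the assignment does not say which definition to use) and that the elapsed time has already been measured in seconds; the function name throughput_mb_per_sec is hypothetical.

```c
#include <assert.h>

/* Throughput in MB per second.
 * Assumption: 1 MB = 1024 * 1024 bytes.
 * Returns 0.0 when no measurable time has elapsed, to avoid dividing by 0. */
double throughput_mb_per_sec(unsigned long long bytes, double seconds)
{
    assert(seconds >= 0.0);
    if (seconds == 0.0)
        return 0.0;
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}
```

The same function can be called three times: once with the total bytes and total elapsed time, and once each with the time spent in reads and in writes.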

After completing MTCOPY, your continued assignment is to analyze its performance against the standard Windows COPY command using different thread and buffer counts.  See how both perform when copying single large files and multiple smaller files.  For example, copy ws03esp1.vhd as the large file and maybe all of the \windows\system32 directory for a set of small to medium sized files.  Try this using various numbers of threads and buffers.  In addition, you should also try copying to and from local hard drives, flash drives, and network drives.  You will need to do a write-up reporting the results of this experiment. 

How you actually divide up and conquer the copy task between multiple threads and buffers is up to you.  Be creative; there are many good solutions.  For example, you might dedicate multiple threads to read and write data for a large file, and use a single thread to read and write data for smaller files.  You also might divide up the work along the lines of reader and writer threads. 
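
As one example of dividing a large file among several threads, each worker can be given a contiguous byte range that starts on a 64KB boundary.  This is only one of the many possible designs, not the required one; the ByteRange type and the range_for_worker function are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* A contiguous piece of a file assigned to one worker thread. */
typedef struct {
    uint64_t offset; /* starting byte offset within the file */
    uint64_t length; /* number of bytes to copy; 0 means nothing to do */
} ByteRange;

/* Split a file into `workers` ranges aligned to 64KB chunk boundaries and
 * return the range for worker number `index` (0-based).  Chunks are spread
 * as evenly as possible; the first few workers take any leftover chunks. */
ByteRange range_for_worker(uint64_t file_size, unsigned workers, unsigned index)
{
    const uint64_t chunk = 64u * 1024u;
    uint64_t chunks, base, extra, first, count;
    ByteRange r;

    assert(workers > 0 && index < workers);

    chunks = file_size / chunk + (file_size % chunk != 0 ? 1 : 0);
    base = chunks / workers;  /* chunks every worker receives */
    extra = chunks % workers; /* the first `extra` workers get one more */
    first = (uint64_t)index * base + (index < extra ? index : extra);
    count = base + (index < extra ? 1 : 0);

    r.offset = first * chunk;
    r.length = count * chunk;
    if (r.offset >= file_size) {
        /* More workers than chunks: this worker has nothing to copy. */
        r.offset = file_size;
        r.length = 0;
    } else if (r.length > file_size - r.offset) {
        /* The final chunk of the file is usually shorter than 64KB. */
        r.length = file_size - r.offset;
    }
    return r;
}
```

For a 200000-byte file and 3 workers this yields ranges of 131072, 65536, and 3392 bytes, which together cover the whole file without overlap.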

Windows allows you to do synchronous I/O (i.e., each thread blocks waiting for I/O to complete) and asynchronous I/O (i.e., a thread issues the I/O and is notified later when the I/O completes).  For your program you may limit yourself to just synchronous I/O.  We are looking for parallelism and accuracy.  Accuracy will be easy to determine when we “diff” the files after they’re copied. 

Treat this assignment as if you are writing an actual copy utility, meaning that good error messages, program behavior, and diagnostics all contribute to the fit and finish of the program. 

Project limitations (aka Extra Credit): 

In the CMD copy command, if you specify a directory as the source then the files within that directory are copied to the destination.  For this project this is not necessary, and you can limit the specified source to only files and not directories.  Your program should report an error if the source is a directory.  But for extra credit you can implement shallow directory copies.  To do this you will need to enumerate the files in a directory.  

Another limitation (or weirdness) in CMD is that you can specify multiple source files being concatenated into one destination file.  Concatenation is straightforward for a single threaded application but for multiple threads it gets dicey.  Your program should report an error in this case or for extra credit you can implement concatenation. 
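
The concatenation case can be detected before any copying starts.  A minimal sketch of that check follows, assuming the caller has already determined whether the destination names an existing directory (how to find that out is left to you); the CopyMode enumeration and the classify_copy function are hypothetical names.

```c
#include <assert.h>

/* What a given command line is asking MTCOPY to do. */
typedef enum {
    COPY_ONE_TO_FILE,         /* one source, destination is a file name */
    COPY_INTO_DIRECTORY,      /* one or more sources into a directory */
    COPY_ERROR_CONCATENATION  /* several sources into one file */
} CopyMode;

/* Classify the request from the number of source files and whether the
 * destination is an existing directory (nonzero means it is). */
CopyMode classify_copy(int source_count, int destination_is_directory)
{
    assert(source_count >= 1);
    if (destination_is_directory)
        return COPY_INTO_DIRECTORY;
    if (source_count == 1)
        return COPY_ONE_TO_FILE;
    /* Several sources aimed at a single file would mean concatenation. */
    return COPY_ERROR_CONCATENATION;
}
```

A program that does not attempt the extra credit would print an error message and exit on COPY_ERROR_CONCATENATION.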

Turn-in: 

Be prepared to turn in the following: 

  1. Executable images of your test program.
  2. Source code for your test program.
  3. A write up listing the two students who worked on the project and summarizing your performance analysis.

You'll be submitting the source code, executables, and write-up to Catalyst.