CSE451 Spring 2008 Project #3
Out: May 1, 2008
Due: May 15, 2008
Objectives
In the first two projects you
learned about some of the internals of the Windows operating system.
In your third project, done in groups of two, you will practice using threads
and synchronization to write a multi-threaded, multi-buffered file copy
program. The principal objectives of this project are:
Getting Started
Unlike the previous projects,
this project involves writing a single user-mode Windows program
in C. You may use Visual Studio to do the development. Located
in O:\cse\courses\cse451\08sp
Things you will need to know
how to do are:
Your assignment:
You are writing an enhanced command-line
program to copy one or more files to another location. It should
behave similarly to the command-line COPY program in Windows, except with
a different set of switches, slightly different syntax, and other limitations
as noted below. The syntax for your program is:
MTCOPY [/T:number] [/B:number] [/V] <source>+ destination
/T:number - Specifies the number
of threads to use in the copy function. The default value is 1.
/B:number - Specifies the number
of buffers to use in the copy function. The default value is 1.
/V - Verbose switch used to
report the total time needed to copy the file(s). It also prints
the throughput in MB per second.
source - Specifies one or
more source files to be copied. Unlike the regular COPY program,
you can specify multiple source files, and the last file name is treated
as either the destination directory or destination file. Your
code does not need to handle wildcard name expansion, because there
is already a procedure to handle wildcard expansion in main’s argc
and argv arguments (details forthcoming). For example, “MTCOPY
*.c” can be made to expand to “MTCOPY first.c second.c third.c ...”
before main is called.
destination - Specifies
the directory and/or filename for the new file(s).
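As an illustration of the syntax above, here is a minimal sketch of a switch parser in portable C. The `MtcopyOptions` structure, the function names, the upper bound of 1024, and the rules that switches are case-insensitive and must precede the file names are all assumptions made for this example, not requirements of the assignment.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical container for the parsed command line. */
typedef struct {
    int threads;       /* from /T:number, default 1 */
    int buffers;       /* from /B:number, default 1 */
    int verbose;       /* 1 if /V was given */
    int first_source;  /* argv index of the first source file */
    int destination;   /* argv index of the destination (last argument) */
} MtcopyOptions;

/* Parses a positive decimal count; returns 0 on success, -1 on bad input.
   The limit of 1024 is an arbitrary sanity bound chosen for this sketch. */
static int parse_count(const char *text, int *out)
{
    char *end;
    long value = strtol(text, &end, 10);
    if (end == text || *end != '\0' || value < 1 || value > 1024)
        return -1;
    *out = (int)value;
    return 0;
}

/* Returns 0 on success, -1 on a malformed switch or missing file names.
   Assumes every switch starts with '/' and appears before the sources. */
int parse_mtcopy_args(int argc, char **argv, MtcopyOptions *opt)
{
    int i;
    assert(opt != NULL);
    opt->threads = 1;
    opt->buffers = 1;
    opt->verbose = 0;

    for (i = 1; i < argc && argv[i][0] == '/'; i++) {
        const char *sw = argv[i] + 1;
        int letter = toupper((unsigned char)sw[0]);
        if (letter == 'T' && sw[1] == ':') {
            if (parse_count(sw + 2, &opt->threads) != 0) return -1;
        } else if (letter == 'B' && sw[1] == ':') {
            if (parse_count(sw + 2, &opt->buffers) != 0) return -1;
        } else if (letter == 'V' && sw[1] == '\0') {
            opt->verbose = 1;
        } else {
            return -1;  /* unknown or malformed switch */
        }
    }
    if (argc - i < 2)
        return -1;      /* need at least one source and a destination */
    opt->first_source = i;
    opt->destination = argc - 1;
    return 0;
}
```

A real utility would print a usage message when the function returns -1, so the user can see which part of the command line was rejected.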
Your program essentially does
a set of reads and writes of data between the source and destination.
Internally you should use 64KB buffers to copy the data. If a file
is smaller than 64KB, the amount transferred will be less.
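The 64KB buffer rule implies some simple arithmetic: how many buffer-sized chunks a file needs, and how long the final chunk is. The sketch below shows one way to compute both; the names `BUFFER_SIZE`, `chunk_count`, and `chunk_length` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define BUFFER_SIZE (64u * 1024u)  /* 64KB, as the assignment specifies */

/* Number of 64KB chunks needed to cover file_size bytes (rounds up).
   An empty file needs zero chunks. */
uint64_t chunk_count(uint64_t file_size)
{
    return (file_size + BUFFER_SIZE - 1) / BUFFER_SIZE;
}

/* Number of bytes in chunk `index`; only the last chunk can be short. */
uint32_t chunk_length(uint64_t file_size, uint64_t index)
{
    uint64_t offset = index * BUFFER_SIZE;
    uint64_t remaining;
    assert(offset < file_size);
    remaining = file_size - offset;
    return remaining < BUFFER_SIZE ? (uint32_t)remaining : BUFFER_SIZE;
}
```

For example, a two-byte file is a single chunk of length 2, while a 1GB file is 16384 full chunks.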
MTCOPY is to utilize up to
the specified number of threads and buffers to complete the task with
as much parallelism and efficiency as possible. The program should
be smart enough to not use 10 threads and 20 buffers to copy a single
two byte file. But it might use all the threads and all the buffers
to copy a single 1GB file or to copy 100 small files.
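One possible way to be "smart enough" is to cap the requested thread and buffer counts by the amount of work available. The sketch below assumes the work is measured in 64KB units, with every file counting as at least one unit; `WorkerPlan` and `plan_workers` are hypothetical names, and other policies are equally valid.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result: how many threads and buffers to actually use. */
typedef struct {
    int threads;
    int buffers;
} WorkerPlan;

/* Caps the requested counts at the number of work units, where a work
   unit is one 64KB chunk (or one file, if the file is smaller).
   Always uses at least one thread and one buffer. */
WorkerPlan plan_workers(int requested_threads, int requested_buffers,
                        uint64_t work_units)
{
    WorkerPlan plan;
    uint64_t cap = work_units == 0 ? 1 : work_units;
    assert(requested_threads >= 1 && requested_buffers >= 1);
    plan.threads = (uint64_t)requested_threads > cap ? (int)cap
                                                     : requested_threads;
    plan.buffers = (uint64_t)requested_buffers > cap ? (int)cap
                                                     : requested_buffers;
    return plan;
}
```

With this policy, /T:10 /B:20 on a two-byte file yields one thread and one buffer, while the same switches on a 1GB file or on 100 small files keep all 10 threads and 20 buffers.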
The /V switch is used by MTCOPY
to report its total throughput rate in terms of MB per second, and its
read and write throughput rate.
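The throughput figures can be derived from a byte count and elapsed times. The following sketch assumes 1 MB means 1024 * 1024 bytes and that the caller has already measured the times, for instance with a high-resolution timer such as QueryPerformanceCounter on Windows. The `CopyStats` structure, the function names, and the output wording are illustrative only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical totals gathered while copying. How read and write time
   are accumulated across threads is a design decision left open here. */
typedef struct {
    uint64_t bytes_copied;
    double total_seconds;
    double read_seconds;
    double write_seconds;
} CopyStats;

/* Throughput in MB per second, taking 1 MB = 1024 * 1024 bytes.
   Returns 0 when no time has elapsed, to avoid dividing by zero. */
double mb_per_second(uint64_t bytes, double seconds)
{
    assert(seconds >= 0.0);
    if (seconds == 0.0)
        return 0.0;
    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}

/* Prints a /V style report; the exact format is an assumption. */
void print_verbose_report(const CopyStats *s)
{
    printf("Copied %llu bytes in %.3f seconds\n",
           (unsigned long long)s->bytes_copied, s->total_seconds);
    printf("Total: %.2f MB/s  Read: %.2f MB/s  Write: %.2f MB/s\n",
           mb_per_second(s->bytes_copied, s->total_seconds),
           mb_per_second(s->bytes_copied, s->read_seconds),
           mb_per_second(s->bytes_copied, s->write_seconds));
}
```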
After completing MTCOPY, your
continued assignment is to analyze its performance against the standard
Windows COPY command using different thread and buffer counts.
See how both perform when copying single large files and multiple smaller
files. For example, copy ws03esp1.vhd as the large file and maybe
all of the \windows\system32 directory for a set of small to medium sized
files. Try this using various numbers of threads and buffers.
In addition, you should also try copying to and from the local hard drives,
flash drives, and network drives. You will need to do a write-up
reporting the results of this experiment.
How you divide
and conquer the copy task between multiple threads and buffers is up
to you. Be creative; there are many good solutions. For
example, you might dedicate multiple threads to read and write data
for a large file, and use a single thread to read and write data for
smaller files. You also might divide up the work along the lines
of reader and writer threads.
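As one illustration of dividing a large file among threads, the sketch below gives each thread a contiguous range of 64KB chunks, spreading any remainder over the first few threads. This is only one of many possible schemes, and the `ChunkRange` type and `range_for_thread` function are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of the chunks one thread is responsible for. */
typedef struct {
    uint64_t first_chunk;  /* index of the first 64KB chunk */
    uint64_t chunk_count;  /* number of consecutive chunks */
} ChunkRange;

/* Splits total_chunks as evenly as possible over `threads` threads.
   The first (total_chunks % threads) threads receive one extra chunk. */
ChunkRange range_for_thread(uint64_t total_chunks, int threads,
                            int thread_index)
{
    ChunkRange r;
    uint64_t base, extra, idx;
    assert(threads >= 1 && thread_index >= 0 && thread_index < threads);
    base = total_chunks / (uint64_t)threads;
    extra = total_chunks % (uint64_t)threads;
    idx = (uint64_t)thread_index;
    r.chunk_count = base + (idx < extra ? 1 : 0);
    r.first_chunk = idx * base + (idx < extra ? idx : extra);
    return r;
}
```

Each thread would then read and write only the byte offsets covered by its range, so no two threads touch the same part of the destination file.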
Windows allows you to do synchronous
I/O (i.e., each thread blocks waiting for I/O to complete) and asynchronous
I/O (i.e., a thread issues the I/O and is notified later when the I/O completes).
For your program you may limit yourself to just synchronous I/O.
We are looking for parallelism and accuracy. Accuracy will be
easy to determine when we “diff” the files after they’re copied.
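To make the idea of synchronous I/O and the accuracy check concrete, here is a single-threaded sketch of a blocking copy loop and a byte-by-byte comparison. It deliberately uses the portable C standard I/O library rather than the Win32 file calls you would probably use in the project, and the function names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define BUFFER_SIZE (64 * 1024)  /* 64KB transfer buffer */

/* Copies src_path to dst_path with blocking reads and writes.
   Returns 0 on success, -1 on any open, read, or write failure. */
int copy_file_sync(const char *src_path, const char *dst_path)
{
    FILE *src, *dst;
    unsigned char *buf;
    size_t got;
    int status = 0;

    src = fopen(src_path, "rb");
    if (src == NULL)
        return -1;
    dst = fopen(dst_path, "wb");
    if (dst == NULL) {
        fclose(src);
        return -1;
    }
    buf = malloc(BUFFER_SIZE);
    if (buf == NULL) {
        fclose(src);
        fclose(dst);
        return -1;
    }
    /* Each fread blocks until data is available; the last read is short. */
    while ((got = fread(buf, 1, BUFFER_SIZE, src)) > 0) {
        if (fwrite(buf, 1, got, dst) != got) {
            status = -1;
            break;
        }
    }
    if (ferror(src))
        status = -1;
    free(buf);
    fclose(src);
    if (fclose(dst) != 0)  /* flushing buffered data can also fail */
        status = -1;
    return status;
}

/* Returns 1 if both files have identical contents, 0 if they differ,
   and -1 if either file cannot be opened. */
int files_identical(const char *a_path, const char *b_path)
{
    FILE *a = fopen(a_path, "rb");
    FILE *b = fopen(b_path, "rb");
    int same = 1, ca, cb;

    if (a == NULL || b == NULL) {
        if (a) fclose(a);
        if (b) fclose(b);
        return -1;
    }
    do {
        ca = fgetc(a);
        cb = fgetc(b);
        if (ca != cb)
            same = 0;
    } while (same && ca != EOF);
    fclose(a);
    fclose(b);
    return same;
}
```

A multi-threaded version has to produce exactly the same bytes as this loop, which is what the comparison checks.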
Treat this assignment as if
you are writing an actual copy utility, meaning that good error messages,
program behavior, and diagnostics all contribute to the fit and finish
of the program.
Project limitations (aka
Extra Credit):
In the CMD copy command, if
you specify a directory as the source then files within the directory
are copied to the destination. For this project this is not necessary,
and you can limit the specified source to only files and not directories.
Your program should report an error if the source is a directory.
But for extra credit you can implement shallow directory copies.
To do this you will need to enumerate the files in a directory.
Another limitation (or weirdness)
in CMD is that you can specify multiple source files being concatenated
into one destination file. Concatenation is straightforward for
a single-threaded application, but for multiple threads it gets dicey.
Your program should report an error in this case, or for extra credit
you can implement concatenation.
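The destination rules can be sketched as a small helper that builds the output path for each source and rejects the concatenation case. It assumes the caller has already determined whether the destination is an existing directory (on Windows, for example, by checking for FILE_ATTRIBUTE_DIRECTORY in the result of GetFileAttributes). The function names and return codes are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns the file-name part of a path: everything after the last
   backslash, forward slash, or drive colon. */
const char *base_name(const char *path)
{
    const char *name = path;
    const char *p;
    for (p = path; *p != '\0'; p++)
        if (*p == '\\' || *p == '/' || *p == ':')
            name = p + 1;
    return name;
}

/* Writes the destination path for one source file into `out`.
   Returns  0 on success,
           -1 if several sources target a single file (concatenation),
           -2 if the result does not fit in `out`.
   Drive-relative destinations such as "C:" are not handled here. */
int resolve_destination(const char *source, const char *dest,
                        int dest_is_dir, int source_count,
                        char *out, size_t out_size)
{
    int n;
    if (!dest_is_dir) {
        if (source_count > 1)
            return -1;
        n = snprintf(out, out_size, "%s", dest);
    } else {
        size_t len = strlen(dest);
        int has_sep = len > 0 &&
                      (dest[len - 1] == '\\' || dest[len - 1] == '/');
        n = snprintf(out, out_size, "%s%s%s",
                     dest, has_sep ? "" : "\\", base_name(source));
    }
    return (n < 0 || (size_t)n >= out_size) ? -2 : 0;
}
```

A program that implements the extra-credit concatenation would replace the -1 case with its own handling instead of reporting an error.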
Turn-in:
Be prepared to turn in the
following
You'll be submitting the source code, executables, and write-up to Catalyst.