CSE 326 - Winter 2003
Assignment 4 Hints and Clarifications
Clarification: You should NOT to output the compressed
file.
Correction: You need to encode spaces and the new line
symbols, in addition to 'a'-'z'. If you observe two symbols that end
the line (like, CR and LF), treat them as one "new line" symbol.
Hint: it might be easier to count the number of lines, rather
than the number of new line symbols. Assume that the count is the same
in either case.
Correction: Furthermore, the output produced by your
program must be in the same format as in the following example
(for an imaginary input file):
nl 10
space 110
a 11101
b 1111001
...
z 111101
compression: 23%
The first two lines report the codes for the new line symbol and
space, correspondingly. The reported compression rate should be rounded
to an integer percentage.
Extra credit opportunity Here are some extra credit options.
You can work on any subset of these for the corresponding credit:
- (10 Extra credits) Use a binomial queue instead of a heap for implementing the
priority queue. You should compare the performance between the two
types of queues for files of different sizes. You may need to run the
program on huge files (much larger then the example file) to see the
difference.
- (15 Extra credits) Write out the compressed file from the Huffman
code. Note that you will also have to store the Huffman code in the
file so that a decompresser can later decompress it.
- (15 Extra credits) Write out the decompresser program. The
program should read the file produced by the compressor and generate
back the original file.