A8 - Huffman Coding

Due by Thursday 03/10 at 11:59 pm.

Note: This assignment is worth 30 points instead of the regular 20.

Specification Intro Video Submit Code on Ed

You may submit any part of the assignment as many times as you want before the initial submission. To submit on EdStem, you should use the Mark button to submit your code. You can view your past submissions using the “Submissions” button.

Please make sure you are familiar with the late work policy on the syllabus.

Developing at Home

You are welcome to use Ed as your environment to work on the homework, but we recommend setting up a local environment following our Desktop Software instructions. This will allow you to work offline, and access the great debugger provided by jGrasp! When you want to submit, upload your code to Ed and then press Mark to submit.

Useful Resources

Files

Below we include a table of files that you will find helpful for the assignment. On the left column is the uncompressed input files, in the middle are the compressed output files, and the right shows the debug output (the last prompt of the client program asks if you want to view debug output).

Warning

It does not work to copy/paste the contents of these files. Different operating systems and different browsers can affect copy/paste behavior. Instead, you should right click on the links below and click “Save as” or “Save Link as” (or equivalent).

Input File Compressed Files Debug Output
tiny.txt tiny.short, tiny.code tiny.debug
taxi.txt taxi.short, taxi.code taxi.debug
simple.txt simple.short, simple.code simple.debug
short.txt short.short, short.code short.debug
short256.txt short256.short, short256.code short256.debug
hamlet.txt hamlet.short, hamlet.code hamlet.debug

All files can also be downloaded here: a8.zip

Debugging

Remember that in jGRASP you can use a structure viewer to see what your tree looks like. You do so by dragging one of your fields from the debug window outside the window and jGRASP will launch a viewer. This viewer will show you the structure of the tree, but may not show you the contents of the nodes. You can fix this by selecting the wrench icon (“Configure the structure to view mapping”). Under “Value Expressions” say:

_node_.<field>

where <field> is the name of the field you want to view. You can also say

_node_.<field1>#_node_.<field2>

Helpful Diagrams

Your jGRASP debugger won’t look exactly like this, but the characters, frequencies, and positions of nodes in the tree should match.

Frequently Asked Questions (FAQ)

Q: I don’t understand what is going on in this assignment.

A: Take a look at the pictures in the assignment writeup and lecture slides. They explain how the priority queue works with the alogrithm we’ve given you. Be sure to also look over your section handouts for help on how to attach each node of the Huffman Tree to the new parent node.

Q: “My tree doesn’t get created correctly. How can I tell what’s going on?”

A: Try adding debugging code or use the structure viewer in jGRASP to view your Huffman Tree. See if your tree is adding in nodes in the correct manner (pull off two, make a new node and attach them to it, re-insert into the priority queue). Take a look at the writeup for how to set up jGRASP in order to Look at the HuffmanTree.

Add a println to the constructor of your node class that will report the character value for every leaf node that it constructs. Remember that nodes are supposed to be added to the PriorityQueue in increasing character order. If you see any values that are out of order, then you have a bug.

Be aware of the problem with r characters that occur as new lines in some text-editors. Avoid such characters by making sure to right-click the input text files when you save them, rather than doing select-all, copy-paste into your editor.

Q: “The contents of my priority queue don’t seem to be in sorted order. Why?”

A: A PriorityQueue’s toString behavior (as well as the result of using an iterator/foreach, or viewing the PQ in jGRASP) does not show the elements in their sorted order, so it might be confusing to use these methods of debugging. Try looking at the Lectures page for more information on how to deal with priority queues.

You can write some testing code that repeatedly calls remove on your PriorityQueue, printing the frequency of each node as it is removed from the PriorityQueue. You should see an increasing sequence of values (from low frequency to high).

Q: “How can I tell what bits are getting written to my compressed file?”

A: When prompted by HuffmanCompressor if you want to see the debug output, say “y”. This will print out the bits it is writing in the log of the output.

Q: “Why do I have some unexpected characters in my Huffman tree that were not in the sample output?”

A: Maybe you saved the input files improperly to your computer. Don’t select-all and copy/paste. Instead, right-click the link to each file and choose Save As.

Q: “My program works for most files, but when I try to decompress hamlet.txt, I get a StackOverflowError. Why?”

A: Your algorithm is nesting too many recursive calls. Once you are done making one recursive walk down the tree, you should let the call stack unwind rather than making another recursive call to get back to the top of the tree.

Q: “My program runs really slowly on large files like hamlet.txt. How can I speed it up?”

A: This can happen if you try to build up one huge string for the entire file and then print that string. Try to print out one character or one binary encoding at a time. Your program also might be slow because you’re running it on a slow disk drive such as a USB thumb drive.

Q: “My priority queue crashes with a ClassCastException: Comparable. Why?”

A: The HuffmanNode class must implement the Comparable interface in order to perform effectively.

Q: “What should the Comparable ordering be if the two nodes’ counts are equal?”

A: As the spec says, compare only the counts. If the counts are the same, consider the two nodes to be equal.

Q: “What should it do if the file to decompress is empty or only has 1 character?”

A: We will not test that case.

Q: “What is the default value for a char? What char value can I use to represent nothing, or the lack of a character?”

A: The default char value is ‘0‘, sometimes called the ‘null character’. (Not the same as null, the null object reference.) But it doesn’t really matter what character you use in a node where the character is meaningless. The program won’t examine that character anyway.