CSE373 Data Structures Wi06, Homework 4, Parts 1 and 2

Programming problems Part 1 due online Thursday, 2/16/06, at 11 pm.
Written problems Part 1 due at the beginning of class, Friday, 2/17/06, 2:30 pm.

Programming problems Part 2 due online Thursday, 2/23/06, at 11 pm.
Written problems Part 2 due at the beginning of class, Friday, 2/24/06, 2:30 pm.
No late assignments will be accepted.

This assignment delves more into trees, covering balanced binary trees and applications. In particular, it works with real-world data, nutritional information for thousands of food items courtesy the USDA.

The assignment has many files provided:

As usual, your code should not only work properly, but it also should be high-quality. In particular:

What to do - Written Problems Part 1

Answer these questions on a separate sheet of paper. We suggest you do these problems early as part of your preparation for the programming part of the assignment.

  1. It is possible to insert a permutation of 1,2,3,4,5,6,7 into an initially empty splay tree such that, upon completion, the root is 1, its right child is 2, and so on -- a linear chain, until a leaf of 7. State the order of insertion and draw the steps along the way. (The inserts must be done in the style of the book and lecture, which may be different from online applets.)
  2. Suppose the numbers 1 to n are inserted into a splay tree in the fashion described above, resulting in a linear chain. In terms of big-Oh, how much total work was done? How much work is it now to access and splay the leaf n? (This is an example of how splay trees have good amortized performance.)
  3. Draw an AVL tree that requires three sets of rotations to balance following a delete. You should strive to find an example that is not needlessly complex. (The slides show an example requiring two sets of rotations, but it is not the simplest.) Mark the node to be deleted.
  4. Look at the code in BasicOrderedSubTreeSet.java. What is the big-Oh of size()? How could it be made faster? What changes would you have to make regarding access and state? (This was a case where I went for the slow easy implementation rather than the fast hard implementation.)
  5. Include your output from demo3 and demo4 along with a description of why your demo4 is cool.

What to do - Written Problems Part 2

Answer these questions on a separate sheet of paper. We suggest you do these problems early as part of your preparation for the programming part of the assignment.

  1. What does the code (node.parent.right == node) say about node? (This is a big hint to making a more efficient Splay tree)
  2. Draw a red-black tree that does not satisfy the AVL height property. List the properties that must hold for a tree to be a red-black tree and explain why the tree satisfies these properties. Show where the tree violates the AVL height property and explain why.
  3. Benchmarking (This question is worth more than the other questions, so spend some quality time here.)
    1. List your results from the benchmarking tests. Discuss the differences between trees that are roughly balanced and are unbalanced, both theoretically and according to your findings. Discuss the performance of the splay tree as compared to the vanilla BST for the insertions. Do your findings make sense?
    2. Discuss the performance of the splay tree as compared to the BST tree for the uniformly distributed comparisons. Why does the Splay tree perform better than the BST for the tree created from ascending-order data?
    3. Discuss the performance of the splay tree as compared to the BST tree for a smaller set of comparisons. Why does the Splay tree perform dramatically better?
  4. Book problems 9.5, 9.6

What to do - Programming Part 1

In this assignment, we'll get to play with some real data! But first we need to build some tools so we can do interesting things. In the process of building these tools, it's best to test/experiment with simple examples before using the full data. Then, once our tools are working, we'll try doing some interesting queries.

First, we're going to update our code from last week. It might sound like a lot of items, but they should be very little code once you understand what's going on.

  1. Correct anything broken with your add, contains, and size from last week. Change your BasicOrderedTreeSet to implement BasicOrderedSet instead of BasicSet. Get help early if you're having trouble. You will not do well on the assignment if this code is broken.
  2. Read about Comparators. Look up String's Comparator that ignores case and think about why it's useful. Add a state that's a Comparator. Add a constructor to BasicOrderedTreeSet that takes in as a parameter a Comparator. Write a method that returns the comparator for the set. For the no-parameter constructor, use the default comparator: private static class DefaultComparator implements Comparator { public int compare(Object a, Object b) { return ((Comparable) a).compareTo(b); } } public BasicOrderedTreeSet() { this(new DefaultComparator()); }
  3. Replace calls to compareTo with calls to the comparator's compare.
  4. Add a parent link to your TreeNode (or whatever you call it). For any TreeNode in the tree, it should point to its parent, except for the root node, whose parent link is null.
  5. Modify add to assign the parent link correctly. (You're encouraged update remove too, but we won't grade you on it.)
  6. Rewrite the iterator. Now that there are parent links, the only state you'll need is a single TreeNode. No stack. Your iterator's constructor should be O(height), .hasNext() should be O(1), and .next() O(height).
  7. Add a method subSet to the implementation that returns a BasicOrderedSet that is the subset within the specified range. Basically create a new BasicOrderedSubTreeSet with the proper parameters and return it. Read the Java SDK for more details on how subSet and subsetted sets work.

Now let's play!

Run the main in TreesAreFun.java. Look at the output and what the demo1 and demo2 methods are doing. Fill in the demo3 method. It should make a new BasicOrderedSet containing the subset of items from McDonald's. Then it should iterate that subset into a set with a different Comparator so that you can then extract the subset of those McDonald's items with between 100mg and 1000mg of cholesterol. It should then iterate through the final subset, printing out the name of the item, the number of calories, the fat content, and the amount of cholesterol.

Fill in the demo4 method that does something cool and finds something interesting in the data. Calories in alcohol? Lycopene content?

What to do - Programming Part 2

Now we're going to implement the Splay tree, which does self-balancing, and benchmark how it compares to the unbalanced tree in different situations. Make a new file BasicOrderedSplayTreeSet.java where the class BasicOrderedSplayTreeSet extends BasicOrderedTreeSet. (Please don't misspell/change the name of the class, because it makes your TAs grumpy.)
  1. Write a method that takes in a TreeNode (or whatever you call it) and splays the tree containing that node such that the node is now the root. (You'll likely need several helper methods.)
  2. Override add and contains to perform the appropriate splay after a contains and an add. You should not splay because of an iterator or calling subSet().
  3. Write a benchmarking program using sets of ComparisonCounters. Make an example of insertion of items in ascending order and random order (for example use the numbers 1 to 100000, but tweak the size so that it's not too fast nor too slow. you might need to go down to 10000 or less if you're getting stack overflow). Time how long it takes to do each in milliseconds and note for each how many times compareTo was called. Do the example on both the BasicOrderedTreeSet and the BasicOrderedSplayTreeSet.
  4. For all four cases, BST ascending, BST random, Splay ascending, Splay random, do a large number of random contains (try a million and tweak from there). The contains calls should be uniformly distributed over the entire range of items in the sets. Obtain timing results and compareTo counts.
  5. For all four cases, BST ascending, BST random, Splay ascending, Splay random, do a large number of contains. The contains calls should be chosen randomly from numbers that are multiples of 1000 (change this to 100 or 10000 as you feel appropriate so that between 5 and 20 items are selected repeatedly). Obtain timing results and compareTo counts. Turnin your code for all these tests.

What to turn in

Turn in your programming assignment online using the turnin form linked from the course web site (when available). You should include the Java source code for the interfaces, implementations, and testing code.

Turn in your Part 1 written questions at the beginning of class, Friday, 2/17.

Turn in your Part 2 written questions at the beginning of class, Wednesday, 2/22.